Table 7 Metrics per label using the top-k retrieved categories

From: Ensemble of deep learning language models to support the creation of living systematic reviews for the COVID-19 literature

| Model | {P,R,MAP}@1 (%) | P@3 (%) | R@3 (%) | MAP@3 (%) |
|---|---|---|---|---|
| RoBERTa-base | 65.99 | 27.10 | 81.29 | 72.69 |
| RoBERTa-large | 67.29 | 28.12 | 84.37 | 74.86 |
| BioBERT | 68.55 | 28.63 | 85.89 | 76.16 |
| PubMedBERT | 68.33 | 28.47 | 85.42 | 75.92 |
| COVID-Twitter-BERT | 64.98 | 27.88 | 83.64 | 73.14 |
| Ensemble | 70.57 | 29.69 | 89.07 | 78.92 |

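  1. P: precision; R: recall; MAP: mean average precision. Because this is a single-label task, each document has exactly one correct category, so P, R and MAP coincide at k = 1 (hence the single combined column) and the maximum value of P@3 is 1/3 (33.33%).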
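
The relationship between the columns can be illustrated with a minimal sketch of top-k metrics for a single-label task. This is not the authors' evaluation code: the function name `metrics_at_k`, the toy rankings, and the choice to score average precision as the reciprocal rank of the one correct label (zero if it falls outside the top k) are assumptions made for illustration.

```python
from typing import Dict, Sequence


def metrics_at_k(ranked: Sequence[Sequence[str]],
                 gold: Sequence[str],
                 k: int) -> Dict[str, float]:
    """Average P@k, R@k and MAP@k over documents with exactly one gold label.

    ranked[i] is the model's category ranking for document i (best first);
    gold[i] is that document's single correct category.
    """
    n = len(gold)
    p_sum = r_sum = ap_sum = 0.0
    for preds, label in zip(ranked, gold):
        top = list(preds[:k])
        if label in top:
            rank = top.index(label) + 1   # 1-based position of the hit
            p_sum += 1.0 / k              # one relevant item among k retrieved
            r_sum += 1.0                  # the only relevant item was found
            ap_sum += 1.0 / rank          # assumed AP for a single relevant item
    return {"P": p_sum / n, "R": r_sum / n, "MAP": ap_sum / n}


# Hypothetical toy data: four documents, three candidate categories each.
ranked = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"], ["a", "c", "b"]]
gold = ["a", "a", "a", "d"]

for k in (1, 3):
    print(k, metrics_at_k(ranked, gold, k))
```

Under these assumptions, P@k is always R@k divided by k, which matches the table (for example, 81.29 / 3 ≈ 27.10 for RoBERTa-base), and all three metrics reduce to top-1 accuracy when k = 1.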