Table 5 F1-score performance of the individual models and the ensemble across all sub-subclasses

From: Ensemble of deep learning language models to support the creation of living systematic reviews for the COVID-19 literature

All values are F1-scores (%).

| Label | RoBERTa base | RoBERTa large | BioBERT | PubMedBERT | COVID-Twitter | Ensemble |
|---|---|---|---|---|---|---|
| EPI: Case report | 83.91 | 84.70 | 86.55 | 84.65 | 81.97 | 86.85 |
| EPI: Case series | 62.76 | 62.30 | 65.12 | 63.42 | 58.60 | 65.37 |
| EPI: Case–control study | 31.79 | 40.98 | 35.51 | 36.80 | 32.65 | 39.02 |
| EPI: Cohort study | 51.26 | 53.18 | 52.85 | 56.33 | 48.68 | 54.10 |
| EPI: Cross-sectional study | 59.89 | 65.46 | 66.19 | 64.10 | 62.01 | 65.46 |
| EPI: Diagnostic study | 67.01 | 66.32 | 65.81 | 63.83 | 64.77 | 69.61 |
| EPI: Ecological study | 41.27 | 41.51 | 46.53 | 46.81 | 42.33 | 46.46 |
| EPI: Guidelines | 57.28 | 60.32 | 59.01 | 60.65 | 56.26 | 62.52 |
| EPI: Modelling study | 87.61 | 86.51 | 87.78 | 87.05 | 88.15 | 88.43ᵃ |
| EPI: Other | 21.34 | 19.33 | 17.82 | 17.54 | 17.61 | 21.33 |
| EPI: Outbreak or surveillance report | 32.81 | 30.71 | 30.30 | 32.28 | 33.99 | 38.30 |
| EPI: Qualitative study | 20.41 | 31.75 | 35.29 | 40.00 | 33.33 | 36.73 |
| EPI: Review | 66.44 | 65.94 | 67.59 | 66.22 | 63.77 | 70.78ᵃ |
| EPI: Trial | 56.76 | 60.76 | 73.68 | 68.35 | 55.70 | 71.60 |
| BASIC: Animal experiment | 65.12 | 71.91 | 57.53 | 57.89 | 57.78 | 72.29 |
| BASIC: Basic research review | 19.92 | 24.60 | 16.67 | 13.10 | 18.64 | 23.15 |
| BASIC: Biochemical/protein structure studies | 60.72 | 63.48 | 62.39 | 64.03 | 58.13 | 65.67 |
| BASIC: In vitro experiment | 36.36 | 48.75 | 41.61 | 44.05 | 42.77 | 46.36 |
| BASIC: Sequencing and phylogenetics | 68.68 | 66.94 | 72.06 | 69.64 | 67.33 | 70.08 |
| BASIC: Within-host modelling | 0.00 | 11.76 | 0.00 | 10.53 | 13.64 | 11.11 |
| OTHER: Other | 17.39 | 16.95 | 20.56 | 20.11 | 15.25 | 19.32 |
| OTHER: Comment, editorial, …, non-original | 78.28 | 79.22 | 79.54 | 80.79 | 76.83 | 82.03ᵃ |
| micro avg | 65.85 | 66.89 | 67.38 | 67.40 | 64.69 | 69.50ᵃ |
| macro avg | 49.41 | 52.43 | 51.84 | 52.19 | 49.55 | 54.84ᵃ |

ᵃ Statistically significant improvement
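The "macro avg" row can be checked directly against the per-label rows. The sketch below assumes the macro average is the unweighted mean of the 22 per-label F1-scores (the usual definition, not stated in the table itself) and recomputes it for two columns using the values from Table 5. The function name `macro_avg` and the list names are illustrative only. The "micro avg" row cannot be reproduced this way, because it depends on per-label prediction counts that the table does not report.

```python
from statistics import fmean

# Per-label F1-scores (%) copied from Table 5, in row order:
# 14 EPI labels, then 6 BASIC labels, then 2 OTHER labels.
ROBERTA_BASE = [
    83.91, 62.76, 31.79, 51.26, 59.89, 67.01, 41.27,
    57.28, 87.61, 21.34, 32.81, 20.41, 66.44, 56.76,
    65.12, 19.92, 60.72, 36.36, 68.68, 0.00,
    17.39, 78.28,
]
ENSEMBLE = [
    86.85, 65.37, 39.02, 54.10, 65.46, 69.61, 46.46,
    62.52, 88.43, 21.33, 38.30, 36.73, 70.78, 71.60,
    72.29, 23.15, 65.67, 46.36, 70.08, 11.11,
    19.32, 82.03,
]


def macro_avg(per_label_f1):
    """Unweighted mean of per-label F1-scores, rounded to two decimals."""
    return round(fmean(per_label_f1), 2)


print(macro_avg(ROBERTA_BASE))  # 49.41, matching the table's macro avg row
print(macro_avg(ENSEMBLE))      # 54.84, matching the table's macro avg row
```

Because every label carries equal weight in this average, low-scoring rare labels such as "BASIC: Within-host modelling" pull the macro average well below the micro average.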