
Table 2 Evaluation of the workflow performance using the recommended practice as the reference standard

From: Text mining to support abstract screening for knowledge syntheses: a semi-automated workflow

| Case study | Type of analysis | Precision (%) | Sensitivity* (%) | F1-score (%) | Specificity (%) | Accuracy (%) | NWF, NS, ΔN (# of eligible abstracts) | # missed studies | Workload reduction | Hours saved |
|---|---|---|---|---|---|---|---|---|---|---|
| SR: diabetes (14,314 abstracts) | Main analysisᵃ | 71 | 88 | 79 | 99.3 | 98 | 655, 743, 88 | 0 | 63% | 91 h |
|  | SA: k-NN2 = 25 | 64 | 94 | 76 | 99.7 | 97 | 700, 743, 43 | 0 | 49% | 70 h |
|  | SA: r = 300 | 70 | 89 | 78 | 99.4 | 97 | 660, 743, 83 | 0 | 62% | 89 h |
|  | SA: k-NN1 = 15 | 68 | 89 | 77 | 99.4 | 97 | 664, 743, 79 | 0 | 61% | 88 h |
|  | SA: ϕ = 80% | 72 | 88 | 79 | 99.3 | 98 | 653, 743, 90 | 0 | 63% | 91 h |
|  | SA: ϕ = 90% | 68 | 88 | 76 | 99.3 | 97 | 653, 743, 90 | 0 | 64% | 91 h |
|  | SA: 2 distance measuresᵇ | 77 | 84 | 80 | 99.1 | 98 | 623, 743, 120 | 0 | 74% | 105 h |
| Scoping: KS methods (17,200 abstracts) | Main analysisᵃ | 72 | 89 | 79 | 99.3 | 97 | 852, 957, 105 | 6 | 55% | 95 h |
|  | SA: k-NN2 = 25 | 65 | 95 | 77 | 99.7 | 97 | 907, 957, 50 | 3 | 39% | 68 h |
|  | SA: r = 300 | 72 | 90 | 80 | 99.4 | 97 | 858, 957, 99 | 5 | 54% | 92 h |
|  | SA: k-NN1 = 15 | 73 | 89 | 80 | 99.4 | 98 | 853, 957, 104 | 5 | 54% | 94 h |
|  | SA: ϕ = 80% | 72 | 89 | 79 | 99.4 | 98 | 847, 957, 110 | 8 | 55% | 95 h |
|  | SA: ϕ = 90% | 73 | 88 | 80 | 99.3 | 98 | 842, 957, 115 | 8 | 56% | 96 h |
|  | SA: 2 distance measuresᵇ | 79 | 82 | 80 | 98.9 | 98 | 785, 957, 172 | 17 | 70% | 119 h |

*Sensitivity, also called recall. Within each case study, the sensitivity analyses are listed in order of decreasing workflow sensitivity.
ᵃThe main analysis used distance definitions from three feature representations (SVD-based, LDA-based and word-embedding features), a threshold ϕ = 70%, k-nearest-neighbors for phase 1 (k-NN1) = 8, k-nearest-neighbors for phase 2 (k-NN2) = 15, and an initial sample size r = 600 (Table A1).
ᵇThis sensitivity analysis used 2 distance measures, from the SVD-based and word-embedding-based features.
NWF: number of eligible abstracts identified by the workflow. NS: number of eligible abstracts identified by 2 human reviewers screening all abstracts (recommended practice). ΔN: number of eligible abstracts missed by the workflow, ΔN = NS − NWF.
# missed studies: number of studies missed because full-text screening was performed on the NWF eligible abstracts identified by the workflow instead of on the NS eligible abstracts.
Workload reduction: percentage of abstract screening saved by the workflow, relative to the recommended practice of having 2 reviewers screen all abstracts.
Hours saved: person-hours, tallied across reviewers.
Abbreviations: SR, systematic review; SS, scoping review; KS, knowledge synthesis; SA, sensitivity analysis; NN, nearest neighbors; SVD, singular value decomposition; LDA, latent Dirichlet allocation.
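The metric columns can be read with the usual confusion-matrix definitions, and the notes define ΔN = NS − NWF. The sketch below is a minimal illustration under that assumption: the table does not report the underlying true/false positive counts, so `screening_metrics` takes them as inputs. It also checks the reported (NWF, NS, ΔN) triples and F1 values copied from Table 2. All function names here (`screening_metrics`, `f1_from_pr`, `delta_n`, `check_rows`) are hypothetical and do not come from the article.

```python
"""Sketch: standard screening metrics and consistency checks for Table 2.

Assumption: precision, sensitivity (recall), F1, specificity and accuracy
follow the usual confusion-matrix definitions. The helper names are
illustrative only, not the article's code.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    precision: float
    sensitivity: float
    f1: float
    specificity: float
    accuracy: float


def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> Metrics:
    """Return the five metrics, in percent, from confusion-matrix counts."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also called recall
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return Metrics(*(100 * v for v in (precision, sensitivity, f1, specificity, accuracy)))


def f1_from_pr(precision_pct: float, sensitivity_pct: float) -> float:
    """Harmonic mean of precision and sensitivity, both given in percent."""
    return 2 * precision_pct * sensitivity_pct / (precision_pct + sensitivity_pct)


def delta_n(n_wf: int, n_s: int) -> int:
    """ΔN = NS - NWF: eligible abstracts found by 2 reviewers but not by the workflow."""
    return n_s - n_wf


# (case study, analysis, precision, sensitivity, F1, NWF, NS, ΔN), copied from Table 2
ROWS = [
    ("SR diabetes", "main", 71, 88, 79, 655, 743, 88),
    ("SR diabetes", "k-NN2 = 25", 64, 94, 76, 700, 743, 43),
    ("SR diabetes", "r = 300", 70, 89, 78, 660, 743, 83),
    ("SR diabetes", "k-NN1 = 15", 68, 89, 77, 664, 743, 79),
    ("SR diabetes", "phi = 80%", 72, 88, 79, 653, 743, 90),
    ("SR diabetes", "phi = 90%", 68, 88, 76, 653, 743, 90),
    ("SR diabetes", "2 distance measures", 77, 84, 80, 623, 743, 120),
    ("Scoping KS methods", "main", 72, 89, 79, 852, 957, 105),
    ("Scoping KS methods", "k-NN2 = 25", 65, 95, 77, 907, 957, 50),
    ("Scoping KS methods", "r = 300", 72, 90, 80, 858, 957, 99),
    ("Scoping KS methods", "k-NN1 = 15", 73, 89, 80, 853, 957, 104),
    ("Scoping KS methods", "phi = 80%", 72, 89, 79, 847, 957, 110),
    ("Scoping KS methods", "phi = 90%", 73, 88, 80, 842, 957, 115),
    ("Scoping KS methods", "2 distance measures", 79, 82, 80, 785, 957, 172),
]


def check_rows(rows, f1_tolerance: float = 1.0) -> list:
    """Return labels of rows whose ΔN or F1 disagrees with the definitions above."""
    bad = []
    for study, analysis, p, r, f1, n_wf, n_s, dn in rows:
        if delta_n(n_wf, n_s) != dn or abs(f1_from_pr(p, r) - f1) >= f1_tolerance:
            bad.append(f"{study}: {analysis}")
    return bad


if __name__ == "__main__":
    print(check_rows(ROWS))  # []
```

Because precision and sensitivity are reported rounded to whole percentages, an F1 recomputed from them can differ from the reported F1 by a fraction of a point. For example, the diabetes ϕ = 90% analysis gives 76.7 against a reported 76. The check therefore uses a tolerance of 1 rather than exact equality.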