Machine learning algorithms to identify cluster randomized trials from MEDLINE and EMBASE

Table 2 Model metrics for the internal and external validation datasets

Dataset and model	AUC, % (95% CI)	True positive rate (sensitivity), % (95% CI)	False positive rate (1 − specificity), % (95% CI)	Number needed to screen (95% CI)
Internal validation (600 articles, ~15% CRTs; number needed to read: 6.8^a)
Convolutional neural network—Word2Vec	98.2 (96.9, 99.5)	96.6 (92.0, 100)	13.9 (10.7, 17.0)	1.8 (1.6, 2.1)
Convolutional neural network—FastText	98.4 (97.3, 99.5)	89.8 (83.0, 96.6)	3.5 (2.0, 5.1)	1.2 (1.1, 1.3)
Support vector machines	97.2 (95.7, 98.8)	97.7 (94.3, 100)	19.9 (16.4, 23.2)	2.2 (1.9, 2.6)
Ensemble	98.6 (97.8, 99.4)	97.7 (94.3, 100)	15.0 (11.9, 18.2)	1.9 (1.7, 2.2)
External validation (1916 articles, ~35% CRTs; number needed to read: 2.9^a)
Convolutional neural network—Word2Vec	97.9 (97.2, 98.6)	97.0 (95.6, 98.2)	20.8 (18.5, 23.0)	1.4 (1.3, 1.5)
Convolutional neural network—FastText	97.7 (97.0, 98.4)	91.7 (89.8, 93.8)	4.8 (3.7, 6.0)	1.1 (1.1, 1.1)
Support vector machines	96.8 (96.0, 97.6)	97.3 (96.1, 98.5)	32.2 (29.7, 34.9)	1.6 (1.6, 1.7)
Ensemble	97.8 (97.0, 98.5)	97.6 (96.4, 98.6)	21.8 (19.6, 24.1)	1.4 (1.4, 1.5)

^aThe number needed to read was calculated as one divided by the proportion of articles that are CRTs
AUC: area under the receiver operating characteristic curve; CI: confidence interval; CRT: cluster randomized trial
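
The arithmetic behind the two "number needed" columns can be sketched as follows. The number needed to read follows footnote a directly. The table does not state a formula for the number needed to screen; the sketch assumes it is the reciprocal of precision (articles flagged per true CRT found), which appears consistent with the reported values. The function names are hypothetical, and the prevalences are the rounded figures from the table, so results may differ from the table in the last digit (for example, 1 / 0.15 is about 6.7 rather than the reported 6.8).

```python
def number_needed_to_read(prevalence: float) -> float:
    """Articles read per CRT found with no classifier (footnote a).

    prevalence is the proportion of articles that are CRTs, e.g. 0.35.
    """
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    return 1.0 / prevalence


def number_needed_to_screen(sensitivity: float,
                            false_positive_rate: float,
                            prevalence: float) -> float:
    """Flagged articles per true CRT, i.e. 1 / precision.

    ASSUMPTION: this definition is not given in the table; it is inferred
    because it appears to match the reported values.
    """
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    true_pos = sensitivity * prevalence                  # share flagged and CRT
    false_pos = false_positive_rate * (1 - prevalence)   # share flagged, not CRT
    if true_pos <= 0:
        raise ValueError("sensitivity must be positive")
    return (true_pos + false_pos) / true_pos


# External validation, ~35% CRTs
print(round(number_needed_to_read(0.35), 1))                    # 2.9
# Convolutional neural network with Word2Vec, external validation row
print(round(number_needed_to_screen(0.970, 0.208, 0.35), 1))    # 1.4
```

Under this assumed definition, a lower false positive rate pulls the number needed to screen toward 1, which is why the FastText model has the smallest value in both datasets despite its lower sensitivity.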