Data set (classifier development stage) | Size | Number of eligible records (% of data set) | Number of title-only records (% of data set) | Number of title-only records that were eligible (% of eligible records) | Provenance of records, n (% of data set) |
---|---|---|---|---|---|
Data set 1 (Training) | 59,513 | 20,878 (35.1%) | 18,669 (31.4%) | 4495 (21.5%) | Embase: 3229 (5.4%); preprint: 2083 (3.5%); PubMed: 54,201 (91.1%) |
Data set 2 (Calibration) | 16,123 | 6005 (37.2%) | 3626 (22.5%) | 821 (13.7%) | Embase: 1994 (12.4%); preprint: 287 (1.8%); PubMed: 13,842 (85.8%) |
Data set 3 (Evaluation) | 4722 | 2310 (48.9%) | 896 (19.0%) | 285 (12.3%) | Embase: 89 (1.9%); preprint: 202 (4.3%); PubMed: 4431 (93.8%) |