From: Assessing the accuracy of machine-assisted abstract screening with DistillerAI: a user study
Team / screening approach | Sensitivity (95% CI) | Specificity (95% CI) | Area under the curve (95% CI) | N of missed studies (proportion) | N of included abstracts (proportion) | N of conflicts (proportion) | N of included studies in training set |
---|---|---|---|---|---|---|---|
Team 1 | |||||||
Machine-assisted screening | 0.78 (0.59 to 0.90) | 0.96 (0.96 to 0.97) | 0.87 (0.80 to 0.95) | 7/32 (22%) | 97/2172 (4%) | 126/2172 (6%) | 10/300 |
Single-reviewer screening | 0.78 (0.59 to 0.90) | 0.96 (0.95 to 0.97) | 0.87 (0.80 to 0.94) | 7/32 (22%) | 110/2172 (5%) | ||
DistillerAI screening | 0.03 (0.00 to 0.21) | 0.99 (0.98 to 0.99) | 0.51 (0.48 to 0.54) | 31/32 (97%) | 27/2172 (1%) | ||
Team 2 | |||||||
Machine-assisted screening | 0.89 (0.70 to 0.97) | 0.92 (0.91 to 0.93) | 0.90 (0.84 to 0.96) | 3/27 (11%) | 232/2172 (11%) | 226/2172 (10%) | 15/300 |
Single-reviewer screening | 0.89 (0.69 to 0.97) | 0.91 (0.89 to 0.92) | 0.90 (0.84 to 0.96) | 3/27 (11%) | 221/2172 (10%) | ||
DistillerAI screening | 0.00 | 0.99 (0.99 to 0.99) | 0.50 (0.49 to 0.50) | 27/27 (100%) | 18/2172 (1%) | ||
Team 3 | |||||||
Machine-assisted screening | 0.65 (0.44 to 0.82) | 0.96 (0.95 to 0.97) | 0.81 (0.71 to 0.90) | 9/26 (35%) | 130/2172 (6%) | 100/2172 (5%) | 16/300 |
Single-reviewer screening | 0.65 (0.44 to 0.82) | 0.96 (0.95 to 0.97) | 0.81 (0.71 to 0.90) | 9/26 (35%) | 104/2172 (5%) | ||
DistillerAI screening | 0.23 (0.10 to 0.44) | 0.99 (0.98 to 0.99) | 0.61 (0.53 to 0.69) | 20/26 (77%) | 30/2172 (1%) | ||
Team 4 | |||||||
Machine-assisted screening | 0.86 (0.66 to 0.95) | 0.94 (0.93 to 0.95) | 0.90 (0.83 to 0.96) | 4/28 (14%) | 199/2172 (9%) | 194/2172 (9%) | 14/300 |
Single-reviewer screening | 0.82 (0.62 to 0.93) | 0.93 (0.92 to 0.94) | 0.88 (0.80 to 0.95) | 5/28 (18%) | 165/2172 (8%) | ||
DistillerAI screening | 0.32 (0.17 to 0.52) | 0.97 (0.96 to 0.98) | 0.65 (0.56 to 0.73) | 19/28 (68%) | 69/2172 (3%) | ||
Team 5 | |||||||
Machine-assisted screening | 0.74 (0.55 to 0.87) | 0.95 (0.94 to 0.96) | 0.84 (0.77 to 0.92) | 8/31 (26%) | 187/2172 (9%) | 181/2172 (8%) | 11/300 |
Single-reviewer screening | 0.74 (0.55 to 0.87) | 0.95 (0.94 to 0.95) | 0.84 (0.77 to 0.92) | 8/31 (26%) | 138/2172 (6%) | ||
DistillerAI screening | 0.13 (0.05 to 0.31) | 0.97 (0.96 to 0.98) | 0.55 (0.49 to 0.61) | 27/31 (87%) | 65/2172 (3%) | ||
Combined | |||||||
Machine-assisted screening | 0.78 (0.66 to 0.90) | 0.95 (0.92 to 0.97) | 0.87 (0.83 to 0.90) | 6/30 (22%) | 8% | 165/2172 (8%) | 13/300 |
Single-reviewer screening | 0.78 (0.66 to 0.89) | 0.94 (0.91 to 0.97) | 0.86 (0.82 to 0.89) | 6/30 (22%) | 7% | ||
DistillerAI screening | 0.14 (0.00 to 0.31) | 0.98 (0.97 to 1.00) | 0.56 (0.53 to 0.59) | 25/30 (86%) | 2% | ||
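
As a reading aid, the per-team sensitivity values can be cross-checked against the missed-study counts. The sketch below assumes sensitivity is one minus the proportion of relevant studies that were missed; the function name `sensitivity_from_missed` and the `ROWS` layout are hypothetical and not taken from the study. The Combined rows are left out because their counts appear to be averages across teams rather than raw counts.

```python
# Rows copied from the table: (team, approach, missed, total relevant, reported sensitivity).
ROWS = [
    ("Team 1", "Machine-assisted", 7, 32, 0.78),
    ("Team 1", "Single-reviewer", 7, 32, 0.78),
    ("Team 1", "DistillerAI", 31, 32, 0.03),
    ("Team 2", "Machine-assisted", 3, 27, 0.89),
    ("Team 2", "Single-reviewer", 3, 27, 0.89),
    ("Team 2", "DistillerAI", 27, 27, 0.00),
    ("Team 3", "Machine-assisted", 9, 26, 0.65),
    ("Team 3", "Single-reviewer", 9, 26, 0.65),
    ("Team 3", "DistillerAI", 20, 26, 0.23),
    ("Team 4", "Machine-assisted", 4, 28, 0.86),
    ("Team 4", "Single-reviewer", 5, 28, 0.82),
    ("Team 4", "DistillerAI", 19, 28, 0.32),
    ("Team 5", "Machine-assisted", 8, 31, 0.74),
    ("Team 5", "Single-reviewer", 8, 31, 0.74),
    ("Team 5", "DistillerAI", 27, 31, 0.13),
]


def sensitivity_from_missed(missed: int, total: int) -> float:
    """Return 1 - missed/total, i.e. the share of relevant studies that were found."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 1 - missed / total


# Print the recomputed value next to the reported one for each row.
for team, approach, missed, total, reported in ROWS:
    computed = round(sensitivity_from_missed(missed, total), 2)
    print(f"{team} {approach}: computed {computed:.2f}, reported {reported:.2f}")
```

Under this assumption, each recomputed value agrees with the reported sensitivity to two decimal places.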