From: Semi-automating abstract screening with a natural language model pretrained on biomedical literature
| Measure | Definition | Estimate |
|---|---|---|
| Recall/sensitivity | \(\frac{\text{Number of abstracts included by human reviewer and pBERT}}{\text{Number of abstracts included by human reviewer}}\) | 37.7% |
| Precision/positive predictive value | \(\frac{\text{Number of abstracts included by human reviewer and pBERT}}{\text{Number of abstracts included by pBERT}}\) | 37.7% |
| F1 | \(2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}\) | 37.7% |
| Accuracy | \(\frac{\text{Number of abstracts included by human reviewer and pBERT}}{\text{Total number of abstracts screened}}\) | 70.2% |
| Disagreement | \(\frac{\text{Number of abstracts with different decisions by human reviewer and pBERT}}{\text{Total number of abstracts screened}}\) | 3.0% |
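Every measure in the table can be derived from a 2×2 cross-tabulation of paired include/exclude decisions, with the human reviewer treated as the reference standard. The following is a minimal Python sketch that assumes each abstract's decision is encoded as a boolean (True = include). The names `ScreeningCounts`, `count_decisions`, and `screening_metrics` are hypothetical and do not come from the paper's code. One choice here is deliberate: `accuracy` uses the conventional definition, which counts joint inclusions plus joint exclusions over all screened abstracts, so it always equals 1 minus disagreement. The Accuracy row's formula as printed counts joint inclusions only.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class ScreeningCounts:
    """Hypothetical 2x2 tally of human (reference) vs. model decisions."""
    both_include: int   # included by human and model (true positives)
    model_only: int     # included by model only (false positives)
    human_only: int     # included by human only (false negatives)
    both_exclude: int   # excluded by both (true negatives)

    @property
    def total(self) -> int:
        return self.both_include + self.model_only + self.human_only + self.both_exclude


def count_decisions(human: Sequence[bool], model: Sequence[bool]) -> ScreeningCounts:
    """Tally paired decisions; assumes True = include, False = exclude."""
    if len(human) != len(model):
        raise ValueError("human and model decision lists must have equal length")
    pairs = list(zip(human, model))
    return ScreeningCounts(
        both_include=sum(h and m for h, m in pairs),
        model_only=sum((not h) and m for h, m in pairs),
        human_only=sum(h and (not m) for h, m in pairs),
        both_exclude=sum((not h) and (not m) for h, m in pairs),
    )


def screening_metrics(c: ScreeningCounts) -> Dict[str, float]:
    """Compute recall, precision, F1, conventional accuracy, and disagreement."""
    if c.total == 0:
        raise ValueError("no abstracts screened")
    human_included = c.both_include + c.human_only
    model_included = c.both_include + c.model_only
    recall = c.both_include / human_included if human_included else 0.0
    precision = c.both_include / model_included if model_included else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "recall": recall,
        "precision": precision,
        "f1": f1,
        # Conventional accuracy: agreed inclusions plus agreed exclusions.
        "accuracy": (c.both_include + c.both_exclude) / c.total,
        # Share of abstracts where the two decisions differ.
        "disagreement": (c.model_only + c.human_only) / c.total,
    }


if __name__ == "__main__":
    # Toy decisions for five abstracts (illustrative only, not study data).
    human = [True, True, False, False, True]
    model = [True, False, False, True, True]
    metrics = screening_metrics(count_decisions(human, model))
    print({k: round(v, 3) for k, v in metrics.items()})
    # prints {'recall': 0.667, 'precision': 0.667, 'f1': 0.667, 'accuracy': 0.6, 'disagreement': 0.4}
```

In this toy example, precision and recall are both 2/3, so F1 is also 2/3. Whenever precision equals recall, \(2pr/(p+r)\) reduces to that common value. This is consistent with the three identical 37.7% entries in the table.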