Systematic Reviews

Table 3 F1-score performance of the individual models and the ensemble across all classes

From: Ensemble of deep learning language models to support the creation of living systematic reviews for the COVID-19 literature

All values are F1-scores (%).
Label	RoBERTa base	RoBERTa large	BioBERT	PubMedBERT	COVID-Twitter	Ensemble
ORIGINAL	91.06	91.33	91.44	91.94	90.61	92.35
NON-ORIGINAL	78.46	79.19	79.64	80.52	76.72	81.66^a
micro avg	87.30	87.70	87.92	88.53	86.46	89.16^a
macro avg	84.76	85.26	85.54	86.23	83.66	87.00^a

^a Statistically significant improvement
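
The macro avg row can be checked against the two per-class rows. The sketch below assumes the standard definition of macro-averaged F1 (the unweighted mean of the per-class F1-scores), which this page does not state explicitly. The names f1_scores, reported_macro, and macro_f1 are illustrative only. Under that assumption, the recomputed values agree with the reported ones to within rounding. The micro avg row cannot be recomputed this way, because it depends on per-class counts that the table does not give.

```python
# Per-class F1-scores (%) copied from Table 3.
f1_scores = {
    "RoBERTa base":  {"ORIGINAL": 91.06, "NON-ORIGINAL": 78.46},
    "RoBERTa large": {"ORIGINAL": 91.33, "NON-ORIGINAL": 79.19},
    "BioBERT":       {"ORIGINAL": 91.44, "NON-ORIGINAL": 79.64},
    "PubMedBERT":    {"ORIGINAL": 91.94, "NON-ORIGINAL": 80.52},
    "COVID-Twitter": {"ORIGINAL": 90.61, "NON-ORIGINAL": 76.72},
    "Ensemble":      {"ORIGINAL": 92.35, "NON-ORIGINAL": 81.66},
}

# Macro avg row as reported in Table 3.
reported_macro = {
    "RoBERTa base": 84.76,
    "RoBERTa large": 85.26,
    "BioBERT": 85.54,
    "PubMedBERT": 86.23,
    "COVID-Twitter": 83.66,
    "Ensemble": 87.00,
}


def macro_f1(per_class):
    """Assumed definition: unweighted mean of the per-class F1-scores."""
    return sum(per_class.values()) / len(per_class)


if __name__ == "__main__":
    # Print the recomputed and reported macro averages side by side.
    for model, per_class in f1_scores.items():
        computed = macro_f1(per_class)
        print(f"{model}: computed {computed:.3f}, reported {reported_macro[model]:.2f}")
```

Because the inputs are themselves rounded to two decimals, a difference of up to about 0.01 between the recomputed and reported values is expected.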


ISSN: 2046-4053
