Systematic Reviews

Table 4 F1-score performance of the five individual models and the ensemble across all subclasses

From: Ensemble of deep learning language models to support the creation of living systematic reviews for the COVID-19 literature

F1-score (%) by label and model
Label	RoBERTa base	RoBERTa large	BioBERT	PubMedBERT	COVID-Twitter	Ensemble
EPI	88.17	88.05	88.38	88.70	87.26	89.47
BASIC	78.15	78.85	79.20	78.13	78.36	80.47
OTHER	78.44	79.22	79.86	80.72	76.71	81.97^a
micro avg	84.01	84.26	84.68	84.99	82.99	86.10^a
macro avg	81.59	82.04	82.48	82.51	80.77	83.97^a

^a Statistically significant improvement
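
The macro avg row can be cross-checked against the three per-class rows. The sketch below assumes the usual definition of macro-averaged F1 (the unweighted mean of the per-class F1-scores); the table itself does not state which definition was used. The dictionary names and the `macro_f1` helper are illustrative, and the numbers are copied from the rows above.

```python
# Per-class F1-scores (%) copied from Table 4 (EPI, BASIC, OTHER rows).
f1_by_model = {
    "RoBERTa base":  {"EPI": 88.17, "BASIC": 78.15, "OTHER": 78.44},
    "RoBERTa large": {"EPI": 88.05, "BASIC": 78.85, "OTHER": 79.22},
    "BioBERT":       {"EPI": 88.38, "BASIC": 79.20, "OTHER": 79.86},
    "PubMedBERT":    {"EPI": 88.70, "BASIC": 78.13, "OTHER": 80.72},
    "COVID-Twitter": {"EPI": 87.26, "BASIC": 78.36, "OTHER": 76.71},
    "Ensemble":      {"EPI": 89.47, "BASIC": 80.47, "OTHER": 81.97},
}

# Macro avg row as reported in Table 4.
reported_macro = {
    "RoBERTa base": 81.59,
    "RoBERTa large": 82.04,
    "BioBERT": 82.48,
    "PubMedBERT": 82.51,
    "COVID-Twitter": 80.77,
    "Ensemble": 83.97,
}


def macro_f1(per_class):
    """Unweighted mean of per-class F1-scores (assumed macro definition)."""
    return sum(per_class.values()) / len(per_class)


for model, scores in f1_by_model.items():
    computed = macro_f1(scores)
    # Small gaps are expected because the per-class inputs are already rounded.
    print(f"{model:13s} computed={computed:6.2f} reported={reported_macro[model]:6.2f}")
```

Under this assumption the recomputed values agree with the reported macro avg row to within about 0.01, which is consistent with rounding of the per-class scores. The micro avg row cannot be recomputed the same way, because micro-averaging pools predictions across classes and so depends on per-class counts that the table does not report.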

ISSN: 2046-4053
