Fig. 5
From: Ensemble of deep learning language models to support the creation of living systematic reviews for the COVID-19 literature

F1-score (A), precision (B), and recall (C) for the ORIGINAL class with respect to the probability threshold per vote when using the voting strategy across the predictions on the class level. Using different thresholds considerably improves performance while reducing the number of predicted publications.