Table 2 Definitions of variables in data extraction

From: Automation of literature screening using machine learning in medical evidence synthesis: a diagnostic test accuracy systematic review protocol

Variable: Definition
Study characteristics
 Year: Year of publication
 Authors: Last names of authors
 Study type: Article, abstract, or systematic review
 Journal, conference: Name of journal or conference
Training set information
 Training set: Name of dataset used for training
 Area: General medicine, detailed disease, or specific intervention
 Source: Names of electronic databases searched to build the training set
 Time range: Time range of the training set
 Type of publication: Abstract or full text
 Number of all records: Total number of records in the training set
 Number of included records: Number of records in the training set included after screening
 Training method: Supervised, semi-supervised, or unsupervised
Validation set information
 Validation set: Name of dataset used for validation
 Area: General, disease, or intervention
 Source: Name of electronic database searched to build the validation set
 Time range: Time range of the validation set
 Type of publication: Abstract or full text
 Number of all records: Total number of records in the validation set
 Number of included records: Number of records in the validation set included after screening
 Gold standard: Screening process performed by human investigators
AI algorithm information
 Model name: Name of model
 Model type: Classification, regression, ranking, or others
 Model performance: Including but not limited to sensitivity, specificity, precision, NPV, PPV, NLR, PLR, DOR, F-measure, accuracy, and AUC
 Cost saving: Reduction in the number of records screened by human investigators
Abbreviations: AUC, area under curve; DOR, diagnostic odds ratio; NLR, negative likelihood ratio; NPV, negative predictive value; PLR, positive likelihood ratio; PPV, positive predictive value
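
Most of the model performance measures listed in the table can be derived from a single 2x2 table that compares model decisions against the human-screening reference. The following is a minimal Python sketch of that arithmetic, not the protocol's own method. The names `ScreeningCounts`, `safe_div`, and `performance` are hypothetical, the counts in the usage lines are made up for illustration, and the cost-saving figure assumes that human investigators screen only the records the model flags. AUC is omitted because it requires ranked scores rather than a single 2x2 table.

```python
from dataclasses import dataclass


@dataclass
class ScreeningCounts:
    """Hypothetical container for 2x2 counts against the human-screening reference."""
    tp: int  # included by humans, flagged by the model
    fp: int  # excluded by humans, flagged by the model
    fn: int  # included by humans, missed by the model
    tn: int  # excluded by humans, not flagged by the model


def safe_div(num, den):
    """Return num / den, or None when the denominator is zero."""
    return num / den if den else None


def performance(c):
    """Compute the measures named in the table from one 2x2 table."""
    total = c.tp + c.fp + c.fn + c.tn
    sens = safe_div(c.tp, c.tp + c.fn)
    spec = safe_div(c.tn, c.tn + c.fp)
    have_both = sens is not None and spec is not None
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": safe_div(c.tp, c.tp + c.fp),  # PPV is the same quantity as precision
        "npv": safe_div(c.tn, c.tn + c.fn),
        "plr": safe_div(sens, 1 - spec) if have_both else None,
        "nlr": safe_div(1 - sens, spec) if have_both else None,
        "dor": safe_div(c.tp * c.tn, c.fp * c.fn),
        "f1": safe_div(2 * c.tp, 2 * c.tp + c.fp + c.fn),
        "accuracy": safe_div(c.tp + c.tn, total),
        # Assumption: humans screen only the records the model flags.
        "records_saved": total - (c.tp + c.fp),
    }


# Illustrative counts only; they are not taken from any study.
metrics = performance(ScreeningCounts(tp=90, fp=200, fn=10, tn=700))
for name, value in metrics.items():
    print(name, value)
```

Returning None for an undefined ratio (for example, DOR when there are no false positives or no false negatives) keeps such cases visible during extraction instead of raising an error or silently reporting infinity.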