Table 2 A summary of included extraction methods and their evaluation

From: Automating data extraction in systematic reviews: a systematic review

| Study | Extracted elements | Dataset | Method | Sentence/Concept/Neither | Full text (F)/Abstract (A) | Results |
| --- | --- | --- | --- | --- | --- | --- |
| Dawes et al. (2007) [12] | PECODR | 20 evidence-based medicine journal synopses (759 extracts from the corresponding PubMed abstracts) | Proposed potential lexical patterns and assessed them using NVivo software | Neither | Abstract | Agreement among the annotators was 86.6 and 85 %, rising to 98.4 and 96.9 % after consensus; no automated system |
| Kim et al. (2011) [13] | PIBOSO | 1000 medical abstracts (PIBOSO corpus) | Conditional random fields (CRF) with various features based on lexical, semantic, structural and sequential information | Sentence | Abstract | Micro-averaged F-scores of 80.9 % on structured and 66.9 % on unstructured abstracts; 63.1 % on an external dataset |
| Boudin et al. (2010) [16] | PICO (I and C combined) | 26,000 abstracts from PubMed; first sentences from the structured abstracts | Combination of multiple supervised classification algorithms: random forests (RF), naive Bayes (NB), support vector machines (SVM) and multi-layer perceptron (MLP) | Sentence | Abstract | F-score of 86.3 % for P, 67 % for I (and C), and 56.3 % for O |
| Huang et al. (2011) [17] | PICO (except C) | 23,472 sentences from the structured abstracts | Naïve Bayes (NB) | Sentence | Abstract | F-measure of 0.91 for patient/problem, 0.75 for intervention, and 0.88 for outcome |
| Verbeke et al. (2012) [18] | PIBOSO | PIBOSO corpus | Statistical relational learning with kernels (kLog) | Sentence | Abstract | Micro-averaged F-score of 84.29 % on structured abstracts and 67.14 % on unstructured abstracts |
| Huang et al. (2013) [19] | PICO (except C) | 19,854 structured abstracts of randomized controlled trials | First sentence of the section or all sentences in the section fed to an NB classifier | Sentence | Abstract | First sentence of the section: F-scores of 0.74 for P, 0.66 for I and 0.73 for O; all sentences in the section: 0.73 for P, 0.73 for I and 0.74 for O |
| Hassanzadeh et al. (2014) [20] | PIBOSO (Population, Intervention, Background, Outcome, Study Design, Other) | PIBOSO corpus: 1000 structured and unstructured abstracts | CRF with a discriminative set of features | Sentence | Abstract | Micro-averaged F-score: 91 % |
| Robinson (2012) [21] | Patient-oriented evidence: morbidity, mortality, symptom severity, quality of life | 1356 PubMed abstracts | SVM, NB, multinomial NB, logistic regression | Sentence | Abstract | Best results achieved with SVM: F-measure of 0.86 |
| Chung (2009) [22] | Intervention, comparisons | 203 RCT abstracts for training and 124 for testing | Coordinating constructs identified with a full parser and then classified as positive or not using CRF | Sentence | Abstract | F-score: 0.76 |
| Hara and Matsumoto (2007) [23] | Patient population, comparison | 200 abstracts labeled ‘Neoplasms’ and ‘Clinical Trial, Phase III’ | Noun phrases (NPs) categorized into classes such as ‘Disease’ and ‘Treatment’ using CRF; regular expressions then applied to sentences with the classified NPs | Sentence | Abstract | F-measure of 0.91 for noun phrase classification; sentence classification: F-measure of 0.8 for patient population and 0.81 for comparisons |
| Davis-Desmond and Molla (2012) [42] | Detecting statistical evidence | 194 randomized controlled trial abstracts from PubMed | Rule-based classifier using negation expressions | Sentence | Abstract | Accuracy between 88 and 98 % at 95 % CI |
| Zhao et al. (2012) [24] | Patient, result, intervention, study design, research goal | 19,893 medical abstracts and full-text articles from 17 journal websites | Conditional random fields | Sentence | Full text | F-scores for sentence classification: patient 0.75, intervention 0.61, result 0.91, study design 0.79, research goal 0.76 |
| Hsu et al. (2012) [25] | Hypothesis, statistical method, outcomes and generalizability | 42 full-text papers | Regular expressions | Sentence | Full text | For the classification task, F-score of 0.86 for hypothesis, 0.84 for statistical method, 0.9 for outcomes, and 0.59 for generalizability |
| Song et al. (2013) [26] | Analysis (statistical facts), general (generally accepted facts), recommend (recommendations about interventions), rule (guidelines) | 346 sentences from three clinical guideline documents | Maximum entropy (MaxEnt), SVM, MLP, radial basis function network (RBFN) and NB as classifiers; information gain (IG) and genetic algorithm (GA) for feature selection | Sentence | Full text | F-score of 0.98 for classifying sentences |
| Demner-Fushman and Lin (2007) [28] | PICO (I and C combined) | 275 manually annotated abstracts | Rule-based approach to identify sentences containing PICO elements; supervised classifier for outcomes | Concept | Abstract | Precision of 0.80 for population, 0.86 for problem, 0.80 for intervention, 0.64–0.95 for outcome |
| Kelly and Yang (2013) [29] | Age of subjects, duration of study, ethnicity of subjects, gender of subjects, health status of subjects, number of subjects | 386 abstracts from PubMed obtained with the query ‘soy and cancer’ | Regular expressions, gazetteer | Concept | Abstract | F-scores: age of subjects 1.0, duration of study 0.911, ethnicity of subjects 0.949, gender of subjects 1.0, health status of subjects 0.874, number of subjects 0.963 |
| Hansen et al. (2008) [30] | Number of trial participants | 233 abstracts from PubMed | Support vector machines | Concept | Abstract | F-measure: 0.86 |
| Xu et al. (2007) [32] | Subject demographics such as subject descriptors, number of participants, and diseases/symptoms and their descriptors | 250 randomized controlled trial abstracts | Text classification augmented with hidden Markov models to identify sentences; rules over the parse tree to extract the relevant information | Sentence, concept | Abstract | Precision: subject descriptors 83 %, number of trial participants 92.3 %, diseases/symptoms 51.0 %, descriptors of diseases/symptoms 92.0 % |
| Summerscales et al. (2009) [34] | Treatments, groups and outcomes | 100 abstracts from BMJ | Conditional random fields | Concept | Abstract | F-scores: treatments 0.49, groups 0.82, outcomes 0.54 |
| Summerscales et al. (2011) [35] | Groups, outcomes, group sizes, outcome numbers | 263 abstracts from BMJ between 2005 and 2009 | CRF, MaxEnt, template filling | Concept | Abstract | F-scores: groups 0.76, outcomes 0.42, group sizes 0.80, outcome numbers 0.71 |
| Kiritchenko et al. (2010) [36] | Eligibility criteria, sample size, drug dosage, primary outcomes | 50 full-text journal articles with 1050 test instances | SVM classifier to recover relevant sentences; extraction rules for the correct solutions | Concept | Full text | P5 precision of the classifier: 0.88; precision and recall of the extraction rules: 93 and 91 %, respectively |
| Lin et al. (2010) [39] | Intervention, age group of the patients, geographical area, number of patients, time duration of the study | 93 open-access full-text articles documenting oncological and cardiovascular studies from 2005 to 2008 | Linear-chain conditional random fields | Concept | Full text | Precision of 0.4 for intervention, 0.63 for age group, 0.44 for geographical area, 0.43 for number of patients and 0.83 for time period |
| Restificar et al. (2012) [37] | Eligibility criteria | 44,203 full-text clinical trial articles | Latent Dirichlet allocation with logistic regression | Concept | Full text | Accuracy of 75 and 70 % based on similarity for inclusion and exclusion criteria, respectively |
| De Bruijn et al. (2008) [40] | Eligibility criteria, sample size, treatment duration, intervention, primary and secondary outcomes | 88 randomized controlled trial full-text articles from five medical journals | SVM classifier to identify the most promising sentences; manually crafted weak extraction rules for the information elements | Sentence, concept | Full text | Precision: eligibility criteria 0.69, sample size 0.62, treatment duration 0.94, intervention 0.67, primary outcome 1.00, secondary outcome 0.67 |
| Zhu et al. (2012) [41] | Subject demographics: patient age, gender, disease and ethnicity | 50 randomized controlled trial full-text articles | Manually crafted rules for extraction from the parse tree | Concept | Full text | Disease extraction: F-score of 0.64 for exact matching and 0.85 for partial matching |
| Marshall et al. (2014) [27] | Risk of bias concerning sequence generation, allocation concealment and blinding | 2200 clinical trial reports | Soft-margin SVM for a joint model of risk-of-bias prediction and supporting-sentence extraction | Sentence | Full text | Sentence identification: F-scores of 0.56, 0.48, 0.35 and 0.38 for random sequence generation, allocation concealment, blinding of participants and personnel, and blinding of outcome assessment, respectively |
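Several of the sentence-level systems in Table 2 (e.g. Boudin et al. [16], Huang et al. [17, 19] and Robinson [21]) rely on standard supervised text classifiers trained over bag-of-words style features. The following minimal Python sketch illustrates that general approach with scikit-learn; the toy sentences, labels and feature settings are hypothetical and are not drawn from any of the original datasets or implementations.

```python
# Minimal sketch of supervised PICO sentence classification (not the
# authors' code): TF-IDF n-gram features plus a simple linear classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy training data; real systems used thousands of
# labeled abstract sentences.
train_sentences = [
    "Patients aged 40-65 with type 2 diabetes were enrolled.",          # P
    "Participants received 20 mg atorvastatin daily for 12 weeks.",     # I
    "The primary outcome was change in HbA1c at 12 weeks.",             # O
    "Adults with stage II hypertension were recruited from two sites.", # P
]
train_labels = ["P", "I", "O", "P"]

# TF-IDF unigrams/bigrams feed a multinomial naive Bayes classifier; an SVM
# (sklearn.svm.LinearSVC) could be swapped into the same pipeline.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(train_sentences, train_labels)

print(model.predict(["The intervention group was given metformin daily."]))
```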
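Rule-based concept extraction, as used by Kelly and Yang [29] and within several other systems in the table, typically amounts to matching hand-written patterns against abstract text. The sketch below is an illustrative assumption, not the published rules; the pattern, cue words and example abstract are invented.

```python
# Minimal sketch of regular-expression extraction of the number of trial
# participants from an abstract (illustrative pattern only).
import re

PARTICIPANTS = re.compile(
    r"\b(\d{1,6})\s+(?:patients|participants|subjects|women|men|adults)\b",
    re.IGNORECASE,
)

abstract = ("A total of 120 patients were randomized; 60 participants "
            "received the intervention and 60 participants received placebo.")

# Collect every "<number> <cue word>" mention in the text.
counts = [int(m.group(1)) for m in PARTICIPANTS.finditer(abstract)]
print(counts)       # [120, 60, 60]
print(max(counts))  # naive guess at the total sample size: 120
```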
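Conditional random fields recur throughout the table (Kim et al. [13], Summerscales et al. [34, 35], Lin et al. [39], among others) as sequence labellers over per-token features. A minimal sketch using the third-party sklearn-crfsuite package follows; the feature template, BIO tags and single training sentence are hypothetical toy inputs and do not reproduce the original systems.

```python
# Minimal sketch of CRF-based concept extraction with sklearn-crfsuite
# (pip install sklearn-crfsuite); toy features and labels only.
import sklearn_crfsuite

def token_features(tokens, i):
    """Simple lexical and contextual features for token i."""
    return {
        "lower": tokens[i].lower(),
        "is_digit": tokens[i].isdigit(),
        "suffix3": tokens[i][-3:],
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

# One hypothetical tokenized sentence with BIO-style concept tags.
sentence = ["120", "patients", "received", "atorvastatin", "20", "mg", "daily"]
labels   = ["B-GROUPSIZE", "I-GROUPSIZE", "O", "B-TREATMENT", "O", "O", "O"]

X_train = [[token_features(sentence, i) for i in range(len(sentence))]]
y_train = [labels]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X_train, y_train)
print(crf.predict(X_train))
```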
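Finally, several concept-level systems (Hara and Matsumoto [23], Xu et al. [32], Zhu et al. [41]) apply hand-crafted rules to parsed sentences or classified noun phrases. The sketch below only approximates that style using spaCy noun chunks and an invented cue-word rule; it assumes the en_core_web_sm model is installed and is not the authors' method.

```python
# Minimal sketch of rule-based extraction of candidate population noun
# phrases from a parsed sentence (illustrative cue words and rule only).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("We enrolled 120 adult patients with type 2 diabetes in this trial.")

POPULATION_CUES = {"patient", "participant", "subject", "adult", "woman", "man"}

# Rule: keep noun phrases whose head noun lemma is a population cue word.
candidates = [chunk.text for chunk in doc.noun_chunks
              if chunk.root.lemma_.lower() in POPULATION_CUES]
print(candidates)  # e.g. ['120 adult patients']
```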