Study ID (first author, year) | Pieper 2014e [34] | Whiting 2016 [15] | Parmelli 2011 [19] | Pieper 2014b [35] |
---|---|---|---|---|
Characteristics of the studies | ||||
Title | Impact of choice of quality appraisal tool for systematic reviews in overviews | ROBIS: A new tool to assess risk of bias in systematic reviews was developed | Using AMSTAR to assess the methodological quality of systematic reviews: An external validation study | Systematic review finds overlapping reviews were not mentioned in every other overview |
Primary objective | To examine reliability, validity and feasibility of four quality appraisal tools in an SR and explore how the choice of tool impacts the findings of the evidence synthesis | To develop ROBIS, a new tool for assessing the risk of bias in systematic reviews (rather than in primary studies) | To measure the reliability, construct validity and feasibility of AMSTAR on a sample of SRs in different medical fields | To develop two measures to quantify the degree of overlap of primary studies across SRs and evaluate the validity of the measures |
Name of the included tools or measures | AMSTAR, OQAQ, RAPiD, AQASR | ROBIS [15] | AMSTAR | CA and CCA [35] |
Type of assessment | Assess reliability/construct validity of the tool | Assess content validity/reliability | Assess reliability/construct validity of the tool | Construct validity testing of the measures |
Content validity—methods of item generation | Not applicable—existing tool | Content (domains and items) was based on a reporting standard for SRs (i.e. MECIR [118]) and an SR of 40 tools designed to assess the quality of SRs or meta-analyses | Not applicable—existing tool | Not applicable—not a tool |
Content validity—comprehensiveness | Not applicable—existing tool | Content experts (methodologists, systematic reviewers, guideline developers) reviewed the draft ROBIS tool in a face-to-face meeting and Delphi process | Not applicable—existing tool | Not applicable—not a tool |
Reliability—description of reliability testing | Inter-rater reliability (agreement) between two review authors who independently applied AMSTAR, OQAQ, RAPiD and AQASR to 32 SRs. A 4-week interval separated the assessments with each tool. Agreement was assessed at item level for AMSTAR and OQAQ, and at domain level for RAPiD and AQASR (Cohen’s kappa) | Inter-rater reliability (agreement) between two review authors who independently applied ROBIS to 8 SRs. Agreement was assessed at domain level (% agreement) | Inter-rater reliability (agreement) between two review authors who independently applied AMSTAR to 54 SRs. Agreement was assessed at item level (Cohen’s weighted kappa) | Not applicable |
Tests of validity—description of correlation coefficient testing | Correlation between summary scores on OQAQ and RAPiD (not done for tools without summary scores). Qualitative assessment of whether assessment of SR quality with different tools altered overall conclusions about strength of association between volume and outcomes (where SR quality was one of four elements used to determine strength) | Not assessed | Correlation between scores on AMSTAR and scores from a similar measure, the OQAQ (Pearson’s rank correlation coefficient, results not reported in abstract) | Correlation between measures (CA, CCA) calculated on a sample of overviews (Kendall τ-b) with each other, and each measure with the number of SRs and number of primary publications. Examined whether the measures were associated with publication source (HTA or journal publication), hypothesizing that HTA reports may have more overlap |
Other assessment (feasibility, acceptability, piloting) | Time to complete | Piloting involved three workshops on ROBIS where participants piloted the tools and provided feedback | Time to complete | Not reported |
Risk of bias in the primary methods studies | ||||
Existence of a protocol | Not reported | Not reported | Not reported | Not reported |
Method to select the sample of SRs to which the tool/measure was applied | Convenience: SRs were the included studies of an overview that examined associations between surgery volume and outcomes (not intervention effects) | Convenience: SRs were the included studies of an overview conducted by authors independent of the developers of ROBIS | Convenience: SRs were in two different medical fields (hypertension, colorectal cancer) and were described as a convenience sample, but it is unclear how they were selected | Census: All overviews identified from a literature search of five databases and handsearching of websites of HTA agencies. Search restricted to articles published between 2009 and 2011 |
Process for selecting the raters/assessors who applied the tool/measure | Convenience: Raters were authors of an overview in which AMSTAR was used | Convenience: Raters were authors of an overview in which ROBIS was piloted, and were independent of the tool developers. Unclear how they were recruited | Unclear: No description of how raters were selected | Not applicable |
Pre-specified hypotheses for testing of validity | No: The expected direction or magnitude of correlation was not specified. ‘The Spearman’s rank correlation coefficient was calculated to compare the CATs [critical appraisal tool]’ | Not applicable: no testing of validity | No: The expected direction or magnitude of correlation was not specified. ‘Construct validity was investigated comparing the two instruments using Pearson’s rank correlation coefficient.’ | Yes: ‘We hypothesized that the CA should have a strong (0.60–0.80) negative correlation with the number of included reviews and, compared to this, a lower negative correlation with the number of included primary publications. In contrast, we assumed that the CCA should have a very weak (0.00–0.20) or weak (0.20–0.40) negative correlation with the number of included reviews and the primary publications.’ |
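
Several of the studies above report inter-rater reliability as Cohen's kappa, while one reports raw percentage agreement. As a rough illustration of how the two differ, the sketch below computes unweighted Cohen's kappa and percentage agreement for two raters. The function name `cohens_kappa` and the ratings are hypothetical and are not taken from any of the included studies. The weighted kappa used in one study is not shown.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' categorical ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items both raters scored identically.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions,
    # summed over categories.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    p_expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical ratings of a single appraisal item across eight SRs
# (illustrative data only, not from the included studies).
rater_1 = ["yes", "yes", "no", "yes", "no", "no", "yes", "yes"]
rater_2 = ["yes", "no", "no", "yes", "no", "yes", "yes", "yes"]

kappa = cohens_kappa(rater_1, rater_2)
percent_agreement = sum(a == b for a, b in zip(rater_1, rater_2)) / len(rater_1)
```

In this made-up example the raters agree on 6 of 8 items, yet kappa is noticeably lower than the raw agreement because part of that agreement is expected by chance. This is one reason percentage agreement and kappa values are not directly comparable across the studies.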