Design overview
We will conduct a methodological survey of Cochrane and non-Cochrane systematic reviews. We will use standard methodology for conducting systematic reviews [13], as described in previous protocols from our group [19–22]. We did not register the project in the PROSPERO database.
Definitions
The risk of an outcome in a group is the proportion of individuals in that group who experience that outcome. Measures of effect (whether relative or absolute) express the risk of an outcome in one group compared with another. As relative measures of effect, the relative risk is the ratio of the risk of an outcome in one group to the risk in the other, whereas the odds ratio (OR) is the corresponding ratio of the odds of an outcome [8]. As an absolute measure of effect, the risk difference is the difference between the observed risks in the experimental and control groups. It can also be expressed as the arithmetic difference between two outcome rates. The number needed to treat (NNT), another absolute measure of effect, is the inverse of the risk difference. It translates into the number of subjects who need to be treated for one additional outcome to occur or be prevented, expressed as the NNT when the effect is beneficial and as the number needed to harm when it is harmful.
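As an illustration of these definitions, the following minimal sketch computes each measure from the counts of a single two-group comparison. The class name, function name and example counts are hypothetical and not part of the protocol; the code assumes both risks lie strictly between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts for one dichotomous outcome in a two-group comparison."""
    events_exp: int  # participants with the outcome, experimental group
    total_exp: int   # all participants, experimental group
    events_ctl: int  # participants with the outcome, control group
    total_ctl: int   # all participants, control group


def effect_measures(t: TwoByTwo) -> dict:
    """Relative and absolute measures of effect as defined in the text."""
    risk_exp = t.events_exp / t.total_exp
    risk_ctl = t.events_ctl / t.total_ctl
    odds_exp = risk_exp / (1 - risk_exp)
    odds_ctl = risk_ctl / (1 - risk_ctl)
    risk_difference = risk_exp - risk_ctl
    return {
        "relative_risk": risk_exp / risk_ctl,
        "odds_ratio": odds_exp / odds_ctl,
        "risk_difference": risk_difference,
        # Inverse of the risk difference; undefined (infinite) when it is zero.
        "nnt": float("inf") if risk_difference == 0 else 1 / abs(risk_difference),
    }


# Hypothetical counts: 10/100 events with the intervention, 20/100 with control.
measures = effect_measures(TwoByTwo(10, 100, 20, 100))
print(measures)
```

With these counts the relative risk is 0.5, the risk difference is about -0.10 and the NNT is about 10, which shows how a single comparison yields quite different-looking numbers depending on the measure chosen.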
Cochrane systematic reviews are defined as all systematic reviews published in the Cochrane Database of Systematic Reviews. All the other systematic reviews will be considered non-Cochrane systematic reviews.
Eligibility criteria
We will include systematic reviews published in English meeting the following criteria:
1. Described as a ‘systematic review’ or a ‘meta-analysis’;
2. Reports a search strategy in at least one database;
3. Published in 2010, in the Cochrane Database of Systematic Reviews or indexed in MEDLINE;
4. Includes a comparison of an intervention with another intervention or no intervention in human beings;
5. Reports measures of effect for at least one dichotomous outcome either from a single study or from a pooled analysis.
If there is more than one pairwise comparison, reviewers will select the comparison that reports the largest number of dichotomous outcomes. If more than one comparison reports the same number of dichotomous outcomes, reviewers will select the comparison that reports the largest number of absolute estimates. We will identify the most patient-important outcome using a hierarchical approach (Appendix 1). If the outcome is a composite outcome, we will select the most patient-important of its components (according to the hierarchy in Appendix 1), provided that the authors report disaggregated data in the review. Otherwise, we will choose the next most important dichotomous outcome. Since we are interested in how authors present the results of their systematic reviews (for example, results obtained when combining the included studies), we will not collect information about absolute effects presented when describing individual studies included in the review, unless the comparison of interest includes only one trial.
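The rule for choosing one comparison per review can be sketched as a two-level sort key. The class and field names are hypothetical. The protocol does not say what happens if two comparisons are still tied on both counts; this sketch simply assumes the first one listed is kept.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    label: str
    n_dichotomous_outcomes: int
    n_absolute_estimates: int


def select_comparison(comparisons):
    """Pick the comparison with the most dichotomous outcomes,
    breaking ties by the number of absolute estimates reported."""
    # max() returns the first maximal item, so remaining ties keep
    # the earliest comparison (an assumption, not a protocol rule).
    return max(
        comparisons,
        key=lambda c: (c.n_dichotomous_outcomes, c.n_absolute_estimates),
    )


# Hypothetical comparisons extracted from one review.
candidates = [
    Comparison("drug A vs placebo", 3, 0),
    Comparison("drug A vs drug B", 5, 1),
    Comparison("drug B vs placebo", 5, 2),
]
chosen = select_comparison(candidates)
print(chosen.label)
```

Here the second and third comparisons tie on five dichotomous outcomes, so the number of absolute estimates decides in favour of the third.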
Search strategy
We will use the MEDLINE database to search for potentially eligible systematic reviews. We will use two distinct search strategies. First, we will use an adaptation of the systematic review filter designed by the Health Information Research Unit of McMaster University to retrieve non-Cochrane systematic reviews. Second, we will use the Ovid ‘search by journal’ filter to identify Cochrane systematic reviews (Appendix 2). We will limit both searches to the year 2010. We will subsequently export citations to EndNote X4.0.2 and then import them into web-based systematic review software (DistillerSR, Evidence Partners, Ottawa, Canada; https://systematic-review.ca) for eligibility screening and data extraction.
Random sampling of citations
All identified citations will be stratified into Cochrane and non-Cochrane search results. We will obtain a random sample within each stratum and screen it according to our eligibility criteria. We will repeat the random sampling process as needed until reaching the final sample size, which will include the same number of Cochrane and non-Cochrane systematic reviews (see sample size section).
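A minimal sketch of this stratified sampling loop follows. The function name, batch size, seed, citation identifiers and eligibility rule are all illustrative assumptions; in the study itself, eligibility is decided by duplicate human screening, not by a function.

```python
import random


def sample_stratum(citations, is_eligible, target, batch_size, rng):
    """Draw random batches without replacement from one stratum until
    `target` eligible reviews are found or the stratum is exhausted."""
    pool = list(citations)
    rng.shuffle(pool)  # reading a shuffled pool in order equals repeated random draws
    included = []
    while pool and len(included) < target:
        batch, pool = pool[:batch_size], pool[batch_size:]
        # Stand-in for screening the batch against the eligibility criteria.
        included.extend(c for c in batch if is_eligible(c))
    return included[:target]


rng = random.Random(2010)  # fixed seed so the draw is reproducible
strata = {
    "cochrane": [f"C{i}" for i in range(300)],
    "non_cochrane": [f"N{i}" for i in range(900)],
}


def eligible(citation):
    # Hypothetical eligibility rule, for illustration only.
    return int(citation[1:]) % 3 == 0


sample = {
    name: sample_stratum(ids, eligible, target=100, batch_size=50, rng=rng)
    for name, ids in strata.items()
}
print({name: len(ids) for name, ids in sample.items()})
```

Sampling each stratum separately with the same target is what yields equal numbers of Cochrane and non-Cochrane reviews, regardless of how many citations each search returns.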
Review process
We will undertake title and abstract screening, full text screening and data abstraction independently and in duplicate. Irrespective of discrepancies, all studies selected by at least one reviewer at the title and abstract level will proceed to full text screening. Reviewers will resolve discrepancies at the level of full text and data abstraction by consensus, and if unsuccessful, with the help of a third reviewer. This arbitrator will independently review the article before discussing it with the reviewers. To ensure the validity and consistency of the process, we will conduct calibration exercises for each step of the process. We will also develop and pilot-test standardized forms and upload them onto the online systematic review software application. We will accompany all forms with detailed instructions. A core group will meet regularly to discuss progress and potential difficulties. We will create a study flow diagram to describe the results of the different steps of the selection process.
Data extraction
We will extract the following information from each included systematic review: study characteristics, quality of the systematic review, the calculation and reporting of absolute estimates of effects, and the interpretation of absolute estimates of effects.
Study characteristics
For all included systematic reviews, we will extract the following information:
1. Type of systematic review (Cochrane vs. not Cochrane);
2. Type of intervention (pharmacologic vs. other);
3. High-impact (New England Journal of Medicine, Lancet, Annals of Internal Medicine, Journal of the American Medical Association and PLoS Medicine) vs. other journals;
4. Quality of the review;
5. Use of GRADE vs. no use of GRADE;
6. Statistical significance of the effect for the most patient-important outcome;
7. Source of funding (partially or completely funded by a private for-profit organization or authors with financial conflicts of interest vs. others).
Specifically, we will collect information about whether the review was published in one of the five journals with the most journal citations (Journal of the American Medical Association, New England Journal of Medicine, Annals of Internal Medicine, Lancet and PLoS Medicine), and about the population, intervention and control of interest. We will also extract information about source of funding (partially or completely funded by a private for-profit organization vs. others) and the type of intervention (pharmacologic vs. other). We will note whether the systematic review used the GRADE approach; this includes whether authors provide a summary table, such as a summary of findings table.
We will note whether the reviews include an absolute measure of effect (for example, absolute risk reduction (ARR) or NNT) for the most patient-important outcome for the selected comparison. We will also note this for any outcome other than the most patient-important, for both the comparison of interest and, if available, any other comparison. For the selected comparison, we will note whether the authors report benefit and harm outcomes and whether they report a measure of relative effect, a measure of absolute effect or both.
Quality of the systematic reviews
We will assess the methodological quality of eligible systematic reviews using the AMSTAR instrument [23].
Calculation and reporting of absolute estimate of effects
For those reviews that report at least one absolute effect estimate, we will record whether these estimates relate to the most patient-important outcome, any outcome within the comparison of interest or elsewhere in the full text. For the reviews that provide an absolute estimate for the most patient-important outcome, we will collect information about the type of measure (for example, risk difference, NNT) and the expression used when reporting, if available (for example, risk reduced by 5%). We will explore how authors calculated the absolute estimates (for example, by direct calculation from a meta-analysis, or by modelling from a baseline risk such as the median baseline risk from the included studies) and whether they state the calculation methods in their methods section. We will document the number of estimates of effect for different baseline risks and, if available, whether authors specify the source of these baseline estimates. If needed, we will contact authors for additional information.
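To make the "modelled from baseline risk" route concrete, the sketch below applies a pooled relative risk to one or more baseline risks. This is one common way of deriving an absolute estimate and is offered as an assumption about what review authors might do, not as a method the protocol prescribes. The function name and all numbers are hypothetical.

```python
from statistics import median


def modelled_risk_difference(baseline_risk, relative_risk):
    """Risk difference implied by applying a relative risk to a baseline risk."""
    risk_with_intervention = baseline_risk * relative_risk
    return risk_with_intervention - baseline_risk


# Hypothetical control-group risks from three included studies,
# and a hypothetical pooled relative risk from a meta-analysis.
control_group_risks = [0.10, 0.20, 0.30]
pooled_relative_risk = 0.75

# Option 1: a single estimate at the median baseline risk.
baseline = median(control_group_risks)
rd_at_median = modelled_risk_difference(baseline, pooled_relative_risk)

# Option 2: one estimate per baseline risk, showing how the absolute
# effect changes while the relative effect stays fixed.
rd_by_baseline = {
    risk: modelled_risk_difference(risk, pooled_relative_risk)
    for risk in control_group_risks
}
print(rd_at_median, rd_by_baseline)
```

At the median baseline risk of 0.20 the modelled risk difference is about -0.05 (5 fewer events per 100), whereas at a baseline risk of 0.30 it is about -0.075, which is why the number and source of baseline risks are worth recording.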
Interpretation of absolute estimates of effects
Regarding interpretation, we will document whether authors discuss the fact that risk differences may vary to an important degree across subpopulations. We will also document the extent to which authors discuss this potential variability and their interpretation of the main effect of interest. Finally, we will assess whether the absolute estimates of effects are considered in the conclusion (either in a separate conclusion section or in the concluding part of a discussion section).
Sample size
We will calculate the sample size on the basis of an examination of study characteristics associated with the reporting of absolute effects for the most patient-important outcome: we will undertake this by means of a regression analysis. In this model we will include seven study characteristics with a total of eight categories of variable. We will require ten events per category to examine the association. Previous estimates show that approximately 50% of systematic reviews report an absolute estimate of effect [16]. We will consider 40% as our best estimate when considering the most patient-important outcome. Therefore, we will probably require a sample size of approximately 200 systematic reviews for our study. To increase our confidence in our sample size estimate, we will conduct a pilot study of 60 systematic reviews to further inform the final sample size.
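The arithmetic behind the figure of approximately 200 reviews can be reproduced as follows. The function name is hypothetical, and the expected proportion is passed as a whole-number percentage only to keep the calculation in exact integer arithmetic.

```python
def required_sample_size(n_categories, events_per_category, expected_percent):
    """Number of reviews needed so that the expected number of events
    (reviews reporting an absolute estimate) meets the events-per-category rule."""
    events_needed = n_categories * events_per_category
    # Ceiling division in integers: events_needed / (expected_percent / 100).
    return (events_needed * 100 + expected_percent - 1) // expected_percent


# Eight categories, ten events each, 40% of reviews expected to report
# an absolute estimate for the most patient-important outcome.
n_at_40 = required_sample_size(8, 10, 40)

# For comparison, the 50% figure from previous estimates.
n_at_50 = required_sample_size(8, 10, 50)
print(n_at_40, n_at_50)
```

Eight categories at ten events each require 80 events; at an expected proportion of 40% this gives 200 reviews, compared with 160 at the less conservative 50%.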
Analysis
We will assess agreement between reviewers’ judgements of whether the investigators reported an absolute measure of effect for the most patient-important outcome. We will calculate chance-corrected agreement and interpret the results according to Landis and Koch guidelines (κ values of 0 to 0.20 represent slight agreement, 0.21 to 0.40 fair agreement, 0.41 to 0.60 moderate agreement, 0.61 to 0.80 substantial agreement, and greater than 0.80 almost perfect agreement) [24].
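As a sketch of this step, the code below computes Cohen's kappa for two reviewers and maps it to the bands quoted above. Using Cohen's kappa as the chance-corrected statistic is an assumption consistent with the κ values cited. The function names and ratings are hypothetical, and the label returned for negative values is a placeholder because the protocol quotes no band below 0.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on the same items.
    Undefined (division by zero) if expected agreement equals 1."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[k] * freq_b[k] for k in categories) / n**2
    return (observed - expected) / (1 - expected)


def landis_koch_band(kappa):
    """Map kappa to the agreement bands quoted in the text."""
    if kappa < 0:
        return "below quoted range"  # placeholder, not from the protocol
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical judgements (1 = absolute measure reported, 0 = not reported).
reviewer_1 = [1] * 10 + [0] * 10
reviewer_2 = [1] * 9 + [0] * 1 + [0] * 8 + [1] * 2
kappa = cohen_kappa(reviewer_1, reviewer_2)
print(round(kappa, 2), landis_koch_band(kappa))
```

In this example the reviewers agree on 17 of 20 reviews (85%), but half of that agreement is expected by chance, so kappa is about 0.70, which falls in the substantial band.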
We will calculate the proportion of systematic reviews reporting at least one absolute estimate of effect for the most patient-important outcome, for any outcome within the comparison of interest, or for any comparison and any dichotomous outcome excluding the comparison of interest. We will conduct two multivariable logistic regression analyses to examine the association between pre-specified study characteristics and, first, the reporting of an absolute estimate of effect for the most patient-important outcome and, second, the reporting of an absolute estimate of effect for any outcome within the comparison of interest.
We will also calculate the proportion of systematic reviews that report the method they used to calculate the absolute estimate. We will calculate the proportion of systematic reviews that discuss, anywhere in the article, whether risk differences may vary across populations. We will conduct two separate multivariable logistic regression analyses to examine the association between the pre-specified study characteristics and each of these two features of risk difference calculation and interpretation.
In addition, we will calculate the proportion of systematic reviews that use, for the comparison of interest, relative measures for benefit outcomes and absolute measures for harm (‘mismatched framing’ henceforth). Treating the reporting of results with mismatched framing as the dependent variable, we will conduct multivariable logistic regression analyses to examine its association with the pre-specified study characteristics.
Our pre-specified study characteristics for the regression analyses are listed below, ranked by importance. If there are sufficient events, we will include them all. Otherwise, we will include as many as possible according to our ten events-per-category rule (see section on sample size). The first two factors to be examined will be type of systematic review (Cochrane vs. not Cochrane) and use of GRADE. However, if there is excessive confounding, as explained below, we will not include GRADE in the regression.
We hypothesize that systematic reviews are more likely to report absolute effects or report them appropriately if they: (i) are Cochrane reviews, (ii) use GRADE, (iii) are of better quality, (iv) achieve statistical significance for the most patient-important outcome, (v) do not receive funding from for-profit organizations and their authors have no financial conflicts of interest, (vi) evaluate pharmacological interventions, (vii) are published in a high-impact journal (Journal of the American Medical Association, New England Journal of Medicine, The Lancet, British Medical Journal, Annals of Internal Medicine and Public Library of Science Medicine).
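The ten events-per-category rule for deciding how many of the ranked characteristics enter a regression can be sketched as below. The function name is hypothetical. The protocol gives a total of eight categories across the seven characteristics but does not say how they split, so the per-characteristic counts are placeholders. Stopping at the first characteristic that does not fit is also an assumption.

```python
def characteristics_to_include(ranked, n_events, events_per_category=10):
    """Walk down the ranked list and keep characteristics while the
    number of events still supports their categories."""
    budget = n_events // events_per_category  # categories the data can support
    included, used = [], 0
    for name, n_categories in ranked:
        if used + n_categories > budget:
            break  # assumption: stop at the first characteristic that does not fit
        included.append(name)
        used += n_categories
    return included


# Ranked as in the hypotheses; category counts are placeholders that sum to eight.
ranked_characteristics = [
    ("Cochrane review", 1),
    ("use of GRADE", 1),
    ("review quality", 2),
    ("statistical significance", 1),
    ("source of funding", 1),
    ("type of intervention", 1),
    ("high-impact journal", 1),
]

print(characteristics_to_include(ranked_characteristics, n_events=80))
print(characteristics_to_include(ranked_characteristics, n_events=45))
```

With 80 events all seven characteristics fit, whereas with 45 events only the first three would be retained under these placeholder category counts.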
Before the regressions, we will look at the proportion of Cochrane and non-Cochrane systematic reviews that use the GRADE approach. We suspect that GRADE might be reported seldom in non-Cochrane systematic reviews and relatively frequently in Cochrane systematic reviews. If this is the case, there will be excessive confounding between use of GRADE and Cochrane reviews and it will be inappropriate to include them in the same regression.
If this proves to be the case, we will use the following analytical approach. We will compare Cochrane systematic reviews with and without GRADE. If there is a significant difference, we will then compare Cochrane with GRADE vs. non-Cochrane and Cochrane without GRADE vs. non-Cochrane. The results of these analyses will determine whether we include Cochrane or GRADE or both in the complete regression.