Selection of eligible Cochrane reviews and meta-analyses
We prospectively screened all intervention reviews published in the Cochrane Database of Systematic Reviews (CDSR) from July 2018 to January 2019 for meta-analyses. We restricted the selection to meta-analyses generating evidence of “moderate” or “high” quality, adopting the review authors’ assessments based on the GRADE framework. Therein, “moderate” or “high” quality of evidence represents the upper two of four levels (the others being “low” and “very low”). These judgments imply that the review authors consider the pooled effect estimates to be close to the true values with “moderate” or “high” certainty, respectively [16]. We decided against limiting the selection to meta-analyses with “high”-quality evidence, since only a few meta-analyses are rated as such: in a preliminary study, we found that of 986 examined meta-analyses published in 59 Cochrane intervention reviews, only 26 (2.6%) provided evidence of “high” quality, and the proportion of “moderate”-quality meta-analyses was also small (7.7%). Thus, we considered pooled effect estimates of both “high” and “moderate” certainty to be suitable proxies for the truth.
If a review contained at least one meta-analysis of “moderate”- or “high”-quality evidence, we obtained the accompanying data file from the Cochrane website and extracted the corresponding data via Cochrane’s Review Manager software, version 5.3 [17]. We entered the extracted data and identifying information on the meta-analysis into our study database.
Matching study results with publication data
For each study result included in eligible meta-analyses, we identified the source publication. When more than one publication was referenced, we screened the articles’ abstracts and full texts to reliably identify the correct publication, because the designated primary publication was not in every case the source of all data used in the meta-analyses.
Journal’s impact
For each publication included in eligible meta-analyses, we obtained the publishing journal’s JIF for the year of publication from the Journal Citation Reports (Clarivate Analytics). We preferred this indicator to others (e.g., the SCImago Journal Rank (SJR), the Eigenfactor (EF), and the H-factor) for the primary analysis, as it represents the de facto standard citation metric in science and is the one most likely to be known outside the scientific community. Information on the JIF was available for the years 1997 to 2018; results reported in older publications were excluded from the analysis.
As average reference lists in the medical sciences and other fields have grown longer over time, JIFs are subject to inflation [18, 19]. We did not consider this increase to reflect a real gain in impact on the process of science. We therefore adjusted the JIF for the publication date by calculating the ratio of the actual JIF to the mean of all journals’ JIFs in our study sample for the given year. The same was done for the SJR and the EF. The H-factor, calculated by the SJR’s publisher SCImago Lab on the basis of the Scopus database, was available only as its value at the time of data entry and could therefore not be adjusted.
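A minimal R sketch of this adjustment, assuming a data frame d with the hypothetical columns jif and pub_year (the names are ours, not from the original analysis):

# Divide each JIF by the mean JIF of all journals in the study sample
# with the same publication year
d$jif_adj <- d$jif / ave(d$jif, d$pub_year,
                         FUN = function(x) mean(x, na.rm = TRUE))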
Operationalization of the outcome measure
The purpose of most medical studies is to estimate an unknown true parameter value from the data of a study sample. The study estimates can deviate from the true value because of systematic error (bias) and random error (chance; lack of precision). Obviously, it was impossible for us to measure the true values, so a proxy variable for the truth had to be employed. Meta-analyses combine the results of multiple studies to derive a pooled estimate that approximates the true value more accurately than any single study. In the absence of better alternatives, and aware of the limitations of this choice, we defined the pooled effect estimate of the respective meta-analysis as the “truth.” The outcome of our analysis, the closeness of a single study’s point estimate to the “truth,” was operationalized as the relative deviation: the point estimates were centered around their respective “true” value and then standardized by the average distance of the point estimates in the meta-analysis to the pooled effect estimate. Risk ratios (RR), odds ratios (OR), and hazard ratios (HR) were transformed from their multiplicative scale to an additive (log) scale to enable a comparison of the different effect measures (RR and OR for dichotomous outcomes, HR for time-to-event outcomes, and mean differences in various units of measurement as well as standardized mean differences for continuous outcomes). As it does not matter whether a deviation from the truth is an over- or underestimation, negative signs were removed from the deviation measures.
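In symbols (our notation, not the original’s): for a meta-analysis of k studies with (log-transformed, where applicable) point estimates \(\hat{\theta}_i\) and pooled effect estimate \(\hat{\theta}\), the relative deviation of study i is

\[ d_i = \frac{\lvert \hat{\theta}_i - \hat{\theta} \rvert}{\frac{1}{k}\sum_{j=1}^{k} \lvert \hat{\theta}_j - \hat{\theta} \rvert}. \]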
In summary, a relative deviation of 1 means that a point estimate’s deviation from its pooled effect estimate corresponds to the average of all deviations of the point estimates of a given meta-analysis. For example, when a meta-analysis reports a treatment vs. control pooled effect estimate of RR = 0.70 while one of the meta-analysis’ studies reports an RR of 0.80, that study has a distance to its pooled effect of |log(0.70) − log(0.80)| = 0.13. When the studies in the meta-analysis scatter around the pooled effect estimate with an average distance of 0.10, our example displays a relative deviation of 0.13/0.10 = 1.30, a value > 1 indicating an above-average deviation.
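The example can be reproduced with a few lines of R (a sketch; the values are taken from the example above):

# Pooled effect estimate and one study estimate, both risk ratios,
# transformed to the additive (log) scale
pooled <- log(0.70)
study  <- log(0.80)

# Absolute distance of the study estimate to the pooled effect
dist <- abs(study - pooled)   # ~0.13

# Average distance of all studies in the meta-analysis (given as 0.10)
avg_dist <- 0.10

# Relative deviation; values > 1 indicate an above-average deviation
rel_dev <- dist / avg_dist    # ~1.3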
Our approach becomes questionable when a single study dominates a meta-analysis: the pooled effect estimate is calculated as a weighted average of the intervention effects estimated in the individual studies [20], and the more a pooled effect estimate is influenced by a particular study, the less it can serve as an appropriate benchmark for the accuracy of that very study. This is most obvious when a meta-analysis comprises only one study, which, according to our definition, cannot fail to meet the “truth.” We therefore considered all individual studies with a weight of 50% or more to be non-informative regarding our research question and excluded them from further analysis. We also excluded study results without point estimates, e.g., results with no events in either study arm for dichotomous outcome measures or with missing standard deviations for continuous outcome measures [20].
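In code, these two exclusion rules amount to a simple filter (a sketch; the column names weight, in percent, and estimate are our placeholders):

# Drop studies that dominate their meta-analysis (weight of 50% or more)
# and study results without a point estimate
d <- d[d$weight < 50 & !is.na(d$estimate), ]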
Covariates
To control for confounders, we considered variables that we expected to be (causal) predecessors of both the JIF and the relative deviation.
First, we suspected that larger studies, with a correspondingly smaller variance of the effect measure, are more likely to be published in higher-impact journals than smaller studies with a larger variance. As a smaller variance translates into a greater weight in the meta-analysis, the suspected association would result in the observation that studies from higher-impact journals show systematically smaller relative deviations. As the risk of confounding was mitigated, but not eliminated, by the exclusion of studies with a weight of 50% or more, we considered the study weight as a confounder.
Second, we assumed that a higher methodological quality of studies (in terms of compliance with the methodological requirements of a journal and thus a lower risk of bias) reduces the relative deviation and that studies with a lower risk of bias tend to be published in journals with a higher JIF than studies with a higher risk of bias. On the other hand, the range in the risk of bias of studies that a journal usually accepts for publication can be seen as a characteristic of that journal; in that case, the effect of the JIF on the relative deviation is partly due to the risk of bias acceptable to the journal. We adopted the latter view for the analysis and quantified the proportion of the JIF’s explanatory power that is due to the risk of bias.
For every study included in Cochrane reviews, the results of a detailed and largely standardized risk of bias assessment conducted by the review authors are reported [20]. Usually, seven key items are assessed for each study (“random sequence generation,” “allocation concealment,” “selective reporting,” “blinding of participants and personnel,” “blinding of outcome assessment,” “incomplete outcome data,” and “other”). We transformed the qualitative verdicts (“low risk of bias,” “unclear risk of bias,” and “high risk of bias”) for each key item into numerical values (0.0, 0.5, and 1.0, respectively) and calculated their mean. The result is an aggregated risk of bias score between 0 (low risk of bias in every item) and 1 (high risk of bias in every item). As this approach truncates the qualitative dimensions of the risk of bias assessment, it is not suitable for a comprehensive critical appraisal of an RCT; here, however, we pursue a general statistical relationship, and the loss of qualitative information appears acceptable.
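The aggregation can be sketched in R as follows (the function name and input format are our assumptions):

# Map the qualitative verdicts to numerical values and average them
# across the seven key items: 0 = low risk in every item,
# 1 = high risk in every item
rob_score <- function(verdicts) {
  values <- c("low risk of bias"     = 0.0,
              "unclear risk of bias" = 0.5,
              "high risk of bias"    = 1.0)
  mean(values[verdicts])
}

rob_score(c("low risk of bias", "unclear risk of bias",
            "high risk of bias", "low risk of bias",
            "low risk of bias", "unclear risk of bias",
            "low risk of bias"))   # ~0.29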
Statistical analysis
The mean relative deviation, i.e., the standardized deviation of a study estimate from the “truth” as defined above, was compared between study results from journals with an associated JIF and those from journals without a JIF using a Welch t test.
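In R, this comparison corresponds to the default two-sample t test, which does not assume equal variances (a sketch; the data frame d and the logical column has_jif are our placeholders):

# Welch t test of the relative deviation: journals with vs. without a JIF
t.test(d$rel_dev[d$has_jif], d$rel_dev[!d$has_jif])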
Focusing on publications with a JIF, we assessed the effect of the JIF on the relative deviation by fitting a linear regression. The regression was adjusted for the assumed confounder, study weight, and weighted with the JIF to account for the heteroscedasticity of the model residuals. Visual model diagnostics (residual mean of 0 and homoscedasticity across all fitted values and the predictor) indicated at most small deviations from the modelling assumptions. The assumption of normality of the residuals was not met, but the central limit theorem applies given the large sample size. From a theoretical point of view, the observations were clustered within their respective meta-analyses and were consequently not independent. However, a mixed-effects model with a random intercept and slope for each meta-analysis estimated a variance of 0 for both the random intercept and the slope, indicating that a fixed-effects regression is sufficient. Three extreme values (with relative deviations of 11.6, 6.4, and 6.3 and JIFs of 0.22, 1.0, and 1.0) were excluded for more robust results and a better graphical representation.
As the linearity of the relationship between the JIF and the relative deviation is a rather strong assumption, we additionally allowed for a relationship without a pre-specified functional form by fitting a local polynomial regression (R function stats::loess). Again, the model was adjusted for the study weight in the meta-analysis, and the observations were weighted by their JIF.
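Both models can be sketched in R as follows (the column names rel_dev, jif_adj, and study_weight are our placeholders; jif_adj denotes the inflation-adjusted JIF):

# Linear regression: relative deviation on the adjusted JIF, adjusted
# for study weight and weighted by the JIF to account for
# heteroscedastic residuals
fit_lm <- lm(rel_dev ~ jif_adj + study_weight,
             data = d, weights = jif_adj)

# Local polynomial regression without a pre-specified functional form,
# with the same adjustment and weighting
fit_loess <- loess(rel_dev ~ jif_adj + study_weight,
                   data = d, weights = jif_adj)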
The regression curves of both models were plotted together with their 95% confidence bands while fixing the value for the adjustment variable “study weight” at its median value of 7.41%.
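The plotted curves can be obtained by predicting over a grid of JIF values while holding the study weight at its median (a base R sketch continuing the models above):

# Prediction grid with study weight fixed at its median of 7.41%
grid <- data.frame(jif_adj      = seq(min(d$jif_adj), max(d$jif_adj),
                                      length.out = 200),
                   study_weight = 7.41)

# Linear model: fitted values with 95% confidence bands
pred_lm <- predict(fit_lm, newdata = grid, interval = "confidence")

# loess: approximate 95% band as fit +/- 1.96 * standard error
pred_lo <- predict(fit_loess, newdata = grid, se = TRUE)
band_lo <- cbind(pred_lo$fit - 1.96 * pred_lo$se.fit,
                 pred_lo$fit + 1.96 * pred_lo$se.fit)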
Subgroup analyses were conducted for (1) “high” vs. “moderate” quality of evidence, (2) open access vs. traditional publishing, (3) large (ten or more included studies) vs. small meta-analyses, (4) meta-analyses using random-effects vs. fixed-effect models for pooling the included studies, (5) low vs. moderate study weight (median split at 7.41%), (6) low vs. high risk of bias (median split at 0.21), and (7) study results using RR vs. OR vs. mean differences as effect measures. In all cases, the linear regression and the local polynomial regression are shown, both adjusted and weighted as in the main analysis.
In addition to the JIF, the SCImago Journal Rank (SJR; adjusted for inflation), the Eigenfactor (EF; adjusted for inflation), and the H-factor were analyzed. Fourteen observations (with EF values ranging between 43 and 69) were excluded because they exerted a strong leverage effect. Again, linear as well as local polynomial regressions were fitted for each impact measure. The regressions were weighted with the respective measure of impact to control for heteroscedasticity of the residuals.
The proportion of the explanatory power of the JIF that is due to the risk of bias (RoB) was calculated as the share of the increase in multiple R² when adding the JIF as a predictor to the basic adjusted and weighted model that could be attributed to the RoB predictor:

\[ \frac{(R^2_{\text{JIF}} - R^2_{\text{basic}}) + (R^2_{\text{RoB}} - R^2_{\text{basic}}) - (R^2_{\text{JIF+RoB}} - R^2_{\text{basic}})}{R^2_{\text{JIF}} - R^2_{\text{basic}}} \]
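Computed from four fitted models, this proportion can be sketched in R as follows (continuing the placeholder names from above, with rob denoting the aggregated risk of bias score):

r2 <- function(fit) summary(fit)$r.squared

# Basic model (study weight only) and models adding the JIF, the risk
# of bias, or both; all weighted by the JIF as in the main analysis
r2_basic <- r2(lm(rel_dev ~ study_weight, data = d, weights = jif_adj))
r2_jif   <- r2(lm(rel_dev ~ study_weight + jif_adj, data = d, weights = jif_adj))
r2_rob   <- r2(lm(rel_dev ~ study_weight + rob, data = d, weights = jif_adj))
r2_both  <- r2(lm(rel_dev ~ study_weight + jif_adj + rob, data = d, weights = jif_adj))

# Share of the JIF's explanatory power attributable to the risk of bias
((r2_jif - r2_basic) + (r2_rob - r2_basic) - (r2_both - r2_basic)) /
  (r2_jif - r2_basic)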
We further assessed whether study results from small studies published in high-impact journals are especially far from the “truth” by adding an interaction term between the JIF and the study weight to the main linear regression.
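The interaction model is a one-line extension of the main regression (sketch):

# The * operator expands to both main effects plus their interaction
fit_int <- lm(rel_dev ~ jif_adj * study_weight,
              data = d, weights = jif_adj)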
All statistical analyses were performed with the R software, version 4.1.3 [21].