Risk of bias judgements and strength of conclusions in meta-evidence from the Cochrane Colorectal Cancer Group

Background The Cochrane Collaboration records risk of bias (ROB) judgements on the original studies it analyses. The aim of this review is to perform an audit of all literature produced by the Cochrane Colorectal Cancer Group (CCCG), focusing on whether intervention type has any relationship with ROB and the ability of a review to inform clinical practice. Methods The most recent version of every CCCG review from January 2000 to the end of July 2018 was included. Conclusions were categorized as informing clinical practice (I) or not (N). Both I and N categories were divided into firm (F) or tempered (T) based on the definitiveness of their language. ROB judgements were aggregated. Reviews were classed as Medical (M), Surgical (S), Medical & Surgical (MS) or Other (O) based on their intervention, with O reviews then excluded. Data were analyzed in SPSS. Results Ninety-five reviews were included, covering 1892 studies. Sixty-two percent (n = 59/95) informed clinical practice (I). Thirty-eight percent (n = 36/95) did not inform clinical practice (N). Of the N group, 53% (n = 19/36) were completely equivocal (firm) while 47% (n = 17/36) were moderately so (tempered). In the I group, 46% (n = 27/59) gave a conclusion that was firm and 54% (n = 32/59) were tempered. Seven thousand five hundred sixty-four cases of bias were assessed. Risk of bias was low in 43%, high in 20% and unclear in 37%. A review that regarded a medical intervention alone was significantly more likely to be comprised of studies with a low risk of bias than a review that included a surgical intervention (p < 0.001). Conclusion The Cochrane Colorectal Cancer Group finds the risk of bias to be low in less than half of its judgements. A review that included a surgical intervention was less likely to display low risk of bias. Risk of bias was associated with whether a review informed clinical practice, but intervention type was not. Readers of colorectal literature should be cautious when considering original and meta-evidence in this field, particularly where a surgical intervention is assessed. Electronic supplementary material The online version of this article (10.1186/s13643-019-1001-0) contains supplementary material, which is available to authorized users.


Background
The Cochrane Collaboration is an independent [1] global medical research organization [2], well regarded as having a high standard of methodology and rigor [3]. As part of their assessment of original evidence for inclusion in reviews, the Cochrane Collaboration typically assess each study's risk of bias (ROB), recording a judgement of "low risk", "high risk" or "unclear risk" across seven standard criteria [4]. The criteria, familiar to many readers, were based on consensus expert review [5] and are intended to focus on the internal validity of studies. Assessment of ROB using the Cochrane tool has been previously noted to have high inter-rater variability [6], and the impact on effect size that a judgement of high or unclear bias in a particular domain has may vary according to the intervention and design of an original study [5]. A judgement of "unclear" may reflect a deficiency in the quality of reporting, rather than poor internal validity of the study. However, Cochrane's approach is regarded as the gold standard for risk of bias assessment in meta-evidence [7].
Judgements of high or unclear risk have been associated with an over-appreciation of effect size [6]. The domains are listed in Table 1. Risk of bias judgements are collated and published in the finished reviews. Cochrane reviews are usually based on randomized control trials and do not consider "real-world data".
Cochrane's colorectal cancer group is the CCCG, the Cochrane Colorectal Cancer Group [8]. Each review published by the CCCG typically includes a recording of the ROB judgements of the relevant group of original studies. The overall picture of bias within the literature assessed by the CCCG has yet to be aggregated. This paper will collect and analyze all ROB judgements published by the CCCG, giving an overview of the risk of bias found in colorectal original research. The combined data will provide a view of the quality of a sample of colorectal literature over time, and a snapshot of its current status. It will also offer insight into the level of clinical utility of scientific publications within the colorectal domain; that is, when Cochrane reviews the colorectal literature, how often is it able to inform clinical practice?
As a subset of surgical intervention, laparoscopic surgery presents an example of the surgical research dilemma; technology precedes evidence, and the barriers to surgeons using new devices are low [9]. The learning curve of a new surgical implement and the pre-existing challenges of surgical research make robust conclusions about best practice difficult, particularly in the early phase of implementation [10]. This is in a commercial setting of high public demand and great marketing potential for new surgical technologies, as may also be seen currently with the popularity of robotic surgery [11,12]. Laparoscopic surgery ("key-hole" surgery) became widespread prior to significant evidence demonstrating its superiority over open approaches [13] and even now there remain some areas of controversy [14]. Subgroup analysis on laparoscopic papers from within the CCCG output was planned for this review as a means of assessing the meta-evidence support for this surgical technique.

Literature search
In collaboration with the CCCG, a list of all of that group's reviews from January 2000 to the end of July 2018 was acquired. The recorded ROB judgements for each study were also provided. The provided database was compared by two independent reviewers (JD and RC) with reviews retrieved from the Cochrane Library to check accuracy.

Definitions
Risk of bias judgements are defined in the Cochrane Collaboration handbook [4]. Conclusion type was classified as "informs clinical practice-firm" (I-F), "informs clinical practice-tempered" (I-T), "does not inform clinical practice-tempered" (N-T) and "does not inform clinical practice-firm" (N-F). The definitions for these categories are outlined in Table 2 and have been described and used previously [15].
Reviews were classed as Medical (M), Surgical (S), Medical & Surgical (MS) or Other (O) based on the intervention. An M paper considered an intervention that was exclusively medical, whereas an S paper examined an intervention that was exclusively surgical in nature. An MS paper was one where a surgical intervention was assessed in the setting of medical intervention or vice versa, for example, Epidural local anesthetics versus opioid-based analgesic regimens for postoperative gastrointestinal paralysis, vomiting and pain after abdominal surgery [16]. A review that did not assess an intervention or incorporated a therapy that was neither surgical nor medical (for instance, radiotherapy) was classified as "Other" (O).

Inclusion and exclusion criteria
Included papers were systematic reviews and meta-analyses produced by the Cochrane Colorectal Cancer Group from January 2000 to the end of July 2018 that were classified as M, MS or S. Where more than one version of a review had been produced, the most up-to-date version was preferred. Reviews that were classified as O (that is, reviews that considered no intervention, or an intervention that was neither Table 1 The Cochrane risk of bias tool

Risk of bias domain Judgement
Random sequence generation Selection bias (biased allocation to interventions) due to inadequate generation of a randomized sequence.
Allocation concealment Selection bias (biased allocation to interventions) due to inadequate concealment of allocations prior to assignment.
Blinding of participants and personnel Performance bias due to knowledge of the allocated interventions by participants and personnel during the study.
Blinding of outcome assessment Detection bias due to knowledge of the allocated interventions by outcome assessors.
Incomplete outcome data Attrition bias due to amount, nature or handling of incomplete outcome data.

Selective reporting
Reporting bias due to selective outcome reporting.
Other bias Bias due to problems not covered elsewhere in the table.
surgical nor medical) were a priori excluded from this analysis to facilitate comparison of surgical and medical interventions.

Data extraction
ROB judgements were extracted from each included Cochrane review. Conclusions and intervention type (M, S, MS or O) were scored independently by JD and RC. Each CCCG review was categorized based on its ability to inform clinical practice, using a standardized matrix ( Table 2). Any disagreements were resolved by discussion to arrive at a consensus. Collation of data was performed in Microsoft Excel [17]. A subgroup of reviews concerning laparoscopic interventions was isolated and assessed (17 reviews). For visual comparison, a graphical representation of the commentary made on evidence within the conclusions of those reviews (a "word cloud") was generated using Microsoft Word.

Statistical analysis
Analysis of data was performed using SPSS v24 [18]. Inter-observer agreement was assessed using weighted kappa (κ) [19]. Data that were categorical were analyzed was via cross tabulation with chi-square. A two-tailed distribution with an alpha level of 0.05 was used, with a Bonferroni correction where required. The means of continuous data were compared using one-way ANOVA with planned contrasts. An alpha level of 0.05 was used, again with a Bonferroni correction where required.

Results
The data provided by the CCCG included 117 unique Cochrane reviews. Twenty-two reviews were excluded from our audit; 9 regarded diagnostics rather than intervention, 5 reviews concerned radiotherapy, 4 examined herbal therapy, 3 regarded dietary modifications and 1 considered the impact of a perioperative blood transfusion (see Appendix 1 and 2). A PRISMA flow diagram may be found in Fig. 1. The PRISMA checklist is available as an Additional file 1. The 95 included reviews covered 1892 original studies. This created 13,244 possible cases of bias to be judged (7 bias categories across 1892 studies).
The CCCG made a combined total of 7564 judgements across the seven ROB categories. In 5680 instances, a ROB judgement was not recorded by the Cochrane Group (for example, when a review provided a judgement for some but not all of the ROB criteria). Overall, bias was judged as low 43% of the time (n = 3291), high 20% of the time (n = 1490) and unclear 37% of the time (n = 2783). Bias judgements for each intervention type may be found in Table 3. A chi-square test was performed, and a significant relationship found between the intervention type and the likelihood of a study to show differing bias (chi-square, df 4, 41.696, p < 0.001). Comparison of individual groups using chi-square with a Bonferroni correction (α = 0.0024) revealed M to be more likely than S and MS to have studies with a low risk of bias (M = 50% > S = 42%, p < 0.001), (M = 50% > MS = 42%, p = 0.002). No difference was found in the likelihood of a low risk of bias when comparing S with MS (p = 0.831). Figure 2 displays low-risk judgements by intervention as a percentage of judgements made across Cochrane's seven categories. Assessment of the likelihood of high-risk judgements showed S reviews to be more likely than M reviews to show a high risk of bias (S = 21% v M = 16%, p < 0.001). MS reviews were also more likely than M to display high risk (MS = 20% > M = 16%, p < 0.001). There was no difference in the likelihood of a high risk of bias between S and MS reviews (p = 0.2294). M had the greatest percentage of low-risk judgements across all risk-of-bias categories with the exception of selective reporting, where MS had the greatest percentage. When comparing M with S using chi-square with a Bonferroni correction (α = 0.0024), M group reviews were found to be significantly more likely to be comprised of studies with a low risk of bias than S reviews in the category of blinding of participants and personnel (M = 35% > S = 21%, p < 0.001). No difference was found in the likelihood to display low risk in random sequence generation (M = 50%, S = 42%, p = 0.049), allocation concealment (M = 44%, S = 37%, p = 0.035), blinding of outcome assessment (M = 32%, S = 22% p = 0.007), incomplete outcome data (M = 71%, S = 67%, p = 0.439), selective reporting (M = 64%, S = 54%, p = 0.0145) and other bias (M = Informs clinical practice-firm A conclusion that makes a recommendation for practice (positive or negative), with minimal or no caveats.
Informs clinical practice-tempered A conclusion that makes a recommendation for practice, but places significant caveats on that recommendation.
Does not inform clinical practice-tempered A conclusion that is unable to make a recommendation but suggests that a recommendation might be possible soon based on an emerging trend or underlying theory.
Does not inform clinical practice-firm A conclusion that is unable to make a recommendation and is completely uncertain. There may be no evidence at all, or of too poor a quality, or the evidence may be contradictory.
A chart of the conclusiveness of reviews by intervention is shown in Fig. 3. There was a significant difference between the groups when assessed using chi-square (df 6, 20.274, p = 0.002). Comparison of individual groups using chi-square with a Bonferroni correction (α = 0.004) was performed. M reviews were significantly more likely to provide conclusion that firmly informed clinical practice than S (M = 71% > S = 20%, p < 0.001) but not MS (M = 71%, MS = 30%, p = 0.0294). Reviews informed clinical practice (regardless of whether firm or tempered) in 81% of M reviews, 55% of MS reviews and 57% of S reviews, but there was no significant difference between these groups (M v S p = 0.066, M v MS p = 0.1, S v MS p = 1).
The risk of bias groups and their relationship to conclusion type were examined via chi-square, with a significant difference found between the groups (chi-square, df 6, 311.465, p < 0.001). Comparison of I and N groups using chi-square with a Bonferroni correction (α = 0.0018) revealed a significant difference in the likelihood of input studies being judged as having a low risk of bias between them (I = 46% > N = 35%, p < 0.001).
Chi-square with a Bonferroni correction (α = 0.0018) was used to examine whether there was any association between the seven risk of bias categories and a review's likelihood to inform clinical practice. Risk of bias domains that had a significant association between low-risk judgements and conclusion type were blinding of outcome assessment (I = 42% low risk > N = 17% low risk, p < 0.001), selective reporting (I = 72% > N = 42%, p < 0.001) and other bias (I = 59% > N = 36%, p > 0.001). There was not a significant association found between random sequence generation (I = 42%, N = 41%, p = 0.939), allocation concealment (I = 37%, N = 30%, p =  Fig. 4. For interest, the conclusions of a subgroup of reviews that considered laparoscopic interventions were assessed, shown in Table 4. Seventeen reviews were found. The modal recommendation was I-T in 8 (47%), followed by N-F in 5 (29%), N-T in 2 (12%) and I-F in 2 (12%). A "word cloud" made using descriptions of evidence quality from the conclusions of each of these reviews may be seen in Fig. 5.

Discussion
This review has gathered previously made judgements on the risk of bias from a large sample of original research within the colorectal field and combined them, forming a portrait of the risk of bias within the discipline generally. We have then added to this data new judgements regarding the type of intervention studied and the clinical relevance of the meta-evidence produced. Using this approach, a view of the quality of data input and data output within colorectal science may be formed. The importance of these results relates to the defining characteristic of intervention-based medical research: the drive to improve patient outcomes. Studies that exhibit a high risk of bias lead to meta-evidence that is less able to guide clinical outcomes, together forming clinical "noise", from which clinicians are tasked with separating the "signal". The results of this review illustrate the benchmarks within colorectal science of "signal" and "noise", and whether there are any associated factors that will help guide the production of useful original and combined evidence. When the Cochrane Colorectal Cancer Group examines the risk of bias in original studies within colorectal science, it found the risk of bias to be low in the minority of cases. Notably, many of the studies assessed by the CCCG do not specifically address questions surrounding colorectal cancer per se (for instance, "Cisapride for Intestinal Constipation" [21], or "Antibiotics for uncomplicated diverticulitis" [22]), but rather provide an overview of colorectal science in general, with an inclination towards cancer research. The opinion of the Cochrane group was that the risk of bias within this sample was high or unclear in greater than half of cases. If we consider risk of bias judgements that could possibly have been made but were not to be "unclear" the proportion of low-risk judgements to high or unclear drops further. This review suggests that the minority rate of low-risk judgements across all of the CCCG should be noted by researchers and readers in the colorectal discipline. Efforts to minimize the risk of bias within this field should be considered, led by feedback from meta-evidence.
There was an association found between the intervention type and the level of low risk of bias judgements made. Studies concerning a medical intervention (M) were significantly more likely to display low risk of bias than studies concerning a surgical intervention only, or studies with a surgical and medical intervention. Studies that assessed only a surgical intervention were judged to have a low risk of bias at a rate that was not significantly different to studies that assessed a medical and surgical intervention. S reviews and MS reviews were significantly more likely than M reviews to have high-risk judgements. This may suggest that the addition of a surgical intervention may decrease the amount of bias protection inherent in the assessment of a medical intervention.
Across all interventions, the rate of low-risk judgements was 50% or less in random sequence allocation, allocation concealment and blinding of outcome assessment and participants and personnel. That the randomization, concealment and blinding for all studies within the CCCG reviews were judged as having high or unclear risk of bias in more than half of cases is cause for reflection. Previous research has demonstrated that high-risk judgements from the Cochrane risk of bias tool are associated with increased effect sizes [6,23]. It should be noted that within this sample over 5000 risk of bias judgements that could have been made were not, which potentially dilutes the generalizability and impact of this finding.
Despite a significant difference in low risk of bias findings between M and S overall, analysis of each domain did not reveal a deeper pattern, though a significant difference was found in the blinding of participants and personnel, and a near-significant difference seen in the blinding of outcome assessors. Perhaps surprisingly, the rate of high-risk judgements made regarding blinding was not significantly different between M and S reviews. Both M and S reviews were judged to be high risk for binding in nearly half of all cases. Difficulty in adequately blinding is the nature of the surgical intervention and remains a systemic challenge to rigorous science in surgery. However, the result for the M group suggests that where practical, medical colorectal studies may benefit from paying particular attention to the risk of bias due to inadequate blinding. M and MS groups were significantly different across a range of bias categories, with M being more likely than MS to have a low risk of bias in random sequence allocation, allocation concealment and blinding of participants and personnel, and MS being more likely to display low risk in selective reporting. As discussed, the inclusion of a surgical intervention in a study makes participant and personnel blinding difficult, which may explain why MS reviews were less likely than M to have a low risk of bias in this domain. Errors in reporting may explain the difference in random sequence generation, as MS was no more likely to be judged as high risk in this area, suggesting the discrepancy to be related to a large proportion of "unclear" judgements. Within allocation concealment, however, MS reviews were more likely to have less low-risk judgements and they were significantly more likely to have a high-risk judgement. The increased exposure to bias from poor allocation concealment within MS reviews is also present when contrasting MS with S reviews. This result is attributable to the impact of a single MS review, "Antimicrobial prophylaxis for colorectal surgery" [24], which had a high-risk judgement rate for allocation concealment of 45% and provided 115 of the 133 allocation concealment judgements made.
Intervention type was not associated with whether or not a review would be able to inform clinical practice. The likelihood that a review will make a conclusion that informs clinical practice is similar between surgical, medical and combined medical and surgical meta-evidence, despite the fact that studies that incorporate a surgical intervention are more likely to display high or unclear bias. In contrast, there was an association between whether a review informed clinical practice and the risk of bias. M reviews were better protected against bias than S and MS reviews but were no less likely to inform clinical practice. Where a conclusion did inform practice, the likelihood that it would be firm, rather than tempered, was significantly different M and S reviews, but not M and MS. Reviews that examined a surgical intervention exclusively were less confident about their conclusions but were willing to inform practice. It is speculative, but this may suggest that the threshold for a clinical recommendation within surgical evidence is lower than that in medical evidence. Readers of these reviews should be conscious of the GRADE quality assessments that accompany modern Cochrane reviews [25] and consider all low-quality clinical recommendations cautiously. It may be possible that a new weighting metric for meta-evidence, which combines a weighted risk of bias quality score with the cohort size of each study, may deliver meta-analysis that better accounts for varying input quality.
The subgroup analysis of laparoscopic surgery, including the "word cloud", shows a negative view of the quality of the input studies and a lack of confidence regarding the evidence. In spite of this, a conclusion that informs clinical practice is made in nearly half of all cases, albeit tempered. The following quoted conclusion, regarding the use of laparoscopic or open techniques in the management of suspected appendicitis, one of surgery's most common ailments, illustrates the challenge surgeons face; "In spite of the mediocre quality of the available research data, we would generally recommend to use laparoscopy…in patients with suspected appendicitis unless laparoscopy itself is contraindicated or not feasible" [26].
These findings suggest that the original input studies informing meta-evidence within surgery may be inherently biased in a way that makes evidence synthesis in this field less applicable. In the face of this, evidence-based surgery remains elusive. Novel ways of thinking about surgical research may need to be employed. This is not a new revelation [27], but perhaps there are technological advances now that afford surgical research an opportunity not present before. In particular, cloud-based "big data" collection and machine learning analysis may be useful; an approach where high-fidelity patient and practitioner data are automatically recorded for analysis across multiple centres may provide a better avenue to approximate the real-world impact of surgical interventions.

Strengths and limitations
The strengths of this "review of reviews" are the large number of Cochrane reviews included in our assessment and the dual independent extraction of data. Cochrane is viewed as the gold standard of systematic reviews. Assessment of risk of bias is standardized across Cochrane trials, which enables comparison. Our methods and definitions have been clearly outlined.
The limitations of this study include the restriction of data to Cochrane reviews only, introducing the possibility of selection bias. The CCCG did not record a judgement in over 5000 of the instances where it could have, creating potential bias. The risk of bias judgements recorded by Cochrane are subjective and open to bias. Likewise, the judgements made by the authors of this review regarding the conclusiveness of each of the Cochrane reviews are subjective.

Conclusion
The findings of this study highlight a need for more detailed reporting and a greater degree of methodological rigor within original colorectal research. Although the type of intervention was associated with a higher risk of bias, it was not associated with the likelihood of a review to inform clinical practice. Surgical studies, in particular, are prone to a higher degree of bias risk and must be interpreted with caution; a review that included a surgical intervention was likely to have a higher risk of bias but was just as likely to inform clinical practice. This may be reflective of the systemic challenges of surgical research.