Evaluation of the Cochrane tool for assessing risk of bias in randomized clinical trials: overview of published comments and analysis of user practice in Cochrane and non-Cochrane reviews

Background The Cochrane risk of bias tool for randomized clinical trials was introduced in 2008 and has frequently been commented on and used in systematic reviews. We wanted to evaluate the tool by reviewing published comments on its strengths and challenges and by describing and analysing how the tool is applied to both Cochrane and non-Cochrane systematic reviews. Methods A review of published comments (searches in PubMed, The Cochrane Methodology Register and Google Scholar) and an observational study (100 Cochrane and 100 non-Cochrane reviews from 2014). Results Our review included 68 comments, 15 of which were categorised as major. The main strengths of the tool were considered to be its aim (to assess trial conduct and not reporting), its developmental basis (wide consultation, empirical and theoretical evidence) and its transparent procedures. The challenges of the tool were mainly considered to be its choice of core bias domains (e.g. not involving funding/conflicts of interest) and issues to do with implementation (i.e. modest inter-rater agreement) and terminology. Our observational study found that the tool was used in all Cochrane reviews (100/100) and was the preferred tool in non-Cochrane reviews (31/100). Both types of reviews frequently implemented the tool in non-recommended ways. Most Cochrane reviews planned to use risk of bias assessments as basis for sensitivity analyses (70 %), but only a minority conducted such analyses (19 %) because, in many cases, few trials were assessed as having “low” risk of bias for all standard domains (6 %). The judgement of at least one risk of bias domain as “unclear” was found in 89 % of included randomized clinical trials (1103/1242). Conclusions The Cochrane tool has become the standard approach to assess risk of bias in randomized clinical trials but is frequently implemented in a non-recommended way. Based on published comments and how it is applied in practice in systematic reviews, the tool may be further improved by a revised structure and more focused guidance. Electronic supplementary material The online version of this article (doi:10.1186/s13643-016-0259-8) contains supplementary material, which is available to authorized users.


Background
Since the early 1990s, the number of published systematic reviews of randomized trials, both Cochrane and non-Cochrane reviews, has steadily increased. The ideal of taking a systematic approach to identify, summarise and analyse comparable clinical trials as a basis for therapeutic decisions has become more widespread, and systematic reviews have had a huge impact on clinical research and practice.
However, one obstacle to the usefulness of a systematic review is the possibility that some of the included trials are biased due to flaws in their design, conduct, analysis or reporting. A meta-analysis of biased effect estimates will likely produce a biased pooled analysis with increased precision and greater credibility. Thus, for authors of a systematic review, it is paramount to adequately address the risk of bias in the included trials [1].
For this purpose, the Cochrane tool for assessing risk of bias in randomized clinical trials (i.e. the tool) was released in 2008 and updated in 2011. The tool is based on seven bias domains: sequence generation and allocation concealment (both within the domain of selection bias or allocation bias), blinding of participants and personnel (performance bias), blinding of outcome assessors (detection bias), incomplete outcome data (attrition bias), selective reporting (reporting bias) and an auxiliary domain: "other bias." For each bias domain, the tool urges users to assign a judgement of "high," "low" or "unclear" risk of bias and to document the basis for their judgements (e.g. with verbatim quotes). The bias domains of the tool were selected with the intention to cover all fundamental bias mechanisms in randomized trials [2].
Several years have passed since the release of the first version of the tool. Over this period, the tool has been used in numerous systematic reviews, the scientific debate on risk of bias has proceeded (for example, reflecting on the role of source of funding [3][4][5][6] or other "meta-biases" [7]) and research publications have analysed user experience [8] and inter-agreement rates [9][10][11]. Additionally, a complementary tool for assessing non-randomized trials has been developed [12].
Researchers from the original development team and members of the Cochrane Bias Methods Group are planning a revision of the tool. To evaluate the tool and to provide a better basis for the revision, we intended (1) to identify, summarise and analyse published comments on the strengths and challenges of the tool and (2) to describe and analyse how the tool is used in both Cochrane and non-Cochrane reviews.

Methods
This study involved a review of published comments on the Cochrane tool for assessing risk of bias in randomized clinical trials and an observational study of how the tool is used in systematic reviews (please refer to Additional file 1 for the study's PRISMA checklist).

Review of published comments
We sought publications that explicitly commented on the tool. We defined "major comments" as longer comments with a substantial reflection (typically ≥100 words of text) on the strengths or weaknesses of the tool, for example, in the form of an editorial. We also included "minor comments," which we defined as shorter comments without a substantial reflection (typically <100 words of text) on the strengths or weaknesses of the tool, for example, in the form of minor elements of a discussion in a publication. We excluded "peripheral remarks" on the tool, which we defined as remarks that were implicit or short and tangential. If an author had several publications included with similar comment contents, only the publication with the most detailed comment was considered major.
We searched PubMed, The Cochrane Methodology Register and Google Scholar for publications from the start of 2008 to the end of 2014. No language restriction was applied, and Google Translate was used for nonfamiliar languages. The search strategy was developed iteratively (see Additional file 2).
One author (LJ) decided on inclusion of publications and categorised them as "major comments" and "minor comments" (and "peripheral remarks"). A second author (AS) checked the categorisation. Two authors (LJ and AS) extracted data independently. Any disagreements were solved by discussion and by consulting a third author (DL or AH).
The following information was extracted: publication year, publication type, tool version considered (i.e. 2008 or 2011) and the exact wording of the comment.
Comments from the included publications were categorised according to whether they expressed "strengths," "challenges" or "suggestions" and summarised into broader themes (each addressing a similar type of topic). We noted the numerical distribution of comparable comments, but our main intention was a qualitative mapping of the themes addressed and a categorisation according to whether they addressed a core design feature of the tool or an issue related to implementation.
Observational study of how the tool is used in systematic reviews One author (DL) identified 100 Cochrane reviews (or Cochrane review updates) from PubMed in reverse chronological order from 31.12.2014 until 20.11.2014 (see Additional file 2). The same author manually identified 100 non-Cochrane reviews from PubMed in reverse chronological order from 31.12.2014 until 22.12.2014.
A second author (AS) checked the inclusion. We defined a non-Cochrane review as a self-declared systematic review with at least one included randomized clinical trial. We excluded any non-Cochrane review that was also published as a Cochrane review.
Three authors (AS, DL and LJ) extracted data independently: intervention type (pharmacological or nonpharmacological); inclusion of meta-analyses; number of trials and how many trials were categorised as "high," "unclear" and "low" risk of bias; the method used for judging risk of bias (or quality) and how it was implemented; the type and frequency of both standard and non-standard domain use; the use of merging or splitting of standard domains (e.g. merging blinding domains or splitting for different outcomes); the use of the "other bias" domain; how risk of bias assessments were incorporated into statistical analysis using sensitivity analyses; whether risk of bias judgements were explicitly mentioned in the abstract, discussion or conclusion; and whether The Grading of Recommendations Assessment, Development and Evaluation (short GRADE) had been incorporated. We compared differences in proportions between Cochrane and non-Cochrane reviews using Fisher's exact test. In cases where Cochrane or non-Cochrane reviews included both randomized clinical trials and non-randomized clinical trials, we disregarded the non-randomized trials.

Review of published comments
We read 976 full text publications of which we excluded 908 (Fig. 1). Thus, we included 68 publications, of which we categorised 15 as "major comments" and 53 as "minor comments" (Tables 1 and 2).
The strengths of the tool were addressed in five "major comments" relating to three themes: aims, developmental basis and transparency. The comments praised the tool for aiming to assess conduct (and not reporting), being based on theoretical and empirical evidence and on broad consultation and facilitating transparent assessment of bias.
The challenges of the tool were addressed in 15 "major comments" relating to four themes: choice of the core bias domains, implementation, overall risk of bias and special situations. The comments on choice of core bias domains expressed concern whether the chosen domains comprehensively address all threats to validity (for example, five comments reflected on including funding as an independent bias domain). Comments on implementation pointed to difficulties in the subjective Fig. 1 Flowchart of the inclusion of comments on the Cochrane risk of bias tool for randomized clinical trials-evaluation of the Cochrane tool for assessing risk of bias in randomized clinical trials. 1 N= the number of records/comments screened for inclusion. 2 Of the 976 full-texts assessed, 793 full-texts did not comment on the Cochrane risk of bias tool for randomized clinical trials (i.e. the tool). 3 Seven records (ordered through The Royal Danish Library) were not retrievable and therefore not assessed. 4 183 publications were independently assessed by two authors to check type, categorisation and commentary. 5 Major comments were defined as longer comments with a substantial reflection (typically ≥100 words of text) on the strengths or challenges of the tool. 6 Minor comments were defined as shorter comments without a substantial reflection (typically <100 words of text) on the strengths or challenges of the tool. 7 Peripheral remarks (defined as implicit or short and tangential) were excluded interpretation of the tool and expressed concerns about modest inter-observer agreement, difficulty in assessing selective reporting of outcomes, terminological ambiguity (i.e. of the terms subjective/objective) and the low proportion of reviews using risk of bias assessments as a basis for sensitivity analyses. The comments on overall risk of bias expressed concern about the challenges in assigning an overall risk of bias to a trial based on risk of bias of single domains to the trial as such. A single comment regarded the special situation where the tool was used to assess risk of bias based on clinical study reports (and not clinical trial publications).
Specific suggestions to improve the tool were included in nine "major comments" relating to three themes: improved guidelines, further research and the inclusion of funding as a bias domain. The comments on guidelines suggested that updated and improved guidance and more training options for users were needed. The comments on research suggested further methodological research (for example, blind versus non-blind risk of bias assessments). The comments on funding suggested that funding/conflicts of interest should be incorporated into the tool as a specific bias domain.
All themes addressed in the "major comments" were represented in the "minor comments" (see Additional file 2). Additional themes addressed only in the "minor comments" included graphical representation, external validity and non-randomized designs. Specifically, (i) one comment praised the tool for its graphical representation of risk of bias assessments, (ii) one comment criticised that the tool does not address external validity (and only focuses on internal validity) and (iii) one comment noted that non-randomized trials should be included in Cochrane reviews and should be addressed in risk of bias assessments. The latter two suggestions are inconsistent with the aim of the tool, which is to assess only bias (i.e. internal validity) in randomized clinical trials. Such comments help to unveil the assumptions and basic structure of the tool but would be difficult to implement without significantly changing the tool.
Other comments reflected concerns about the implementation of the tool. An example is the suggestion for improved guidelines for how to assess selective outcome reporting. Also, improved training options and more detailed guidelines aimed to improve agreement rates address the implementation of the tool. Such suggestions are easier to implement while keeping the fundamental structure of the tool intact.
The majority of Cochrane reviews included one or more meta-analyses (85/100, 85 %). According to the information reported in their methods section, most of the Cochrane reviews had planned to perform sensitivity analyses based on risk of bias (70/100, 70 %). One fifth of the Cochrane reviews reported to have performed sensitivity analyses (19/100, 19 %). Few reviews based sensitivity analyses on an overall risk of bias (2/19, 11 %). Most reviews based sensitivity analyses on individual bias domains (9/19, 47 %) or did not state what sensitivity analyses were based on (8/19, 42 %). The majority of the Cochrane reviews who did not conduct the planned analyses reported that the lack thereof was due to insufficient data (41/50, 82 %), either because there were few trials included in the review or few trials with "low" risk of bias. The remaining reviews did not explain why they did not perform the planned analyses (9/50, 18 %) (Tables 3 and 4).  Minor comments were defined as shorter comments without a substantial reflection (typically ≤100 words of text) on the strengths or challenges of the tool. c Comments, editorials and letters (to the editor) were defined as such if self-declared 1. "…the tool aims at being completely transparent, with a separation of the facts and reviewers' judgments. This aim is particularly important because reviewers, editors, and readers can challenge the author on the judgment." 2. "…the tool is intended to assess the risk of bias related to the design, conduct, and analyses of the trial and not the quality of reporting." 3. "This tool has been an important step forward in the assessment of the risk of bias in systematic reviews and meta-analyses." 1. "Low agreement between reviewers suggests the need for more specific guidance regarding interpretation and application of the Risk of Bias (ROB) tool…" 2. "The majority of trials in the sample were assessed as high or unclear risk of bias…This raises concerns about the ability of the ROB [risk of bias] tool to detect differences across trials that may relate to biases in estimates of treatment effects." 3. "…trials with different design features (e.g., crossover) or hypotheses (e.g., equivalence, non-inferiority), and those examining non-pharmacological interventions appear to create more ambiguity for risk of bias assessments."  One tenth of the non-Cochrane reviews that had any risk of bias assessment reported plans for sensitivity analyses based on risk of bias assessments (8/80, 10 %). One in seven of all the non-Cochrane reviews reported to have performed sensitivity analyses based on risk of bias or quality assessments (11/80, 14 %). In nine reviews, the sensitivity analyses were based on an overall risk of bias (9/11, 82 %) ( Table 4).

Challenges
Two Cochrane reviews performed subgroup analyses (both with "low" versus "high" risk of bias) (2/100, 2 %). None of the non-Cochrane reviews performed subgroup analyses based on risk of bias.
Most Cochrane reviews explicitly commented on risk of bias assessments in the discussion and/or conclusion (89/100, 89 %), although fewer incorporated this information into the abstract (80/100, 80 %). Most of the non-Cochrane reviews that applied the Cochrane tool and some of the non-Cochrane reviews that applied non-Cochrane tools explicitly commented on risk of bias assessments in the discussion and/or conclusion (Cochrane tool: 25/31, 81 %; non-Cochrane tools: 12/ 49, 24 %) and more than half incorporated this information into the abstract (Cochrane tool: 18/31, 58 %; non-Cochrane tools: 30/49, 61 %). No significant differences were found between the non-Cochrane reviews that used the Cochrane tool versus the non-Cochrane reviews that used other risk of bias tools when comparing the use of risk of bias results in the abstract and discussion/conclusion.
The majority of Cochrane reviews applied all standard domains (59/100, 59 %). Only few Cochrane reviews explicitly assessed risk of bias on an outcome level (i.e. differentiating between subjective versus objective outcomes) (12/100, 12 %). Most Cochrane reviews (88/100, 88 %) performed one risk of bias assessment without making it clear whether this assessment concerned a single outcome, a group of outcomes or the trial as a whole. A similar pattern was seen for non-Cochrane reviews ( Table 5).
One third of the Cochrane reviews merged standard bias domains (37/100, 37 %), most often merging "performance bias" and "detection bias" into a single blinding bias domain (31/37, 84 %) (predominantly done in updates of reviews that had originally used the 2008 Table 2 Selected key points of major comments on the Cochrane risk of bias tool for randomized clinical trials: strengths, challenges and suggestions (Continued)

Challenges
• Bias domains (1.) • Implementation (2.) 1. "Some of the items that authors have included (such as sample size calculations and funding source) are explicitly discouraged in the Cochrane Handbook guidance. While there is evidence that some factors are empirically associated with effect estimates, such as single versus multicentre design, early stopping of trials and funding source [14][15][16], the extent to which these should be considered alongside the main bias domains is still a topic of debate." 2. "The main purpose of this evaluation was to identify potential problems with the RoB [risk of bias] that can be rectified, and we suspect that users who encountered problems are more likely to have responded. This speculation is based on the high proportion of respondents who reported having problems with some aspects of the 1. "The Cochrane Handbook states that because the ability to measure the true bias (or even the true risk of bias) is limited, then the possibility to validate a tool to assess that risk is also limited. Nevertheless, authors of Cochrane systematic reviews are required to use the Cochrane risk of bias tool." 2. "Assessing risk of bias was particularly difficult for the more subjective domains [i.e. 'selective outcome reporting' and 'other bias']."

Suggestions None mentioned
Major comments were defined as longer comments with a substantial reflection (typically ≥100 words of text) on the strengths or challenges of the Cochrane risk of bias tool for randomized clinical trials (i.e. the tool) a See Additional file 2 for references version of the tool in which the domains were merged (21/31, 68 %)). Approximately one fifth of the Cochrane reviews split a standard bias domain into separate subentities (18/100, 18 %), for example, blinding (within the performance bias domain) was split into blinding of personnel and blinding of patients or incomplete outcome data (i.e. attrition bias) was split into assessment of intention-to-treat and assessment of dropouts. Again, a similar pattern was seen for non-Cochrane reviews ( Table 5). A minority of Cochrane reviews added non-standard bias domains to the tool (11/100, 11 %). "Baseline imbalance" (6/11, 55 %) and "funding/conflicts of interest" (5/ 11, 45 %) were the most used. A similar pattern was found for non-Cochrane reviews ( Table 6). The majority of Cochrane reviews used the "other bias" domain option for the same purpose (73/100, 73 %). "Baseline imbalance" (33/73, 45 %) and "funding/conflicts of interest" (23/73, 32 %) were also the most used "other biases." Most non-Cochrane reviews that used the Cochrane tool included the "other bias" domain (17/31, 55 %), but none of the non-Cochrane reviews reported what specific items were considered as "other biases" (Table 6).
Very few of the randomized clinical trials included in the Cochrane reviews had all standard domains judged as "low" risk of bias (74 of 1242 trials, 6 %). Most had at least one standard domain judged as "unclear" risk of bias (407 of 1242 trials, 33 %) or as "high" risk of bias (761 of 1242 trials, 61 %). A similar pattern was found for the non-Cochrane reviews (Table 3).
Thus, only a few reviews could conduct sensitivity analyses based on overall risk of bias, e.g. the Cochrane reviews with at least one trial with all standard domains judged as "low" risk of bias and at least one trial with one bias domain judged as "high" risk of bias (26/100, 26 %) (or as "high"/"unclear" risk of bias (32/100, 32 %)). A similar pattern was found for the non-Cochrane reviews (Table 3).

Discussion
Published comments about the Cochrane risk of bias tool considered it to be an important step forward but If a trial had all standard domains (not including the "other bias" domain) judged as "low" risk of bias, we defined the trial as "low risk of bias" b If a trial had at least one of the standard domains (not including the "other bias" domain) judged as "unclear" risk of bias and no domains judged as "high" risk of bias, we defined the trial as "unclear risk of bias." The judgement of at least one standard risk of bias domain (not including the "other bias" domain) as "unclear" was found in 1103 of 1242 included randomized clinical trials (89 %) c If a trial had at least one of the six standard domains (not including the "other bias" domain) judged as "high" risk of bias, we defined the trial as "high risk of bias" d We only included systematic reviews with one or more randomized clinical trials included in their analyses e It was only possible to assess whether a trial was judged as "low," "unclear" or "high" risk of bias in 18 non-Cochrane reviews (which provided information on risk of bias judgements for all six standard domains (not including the "other bias" domain) for individual trials via a "risk of bias graph/summary" or "characteristics of studies" section) f The 18 non-Cochrane reviews included 424 randomized clinical trials in total highlighted some challenges including its omission of funding/conflicts of interest and its modest interagreement rates. Suggestions for improvement included more explicit guidelines and training options. The tool was used in 100 % of Cochrane reviews and in 31 % of non-Cochrane reviews in a sample published towards the end of 2014. Often the tool was implemented in a non-recommended way. Also, 70 % of Cochrane reviews planned to use the risk of bias assessment as basis for sensitivity analyses, but only 19 % of Cochrane reviews conducted such analyses, in many cases, because there were few trials with "low" risk of bias.

Strengths and weaknesses
We are not aware of other reviews of published comments on the Cochrane risk of bias tool. Our study complements previous studies of user experience [8] and inter-observer variance [9][10][11].
It is challenging to search for published comments as not all are indexed in standard databases. However, we focused on "major comments," which are more reliably identified. It is reasonable to assume that the threshold for publishing a comment pointing out a problem with the tool (and maybe suggesting an improvement) is lower than for publishing a comment praising the tool. Thus, we consider the qualitative summary of the expressed themes as more interesting than the quantitative distribution of the themes. The analyses of how the tool was used were based on samples of representative and contemporary Cochrane and non-Cochrane reviews, enabling both a description and comparison between the two types of reviews.  All subgroup analyses were based on "low" versus "high" risk of bias b "Insufficient data" was due to few trials included in the review or few trials judged as "low risk of bias" c 15 non-Cochrane reviews made their own risk of bias construct/tool, eight incorporated two constructs/tools and the following constructs/tools (/methods) were used 18 times in total: CASP (×2), CEBM, Chalmers, CONSORT (×2), CTAM, Downs and Black criteria (×2), Evidence-based medicine toolkit, GRADE (×2), Methods Guide for Effectiveness and Comparative Effectiveness Reviews, MOOSE (×2), Newcastle Ottawa, QUOROM and STROBE d 31 of 100 non-Cochrane reviews used the Cochrane risk of bias tool for randomized clinical trials (i.e. the tool) and were compared to the 100 Cochrane reviews that used the tool for randomized clinical trials Other similar studies Based on feedback from focus groups and an online survey, Savović and colleagues concluded that users of the Cochrane tool identified positive experiences and perceptions of the tool and that revisions and associated guidance as well as improved provision of training may improve implementation [8]. Several studies have analysed the assessment of risk of bias in systematic reviews [10][11][12][13][14][15]. Hartling and colleagues and Armijo-Olivo and colleagues concluded unsatisfactory agreement rates by users of the tool and suggested the need for more detailed guidance in assessing the risk of bias [9,15]. Comments made by the authors of all three studies are included in our study.
Hopewell and colleagues [16] studied assessment of risk of bias in Cochrane and non-Cochrane reviews indexed in The Database of Abstracts of Reviews of Effects (DARE) [17] and published in 2012. They reported that all reviews incorporated some kind of assessment of risk of bias, even though Cochrane reviews more often specified which tool was used. Also, the Cochrane tool was used more often in Cochrane reviews (and the Jadad scale was used less often). A low proportion of reviews incorporated sensitivity analyses based on risk of bias in their conclusion.
Our study confirms and expands on the findings of Hopewell and colleagues. We found that all 100 Cochrane reviews in our sample used the Cochrane risk of bias tool, but that only one in five Cochrane reviews conducted sensitivity analyses based on risk of bias assessments, despite the fact that seven in ten had planned to do so.

Mechanisms and implications
Based on the degree of implementation, the tool has proven successful. All Cochrane reviews and a fair proportion of non-Cochrane reviews used the tool in 2014. However, the tool is often used in ways not recommended.
Firstly, both Cochrane and non-Cochrane reviews implemented non-standard domains, either as fully new domains or incorporated into the "other bias" function. Approximately one in six Cochrane reviews added "intervention differed between groups" under "other Review has a singular risk of bias assessment despite more than one outcome included in the review. No review based its risk of bias assessment on a singular or primary outcome i.e. splits blinding of patients and care providers into blinding of personnel and blinding of patients or splits incomplete outcome data into assessment of intention-to-treat and assessment of dropouts. f 31 of 100 non-Cochrane reviews used the Cochrane risk of bias tool for randomized clinical trials (i.e. the tool) and were compared to the 100 Cochrane reviews that used the tool for randomized clinical trials bias," though this problem is intended to be addressed under "performance bias." Furthermore, a similar proportion of Cochrane reviews added "unclear reporting" under "other bias," although the tool specifically addresses conduct and not reporting (unclear reporting would normally result in contacting trial authors for clarification). Thus, there seems to be a widespread uncertainty as to the scope of what the tool seeks to evaluate. Adding bias domains and using the "other bias" option are primarily intended for special situations, for example, when assessing crossover trials. Thus, better guidance as to what is meant by "bias," "bias domain" and the basic purpose of the tool is warranted. Secondly, only a minority of reviews used the risk of bias assessments as a basis for sensitivity analyses. This problem seems to be a result of few trials having a "low" risk of bias, although sensitivity analyses may be based on "unclear" versus "high" risk of bias. Only 6 % of the trials included in our review sample had been classified as "low" risk of bias for all domains. It is unclear whether such a low proportion (also found by e.g. Hartling and colleagues [9] and Hopewell and colleagues [16]) is a fair reflection of the "true" risk of bias in trials or whether the tool as currently applied is too sensitive (or authors simply do not use all sources of information as recommended and possibly opt for "unclear" based on the published report). A better guideline on how to move from the level of individual bias domains to an overall risk of bias is warranted.
Thirdly, most reviews based their risk of bias assessment on a singular risk of bias assessment despite including more than one outcome and several reviews (mostly updates) merged "blinding of participants and personnel" and "blinding of outcome assessor" into a single blinding bias domain. The latter was recommended in the 2008 version of the tool, but not in the updated 2011 version [18]. Hopefully, the merging of blinding associated bias domains will be addressed when the reviews in question are updated (again).
Fourthly, risk of bias is very often assessed based on incomplete or missing information. The judgement of at least one risk of bias domain as "unclear" was found in 1103 of 1242 included randomized clinical trials (89 %). Though "unclear" may be a reasonable option in some trials, this large proportion is a considerable problem. In many cases, the uncertainty can be resolved by contacting trial authors (who are often able to provide the information) or by searching publicly available trial registers. Table 6 Use of additional non-standard domains and the "other bias" domain in the Cochrane and non-Cochrane reviews that applied the Cochrane risk of bias tool for randomized clinical trials Use of additional domains and "other bias" 100 Cochrane reviews (100 %) 31 non-Cochrane reviews (100 %) c P value*

Additional domains
Any additional domain(s) 11 (11 %) 6 (19 %) 0.37 -Adds "baseline imbalance" 6 of 11 (55 %) 2 of 6 (33 %) 1.00 -Adds "funding" or "conflicts of interest All of the following other additional domains appeared ones in review samples: Cochrane reviews: "co-interventions avoided or similar," "confounding variables," "definition of incomplete response," "definition of local recurrence," "method of follow up" and "size"; non-Cochrane reviews: "co-intervention" and "double blinding" b "Other bias"-comments were interpreted and categorised (e.g. the "other bias" comment "There were baseline differences between groups." was categorised as "baseline imbalance"). The five most used "other bias"-categories are listed c 31 of 100 non-Cochrane reviews used the Cochrane risk of bias tool for randomized clinical trials (i.e. the tool) and were compared to the 100 Cochrane reviews that used the tool for randomized clinical trials Occasionally, one may access trial protocols, internal company study reports or reports by drug regulation agencies (such as the United States' Food & Drug Administration) to facilitate better risk of bias judgements [19]. Improved guidelines on how to access and acquire the relevant information for assessing risk of bias are warranted. Furthermore, low inter-rater agreement rates for risk of bias assessors are a potential problem for users of systematic reviews. Readers may consider whether a review's conclusion would have been different if other reviewers had assessed the risk of bias in the included trials. It is prudent to check the risk of bias assessments in a review. Fortunately, the tool has a configuration that facilitates such checking. Studies assessing between-rater agreement for complex assessment procedures often have modest agreement rates [20], which in some cases may be improved with training [21]. The Cochrane tool is no exception. Disagreement seems to occur when terminology is used inconsistently (e.g. for blinding [22]), when judgements are based on insufficient information or when the intervention is more complex (e.g. in nonpharmacological trials [9]). In addition, reviewers often encounter problems when assessing the domains "incomplete outcome data" and "selective outcome reporting" [8]. Clarified terminology, revised structure, better training options and guidance will hopefully improve agreement rates. It will be interesting to read the result of a forthcoming study on the impact of training [23].
Funding/conflicts of interest is also a challenge for the tool. It is widely believed that industry funding and other conflicts of interest are associated with higher estimates of treatment effects in randomized trials [24]. It is more controversial whether this association is appropriately accounted for by adding "funding/conflicts of interest" as an independent bias domain. Adding a domain would go against the logic structure of the tool, which is based on core bias domains that reflect fundamental, independent bias mechanisms. An alternative option would be to address the issue within the existing bias domains (for example, under risk of selective outcome reporting), while paying careful attention to any clinical or methodological differences between industry funded and nonfunded trials, such as selection of control groups. The problem with the latter option is that detailed information on trial conduct is often missing. It is notable that 5 % of Cochrane reviews added funding as a separate domain and that 32 % incorporated funding into the "other bias" function. Clearly, more work is needed on this issue.
A general tension exists between bias in randomized trials as defined mechanistically in the tool, and as defined empirically based on results from meta-epidemiological studies. Several design features of randomized clinical trials have been reported in meta-epidemiological studies to be associated with exaggerated treatment effects, such as sample size [25], development country status [26], single centre status [27] and stopping a trial early [28]. The list of potential bias domains selected purely on empirical grounds will quickly become quite large and involve a risk of spurious inclusion of bias domains that are secondary in nature (and thus, in principle, explainable by the core bias domains). However, an open question is whether a pragmatic and careful selection of a few empirically defined bias domains that are simple to assess (such as sample size or single centre status) may act as proxy measures and supplement a risk of bias tool based on mechanistically defined core bias domains.

Conclusions
Based on published comments, the Cochrane tool for assessing risk of bias in randomized clinical trials is regarded as an important step forward but challenged by how to deal with the risk of bias associated with funding/conflicts of interest and modest inter-rater agreement. The tool is used in a very high proportion of Cochrane reviews and in many non-Cochrane reviews, but often in a non-recommended way, for example, by incorporating additional bias domains. The tool has become the standard approach to assess risk of bias in randomized clinical trials. Its implementation may be further improved by a revised structure, further research and more focused guidance.
revision of the article for important intellectual content and the final approval of the article. JH contributed to the conception of the study, the design of the study, the critical revision of the article for important intellectual content and the final approval of the article. AH contributed to the conception of the study, the design of the study, the analysis and interpretation of the data, the drafting of the article, the critical revision of the article for important intellectual content and the final approval of the article.