Evaluation of the Cochrane tool for assessing risk of bias in randomized clinical trials: overview of published comments and analysis of user practice in Cochrane and non-Cochrane reviews
Systematic Reviews volume 5, Article number: 80 (2016)
The Cochrane risk of bias tool for randomized clinical trials was introduced in 2008 and has since been frequently commented on and used in systematic reviews. We aimed to evaluate the tool by reviewing published comments on its strengths and challenges and by describing and analysing how it is applied in both Cochrane and non-Cochrane systematic reviews.
A review of published comments (searches in PubMed, The Cochrane Methodology Register and Google Scholar) and an observational study (100 Cochrane and 100 non-Cochrane reviews from 2014).
Our review included 68 comments, 15 of which were categorised as major. The main strengths of the tool were considered to be its aim (to assess trial conduct and not reporting), its developmental basis (wide consultation, empirical and theoretical evidence) and its transparent procedures. The challenges of the tool were mainly considered to be its choice of core bias domains (e.g. not involving funding/conflicts of interest), issues related to implementation (e.g. modest inter-rater agreement) and terminology. Our observational study found that the tool was used in all Cochrane reviews (100/100) and was the most frequently used tool in non-Cochrane reviews (31/100). Both types of reviews frequently implemented the tool in non-recommended ways. Most Cochrane reviews planned to use risk of bias assessments as a basis for sensitivity analyses (70 %), but only a minority conducted such analyses (19 %), in many cases because few trials were assessed as having “low” risk of bias for all standard domains (6 % of trials). At least one risk of bias domain was judged as “unclear” in 89 % of included randomized clinical trials (1103/1242).
The Cochrane tool has become the standard approach to assess risk of bias in randomized clinical trials but is frequently implemented in a non-recommended way. Based on published comments and how it is applied in practice in systematic reviews, the tool may be further improved by a revised structure and more focused guidance.
Since the early 1990s, the number of published systematic reviews of randomized trials, both Cochrane and non-Cochrane reviews, has steadily increased. The ideal of taking a systematic approach to identify, summarise and analyse comparable clinical trials as a basis for therapeutic decisions has become more widespread, and systematic reviews have had a huge impact on clinical research and practice.
However, one obstacle to the usefulness of a systematic review is that some of the included trials may be biased due to flaws in their design, conduct, analysis or reporting. A meta-analysis of biased effect estimates will likely produce a biased pooled estimate that, because of its increased precision, appears more credible than any single trial. Thus, for authors of a systematic review, it is paramount to adequately address the risk of bias in the included trials.
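The point about precision can be made concrete with a small numerical sketch. It makes several assumptions that this article does not describe:

- a standard inverse-variance fixed-effect meta-analysis;
- a hypothetical set of trials that all share the same systematic bias of -0.3 on the log odds ratio scale, with no random error.

The function name `pool_fixed_effect` and all numbers are illustrative only.

```python
import math

def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)

# Hypothetical scenario: the true log odds ratio is 0 (no effect), but every
# trial carries the same systematic bias of -0.3 and, for simplicity, no random error.
TRUE_EFFECT = 0.0
BIAS = -0.3
SE_PER_TRIAL = 0.4

for n_trials in (1, 5, 20):
    estimates = [TRUE_EFFECT + BIAS] * n_trials
    pooled, se = pool_fixed_effect(estimates, [SE_PER_TRIAL] * n_trials)
    low, high = pooled - 1.96 * se, pooled + 1.96 * se
    covers_truth = low <= TRUE_EFFECT <= high
    print(f"{n_trials:2d} trials: pooled {pooled:+.2f}, "
          f"95% CI {low:+.2f} to {high:+.2f}, covers true effect: {covers_truth}")
```

Under these assumptions the pooled estimate stays at -0.30 however many trials are added. The 95 % confidence interval narrows from about -1.08 to +0.48 (one trial) to about -0.48 to -0.12 (twenty trials), so it eventually excludes the true value of zero. This illustrates why the risk of bias in each included trial needs to be assessed.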
For this purpose, the Cochrane tool for assessing risk of bias in randomized clinical trials (hereafter “the tool”) was released in 2008 and updated in 2011. The tool is based on seven bias domains:

- sequence generation and allocation concealment (both addressing selection bias, also called allocation bias);
- blinding of participants and personnel (performance bias);
- blinding of outcome assessors (detection bias);
- incomplete outcome data (attrition bias);
- selective reporting (reporting bias);
- an auxiliary domain, “other bias.”

For each bias domain, the tool asks users to assign a judgement of “high,” “low” or “unclear” risk of bias and to document the basis for that judgement (e.g. with verbatim quotes from the trial report). The bias domains were selected with the intention of covering all fundamental bias mechanisms in randomized trials.
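The structure just described has three parts: a fixed set of domains, a three-level judgement per domain and a documented basis for each judgement. These can be captured in a small data model. The sketch below is a hypothetical representation for illustration only:

- the class and constant names are assumptions and not part of any official Cochrane software;
- the domain labels simply follow the wording used in this article.

```python
from dataclasses import dataclass
from enum import Enum

class Judgement(Enum):
    LOW = "low"
    HIGH = "high"
    UNCLEAR = "unclear"

# The seven domains as listed in this article (2011 version of the tool).
DOMAINS = (
    "sequence generation",                     # selection bias
    "allocation concealment",                  # selection bias
    "blinding of participants and personnel",  # performance bias
    "blinding of outcome assessors",           # detection bias
    "incomplete outcome data",                 # attrition bias
    "selective reporting",                     # reporting bias
    "other bias",                              # auxiliary domain
)

@dataclass(frozen=True)
class DomainAssessment:
    domain: str
    judgement: Judgement
    support: str  # basis for the judgement, e.g. a verbatim quote from the trial report

    def __post_init__(self):
        # Reject domains outside the tool and judgements without a documented basis.
        if self.domain not in DOMAINS:
            raise ValueError(f"unknown domain: {self.domain!r}")
        if not isinstance(self.judgement, Judgement):
            raise TypeError("judgement must be a Judgement member")
        if not self.support.strip():
            raise ValueError("the basis for each judgement must be documented")

# Hypothetical example record for one trial and one domain.
example = DomainAssessment(
    "allocation concealment",
    Judgement.UNCLEAR,
    "Report states only that patients were 'randomly allocated'; concealment method not described.",
)
```

Rejecting an empty `support` field mirrors the tool's requirement to document judgements. Recording a non-standard domain such as funding would require extending `DOMAINS`, which is the design question discussed further on.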
Several years have passed since the first version of the tool was released. Over this period:

- the tool has been used in numerous systematic reviews;
- the scientific debate on risk of bias has continued, for example reflecting on the role of source of funding [3–6] or other “meta-biases”;
- research publications have analysed user experience and inter-rater agreement [9–11].

Additionally, a complementary tool for assessing non-randomized studies of interventions has been developed.
Researchers from the original development team and members of the Cochrane Bias Methods Group are planning a revision of the tool. To evaluate the tool and to provide a better basis for the revision, we intended (1) to identify, summarise and analyse published comments on the strengths and challenges of the tool and (2) to describe and analyse how the tool is used in both Cochrane and non-Cochrane reviews.
This study involved a review of published comments on the Cochrane tool for assessing risk of bias in randomized clinical trials and an observational study of how the tool is used in systematic reviews (please refer to Additional file 1 for the study’s PRISMA checklist).
Review of published comments
We sought publications that explicitly commented on the tool. We defined “major comments” as longer comments with a substantial reflection (typically ≥100 words of text) on the strengths or weaknesses of the tool, for example, in the form of an editorial. We also included “minor comments,” which we defined as shorter comments without a substantial reflection (typically <100 words of text) on the strengths or weaknesses of the tool, for example, in the form of minor elements of a discussion in a publication. We excluded “peripheral remarks” on the tool, which we defined as remarks that were implicit or short and tangential. If an author had several publications included with similar comment contents, only the publication with the most detailed comment was considered major.
We searched PubMed, The Cochrane Methodology Register and Google Scholar for publications from the start of 2008 to the end of 2014. No language restriction was applied, and Google Translate was used for unfamiliar languages. The search strategy was developed iteratively (see Additional file 2).
One author (LJ) decided on inclusion of publications and categorised them as “major comments” and “minor comments” (and “peripheral remarks”). A second author (AS) checked the categorisation. Two authors (LJ and AS) extracted data independently. Any disagreements were resolved by discussion and by consulting a third author (DL or AH).
The following information was extracted: publication year, publication type, tool version considered (i.e. 2008 or 2011) and the exact wording of the comment.
Comments from the included publications were categorised according to whether they expressed “strengths,” “challenges” or “suggestions” and summarised into broader themes (each addressing a similar type of topic). We noted the numerical distribution of comparable comments, but our main intention was a qualitative mapping of the themes addressed and a categorisation according to whether they addressed a core design feature of the tool or an issue related to implementation.
Observational study of how the tool is used in systematic reviews
One author (DL) identified 100 Cochrane reviews (or Cochrane review updates) in PubMed. This author worked in reverse chronological order from 31 December 2014 back to 20 November 2014 (see Additional file 2). The same author manually identified 100 non-Cochrane reviews in PubMed, in reverse chronological order from 31 December 2014 back to 22 December 2014. A second author (AS) checked inclusion. We defined a non-Cochrane review as a self-declared systematic review with at least one included randomized clinical trial. We excluded any non-Cochrane review that was also published as a Cochrane review.
Three authors (AS, DL and LJ) independently extracted the following data:

- intervention type (pharmacological or non-pharmacological);
- inclusion of meta-analyses;
- number of trials and how many trials were categorised as “high,” “unclear” and “low” risk of bias;
- the method used for judging risk of bias (or quality) and how it was implemented;
- the type and frequency of use of standard and non-standard domains;
- merging or splitting of standard domains (e.g. merging blinding domains or splitting for different outcomes);
- use of the “other bias” domain;
- how risk of bias assessments were incorporated into statistical analysis using sensitivity analyses;
- whether risk of bias judgements were explicitly mentioned in the abstract, discussion or conclusion;
- whether the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach had been incorporated.

We compared differences in proportions between Cochrane and non-Cochrane reviews using Fisher’s exact test. Where Cochrane or non-Cochrane reviews included both randomized and non-randomized clinical trials, we disregarded the non-randomized trials.
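Fisher's exact test evaluates a 2 × 2 table by conditioning on its margins. Here it compares proportions between Cochrane and non-Cochrane reviews. The following is a minimal, dependency-free sketch of a common two-sided variant, which sums the probabilities of all tables no more likely than the observed one. The article does not state which software or two-sided definition was used, so the function name and the choice of two-sided method are assumptions.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Rows are groups (e.g. Cochrane / non-Cochrane reviews) and columns are
    outcomes (e.g. feature present / absent). All margins are held fixed.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denominator = comb(row1 + row2, col1)

    def table_probability(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = table_probability(a)
    x_min, x_max = max(0, col1 - row2), min(row1, col1)
    # Sum over tables no more probable than the observed one;
    # the small tolerance protects against floating-point ties.
    p = sum(
        table_probability(x)
        for x in range(x_min, x_max + 1)
        if table_probability(x) <= observed * (1 + 1e-7)
    )
    return min(p, 1.0)

# Illustration only: GRADE use, Cochrane 64/100 vs non-Cochrane 4/80.
print(fisher_exact_two_sided(64, 36, 4, 76))
```

The example call uses the GRADE counts reported in the results purely for illustration. In practice a tested implementation such as `scipy.stats.fisher_exact` would normally be preferred.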
Review of published comments
The strengths of the tool were addressed in five “major comments” relating to three themes: aims, developmental basis and transparency. The comments praised the tool for aiming to assess conduct (rather than reporting), for being based on theoretical and empirical evidence and on broad consultation, and for facilitating transparent assessment of bias.
The challenges of the tool were addressed in 15 “major comments” relating to four themes: choice of the core bias domains, implementation, overall risk of bias and special situations.

- Choice of core bias domains: comments expressed concern about whether the chosen domains comprehensively address all threats to validity. For example, five comments reflected on including funding as an independent bias domain.
- Implementation: comments pointed to difficulties arising from subjective interpretation of the tool. They expressed concerns about modest inter-observer agreement, difficulty in assessing selective outcome reporting, terminological ambiguity (e.g. of the terms subjective/objective) and the low proportion of reviews using risk of bias assessments as a basis for sensitivity analyses.
- Overall risk of bias: comments expressed concern about the difficulty of moving from judgements on single domains to an overall risk of bias for the trial as such.
- Special situations: a single comment concerned use of the tool to assess risk of bias based on clinical study reports (rather than clinical trial publications).
Specific suggestions to improve the tool were included in nine “major comments” relating to three themes: improved guidelines, further research and the inclusion of funding as a bias domain. The comments on guidelines suggested that updated and improved guidance and more training options for users were needed. The comments on research suggested further methodological research (for example, blind versus non-blind risk of bias assessments). The comments on funding suggested that funding/conflicts of interest should be incorporated into the tool as a specific bias domain.
All themes addressed in the “major comments” were also represented in the “minor comments” (see Additional file 2). Three themes appeared only in the “minor comments”: graphical representation, external validity and non-randomized designs. Specifically:

- (i) one comment praised the tool for its graphical representation of risk of bias assessments;
- (ii) one comment criticised the tool for not addressing external validity (as it focuses only on internal validity);
- (iii) one comment noted that non-randomized trials should be included in Cochrane reviews and addressed in risk of bias assessments.

The latter two suggestions are inconsistent with the aim of the tool, which is to assess only bias (i.e. internal validity) in randomized clinical trials. Such comments help to reveal the assumptions and basic structure of the tool but would be difficult to implement without substantially changing it.
Other comments reflected concerns about the implementation of the tool. One example is the suggestion for improved guidance on how to assess selective outcome reporting. Similarly, suggestions for improved training options and for more detailed guidelines aimed at improving agreement rates address the implementation of the tool. Such suggestions are easier to implement while keeping the fundamental structure of the tool intact.
Analysis of user patterns in systematic reviews
All Cochrane reviews assessed risk of bias using the Cochrane risk of bias tool (100/100, 100 %) (Tables 3 and 4). Most of the non-Cochrane reviews assessed risk of bias (80/100, 80 %), with the Cochrane tool being the most frequently used (31/80, 39 %). Other tools and scales used to assess risk of bias included the Jadad Quality Assessment Scale (19/80, 24 %) and the Physiotherapy Evidence Database (PEDro) scale (5/80, 6 %) (Table 4).
The majority of Cochrane reviews included one or more meta-analyses (85/100, 85 %). According to their methods sections, most of the Cochrane reviews had planned to perform sensitivity analyses based on risk of bias (70/100, 70 %). One fifth of the Cochrane reviews reported having performed sensitivity analyses (19/100, 19 %). Few of these based the sensitivity analyses on an overall risk of bias (2/19, 11 %). The remainder based them on individual bias domains (9/19, 47 %) or did not state what they were based on (8/19, 42 %).

The majority of the Cochrane reviews that did not conduct the planned analyses attributed this to insufficient data (41/50, 82 %), either because few trials were included in the review or because few trials had “low” risk of bias. The remaining reviews did not explain why they did not perform the planned analyses (9/50, 18 %) (Tables 3 and 4).
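The sensitivity analyses discussed here typically re-estimate the pooled effect after restricting the meta-analysis to trials at lower risk of bias. The sketch below illustrates this under stated assumptions:

- a fixed-effect inverse-variance model;
- one overall risk of bias category per trial;
- four hypothetical trials.

It also makes explicit why such analyses are often impossible: if no trial falls into the retained category, there is nothing left to pool. The function names are illustrative.

```python
import math

def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)

def sensitivity_analysis(trials, keep=("low",)):
    """Re-pool using only trials whose overall risk of bias is in `keep`.

    Returns None when no trial qualifies, i.e. the planned analysis cannot be run.
    """
    subset = [(y, se) for y, se, risk in trials if risk in keep]
    if not subset:
        return None
    estimates, ses = zip(*subset)
    return pool_fixed_effect(estimates, ses)

# Hypothetical trials: (log odds ratio, standard error, overall risk of bias)
TRIALS = [
    (-0.50, 0.20, "high"),
    (-0.40, 0.25, "unclear"),
    (-0.10, 0.30, "low"),
    (-0.05, 0.20, "low"),
]

all_trials = pool_fixed_effect([t[0] for t in TRIALS], [t[1] for t in TRIALS])
low_only = sensitivity_analysis(TRIALS)
print(f"all trials: {all_trials[0]:+.3f}; low risk only: {low_only[0]:+.3f}")
```

With these made-up numbers, restricting the analysis to the two “low” risk trials moves the pooled log odds ratio from about -0.28 to about -0.07. Passing `keep=("low", "unclear")` instead excludes only the trials at “high” risk of bias.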
One tenth of the non-Cochrane reviews that had any risk of bias assessment reported plans for sensitivity analyses based on risk of bias assessments (8/80, 10 %). Roughly one in seven of these non-Cochrane reviews reported having performed sensitivity analyses based on risk of bias or quality assessments (11/80, 14 %). In nine of these eleven reviews, the sensitivity analyses were based on an overall risk of bias (9/11, 82 %) (Table 4).
Two Cochrane reviews performed subgroup analyses based on risk of bias, both comparing trials at “low” versus “high” risk of bias (2/100, 2 %). None of the non-Cochrane reviews performed subgroup analyses based on risk of bias.
Most Cochrane reviews explicitly commented on risk of bias assessments in the discussion and/or conclusion (89/100, 89 %), although fewer incorporated this information into the abstract (80/100, 80 %). Most of the non-Cochrane reviews that applied the Cochrane tool and some of the non-Cochrane reviews that applied non-Cochrane tools explicitly commented on risk of bias assessments in the discussion and/or conclusion (Cochrane tool: 25/31, 81 %; non-Cochrane tools: 12/49, 24 %) and more than half incorporated this information into the abstract (Cochrane tool: 18/31, 58 %; non-Cochrane tools: 30/49, 61 %). No significant differences were found between the non-Cochrane reviews that used the Cochrane tool versus the non-Cochrane reviews that used other risk of bias tools when comparing the use of risk of bias results in the abstract and discussion/conclusion.
The majority of Cochrane reviews (64/100, 64 %) and few non-Cochrane reviews (4/80, 5 %) incorporated GRADE in their overall assessment of confidence in the results (Table 4).
The majority of Cochrane reviews applied all standard domains (59/100, 59 %). Only a few Cochrane reviews explicitly assessed risk of bias at the outcome level, i.e. differentiating between subjective and objective outcomes (12/100, 12 %). Most Cochrane reviews (88/100, 88 %) performed a single risk of bias assessment per trial without making clear whether this assessment concerned a single outcome, a group of outcomes or the trial as a whole. A similar pattern was seen for non-Cochrane reviews (Table 5).
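Assessing risk of bias at the outcome level means that some domains can receive different judgements for different outcomes within the same trial. Blinding of outcome assessors is one example. It might be judged low for an objective outcome such as mortality and high for a subjective outcome assessed by unblinded assessors. A minimal sketch of one possible data layout, keyed by domain and outcome, follows. The layout, names and example judgements are hypothetical.

```python
# Hypothetical assessments for one trial, keyed by (domain, outcome).
# outcome=None marks a trial-level judgement that applies to every outcome.
TRIAL_ASSESSMENTS = {
    ("sequence generation", None): "low",
    ("allocation concealment", None): "low",
    ("blinding of outcome assessors", "all-cause mortality"): "low",  # objective outcome
    ("blinding of outcome assessors", "pain score"): "high",          # subjective, unblinded assessors
}

def judgements_for_outcome(assessments, outcome):
    """Return the domain judgements relevant to one outcome:
    trial-level judgements plus those recorded specifically for that outcome.

    Assumes each domain is recorded either at trial level or per outcome, not both.
    """
    return {
        domain: judgement
        for (domain, out), judgement in assessments.items()
        if out is None or out == outcome
    }

print(judgements_for_outcome(TRIAL_ASSESSMENTS, "pain score"))
print(judgements_for_outcome(TRIAL_ASSESSMENTS, "all-cause mortality"))
```

A single trial-level assessment, as made in most of the reviews studied, corresponds to using `None` as the outcome for every domain.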
More than a third of the Cochrane reviews merged standard bias domains (37/100, 37 %). Most often they merged “performance bias” and “detection bias” into a single blinding bias domain (31/37, 84 %). This was predominantly done in updates of reviews that had originally used the 2008 version of the tool, in which these domains were merged (21/31, 68 %).

Approximately one fifth of the Cochrane reviews split a standard bias domain into separate sub-entities (18/100, 18 %). For example, blinding (within the performance bias domain) was split into blinding of personnel and blinding of patients. Similarly, incomplete outcome data (i.e. attrition bias) was split into assessment of intention-to-treat and assessment of dropouts. Again, a similar pattern was seen for non-Cochrane reviews (Table 5).
A minority of Cochrane reviews added non-standard bias domains to the tool (11/100, 11 %). “Baseline imbalance” (6/11, 55 %) and “funding/conflicts of interest” (5/11, 45 %) were the most used. A similar pattern was found for non-Cochrane reviews (Table 6). The majority of Cochrane reviews used the “other bias” domain option for the same purpose (73/100, 73 %). “Baseline imbalance” (33/73, 45 %) and “funding/conflicts of interest” (23/73, 32 %) were also the most used “other biases.” Most non-Cochrane reviews that used the Cochrane tool included the “other bias” domain (17/31, 55 %), but none of the non-Cochrane reviews reported what specific items were considered as “other biases” (Table 6).
Very few of the randomized clinical trials included in the Cochrane reviews had all standard domains judged as “low” risk of bias (74 of 1242 trials, 6 %). Most had at least one standard domain judged as “high” risk of bias (761 of 1242 trials, 61 %). Among trials with no “high” judgement, most had at least one domain judged as “unclear” risk of bias (407 of 1242 trials, 33 %). A similar pattern was found for the non-Cochrane reviews (Table 3).
Thus, only a minority of reviews could have conducted sensitivity analyses based on overall risk of bias. For example, only 26 Cochrane reviews (26/100, 26 %) included both:

- at least one trial with all standard domains judged as “low” risk of bias; and
- at least one trial with a bias domain judged as “high” risk of bias.

Counting trials with a domain judged as “high” or “unclear” instead, the figure was 32/100 (32 %). A similar pattern was found for the non-Cochrane reviews (Table 3).
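The feasibility condition described above can be expressed as a simple check on the overall risk of bias categories of the trials in one review. In this hypothetical sketch, each trial has already been assigned an overall category (“low”, “unclear” or “high”). The function name and example reviews are illustrative assumptions.

```python
def sensitivity_analysis_feasible(overall_categories, contrast=("high",)):
    """Return True if a review has at least one trial at 'low' overall risk of
    bias and at least one trial in the contrast group (by default 'high')."""
    has_low = "low" in overall_categories
    has_contrast = any(c in contrast for c in overall_categories)
    return has_low and has_contrast

# Hypothetical reviews, each given as the overall categories of its trials.
reviews = {
    "review A": ["low", "high", "unclear"],
    "review B": ["unclear", "high", "high"],    # no trial at low risk
    "review C": ["low", "unclear", "unclear"],  # no trial at high risk
}

for name, categories in reviews.items():
    print(name,
          sensitivity_analysis_feasible(categories),
          sensitivity_analysis_feasible(categories, contrast=("high", "unclear")))
```

Review C shows how widening the contrast group to include “unclear” trials makes more reviews eligible. This mirrors the increase from 26 % to 32 % reported above.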
Published comments on the Cochrane risk of bias tool considered it an important step forward but highlighted some challenges, including its omission of funding/conflicts of interest and its modest inter-rater agreement. Suggestions for improvement included more explicit guidelines and training options. In a sample of reviews published towards the end of 2014, the tool was used in 100 % of Cochrane reviews and in 31 % of non-Cochrane reviews. The tool was often implemented in a non-recommended way. Also, 70 % of Cochrane reviews planned to use the risk of bias assessment as a basis for sensitivity analyses, but only 19 % conducted such analyses, in many cases because there were few trials with “low” risk of bias.
Strengths and weaknesses
It is challenging to search for published comments, as not all are indexed in standard databases. However, we focused on “major comments,” which are more reliably identified. It is reasonable to assume that the threshold for publishing a comment pointing out a problem with the tool (and perhaps suggesting an improvement) is lower than for publishing a comment praising the tool. We therefore consider the qualitative summary of the themes more informative than their quantitative distribution. The analyses of how the tool was used were based on representative and contemporary samples of Cochrane and non-Cochrane reviews. This enabled both description of, and comparison between, the two types of reviews.
Other similar studies
Savović and colleagues gathered feedback from focus groups and an online survey. They concluded that users of the Cochrane tool reported positive experiences and perceptions of the tool. They also concluded that revisions with associated guidance, together with improved provision of training, may improve implementation. Several studies have analysed the assessment of risk of bias in systematic reviews [10–15]. Hartling and colleagues and Armijo-Olivo and colleagues found unsatisfactory agreement between users of the tool and suggested a need for more detailed guidance on assessing risk of bias [9, 15]. Comments made by the authors of all three studies are included in our study.
Hopewell and colleagues studied the assessment of risk of bias in Cochrane and non-Cochrane reviews indexed in the Database of Abstracts of Reviews of Effects (DARE) and published in 2012. They reported that all reviews incorporated some kind of risk of bias assessment, although Cochrane reviews more often specified which tool was used. The Cochrane tool was also used more often, and the Jadad scale less often, in Cochrane reviews than in non-Cochrane reviews. Only a low proportion of reviews incorporated sensitivity analyses based on risk of bias into their conclusions.
Our study confirms and expands on the findings of Hopewell and colleagues. We found that all 100 Cochrane reviews in our sample used the Cochrane risk of bias tool, but that only one in five Cochrane reviews conducted sensitivity analyses based on risk of bias assessments, despite the fact that seven in ten had planned to do so.
Mechanisms and implications
Judged by its uptake, the tool has proven successful: all Cochrane reviews and a fair proportion of non-Cochrane reviews in our 2014 sample used it. However, the tool is often used in ways that are not recommended.
Firstly, both Cochrane and non-Cochrane reviews implemented non-standard domains, either as entirely new domains or under the “other bias” domain. Approximately one in six Cochrane reviews added “intervention differed between groups” under “other bias,” although this problem is intended to be addressed under “performance bias.” A similar proportion of Cochrane reviews added “unclear reporting” under “other bias,” although the tool specifically addresses conduct and not reporting. Unclear reporting would normally prompt contacting trial authors for clarification. There thus seems to be widespread uncertainty about the scope of what the tool seeks to evaluate. Adding bias domains and using the “other bias” option are primarily intended for special situations, for example, when assessing crossover trials. Better guidance on what is meant by “bias,” “bias domain” and the basic purpose of the tool is therefore warranted.
Secondly, only a minority of reviews used the risk of bias assessments as a basis for sensitivity analyses. This seems to result from few trials having a “low” risk of bias, although sensitivity analyses could instead contrast trials at “unclear” versus “high” risk of bias. Only 6 % of the trials included in the Cochrane reviews in our sample had been classified as “low” risk of bias for all standard domains. Hartling and colleagues and Hopewell and colleagues, among others, also found such a low proportion. It is unclear whether this fairly reflects the “true” risk of bias in trials. Alternatively, the tool as currently applied may be too sensitive, or authors may simply not use all sources of information as recommended and opt for “unclear” based on the published report. Better guidance on how to move from judgements on individual bias domains to an overall risk of bias is warranted.
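One explicit rule for moving from domains to an overall category is the hierarchy implied by the counts in the results. There, 74 + 407 + 761 = 1242 trials form mutually exclusive groups, while 1103 trials had at least one “unclear” domain. This fits a rule in which any “high” judgement dominates, then any “unclear,” and “low” requires every domain to be “low.” This broadly resembles the summary approach suggested in the Cochrane Handbook. The sketch below is a hypothetical implementation of that rule only. The function name and the six-domain example trials are assumptions.

```python
from collections import Counter

ALLOWED = {"low", "unclear", "high"}

def overall_risk_of_bias(domain_judgements):
    """Summarise one trial's domain judgements into an overall category.

    Hypothetical rule: 'high' if any domain is judged 'high'; otherwise
    'unclear' if any domain is 'unclear'; otherwise 'low' (all domains 'low').
    """
    judgements = list(domain_judgements)
    if not judgements:
        raise ValueError("no domain judgements supplied")
    unknown = set(judgements) - ALLOWED
    if unknown:
        raise ValueError(f"unexpected judgement(s): {sorted(unknown)}")
    if "high" in judgements:
        return "high"
    if "unclear" in judgements:
        return "unclear"
    return "low"

# Three hypothetical trials, each with judgements for six standard domains.
trials = [
    ["low"] * 6,
    ["low", "unclear", "low", "low", "low", "low"],
    ["low", "unclear", "high", "low", "unclear", "low"],
]
print(Counter(overall_risk_of_bias(t) for t in trials))
```

Other rules could be substituted, for example restricting the summary to the domains judged most important for a given outcome. The key requirement is that the rule is stated explicitly.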
Thirdly, most reviews made a single risk of bias assessment per trial despite including more than one outcome. Several reviews (mostly updates) also merged “blinding of participants and personnel” and “blinding of outcome assessors” into a single blinding bias domain. Merging was recommended in the 2008 version of the tool but not in the updated 2011 version. Hopefully, this merging will be addressed when the reviews in question are updated (again).
Fourthly, risk of bias is very often assessed based on incomplete or missing information. At least one risk of bias domain was judged as “unclear” in 1103 of 1242 included randomized clinical trials (89 %). Although “unclear” may be a reasonable judgement for some trials, this large proportion is a considerable problem. In many cases, the uncertainty could be resolved by contacting trial authors, who are often able to provide the information, or by searching publicly available trial registers. Occasionally, reviewers may access other sources to support better risk of bias judgements, such as:

- trial protocols;
- internal company study reports;
- reports by drug regulatory agencies (such as the US Food and Drug Administration).

Improved guidance on how to access and acquire the information relevant for assessing risk of bias is warranted.
Furthermore, low inter-rater agreement between risk of bias assessors is a potential problem for users of systematic reviews. Readers may wonder whether a review’s conclusion would have been different if other reviewers had assessed the risk of bias in the included trials. It is therefore prudent to check the risk of bias assessments in a review. Fortunately, the tool’s requirement to document the basis for each judgement facilitates such checking.

Studies of between-rater agreement for complex assessment procedures often find modest agreement, which in some cases may be improved with training. The Cochrane tool is no exception. Disagreement seems to occur in three situations: when terminology is used inconsistently (e.g. for blinding), when judgements are based on insufficient information, or when the intervention is complex (e.g. in non-pharmacological trials). In addition, reviewers often encounter problems when assessing the domains “incomplete outcome data” and “selective outcome reporting.” Clarified terminology, a revised structure, and better training options and guidance will hopefully improve agreement. The results of a forthcoming study on the impact of training will be of interest.
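Inter-rater agreement on risk of bias judgements is commonly summarised with Cohen's kappa. Kappa corrects the observed agreement for the agreement expected by chance from each rater's marginal distribution. The sketch below computes unweighted kappa for two reviewers' judgements on one domain across a set of trials. The ratings are invented for illustration, and the statistics used in the cited reliability studies may differ in detail.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters judging the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a.keys() | counts_b.keys()) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use a single identical category")
    return (observed - expected) / (1 - expected)

# Hypothetical judgements by two reviewers for one domain in ten trials.
reviewer_1 = ["low", "low", "unclear", "high", "unclear", "low", "high", "unclear", "low", "unclear"]
reviewer_2 = ["low", "unclear", "unclear", "high", "low", "low", "high", "unclear", "unclear", "unclear"]
print(round(cohens_kappa(reviewer_1, reviewer_2), 3))  # 0.531
```

Here the reviewers agree on 7 of 10 trials (observed agreement 0.70). Chance agreement from their marginal distributions is 0.36, giving a kappa of about 0.53, noticeably lower than the raw 70 % agreement.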
Funding/conflicts of interest also pose a challenge for the tool. Industry funding and other conflicts of interest are widely believed to be associated with higher estimates of treatment effects in randomized trials. It is more controversial whether this association is appropriately accounted for by adding “funding/conflicts of interest” as an independent bias domain. Adding such a domain would go against the logical structure of the tool, which is based on core bias domains reflecting fundamental, independent bias mechanisms.

An alternative would be to address the issue within the existing bias domains (for example, under selective outcome reporting). This would require careful attention to any clinical or methodological differences between industry-funded and non-industry-funded trials, such as the selection of control groups. The problem with this option is that detailed information on trial conduct is often missing. Notably, 5 % of Cochrane reviews added funding as a separate domain, and 23 % (23/100) incorporated funding into the “other bias” domain. Clearly, more work is needed on this issue.
A general tension exists between bias in randomized trials as defined mechanistically in the tool and bias as defined empirically, based on results from meta-epidemiological studies. Meta-epidemiological studies have associated several design features of randomized clinical trials with exaggerated treatment effects. These include sample size, the development status of the country where the trial was conducted, single-centre status and stopping a trial early. A list of potential bias domains selected purely on empirical grounds would quickly become long. It would also risk spurious inclusion of bias domains that are secondary in nature (and thus, in principle, explainable by the core bias domains). However, it remains an open question whether a few empirically defined bias domains that are simple to assess could serve as proxy measures. Such domains (for example, sample size or single-centre status), if selected pragmatically and carefully, might supplement a risk of bias tool based on mechanistically defined core bias domains.
Based on published comments, the Cochrane tool for assessing risk of bias in randomized clinical trials is regarded as an important step forward. It is, however, challenged by the question of how to deal with the risk of bias associated with funding/conflicts of interest and by modest inter-rater agreement. The tool is used in a very high proportion of Cochrane reviews and in many non-Cochrane reviews, but often in a non-recommended way, for example, by incorporating additional bias domains. The tool has become the standard approach to assess risk of bias in randomized clinical trials. Its implementation may be further improved by a revised structure, further research and more focused guidance.
Abbreviations
DARE: The Database of Abstracts of Reviews of Effects
GRADE: The Grading of Recommendations Assessment, Development and Evaluation
PEDro: Physiotherapy Evidence Database
Hróbjartsson A, Boutron I, Turner L, Altman DG, Moher D. Assessing risk of bias in randomised clinical trials included in Cochrane Reviews: the why is easy, the how is a challenge. Cochrane Database Syst Rev. 2013;4:ED000058.
Higgins JPT, Altman DG, Gøtzsche PC, Jüni P, Moher D, Oxman AD, et al. The Cochrane Collaboration’s tool for assessing risk of bias in randomised trials. BMJ. 2011;343:d5928.
Bero LA. Why the Cochrane risk of bias tool should include funding source as a standard item. Cochrane Database Syst Rev. 2013;12:ED000075.
Sterne JAC. Why the Cochrane risk of bias tool should not include funding source as a standard item. Cochrane Database Syst Rev. 2013;12:ED000076.
Lundh A, Sismondo S, Lexchin J, Busuioc OA, Bero L. Industry sponsorship and research outcome. Cochrane Database Syst Rev. 2012;12:MR000033.
Roseman M, Turner EH, Lexchin J, Coyne JC, Bero LA, Thombs BD. Reporting of conflicts of interest from drug trials in Cochrane reviews: cross sectional study. BMJ. 2012;345:e5155.
Goodman S, Dickersin K. Metabias: a challenge for comparative effectiveness research. Ann Intern Med. 2011;155(1):61–2.
Savović J, Weeks L, Sterne JAC, Turner L, Altman DG, Moher D, et al. Evaluation of the Cochrane Collaboration’s tool for assessing the risk of bias in randomized trials: focus groups, online survey, proposed recommendations and their implementation. Syst Rev. 2014;3:37.
Hartling L, Hamm MP, Milne A, Vandermeer B, Santaguida PL, Ansari M, et al. Testing the risk of bias tool showed low reliability between individual reviewers and across consensus assessments of reviewer pairs. J Clin Epidemiol. 2013;66(9):973–81.
Hartling L, Bond K, Vandermeer B, Seida J, Dryden DM, Rowe BH. Applying the risk of bias tool in a systematic review of combination long-acting beta-agonists and inhaled corticosteroids for persistent asthma. PLoS One. 2011;6(2):e17242.
Hartling L, Ospina M, Liang Y, Dryden DM, Hooton N, Krebs Seida J, et al. Risk of bias versus quality assessment of randomised controlled trials: cross sectional study. BMJ. 2009;339:b4012.
Sterne JAC, Higgins JPT, Reeves BC. On behalf of the development group for ACROBAT-NRSI. A Cochrane risk of bias assessment tool: for non-randomized studies of interventions (ACROBAT-NRSI), Version 1.0.0, 24 September 2014. Available from http://www.riskofbias.info. Accessed 20 Jan 2015.
Jadad AR, Moore RA, Carroll D, Jenkinson C, Reynolds DJ, Gavaghan DJ, et al. Assessing the quality of reports of randomized clinical trials: is blinding necessary? Control Clin Trials. 1996;17(1):1–12.
The Physiotherapy Evidence Database (PEDro) Scale. Available from: http://www.pedro.org.au/english/downloads/pedro-scale/. Accessed 20 Jan 2015.
Armijo-Olivo S, Ospina M, da Costa BR, Egger M, Saltaji H, Fuentes J, et al. Poor reliability between Cochrane reviewers and blinded external reviewers when applying the Cochrane risk of bias tool in physical therapy trials. PLoS One. 2014;9(5):e96920.
Hopewell S, Boutron I, Altman DG, Ravaud P. Incorporation of assessments of risk of bias of primary studies in systematic reviews of randomised trials: a cross-sectional study. BMJ Open. 2013;3(8):e003342.
The Database of Abstracts of Reviews of Effects (DARE). Available from: http://www.crd.york.ac.uk/CRDWeb/. Accessed 20 Jan 2015.
The Cochrane Handbook. Available from: http://handbook.cochrane.org/. Accessed 20 Jan 2015.
Jefferson T, Jones MA, Doshi P, Del Mar CB, Hama R, Thompson MJ, et al. Risk of bias in industry-funded oseltamivir trials: comparison of core reports versus full clinical study reports. BMJ Open. 2014;4(9):e005253.
Brorson S, Hróbjartsson A. Training improves agreement among doctors using the Neer system for proximal humeral fractures in a systematic review. J Clin Epidemiol. 2008;61(1):7–16.
Brorson S, Bagger J, Sylvest A, Hróbjartsson A. Improved interobserver variation after training of doctors in the Neer system. A randomised trial. J Bone Joint Surg (Br). 2002;84(7):950–4.
Haahr MT, Hróbjartsson A. Who is blinded in randomized clinical trials? A study of 200 trials and a survey of authors. Clin Trials. 2006;3(4):360–5.
da Costa BR, Resta NM, Beckett B, Israel-Stahre N, Diaz A, Johnston BC, et al. Effect of standardized training on the reliability of the Cochrane risk of bias assessment tool: a study protocol. Syst Rev. 2014;3(1):144.
Bero L. Industry sponsorship and research outcome: a Cochrane review. JAMA Intern Med. 2013;173(7):580–1.
Dechartres A, Trinquart L, Boutron I, Ravaud P. Influence of trial sample size on treatment effect estimates: meta-epidemiological study. BMJ. 2013;346:f2304.
Panagiotou OA, Contopoulos-Ioannidis DG, Ioannidis JPA. Comparative effect sizes in randomised trials from less developed and more developed countries: meta-epidemiological assessment. BMJ. 2013;346:f707.
Dechartres A, Boutron I, Trinquart L, Charles P, Ravaud P. Single-center trials show larger treatment effects than multicenter trials: evidence from a meta-epidemiologic study. Ann Intern Med. 2011;155(1):39–51.
Bassler D, Briel M, Montori VM, Lane M, Glasziou P, Zhou Q, et al. Stopping randomized trials early for benefit and estimation of treatment effects: systematic review and meta-regression analysis. JAMA. 2010;303(12):1180–7.
LJ would like to thank Allison E. Crank for her assistance in editing the manuscript.
The study received no funding or grant other than standard salary to the data collectors (LJ, AS and DL) provided by The Nordic Cochrane Centre (Rigshospitalet, Copenhagen). JS is supported by the National Institute for Health Research Collaboration for Leadership in Applied Health Research and Care West (NIHR CLAHRC West). The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health.
All authors are affiliated with the Cochrane Collaboration. JH, JS, IB, JACS and AH have comments included in our review of published comments. We have no further conflicts of interest to declare.
LJ contributed to the design of the study, the collection and assembly of data, the analysis and interpretation of the data, the drafting of the article, the critical revision of the article for important intellectual content and the final approval of the article. AS contributed to the design of the study, the collection and assembly of data, the analysis and interpretation of the data, the critical revision of the article for important intellectual content and the final approval of the article. DL contributed to the design of the study, the collection and assembly of data, the critical revision of the article for important intellectual content and the final approval of the article. JS contributed to the conception of the study, the critical revision of the article for important intellectual content and the final approval of the article. IB contributed to the conception of the study, the critical revision of the article for important intellectual content and the final approval of the article. JACS contributed to the conception of the study, the critical revision of the article for important intellectual content and the final approval of the article. JH contributed to the conception of the study, the design of the study, the critical revision of the article for important intellectual content and the final approval of the article. AH contributed to the conception of the study, the design of the study, the analysis and interpretation of the data, the drafting of the article, the critical revision of the article for important intellectual content and the final approval of the article.