Evaluation of the Cochrane Collaboration’s tool for assessing the risk of bias in randomized trials: focus groups, online survey, proposed recommendations and their implementation

Background: In 2008, the Cochrane Collaboration introduced a tool for assessing the risk of bias in clinical trials included in Cochrane reviews. The risk of bias (RoB) tool is based on narrative descriptions of evidence-based methodological features known to increase the risk of bias in trials.
Methods: To assess the usability of this tool, we conducted an evaluation by means of focus groups, online surveys and a face-to-face meeting. We obtained feedback from a range of stakeholders within The Cochrane Collaboration regarding their experiences with, and perceptions of, the RoB tool and associated guidance materials. We then assessed this feedback in a face-to-face meeting of experts and stakeholders and made recommendations for improvements and further developments of the RoB tool.
Results: The survey attracted 380 responses. Respondents reported taking an average of between 10 and 60 minutes per study to complete their RoB assessments, which 83% deemed acceptable. Most respondents (87% of authors and 95% of editorial staff) thought RoB assessments were an improvement over past approaches to trial quality assessment. Most authors liked the standardized approach (81%) and the ability to provide quotes to support judgements (74%). A third of participants disliked the increased workload and found the wording describing RoB judgements confusing. The RoB domains reported to be the most difficult to assess were incomplete outcome data and selective reporting of outcomes. Authors expressed the need for more guidance on how to incorporate RoB assessments into meta-analyses and review conclusions. Based on this evaluation, recommendations were made for improvements to the RoB tool and the associated guidance. The implementation of these recommendations is currently underway.
Conclusions: Overall, respondents identified positive experiences and perceptions of the RoB tool. Revisions of the tool and associated guidance made in response to this evaluation, and improved provision of training, may improve implementation.


Background
Systematic reviews of randomized trials provide the best evidence about the effects of healthcare interventions. Nevertheless, randomized trials are not immune from bias. There is good empirical evidence [1][2][3] that flaws in particular aspects of trial conduct may lead to biased intervention effect estimates, which will then bias results of systematic reviews that aim to collate and synthesize all studies meeting pre-specified eligibility criteria. It is therefore important, in order to minimize bias in the conclusions of a systematic review, to consider potential limitations of each eligible study.
Systematic reviews produced by The Cochrane Collaboration have previously used a variety of methods to assess methodological quality of included trials [4]. There was no consistency between approaches recommended by different Cochrane Review Groups, most of the approaches were not evidence-based and many review groups used methods based on numerical scores, which have been shown to be inadequate [4,5]. In 2005, The Cochrane Collaboration's Methods Groups initiated the development of a new strategy for addressing the quality of randomized trials. This project commenced with a 3-day meeting of statisticians, epidemiologists and review authors, held in Cambridge, UK, following which designated pairs of individuals wrote the first draft of different components of the tool. In brief, the Cochrane risk of bias (RoB) tool involves assessment of the risk of bias arising from each of six domains (generation of the allocation sequence, concealment of the allocation sequence, blinding, incomplete outcome data, selective outcome reporting and other biases). In contrast to previous approaches, this tool elicited judgements for the domain-level risk of bias, supported by narrative explanation of evidence-based methodological features known to increase the risk of bias in trials. The narrative description can include quotes from the papers that authors have used to inform their judgements. Another novel feature of the tool was that figures can be generated to display the RoB judgements graphically across included studies. The original version of the RoB tool (Table 1) was first published in 2008 in the Cochrane Handbook for Systematic Reviews of Interventions [6] and implemented in the Collaboration's review-writing software, RevMan [7]. An updated version was published in 2011 [8,9].
In this paper, we describe the results of an evaluation of the initial version of The Cochrane Collaboration's RoB tool following its launch in 2008, the resulting recommendations for amendments and current progress in their implementation. Objectives of the evaluation were to: 1) assess the usability of the tool; 2) assess the acceptability of the resources needed to use the tool; 3) identify areas authors are finding difficult to implement; and 4) identify additional training requirements.

Methods
The evaluation of the RoB tool was initiated in early 2009. A planning meeting, comprising the organizing committee and other Cochrane contributors with relevant expertise and/or experience, including editors and other editorial office staff, was held during the 17th annual Cochrane Colloquium in Singapore in October 2009. The evaluation consisted of three stages.
First, a series of focus groups was held with a main goal of guiding the development of a questionnaire that would be subsequently used to survey stakeholders within The Cochrane Collaboration. Participants were invited to take part in focus groups via emails sent to a Cochrane Collaboration mailing list (CC-Info) and the focus groups were also listed in the program on the 17th Cochrane Colloquium website. Four 90-minute focus groups were held: one via teleconference and three in person during the Colloquium. The discussions were semi-structured and open-ended and were facilitated by one team member (DM, JACS, JS or LW). Questions focused on experiences with the RoB tool, perceptions about the level of difficulty in using the tool and in summarizing RoB assessments at different levels, confidence in RoB assessments and perspectives regarding the sufficiency and adequacy of available training materials, or reasons for non-use of the tool. The discussions were recorded and transcribed. Transcripts were coded using basic content analysis to identify questionnaire items and appropriate response categories.
Analysis of transcripts from the focus groups, together with the expertise of investigators and project staff, guided the development of three online questionnaires aimed at: 1) review authors who had used the tool; 2) review authors who had not used the tool (to ask about barriers); and 3) editorial teams within the Collaboration. Questionnaires were pilot tested before the survey was launched. Review authors who had used the RoB tool were asked questions assessing their experience of using the tool, including workload, opinions and perceptions of the tool, experience with specific bias domains, and training preferences (32 questions). Review authors who had not used the RoB tool were asked about reasons for not using the tool and about training preferences (nine questions). Review group staff were asked about their experiences of providing support to review authors (29 questions). Participants were recruited through established Cochrane Collaboration mailing lists. Links to each questionnaire were emailed to lists of review authors (5,038 subscribers), coordinating editors (79 subscribers), managing editors (69 subscribers) and to the general purpose email list, CC-Info (2,182 subscribers). The survey took place over a 3-week period in February 2010. The extent of subscriber overlap between these lists was unknown as they are maintained by different groups and are confidential. In addition, it was not possible to estimate the proportion of out-of-date or inactive subscribers in each list. Responses were analyzed using descriptive statistics, and free-text answers were analyzed by basic content analysis.
A face-to-face meeting was held in Cardiff, UK, in March 2010 to discuss results from the focus groups and surveys, and consider revisions to the first version of the RoB tool. There were 23 participants, including statisticians, epidemiologists, Cochrane review authors, editors and other members of Cochrane Review Groups and Cochrane Methods Groups, and the Editor in Chief of The Cochrane Library (http://www.thecochranelibrary.com). At the meeting, results from the focus groups and surveys were presented to initiate a semi-structured, open-ended discussion regarding specific aspects of implementation, while encouraging participants to raise issues they considered important. The discussion was guided by a set of topic areas identified as important through the survey. Recommendations for changes to the RoB tool and related guidance in the Cochrane Handbook were discussed and agreed through informal consensus.
In the months after the meeting, we collaborated with relevant groups within The Cochrane Collaboration to implement the proposed changes, including working with the software developers to integrate the proposed changes into Cochrane software and making arrangements for revising relevant guidance. As a part of a wider consultation within The Cochrane Collaboration about the proposed changes, an interactive discussion workshop was held at the 18th Cochrane Colloquium in Keystone, CO, USA. This was open to any Colloquium participants interested in attending. We presented the results from the online surveys as well as the proposed recommendations and invited participants to discuss the recommendations and provide feedback. Discussion points and feedback were recorded and fed back to the evaluation team and other groups within the Collaboration involved in the implementation of the recommendations. The implementation of proposed longer-term changes is ongoing and working groups were set up with the aim of continuous evaluation and development of the RoB tool.
This project was approved by the Ottawa Hospital Research Institute Ethics Committee (ON, Canada). The University of Bristol Faculty of Medicine and Dentistry Ethics Committee (Bristol, UK) classified this project as an audit of research practices, rather than a research project, and thus advised that explicit ethics approval was not required.

Focus groups
The four focus groups involved 25 participants, the majority of whom were experienced users of the RoB tool. Others were familiar with the RoB tool but had not yet used it in the context of a Cochrane review. The main topics of discussion were: how the RoB tool is used in practice (for example pilot testing, updated reviews, modifications, use of quotes); opinions of the RoB tool (for example comparison to past practice, aspects liked and not liked); opinions of, and experiences with, specific domains; and current and desired training materials.
Focus group participants felt that the RoB tool was an improvement over past practice. Specific benefits described included: having a standardized approach to bias assessments; the transparency provided by requesting quotes; the flexibility of the tool; the figures that can be produced in RevMan (the Cochrane Collaboration's software for systematic reviews and meta-analyses); providing a good framework for consideration of the risk of bias; and providing a platform to encourage critical thinking. Questions about these potential benefits were therefore included in the survey. The main drawbacks described, which were also addressed in the survey, included: the increased workload and complexity as compared with past practice; the subjectivity of assessments; and a lack of clarity regarding the meaning of the 'Yes', 'No' or 'Unclear' judgements. The original RoB tool phrased the judgements as answers to questions requiring a 'Yes', 'No' or 'Unclear' response, with 'Yes' reflecting a low risk of bias. Many participants deemed this wording to be confusing and instead expressed a preference for a direct response such as 'Low risk'. The analysis of the focus group discussions identified important topics to cover in the survey and helped formulate survey questions and possible response options.
The focus groups also identified issues and suggestions that would require discussion during the subsequent face-to-face meeting relating to how the RoB tool is used in practice. For example, several participants raised the issue that RoB assessments present a particular problem when updating systematic reviews. Adopting the new tool in an updated review requires review authors to reassess the risk of bias of studies included in the original review, which they were often unwilling to do, and Cochrane Review Groups were not resourced to do this on behalf of authors. Participants also suggested that graphical displays of RoB assessments across studies should be prepared separately for individual outcomes measured in the review rather than at the study level, as individual outcomes can be judged to be at higher or lower risk of bias using the tool. They further suggested that such figures should reflect the sizes of the studies rather than a simple count of how many studies were in each RoB judgement category, as had been implemented in RevMan.
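The difference between a simple count of studies and a size-weighted summary can be illustrated with a short sketch. The trial labels, sample sizes and judgements below are hypothetical, the 'Low/High/Unclear' labels follow the revised wording of the tool, and weighting by number of participants is only one possible choice (no particular weighting scheme is specified here).

```python
from collections import defaultdict

# Hypothetical studies: (label, number of participants, RoB judgement for one domain).
studies = [
    ("Trial A", 1000, "Low"),
    ("Trial B", 50, "High"),
    ("Trial C", 50, "High"),
    ("Trial D", 100, "Unclear"),
]

def summarize(studies, weighted):
    """Return the percentage of studies (or of participants, if weighted)
    falling into each judgement category."""
    totals = defaultdict(float)
    for _label, n_participants, judgement in studies:
        totals[judgement] += n_participants if weighted else 1
    grand_total = sum(totals.values())
    return {j: round(100 * t / grand_total, 1) for j, t in totals.items()}

by_count = summarize(studies, weighted=False)  # share of studies per category
by_size = summarize(studies, weighted=True)    # share of participants per category
print(by_count)
print(by_size)
```

In this made-up example half of the studies are judged to be at high risk of bias, yet they contribute only a small share of the participants, which is the distinction the focus group participants wanted the figures to convey.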
Finally, training and guidance materials (for example the Cochrane Handbook guidance, workshops) were considered important by focus group participants. Most participants described these materials as clear, but editorial groups described a challenge in persuading authors to follow and understand the guidance. Participants also described a need for more, in particular online, training materials. A list of specific gaps in existing guidance was developed to guide future training needs. These include guidance on: how to use RoB assessments within systematic reviews; how to assess risk of bias for study designs other than randomized trials; and whether and when it might be appropriate to add specific items (for example reporting of power calculations, funding source) to the 'other' bias domain. For detailed focus group findings see Additional file 1: Appendix 4.

Survey
In total, 380 respondents completed the survey. This represents a 4.4% response rate under assumptions that all subscribers' emails were active and up-to-date, and that there was no overlap in subscribers between mailing lists. We received 190 responses from authors who had used the RoB tool and 132 from authors who had not (non-users). Of the 58 Cochrane Review Group staff who responded, 19 were managing editors, 11 coordinating editors, 11 editors and 17 other staff.
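The response-rate arithmetic can be checked with a few lines of code. The sketch below uses only the subscriber and response counts reported in this paper; the variable and function names are illustrative. With the combined denominator of 7,368 subscribers, all 380 responses correspond to about 5.2%, whereas the 322 responses from review authors correspond to about 4.4%. Which numerator underlies the reported figure is not stated, so the second reading is an assumption.

```python
# Mailing-list sizes as reported in the Methods (overlap between lists is unknown).
subscribers = {
    "review authors": 5038,
    "coordinating editors": 79,
    "managing editors": 69,
    "CC-Info": 2182,
}
# Survey responses as reported in the Results.
responses = {"authors (users)": 190, "authors (non-users)": 132, "CRG staff": 58}

def response_rate(n_responses, n_invited):
    """Return the response rate as a percentage."""
    return 100.0 * n_responses / n_invited

total_invited = sum(subscribers.values())
total_responses = sum(responses.values())
author_responses = responses["authors (users)"] + responses["authors (non-users)"]

rate_all = round(response_rate(total_responses, total_invited), 1)
rate_authors = round(response_rate(author_responses, total_invited), 1)
print(total_invited, total_responses, rate_all, rate_authors)
```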
Non-users of the RoB tool were asked nine questions covering: reasons for not using the RoB tool; training needs; and opinions on the availability of training. Most non-user respondents identified themselves as likely future users, for example because: they had not conducted a Cochrane review since the introduction of the RoB tool (95 of the 132 respondents); their review was still in the protocol stage (four respondents); they had not yet started the RoB assessments for their review (three respondents); or their co-authors were tasked with completing the RoB assessments (four respondents). Only eight respondents stated that they preferred using another assessment method, and two stated that their reason for non-use of the tool was the time it would take to use it. The answers of non-user respondents to training-related questions are summarized in Table 2 and provided in detail in Additional file 1: Appendix 2.
Authors' and editorial staff's experience with using the risk of bias tool
Table 3 presents the main results from the survey of review authors who had used the RoB tool, while Table 4 presents a summary of Cochrane Review Group staff responses to related questions, answered from an editorial perspective. Authors of all levels of experience with the RoB tool were represented in the survey (Table 3, Q1). The time taken to complete a RoB assessment for one trial varied widely among respondents, but the majority of respondents considered the time taken to be acceptable (Table 3, Q3 and Q4). We did not observe an association between number of reviews authored and reported speed of completing RoB assessments (χ² = 18.9, P = 0.27). The majority of respondents (159, 84%) completed the recommended RoB table in RevMan, while 68 (36%) also included at least one RoB figure. The majority of respondents thought that the requirement to add quotes added transparency (128 authors and 49 editorial staff) and increased confidence in RoB assessments (104 authors and 30 editorial staff; see Additional file 1: Appendices 1 and 3).
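A test of association of this kind is typically a Pearson chi-squared test on a cross-tabulation of two survey questions. The sketch below computes the statistic and its degrees of freedom in plain Python. The counts are hypothetical (the underlying cross-tabulation is not reported here), and the row and column categories are assumptions made for illustration.

```python
def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a
    contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Hypothetical counts: rows = number of reviews authored (few, many),
# columns = reported time per RoB assessment (short, medium, long).
observed_counts = [
    [10, 20, 30],
    [20, 20, 20],
]
statistic, dof = chi_squared(observed_counts)
print(round(statistic, 2), dof)
```

In practice a statistics library would normally be used to obtain the P value as well; for example, `scipy.stats.chi2_contingency` performs the same calculation on a table of observed counts.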
Nearly a third of respondents (56, 31%) said they had used a modified version of the RoB tool to assess randomized trials (Tables 3 and 4, Q7). Modifications consisted of adding new domains, modifying criteria for 'Yes/Unclear/No' judgements, or removing some domains. These modifications were usually based on respondents' own expertise (37 respondents), or on guidelines from their Cochrane Review Group (21 respondents; see Additional file 1: Appendix 1). Thirty-nine (21%) respondents had used the RoB tool to assess non-randomized studies, and 16 editorial staff who responded (28%) stated their review group recommended this practice. When used for this purpose, the RoB tool was usually modified (Tables 3 and 4, Q6). Non-randomized study designs identified by respondents were quasi-randomized, cohort, case-control, cross-sectional, interrupted time-series and controlled before-and-after studies. Modifications were usually based on respondents' expertise and literature, but with no consistent or standard approach. Two other instruments reported to be used for this purpose were the Newcastle-Ottawa scale [10,11] and the Cochrane Effective Practice and Organisation of Care (EPOC) Group's quality assessment checklist (see Additional file 1: Appendices 1 and 3) [12].

Table 4 (partial). Cochrane Review Group (CRG) staff responses, n (%)
Coordinating editor: 11 (19)
Other editor: 11 (19)
Trial search coordinator/information specialist: 2 (3)
Other: 15 (26)
CRG policy regarding RoB assessments for new reviews (Q2)
  All new reviews must include RoB assessment: 45 (78)
  Recommended, but not compulsory: 9 (16)
  No clear policy or not sure: 4 (7)
CRG policy regarding RoB assessments for updated reviews (Q3)
  All updated reviews must include RoB assessment: 28 (48)
    Only for newly included studies (Q3a): 3
    Both newly and previously included studies (Q3a): 10
  Recommended, but not compulsory: 22 (38)
    Only for newly included studies (Q3a): 0
The survey responses indicated that authors need clearer guidance on what to do with RoB assessments once completed: 26 (14%) respondents did not incorporate their RoB assessments into review conclusions at all, while the majority (104, 55%) opted to include a narrative summary (Table 3, Q9). In terms of review group policy, the most prevalent recommendation was that authors should include a sensitivity analysis (Table 4, Q9).

Issues specific to individual bias domains
Authors reported some difficulties in completing each bias domain, but the domains thought to be most difficult were 'incomplete outcome data' and 'selective outcome reporting' (Table 3, Q17 to Q22). Editorial staff identified similar issues (Table 4, Q16 to Q21). Nevertheless, 172 respondents (91%) reported feeling 'somewhat' or 'very confident' in their RoB assessments (Table 3, Q12). We did not observe an association between the number of domains with which respondents reported problems and whether or not they had any RoB-specific training (T = 0.29, P = 0.77). Similarly, having received specific RoB training was not associated with the respondents' level of confidence in their RoB assessments (T = 1.59, P = 0.11). We describe below more detailed responses for each domain (shown in Additional file 1: Appendix 1).
The most common problems with assessing sequence generation were: confusing sequence generation with allocation concealment (50% of those reporting a problem with this domain); and difficulty in assessing whether a particular reported method was associated with bias (52% of those reporting a problem). Respondents also reported that the method of sequence generation was commonly not described in trial reports and accordingly wanted guidance on how to make judgements based on their overall impression of trial conduct. Similarly, if allocation concealment is well described and adequate, respondents wanted guidance on whether this can be used as a basis for a judgement of low risk of bias for sequence generation. Most respondents reported that they simply select 'unclear' whenever study reports do not describe sequence generation.
The most common problems with allocation concealment were: difficulty in assessing whether a particular reported method was associated with bias (61% of those reporting a problem with this domain); confusing allocation concealment with blinding (34% of those reporting a problem); and consistency between assessors (26%). Again, a commonly raised issue was insufficient information in the trial report, especially for older studies.
Respondents who reported problems with blinding experienced difficulty with making a judgement in studies where patients and/or caregivers cannot be blinded (68% of those reporting problems), while 64% reported difficulty in making a global assessment of blinding of patients, providers and outcome assessors.
The most common problems with the incomplete outcome data domain included: difficulties in making an assessment when the dropout rate is described but not acceptable (55% of those reporting a problem); establishing whether an intention-to-treat analysis had been conducted (57%); establishing what constitutes 'complete' outcome data (67%); making assessments of missing outcome data at different follow-up periods (52%); and confusing incomplete outcome data with selective outcome reporting (33%). Inconsistency in the meaning and understanding of the phrase 'intention-to-treat analysis' was also cited as a source of problems in some free-text answers.
The most common problems reported for selective outcome reporting were making an assessment without access to a study protocol (86% of those reporting a problem) and confusing selective outcome reporting with incomplete outcome data (41%). Inconsistency between assessors (20%) and lack of standard outcome measures in a given clinical area (22%) were also reported. One respondent raised concerns that this domain is not relevant to review results because either the missing information can be obtained from the study author, or the study cannot be included in the meta-analysis and should thus be excluded from the RoB table.
Many respondents (95, 89% of those reporting a problem with this domain) found it difficult to decide what should be considered under other sources of bias. Some suggested the domain is too vague and therefore open to misuse. The following are some of the items respondents had included under the 'other bias' domain in their reviews: compliance; baseline comparability; funding source and conflict of interest; adjustment for confounding factors; biases in cluster-randomized trials; carry-over effects in cross-over trials; co-interventions; early stopping of trials for benefit; multiple interim analyses; sample size calculations; publication bias; selection/recruitment bias; validity of outcome measures; surgical learning curve; and timing of outcome assessment. A decision on what should be included in the 'other bias' category had usually been made in consultation with co-authors (39 respondents).
Responses relating to training specific to the RoB tool are shown in Table 2 for all three groups of respondents, separately. Existing training materials and opportunities seem to be satisfactory in general, but respondents did favour provision of additional examples and web-based training.

Recommendations and implementation
At the face-to-face meeting in March 2010, 23 participants considered the findings of the focus groups and the surveys and made consensus-based recommendations for improvements to the RoB tool, which are summarized in Table 5. Some of the short-term changes were implemented in a new edition of the Cochrane Handbook [8] and RevMan version 5.1 [13]. Specifically, wording of bias judgements was changed from 'Yes/No/Unclear' to 'Low/High/Unclear' risk of bias; category headings were introduced for selection, performance and detection, attrition, reporting, and other bias; authors are now encouraged to make separate judgements for blinding for 1) participants and personnel, and 2) outcome assessment; and guidance was clarified, particularly for incomplete outcomes, selective outcome reporting and 'other sources of bias'.
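The relabelling of judgements and the grouping of domains under category headings can be expressed as two small lookup tables. This is an illustrative sketch only: the dictionary and function names are hypothetical, the domain labels are paraphrased, and the assignment of each domain to a heading follows the description in this paper and the Cochrane Handbook rather than any RevMan data format.

```python
# Original answers mapped to the revised wording ('Yes' reflected a low risk of bias).
JUDGEMENT_RELABEL = {
    "Yes": "Low risk of bias",
    "No": "High risk of bias",
    "Unclear": "Unclear risk of bias",
}

# Category headings for the revised domains, with blinding split in two.
DOMAIN_CATEGORY = {
    "Random sequence generation": "Selection bias",
    "Allocation concealment": "Selection bias",
    "Blinding of participants and personnel": "Performance bias",
    "Blinding of outcome assessment": "Detection bias",
    "Incomplete outcome data": "Attrition bias",
    "Selective reporting": "Reporting bias",
    "Other sources of bias": "Other bias",
}

def relabel(assessment):
    """Convert {domain: 'Yes'/'No'/'Unclear'} into
    {domain: (category heading, revised judgement)}."""
    return {
        domain: (DOMAIN_CATEGORY[domain], JUDGEMENT_RELABEL[answer])
        for domain, answer in assessment.items()
    }

# Hypothetical assessment of a single trial under the original wording.
example = {"Allocation concealment": "Yes", "Incomplete outcome data": "No"}
converted = relabel(example)
print(converted)
```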
Medium- and longer-term recommendations (implementation to coincide with the development of RevMan version 6 or later) include: enforcing, through structural changes in the software, the separation of assessments of blinding into blinding of participants and personnel (under performance bias) and blinding of outcome assessment (under detection bias); weighting RoB graphs by study size; providing an algorithm for reaching a summary assessment of risk of bias per study/outcome; and developing a RoB tool for assessment of non-randomized studies. Extensions to the written guidance will be incorporated into upcoming versions of the Cochrane Handbook, including: further clarification of guidance with regard to selective reporting and other sources of bias; clearer and more explicit guidance for incorporating RoB assessments into meta-analyses; an algorithm for formulating summary assessments across domains of bias; and a bank of worked examples. A dedicated steering group was formed in 2011, funded by the Cochrane Collaboration's Methods Innovation Fund, to develop a RoB tool for the assessment of non-randomized studies. This work is expected to be completed by the end of 2014. Another working group, formed in 2012, was tasked with introducing signalling questions within each bias domain and an overall RoB judgement for each outcome in the RoB tool for randomized trials in order to provide a more structured framework for reaching domain-level and outcome-level judgements. The same structure of signalling questions and bias domains is being implemented in RoB tools for randomized and non-randomized studies, with the aim of applying the same standards of assessments for all study types.

Discussion
Our multi-staged evaluation of the RoB tool found wide acceptance of the need for the tool, with consensus that it represents an improvement over methods previously recommended for use in systematic reviews. These findings should, however, be interpreted cautiously because of the low survey response rate. The time required to complete assessments of risk of bias was greater than had been required by previous approaches, but was nonetheless considered acceptable. A high proportion of respondents reported problems with each of the individual RoB domains. The domains reported to be the most difficult to assess were risk of bias due to incomplete outcome data and selective reporting of outcomes. There was wide variation in how review authors had approached the 'other bias' domain, with a lack of clarity over what additional items should be considered here. Some of the items that authors have included (such as sample size calculations and funding source) are explicitly discouraged in the Cochrane Handbook guidance.
While there is evidence that some factors are empirically associated with effect estimates, such as single versus multicentre design, early stopping of trials and funding source [14][15][16], the extent to which these should be considered alongside the main bias domains is still a topic of debate. The evaluation highlighted a need for more and better training and guidance materials, such as algorithms or similar structured guidance for reaching domain-level judgements, as well as guidance on how to incorporate RoB assessments into meta-analyses and review conclusions. Recommendations for changes or further developments were made based on identified needs and many have already been incorporated into the new edition of the Cochrane Handbook, while other developments are underway. As suggested by evaluation participants, an online bank of worked examples for RoB assessments will be incorporated into future versions of the Cochrane Handbook or made available online.
This was the first study to evaluate the implementation of the new Cochrane tool for assessment of trials included in reviews. We used qualitative methods (focus groups) to help design the questionnaire, which we piloted to improve face validity. The focus groups were facilitated by the authors (DM, JACS, JS or LW), two of whom are bias experts and contributed to the development of the original RoB tool (DM and JACS). It is possible that, under such circumstances, the participants could have been reluctant to admit lack of understanding or confusion with the tool. However, the main purpose of the focus groups was to inform the development of the survey questionnaire and not to draw any firm conclusions. Some of the focus group participants were later involved in the piloting of the questionnaire. Although the proportion of respondents to the survey was small (4.4% of the 7,368 mailing list subscribers), it is possible that the effective response rate was somewhat higher due to a combination of overlap among the four mailing lists and the presence of inactive Cochrane review authors on the authors' list. However, given the low response rate, it is possible that authors and Cochrane Review Group staff who read the email and chose to respond differ from those who did not read the email or chose not (or forgot) to respond. Due to time limitations, our survey was live for only 3 weeks, which also could have reduced the response rate. Nevertheless, the main purpose of this evaluation was to identify potential problems with the RoB tool that can be rectified, and we suspect that users who encountered problems are more likely to have responded. This speculation is based on the high proportion of respondents who reported having problems with some aspects of the RoB tool, especially with individual RoB domains. However, it is equally possible that those users of the RoB tool who experienced the most problems with RoB felt disillusioned and chose not to participate. 
One further limitation to consider is that the survey measured confidence and self-reported difficulty; it is possible that the number of people incorrectly applying these concepts may be higher as authors may be unaware of their misunderstandings. We also wanted to gauge general perceptions of users of the RoB tool, and to find out if their training needs were being met. Another potential limitation is the small number of non-users of the RoB tool represented in the evaluation. It is impossible to determine whether the number of non-user respondents was small because few authors made a decision not to use the tool or because such authors chose not to respond to the survey.
We are not aware of a similar survey of Cochrane review authors or evaluation of the RoB tool. Several studies used other methods to investigate the use of the RoB tool in practice and evaluate its reliability. Hartling et al. found that, although the tool takes longer to complete than other approaches, trials assessed to be at high risk of bias produced more exaggerated effect estimates compared to low risk trial reports [17]. This is consistent with other empirical studies [2,18]. The same authors assessed the reliability of the tool and found, consistent with the results reported here, that incomplete outcome data and selective reporting are the most difficult domains to assess [17]. It is important that guidance and training materials continue to be developed for all aspects of the tool, but particularly these two items. One of the findings from our evaluation that was of particular concern is that 44% or more of respondents had difficulty with assessing each of the individual RoB domains. This is consistent with the results of the reliability testing reported by Hartling et al. [17]. Inter-rater reliability is a substantial problem facing the RoB tool, in common with many of the other tools used for similar purposes in systematic reviews. Nevertheless, a further study has found the reliability of the RoB tool to be better when review-specific guidance was used, with reported agreement on bias domains ranging from fair to almost perfect [19]. Liu et al. carried out a review of systematic reviews of acupuncture in Chinese journals in the period from 2009 to 2011 in order to assess the prevalence of use of the Cochrane RoB tool in this field of research. They found that only 6% of reviews reported information on all six RoB domains [20].
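Agreement described on a scale from 'fair' to 'almost perfect' is commonly quantified with Cohen's kappa and the descriptors of Landis and Koch (1977). The sketch below assumes that convention (this section does not state which statistic the cited study used), and the two reviewers' judgements are hypothetical.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical judgements on the same items.
    Undefined (division by zero) if chance agreement equals 1."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from the product of the raters' marginal proportions.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

def landis_koch(kappa):
    """Descriptive label for kappa following Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= upper:
            return label
    return "almost perfect"

# Hypothetical judgements by two reviewers for one domain across ten trials.
reviewer_1 = ["Low", "Low", "High", "Unclear", "Low",
              "High", "Low", "Unclear", "High", "Low"]
reviewer_2 = ["Low", "Low", "High", "Low", "Low",
              "High", "Unclear", "Unclear", "High", "Low"]
kappa = cohen_kappa(reviewer_1, reviewer_2)
print(round(kappa, 2), landis_koch(kappa))
```

Here the reviewers agree on 8 of 10 trials, but kappa is lower than 0.8 because some of that agreement would be expected by chance alone.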
Our evaluation led to recommendations for improvements to the tool [9]. There was consensus that assessment of blinding should be separated into blinding of participants and health professionals (performance bias) and blinding of outcome assessors (detection bias), and that classification of bias domains into categories of bias (selection bias, performance bias, detection bias, attrition bias, reporting bias and other bias) would be helpful. Some of the recommended changes have been implemented in RevMan version 5.1 [13] and in a revised version of the Cochrane Handbook, released in March 2011 [8]. There was agreement that improved training materials and the availability of worked examples would improve the quality and reliability of RoB assessments and reduce misuse of the items assessed.
The current RoB tool addresses the main sources of bias in randomized trials of a standard parallel-group design. The evaluation helped to identify a need for timely development of extensions of the RoB tool to cover other randomized trial designs and non-randomized studies. The next generation of the tool will meet the need for more structured guidelines for reaching domain-based RoB judgements (for example, algorithms), since it will introduce a signalling question-based approach as used in the QUADAS-2 tool for assessing diagnostic accuracy studies [21]. Signalling questions are additional, specific questions within each bias domain aimed at helping the assessor reach the domain-level judgement more easily and in a more structured way.
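As an illustration only, the idea of deriving a domain-level judgement from signalling questions could be sketched as follows. The questions, the answer categories and the decision rule here are all hypothetical assumptions made for the example; they are not taken from the RoB tool, from QUADAS-2 or from any published algorithm.

```python
from enum import Enum


class Answer(Enum):
    YES = "yes"
    NO = "no"
    UNCLEAR = "unclear"


def domain_judgement(answers):
    """Map signalling-question answers to a domain-level judgement.

    Hypothetical rule, assumed for illustration: any "no" gives high risk,
    all "yes" gives low risk, and anything else gives unclear risk.
    Each question is phrased so that "yes" indicates lower risk of bias.
    """
    if not answers:
        raise ValueError("At least one signalling question must be answered")
    values = list(answers.values())
    if any(v is Answer.NO for v in values):
        return "high risk"
    if all(v is Answer.YES for v in values):
        return "low risk"
    return "unclear risk"


# Hypothetical signalling questions for a sequence generation domain.
example = {
    "Was the allocation sequence random?": Answer.YES,
    "Was the method of sequence generation described?": Answer.UNCLEAR,
}
print(domain_judgement(example))
```

A rule of this kind makes the path from answers to judgement explicit and repeatable, which is the sort of structure respondents asked for; an actual algorithm would need to be specified and validated by the tool developers.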
More empirical evidence is needed to inform considerations of which methodological aspects are most important in assessing risk of bias. There is a particular need for assessment of the influence of participant attrition on effect estimates, and of the separate contributions to bias from blinding of patients and caregivers versus blinding of outcome assessors. Further, clearer guidance, ideally based on empirical evidence, is needed on how to deal with studies at high risk of bias in meta-analyses, in other syntheses of evidence across studies and when drawing conclusions.

Conclusion
Our evaluation of the Cochrane RoB tool suggests that it is a step in the right direction, but that revisions of the tool and associated guidance, and improved provision of training, are required. Extensions of the tool for non-parallel group randomized trials and non-randomized studies were identified as a priority and such developments have been initiated as a consequence of this evaluation.

Additional file
Additional file 1: Appendices 1, 2, 3 and 4. Evaluation of the Cochrane Collaboration's tool for assessing the risk of bias in randomized trials: focus groups, online survey and proposed recommendations. Detailed survey and focus group results.
Abbreviations CRG: Cochrane Review Group; EPOC: Effective Practice and Organisation of Care; RoB: Risk of bias.
Competing interests JACS, DGA, DM and JPTH contributed to the development of the original RoB tool and continue to be involved in the tool development as well as the development of the extensions of the RoB tool for non-randomized studies and trials with non-standard designs. JS and LT are involved in the development of the extensions of the RoB tool for non-randomized studies and trials with non-standard designs. LW declares no competing interests.
Authors' contributions JS participated in study design, conducted focus groups, administered the survey, analyzed the survey results, organized the face-to-face meeting, interpreted results and drafted the manuscript. LW participated in study design, conceived, conducted and analyzed focus groups, administered the survey, interpreted results, and helped draft the manuscript. JACS conceived the study, participated in its design, conducted focus groups, interpreted results and critically reviewed the manuscript. LT administered the survey, interpreted results and critically reviewed the manuscript. DGA interpreted results and critically reviewed the manuscript. DM conducted focus groups and critically reviewed the manuscript. JPTH analyzed the data, interpreted results and helped draft the manuscript. All authors read and approved the final manuscript.