### Overview of the study

Forty systematic reviews (20 Cochrane, 20 non-Cochrane) of RCTs published from January 2010 to January 2012 and indexed in the Cochrane Database of Systematic Reviews (CDSR) or PubMed will be randomly sampled. The first meta-analysis of a continuous outcome within each review will be included. From each review protocol (where available) and published review we will extract information regarding which types of outcome data were eligible for inclusion in the meta-analysis (for example, measurement instruments, time points, analyses). From the RCT reports we will extract all outcome data that are compatible with the meta-analysis outcome as it is defined in the review and with the outcome data eligibility criteria and hierarchies in the review protocol. The association between selection of RCT outcome data included in a meta-analysis and the magnitude and statistical significance of the RCT result will be investigated. We will also investigate the impact of the selected trial result on the magnitude of the resulting meta-analytic effect estimates.

### Eligibility criteria

A systematic review was defined using the definition by Moher *et al*.: …‘the authors’ stated objective was to summarize evidence from multiple studies, and the article described explicit methods, regardless of the details provided’ [27]. The eligibility criteria for inclusion of both Cochrane and non-Cochrane systematic reviews are: 1) the review was published between Issue 1, 2010 and Issue 1, 2012 in the CDSR, or between January 2010 and January 2012 in a non-Cochrane journal; 2) the review is written in English (as we do not have the resources available to translate systematic reviews published in other languages); 3) references of all included RCTs are reported in the review; 4) the review evaluates the effects of any intervention for either RA, OA, depressive disorders (including major depressive disorder, dysthymic disorder, bipolar depression, seasonal affective disorder, and post-partum depression), or anxiety disorders (including generalized anxiety disorder, obsessive-compulsive disorder, panic disorder, phobic disorders, acute stress disorder, and post-traumatic stress disorder) [28], and 5) the review includes at least one continuous outcome meta-analysis of RCTs (for example, pain, function, number of tender or swollen joints, depression, anxiety, quality of life), with reporting of i) either the summary statistics (for example, mean, SD) or effect estimate and precision of each RCT included in the meta-analysis, and ii) the meta-analytic effect estimate and its precision.

We have selected these clinical areas to explore whether the availability of a core outcome set for the clinical condition of the review (namely, RA and OA) affects selective inclusion of results. We will focus on continuous outcomes since there is greater scope for multiplicity of continuous outcomes in these clinical areas (for example, arising from multiple measurement instruments, final versus change from baseline values, adjusted versus unadjusted means, sub-scale scores) compared with dichotomous outcomes. Both Cochrane and non-Cochrane reviews will be eligible regardless of whether a published protocol for the review is available. Unpublished protocols will be requested from authors. Both new and updated reviews will be eligible. For updated reviews, the protocol drafted closest to the latest update will be included in this study.

The exclusion criteria are: 1) no meta-analyses of continuous outcomes are reported in the review; 2) results from non-randomised studies are included in each of the meta-analyses of continuous outcomes, and 3) non-standard meta-analytical methods are used (for example, Bayesian, multiple-treatments, or individual patient data meta-analyses).

### Literature search

We will identify systematic reviews by performing an electronic search of the CDSR and PubMed. We will use RA and OA search terms recommended by The Cochrane Collaboration Musculoskeletal Review Group [29], and depressive and anxiety disorders search terms recommended by The Cochrane Collaboration Depression, Anxiety and Neurosis Review Group [30]. For the PubMed search strategy we will combine the clinical search terms with a search filter used to identify systematic reviews in a previous empirical study on the epidemiology and reporting characteristics of systematic reviews [27]. As the CDSR only includes records of Cochrane reviews, we will not use the systematic review search filter in the CDSR search strategy. We will limit searches to English language publications and date of publication from 1 January 2010 to 31 January 2012. The search strategies for both databases are reported in Additional file 1.

### Selection of systematic reviews

The citations retrieved from the CDSR and PubMed databases will be exported to Microsoft Excel and randomly sorted using the random number generator (citations of Cochrane reviews retrieved in the PubMed search will be deleted). One investigator (MJP) will read down the list of randomly sorted citations and screen the titles and abstracts, marking them as potentially eligible or ineligible. The full text of each potentially eligible systematic review will be retrieved and assessed against the inclusion criteria. This process will continue until 10 Cochrane RA or OA reviews, 10 non-Cochrane RA or OA reviews, 10 Cochrane depressive or anxiety disorders reviews, and 10 non-Cochrane depressive or anxiety disorders reviews, are included. Within both clinical categories (that is, RA or OA and depressive or anxiety disorders), we will not constrain the selection by the particular clinical condition (for example, we will not require an equal number of reviews of depression and anxiety). Any difficulties with determining whether a systematic review meets inclusion criteria will be resolved by discussion with a second researcher (JEM).
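
The screening procedure above can be sketched in code. This is only an illustration: the protocol uses Microsoft Excel and manual screening, so the function name `sample_reviews`, the dictionary keys, the seed, and the `is_eligible` callback (standing in for the human eligibility decision) are all hypothetical. It assumes that Cochrane reviews retrieved from PubMed have already been removed.

```python
import random

def sample_reviews(citations, is_eligible, quota=10, seed=2012):
    """Screen citations in random order until each stratum reaches its quota.

    citations: list of dicts with hypothetical keys 'id', 'source', 'condition'.
    is_eligible: callable standing in for the manual screening decision.
    """
    rng = random.Random(seed)
    shuffled = citations[:]  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)

    # Four strata: source (Cochrane or not) crossed with clinical category.
    strata = [(s, c)
              for s in ("Cochrane", "non-Cochrane")
              for c in ("RA/OA", "depression/anxiety")]
    included = {stratum: [] for stratum in strata}

    for citation in shuffled:
        stratum = (citation["source"], citation["condition"])
        if stratum not in included or len(included[stratum]) >= quota:
            continue  # unknown stratum, or quota already filled
        if is_eligible(citation):
            included[stratum].append(citation["id"])
        if all(len(ids) == quota for ids in included.values()):
            break  # all four quotas are filled
    return included
```

Within each stratum the sketch places no constraint on the particular clinical condition, in line with the text above.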

### Selection of continuous outcome for investigation

We will select from each systematic review the first meta-analysis of a continuous outcome that meets the inclusion criteria (henceforth referred to as the index meta-analysis). The index meta-analysis may be selected from the abstract, summary of findings table, or results section of the review, depending on where the result is first reported in the publication. We will not constrain the selection based on the outcome label of the review (that is, primary, secondary, or unlabelled), because we anticipate that in some reviews the primary outcome(s) may be dichotomous or the primary continuous outcome may not have been meta-analysed. We will not constrain the selection based on the domain measured (for example, pain, or function). Meta-analyses will be eligible regardless of meta-analytic effect measure (that is, MD or SMD), meta-analytical model (that is, fixed-effect or random-effects), and number of RCTs included (as long as at least two RCTs are included).

### Report retrieval

We will retrieve reports of systematic reviews, review protocols, and RCTs using library services. Reports of RCTs may comprise journal articles, conference abstracts, unpublished dissertations, or regulatory agency or pharmaceutical company reports. For RCTs included in Cochrane reviews with reports written in languages other than English, we will request a copy of the translation, if available, from the Cochrane Review Groups, or will use Google Translate. We will retrieve reports of RCTs included in the index meta-analysis and those reported by the systematic reviewers as investigating the same pairwise comparison but which were excluded from the meta-analyses (to explore whether any eligible outcome data may have been missed from these reports or potentially excluded based on the results). If more than one reference for an RCT was reported by the systematic reviewers (for example, both a journal article and a conference abstract), we will retrieve all references reported. This will enable investigation of potential selective inclusion resulting from differences in results reported across different sources [31–33].

### Data extraction

One investigator (MJP) will extract data from all reviews and RCTs into a standardised form created in Microsoft Excel. This form will be pilot-tested on one review from each of the four categories (Cochrane RA or OA review, non-Cochrane RA or OA review, Cochrane depression or anxiety disorders review, non-Cochrane depression or anxiety disorders review), and refined accordingly. A second investigator will independently extract data from a random sample of 10 reviews and their included RCTs. If many data extraction discrepancies are identified, we will consider undertaking double data extraction for the remaining reviews. Any discrepancies between the data extracted will be resolved through discussion or adjudication by a third investigator if necessary. The list of data we will extract from the systematic review protocols, published systematic reviews, and RCTs is reported in Additional file 2. A brief summary is provided below.

#### Data to extract from systematic review protocols

From the systematic review protocol (where available) we will extract: 1) general characteristics of the review, including date of publication, and participants, interventions, comparisons, and outcomes of interest to the review; 2) reported outcome data eligibility criteria (for example, measurement scales, time points, intervention groups, and/or analyses), and 3) reported outcome data hierarchies (for example, whether final values were preferred over change from baseline values if both are reported in an RCT publication).

#### Data to extract from published systematic reviews

From the published systematic review, we will extract the same information as from the protocols. In addition, we will extract information on any other outcome data reported in the review that are related measures of the index meta-analysis outcome under the same comparison. For example, if the index meta-analysis outcome is global pain at 4 to 6 weeks, we will record whether any outcome data for different pain scales at different time points were included in the review, either in a subsequent meta-analysis or in separate tables; these additional analyses also include sensitivity analyses related to the index meta-analysis. For the index meta-analysis, we will extract the following information: 1) the measurement instrument, time point of measurement, and intervention and comparison group for each RCT; 2) summary statistics for both groups in each RCT; 3) the MD or SMD, measures of variability, the statistical significance, and direction of the effect estimate for each RCT and for the meta-analytic effect; 4) heterogeneity statistics, and 5) which outcome data were obtained from the trialists because they were not reported in the RCT publication, involved algebraic manipulation of statistics (for example, calculating SDs from reported 95% CIs of the mean), came from a report translated into English, or required a method of imputation (such as imputing a missing SD).
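
As an illustration of the kind of algebraic manipulation mentioned in item 5, the following sketch recovers an SD from a reported 95% CI of a group mean. It assumes a large-sample normal approximation (z = 1.96); for small samples a t-distribution quantile would be more appropriate. The function name `sd_from_ci` and the example numbers are hypothetical.

```python
import math

def sd_from_ci(lower, upper, n, z=1.96):
    """Approximate SD of a group from the 95% CI of its mean.

    Assumes the CI was computed as mean +/- z * SD / sqrt(n), so the
    standard error is (upper - lower) / (2 * z) and SD = SE * sqrt(n).
    """
    standard_error = (upper - lower) / (2 * z)
    return standard_error * math.sqrt(n)

# Hypothetical group of 100 participants with a reported CI of 48.04 to 51.96.
print(round(sd_from_ci(48.04, 51.96, n=100), 2))  # 10.0
```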

#### Data to extract from RCT reports

From the RCT reports we will extract all outcome data that are compatible with the index meta-analysis outcome as it is defined in the review and with the outcome data eligibility criteria and hierarchies reported in the review protocol. This could include data from multiple measurement instruments measuring the same outcome, multiple time points, multiple intervention or control groups, final and change from baseline values, intention-to-treat and per-protocol analyses, adjusted and unadjusted effect estimates, and other analyses. For example, if the index meta-analysis is an MD meta-analysis of depression scores and the systematic reviewers report in the protocol that only HRSD outcome data will be included in a meta-analysis of depression, and specify no other outcome data eligibility criteria, we will extract all data for the HRSD (for example, all time points, adjusted and unadjusted effect estimates), but no data for any other depression measurement instrument reported in the RCTs. Alternatively, if the index meta-analysis is an SMD meta-analysis of pain intensity at 12 weeks, and the systematic reviewers have not pre-specified any outcome data eligibility criteria or hierarchies, we will extract all pain intensity data (for example, based on any measurement scale, intention-to-treat and per-protocol analyses) from each RCT at 12 weeks only. For systematic reviews without a protocol, we will request the unpublished protocol from the systematic reviewers. If one does not exist or is not provided, we will assume that no outcome data eligibility criteria or hierarchies were pre-specified, and will extract all outcome data from the RCTs, as long as they are compatible with the index meta-analysis outcome as it is defined in the review (as per the second example above). 

Final and change from baseline values are a special case in that systematic reviewers performing an SMD meta-analysis of different measurement instruments should include only final values or change from baseline values, not a mixture [34]. For systematic reviews that only include final values in an SMD meta-analysis, we will not extract any change from baseline values from the RCTs (and vice versa for systematic reviews that only include change from baseline values in an SMD meta-analysis). If systematic reviewers include a mixture of final and change from baseline values in an SMD meta-analysis, we will extract both types of values from the RCTs.

For each type of RCT outcome data deemed eligible for inclusion in the meta-analysis, we will extract: 1) the measurement instrument, time point of measurement, and intervention and comparison groups; 2) sample sizes, measures of central tendency, and measures of variability per group; 3) the effect estimate (MD or SMD) and measures of variability, the statistical significance, and direction of the effect estimate; 4) the baseline SD of the outcome per group, and 5) whether outcome data were fully reported in the RCT report (where fully reported is defined as reporting sufficient information to include the data in a meta-analysis [35]). We will use DigitizeIt 1.5.8© software to extract outcome data presented in figure format when the data are not available in the text of the report. We will not contact trialists for unpublished data.

### Sample size

A study of the characteristics of meta-analyses (with at least two studies) contained in the January 2008 issue of the *Cochrane Database of Systematic Reviews* [36] found the median number of studies per meta-analysis to be three. Assuming three RCTs per meta-analysis, a sample of 40 meta-analyses will provide 120 RCTs. This will allow estimation of the proportion of RCTs with multiplicity of outcome data to within ±9 percentage points of the true population proportion. This calculation assumes a population proportion of 50%, the worst-case scenario for precision.
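
The precision figure can be checked with a short calculation. The sketch below assumes the usual normal-approximation 95% confidence interval for a proportion and treats the RCTs as independent; the function name `margin_of_error` is ours.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the normal-approximation 95% CI for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

n_rcts = 40 * 3  # 40 meta-analyses with an assumed 3 RCTs each
print(round(100 * margin_of_error(n_rcts), 1))  # 8.9 (percentage points)
```

Because p(1 − p) is largest at p = 0.5, any other true proportion would give a narrower interval for the same sample size.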

### Analysis

#### Descriptive analyses of general characteristics of systematic reviews

We will use descriptive statistics to summarise the characteristics of the systematic reviews included in the study. These characteristics include, for example, the clinical condition, intervention and comparison type, number of primary and secondary outcomes (reported in the review protocol and published review), number of RCTs included in the review overall, and characteristics of the index meta-analysis outcome (outcome definition, meta-analytic effect measure, meta-analytical model, and number of included RCTs).

#### Descriptive analyses of reporting of outcome data eligibility criteria and hierarchies in systematic review protocols and published reviews

We will calculate the proportion of systematic review protocols and published reviews reporting at least one outcome data eligibility criterion and the proportion reporting at least one outcome data hierarchy. We will also separately calculate the proportion of protocols and reviews reporting eligibility criteria and hierarchies in relation to each of the following types of outcome data multiplicity: 1) multiple measurement instruments; 2) multiple time points; 3) multiple intervention or control groups; 4) final and change from baseline values; 5) sets of participants contributing to the analysis (for example, intention-to-treat, per-protocol, as-treated); 6) unadjusted and adjusted effect estimates; 7) period results in crossover RCTs, and 8) other. Further, we will calculate the proportion of systematic reviews with at least one discrepancy in outcome data eligibility criteria and hierarchies between the protocol and published review (where a discrepancy is defined as an addition, removal, or modification of an eligibility criterion or hierarchy).

#### Quantifying outcome data multiplicity in RCT reports

We will calculate the proportion of RCTs with at least one type of outcome data multiplicity that is compatible with the index meta-analysis outcome as it is defined in the review and with the outcome data eligibility criteria and hierarchies reported in the review protocol. We will also calculate the proportion of RCTs with the following types of outcome data multiplicity: 1) multiple measurement instruments; 2) multiple time points; 3) multiple intervention or control groups; 4) final and change from baseline values; 5) sets of participants contributing to the analysis (for example, intention-to-treat, per-protocol, as-treated); 6) unadjusted and adjusted effect estimates; 7) period results in crossover RCTs, and 8) other. In addition, for each RCT we will quantify the number of effect estimates that were eligible for inclusion in the index meta-analysis, and will quantify the median (interquartile range) of eligible effect estimates per RCT. We will also quantify the number of eligible effect estimates that were not included in the index meta-analysis but were included in other meta-analyses or elsewhere in the review (for example, tables).

#### Testing the association between selection of outcome data and the magnitude and statistical significance of the effect estimate

When multiple effect estimates are available for inclusion in a meta-analysis and no selection rules have been pre-specified, several methods may be acceptable (in the sense of not introducing bias) for selecting an effect estimate from the set available. These methods may include: 1) selecting data for the most commonly reported instrument, time point, or analysis across RCTs; 2) random selection of an effect estimate; 3) selection of the median effect estimate, and 4) selection of the outcome data based on clinical criteria. What these methods have in common is that the effect estimate is not chosen because it is systematically higher or lower than the alternatives. If selection methods 1) to 4) are employed across the RCTs, we would expect the distribution of selected effect estimates to be consistent with what we would observe under purely random selection, although this does not necessarily mean that the process used to select the effect estimates was in fact random.

We have developed an index, which we call the Potential Bias Index (PBI), to assess whether the estimates selected for inclusion in the index meta-analysis are systematically higher or lower than would be expected under purely random selection. The index is based on the ordered effect estimates for each trial and the position (that is, rank) of the selected effect estimate within that order. A rank of 1 is assigned to the smallest effect estimate and a rank equal to the number of effect estimates is assigned to the largest. Because the number of effect estimates varies across trials, we rescale the ranks to reflect their relative position (in ranking units) between the smallest and largest effect estimates: we subtract one from the rank of the selected effect estimate and divide by the number of effect estimates minus one. The smallest effect estimate in a trial then has a location of 0 and the largest has a location of 1. For example, in a trial with three effect estimates in which the selected effect estimate has rank 2, the location is (2 − 1)/(3 − 1) = 0.5, halfway between the lowest and highest rank. The PBI is defined as the weighted average of the locations of the selected estimates across trials, with weights equal to the number of effect estimates in each trial. With this weighting, the more effect estimates there were to choose from in a trial, the more weight the location from that trial receives. The expression for the PBI is:

$$
\mathit{PBI} = \frac{\displaystyle\sum_{i=1}^{k} \frac{n_i \left(X_i - 1\right)}{n_i - 1}}{\displaystyle\sum_{i=1}^{k} n_i}
$$

where there are $k$ trials, $n_i$ is the number of effect estimates in trial $i$, and $X_i$ is the rank of the selected effect estimate in trial $i$. The derivation of this index and a worked example are provided in Additional files 3 and 4. Only trials with more than one effect estimate are included in the PBI, since a trial with one effect estimate provides no information about relative location. When the largest effect estimate in every trial is selected for inclusion, the PBI equals 1; conversely, the PBI equals 0 when the smallest effect estimate is always selected. Under a process consistent with random selection, the PBI is expected to equal 0.5, meaning that on average the chosen effect estimates sit at the middle location. Similarly, a PBI of 0.75 would indicate that on average the chosen effect estimates were 75% of the distance between the smallest and largest ranks, or equivalently halfway between the middle and highest rank. We have constructed a simple statistical test based on the PBI to test whether the observed selection of effect estimates is consistent with random selection (see Additional file 3). Confidence intervals for the PBI can be constructed using bootstrap methods by resampling individual trials [37]. We will also apply the PBI to assess possible selection mechanisms in which effect estimates with smaller *P*-values are chosen for inclusion.
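
A minimal sketch of the PBI calculation and a bootstrap confidence interval follows. The function names, the example data, and the choice of a percentile bootstrap are our assumptions for illustration; the text specifies only that trials are resampled, and the statistical test in Additional file 3 is not reproduced here.

```python
import random

def informative(trials):
    """Keep trials with more than one effect estimate (n_i > 1)."""
    return [(n, x) for n, x in trials if n > 1]

def pbi(trials):
    """Potential Bias Index from (n_i, X_i) pairs.

    n_i is the number of eligible effect estimates in trial i and X_i is
    the rank (1 = smallest) of the estimate selected for the meta-analysis.
    """
    kept = informative(trials)
    if not kept:
        raise ValueError("PBI needs at least one trial with n_i > 1")
    numerator = sum(n * (x - 1) / (n - 1) for n, x in kept)
    denominator = sum(n for n, _ in kept)
    return numerator / denominator

def bootstrap_ci(trials, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap CI (an assumed variant), resampling whole trials."""
    rng = random.Random(seed)
    kept = informative(trials)
    stats = sorted(
        pbi([rng.choice(kept) for _ in kept]) for _ in range(n_boot)
    )
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical data: five trials; the last has a single estimate and is dropped.
example = [(3, 2), (4, 4), (2, 1), (5, 4), (1, 1)]
print(round(pbi(example), 3))  # 0.661
print(bootstrap_ci(example))
```

In the example, the numerator is 1.5 + 4 + 0 + 3.75 = 9.25 and the denominator is 3 + 4 + 2 + 5 = 14, giving a PBI of about 0.66, that is, selected estimates that sit somewhat above the middle location on average.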

#### Impact of selection of outcome data on meta-analytic results

The PBI described above will also be used to compare the index meta-analytic effect estimates with all possible meta-analytic effects. For each meta-analysis, all possible meta-analytic effects will be calculated from all combinations of available RCT effect estimates. The meta-analysis model used to combine the estimates (either fixed or random effects) will be the model that was used in the systematic review. However, sensitivity analyses will be undertaken to examine whether the type of meta-analysis model affects the PBI.
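
The enumeration of all possible meta-analytic effects can be sketched as below. For brevity the sketch pools with an inverse-variance fixed-effect model only, whereas the protocol uses whichever model the review used; the function names and the example effect estimates and standard errors are hypothetical.

```python
from itertools import product
from statistics import median

def fixed_effect(estimates):
    """Inverse-variance fixed-effect pooled estimate from (effect, se) pairs."""
    weights = [1 / se ** 2 for _, se in estimates]
    weighted_sum = sum(w * effect for w, (effect, _) in zip(weights, estimates))
    return weighted_sum / sum(weights)

def all_possible_pooled(candidates_per_rct):
    """Pool every combination that takes exactly one candidate per RCT.

    candidates_per_rct: one list of (effect, se) candidates for each RCT.
    """
    return [fixed_effect(combo) for combo in product(*candidates_per_rct)]

# Hypothetical index meta-analysis of three RCTs with 2, 1 and 3 candidates.
rcts = [
    [(-0.5, 0.20), (-0.3, 0.20)],
    [(-0.4, 0.25)],
    [(-0.6, 0.30), (-0.2, 0.30), (-0.1, 0.30)],
]
pooled = all_possible_pooled(rcts)
print(len(pooled))     # 6 combinations (2 x 1 x 3)
print(median(pooled))  # median of all possible meta-analytic effects
```

The number of combinations is the product of the number of candidates per RCT, so it grows quickly when several RCTs each report many eligible effect estimates.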

We will also investigate the impact of the selected RCT effects on the magnitude of the resulting meta-analytic effect estimates. For each meta-analysis, the difference between the index meta-analytic effect estimate and the median of all possible meta-analytic effect estimates will be calculated. These differences will be standardised (by dividing by the pooled baseline SD of the outcome) and meta-analysed using a random-effects model across reviews. The meta-analytic weights will be based on the standardised standard error of the median meta-analytic estimates, and between RCT variability estimated using DerSimonian and Laird’s method of moments estimator [38]. Note that this approach ignores the correlation between the meta-analytic effects within meta-analysis, arising from correlated RCT effects.
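
The random-effects pooling step can be illustrated with the DerSimonian and Laird method of moments estimator. This is a generic sketch rather than the authors' analysis code: the inputs are assumed to be the standardised differences and their standard errors, one per meta-analysis, and the function name and example numbers are hypothetical.

```python
import math

def dersimonian_laird(effects, ses):
    """Random-effects pooled estimate using the DerSimonian-Laird estimator.

    effects: standardised differences, one per unit being pooled.
    ses: their standard errors.
    Returns (pooled estimate, its standard error, tau squared).
    """
    w = [1 / se ** 2 for se in ses]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q and the method of moments estimate of tau squared.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)

    # Re-weight with the between-unit variance added to each variance.
    w_star = [1 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, math.sqrt(1 / sum(w_star)), tau2

# Hypothetical standardised differences from four meta-analyses.
print(dersimonian_laird([0.10, 0.25, -0.05, 0.30], [0.08, 0.12, 0.10, 0.15]))
```

Like the approach described above, the sketch treats the inputs as independent and so does not account for correlation between meta-analytic effects within a meta-analysis.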

#### Subgroup analyses

We will examine whether 1) the existence of a systematic review protocol and 2) the availability of a core outcome set for the clinical condition of the review affect a) the specificity of outcome data eligibility criteria and hierarchies reported in systematic review protocols and published reviews; b) the proportion of RCTs with multiplicity and the proportion of systematic reviews with at least one RCT with multiplicity; c) the PBI of the RCT effect estimates selected for inclusion in the index meta-analysis, and d) the PBI of the resulting index meta-analytic effect estimates.

#### Sensitivity analyses

For systematic reviews without protocols, it is not known whether the outcome eligibility criteria reported in the methods section of the review were specified before or after the review was undertaken. Therefore, for our primary analyses, we have chosen to include the set of RCT effect estimates that is compatible with the assumption of no pre-specified outcome data eligibility criteria. However, through sensitivity analyses, we plan to investigate whether the PBIs (calculated at both the RCT and meta-analysis level) change when the set of RCT effect estimates is restricted to those that are compatible with the outcome data eligibility criteria and hierarchies specified in the methods section of the review.