SAMURAI: Sensitivity analysis of a meta-analysis with unpublished but registered analytical investigations (software)

Background The non-availability of clinical trial results contributes to publication bias, diminishing the validity of systematic reviews and meta-analyses. Although clinical trial registries have been established to reduce non-publication, the results from over half of all trials registered in ClinicalTrials.gov remain unpublished even 30 months after completion. Our goals were i) to utilize information available in registries (specifically, the number and sample sizes of registered unpublished studies) to gauge the sensitivity of a meta-analysis estimate of the effect size and its confidence interval to the non-publication of studies and ii) to develop user-friendly open-source software to perform this quantitative sensitivity analysis.
Methods The open-source software, the R package SAMURAI, was developed using R functions available in the R package metafor. The utility of SAMURAI is illustrated with two worked examples.
Results Our open-source software, SAMURAI, can handle meta-analytic datasets of clinical trials with two independent treatment arms. Both binary and continuous outcomes are supported. For each unpublished study, the dataset requires only the sample sizes of each treatment arm and the user-predicted ‘outlook’ for the study. The user can specify five outlooks ranging from ‘very positive’ (i.e., very favorable towards intervention) to ‘very negative’ (i.e., very favorable towards control). SAMURAI assumes that control arms of unpublished studies have effects similar to the effect across control arms of published studies. For each experimental arm of an unpublished study, SAMURAI uses the user-provided outlook to randomly generate an effect estimate from a probability distribution, which may be based on a summary effect across published trials. SAMURAI then calculates the estimated summary treatment effect with a random-effects model (DerSimonian & Laird method) and outputs the result as a forest plot.
Conclusions To our knowledge, SAMURAI is currently the only tool that allows systematic reviewers to incorporate information about sample sizes of treatment groups in registered but unpublished clinical trials in their assessment of the potential impact of publication bias on meta-analyses. SAMURAI produces forest plots for visualizing how inclusion of registered unpublished studies might change the results of a meta-analysis. We hope systematic reviewers will find SAMURAI to be a useful addition to their toolkit.

1 Formatting a data set file for use by SAMURAI

Binary outcomes
The data set should have the same column headings as those in the example below in Table 1. The outlook of a study can be one of the following: "published", "very positive", "positive", "no effect", "negative", "very negative", "very positive CL", "positive CL", "current effect", "negative CL", or "very negative CL".
ctrl.n and expt.n refer to the sample sizes of the control and experimental arms, respectively. ctrl.events and expt.events refer to the numbers of events within the control and experimental arms, respectively.
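For illustration, a binary-outcome data set with this layout can be built directly in R. This is a minimal sketch: the column names ctrl.n, expt.n, ctrl.events, expt.events and outlook come from the text above, whereas the study label column, all numbers, and the use of NA for the unknown event counts of unpublished studies are assumptions, since Table 1 is not reproduced here.

```r
# Hypothetical binary-outcome data set: two published and two registered-but-unpublished trials.
# The 'study' column and every number below are made up for illustration.
binary_data <- data.frame(
  study       = c("Trial A", "Trial B", "Trial C", "Trial D"),
  outlook     = c("published", "published", "positive", "no effect"),
  expt.events = c(12, 20, NA, NA),   # NA: results not available (assumed convention)
  expt.n      = c(100, 150, 80, 120),
  ctrl.events = c(18, 27, NA, NA),
  ctrl.n      = c(100, 150, 80, 120),
  stringsAsFactors = FALSE
)

# Sanity checks on the structure described in the text
allowed_outlooks <- c("published", "very positive", "positive", "no effect",
                      "negative", "very negative", "very positive CL",
                      "positive CL", "current effect", "negative CL",
                      "very negative CL")
stopifnot(all(binary_data$outlook %in% allowed_outlooks))
stopifnot(all(binary_data$expt.events <= binary_data$expt.n, na.rm = TRUE))
```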

Continuous outcomes
A data set with continuous outcomes can be in one of two formats.

Means and standard deviations
For data sets that report, for each arm of every published study, the mean outcome and its standard deviation, the column headings should be the same as in the example below in Table 2. ctrl.n and expt.n refer to the sample sizes of the control and experimental arms, respectively. ctrl.mean and expt.mean refer to the mean outcome within the control and experimental arms, respectively.
ctrl.sd and expt.sd refer to the standard deviation of the outcome within the control and experimental arms, respectively.
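A minimal sketch of the means-and-standard-deviations layout as an R data frame. The mean, SD and sample-size column names follow the text; the study label column, the numbers, and NA for the unreported summaries of the unpublished study are assumptions, since Table 2 is not reproduced here.

```r
# Hypothetical continuous-outcome data set in the means/SDs format.
cont_data <- data.frame(
  study     = c("Trial A", "Trial B", "Trial C"),
  outlook   = c("published", "published", "negative"),
  expt.mean = c(5.1, 4.8, NA),   # unpublished: no results available (assumed NA)
  expt.sd   = c(1.2, 1.5, NA),
  expt.n    = c(40, 55, 60),
  ctrl.mean = c(4.6, 4.5, NA),
  ctrl.sd   = c(1.3, 1.4, NA),
  ctrl.n    = c(42, 50, 60),
  stringsAsFactors = FALSE
)

# Standard deviations must be positive wherever they are reported
stopifnot(all(cont_data$expt.sd > 0, na.rm = TRUE),
          all(cont_data$ctrl.sd > 0, na.rm = TRUE))
```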

Standardized mean differences (SMD)
For data sets that report the standardized mean difference (SMD) and its variance for each published study, the column headings should be the same as in the example below in Table 3. The SMD should be Hedges' g (the small-sample-corrected SMD). ctrl.n and expt.n refer to the sample sizes of the control and experimental arms, respectively. smd is the SMD and smd.v is its variance.
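The SMD layout can be sketched the same way. Again the study label column, the numbers, and the use of NA for the unpublished study are illustrative assumptions; Table 3 itself is not reproduced here.

```r
# Hypothetical continuous-outcome data set in the SMD format.
smd_data <- data.frame(
  study   = c("Trial A", "Trial B", "Trial C"),
  outlook = c("published", "published", "very positive"),
  smd     = c(0.40, 0.21, NA),     # Hedges' g of each published study
  smd.v   = c(0.050, 0.038, NA),   # variance of g
  expt.n  = c(40, 55, 60),
  ctrl.n  = c(42, 50, 60),
  stringsAsFactors = FALSE
)
```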
2 Importing a data set file

The R function read.csv() can be used to import CSV files in which values are separated by commas. The R function read.csv2() can be used to import CSV files in which values are separated by semicolons and commas serve as decimal marks (the convention, for example, on computers running Microsoft Windows with German regional settings).
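The two import functions differ only in the CSV dialect they expect. The sketch below writes a small hypothetical data set in both dialects to temporary files and checks that read.csv() and read.csv2() recover the same data frame. The file paths are temporary and the data are made up.

```r
# Write a small hypothetical data set in both CSV dialects, then read it back.
example <- data.frame(study = c("Trial A", "Trial B"),
                      outlook = c("published", "positive"),
                      expt.n = c(100, 80), ctrl.n = c(100, 80),
                      stringsAsFactors = FALSE)

path_comma <- tempfile(fileext = ".csv")
path_semi  <- tempfile(fileext = ".csv")
write.csv(example, path_comma, row.names = FALSE)   # fields separated by ","
write.csv2(example, path_semi, row.names = FALSE)   # fields separated by ";", decimal mark ","

dat1 <- read.csv(path_comma, stringsAsFactors = FALSE)
dat2 <- read.csv2(path_semi, stringsAsFactors = FALSE)
stopifnot(isTRUE(all.equal(dat1, dat2)))
```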

Pseudocode of forestsens()
Step 0: (Optional) Designate all unpublished studies to have the same outlook (i.e., the same assumed risk ratio or SMD).
The user can override the outlooks of the unpublished studies with a specified outlook by using the option outlook. For example, to assign the outlook "no effect" to all unpublished studies, we can specify the option outlook="no effect".
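The effect of the outlook option can be mimicked on the data frame itself. This is a sketch only: override_outlook is a hypothetical helper, not a SAMURAI function. It replaces the outlook of every row that is not marked "published".

```r
# Hypothetical helper mirroring Step 0: give every unpublished study the same outlook.
override_outlook <- function(data, outlook) {
  unpublished <- data$outlook != "published"
  data$outlook[unpublished] <- outlook
  data
}

studies <- data.frame(study = c("A", "B", "C"),
                      outlook = c("published", "positive", "very negative"),
                      stringsAsFactors = FALSE)
studies_ne <- override_outlook(studies, "no effect")
studies_ne$outlook   # "published" "no effect" "no effect"
```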

For studies with binary outcomes
Step B1: Subset the published studies. Calculate the log risk ratio and its variance for each of the published studies. Calculate a summary effect across the collection of published studies using a random-effects model. Let $k$ denote the number of published studies included in the meta-analysis. For the $j$-th published study, with $j \in \{1, \dots, k\}$, denote by $x_{0j}$ the number of events among $n_{0j}$ persons in the control group, and by $x_{1j}$ the number of events among $n_{1j}$ persons in the treatment group.
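Step B2 (for each individual published study): Let $\hat p_{0j} = x_{0j}/n_{0j}$ be the estimate of the rate of events in the control group, and $\hat p_{1j} = x_{1j}/n_{1j}$ be the estimate of the rate of events in the treatment group. Calculate the estimate of the log risk ratio as
$$y_j = \ln\left(\frac{\hat p_{1j}}{\hat p_{0j}}\right)$$
and its approximate variance as
$$v_j = \widehat{\operatorname{var}}(y_j) = \frac{1}{x_{1j}} - \frac{1}{n_{1j}} + \frac{1}{x_{0j}} - \frac{1}{n_{0j}}.$$
The lower and upper bounds of the $(1-\alpha)$ confidence interval (with $\alpha \in (0, 1)$) of the risk ratio are then defined to be
$$\exp\left(y_j - z_{1-\alpha/2}\sqrt{v_j}\right) \quad\text{and}\quad \exp\left(y_j + z_{1-\alpha/2}\sqrt{v_j}\right),$$
where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ quantile of the standard normal distribution. The default confidence level is 95%.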
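The per-study quantities of Step B2 can be sketched in base R as follows. The function name log_rr_ci is hypothetical, and the example counts are made up. For the same inputs, metafor's escalc(measure = "RR", ...) should return the same log risk ratio and variance.

```r
# Log risk ratio (treatment vs control), its approximate variance, and the CI of the risk ratio
log_rr_ci <- function(x1, n1, x0, n0, level = 0.95) {
  p1 <- x1 / n1                          # event rate, treatment arm
  p0 <- x0 / n0                          # event rate, control arm
  y  <- log(p1 / p0)                     # log risk ratio
  v  <- 1 / x1 - 1 / n1 + 1 / x0 - 1 / n0
  z  <- qnorm(1 - (1 - level) / 2)       # z_{1 - alpha/2}
  data.frame(logRR = y, var = v, RR = exp(y),
             lower = exp(y - z * sqrt(v)), upper = exp(y + z * sqrt(v)))
}

log_rr_ci(x1 = 12, n1 = 100, x0 = 18, n0 = 100)
```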
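Step B3 (for the published studies collectively): To get a summary effect of the published studies using a random-effects model, the binary outcome data are converted to log risk ratios and the DerSimonian & Laird method is applied:
1. For each study, calculate the log risk ratio $y_j$ and the estimate of its variance, $v_j$ (see Step B2).
2. Estimate the between-studies variance $\tau^2$ as follows. Weight each study by the inverse of its variance, $w_j = 1/v_j$, and add up the weights, $W = \sum_{j=1}^{k} w_j$. Also calculate the following quantities:
$$Q = \sum_{j=1}^{k} w_j \left(y_j - \frac{\sum_{l=1}^{k} w_l y_l}{W}\right)^2, \qquad \mathrm{df} = k - 1, \qquad C = W - \frac{\sum_{j=1}^{k} w_j^2}{W},$$
where $y_j$ is the log risk ratio in the $j$-th study. Then an estimator of $\tau^2$ is
$$\hat\tau^2 = \max\left(0, \frac{Q - \mathrm{df}}{C}\right).$$
3. For each study, define the total variance as $v_j + \hat\tau^2$. Weight each study by the inverse of the estimated total variance, $w_j^* = 1/(v_j + \hat\tau^2)$. (Note that the between-studies variance and the within-studies variances are assumed to be independent of each other.) Add up these weights, $W^* = \sum_{j=1}^{k} w_j^*$.
4. The random-effects estimate of the summary log risk ratio is then
$$\hat y_{\mathrm{pub}} = \frac{\sum_{j=1}^{k} w_j^* y_j}{W^*}, \qquad \widehat{\operatorname{SE}}(\hat y_{\mathrm{pub}}) = \frac{1}{\sqrt{W^*}}.$$
The worked examples of this procedure in [BHHR09] involve log odds ratios, but the procedure is the same for log risk ratios.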
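The DerSimonian & Laird steps above translate directly into a few lines of base R. This is a sketch, not SAMURAI's code: dl_summary is a hypothetical name, and setting $\hat\tau^2 = 0$ for a single study is an added guard (with $k = 1$ the quantity $C$ is zero). metafor's rma(yi, vi, method = "DL") should give the same estimate and $\hat\tau^2$ for the same inputs.

```r
# DerSimonian & Laird random-effects summary from effect sizes y and within-study variances v
dl_summary <- function(y, v, level = 0.95) {
  k    <- length(y)
  w    <- 1 / v                                   # inverse-variance weights
  W    <- sum(w)
  y_fe <- sum(w * y) / W                          # fixed-effect (weighted) mean
  Q    <- sum(w * (y - y_fe)^2)                   # heterogeneity statistic
  C    <- W - sum(w^2) / W
  tau2 <- if (k > 1) max(0, (Q - (k - 1)) / C) else 0   # truncated at zero
  w_re <- 1 / (v + tau2)                          # random-effects weights
  est  <- sum(w_re * y) / sum(w_re)
  se   <- sqrt(1 / sum(w_re))
  z    <- qnorm(1 - (1 - level) / 2)
  list(estimate = est, se = se, tau2 = tau2,
       ci.lb = est - z * se, ci.ub = est + z * se)
}

dl_summary(y = c(0, 1), v = c(0.1, 0.1))$tau2   # 0.4 for this toy input
```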
Step B4: For the unpublished studies, impute the number of events in the control arms, based on the risk of events in the control arms of the published studies. No random variation is used to impute these numbers. We assume that the rates of events in the control arms are the same across all published and unpublished studies. Let $m$ denote the number of unpublished studies included in the meta-analysis, and let
$$\hat p_{0,\mathrm{pub}} = \frac{\sum_{j=1}^{k} x_{0j}}{\sum_{j=1}^{k} n_{0j}}$$
denote the estimated proportion of events across the control arms of all published studies. For the $i$-th unpublished study ($i \in \{1, \dots, m\}$), with $n_{0i}$ persons in the control arm (and $n_{1i}$ persons in the treatment arm), impute the number of events within the control arm to be $x_{0i} = [n_{0i}\,\hat p_{0,\mathrm{pub}}]$, that is, $n_{0i}\,\hat p_{0,\mathrm{pub}}$ rounded to the nearest integer. Repeat for all $m$ unpublished studies.
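Step B4 is deterministic, so a sketch needs only the pooled control-arm proportion and rounding. The helper name impute_ctrl_events and the counts are hypothetical. Note that R's round() rounds exact halves to the even integer, which may differ from the rounding rule used inside SAMURAI.

```r
# Impute control-arm events for unpublished studies from the pooled published control risk
impute_ctrl_events <- function(pub_x0, pub_n0, unpub_n0) {
  p0_pub <- sum(pub_x0) / sum(pub_n0)   # pooled control-arm event proportion
  round(unpub_n0 * p0_pub)              # no random variation
}

impute_ctrl_events(pub_x0 = c(10, 30), pub_n0 = c(100, 100), unpub_n0 = c(50, 73))
# p0_pub = 0.2, so 50 * 0.2 = 10 and 73 * 0.2 = 14.6 -> 10 15
```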
Step B5: Assign risk ratios to all defined outlooks. Defined outlooks include outlooks based on the confidence interval of the log risk ratio of the published studies. The default risk ratios assigned to each of the outlooks, which depend on whether the event is desirable or not, are given in Table 5. Note that outlooks denoted with "CL" are defined according to a confidence interval of the risk ratio of the published studies collectively. (See Step B3.)
Step B6: For each of the unpublished studies, estimate the proportion of events in the intervention arm, based on the risk ratio assigned to the study outlook. Then impute the number of events in the intervention arm randomly from a binomial distribution. Let $m_t \le m$ be the number of unpublished studies in the meta-analysis with outlook $t$, and denote the proportions of events in the treatment and control arms of such studies as $p_{1t}$ and $p_{0t}$, respectively. Note that $\sum_t m_t = m$. Extract the assigned risk ratio $RR_t = p_{1t}/p_{0t}$ for studies with outlook $t$ from Table 5. Estimate the proportion of events in the treatment arms by rearranging the formula for the risk ratio:
$$\hat p_{1t} = RR_t \cdot \hat p_{0,\mathrm{pub}}.$$
(Here $\hat p_{0t} = \hat p_{0,\mathrm{pub}}$, since we have assumed that the true rate of events in the control arm is the same across all studies.) For the $i$-th unpublished study, having outlook $t$ and treatment sample size $n_{1i}$, impute the number of events within the treatment arm to be $x_{1i} = X$, where $X$ is a random draw from a binomial distribution with $n_{1i}$ trials and success probability $\hat p_{1t}$, i.e., with mean $n_{1i}\hat p_{1t}$ and variance $n_{1i}\hat p_{1t}(1 - \hat p_{1t})$.
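A sketch of the random imputation in Step B6, using base R's rbinom(). The helper name impute_expt_events and the risk ratio 0.8 are hypothetical; the actual default risk ratios are those in Table 5, which is not reproduced here. The check that $\hat p_{1t}$ lies in $[0, 1]$ is an added safeguard, because a large $RR_t$ combined with a high control risk could exceed 1.

```r
# Impute treatment-arm events for unpublished studies sharing one outlook risk ratio
impute_expt_events <- function(rr_t, p0_pub, unpub_n1) {
  p1_t <- rr_t * p0_pub                  # rearranged RR_t = p1t / p0t, with p0t = p0_pub
  stopifnot(p1_t >= 0, p1_t <= 1)        # rbinom() needs a valid probability
  rbinom(n = length(unpub_n1), size = unpub_n1, prob = p1_t)
}

set.seed(1)
impute_expt_events(rr_t = 0.8, p0_pub = 0.2, unpub_n1 = c(50, 73))
```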
Step B7: For each unpublished study, calculate the standard error of the log risk ratio using the imputed figures for the numbers of events in the control and intervention arms. Calculate a summary effect across the collection of unpublished studies using a random effects model. Calculate a summary effect across all (published and unpublished) studies in the meta-analysis using a random effects model. These summary effects are calculated using the DerSimonian & Laird method, as detailed under Step B3.
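Step B7 can be sketched by computing log risk ratios for all studies and pooling three subsets with the same DerSimonian & Laird steps as in Step B3. All counts below are hypothetical, with the unpublished rows standing in for values imputed in Steps B4 and B6, and dl_estimate is a hypothetical helper. The sketch does not handle zero event counts, for which the log risk ratio is undefined. In that case, metafor's escalc() by default adds 1/2 to the cells of the affected tables.

```r
# Hypothetical helper: DerSimonian & Laird summary (same steps as Step B3)
dl_estimate <- function(y, v) {
  w    <- 1 / v
  W    <- sum(w)
  Q    <- sum(w * (y - sum(w * y) / W)^2)
  C    <- W - sum(w^2) / W
  tau2 <- if (length(y) > 1) max(0, (Q - (length(y) - 1)) / C) else 0
  w_re <- 1 / (v + tau2)
  c(estimate = sum(w_re * y) / sum(w_re), se = sqrt(1 / sum(w_re)))
}

# Two published studies (observed) and two unpublished studies (imputed counts, made up)
x1 <- c(12, 20, 9, 15);  n1 <- c(100, 150, 50, 73)
x0 <- c(18, 27, 10, 15); n0 <- c(100, 150, 50, 73)
published <- c(TRUE, TRUE, FALSE, FALSE)

y <- log((x1 / n1) / (x0 / n0))            # log risk ratios
v <- 1 / x1 - 1 / n1 + 1 / x0 - 1 / n0     # their approximate variances

summaries <- rbind(
  published   = dl_estimate(y[published],  v[published]),
  unpublished = dl_estimate(y[!published], v[!published]),
  all         = dl_estimate(y, v)
)
exp(summaries[, "estimate"])               # summary risk ratios
```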
Step B8: Graph a forest plot of individual and aggregate results. This is done using the metafor functions forest() and addpoly() [WV10].
Step B9: (Optional) Repeat Steps B1 to B8 for each outlook. The user can generate one plot for each of the ten outlooks defined for unpublished studies (see Table 5) by using the option all.outlooks=TRUE.

For studies with continuous outcomes
Step C1: If the data is in the form of means and standard deviations, convert the data in each study to a standardized mean difference (Hedges' g ).
The following procedure is from [BHHR09].
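The following formula for the pooled within-groups standard deviation is used:
$$S_{\mathrm{within}} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_0 - 1)s_0^2}{n_0 + n_1 - 2}},$$
where $s_0$ and $s_1$ are the standard deviations of the control and experimental arms. Hedges' g is then
$$g = \left(1 - \frac{3}{4(n_0 + n_1 - 2) - 1}\right)\frac{\bar x_1 - \bar x_0}{S_{\mathrm{within}}},$$
where $\bar x_0$ and $\bar x_1$ are the means of the control and experimental arms.
Step C2: Subset the published studies. Impute the variance of the SMD of each study using a 'very good' approximator mentioned by Borenstein [CHV09, 226]. Calculate the summary SMD for the published studies with a random-effects model using the method by DerSimonian & Laird [DS86]. Let $k$ denote the number of published studies included in the meta-analysis, indexed by $j \in \{1, \dots, k\}$.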
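The conversion in Step C1 can be sketched in base R. The function name hedges_g is hypothetical. The difference is taken as experimental minus control, and the correction uses $n_0 + n_1 - 2$ degrees of freedom as in the formula for $g$.

```r
# Hedges' g (experimental minus control) from arm-level means, SDs and sample sizes
hedges_g <- function(m1, s1, n1, m0, s0, n0) {
  df       <- n0 + n1 - 2
  s_within <- sqrt(((n1 - 1) * s1^2 + (n0 - 1) * s0^2) / df)   # pooled within-groups SD
  d        <- (m1 - m0) / s_within                              # Cohen's d
  J        <- 1 - 3 / (4 * df - 1)                              # small-sample correction
  J * d
}

hedges_g(m1 = 5.1, s1 = 1.2, n1 = 40, m0 = 4.6, s0 = 1.3, n0 = 42)
```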
Step C3 (for each published study): Imputing the variance of the SMD can be done as follows if we assume the SMD is equivalent to Hedges' g. Let $\nu = k - 1$ and $J_\nu = 1 - 3/(4\nu - 1)$. $J_\nu$ is known as a correction factor for converting Cohen's d to Hedges' g. Assuming the SMD is equivalent to Hedges' g, convert the SMD to Cohen's d using the formula $d = g/J_\nu$. Then use a 'very good' approximator of the variance $v_d$ of Cohen's d mentioned by Borenstein [CHV09, 226]:
$$v_d = \frac{n_0 + n_1}{n_0 n_1} + \frac{d^2}{2(n_0 + n_1)},$$
where $n_0, n_1$ are as defined in Table 6.
The variance $v_g$ of Hedges' g is then approximated by $v_g = J_\nu^2 v_d$. Then define the lower and upper bounds of the $(1-\alpha)$ confidence interval (with $\alpha \in (0, 1)$) of the SMD as
$$y_j - z_{1-\alpha/2}\sqrt{v_g} \quad\text{and}\quad y_j + z_{1-\alpha/2}\sqrt{v_g},$$
where $y_j$ denotes the SMD of the $j$-th study, and $z_{1-\alpha/2}$ is selected such that, for a standard normal random variable $Z$, $P(|Z| > z_{1-\alpha/2}) = \alpha$.
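A base-R sketch of Step C3 follows. The function name smd_var_ci is hypothetical. The degrees of freedom nu are taken as an explicit argument rather than computed, and the value used in the example call is purely illustrative.

```r
# Variance and confidence interval of an SMD assumed to be Hedges' g (Step C3)
smd_var_ci <- function(g, n0, n1, nu, level = 0.95) {
  J   <- 1 - 3 / (4 * nu - 1)                           # correction factor J_nu
  d   <- g / J                                          # back-convert to Cohen's d
  v_d <- (n0 + n1) / (n0 * n1) + d^2 / (2 * (n0 + n1))  # Borenstein's approximation
  v_g <- J^2 * v_d                                      # variance of Hedges' g
  z   <- qnorm(1 - (1 - level) / 2)                     # z_{1 - alpha/2}
  c(var = v_g, lower = g - z * sqrt(v_g), upper = g + z * sqrt(v_g))
}

smd_var_ci(g = 0.4, n0 = 42, n1 = 40, nu = 80)
```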
Step C4 (for the published studies collectively): To get a summary effect of the published studies using a random-effects model, use the DerSimonian-Laird method [DS86; BHHR09] to calculate $\hat y_{\mathrm{pub}}$, the standardized mean difference (SMD) across all published studies. This is accomplished using the function rma() of the R package metafor with the option method="DL". The default confidence level is 95%.
DerSimonian & Laird method: The steps of the DerSimonian & Laird method are the same as under Step B3, except as follows: 1. For each study, calculate the estimate of the variance of the SMD, $v_j = \widehat{\operatorname{var}}(y_j)$, where $y_j$ is the standardized mean difference in the $j$-th study. (See Step C3.)
Step C5: Assign SMDs to all defined outlooks. Defined outlooks include outlooks based on the confidence interval of the SMD of the published studies collectively. The default SMDs assigned to each of the outlooks, which depend on whether the outcome is desirable or not, are given in Table 7. Note that outlooks denoted with "CL" are defined according to a confidence interval of the SMD of the published studies collectively. (See Step C4.)
Step C6: For each unpublished study, impute the SMD and its variance.
Based on the outlook of the unpublished study, impute the SMD using Table 7. Then employ Borenstein's 'very good' approximator of v d (as we did in Step C3).
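A sketch of Step C6 for the five fixed outlooks. The lookup values in outlook_smd are placeholders, not SAMURAI's defaults (those are in Table 7, not reproduced here), and the "CL" outlooks, which depend on the published confidence interval, are not covered. The helper name impute_unpublished_smd is hypothetical. Converting $v_d$ to $v_g = J_\nu^2 v_d$ mirrors Step C3 and is an assumption about this step.

```r
# Placeholder outlook-to-SMD lookup; NOT the defaults of Table 7
outlook_smd <- c("very positive" = 0.8, "positive" = 0.4, "no effect" = 0,
                 "negative" = -0.4, "very negative" = -0.8)

impute_unpublished_smd <- function(outlook, n0, n1, nu) {
  g   <- unname(outlook_smd[outlook])                  # imputed SMD, treated as Hedges' g
  J   <- 1 - 3 / (4 * nu - 1)
  d   <- g / J                                         # back-convert to Cohen's d
  v_d <- (n0 + n1) / (n0 * n1) + d^2 / (2 * (n0 + n1)) # Borenstein's approximation
  data.frame(outlook = outlook, smd = g, smd.v = J^2 * v_d)
}

impute_unpublished_smd(c("positive", "no effect"), n0 = c(60, 30), n1 = c(60, 30), nu = 10)
```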
Step C7: Calculate a summary effect across the collection of unpublished studies using a random-effects model. Calculate a summary effect across all (published and unpublished) studies in the meta-analysis using a random-effects model. These summary effects are calculated using the DerSimonian & Laird method, as detailed under Step C4.
Step C8: Graph a forest plot of individual and aggregate results. This is done using the metafor functions forest() and addpoly() [WV10].
Step C9: (Optional) Repeat Steps C1 to C8 for each outlook. The user can generate one plot for each of the ten outlooks defined for unpublished studies (see Table 7) by using the option all.outlooks=TRUE.