 Methodology
SAMURAI: Sensitivity analysis of a meta-analysis with unpublished but registered analytical investigations (software)
Systematic Reviews volume 3, Article number: 27 (2014)
Abstract
Background
The non-availability of clinical trial results contributes to publication bias, diminishing the validity of systematic reviews and meta-analyses. Although clinical trial registries have been established to reduce non-publication, the results from over half of all trials registered in ClinicalTrials.gov remain unpublished even 30 months after completion. Our goals were i) to utilize information available in registries (specifically, the number and sample sizes of registered unpublished studies) to gauge the sensitivity of a meta-analysis estimate of the effect size and its confidence interval to the non-publication of studies and ii) to develop user-friendly open-source software to perform this quantitative sensitivity analysis.
Methods
The open-source software, the R package SAMURAI, was developed using R functions available in the R package metafor. The utility of SAMURAI is illustrated with two worked examples.
Results
Our open-source software, SAMURAI, can handle meta-analytic datasets of clinical trials with two independent treatment arms. Both binary and continuous outcomes are supported. For each unpublished study, the dataset requires only the sample sizes of each treatment arm and the user-predicted ‘outlook’ for the study. The user can specify five outlooks ranging from ‘very positive’ (i.e., very favorable towards intervention) to ‘very negative’ (i.e., very favorable towards control).
SAMURAI assumes that control arms of unpublished studies have effects similar to the effect across control arms of published studies. For each experimental arm of an unpublished study, utilizing the user-provided outlook, SAMURAI randomly generates an effect estimate using a probability distribution, which may be based on a summary effect across published trials. SAMURAI then calculates the estimated summary treatment effect with a random effects model (DerSimonian & Laird method), and outputs the result as a forest plot.
Conclusions
To our knowledge, SAMURAI is currently the only tool that allows systematic reviewers to incorporate information about sample sizes of treatment groups in registered but unpublished clinical trials in their assessment of the potential impact of publication bias on meta-analyses. SAMURAI produces forest plots for visualizing how inclusion of registered unpublished studies might change the results of a meta-analysis. We hope systematic reviewers will find SAMURAI to be a useful addition to their toolkit.
Background
Clinicians, policy makers, and patients rely on the results of clinical trials to make informed decisions about health care. Meta-analyses collate and combine results of clinical trials to provide a quantitative summary of available evidence regarding a specific clinical question. Unfortunately, the non-publication of trial results can undermine the ability of meta-analysts to accurately estimate a summary treatment effect [1]. In particular, the non-release of negative or non-significant results may lead to the overestimation of the magnitude of a treatment effect and consequently to false conclusions about the treatment’s efficacy (Figure 1).
For example, a systematic review by Jefferson et al. [2] estimated that 60 percent of patient data from 10 Roche-conducted trials of oseltamivir (sold as ‘Tamiflu’) remained unavailable to reviewers even after two years, despite repeated requests for the data [3]. With so much information unavailable, robust conclusions about the efficacy and the risks of using the drug cannot be drawn.
Recent developments to mitigate publication bias due to the non-publication of studies include i) the formation of public clinical trial registries for the prospective registration of clinical trials, such as ClinicalTrials.gov in the USA and ClinicalTrialsRegister.eu in the European Union, ii) the adoption of a policy by the International Committee of Medical Journal Editors (ICMJE) in 2005 that the journals they oversee would only publish results of clinical trials which have been prospectively registered in a public registry [4], and iii) the passage of the Food and Drug Administration Amendments Act (FDAAA) of 2007 which expanded the requirements for the registry of clinical trials that receive USA federal funding. These measures have led to a dramatic increase in the number and proportion of clinical trials that are prospectively registered [5]. However, researchers are not necessarily obligated to release to the public the results of such trials. Ross et al. [6] found that in a sample of NIH-funded trials registered with ClinicalTrials.gov, fewer than half were published within 30 months of trial completion. Thus, while systematic reviewers may be aware of trials that have been conducted and have details regarding trial methodologies and sample sizes, they may not have timely access to the results to include them in a meta-analysis.
Our goal was to design and program a pragmatic exploratory tool for the systematic reviewer who wishes to visualize the sensitivity of a meta-analytic summary to the addition of one or more registered, unpublished studies (RUSTs). We wanted our software to be open source, easy to use, and with output easy to understand. Our result, which we introduce in this paper, is the R package Sensitivity Analysis of a Meta-analysis with Unpublished but Registered Analytical Investigations (SAMURAI).
Statistical methods used for gauging the potential impact of publication bias, such as the trim and fill method [7] and the Copas selection model [8], already exist. As far as we know, however, no existing method uses the information available in clinical trial registries to assess the sensitivity of a meta-analytic summary effect to RUSTs. Therefore, we believe that SAMURAI can be a useful addition to the systematic reviewer’s toolkit.
Methods
Assumptions
In developing the software application, we assumed the following:
i) The meta-analysis consists of randomized clinical trials, all addressing the same research question and each with two independent intervention arms: an experimental arm and a control arm. ii) For each RUST, the clinical trial registry has information on the sample sizes of both treatment arms. iii) Event rates in the control arms of RUSTs are the same as the pooled event rate across control arms of all published studies in the meta-analysis. iv) Any variation among studies is adequately accounted for by random effects models; no additional information on covariates is necessary in order to explain this variation.
We make no assumptions about the effect size of the experimental arms of RUSTs. Instead, we leave it up to the end-user to decide the anticipated effect size and direction of each RUST. In essence, with our methodology, the end-user acquires the added flexibility and responsibility of predicting anticipated effect sizes and directions.
Description of algorithm for the computer software
The software requires that all published studies in the meta-analytic dataset report their results in a similar format. Either all the studies have binary outcomes or all the studies have continuous outcomes. Studies with binary outcomes should report 2 × 2 tables. Studies with continuous outcomes should all report their results as either i) the mean effect size and its standard deviation within each treatment arm, or ii) the standardized mean difference.
The dataset should also have the sample sizes of both treatment arms for every study, including RUSTs. Entries in ClinicalTrials.gov typically indicate total sample size but not sample sizes for each treatment arm. We recommend simply assuming a 1:1 allocation ratio between the two treatment arms, unless the entry specifies a different allocation ratio.
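As a small illustration of the allocation assumption above, the following sketch splits a registry's total enrollment into two arms. The helper name split_enrollment is hypothetical and is not part of SAMURAI; it only shows one way to derive the per-arm sample sizes before building the dataset.

```r
# Hypothetical helper (not part of SAMURAI): split a registry's total
# enrollment into two arms, given an allocation ratio expt:ctrl.
split_enrollment <- function(total.n, ratio = c(1, 1)) {
  expt.n <- round(total.n * ratio[1] / sum(ratio))
  ctrl.n <- total.n - expt.n            # remainder goes to the control arm
  c(expt.n = expt.n, ctrl.n = ctrl.n)
}

even   <- split_enrollment(120)            # default 1:1 allocation
uneven <- split_enrollment(150, c(2, 1))   # 2:1 allocation stated in the registry
print(even)
print(uneven)
```

When the total is odd, the two arms differ by one participant; the sum always equals the registered total.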
The end-user assigns each RUST an ‘outlook’ specifying the size and direction of the effect. The end-user can choose outlooks from a list of predefined options ranging from ‘very positive’, for which the RUST is anticipated to heavily favor the experimental intervention, to ‘very negative’, for which the RUST is anticipated to heavily favor the control intervention.
We defined ten such outlooks for RUSTs. For five of these outlooks, the summary effect is associated with a fixed number (which is preset to a default value but can be adjusted by the end-user), whereas the other five outlooks are based on the summary effect across the published studies or its confidence interval.
With binary-outcome studies, SAMURAI will impute the relative risk of each RUST according to the outlook assigned to it by the end-user (Table 1). With studies having continuous outcomes in the form of means and standard deviations for each arm, the data are converted to standardized mean differences (SMD). For each RUST, SAMURAI will impute an SMD according to the outlook assigned to it by the end-user (Table 2).
In addition, SAMURAI will impute the variance of the SMD of a RUST using Borenstein’s ad hoc ‘very good’ approximation [9]. We chose Borenstein’s approximation as a matter of convenience, since it requires only the sample sizes of the treatment arms in the unpublished studies and the summary SMD across the published studies, thereby bypassing the need to impute variances for each treatment arm of a RUST.
Based on the end-user’s outlook selection for a RUST, the software imputes an effect size and confidence interval for that RUST, using the predefined effect size associated with that outlook, along with added random noise to reflect uncertainty in the estimate of the unpublished effect size. Once the effect sizes and confidence intervals of the RUSTs are imputed, the software calculates a summary effect using standard meta-analysis methods. By default, the software generates a random effects model using the popular inverse-variance method by DerSimonian and Laird [10]. More mathematical details can be found in Additional file 1, which contains pseudo-algorithmic descriptions of the methodology used by SAMURAI.
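The idea behind imputing a relative risk can be sketched in a few lines of base R. This is only a rough illustration under assumption iii (the RUST's control event rate equals the pooled control event rate of the published studies); it is not SAMURAI's own code, it omits the random noise that SAMURAI adds, and the function name impute_rust_events and the small example data are made up. The exact procedure is described in Additional file 1.

```r
# Hypothetical sketch, not SAMURAI's implementation.
# published:   data frame of published studies with columns ctrl.events, ctrl.n
# rust.expt.n: experimental-arm size of one registered, unpublished study
# rust.ctrl.n: control-arm size of the same study
# rr:          relative risk implied by the outlook chosen for that study
impute_rust_events <- function(published, rust.expt.n, rust.ctrl.n, rr) {
  # Assumption iii: control event rate of the RUST = pooled published rate
  p.ctrl <- sum(published$ctrl.events) / sum(published$ctrl.n)
  # Cap the experimental event rate at 1 so the imputed counts stay valid
  p.expt <- min(1, rr * p.ctrl)
  c(ctrl.events = rust.ctrl.n * p.ctrl,
    expt.events = rust.expt.n * p.expt)
}

# Made-up published data: pooled control event rate = 30 / 150 = 0.2
published <- data.frame(ctrl.events = c(10, 20), ctrl.n = c(50, 100))
imputed <- impute_rust_events(published, rust.expt.n = 40, rust.ctrl.n = 40, rr = 3)
print(imputed)
```

With a relative risk of 3, the expected experimental event rate is three times the pooled control rate, so the imputed study favors the control arm when events are ‘bad’ outcomes.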
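For orientation, a commonly cited large-sample approximation for the variance of a standardized mean difference d is (n1 + n2)/(n1 n2) + d^2 / (2 (n1 + n2)). We assume here, without confirmation from the article, that this is the approximation referred to above; the function name smd_variance is hypothetical.

```r
# Large-sample approximation for the variance of a standardized mean
# difference d. Assumed (not confirmed by the article) to correspond to
# the approximation that SAMURAI uses.
smd_variance <- function(d, n.expt, n.ctrl) {
  (n.expt + n.ctrl) / (n.expt * n.ctrl) + d^2 / (2 * (n.expt + n.ctrl))
}

# Example: an imputed SMD of 0.3 for a study with 50 participants per arm
d  <- 0.3
v  <- smd_variance(d, n.expt = 50, n.ctrl = 50)
ci <- d + c(-1, 1) * qnorm(0.975) * sqrt(v)   # approximate 95% confidence interval
print(v)
print(ci)
```

Only the arm sizes and the SMD enter the formula, which is why no per-arm variances need to be imputed.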
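The DerSimonian and Laird calculation itself is compact enough to sketch in base R. The version below is a minimal illustration of the standard method-of-moments estimator, not the code path SAMURAI uses (SAMURAI relies on metafor); the function name dersimonian_laird and the example numbers are made up.

```r
# Minimal DerSimonian-Laird random effects sketch.
# yi: study effect sizes (e.g., log relative risks or SMDs)
# vi: their sampling variances
dersimonian_laird <- function(yi, vi, level = 0.95) {
  k        <- length(yi)
  w        <- 1 / vi                           # fixed-effect (inverse-variance) weights
  mu.fixed <- sum(w * yi) / sum(w)
  Q        <- sum(w * (yi - mu.fixed)^2)       # Cochran's Q statistic
  denom    <- sum(w) - sum(w^2) / sum(w)
  tau2     <- max(0, (Q - (k - 1)) / denom)    # between-study variance, truncated at 0
  w.re     <- 1 / (vi + tau2)                  # random effects weights
  mu       <- sum(w.re * yi) / sum(w.re)
  se       <- sqrt(1 / sum(w.re))
  z        <- qnorm(1 - (1 - level) / 2)
  c(estimate = mu, lcl = mu - z * se, ucl = mu + z * se, tau2 = tau2)
}

# Made-up log relative risks and variances for four studies
yi  <- c(-0.5, -0.3, -0.6, 0.1)
vi  <- c(0.04, 0.09, 0.05, 0.10)
res <- dersimonian_laird(yi, vi)
print(exp(res[c("estimate", "lcl", "ucl")]))   # back-transformed to the relative risk scale
print(res["tau2"])
```

Imputed RUSTs enter this calculation exactly like published studies: each contributes an effect size and a variance.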
We opted to use a random effects model since it allows statistical inferences to be made about studies other than those in the meta-analysis. In contrast, statistical inferences made using a fixed-effects model are limited to the studies in the meta-analysis [11].
The software outputs the results as forest plots. Each forest plot includes a summary effect across just the published studies, a summary effect across just the unpublished studies (whose outlooks are chosen by the end-user), and a summary effect across all the studies, published and unpublished, along with the between-study variance τ².
Software
We developed our software as an R package instead of as a spreadsheet program for the following reasons: i) R is a widely used and freely available language for statistical computing, and ii) R can produce better-looking and more consistent graphics than a spreadsheet program. In addition, R can readily be used to export graphics to an Adobe Portable Document Format (PDF) file.
Learning to use R may be more difficult than learning how to use a spreadsheet program; however, there are a number of tutorials available online or in print form. We hope that end-users unfamiliar with R will find the worked examples in this article a helpful primer.
Results and worked examples
SAMURAI is available free of charge and it is open-source. The program and code are freely available on the Internet via the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org/web/packages/SAMURAI/index.html. SAMURAI employs functions in the R package metafor [12].
Installing and running SAMURAI
The end-user will first need to install R on their computer, available through CRAN at http://cran.r-project.org. R is available for computers with Linux, Macintosh, or Windows operating systems. Once the end-user has installed R and can run R on their computer, they can install SAMURAI with the following command:
> install.packages('SAMURAI')
(Note that the symbol > represents the R prompt and is not typed by the end-user.)
Then the end-user can begin using SAMURAI after typing in the following command:
> library(SAMURAI)
Formatting of dataset files
End-users of SAMURAI will first need to prepare their dataset as a comma-separated values (CSV) file with specific column headings. This data file can be created in a spreadsheet program such as the open-source LibreOffice Calc or Microsoft Office Excel, then imported into R. Details can be found in Additional file 1.
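A dataset file of this kind can also be written and read back entirely within R, as the following sketch shows. The column names expt.events, expt.n, ctrl.events, and ctrl.n are those used in the worked examples of this article; the columns study, year, and outlook, the outlook labels, and all the values are assumptions made for illustration, so Additional file 1 should be consulted for the exact headings that SAMURAI expects.

```r
# Illustrative dataset with made-up values. The columns study, year, and
# outlook are assumed here; see Additional file 1 for the required headings.
trials <- data.frame(
  study       = c("Trial A", "Trial B", "Trial C"),
  year        = c(1999, 2000, 2001),
  outlook     = c("published", "published", "negative"),
  expt.events = c(12, 8, NA),    # NA: results unavailable for the unpublished study
  expt.n      = c(60, 45, 50),
  ctrl.events = c(20, 15, NA),
  ctrl.n      = c(58, 47, 50)
)

# Write the table to a temporary CSV file, then import it again
csv.path <- tempfile(fileext = ".csv")
write.csv(trials, csv.path, row.names = FALSE)
imported <- read.csv(csv.path, stringsAsFactors = FALSE)
str(imported)
```

Leaving the event counts of the unpublished study empty reflects that only its sample sizes are known from the registry.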
Worked example 1: trials with binary outcomes
We consider a dataset of trials published from 1990 to 2001 comparing counts of non-healing of duodenal ulcer in patients on ulcer-healing drug with H. pylori eradication treatment (experimental arm) versus counts of non-healing in patients on ulcer-healing drug alone (control arm). The example dataset consists of 33 published studies listed in a meta-analysis by Ford et al. [13] (Table 3).
For the purposes of illustration, we will pretend that a subset of these studies were registered but never published, i.e., RUSTs. For each RUST, we remove information on the number of events, but leave in the sample sizes.
A version of the dataset is included in the SAMURAI package as the dataset Hpylori, which differs from the original dataset in Ford et al. [13] in that seven of the 33 studies are treated as unpublished. The end-user may load and view this modified dataset by typing in the following commands at the R prompt:
> data(Hpylori)
> Hpylori
The second of these commands displays the dataset.
The sample sizes of the control and experimental arms are in the columns labeled ctrl.n and expt.n, respectively. The number of events in the control and experimental arms are in the columns labeled ctrl.events and expt.events, respectively.
To generate a forest plot (not shown) for the dataset, we can give the following command:
> forestsens(table=Hpylori, binary=TRUE, higher.is.better=FALSE, scale=0.8)
For reproducible results, one may freeze the random variation by specifying a seed for the random number generator, as in the following example:
> forestsens(table=Hpylori, binary=TRUE, higher.is.better=FALSE, scale=0.8, random.number.seed=106)
With random.number.seed=106 the summary relative risk across all 33 studies is 0.64, with a 95% confidence interval of (0.49, 0.82) (Figure not shown). End-users may get slightly different results depending on which, if any, random number seed they specify.
Since the outcome being measured is binary, we specified binary=TRUE. Furthermore, the events in the dataset indicate the number of patients not healed; we thereby specify higher.is.better=FALSE. Specifying the scale parameter is optional, but in this case helps our plot look neater by reducing font sizes to 80% of the default font size.
Suppose we want to modify the outlooks of all unpublished studies to, say, ‘very negative’. We can do this with the following command:
> forestsens(table=Hpylori, binary=TRUE, higher.is.better=FALSE, scale=0.8, random.number.seed=106, outlook='very negative')
In order to illustrate how an end-user would actually examine the effect of RUSTs and their assumptions of outlooks on the summary effect size, Figure 2 presents the forest plot output from SAMURAI when the last seven studies are considered as RUSTs and the outlook chosen for the ‘unpublished’ studies is ‘very negative’. We see that the effect size now does not significantly differ from 1.
Note that in Figure 2, the relative risks for unpublished studies have values that are close to but not exactly 3, which is the relative risk value assigned to ‘very negative’ studies with binary events for ‘bad’ outcome variables.
As a sensitivity analysis, we compare the summary effect sizes and their 95% confidence intervals under different scenarios. For the purposes of illustration, we pretend that varying subsets of these studies were registered but never published; using the original dataset in Ford et al. [13], we set studies published from 1995 onwards as RUSTs and incrementally consider as published the studies in subsequent years, until reaching 2001. We then create a sensitivity plot of effect sizes as more studies are considered published. We examine the scenarios under which all RUSTs are assigned the same outlook, ranging from ‘very positive’ to ‘very negative’. Figure 3 displays the variation in the estimated effect sizes as studies from 1995 onwards are considered published or not.
One can generate a forest plot for each of the ten outlooks by specifying the option all.outlooks=TRUE, which will assign the same outlook to all RUSTs:
> forestsens(table=Hpylori, binary=TRUE, higher.is.better=FALSE, scale=0.8, random.number.seed=106, all.outlooks=TRUE )
We can put all of these plots into a single PDF file (in this case with the name filename.pdf) with the following commands:
> pdf('filename.pdf')
> forestsens(table=Hpylori, binary=TRUE, higher.is.better=FALSE, scale=0.8, random.number.seed=106, all.outlooks=TRUE )
> dev.off()
When the parameter all.outlooks=TRUE is specified, the function forestsens also generates a table of overall summary effects, their confidence intervals (from the lower confidence limit lcl to the upper confidence limit ucl), and an estimate of τ² (tau2), which is a measure of heterogeneity between studies.
The forestsens function allows the end-user to override the default relative risk assigned to any outlook. For example, if we want to change the relative risks of ‘very negative’ studies from the default of 3 to 2.5, and to change the relative risks of ‘negative’ studies from 2 to 1.5, we can do so as follows:
> forestsens(table=Hpylori, binary=TRUE, higher.is.better=FALSE, scale=0.8, random.number.seed=106, all.outlooks=TRUE, rr.neg=1.5, rr.vneg=2.5 )
The results are included in Table 4.
Worked example 2: trials with continuous outcomes
A systematic review by Jurgens et al. [14] included a meta-analysis of the effect of green tea consumption on weight loss using 14 placebo-controlled randomized trials published between 2004 and 2010.
Of these fourteen studies, we shall, for the purposes of our example, arbitrarily treat the three studies published from 2009 onward as RUSTs. Thus the dataset greentea, included in the SAMURAI package, contains 11 published studies and three RUSTs (Table 5). This dataset can be loaded into memory and viewed by typing in the following commands:
> data(greentea)
> greentea
The sample sizes of the control and experimental arms are in the columns labeled ctrl.n and expt.n, respectively. The mean weight loss in control and experimental arms are in the columns labeled ctrl.mean and expt.mean, respectively. Their respective standard deviations are ctrl.sd and expt.sd.
We can generate a forest plot for the dataset with the following command:
> forestsens(greentea, binary=FALSE, mean.sd=TRUE, higher.is.better=FALSE)
Since the outcome being measured is continuous (and hence not binary), we specified binary=FALSE. Furthermore, since the outcome data are in the form of means and standard deviations, we specify mean.sd=TRUE.
In this example, a more negative change in weight is desired. That is to say, we desire that the weight (outcome) in the experimental arm will be lower than the weight (outcome) in the control arm. When a lower outcome is desired, as in this case, choose the option higher.is.better=FALSE (conversely, if a higher outcome is desired, choose the option higher.is.better=TRUE).
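To make the sign convention concrete, the sketch below computes a standardized mean difference from arm-level means and standard deviations using the textbook pooled-standard-deviation formula (Cohen's d). It is not necessarily the exact estimator SAMURAI applies, which may include a small-sample correction; the function name smd_from_means and the numbers are made up.

```r
# Cohen's d with a pooled standard deviation: a textbook formula, shown only
# to illustrate the direction of the effect.
smd_from_means <- function(expt.mean, expt.sd, expt.n, ctrl.mean, ctrl.sd, ctrl.n) {
  sd.pooled <- sqrt(((expt.n - 1) * expt.sd^2 + (ctrl.n - 1) * ctrl.sd^2) /
                      (expt.n + ctrl.n - 2))
  (expt.mean - ctrl.mean) / sd.pooled
}

# Made-up weight changes (kg): the experimental arm loses more weight
d <- smd_from_means(expt.mean = -2.0, expt.sd = 2.5, expt.n = 40,
                    ctrl.mean = -0.5, ctrl.sd = 2.5, ctrl.n = 40)
print(d)
```

Here d is negative, which favors the experimental arm when a lower outcome is desired (higher.is.better=FALSE).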
Again, to make these results reproducible, we can specify a random.number.seed with any integer, as in the following example:
> forestsens(greentea, binary=FALSE, mean.sd=TRUE, higher.is.better=FALSE, random.number.seed=52 )
Suppose we want to modify the outlooks of all unpublished studies to, say, ‘negative’. We can do this with the following command:
> forestsens(greentea, binary=FALSE, mean.sd=TRUE, higher.is.better=FALSE, random.number.seed=52, outlook='negative')
Note that in Figure 4, the SMD for each unpublished study is close to but not exactly 0.3, which is the SMD assigned to ‘negative’ studies when a lower outcome is preferable.
As a sensitivity analysis, we compare the summary effect sizes and their 95% confidence intervals under different scenarios. As was done for the Hpylori dataset in Worked Example 1 (Figure 3), we pretend that varying subsets of the studies in the greentea dataset were registered but never published; using the original dataset in Jurgens et al. [14], we set studies published from 2006 onwards as RUSTs and incrementally consider as published the studies in subsequent years, until reaching 2010. We then create a sensitivity plot of effect sizes as more studies are considered published. We examine the scenarios under which all RUSTs are assigned the same outlook, from ‘very positive’ to ‘very negative’. Figure 5 displays the variation in the estimated effect sizes as studies from 2006 onwards are considered published or not.
We can generate a forest plot for each of the ten outlooks in Table 2 with the option all.outlooks=TRUE.
> forestsens(greentea, binary=FALSE, mean.sd=TRUE, higher.is.better=FALSE, random.number.seed=52, all.outlooks=TRUE )
This command also generates a table of overall SMDs, their confidence intervals (from the lower confidence limit lcl to the upper confidence limit ucl), and an estimate of τ² (tau2), which is a measure of heterogeneity between studies (Table 6).
Results and discussion
Discussion of worked examples
The sensitivity analyses in the worked examples indicate that successful approximation of the actual summary effect depends on i) which outlooks the end-user selects (Tables 4 and 6) and ii) the ratio of number of published studies to RUSTs (Figures 3 and 5).
Figures 3 and 5 in the worked examples illustrate how close the estimated summary effect may come to the actual summary effect. This accuracy, however, depends largely on the end-user’s success in predicting the outcomes of registered, unpublished studies. As with any software, the ‘garbage in, garbage out’ principle applies to the use of SAMURAI. The end-user has the flexibility of choosing anticipated effect sizes and directions but also thereby assumes the burden of responsibility for making these choices.
Worked Example 1
By the year 2001, all 33 studies in the original dataset compiled by Ford et al. [13] had been published. As determined by the DerSimonian-Laird method, the summary relative risk was 0.66, and its 95% confidence interval was (0.58, 0.76) (Figure 3). The relative risk and its confidence interval are below 1.0, indicating that the proportion of patients not healed by the experimental intervention was lower than the proportion of patients not healed by the control intervention (recall that the events counted in this dataset represent numbers of patients not healed). Thus, the summary relative risk favors the experimental treatment.
Now, suppose it is the year 1998, when five of the studies remain unpublished. Table 4 shows that treating all studies published on or after 1998 as RUSTs with a ‘negative’ or ‘very negative’ outlook leads to an overall summary effect with a confidence interval that straddles a relative risk of 1.0, which corresponds to having no statistically significant difference between the two treatment arms at the 0.05 level. After changing the relative risk for a ‘negative’ study from the default value of 2 to 1.5, we see that the confidence interval obtained when all unpublished studies have a ‘negative’ outlook no longer straddles a relative risk of 1.0. Thus, we see that the summary effect size is sensitive to the outlooks chosen by the end-user.
We can also see in Figure 3 that choosing a ‘very negative’ outlook would have suggested the non-significance of a treatment effect as early as 1998. Similarly, choosing a ‘negative’ outlook for all RUSTs would have suggested the non-significance of the treatment effect as early as 1997, when seven of the studies were unpublished.
Now suppose it is the year 1995, when 15 of these studies have been published while 18 studies yet remain as RUSTs, registered but unpublished. If we then choose to assign these 18 RUSTs the outlook ‘very negative’, then the summary relative risk and its 95% confidence interval would be above 1.0, thereby favoring the control intervention (as the proportion of patients not healed by the experimental intervention would exceed the proportion of patients not healed by the control intervention). If we instead elected to assign all 18 RUSTs the outlook ‘negative’, then the 95% confidence interval of the summary relative risk would straddle 1.0, suggesting there would be no statistically significant difference between the control and the experimental interventions. On the other hand, had we chosen ‘no effect’, ‘positive’, or ‘very positive’, then the summary relative risk and its 95% confidence interval would still be below 1.0, thereby favoring the experimental intervention. Thus, we see that the summary effect size is also sensitive to the ratio of published studies to RUSTs.
Worked Example 2
The summary treatment effect calculated for the original dataset compiled by Jurgens et al. [14] is –0.61, with a 95% confidence interval of (–1.10, –0.11). A standardized mean difference of zero corresponds to having no detectable difference between treatments. This result suggests that participants in the experimental arm lost more weight than participants in the control arm.
Table 6 shows that treating all studies published on or after 2009 as RUSTs with a ‘very positive’ or ‘positive’ outlook leads to an overall summary effect with a confidence interval that straddles an SMD of 0.0, which corresponds to having no statistically significant difference between the two intervention arms. In contrast, choosing any of the other eight outlooks does not result in a confidence interval that straddles 0.0. Thus, we see that the summary effect size is sensitive to the outlooks chosen by the end-user.
In Figure 5, we can see that successful approximation of the actual summary effect depends on which outlooks the end-user selects and on the ratio of number of published studies to RUSTs. Choosing a ‘very negative’ or ‘negative’ outlook would have suggested a statistically significant treatment effect until 2009, when only one RUST was left. Choosing a ‘no effect’ outlook would not have suggested a statistically significant treatment effect until 2008 when three RUSTs remained. In contrast, choosing a ‘positive’ or ‘very positive’ outlook would have suggested a statistically significant treatment effect at least as early as 2006.
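The check for whether a confidence interval ‘straddles’ the null value is simple to express in R. The helper below is hypothetical (it is not a SAMURAI function); it uses 1 as the null value for relative risks and 0 for standardized mean differences, and the first two intervals are the ones reported in this article for the complete datasets.

```r
# Hypothetical helper: does a confidence interval include the null value?
# Use null = 1 for relative risks and null = 0 for standardized mean differences.
straddles_null <- function(lcl, ucl, null = 1) {
  lcl <= null & null <= ucl
}

straddles_null(0.58, 0.76)               # FALSE: all 33 ulcer trials (relative risk)
straddles_null(-1.10, -0.11, null = 0)   # FALSE: all 14 green tea trials (SMD)
straddles_null(0.90, 1.40)               # TRUE: made-up interval that includes 1
```

Because the function is vectorized, it can also be applied to the lcl and ucl columns of a table of results, one row per outlook.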
Comparison with existing methods and software
Out of all his numerous press conferences after 9/11, perhaps the best known quote by former US Secretary of Defense Donald Rumsfeld was this Socratic idea: “There are known knowns; there are things we know we know. We also know there are known unknowns; that is to say, we know there are some things we don’t know. But there are also unknown unknowns; the ones we don’t know we don’t know.”
Prior to the establishment of clinical trial registries, assessing the impact of unpublished studies on a meta-analysis entailed making broad assumptions about the distribution of ‘unknown unknowns’. The trim and fill method by Duval & Tweedie [7] works under the assumption that the distribution of all effect sizes, published and unpublished, is symmetrical around the true mean. Selection models, such as the Copas selection model [8], assume that publication is conditionally dependent on effect size, i.e., the data are assumed to be missing at random.
The establishment of clinical trial registries now holds the promise of reducing the proportion of ‘unknown unknown’ effect sizes determined by unpublished studies and increasing the proportion of ‘known unknown’ effect sizes determined by RUSTs. However, this ideal is far from the current reality, for laws and regulations have not been thorough enough to ensure full and timely disclosure of completed trial results.
Discrepancies in reporting standards have allowed reporting bias to continue. Some data elements required by the ICMJE are optional for ClinicalTrials.gov and for the FDAAA of 2007, including study completion date and reasons why a study was stopped [15]. Also, the ICMJE requires registration prior to enrollment of the first participant, while ClinicalTrials.gov allows registration ‘at any time’, even after study completion. Ross et al. [16] found that out of a sample of registered trials completed prior to 2004, only 60% had their results published within 4 years. Mathieu et al. [17] found prevalence of selective reporting among adequately registered published studies, as evidenced by discrepancies between study outcomes registered and study outcomes published in 31% of those studies.
Furthermore, the increasing number of clinical trials being conducted outside of the US contributes to retrieval bias. FDAAA 801 only applies to ‘applicable clinical trials’, including trials with at least one site in the US. As clinical trials are increasingly held overseas, it is unclear how quickly the laws of other nations will catch up and require sufficient registration of these as well. The Trial and Experimental Studies Transparency Act of 2012 was introduced to the US Congress in May 2012 for the purpose of closing loopholes in the FDAAA; as of December 2013, it remains in committee, not having yet been approved for a vote on the House floor.
In the current landscape of clinical trial registries with its mix of ‘known unknowns’ and ‘unknown unknowns’, modeling may become too complicated to implement. Copas notes the limitations of selection models: “[P]ublication may depend on study size as well as many other features of the study’s design and outcome. However, attempts to fit more complicated selection models seem problematical, since when the number of studies is small (as is often the case in practice) the information in the data is very limited. No model will reflect all the reasons why some papers are selected and some are not” [18].
We have designed SAMURAI to incorporate enrollment information (required by FDAAA), but have forgone the theoretical rigor of previously existing methods in order to pursue the pragmatic goal of developing an exploratory tool for systematic reviewers (including non-statisticians) wishing to illustrate the potential, if not necessarily the most probable, impact of ‘known unknown’ RUSTs on a meta-analytic summary effect. That is, SAMURAI does not on its own conduct a probabilistic sensitivity analysis, but rather, it produces output that could be used in, say, a ‘best-case, worst-case’ analysis. As far as we know, there exists no other software that allows the end-user to include sample sizes of unpublished studies. However, we have largely left it up to the systematic reviewer as to how to address issues stemming from the unfulfilled potential of clinical trial registries.
We have designed SAMURAI on the premise that the requirements by the ICMJE will eventually be incorporated into law. Thus we make the following assumptions: i) all studies will be registered before commencement; ii) all registered studies are easily accessible to the systematic reviewers; and iii) all results found by registered studies will be released in a timely manner. We thereby encourage systematic reviewers to use SAMURAI to make interim meta-analyses but not for drawing final conclusions. As noted by Deborah Zarin, director of ClinicalTrials.gov, “[J]ournals will continue to add value by publishing useful and readable trial reports that clinicians, the media, and patients can interpret and use… [T]he results disclosed for the FDA will not have been externally peer reviewed and will be preliminary” [19].
While the default methodology of SAMURAI is ad hoc, SAMURAI also allows end-users to impute effect sizes with values of their choosing rather than with the default values. The end-user may elect to use a model with assumptions about ‘unknown unknowns’ to generate these values; in so doing the end-user should be aware of the assumptions associated with that model.
While we hope that the consideration of RUSTs will, in the long term, sufficiently address publication bias, we acknowledge that the current reality is far from ideal. And while our approach leaves much of the burden of imputation up to the systematic reviewer, we hope that this burden will be lifted by regulations requiring more transparency on the part of study investigators.
No matter what approach is taken, the systematic reviewer should keep in mind that “high quality syntheses require considerably more than just the application of quantitative procedures” [16]. It is still incumbent upon systematic reviewers to examine clinical trials for other kinds of biases, such as trial designs that rig experimental treatment dosages to be much larger than control treatment dosages [20]. Thus, the estimates of confidence intervals presented in forest plots generated using SAMURAI should be regarded with some caution. As Copas notes, “If we see the aim of sensitivity analysis in terms of an informal warning of how sensitively the conclusions from a meta-analysis can depend on selection, then, arguably, the numerical accuracy of these intervals for specific values of P < 1 is not particularly important” [18]. After all, ‘known unknowns’ are still unknowns.
Assumptions and limitations
Pigott differentiates between the following types of missing data: missing studies, missing effect sizes, and missing covariates [21]. SAMURAI handles missing effect sizes under the premise that all relevant missing studies have been identified. SAMURAI does not handle missing covariates, but rather assumes that a random effects model without covariates adequately accounts for variation between studies.
Some additional assumptions were listed in the Methods section. As a result, the justification for or utility of SAMURAI may be weakened in cases where i) there exist a large number of unpublished and unregistered trials, ii) event rates of control arms of unpublished studies with binary outcomes may be substantially different from the event rate across the control arms of published studies, or iii) heterogeneity among studies is severe enough to question whether stratification of studies would be more appropriate (an easy workaround for severe heterogeneity, however, may be to compute a summary effect for each stratum).
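The stratification workaround mentioned in iii) can be sketched as follows, again in base R. The data, the stratum labels, and the helper dl_pool are hypothetical and chosen only so that the pooled analysis shows substantial heterogeneity while each stratum is internally consistent; this is not output from SAMURAI.

```r
# DerSimonian-Laird pooling that also reports I^2 (in percent).
dl_pool <- function(yi, vi) {
  wi   <- 1 / vi
  k    <- length(yi)
  mu_f <- sum(wi * yi) / sum(wi)
  Q    <- sum(wi * (yi - mu_f)^2)              # Cochran's Q
  C    <- sum(wi) - sum(wi^2) / sum(wi)
  tau2 <- max(0, (Q - (k - 1)) / C)
  I2   <- if (Q > 0) max(0, (Q - (k - 1)) / Q) * 100 else 0
  ws   <- 1 / (vi + tau2)
  est  <- sum(ws * yi) / sum(ws)
  se   <- sqrt(1 / sum(ws))
  c(k = k, estimate = est,
    ci.lb = est - qnorm(0.975) * se,
    ci.ub = est + qnorm(0.975) * se,
    tau2 = tau2, I2 = I2)
}

# Hypothetical effect sizes with a stratifying variable; not real data.
dat <- data.frame(
  stratum = rep(c("low dose", "high dose"), each = 3),
  yi      = c(0.10, 0.15, 0.05, 0.60, 0.70, 0.65),
  vi      = rep(0.02, 6)
)

# One summary effect over all studies, then one per stratum.
overall    <- dl_pool(dat$yi, dat$vi)
by_stratum <- t(sapply(split(dat, dat$stratum),
                       function(s) dl_pool(s$yi, s$vi)))

print(round(overall, 3))
print(round(by_stratum, 3))
```

When the per-stratum summaries are homogeneous but the overall analysis is not, reporting the strata separately is usually easier to defend than a single pooled estimate.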
DerSimonian and Kacker [22] have proposed alternative random effects modeling approaches as improvements upon the commonly used DerSimonian-Laird method. Future software could implement the more complex but less widespread methods they proposed.
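To give a sense of what an alternative estimator of the between-study variance looks like, the sketch below compares the DerSimonian-Laird estimate with an iterative Paule-Mandel estimate, which solves for the value of tau^2 at which the generalized Q statistic equals its degrees of freedom. Paule-Mandel is used here as one well-known iterative alternative; that it corresponds to the specific proposals in [22] is an assumption, and the data and function names are hypothetical. In metafor, the `method` argument of `rma()` offers "DL" and "PM" options for the same purpose.

```r
# Generalized Q statistic evaluated at a candidate between-study variance tau2.
gen_q <- function(tau2, yi, vi) {
  w  <- 1 / (vi + tau2)
  mu <- sum(w * yi) / sum(w)
  sum(w * (yi - mu)^2)
}

# DerSimonian-Laird (one-step, method-of-moments) estimate of tau^2.
tau2_dl <- function(yi, vi) {
  w <- 1 / vi
  C <- sum(w) - sum(w^2) / sum(w)
  max(0, (gen_q(0, yi, vi) - (length(yi) - 1)) / C)
}

# Paule-Mandel (iterative) estimate: the tau^2 at which gen_q equals k - 1.
tau2_pm <- function(yi, vi) {
  df <- length(yi) - 1
  if (gen_q(0, yi, vi) <= df) return(0)        # no excess heterogeneity
  upper <- 1
  while (gen_q(upper, yi, vi) > df) upper <- 2 * upper   # bracket the root
  uniroot(function(t2) gen_q(t2, yi, vi) - df,
          lower = 0, upper = upper, tol = 1e-10)$root
}

# Random-effects summary for a given tau^2.
pool <- function(yi, vi, tau2) {
  w <- 1 / (vi + tau2)
  c(estimate = sum(w * yi) / sum(w), se = sqrt(1 / sum(w)), tau2 = tau2)
}

# Hypothetical effect sizes and variances; not real data.
yi <- c(0.10, 0.30, 0.35, 0.65, 0.45)
vi <- c(0.01, 0.03, 0.05, 0.02, 0.05)

comparison <- rbind(DL = pool(yi, vi, tau2_dl(yi, vi)),
                    PM = pool(yi, vi, tau2_pm(yi, vi)))
print(round(comparison, 4))
```

The two estimators generally give different values of tau^2 when study variances are unequal, which in turn changes the width of the summary confidence interval.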
Conclusions
With the increase in the number of registered clinical trials, systematic reviewers are now more likely to acquire evidence of the existence of unpublished studies. However, they have not yet had a statistical approach to incorporate this information into quantitative analyses. SAMURAI could prove a useful tool for reviewers to integrate information from unpublished studies to assess the potential impact of publication bias on the results of meta-analyses.
Availability and requirements
Project name: SAMURAI
Project home page: http://cran.r-project.org/web/packages/SAMURAI/index.html
Operating system(s): Platform independent
Programming language: R
License: GNU GPL 2 or 3
Package installation
We have written SAMURAI as an R package, and have made the SAMURAI package available free of charge under an open source General Public License (GPL). Users of SAMURAI will also need to install and load the existing R package metafor [11].
Using R GUI or RStudio, one can install these packages with the following commands:
> install.packages("metafor")
> install.packages("SAMURAI")
Alternatively, one may download the SAMURAI and metafor packages from the Comprehensive R Archive Network (CRAN) at the following websites:
Abbreviations
FDAAA: Food and Drug Administration Amendments Act
ICMJE: International Committee of Medical Journal Editors
RUST: Registered unpublished study
SMD: Standardized mean difference
References
1. Thornton A, Lee P: Publication bias in meta-analysis: its causes and consequences. J Clin Epid. 2000, 53: 207-216. 10.1016/S0895-4356(99)00161-4.
2. Jefferson T, Jones MA, Doshi P, Del Mar CB, Heneghan CJ, Hama R, Thompson MJ: Neuraminidase inhibitors for preventing and treating influenza in healthy adults and children. Cochrane Database Syst Rev. 2012, 1: CD008965.
3. Godlee F: Open letter to Roche about oseltamivir trial data. BMJ. 2012, 345: e7305. 10.1136/bmj.e7305.
4. De Angelis CD, Drazen JM, Frizelle FA, Haug C, Hoey J, Horton R, Kotzin S, Laine C, Marusic A, Overbeke AJ, Schroeder TV, Sox HC, Van Der Weyden MB: Is this clinical trial fully registered? A statement from the International Committee of Medical Journal Editors. Lancet. 2005, 365: 1827-1829. 10.1016/S0140-6736(05)66588-9.
5. Zarin DA, Tse T, Ide NC: Trial registration at ClinicalTrials.gov between May and October 2005. N Engl J Med. 2005, 353: 2779-2787. 10.1056/NEJMsa053234.
6. Ross JS, Tse T, Zarin DA, Xu H, Zhou L, Krumholz HM, Hines HH: Publication of NIH funded trials registered in ClinicalTrials.gov: cross sectional analysis. BMJ. 2012, 344: d7292. 10.1136/bmj.d7292.
7. Duval S, Tweedie R: Trim and fill: a simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics. 2000, 56: 455-463. 10.1111/j.0006-341X.2000.00455.x.
8. Copas JB: What works? Selectivity models and meta-analysis. J Roy Stat Soc Ser A (Stat Soc). 1999, 162: 95-109. 10.1111/1467-985X.00123.
9. Borenstein M: Effect sizes for continuous data. The Handbook of Research Synthesis and Meta-Analysis. Edited by: Cooper H, Hedges LV, Valentine JC. 2009, New York: Russell Sage, 221-235. 2nd edition.
10. DerSimonian R, Laird N: Meta-analysis in clinical trials. Control Clin Trials. 1986, 7: 177-188. 10.1016/0197-2456(86)90046-2.
11. Ellis PD: The Essential Guide to Effect Sizes. 2010, Cambridge: Cambridge University Press.
12. Viechtbauer W: Conducting meta-analyses in R with the metafor package. J Stat Software. 2010, 36: i03.
13. Ford AC, Delaney B, Forman D, Moayyedi P: Eradication therapy for peptic ulcer disease in Helicobacter pylori positive patients. Cochrane Database Syst Rev. 2011, 1: CD003840.
14. Jurgens TM, Whelan AM, Killian L, Doucette S, Kirk S, Foy E: Green tea for weight loss and weight maintenance in overweight or obese adults. Cochrane Database Syst Rev. 2011, 12: CD008650.
15. ClinicalTrials.gov Protocol Data Element Definitions. 2013, [http://prsinfo.clinicaltrials.gov/definitions.html]
16. Ross JS, Mulvey GK, Hines EM, Nissen SE, Krumholz HM: Trial publication after registration in ClinicalTrials.gov: a cross-sectional analysis. PLoS Med. 2009, 6: e1000144. 10.1371/journal.pmed.1000144.
17. Mathieu S, Boutron I, Moher D, Altman DG, Ravaud P: Comparison of registered and published primary outcomes in randomized controlled trials. JAMA. 2009, 302: 1532.
18. Copas JB: A likelihood-based sensitivity analysis for publication bias in meta-analysis. J Roy Stat Soc Ser C (Appl Stat). 2013, 62: 47-66. 10.1111/j.1467-9876.2012.01049.x.
19. Zarin D: Update on FDAAA from ClinicalTrials.gov: Basic Results Reporting at ClinicalTrials.gov and ‘Prior Publication’. 2008, [http://www.icmje.org/newsandeditorials/update_fdaaa_jun2008.html]
20. Angell M: The Truth About the Drug Companies: How They Deceive Us and What to Do About It. 2004, New York: Random House.
21. Pigott TD: Handling missing data. The Handbook of Research Synthesis and Meta-Analysis. Edited by: Cooper H, Hedges LV, Valentine JC. 2009, New York: Russell Sage, 399-416. 2nd edition.
22. DerSimonian R, Kacker R: Random-effects model for meta-analysis of clinical trials: an update. Contemp Clin Trials. 2007, 28: 105-114. 10.1016/j.cct.2006.04.004.
Acknowledgements
The research leading to these results has received funding from the European Union’s Seventh Framework Programme (FP7/2007-2013) under grant agreement n°282574. We thank Guido Schwarzer for peer-reviewing the code of a preliminary version of SAMURAI and for making helpful suggestions. We also thank Gerta Rucker and Wynanda van Enst for their time and thoughtful comments as referees of this paper.
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
NYK, conception and design, data collection and analysis, manuscript writing, critical revision, final approval of the manuscript. SIB, conception and design, data collection and analysis, manuscript writing, critical revision, final approval of the manuscript. KT, conception and design, critical revision, final approval of the manuscript. GG, conception and design, data collection and analysis, manuscript writing, critical revision, final approval of the manuscript.
Keywords
 Clinical trial registries
 Meta-analysis
 Publication bias
 R software
 Sensitivity analysis
 Statistical software
 Unpublished studies