Network meta-analysis incorporating randomized controlled trials and non-randomized comparative cohort studies for assessing the safety and effectiveness of medical treatments: challenges and opportunities
Systematic Reviews volume 4, Article number: 147 (2015)
Network meta-analysis is increasingly used to allow comparison of multiple treatment alternatives simultaneously, some of which may not have been compared directly in primary research studies. The majority of network meta-analyses published to date have incorporated data from randomized controlled trials (RCTs) only; however, inclusion of non-randomized studies may sometimes be considered. Non-randomized studies can complement RCTs or address some of their limitations, such as short follow-up time, small sample size, highly selected population, high cost, and ethical restrictions. In this paper, we discuss the challenges and opportunities of incorporating both RCTs and non-randomized comparative cohort studies into network meta-analysis for assessing the safety and effectiveness of medical treatments. Non-randomized studies with inadequate control of biases such as confounding may threaten the validity of the entire network meta-analysis. Therefore, identification and inclusion of non-randomized studies must balance their strengths with their limitations. Inclusion of both RCTs and non-randomized studies in network meta-analysis will likely increase in the future due to the growing need to assess multiple treatments simultaneously, the availability of higher quality non-randomized data and more valid methods, and the increased use of progressive licensing and product listing agreements requiring collection of data over the life cycle of medical products. Inappropriate inclusion of non-randomized studies could perpetuate the biases that are unknown, unmeasured, or uncontrolled. However, thoughtful integration of randomized and non-randomized studies may offer opportunities to provide more timely, comprehensive, and generalizable evidence about the comparative safety and effectiveness of medical treatments.
Many medical conditions exist for which there are multiple treatment options. Meta-analysis is a widely used approach for aggregating results from multiple studies to provide more robust evidence on the safety and effectiveness of various treatments. However, pair-wise meta-analysis considers only two treatments at a time. Accordingly, new meta-analytic methods have emerged to permit simultaneous comparison of multiple treatment options across studies that compare two or more treatments. These methods are most commonly referred to as network meta-analysis (NMA) [2, 3].
Although earlier NMAs included only randomized controlled trials (RCTs), recent NMAs have begun to consider both RCTs and non-randomized studies [5–9]. In this paper, we describe NMA involving both RCTs and non-randomized comparative cohort studies, defined as cohort studies that compare two or more treatment alternatives (which may include usual care or no treatment) using observational data. We discuss some of the promises and challenges, highlight the potential application of NMA in multi-center distributed data networks, and offer insights on opportunities for improving the application of this methodology.
Introduction to network meta-analysis
A network meta-analysis (sometimes called mixed or multiple treatments meta-analysis) is a method for comparing more than two interventions, some of which may not have been compared directly head-to-head in the same study (Fig. 1) [2, 3, 9–13]. The key assumption underlying any NMA is exchangeability of the studies [2, 3, 14]. That is, all studies measure the same underlying relative treatment effects, and any observed differences are due to chance. Stated another way, all treatments included in the NMA could have been included in the same study, and treatments are genuinely competing interventions [2, 3, 14]. For example, in Fig. 1, AC trials do not have B arms and AB trials do not have C arms; however, the assumption underlying an NMA is that if an AB trial had included a C arm, it would have measured the same underlying relative effect for AC as the AC trials included in the network.
To assess exchangeability, one can collect information about the studies and carefully consider whether they appear similar enough to be compared based on inspection of this information (Fig. 1) [2, 3, 14]. Although this approach is intuitive, it can sometimes be subjective. Another way to assess exchangeability is to compare the event rate in the common treatment arm(s) [2, 3, 14]. Similar event rates may provide some reassurance that the populations are comparable. However, even if the rates differ, the exchangeability assumption may still hold if the populations do not differ in characteristics that are modifiers of the treatment effect.
Lack of exchangeability in NMA can produce a discrepancy between the treatment effects estimated from direct evidence (solid lines in panel a of Fig. 1) and indirect evidence (dashed lines in panel a of Fig. 1); this discrepancy is sometimes known as inconsistency. There are various statistical methods to evaluate inconsistency when closed loops are available (i.e., both direct and indirect evidence are available for a comparison), although issues such as low statistical power may limit the applicability of some of these methods.
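The comparison of direct and indirect evidence in a closed loop can be illustrated with a minimal sketch. It assumes effect estimates on an additive scale (e.g., log odds ratios) with known standard errors, forms the indirect estimate for C versus B through the common comparator A, and tests the difference from the direct estimate with a normal approximation. The function names and all numbers are hypothetical and are not taken from the studies discussed in this paper.

```python
import math

def indirect_estimate(d_ab, se_ab, d_ac, se_ac):
    """Indirect estimate of C versus B through the common comparator A.

    d_ab and d_ac are effects of B versus A and C versus A on an additive
    scale (e.g., log odds ratios). Assumes the two sets of trials are
    independent, so the variances add.
    """
    d_bc = d_ac - d_ab
    se_bc = math.sqrt(se_ab ** 2 + se_ac ** 2)
    return d_bc, se_bc

def inconsistency_test(d_direct, se_direct, d_indirect, se_indirect):
    """Difference between direct and indirect estimates with a z-test."""
    diff = d_direct - d_indirect
    se_diff = math.sqrt(se_direct ** 2 + se_indirect ** 2)
    z = diff / se_diff
    # Two-sided p-value under a standard normal reference distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return diff, se_diff, z, p_value

# Hypothetical log odds ratios (illustration only).
d_bc_ind, se_bc_ind = indirect_estimate(d_ab=-0.30, se_ab=0.10,
                                        d_ac=-0.50, se_ac=0.12)
diff, se_diff, z, p_value = inconsistency_test(d_direct=-0.05, se_direct=0.15,
                                               d_indirect=d_bc_ind,
                                               se_indirect=se_bc_ind)
print(f"indirect C vs B: {d_bc_ind:.3f} (SE {se_bc_ind:.3f})")
print(f"direct minus indirect: {diff:.3f} (SE {se_diff:.3f}), p = {p_value:.3f}")
```

Because the standard error of the difference combines the uncertainty of three sets of studies, a test of this kind often has low power, which is consistent with the caveat noted above.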
Rationale and caveats for including non-randomized studies in NMA
With a sufficiently large sample, well-designed RCTs are expected to achieve high internal validity by balancing all measured and unmeasured prognostic factors across intervention groups through random allocation [11, 16]. However, RCTs are not without their limitations. They often have short follow-up time, small sample size, highly selected populations, high cost, and ethical constraints on studying certain treatments or populations. Well-designed, high-quality non-randomized studies can complement RCTs or address some of their limitations (Table 1) [17–20]. These studies may have longer follow-up time, larger sample size, and more generalizable populations who receive various treatments in real-world settings.
When considering the inclusion of both RCTs and non-randomized studies in NMA, the quality of evidence underpinning a network should be carefully assessed for each pair-wise comparison in the network. Non-randomized studies are vulnerable to several biases, including confounding, which occurs when treatment groups differ in their underlying risk for the outcome [21–23]. Studies that do not appropriately account for confounding factors may produce biased effect estimates (Fig. 2). Therefore, the inclusion of non-randomized studies in NMA requires careful consideration of the validity of the studies. The Grading of Recommendations Assessment, Development, and Evaluation (GRADE) working group has developed a framework for assessing the quality of evidence from non-randomized studies in the context of NMA. Other guidelines, such as the STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) guidelines and the guidelines for good pharmacoepidemiology practices, also offer useful guidance to assess the quality of non-randomized studies. Even in high-quality non-randomized studies, it is still important to carefully assess potential treatment effect modifiers.
Another important issue to consider is whether the non-randomized studies address the same research questions or estimate the same treatment effects as the RCTs. The most commonly used analytic approach in RCTs is the intention-to-treat approach, which estimates the effect of treatment initiation. Other analyses that can be done in RCTs or non-randomized studies include as-treated analysis (which compares the treatments that the patients actually receive), per-protocol analysis (which includes only patients who adhere to the trial protocol), and other analyses, such as inverse probability weighting, that appropriately account for time-varying confounding. Depending on the analytic methods used, non-randomized studies that compare the same treatment alternatives may produce treatment effects that are valid but different from those estimated in the RCTs [28–31].
Network meta-analysis of RCTs and non-randomized studies
There are various approaches for combining RCTs and non-randomized studies in NMA [9, 13, 32, 33]. Naïve pooling of all randomized and non-randomized study-level data, using either frequentist or Bayesian NMA methods, is the simplest approach and does not differentiate between the two study designs.
Another way to include non-randomized studies in NMA is to use them as prior information or in the form of a hierarchical model that allows for bias adjustment. When incorporated as prior information, non-randomized studies are analyzed separately, and the results are then used as the prior for the RCT model. The potential biases associated with non-randomized data can be modeled by adjusting the prior distribution. To downweight the non-randomized information, the variance parameter can be inflated; to adjust for overestimation or underestimation of the treatment effect, the mean of the prior can be shifted.
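The effect of inflating the variance or shifting the mean of the prior can be seen in a simplified sketch for a single treatment contrast. It assumes normal distributions with known variances, so the posterior has a closed form. This is an illustrative simplification, not the model used in the cited papers, and the function names, the inflation factor, and all numbers are hypothetical. With no inflation and no shift, the result equals a naïve inverse-variance pooling of the two sources.

```python
def adjusted_prior(nrs_mean, nrs_var, inflation=1.0, shift=0.0):
    """Build a prior from a non-randomized summary estimate.

    inflation > 1 widens the prior (downweights the non-randomized data);
    shift moves its mean to correct a suspected over- or underestimation.
    Both values are analyst-chosen assumptions.
    """
    return nrs_mean + shift, nrs_var * inflation

def normal_posterior(prior_mean, prior_var, rct_mean, rct_var):
    """Conjugate normal update: precision-weighted average of prior and data."""
    prior_prec = 1.0 / prior_var
    rct_prec = 1.0 / rct_var
    post_var = 1.0 / (prior_prec + rct_prec)
    post_mean = post_var * (prior_prec * prior_mean + rct_prec * rct_mean)
    return post_mean, post_var

# Hypothetical log hazard ratios (illustration only).
nrs_mean, nrs_var = -0.40, 0.01   # pooled non-randomized estimate
rct_mean, rct_var = -0.20, 0.04   # pooled RCT estimate

# No adjustment: equivalent to naive inverse-variance pooling.
naive = normal_posterior(*adjusted_prior(nrs_mean, nrs_var), rct_mean, rct_var)

# Variance inflated four-fold: the non-randomized data count as much as the RCTs.
downweighted = normal_posterior(
    *adjusted_prior(nrs_mean, nrs_var, inflation=4.0), rct_mean, rct_var)

print("naive pooling:      mean %.3f, variance %.4f" % naive)
print("downweighted prior: mean %.3f, variance %.4f" % downweighted)
```

In this example the posterior mean moves from -0.36 toward the RCT estimate (to -0.30) once the prior variance is inflated, and the posterior variance increases, reflecting the reduced contribution of the non-randomized data.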
A third approach, a Bayesian hierarchical model, is generally considered the most flexible [9, 13, 32, 33]. A Bayesian hierarchical model is a multi-level statistical model whose parameters are estimated from their posterior distribution using Bayesian methods [9, 13, 32, 33]. In the model, a study-design level (e.g., RCT, non-randomized study) is introduced [9, 13, 32, 33]. This approach allows for the bias adjustments discussed above as well as direct comparison of study design-specific estimates to overall estimates. For example, evidence from individual studies of the same design can first be combined to produce study-design level estimates; the study-design level estimates can then be combined to obtain overall estimates [9, 13, 32, 33]. It also gives an estimate of consistency between study designs. There is limited published research in this area, especially on the latter two approaches. Furthermore, there is a lack of consensus on what degree of bias adjustment to apply to non-randomized studies.
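One way to write down such a model for a single treatment contrast is the three-level normal structure sketched below. The notation is an assumption introduced here for illustration: y_{ik} and s_{ik} are the estimate and standard error from study i of design k, \mu_k is the design-level mean, and \mu is the overall effect. It is a simplified sketch rather than the exact specification used in the cited papers.

```latex
\begin{align*}
y_{ik} &\sim \mathcal{N}\!\left(\theta_{ik},\, s_{ik}^{2}\right)
  && \text{observed estimate from study } i \text{ of design } k \\
\theta_{ik} &\sim \mathcal{N}\!\left(\mu_{k},\, \tau_{k}^{2}\right)
  && \text{study effects vary around the design-level mean} \\
\mu_{k} &\sim \mathcal{N}\!\left(\mu,\, \sigma^{2}\right)
  && \text{design-level means vary around the overall effect}
\end{align*}
```

Under this sketch, the between-design variance \sigma^2 summarizes how consistent the RCT and non-randomized estimates are, and comparing the posterior of each \mu_k with that of \mu gives the design-specific versus overall comparison described above. Prior distributions for \mu, \tau_k, and \sigma would still need to be chosen.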
Figure 3 presents scenarios that may occur when combining RCTs and non-randomized studies in NMA. In some cases (e.g., drug B versus drug A), findings from non-randomized studies align with those reported in RCTs. In other situations (e.g., drug D versus drug C), the findings reported in the non-randomized studies do not align with those reported in RCTs. Investigators and decision makers are generally more likely to have confidence in estimates when findings from both study designs are consistent than when there are discrepancies. However, the discrepancies may yield insight regarding biases in the non-randomized studies (e.g., residual confounding), effect modification by specific patient characteristics, or differences in the treatment effects being estimated (e.g., intention-to-treat effects and as-treated effects) that may not have been noticed had both study designs not been considered.
Incorporation of both RCTs and non-randomized studies into NMA typically requires considerably more time, effort, and cost than including only RCTs. The decision to include non-randomized studies should weigh the expected additional benefits against the additional time, effort, and cost. Restricting the analysis to specific types of non-randomized design or analysis (e.g., propensity score matching) may sometimes reduce the time, effort, and cost of conducting the NMA but may introduce bias due to exclusion of otherwise eligible studies.
Network meta-analysis of non-randomized studies in large distributed data networks
Over the past several years, we have seen an increase in the development of distributed data networks to assist in conducting non-randomized studies. In the USA, the Mini-Sentinel program has developed a distributed network of 18 data partners with information from over 178 million individuals, while the Canadian Network for Observational Drug Effect Studies (CNODES) includes health and prescription records of over 40 million people from eight jurisdictions in Canada and abroad. Other examples of distributed networks include the “Exploring and Understanding Adverse Drug Reactions by integrative mining of clinical records and biomedical knowledge” (EU-ADR) project in Europe and the Asian Pharmacoepidemiology Network (AsPEN). These networks permit comparative safety and effectiveness assessment of medical products across multiple databases without creation of a central data warehouse [34, 36, 39].
Both pair-wise meta-analysis and NMA are well-suited for distributed data networks. Traditionally, non-randomized studies for meta-analysis are identified by systematic review of published and unpublished studies. However, such reviews often yield a broad array of studies with different study questions, study designs, analytic methods, and completeness of information. Combining such heterogeneous information in meta-analysis can be problematic and challenging. On the other hand, the studies performed in distributed data networks often use common protocols, data models, or both, which improves the comparability of the analyses performed at each site [34, 36, 39]. Both CNODES and Mini-Sentinel have used pair-wise meta-analysis to combine data across data sources [36, 40–43]. NMA is well-suited for incorporating data from these networks when the study compares multiple treatment options, as in a Mini-Sentinel assessment of anti-hyperglycemic agents and acute myocardial infarction.
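As an illustration of how site-specific estimates from a distributed network might be combined without sharing patient-level data, the sketch below pools per-site effect estimates with a DerSimonian-Laird random-effects model. This estimator is chosen here for illustration; the text does not state which pooling method CNODES or Mini-Sentinel used. The function name and the site estimates are hypothetical.

```python
import math

def dersimonian_laird(estimates, variances):
    """Random-effects pooling of site-level estimates (DerSimonian-Laird).

    estimates: per-site effects on an additive scale (e.g., log hazard ratios)
    variances: their within-site variances
    Returns the pooled estimate, its standard error, and tau^2
    (the estimated between-site variance).
    """
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum_w

    # Cochran's Q measures the spread of the estimates around the fixed-effect mean.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Re-weight each site by its total (within- plus between-site) variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    return pooled, se, tau2

# Hypothetical site-level log hazard ratios from three data partners.
site_estimates = [0.10, 0.25, 0.18]
site_variances = [0.010, 0.020, 0.015]
pooled, se, tau2 = dersimonian_laird(site_estimates, site_variances)
print(f"pooled log HR {pooled:.3f} (SE {se:.3f}), tau^2 {tau2:.4f}")
```

Only the estimate and variance from each site are needed, which matches the distributed setting in which each data partner runs the common protocol locally and returns summary results.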
Further, access to data from large distributed data networks may allow more detailed assessment of, and adjustment for, heterogeneity and inconsistency. Larger sample sizes derived from these networks will allow detailed assessment of the benefits and harms of treatments in sub-populations that may have been understudied in RCTs. In addition, access to patient-level data will facilitate the conduct of meta-regression analyses to adjust for differences in characteristics between studies. This may be particularly important because, even if the estimate from a non-randomized study is unbiased, the population may differ from those studied in RCTs.
Currently, data from most distributed data networks are only available to those involved in the networks; future work is needed to investigate the advantages and disadvantages of making de-identified or summary-level data from these networks more accessible for analysis by others.
Discussion and conclusions
The interest in and need for incorporating both RCTs and non-randomized studies in NMA will likely increase in the future due to the growing need to assess multiple treatments simultaneously, improvement in the quality and validity of non-randomized data and analytic methods, and the global movement towards progressive licensing and product listing agreements, where information on a medical product is monitored throughout its life cycle for regulatory and reimbursement purposes. Incorporating both types of data in NMA may improve precision, allow for a wider array of treatments to be considered (i.e., expand the network or connect an otherwise “disconnected” network), and provide real-world and more generalizable evidence on the risks and benefits of medical treatments. However, the inclusion of low-quality, non-randomized studies with inadequate control for biases may threaten the validity of the NMA findings. More studies are needed to compare the validity of different approaches that combine RCTs and non-randomized studies in NMA. Although the inclusion of both types of data in NMA poses several methodological challenges, it also offers the promise of more timely, comprehensive, and generalizable evidence on the comparative safety and effectiveness of medical treatments.
CNODES: Canadian Network for Observational Drug Effect Studies
RCTs: randomized controlled trials
Sutton AJ, Abrams KR. Bayesian methods in meta-analysis and evidence synthesis. Stat Methods Med Res. 2001;10:277–303.
Caldwell DM, Ades AE, Higgins JPT. Simultaneous comparison of multiple treatments: combining direct and indirect evidence. BMJ. 2005;331:897–900.
Dias S, Sutton AJ, Ades AE, Welton NJ. Evidence synthesis for decision making 2: a generalized linear modeling framework for pairwise and network meta-analysis of randomized controlled trials. Med Decis Making. 2013;33:607–17.
Nikolakopoulou A, Chaimani A, Veroniki AA, Vasiliadis HS, Schmid CH, Salanti G. Characteristics of networks of interventions: a description of a database of 186 published networks. PLoS One. 2014;9:1–10.
Hutton B, Joseph L, Fergusson D, Mazer CD, Shapiro S, Tinmouth A. Risks of harms using antifibrinolytics in cardiac surgery: systematic review and network meta-analysis of randomised and observational studies. BMJ. 2012;345:e5798.
Vlaar PJ, Mahmoud KD, Holmes DR, van Valkenhoef G, Hillege HL, van der Horst ICC, et al. Culprit vessel only versus multivessel and staged percutaneous coronary intervention for multivessel disease in patients presenting with ST-segment elevation myocardial infarction: a pairwise and network meta-analysis. J Am Coll Cardiol. 2011;58:692–703.
Stegeman BH, de Bastos M, Rosendaal FR, van Hylckama Vlieg A, Helmerhorst FM, Stijnen T, et al. Different combined oral contraceptives and the risk of venous thrombosis: systematic review and network meta-analysis. BMJ. 2013;347:f5298.
Robertson C, Close A, Fraser C, Gurung T, Jia X, Sharma P, et al. Relative effectiveness of robot-assisted and standard laparoscopic prostatectomy as alternatives to open radical prostatectomy for treatment of localised prostate cancer: a systematic review and mixed treatment comparison meta-analysis. BJU Int. 2013;112:798–812.
Verde PE, Ohmann C. Combining randomized and non-randomized evidence in clinical research: a review of methods and applications. Res Synth Methods. 2015;6:45–62.
Ioannidis JPA. Integration of evidence from multiple meta-analyses: a primer on umbrella reviews, treatment networks and multiple treatments meta-analyses. CMAJ. 2009;181:488–93.
Jansen JP, Naci H. Is network meta-analysis as valid as standard pairwise meta-analysis? It all depends on the distribution of effect modifiers. BMC Med. 2013;11:159.
Hutton B, Salanti G, Caldwell DM, Chaimani A, Schmid CH, Cameron C, et al. The PRISMA Extension statement for reporting of systematic reviews incorporating network meta-analyses of health care interventions: checklist and explanations. Ann Intern Med. 2015;162:777.
Schmitz S, Adams R, Walsh C. Incorporating data from various trial designs into a mixed treatment comparison model. Stat Med. 2013;32(17):2935–2949.
Van Valkenhoef G, Tervonen T, Zwinkels T, de Brock B, Hillege H. ADDIS: a decision support system for evidence-based medicine. Decis Support Syst. 2013;55:459–75.
Dias S, Welton NJ, Sutton AJ, Caldwell DM, Lu G, Ades AE. Evidence synthesis for decision making 4: inconsistency in networks of evidence based on randomized controlled trials. Med Decis Making. 2013;33:641–56.
Freemantle N, Marston L, Walters K. Making inferences on treatment effects from real world data: propensity scores, confounding by indication, and other perils for the unwary in observational research. BMJ. 2013;6409:1–5.
Concato J, Shah N, Horwitz RI. Randomized, controlled trials, observational studies, and the hierarchy of research designs. N Engl J Med. 2000;342:1887–92.
Vandenbroucke J, Psaty B. Benefits and risks of drug treatments: how to combine the best evidence on benefits with the best data about adverse effects. JAMA. 2008;300:2417–9.
Sørensen HT, Lash TL, Rothman KJ. Beyond randomized controlled trials: a critical comparison of trials with nonrandomized studies. Hepatology. 2006;44:1075–82.
Ioannidis JP, Haidich AB, Pappa M, Pantazis N, Kokori SI, Tektonidou MG, et al. Comparison of evidence of treatment effects in randomized and nonrandomized studies. JAMA. 2001;286:821–30.
Rosenbaum P, Rubin D. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70(1):41–55.
Walker AM. Confounding by indication. Epidemiology. 1996;7:335–6.
Grimes DA, Schulz KF. Bias and causal associations in observational research. Lancet. 2002;359:248–52.
Deeks JJ, Dinnes J, D’Amico R, Sowden AJ, Sakarovitch C, Song F, et al. Evaluating non-randomised intervention studies. Health Technol Assess. 2003;7:iii–x, 1–173.
Puhan MA, Schunemann HJ, Murad MH, Li T, Brignardello-Petersen R, Singh JA, et al. A GRADE Working Group approach for rating the quality of treatment effect estimates from network meta-analysis. BMJ. 2014;349:g5630.
Von Elm E, Altman D, Pocock D, Gotzsche P, Vandenbroucke J. Strengthening the reporting of observational studies in epidemiology (STROBE) statement: guidelines for reporting observational studies. BMJ. 2007;335:20–2.
International Society for Pharmacoepidemiology. Guidelines for good pharmacoepidemiology practices (GPP). Pharmacoepidemiol Drug Saf. 2008;17:200–8.
Hernán M, Hernández-Díaz S. Beyond the intention-to-treat in comparative effectiveness research. Clin Trials. 2012;9:48–55.
Hernán MA, Alonso A, Logan R, Grodstein F, Michels KB, Willett WC, et al. Observational studies analyzed like randomized experiments. Epidemiology. 2008;19:766–79.
Kurth T, Walker AM, Glynn RJ, Chan KA, Gaziano JM, Berger K, et al. Results of multivariable logistic regression, propensity matching, propensity adjustment, and propensity-based weighting under conditions of nonuniform effect. Am J Epidemiol. 2006;163:262–70.
Toh S, Manson JE. An analytic framework for aligning observational and randomized trial data: application to postmenopausal hormone therapy and coronary heart disease. Stat Biosci. 2013;5:344–60.
McCarron CE, Pullenayegum EM, Thabane L, Goeree R, Tarride J-E. The importance of adjusting for potential confounders in Bayesian hierarchical models synthesising evidence from randomised and non-randomised studies: an application comparing treatments for abdominal aortic aneurysms. BMC Med Res Methodol. 2010;10:64.
McCarron CE, Pullenayegum EM, Thabane L, Goeree R, Tarride JE. Bayesian hierarchical models combining different study types and adjusting for covariate imbalances: a simulation study to assess model performance. PLoS One. 2011;6(10):e25635. doi:10.1371/journal.pone.0025635.
Behrman RE, Benner JS, Brown JS, McClellan M, Woodcock J, Platt R. Developing the Sentinel System—a national resource for evidence development. N Engl J Med. 2011;364:498–9.
Mini-Sentinel Data Core. Mini-Sentinel Distributed Database Summary Report—Year 4. 2014.
Suissa S, Henry D, Caetano P, Dormuth CR, Ernst P, Hemmelgarn B, et al. CNODES: the Canadian network for observational drug effect studies. Open Med. 2012;6:134–40.
Coloma PM, Schuemie MJ, Trifirò G, Gini R, Herings R, Hippisley-Cox J, et al. Combining electronic healthcare databases in Europe to allow for large-scale drug safety monitoring: the EU-ADR Project. Pharmacoepidemiol Drug Saf. 2011;20:1–11.
Andersen M, Bergman U, Choi N-K, Gerhard T, Huang C, Jalbert J, et al. The Asian Pharmacoepidemiology Network (AsPEN): promoting multi-national collaboration for pharmacoepidemiologic research in Asia. Pharmacoepidemiol Drug Saf. 2013;22:700–4.
Brown J, Holmes J, Shah K, Hall K. Distributed health data networks: a practical and preferred approach to multi-institutional evaluations of comparative effectiveness, safety, and quality of care. Med Care. 2010;48:45–51.
Toh S, Reichman ME, Houstoun M, Southworth MR. Comparative risk for angioedema associated with the use of drugs that target the renin-angiotensin-aldosterone system. PloS ONE. 2014;172:1582–9.
Filion KB, Chateau D, Targownik LE, Gershon A, Durand M, Tamim H, et al. Proton pump inhibitors and the risk of hospitalisation for community-acquired pneumonia: replicated cohort studies with meta-analysis. Gut. 2014;63:552–8.
Dormuth C, Hemmelgarn B. Use of high potency statins and rates of admission for acute kidney injury: multicenter, retrospective observational analysis of administrative databases. BMJ. 2013;880:1–10.
Dormuth CR, Filion KB, Paterson JM, James MT, Teare GF, Raymond CB, et al. Higher potency statins and the risk of new diabetes: multicentre, observational study of administrative databases. BMJ. 2014;348:g3244.
Fireman B, Toh S, Butler M. A protocol for active surveillance of acute myocardial infarction in association with the use of a new antidiabetic pharmaceutical agent. Drug Saf. 2012;21:282–90.
Eichler H-G, Oye K, Baird LG, Abadie E, Brown J, Drum CL, et al. Adaptive licensing: taking the next step in the evolution of drug approval. Clin Pharmacol Ther. 2012;91:426–37.
Morgan SG, Thomson PA, Daw JR, Friesen MK. Inter-jurisdictional cooperation on pharmaceutical product listing agreements: views from Canadian provinces. BMC Health Serv Res. 2013;13:34.
CC is a recipient of a Vanier Canada Graduate Scholarship through CIHR (funding reference number—CGV 121171). He also received a CIHR Canada Graduate Scholarship—Michael Smith Foreign Study Supplement (funding reference number—FFS 134035) and University of Ottawa student mobility bursary to study at the Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute under the supervision of ST. CC is also a trainee on the CIHR Drug Safety and Effectiveness Network Meta-Analysis team grant (funding reference number—116573). BH is funded by a New Investigator award from the Canadian Institutes of Health Research and the Drug Safety and Effectiveness Network.
CC is now a Director at Cornerstone Research Group Inc., a health care consultancy which consults for the pharmaceutical, biotech, and medical device industry. He was not employed by Cornerstone Research Group Inc. when this manuscript was initially drafted.
CC wrote the first draft, incorporated suggested revisions/edits by co-authors, approved the final version, and agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. ST aided in the design, edited early versions of the manuscript, approved the final version, and agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. CC, BF, TC, DC, BH, GW, CD, and RP revised the manuscript, approved the final version, and agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Keywords
- Network meta-analysis
- Randomized controlled trials
- Observational studies
- Comparative effectiveness research
- Distributed research networks