 Methodology
 Open Access
 Published:
A likelihood ratio test for the homogeneity of betweenstudy variance in network metaanalysis
Systematic Reviews volume 10, Article number: 310 (2021)
Abstract
Background
Network metaanalysis (NMA) is a statistical method used to combine results from several clinical trials and simultaneously compare multiple treatments using direct and indirect evidence. Statistical heterogeneity is a characteristic describing the variability in the intervention effects being evaluated in the different studies in network metaanalysis. One approach to dealing with statistical heterogeneity is to perform a random effects network metaanalysis that incorporates a betweenstudy variance into the statistical model. A common assumption in the random effects model for network metaanalysis is the homogeneity of betweenstudy variance across all interventions. However, there are applications of NMA where the single betweenstudy assumption is potentially incorrect and instead the model should incorporate more than one betweenstudy variances.
Methods
In this paper, we develop an approach to testing the homogeneity of betweenstudy variance assumption based on a likelihood ratio test. A simulation study was conducted to assess the type I error and power of the proposed test. This method is then applied to a network metaanalysis of antibiotic treatments for Bovine respiratory disease (BRD).
Results
The type I error rate was well controlled in the Monte Carlo simulation. We found statistical evidence (p value = 0.052) against the homogeneous betweenstudy variance assumption in the network metaanalysis BRD. The point estimate and confidence interval of relative effect sizes are strongly influenced by this assumption.
Conclusions
Since homogeneous betweenstudy variance assumption is a strong assumption, it is crucial to test the validity of this assumption before conducting a network metaanalysis. Here we propose and validate a method for testing this single betweenstudy variance assumption which is widely used for many NMA.
Background
Network metaanalysis (NMA) is an approach to combining evidence from multiple studies of multiple interventions and obtaining estimates of all possible intervention comparisons using indirect and direct evidence. Common approaches to network metaanalysis include a fixed effect model and a random effects model. The random effects model assumes that the true effect size can differ from study to study, because the effect size in each study is derived from a random distribution of effect sizes. Several assumptions about the data generating mechanism are made in network metaanalysis. Common to the fixed effect model and random effects model is the exchangeability assumption. The exchangeability assumption relates to the study populations and states that the randomized trials are similar on average, in all important factors other than the intervention comparison being made [1, 2]. The exchangeability assumption implies the consistency condition is valid [3], i.e., the relative effect of A to B, can be derived from the difference using data from C compared to A and C compared to B for any treatments A, B, and C. A commonly used assumption unique to the random effects model is a single betweenstudy variation for all treatments [4]. Assuming that all effects sizes across all treatments have the same betweenstudy variation is a strong assumption. However, there are applications of NMA where the single betweenstudy assumption is potentially incorrect and instead the model should incorporate more than one betweenstudy variance estimate. A few approaches have been proposed to allow different betweenstudy variation across treatment comparisons. Lu (2009) proposed a Bayesian approach to modeling betweenstudy variance structures under the consistency assumption [5]. White (2012) proposed a partially structured heterogeneity model that allows for two betweenstudy variances but did not have a practical reason for doing so [6]. Although these approaches have been proposed, the single betweenstudy variation assumption remains widely used. In practice, there is a lack of guidance for when the homogeneous assumption should be challenged. The decision to assume one or more betweenstudy variance should be informed primarily by the reviewers’ knowledge of the data generating mechanism. However, the results from statistical testing, comparison of results of the NMA under both assumptions and the magnitude of variance estimates can also support any decisions made about the structure of betweenstudy variance.
Recently we conducted several network metaanalyses of interventions to prevent bovine respiratory disease in feedlot cattle, where the assumption of a single betweenstudy variance was questionable based on our knowledge of the biology of the disease and interventions included in the metaanalysis. Turner et al. [7] found heterogeneity might be related to the type of comparison and models with heterogeneous variances have been proposed with different informative priors under the Bayesian framework [8]. However, this is not applicable in frequentist framework. Additionally, limited work has been reported on testing the assumption of a single betweenstudy variance across all treatment comparisons. Therefore, the objective of this project was not to model the betweenstudy variance structure, but to develop an approach to testing the homogeneity of betweenstudy variance in a network metaanalysis based on the likelihood method. For network metaanalysis, several different methods of calculating the single betweenstudy variance have been proposed [9–11]. However, we were unable to identify any commonly used approaches to testing this assumption compared to an alternative that two or more betweenstudy variances exist based on a characteristic of the underlying studies. The sequence of the paper is as follows:

Section 2: The motivating example

Section 3: The model and proposed likelihood ratio test (LRT)

Section 4: The evaluation of the LRT using two methods

Section 5: Discussion of the evaluation and application.
Motivating example
The motivating example involved bovine respiratory disease, a multiagent disease of cattle. Bovine respiratory disease (BRD) is the most economically important disease of feedlot cattle and therefore knowledge of the comparative efficacy of interventions to prevent, control and treat BRD is critically important. One common approach to preventing bovine respiratory disease is to administer antibiotics to all cattle at arrival at the feedlot. The aim of administering antibiotics at arrival is to preemptively treat animals with subclinical BRD and to prevent BRD in animals at risk. Trials conducted to assess how effective antibiotics are for this purpose, use the proportion of treated animals detected with BRD after a period of time, usually 28 days, as the outcome. The data available for assessing the comparative efficacy of antibiotics for this purpose included comparisons of antibiotic to antibiotic, and comparisons of an antibiotic to notreatment. For BRD prevention, the assumption of a single betweenstudy effect for both types of comparisons is biologically questionable. It is known that some antibiotics are highly effective at treating and preventing BRD because the mechanism of action is very broad spectrum. An example of such a group of antibiotics is the macrolide group. Antibiotics in this group have consistent high quality evidence of low BRD risk after 28 days when administered at arrival [12, 13]. This means that trials that compare a macrolide to a macrolide would be expected to have a comparative effect size near zero, if the effect size is measured as the log odds ratio (log OR). The betweenstudy variation of macrolide to macrolide trials is therefore expected to be small. However, for trials that compare a board spectrum antibiotic, such as a macrolide, to a nontreated control, the expected variation in the effect size is much larger, because the risk of BRD in the 1st 28 days in cattle is highly variable in nontreated cattle. The data suggests that some groups of untreated cattle have close to zero animals detected with BRD after 28 days while other groups have 50% or more animals with BRD. The result of this naturally expected variation in BRD risk in the 1st 28 days of feedlotting in nontreated animals is a wider variation in the comparative effect sizes when active drugs such as macrolides are compared to nontreated groups. For example, if the macrolide is highly effective, we expect that the number of animals treated for BRD after 28 days will be close to zero regardless of the underlying risk of BRD in the group. However, the non treated group may have anywhere from zero to 100%. When these data are converted to a distribution of the comparative effect sizes (log OR), it is natural that more variation is expected between these active to notreatment trials than the trials that are macrolide to macrolide. There are several other scenarios in BRD, where the assumption of a single betweenstudy variance for all comparisons is questionable. For example, to prevent BRD in animals arriving at the feedlot, antibiotics or vaccines might be used. As with a notreatment group, the response to vaccination is highly variable, yet the response to broad spectrum antibiotics like, macrolides is highly consistent. Therefore in a network of evidence that compared the efficacy of antibiotic and vaccines to prevent BRD, we would naturally expect the vaccine to vaccine comparisons to be more variable than board spectrum antibiotic to broad spectrum antibiotic comparison. It is these examples, that motivated the work described below.
Methods
The likelihood for a random effects model of network metaanalysis under consistency assumption
This section provides the basic model form used for formulating the likelihood ratio test. In the following, we consider T treatments that are compared in I studies each with n_{i} arms. The set of treatments included in study i is given by T_{i}. Let y_{i} denote the estimates of relative effects for the ith study, \(\phantom {\dot {i}\!}\boldsymbol {y}_{i} = (y_{i,1},..., y_{i,n_{i}1})^{T}\) and y=(y_{1},...,y_{T}). The study specific treatment effects of study i are given by θ_{i} where θ=(θ_{i},...,θ_{I}). Then we have
where ε_{i} represents the vector of errors of study i. ε_{i} is assumed to be normally distributed and independent across studies and its covariance is cov(ε_{i}) = S_{i}. S_{i} is a diagonal matrix of size (n_{i}−1)×(n_{i}−1) and is a scalar if study i only has two arms. The distribution of y is
where S is a block diagonal matrix with each block S_{i},i=1,...,I. As the consistency assumption is made in the random effects model, all treatment effects are uniquely determined by T−1 basic treatment comparisons with a common reference (usually a placebo). These basic parameters are denoted by the vector d. The relative effect size of all other possible treatment comparisons in the network are called functional parameters which can be obtained from the basic parameters. For example, if d_{1,2} and d_{1,3} are basic parameters in the network, then d_{2,3}, a functional parameter, can be obtained by
Let X denote the design matrix of size I×(T−1). Each row of X corresponds to one study specific comparison and the columns represent the basic comparisons and. 1, 0, and 1 are the possible values in the design matrix. If one row of X only has one element of 1 and other elements are 0, then this study specific comparison is a basic comparison. If 1 and 1 occur in one row, then the relative effect parameter of the corresponding comparison is a functional parameter. For each study i, the design matrix is denoted by X_{i}. Then,
where δ_{i} is the vector of between–study heterogeneity of study i. The random effects model usually assume δ_{i} to be normally distributed. If study i only has two arms, then δ_{i}∼N(0,τ^{2}), otherwise, δ_{i}∼MVN(0,V_{i}), where the values of the diagonal elements of V_{i} are τ^{2} and off–diagonal values are τ^{2}/2 [5, 14]. The values of the off–diagonal elements are determined by the assumption that every source of direct evidence has the same betweenstudy variance. The distribution of θ is
where V is a block diagonal matrix with each block V_{i},i=1,...,I. The betweenstudy heterogeneity is assumed to be independent of withinstudy errors. Hence, the marginal distribution of y is
If we know τ^{2}, then the maximum likelihood estimate of d is
Likelihood ratio test for the betweenstudy variance parameter
Here we discuss an approach to testing the assumption of a single τ^{2}. Based on our motivating example, the betweenstudy variance parameter for non–active to active treatment comparisons and active to active treatment comparisons are denoted by \(\tau _{n}^{2}\) and \(\tau _{a}^{2}\) respectively. The hypotheses to be tested are
The loglikelihood function under the null hypothesis is
Under the null hypothesis, the structure of V_{i} is discussed in section 3. There are two potential data forms for V_{i} under the alternative hypothesis. If study i only contains active treatments, then the values of diagonal elements of V_{i} are \(\tau _{a}^{2}\) and offdiagonal values are \(\tau _{a}^{2}/2\). If nonactive controls are included in study i, then the diagonal values (variance) are \(\tau _{n}^{2}\) and the offdiagonal values (covariance) are \(\tau _{n}^{2}  \tau _{a}^{2}/2\).
For example, suppose study i is a threearm trial that compares a nonactive control (denoted by N) with two active treatments (denoted by A_{1},A_{2}). The betweenstudy variancecovariance matrix for study i is
Since \(\phantom {\dot {i}\!}\text {Var}(\theta _{i, N A_{2}}  \theta _{i, N A_{1}}) = \text {Var}(\theta _{i, N A_{2}}) + \text {Var}(\theta _{i, N A_{1}})  2\text {Cov}(\theta _{i, N A_{2}}, \theta _{i, N A_{1}})\), the covariance (offdiagonal) is given by
To make the variancecovariance matrix semipositive definite, the covariance should follow the following inequality:
To meet this inequality the following constrains are placed on \(\tau ^{2}_{n}\) and \(\tau ^{2}_{a}\):
Here a threearm trial is used to illustrate the covariance matrix structure and the constrains. Similar structures and the same constrain are applicable to trials with more than three arms. The likelihood ratio test (LRT) statistic is
where the estimates of the parameters are the maximum likelihood estimates. The asymptotic distribution of this test statistic is \(\chi _{1}^{2}\). Given \(\hat {\tau }^{2}\), the maximum likelihood estimate of \(\boldsymbol {\hat {d}}\) is
Real data implementation and simulation results
The data used are from a network meta–analysis of antibiotic treatments for BRD in feedlot cattle [15]. The evidence network consists of 204 trial arms from 98 studies. Eight of the 98 trials have three arms. The total number of participants in all studies is 26,132, with the number of participants in a study ranging between 34 and 1726. Among the total 26,132 participants, 9467 had the event. There are 13 treatments in the network: nonactive control (NAC), ceftiofur hydrochloride (CEFTH), ceftiofur bollus in pinna (CEFTP), ceftiofur sodium (CEFTS), danofloxacin (DANO), enrofloxacin (ENFO), florfenicol (FLOR), gamithromycin (GAMI), oxytetracycle (OXY) used at multiple doses, tildipirosin (TILD), tilmicosin (TILM), trimethoprim (TRIM), and tulathromycin (TULA). The outcome is the log odds ratio of the proportion of treated animals detected with BRD. A negative log OR means treatment benefit for the numerator treatment compared to the referent. The within–study variance is obtained using delta method. For example, in a 2–arm study with reported number of events r_{1} and r_{2} and sample sizes N_{1} and N_{2}, the withinstudy variance is calculated by 1/r_{1}+1/(N_{1}−r_{1})+1/r_{2}+1/(N_{2}−r_{2}). The number of pairwise comparisons is 106 in total with 66 nonactive control to active treatments (N2A) comparisons and 40 active to active treatments (A2A) comparisons. The network plot is shown in Fig. 1. The size of the node is proportional to the number of arms and the thickness of the edges represents the total size of direct comparisons between each treatment pair. The number in the parentheses after a treatment abbreviation is the number of studies containing that treatment. The absence of a line means that there is no direct comparison in the evidence network.
To evaluate the performance of the proposed LRT, two methods have been used. The first method is based on the asymptotic distribution (χ^{2}) of the LRT statistic and the second method is established on the Monte Carlo simulation. Maximum likelihood estimation is applied to obtain the basic effect size parameters and τ^{2} under the null and alternative hypothesis. We simulated 1000 data sets under the null hypothesis being true (a single between–study variance for all treatment comparisons) to assess the type I error rate and another 1000 data sets where the alternative hypothesis was true (two betweenstudy variance, one for N2A and one for A2A) to evaluate the power given the significance level of 0.05. Under the null hypothesis, the simulated data \(\boldsymbol {y}_{H_{0}}\) is generated from the real data y by
where \(\hat {\boldsymbol {d}}_{H_{0}}\) is the maximum likelihood estimate given \(\hat {\tau }^{2}\). Since the LRT statistic under the null hypothesis follows a chi square distribution when the sample size goes to infinity, we also assessed the type I and power for the scenario where the number of studies is five times the original to determine if the type I error can be well controlled when the sample size per comparison is larger. This increasedsize dataset has the same network structure as the real data. For example, in the original network, there is no study comparing treatment TRIM with NAC, and this is also the case in the simulated network. Only one study compares TRIM with TILM as shown in Fig. 1, whereas for the increasedsize data set there are five studies simulated for this comparison.
Assessing type I error rate and power of the test based on the chi square distribution
For each simulated dataset where the null hypothesis was true (a single between–study variance for all treatment comparisons), the maximum likelihood estimates were obtained and the LRT statistic calculated. The proportion of these 1000 LRTs that are beyond the 95% quantile of the \(\chi _{1}^{2}\) distribution is the estimated type I error rate. The power can be obtained by applying the same procedure on each simulated dataset where the alternative hypothesis is true (two betweenstudy variance, one for N2A and one for A2A).
Assessing type I error and power of the likelihood test based on the Monte Carlo simulation
An alternative approach to the chisquare approach is a simulation based approach to testing. This procedure is as follows:

1
For each simulated dataset where the null hypothesis is true, the maximum likelihood estimates are obtained under both hypotheses and LRT is calculated, denoted by LRT_{b}(b∈{1,...,1000}).

2
One thousand data sets are generated given the estimates in this simulated dataset under the null hypothesis. We used the MLE to obtain parameter estimates under both hypotheses and calculate LRT statistics, denoted by LRT_{b,m}m∈{1,...,1000}

3
The p value of the LRT_{b} is \(\frac {1}{1000}\sum _{m=1}^{1000} \text {LRT}_{b,m} > \text {LRT}_{b}\), denoted by p_{b}.

4
The proportion of rejection is the type I error which is obtained by \(\frac {1}{1000}\sum _{b=1}^{1000} I_{p_{b}<0.05}\), where I is the identity function.
For estimating power, the only change is to use each simulated dataset under the alternative hypothesis being true in the step 1.
Results
The values of τ^{2} observed in the original BRD dataset are shown in Table 1. The p value of the likelihood ratio test based on the asymptotic distribution of the test statistic is 0.028 indicating a significant difference between the two heterogeneity parameters but the type I error rate inflates in this case. The simulationbased p value is very close to 0.05. Hence, making decision only relies on the cutoff of 0.05 for the p value of the LRT is not convincing. The heterogeneity parameters values estimated under two models are meaningfully different. The estimated betweenstudy variance for the nonactive control to active treatments comparison is four times larger than that for active to active treatments. This difference would have an impact on the confidence intervals of the relative effects of the comparisons in the network, especially for comparisons with fewer studies. Then the Wald 95% confidence interval of \(\hat {\tau }^{2}_{n}  \hat {\tau }^{2}_{a}\) is calculated and given by (0.0282,0.8469) which indicates a significant difference from 0.
The effect of models with different heterogeneity parameters on the point estimates and confidence intervals of the relative effect sizes, are presented in Fig. 2. Figure 2 shows the 95% confidence intervals of the log odds ratios of the treatment pairs presented in the network plot under the models with one and two betweenstudy variance parameters. Treatment comparisons that involve only one study which has small study size tends to have wider confidence interval because of the large withinstudy variance. It can be seen in Fig. 2 that some confidence intervals change markedly in width under the different models. Some of the point estimates of the relative effect sizes shift because of the change of estimates in betweenstudy variances which would vary the weight of direct and indirect comparisons. The estimate of τ^{2} of N2A comparison in two τ^{2}s model is greater than that in one τ^{2} model and the τ^{2} of A2A comparison is opposite. Therefore, the width of confidence intervals tends to be narrower for A2A comparisons in the two τ^{2} model than in the one τ^{2} model. Also, most of the point estimates of the effect sizes of N2A comparisons shift to the right under the two τ^{2} model. It is not easy to predict the direction of the change of the point estimate of effect size or the width of the confidence interval in the two τ^{2} model for each comparison since it is a mixed weight change of direct and indirect comparisons.
The results of the study the likelihood ratio test performance in Table 2 shows the type I error rate and the power analysis results. The simulation based on the original data is labeled (60, 40) to indicated the number of studies. While the increased size data is labeled (330, 200). The increasedsize data set have the same network structure as the real data. For the asymptotic distribution of the test statistics, the type I error is above 5%, i.e., 8.3%. Increasing the number of studies reduced the type I error drop to 5%, i.e., 4.4%. While in the Monte Carlo simulationbased evaluation, the type I errors are controlled in both settings. The power was suitable for all methods and datasets. By combining the results in Table 2 with those in Table 1, we can say there is a significant difference between the heterogeneity parameter of nonactive control to active treatments comparisons and of active to active treatments comparisons. In practice, if the p value of the LRT statistic is very close to the cutoff (i.e., 0.05 in this paper) like in this example, depending on p value only to make decision is not conclusive. Visual inspection of the results from the two models and how these results differ is helpful in reaching a conclusion.
Conclusions
We have proposed a likelihood ratio test for testing the homogeneity of the betweenstudy variance parameter for the random effect network metaanalysis model. We illustrate this method with an example for testing the homogeneity between the nonactive control to active treatments comparisons and of active to active treatments comparisons. Our example applied this likelihood ratio test in a network metaanalyses which contained a nonactive control (or placebo or notreatment) and our understanding of the biology of this example, raised concerns about the single betweenstudy variance esti mate. There are many other situations that this method can be applied, for example, the between–study heterogeneity for a pharmacological treatment vs surgery comparison might be different from that of a comparison of two pharmacological treatments. We also developed the variancecovariance matrix structure of the model with two heterogeneity variance parameters. In the motivating example, we applied the test and found the significant difference of the betweenstudy variance of two types of comparisons. We have explored two ways to define the p value based on the same LRT statistic, one using the asymptotic χ^{2} distribution and the other using a Monte Carlo simulated sampling distribution. In practice, we would recommend using the Monte Carlo p value, which has a better control of the type I error, especially when the number of studies is limited. The estimation method for the basic parameters and betweenstudy variance is MLE. There are many literature comparing different methods of estimating the betweenstudy variance parameter[16–19]. Different estimators may have different distributions and our method is based on the MLE. That is not to say MLE is the best estimator but we just propose a possibility that the betweenstudy variance may not be the same across all comparisons and we use MLE and likelihood ratio test to show the single heterogeneity parameter assumption may not hold in some cases. Tests for this assumption using other estimators are possible extensions. Our likelihood ratio test is developed based on a model where the consistency condition is considered valid. If the consistency condition is not met, alternative models can be used to address inconsistency and the likelihood ratio test can be developed under the new model in an analogous fashion. Testing the homogeneity of betweenstudy variance in network metaanalysis with inconsistency is an interesting topic that we leave as a possible future work.
Availability of data and materials
We provide the R code and data we used in this paper in https://github.com/dapengh/test_the_heterogeneity_of_the_betweenstudy_variance.
Abbreviations
 NMA:

Network metaanalysis
 BRD:

Bovine respiratory disease
 LRT:

Likelihood ratio test
 OR:

Odds ratio
 A2A:

Active to active
 N2A:

Nonactive to active
 NAC:

Nonactive control
 CEFTH:

Ceftiofur hydrochloride
 CEFTP:

Ceftiofur bollus in pinna
 CEFTS:

Ceftiofur sodium
 DANO:

Danofloxacin
 ENFO:

Enrofloxacin
 FLOR:

Florfenicol
 GAMI:

Gamithromycin
 OXY:

Oxytetracycle
 TILD:

Tildipirosin
 TILM:

Tilmicosin
 TRIM:

Trimethoprim
 TULA:

Tulathromycin
References
 1
Song F, Altman DG, Glenny AM, Deeks JJ. Validity of indirect comparison for estimating efficacy of competing interventions: empirical evidence from published metaanalyses. BMJ. 2003; 326(7387):472.
 2
Glenny A, Altman D, Song F, Sakarovitch C, Deeks J, D’Amico R, Bradburn M, Eastwood A, et al.Indirect comparisons of competing interventions. Health Technol Assess. 2005; 9(26):1–134.
 3
Dias S, Ades AE, Welton NJ, Jansen JP, Sutton AJ. Network Metaanalysis for Decisionmaking. Hoboken: John Wiley & Sons; 2018.
 4
Dias S, Welton NJ, Sutton AJ, Ades A. Nice dsu technical support document 2: a generalised linear modelling framework for pairwise and network metaanalysis of randomised controlled trials. 2011. https://www.nicedsu.org.uk.
 5
Lu G, Ades A. Modeling betweentrial variance structure in mixed treatment comparisons. Biostatistics. 2009; 10(4):792–805.
 6
White IR, Barrett JK, Jackson D, Higgins JP. Consistency and inconsistency in network metaanalysis: model estimation using multivariate metaregression. Res Synth Methods. 2012; 3(2):111–25.
 7
Turner RM, Davey J, Clarke MJ, Thompson SG, Higgins JP. Predicting the extent of heterogeneity in metaanalysis, using empirical data from the cochrane database of systematic reviews. Int J Epidemiol. 2012; 41(3):818–27.
 8
Turner RM, DomínguezIslas CP, Jackson D, Rhodes KM, White IR. Incorporating external evidence on betweentrial heterogeneity in network metaanalysis. Stat Med. 2019; 38(8):1321–35.
 9
DerSimonian R, Laird N. Metaanalysis in clinical trials. Control Clin Trials. 1986; 7(3):177–88.
 10
Kontopantelis E, Springate DA, Reeves D. A reanalysis of the cochrane library data: the dangers of unobserved heterogeneity in metaanalyses. PloS ONE. 2013; 8(7):69930.
 11
Hunter JE, Schmidt FL. Methods of Metaanalysis: Correcting Error and Bias in Research Findings. London: Sage; 2004.
 12
O’Connor AM, Hu D, Totton SC, Scott N, Winder CB, Wang B, Wang C, Glanville J, Wood H, White B, et al. A systematic review and network metaanalysis of injectable antibiotic options for the control of bovine respiratory disease in the first 45 days post arrival at the feedlot. Anim Health Res Rev. 2019; 20(2):163–81.
 13
Abell KM, Theurer ME, Larson RL, White BJ, Apley M. A mixed treatment comparison metaanalysis of metaphylaxis treatments for bovine respiratory disease in beef cattle, 2. J Anim Sci. 2017; 95(2):626–35.
 14
Salanti G, Higgins JP, Ades A, Ioannidis JP. Evaluation of networks of randomized trials. Stat Methods Med Res. 2008; 17(3):279–301.
 15
O’Connor A, Yuan C, Cullen J, Coetzee J, Da Silva N, Wang C. A mixed treatment metaanalysis of antibiotic treatment options for bovine respiratory disease–an update. Prev Vet Med. 2016; 132:130–39.
 16
Jackson D, Veroniki AA, Law M, Tricco AC, Baker R. Paulemandel estimators for network metaanalysis with random inconsistency effects. Res Synth Methods. 2017; 8(4):416–34.
 17
van Aert RC, Jackson D. Multistep estimators of the betweenstudy variance: The relationship with the paulemandel estimator. Stat Med. 2018; 37(17):2616–29.
 18
Langan D, Higgins JP, Simmonds M. Comparative performance of heterogeneity variance estimators in metaanalysis: a review of simulation studies. Res Synth Methods. 2017; 8(2):181–98.
 19
Langan D, Higgins JP, Jackson D, Bowden J, Veroniki AA, Kontopantelis E, Viechtbauer W, Simmonds M. A comparison of heterogeneity variance estimators in simulated randomeffects metaanalyses. Res Synth Methods. 2019; 10(1):83–98.
Acknowledgements
We thank Dr. Chaohui Yuan for cleaning and providing the data as the example.
Funding
None reported.
Author information
Affiliations
Contributions
DH proposed the method, wrote the code used to conduct the data analysis. CW coordinated the project team, assisted with the data analysis, and interpreted the procedure and results of the analysis. AOC provided the data, assisted with the data analysis. The manuscript was primarily prepared by DH, with secondary input from all other authors. The authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Hu, D., Wang, C. & O’Connor, A.M. A likelihood ratio test for the homogeneity of betweenstudy variance in network metaanalysis. Syst Rev 10, 310 (2021). https://doi.org/10.1186/s13643021018593
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s13643021018593
Keywords
 Heterogeneity
 Betweenstudy variance
 Network metaanalysis
 Hypothesis testing