NMA can be used to generate relative efficacy estimates of competing treatments in situations where more than two treatment options are available and direct head-to-head evidence from RCTs does not exist for all comparators. The NMA approach allows all relevant evidence to be considered and addresses research questions in the absence of direct comparative evidence, improving the precision of estimates by combining direct and indirect evidence.
One of the key assumptions underpinning this method is that the studies included in the analysis are homogeneous (that is, the trials are sufficiently similar on study and patient characteristics). The similarity assumption is violated if one or more study-level covariates act as modifiers of the relative treatment effects and their distribution is not balanced across the studies being compared [49, 50]. In this case, NMA may be affected by confounding bias, unless one explicitly controls for these covariates in the statistical analyses.
Controlling for covariates is particularly important in cases where response to treatment is defined in terms of the post-treatment level of a measure, and when the baseline level of this measure is known to vary across studies. If one study recruits patients with worse levels of a variable that is known to modify the relative impact of treatment, then the level of response achieved is likely to be lower than in another study that primarily includes patients with better baseline levels, other things being equal.
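One generic way of writing such an adjustment is a network meta-regression with treatment-by-covariate interactions. The formulation below is an illustrative sketch only: the logit link, the treatment-specific interaction terms and all symbols are assumptions introduced here, and it should not be read as the exact model specification used in this analysis.

```latex
% Illustrative fixed effects network meta-regression (assumed notation):
%   i       trial index,  k  arm index within trial i
%   t_{ik}  treatment given in arm k of trial i (treatment 1 = reference)
%   r_{ik}, n_{ik}  responders and patients in arm k of trial i
%   x_i     mean baseline covariate in trial i (e.g. baseline viral load)
\begin{align}
  r_{ik} &\sim \mathrm{Binomial}(n_{ik},\, p_{ik}) \\
  \mathrm{logit}(p_{ik}) &= \mu_i
      + \bigl(d_{t_{ik}} - d_{t_{i1}}\bigr)
      + \bigl(\beta_{t_{ik}} - \beta_{t_{i1}}\bigr)\,(x_i - \bar{x}),
      \qquad d_1 = \beta_1 = 0
\end{align}
% Adjusted contrast of treatment b versus a at covariate value x:
\begin{equation}
  d_{ab}(x) = (d_b - d_a) + (\beta_b - \beta_a)\,(x - \bar{x})
\end{equation}
```

Under this sketch, an unadjusted analysis corresponds to setting every \beta_t to zero. If the interaction terms are in truth non-zero and x_i differs between the trials informing treatments a and b, the unadjusted contrast absorbs part of the covariate effect, which is the confounding described above.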
The motivation for our work was the belief that such baseline covariate imbalances had occurred for patients recruited into studies looking at interventions for CHB. In particular, it was noted that there were differences in mean baseline viral load (expressed in terms of log10 copies/ml when measured using the PCR assay) with values for entecavir and tenofovir differing by approximately 1 log10 copies/ml (Table 1). We hypothesised that failure to account for these differences in previous analyses may have led to biased estimates of relative efficacy.
The work contained in this paper supports this hypothesis. When no adjustment was made to account for differences in baseline viral load among trials, tenofovir was shown to be significantly better than entecavir in terms of achieving UVL at 1 year (fixed effects RR 1.43, 95% CrI 1.30 to 1.54). However, when we accounted for the impact of baseline viral load the difference between the two treatments was not significant (fixed effects RR 1.27, 95% CrI 0.96 to 1.47; random effects RR 1.21, 95% CrI 0.48 to 1.51). The fixed effects adjusted model best fitted the underlying data, although the difference was minor (fixed effects DIC, 35.56; random effects DIC, 35.86).
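The comparison above can be restated as a short check, given here as a minimal sketch rather than as part of the analysis itself. It assumes that 'significant' means the 95% CrI excludes an RR of 1 and that the model with the lower DIC is preferred; the class and variable names are hypothetical, and the numbers are those reported in the preceding paragraph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    """A risk ratio for tenofovir versus entecavir with its 95% CrI."""
    label: str
    rr: float
    lower: float
    upper: float

    def excludes_null(self, null: float = 1.0) -> bool:
        # Assumed convention: 'significant' if the interval excludes RR = 1.
        return self.lower > null or self.upper < null


# Values as reported in the text for UVL at 1 year.
estimates = [
    Estimate("unadjusted, fixed effects", 1.43, 1.30, 1.54),
    Estimate("adjusted, fixed effects", 1.27, 0.96, 1.47),
    Estimate("adjusted, random effects", 1.21, 0.48, 1.51),
]

# DIC values reported for the two adjusted models (lower is preferred).
dic = {
    "adjusted, fixed effects": 35.56,
    "adjusted, random effects": 35.86,
}

preferred = min(dic, key=dic.get)
dic_difference = round(max(dic.values()) - min(dic.values()), 2)

for e in estimates:
    verdict = "excludes 1" if e.excludes_null() else "includes 1"
    print(f"{e.label}: RR {e.rr} ({e.lower} to {e.upper}) {verdict}")
print(f"Lower DIC: {preferred} (difference {dic_difference})")
```

A DIC difference of this size is small, which is consistent with the statement that the fixed effects model fitted best but only marginally.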
Sensitivity analyses highlighted that the relative efficacy of tenofovir versus entecavir was contingent on the choice of studies included in the meta-analysis, and in particular on whether or not data reported by one study group  were used. When these data were excluded, there was no significant difference between the two interventions (RR 1.08, 95% CrI 0.22 to 1.52). A subsequent sensitivity analysis, in which this study was removed but two other studies were included (AI463023 and TBVIG), generated similar non-significant results (RR 1.15, 95% CrI 0.39 to 1.50). In both sensitivity analyses the most appropriate model, based on DIC, used random rather than fixed effects. Close examination of the published paper  has identified no reason why this result should occur, and so there may be some other form of study-level heterogeneity, as yet unaccounted for, that is influencing the results.
Our paper is the first to generate baseline viral load adjusted and unadjusted NMA results using data from the same set of studies, and the results from the unadjusted analyses are very similar to those generated by other research groups [51, 52]. Accepting that NMA is based on relative efficacy, the results from all three unadjusted analyses for UVL appear to be at odds with those provided by the clinical trials included in the NMA. The systematic review identified one study of tenofovir  and the observed response rate was 76%. The corresponding value arising from our NMA was 93.2% (95% CrI 85.6% to 97.6%). Similar values were generated by two other research groups [51, 52]. One other NMA has been recently published . This analysis, however, contains a number of methodological flaws, the most notable being the pooling of data from HBeAg-positive and -negative individuals. We have therefore not extracted results from this paper for the purposes of discussion.
In contrast, with the exception of placebo and interferon-based therapies, the CrIs for the values derived in the adjusted analyses all contain the observed trial values, and the RR estimates are close to the trial values once the 018 Study Group data are removed (Figures 3 and 4). Hence, we would argue that the adjusted results are of greater clinical relevance than the unadjusted results.
Generating ‘like-for-like’ estimates of relative efficacy by controlling for covariates believed to be modifiers of relative treatment effects is not just of clinical interest but is essential for the purposes of reimbursement decisions. Such estimates are used by agencies such as the National Institute for Health and Care Excellence in their appraisal processes when assessing the clinical efficacy in a given disease area . In addition, such values are also used in economic models to evaluate the cost-effectiveness of interventions. A number of such models have been developed in CHB [55–59], of which one  used the results from their unadjusted analysis directly as model inputs. Another  used UVL as a surrogate variable for risk of cirrhosis using information from the REVEAL-HBV study , which quantified the relationship between HBV DNA and the likelihood of being diagnosed with cirrhosis. Overestimation of virologic response would thus correspond to underestimation of the likelihood of cirrhosis, which has been identified as a key driver of cost-effectiveness.
Although the review identified a reasonable number of studies overall, Figure 2 shows that, because of the large number of treatment options, the majority of the branches in the network are informed by the findings of a single study. This increases the uncertainty surrounding all results and means that baseline imbalances in other potential treatment effect modifiers may have influenced the results.
Further work is needed to complement the analysis of UVL at 1 year presented in this paper, in order to explore the impact of other potentially clinically relevant covariates on the relative effects of comparators and the probability of achieving UVL. Exploring the impact of other areas of potential heterogeneity (for example, study design and the impact of different LLOQ definitions) is also important. Ali and colleagues  identified the time of assessment as a treatment effect modifier alongside baseline viral load. The studies included in this analysis were very similar in terms of assessment times and so the exclusion of this variable is likely to have had a modest effect. Nonetheless, it would be interesting to replicate the analyses contained in this paper when controlling for these slight differences. Furthermore, expanding this type of analysis to other clinically relevant endpoints is also worthwhile.