Equivalence and switching between biosimilars and reference molecules in rheumatoid arthritis: protocol for two systematic reviews and meta-analyses

Background: Biologic drugs such as adalimumab, etanercept, and infliximab represent major first-line and second-line treatments for patients with rheumatoid arthritis (RA). However, their high cost poses a massive burden on healthcare systems worldwide. The expiration of patents for these biologics has led to the production of biosimilar drugs, which are potentially less costly and remarkably similar, albeit not identical, to the reference molecules. These two systematic reviews aim to investigate the efficacy and safety profile of biosimilars compared to biologics (systematic review 1) and the impact of switching between biosimilar drugs and reference biologics on the management of RA patients (systematic review 2).
Methods: Electronic searches will be performed through MEDLINE (via PubMed), EMBASE, LILACS, and CENTRAL (from inception to September 2020). Risk of bias assessments will be carried out with the Cochrane risk of bias tool, supplemented with specific domains from equivalence trials. Random-effects models will be fitted to obtain summary estimates using either relative risk or standardized mean difference as a metric. Between-trial heterogeneity will be tested and quantified with the Q-test and I² metric, respectively, whereas small-study bias will be examined through contour-enhanced funnel plots and Egger's and Harbord's tests. Meta-regression models will be fitted when appropriate. The primary outcome will be the rate of treatment success according to the American College of Rheumatology 20 (ACR20) criteria, and the co-primary outcome will be the patient's self-assessment of physical function (Health Assessment Questionnaire - Disability Index). Conclusions will be based on equivalence hypothesis testing using predefined margins of equivalence elicited from a group of experienced rheumatologists and prior studies. The overall certainty of evidence will be assessed based on the GRADE system.


Background
Rheumatoid arthritis (RA) is a chronic inflammatory joint disease that affects up to 20 million people worldwide, thereby representing a major public health burden with important socioeconomic consequences (1-3).
Biological drugs, commonly known as "biologics", are invaluable resources in the treatment of patients with RA. Synthetic disease-modifying antirheumatic drugs (DMARDs) are the first line of therapy and, when combined with biologic DMARDs, have changed clinical outcomes by reducing the inflammatory burden of disease and therefore chronic articular deterioration (4). The effectiveness and safety of biologic DMARDs have been robustly established (5-7), and several studies have identified factors that affect the patient's response to these DMARDs (8-11). However, biologics pose an important challenge for the sustainability of healthcare systems worldwide, given the high direct costs associated with this drug category (1). For instance, expenses related to biologic treatments can represent almost 40% of the net drug spending in the United States (12).
Given the rapid evolution of pharmaceutical technologies over the past decade and the patent expiration of previously approved biologic molecules, biosimilar drugs have been developed as less costly alternatives to their reference biologics (13). According to the U.S. Food and Drug Administration (FDA), biosimilars have clinical benefits and a safety profile similar to those of existing FDA-approved biologics (14). In this regard, it is believed that biosimilars can accelerate competition in the market for rheumatic disorder drugs, positively impacting the global health-care system through improved health-care affordability and increased patient access to effective and safe drugs (13,15). However, despite the cost-saving potential of biosimilar drugs, there is still uncertainty as to whether currently marketed biosimilar drugs are equivalent to reference molecules in terms of efficacy, safety, and immunogenicity. In addition, switching and interchangeability between biologics and biosimilar drugs are still topics of great debate in the treatment of RA (16-20).
Herein, we describe the protocols of two systematic reviews that will address the efficacy, safety, and immunogenicity of biosimilars compared to biologics, and the impact of switching between biosimilar drugs and reference biologics on the management of RA patients. Unlike previous reviews, we will establish acceptable equivalence margins elicited from clinical specialists to draw conclusions on the equivalence of biosimilars compared to biologics.

Methods/Design
Reporting guidelines used in this protocol
The present protocol has been developed using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Protocols (PRISMA-P) 2015 statement (Additional file 1) (21). We will refer separately to the systematic review of efficacy and safety (systematic review 1) and the systematic review of switching (systematic review 2) because the two reviews address separate objectives and therefore require different methodologies and approaches.

PROSPERO synopses
Synopses for the two systematic reviews were prospectively registered in the International Prospective Register of Systematic Reviews (https://www.crd.york.ac.uk/PROSPERO/). The first systematic review will focus on the efficacy and safety of biosimilars compared to biologics (PROSPERO number: CRD42019137152), whereas the second systematic review will examine the clinical impact of switching from reference biologics to biosimilars on the management of RA patients whose treatment has already been started (PROSPERO number: CRD42019137155).

Adopted reporting standards
Both systematic reviews will be conducted and presented in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (22). In addition, since there are specific aspects related to the conduct, interpretation, and reporting of equivalence and noninferiority trials, we will also adopt the US Agency for Healthcare Research and Quality recommendations (23).

Electronic searches
Search strategies were built using controlled vocabulary according to each database and free-text terms based on the research question. We will use the following electronic databases (from inception to September 2020): MEDLINE via PubMed, EMBASE, Cochrane Central Register of Controlled Trials (CENTRAL), and Latin American and Caribbean Health Science (LILACS). A detailed description of the search strategy is available in Additional file 2.

Other sources
We will also search for unpublished or ongoing trials in the EU Clinical Trial Register (https://www.clinicaltrialsregister.eu), the World Health Organization International Clinical Trials Registry Platform (http://apps.who.int/trialsearch/), and ClinicalTrials.gov (https://clinicaltrials.gov/). The search strategies to be used in these platforms are described in Additional file 2. When necessary, we will contact corresponding authors for supplementary information. Additionally, we will manually screen the references of all included trials as well as previous systematic reviews. Finally, we will use Google Scholar and Epistemonikos (https://www.epistemonikos.org/) to retrieve relevant reports citing the included articles. No language limitation will be imposed.

Types of biosimilars
We will assess any biosimilars of adalimumab, etanercept, and infliximab. We chose these three biologics because they are the most prescribed first-line biologic DMARDs in RA (24). Also, these three DMARDs have the highest numbers of approved biosimilars for RA on the market (13,25).

Types of control interventions
We will consider as control interventions the reference biologic drugs (i.e., adalimumab, etanercept, and infliximab originals). No restrictions on dosages, treatment schedules, co-treatment, or combined therapies will be imposed.

Types of trials

Types of trials: Systematic review on efficacy and safety (systematic review 1)
To assess the efficacy and safety of biosimilars ("biosimilarity") (26), we will include randomized controlled trials (RCTs) or quasi-randomized controlled trials. We will include all trials comparing biosimilars to biologic drugs irrespective of the type of statistical design (superiority, equivalence, or noninferiority). A quasi-randomized trial was defined as a prospective interventional study whose allocation sequence was not truly random (e.g., consecutive order, day of the week, date of birth). For trials with a two-part study design, we will consider results from the first period (biosimilarity) only, to avoid carry-over effects.

Types of trials: Systematic review on switching (systematic review 2)
To assess the impact of switching on clinical outcomes of RA patients, we will include RCTs with two-part or multiple-part designs. The following four main designs of switching trials will be considered:
Single-switch design (27,28): Trials in which there is a single switch from each treatment to the other. All patients receive the study interventions in successive periods. First, patients are randomly allocated to either a biosimilar or a biologic drug (first period). Then, in the second period, treatments are randomly switched in both directions (Group 1: biologic → biosimilar; Group 2: biosimilar → biologic; Group 3: biologic → biologic; Group 4: biosimilar → biosimilar).
Transition design 1 (two non-switching groups as a control): Trials in which there is a single switch from one treatment (biologic drug) to another (biosimilar drug), but not the contrary. First, patients are randomly allocated to either a biosimilar or a biologic drug (first period). Then, in the second period, the trial becomes a three-arm trial in which patients in the biologic drug group are re-randomized to either continue in the biologic group or switch to the biosimilar drug treatment. Patients initially allocated to the biosimilar group continue to receive a biosimilar throughout the study period (Experimental group: biologic → biosimilar; Control arm 1: biologic → biologic; Control arm 2: biosimilar → biosimilar).
Transition design 2 (randomized trials with an open-label extension; single non-switching group as a control): Trials in which there is a single switch from a biologic drug to a biosimilar drug, but not the contrary. First, patients are randomly allocated to either a biosimilar or a biologic drug (first period).
Multiple-switch design: Also known as interchangeability design (27,28), in which multiple switches between treatments are allowed throughout the trial follow-up.

Type of participants
Trials will be included if patients with RA had been diagnosed according to validated and established international criteria. No limitation will be imposed on age, baseline RA severity, sex, lines of treatment (e.g., treatment-naïve patients or second line of treatment), or any other major demographic characteristics.

Types of outcome measures
All outcomes were prespecified in the registered PROSPERO synopses and were categorized into three types: efficacy (encompassing outcomes related to disease activity, functional capacity, quality of life, and structural damage progression), safety, and immunogenicity. For efficacy outcomes, we will extract data at the following time points: 1 month (± 2 weeks), 3 months (± 4 weeks), 6 months (± 4 weeks), 8 months (± 4 weeks), 12 months (± 4 weeks), 36 months (± 4 weeks), and 48 months (± 4 weeks). For safety and immunogenicity outcomes, we will collect data from the longest follow-up available.

Primary outcomes (efficacy)
For both systematic reviews, we prespecified a primary outcome, a co-primary outcome, and all secondary outcomes. A co-primary outcome was adopted because the demonstration of superiority or equivalence in a single outcome is insufficient to support clinical decisions. The choice of primary and co-primary outcomes was made by a panel composed of two RA specialists supervised by two researchers with experience in evidence synthesis. The rationale was to evaluate equivalence between biosimilars and reference biologic drugs based on both physician-reported and patient-reported outcomes. Similar approaches have been used previously in RA trials (29,30).

Systematic review on efficacy and safety (systematic review 1)
The primary outcome will be treatment success at 6 months according to the American College of Rheumatology 20 (ACR20) criteria (31). If trials report results at different time points, we will use the time point closest to 6 months. ACR20 is a composite and binary outcome, requiring patients to have ≥ 20% improvement in swollen and tender joint counts plus ≥ 20% improvement in at least three out of five domains:
Patient's assessment of pain (measured on a 100 mm visual analog scale [VAS]).
Patient's global assessment of disease activity (measured on a 0-to-10 Likert scale).
Physician's global assessment of disease activity (measured on a 0-to-10 Likert scale).
C-Reactive Protein levels.
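
The composite rule stated above can be written compactly. The following is a minimal sketch, assuming that improvement in each joint count and each core domain has already been expressed as a percentage change from baseline; the function name `acr20_responder` and its arguments are hypothetical and are not part of the protocol.

```python
def acr20_responder(tender_pct, swollen_pct, core_domain_pcts, threshold=20.0):
    """Return True if a patient meets the ACR20-style composite rule.

    tender_pct, swollen_pct: percentage improvement from baseline in the
        tender and swollen joint counts.
    core_domain_pcts: percentage improvement in each of the five core domains.
    threshold: minimum improvement required (20 for ACR20).
    """
    # Both joint counts must improve by at least the threshold.
    joints_ok = tender_pct >= threshold and swollen_pct >= threshold
    # At least three core domains must improve by at least the threshold.
    domains_ok = sum(p >= threshold for p in core_domain_pcts) >= 3
    return joints_ok and domains_ok


# Illustrative (made-up) patient: both joint counts and three domains improve.
print(acr20_responder(25, 30, [20, 50, 10, 0, 22]))
```

Passing `threshold=50` or `threshold=70` gives the analogous ACR50 and ACR70 rules.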
The co-primary outcome will be the Health Assessment Questionnaire - Disability Index (HAQ-DI), which assesses the functional status of patients through the evaluation of eight domains of daily-life activities (dressing and grooming, arising, eating, walking, hygiene, reach, grip, and activities) with 20 questions in total. For each question, there are four possible responses: 0 = without difficulty, 1 = with some difficulty, 2 = with much difficulty, and 3 = unable to do.
The highest score reported for any component question in each domain determines the final score for that domain. By convention, the overall disability index is expressed on a 0 to 3 scale, representing an average score across the domains. A HAQ-DI of 0 indicates no functional disability, whereas a HAQ-DI of 3 denotes severe functional disability (32).

Systematic review on switching (systematic review 2)
The primary outcome will be the rate of treatment success at 6 months after the first switch (i.e., 6 months after re-randomization or 6 months after the first switch in the open-label extension phase) defined by the ACR20 (dichotomous outcome). The co-primary outcome will be the HAQ-DI, also measured at 6 months after the first switch (continuous outcome). If outcome data are reported at different time points, we will use the time point closest to 6 months.

Secondary outcomes (efficacy, safety, and immunogenicity)

Secondary outcomes: Efficacy
Secondary outcomes of efficacy will include disease activity, prevention of structural damage progression, and quality of life measures:
Measures of disease activity: the American College of Rheumatology criteria with 50% (ACR50) and 70% (ACR70) responses, simplified disease activity index (SDAI), clinical disease activity index (CDAI), disease activity score in 28 joints based on the erythrocyte sedimentation rate (DAS28-ESR), disease activity score in 28 joints with four components based on C-reactive protein (DAS28-CRP), and the numeric index of the ACR response (ACR-N).
Functional capacity/quality of life: scores of HAQ-DI and the Medical Outcomes Study 36-item Short-Form Health Survey (SF-36) (physical and mental component summaries).

Prevention of structural damage progression: scores of the Sharp/van der Heijde or Sharp-van der Heijde Modified Score Method (mTRSS). A full description of secondary outcomes can be found in Additional file 3.

Secondary outcomes: Safety
We will evaluate the safety of biosimilars compared to biologics by the proportion of patients with treatment-emergent adverse events (TEAEs), serious TEAEs, infusion-related reactions (IRRs), hypersensitivity, malignancies, active tuberculosis, serious infections, all-cause mortality, and treatment-related mortality. We will also evaluate discontinuation rates in both treatments. A full list of safety outcomes can be found in Additional file 3.

Secondary outcomes: Immunogenicity
Immunogenicity will be evaluated by the proportion of patients with positive anti-drug antibodies (ADAs) and the proportion of patients with positive neutralizing antibodies (NAbs).

Study Screening And Selection
We developed a customized web platform for data extraction and curation using Ragic (https://www.ragic.com/). This database was carefully designed to simultaneously allow for study screening and selection, and data extraction for both systematic reviews 1 and 2.
During the screening phase, two review authors will independently evaluate titles and abstracts. Disagreements will be resolved by consensus. Next, for each study selected, full-length articles will be downloaded, and two independent reviewers will re-assess the eligibility of each pre-selected trial. In cases of disagreement, a third reviewer will be consulted. Reasons for exclusion will be described in detail in subsequent publications.

Data Extraction And Management
Analysis population
Trials may report two populations for the analysis: an intention-to-treat (ITT) population and a per-protocol (PP) population (23). In both systematic reviews 1 and 2, preference will be given to results based on the PP population, because of the conservative effect of the per-protocol approach on equivalence testing (23). Since there may be substantial variety in the definition of what constitutes a PP population or an ITT analysis, we will collect and tabulate in detail the definitions of PP and ITT used in each trial.

Numerical and graphical results
All data will be extracted independently by two investigators. Discrepancies will be resolved by consensus. We will extract all pertinent quantitative information, including the number of participants at baseline, the number of participants analyzed, and measures of central tendency, variability, and precision. Specifically, whenever available, we will collect means, mean changes, the difference between means at follow-up, medians, standard deviations, interquartile ranges, standard errors, confidence intervals (and their coverage, e.g., 90% or 95%), P-values (one- or two-sided), and t statistics. These data will be used to approximate means and standard deviations when necessary (33). For continuous outcomes, we will preferentially use follow-up data but will use the mean change from baseline when follow-up values are not available (34).
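
As an illustration of how a standard deviation may be approximated from other reported statistics, the sketch below applies two widely used conversions for a single-group mean: from a standard error, and from a confidence interval of known coverage. It relies on a normal approximation (for small samples a t quantile would be more appropriate), the function names are hypothetical, and the conversions actually used in the reviews, following reference 33, may differ.

```python
from math import sqrt
from statistics import NormalDist


def sd_from_se(se, n):
    """Standard deviation of a group mean from its standard error."""
    return se * sqrt(n)


def sd_from_ci(lower, upper, n, level=0.95):
    """Standard deviation from a two-sided confidence interval of a mean.

    level is the CI coverage (e.g., 0.90 or 0.95); a normal approximation
    is assumed for the critical value.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    se = (upper - lower) / (2.0 * z)
    return sd_from_se(se, n)


# Made-up example: mean HAQ-DI change with a 95% CI of (-0.60, -0.40), n = 100.
print(sd_from_ci(-0.60, -0.40, 100))
```

Recording the coverage of each interval matters here because a 90% interval and a 95% interval of the same width imply different standard deviations.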
Quantitative data from figures and graphs will be extracted independently using digitizing software (DigitizeIt 2.2.2, Germany, https://www.digitizeit.de/). Estimates from the digitizing software will be averaged to generate the final value. When necessary, data for the same trial will be extracted from multiple sources (e.g., multiple related publications and trial registries).

Ongoing trials
We will summarize all identified ongoing trials, detailing the primary author, research question(s), methods, outcome measures, and study start date, along with an estimate of the study completion date.

Assessment Of Risk Of Bias
Two review authors will independently assess the risk of bias in the included studies. Each domain will be classified as being at a low, unclear, or high risk of bias. Disagreements will be resolved by consensus or discussion with a third reviewer. The studies will be assessed at the outcome level. If the trial has one or more domains with a high risk of bias, it will be considered a high risk of bias study. If the trial has more than two domains at unclear risk of bias, we will judge the risk of bias to be unclear. If the trial has a low risk of bias in all domains or one domain at unclear risk of bias, it will be considered a low risk of bias study.

Assessment of risk of bias in efficacy and safety trials
We will use criteria recommended by the Cochrane Collaboration (Cochrane Risk of Bias tool 1.0) (35).
The following domains will be evaluated: random sequence generation, allocation concealment, blinding of participants and investigators, blinding of outcome assessors, and incomplete outcome data (PP and ITT population analysis). To specifically address equivalence or non-inferiority trials, we will refer to the recommendations by the US Agency for Healthcare Research and Quality (23) (Table 1). Specifically, we will assess inconsistent application of inclusion/exclusion criteria, patients selected for anticipated nonresponse or good response in one arm, patient behavior changes (poor adherence, use of concomitant treatments, and protocol violations), inadequate outcome measurement techniques, and incomplete outcome data (PP and ITT population analysis: ITT population analysis may underestimate the treatment effect in equivalence/non-inferiority trials). More information on the criteria used in each domain can be found in Additional file 4.
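
The trial-level judgment rule given at the start of this section can be expressed as a short function. This is only a sketch of the stated rule, with a hypothetical function name and rating labels; note that the rule as written does not say how a trial with exactly two unclear domains (and no high-risk domain) is classified, so the sketch returns a separate label for that case rather than guessing.

```python
def overall_risk_of_bias(domains):
    """Summarize domain-level ratings into a trial-level judgment.

    domains: dict mapping domain name -> 'low', 'unclear', or 'high'.
    """
    ratings = list(domains.values())
    n_high = ratings.count("high")
    n_unclear = ratings.count("unclear")
    if n_high >= 1:
        return "high"      # one or more domains at high risk
    if n_unclear > 2:
        return "unclear"   # more than two domains at unclear risk
    if n_unclear <= 1:
        return "low"       # all low, or a single unclear domain
    # Exactly two unclear domains: not covered by the rule as stated.
    return "unspecified"


# Made-up trial with a single unclear domain.
trial = {
    "random sequence generation": "low",
    "allocation concealment": "unclear",
    "blinding of participants and investigators": "low",
    "blinding of outcome assessors": "low",
    "incomplete outcome data": "low",
}
print(overall_risk_of_bias(trial))
```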

Assessment of risk of bias in switching trials
For switching trials, we will use the recommendations of Moots et al. (27) and the FDA guidance on considerations in demonstrating interchangeability with a reference product (14). The six specific domains to be evaluated are: (1) randomized and blinded design with appropriate control arms; (2) at least a one-way switch from originator to biosimilar; (3) the assessment of immunogenicity; (4) the washout period between treatments; (5) sufficient power to assess efficacy and safety (equivalence phase); and (6) sufficient follow-up periods.
More information about the judgment criteria can be found in Additional file 5.

Data synthesis
Effect size measures
For binary outcomes, we will combine study estimates using the relative risk (RR) as a measure of effect. For continuous outcomes, we will use the standardized mean difference (SMD), computed with the bias-adjusted method of Hedges. The SMD will be used because it has similar statistical power and is more generalizable than the mean difference (36).
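
For concreteness, the two effect measures can be computed from arm-level summary data as sketched below. The formulas are the standard large-sample expressions for the log relative risk and for Hedges' bias-adjusted SMD; the function names and the example numbers are hypothetical and do not come from any included trial.

```python
from math import log, sqrt


def log_rr(events_1, n_1, events_2, n_2):
    """Log relative risk (arm 1 vs arm 2) and its approximate variance."""
    estimate = log((events_1 / n_1) / (events_2 / n_2))
    variance = 1.0 / events_1 - 1.0 / n_1 + 1.0 / events_2 - 1.0 / n_2
    return estimate, variance


def hedges_g(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Bias-adjusted standardized mean difference and its approximate variance."""
    df = n_1 + n_2 - 2
    pooled_sd = sqrt(((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / df)
    d = (mean_1 - mean_2) / pooled_sd
    j = 1.0 - 3.0 / (4.0 * df - 1.0)  # small-sample correction factor
    var_d = (n_1 + n_2) / (n_1 * n_2) + d ** 2 / (2.0 * (n_1 + n_2))
    return j * d, j ** 2 * var_d


# Made-up data: ACR20 responders and HAQ-DI scores in two arms.
print(log_rr(140, 200, 136, 200))
print(hedges_g(0.95, 0.60, 200, 1.00, 0.62, 200))
```

Pooling is done on the log scale for the RR, and the pooled estimate and its confidence limits are exponentiated for reporting.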

Meta-analysis models
Main analyses will be based on the random-effects model with the restricted maximum-likelihood estimator for the between-study variance (37). A random-effects model was prespecified as the primary model of analysis since we anticipated variability in the design and population characteristics of the included trials. Results for a fixed-effects model (inverse-variance method) will be presented simultaneously as a sensitivity analysis.
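
To make the two models concrete, the sketch below pools study estimates by inverse-variance weighting and estimates the between-study variance with a simple fixed-point iteration for the restricted maximum-likelihood (REML) estimator. It is an illustrative implementation under stated assumptions, not the software that will be used in the reviews: dedicated meta-analysis packages use more robust optimizers and may apply additional confidence-interval adjustments. The function names `pool` and `reml_tau2` are hypothetical.

```python
def pool(y, v, tau2=0.0):
    """Inverse-variance pooled estimate and standard error.

    y: study effect estimates (e.g., log RR or SMD); v: their variances.
    tau2 = 0 gives the fixed-effects model; tau2 > 0 gives random effects.
    """
    w = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    se = (1.0 / sum(w)) ** 0.5
    return mu, se


def reml_tau2(y, v, tol=1e-10, max_iter=1000):
    """REML estimate of the between-study variance (truncated at zero)."""
    tau2 = 0.0
    for _ in range(max_iter):
        w = [1.0 / (vi + tau2) for vi in v]
        sw = sum(w)
        sw2 = sum(wi ** 2 for wi in w)
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sw
        # Fixed-point update derived from the REML score equation.
        new = sum(wi ** 2 * ((yi - mu) ** 2 - vi)
                  for wi, yi, vi in zip(w, y, v)) / sw2 + 1.0 / sw
        new = max(0.0, new)
        if abs(new - tau2) < tol:
            return new
        tau2 = new
    return tau2


# Made-up log relative risks and variances from four trials.
y = [0.02, -0.05, 0.08, 0.00]
v = [0.004, 0.006, 0.005, 0.003]
tau2 = reml_tau2(y, v)
print(pool(y, v, tau2))  # random-effects summary
print(pool(y, v))        # fixed-effects sensitivity analysis
```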

Statistical heterogeneity
We will test for the presence of statistical heterogeneity across trial estimates using Cochran's Q test (38), and the magnitude of the between-trial heterogeneity will be quantified with the I² metric (39). When feasible (i.e., 10 or more trials), we will investigate potential sources of statistical heterogeneity with random-effects meta-regression and subgroup analyses. Explanatory variables to be included in meta-regression models are described below.
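
As a brief illustration, Cochran's Q and I² can be obtained from the study estimates and their variances as follows. The sketch uses the usual definitions (Q as the weighted sum of squared deviations from the fixed-effects estimate, and I² as the excess of Q over its degrees of freedom relative to Q); the function name is hypothetical, and the P-value for Q, which comes from a chi-square distribution with k - 1 degrees of freedom, is omitted to keep the example free of external libraries.

```python
def cochran_q_i2(y, v):
    """Return Cochran's Q, its degrees of freedom, and I-squared (percent).

    y: study effect estimates; v: their within-study variances.
    """
    w = [1.0 / vi for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)  # fixed-effects estimate
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


# Made-up log relative risks and variances from four trials.
print(cochran_q_i2([0.02, -0.05, 0.08, 0.00], [0.004, 0.006, 0.005, 0.003]))
```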

Small-study and publication biases
We will investigate the association between trial size (precision) and treatment effects in contour-enhanced funnel plots, contrasting the effect estimates on the horizontal axis against their standard errors on the vertical axis, accompanied by a regression test for asymmetry. Furthermore, for continuous outcomes, small-study biases will be investigated by Egger's regression test, whereas for binary outcomes we will use Harbord's test (40).

Equivalence testing

Criteria to claim equivalence
Equivalence will be evaluated and interpreted using predefined margins of equivalence (Fig. 1). Upper and lower equivalence bounds were specified based on the smallest effect size of clinical importance. These values were computed from large placebo-controlled trials and validated by two rheumatologists with extensive experience treating patients with RA. Prespecified boundaries of equivalence will be applied to the primary and co-primary outcomes only.
Based on random-effects models, lower and upper confidence limits will be calculated. For a specific outcome (i.e., either ACR20 or HAQ-DI), if the two-sided 95% confidence interval for the difference in effect is completely contained within the prespecified boundaries of equivalence, biosimilars and biologics will be considered equivalent. However, the rejection of non-equivalence for both the ACR20 and HAQ-DI outcomes will be required for biosimilars to be declared overall equivalent to biologic drugs. Secondary outcomes will be examined through standard superiority tests (two-tailed).

Figure 1. Boundaries of equivalence (dashed lines) for a two-sided 95% confidence interval of the treatment difference. Panel A, equivalence margins for ACR20 (primary outcome). Panel B, equivalence margins for HAQ-DI (co-primary outcome). If the summary 95% CI lies within the gray regions, the null hypothesis will be rejected, and equivalence will be claimed.
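
The decision rule described above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, strict inequalities are an assumption about how "completely contained" is read, and the confidence intervals and the HAQ-DI margins in the example are made-up placeholders rather than protocol values.

```python
def ci_within_margins(ci, margins):
    """True if the confidence interval lies strictly inside the margins."""
    ci_low, ci_high = ci
    margin_low, margin_high = margins
    return margin_low < ci_low and ci_high < margin_high


def overall_equivalence(acr20_ci, acr20_margins, haq_ci, haq_margins):
    """Overall equivalence requires equivalence on BOTH outcomes."""
    return (ci_within_margins(acr20_ci, acr20_margins)
            and ci_within_margins(haq_ci, haq_margins))


# Placeholder inputs: a summary RR interval for ACR20 and a summary
# difference interval for HAQ-DI, each with hypothetical margins.
print(overall_equivalence((0.97, 1.04), (0.94, 1.06),
                          (-0.05, 0.08), (-0.20, 0.20)))
```

Requiring both outcomes to pass makes the overall claim more conservative than either test alone, which is the stated reason for adopting a co-primary outcome.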

Margins of equivalence: ACR20 criteria
For the ACR20 outcome, we prespecified equivalence margins to preserve 90% of the effects observed with biologics using the fixed-margin method (41). Specifically, the equivalence margins were calculated as a function of the preserved effect, PE (range: 0 to 1 [100%]). Based on a network meta-analysis by Guyott et al. (42) that included 11 randomized trials with 3762 patients who were unresponsive to methotrexate, the RR under a random-effects model for ACR20 at 6 months for any biologic (adalimumab/etanercept/infliximab) vs placebo was approximately 1.80. Similar estimates were obtained considering a combination of both methotrexate-naïve and methotrexate-unresponsive patients, in which the frequentist summary RR of achieving ACR20 was 1.81 (random-effects model, 13 trials, 7087 patients) for the comparison adalimumab/etanercept/infliximab vs placebo (43). Thus, biosimilars will be considered equivalent to biologics if the 95% confidence limits for the summary RR lie within the interval from 0.94 to 1.06 (Figure 1A).
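
The equation for the margins is not reproduced in this text. As a plausibility check, the sketch below uses one common fixed-margin construction on the log scale, in which the margin is the fraction of the log RR that is not preserved. This formula is an assumption on our part, not a statement of the authors' equation, although with RR = 1.80 and PE = 0.90 it reproduces the limits of 0.94 and 1.06 quoted above. The function name is hypothetical.

```python
from math import exp, log


def fixed_margin_rr(rr_vs_placebo, preserved_effect):
    """Symmetric (on the log scale) equivalence margins for a relative risk.

    ASSUMED formula: margin = exp(+/- (1 - PE) * ln(RR)), where RR is the
    effect of the reference biologic vs placebo and PE is the fraction of
    that effect to be preserved.
    """
    delta = (1.0 - preserved_effect) * log(rr_vs_placebo)
    return exp(-delta), exp(delta)


lower, upper = fixed_margin_rr(1.80, 0.90)
print(round(lower, 2), round(upper, 2))
```

Under this construction, a larger preserved effect narrows the margins, so a stricter PE makes equivalence harder to claim.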

Margins of equivalence: HAQ-DI
For HAQ-DI, which is a continuous outcome (with higher scores indicating worse functional status), equivalence margins were constructed under the clinical assumption that an increase equal to or larger than 0.15 units over 1 year on the 0-to-3 HAQ-DI scale is considered clinically perceptible by patients (44) (Figure 1B).

Analysis of subgroups and meta-regression
We will perform prespecified subgroup analyses. When feasible, the following subgroup analyses will be conducted:
Type
The above-mentioned variables will also serve as explanatory variables in meta-regression models.

Assessment of Overall Certainty of Evidence
The overall certainty of evidence will be assessed by two investigators and will be based on the Grading of Recommendations Assessment, Development and Evaluation (GRADE) system (48). Assessments will be conducted by outcome. Disagreements will be settled by consensus or discussion with a third reviewer. The certainty of evidence of each outcome will be graded as very low, low, moderate, or high.
The following domains will be assessed: study design and risk of bias; inconsistency; indirectness; imprecision; and other factors (e.g., reporting bias, publication bias).

Ranking of outcomes by their relative importance
We have adopted the recommendations from the GRADE handbook for selecting and rating the importance of outcomes (26). Specifically, for systematic reviews 1 and 2, we ranked each outcome as "critical", "important but not critical", or "limited importance to decision-making" (Table 2). The ranking was conducted through consultations with two clinical specialists (rheumatologists) and a physical therapist specialized in evidence synthesis. These professionals were invited to participate based on their clinical experience, academic background, and the lack of any conflict of interest. Briefly, before the scoping meeting, based on previous systematic reviews, we screened the list of outcomes (both primary and secondary) of 14 trials that we knew a priori met all the eligibility criteria. Subsequently, we created an integrated list of all outcomes and categorized them into five main domains:
Disease activity/Clinical response
Through an iterative approach in a single scoping meeting, each member of the collaborative working group ranked outcomes independently. Conflicting rankings were discussed jointly until a consensus was reached.

Discussion
At the time of writing this manuscript, more than 16 biosimilars of adalimumab, etanercept, and infliximab had been approved in the United States, Europe, Canada, and Latin America for the treatment of RA (49-52). Our two systematic reviews will comprehensively assess the efficacy, safety, and immunogenicity of these biosimilars compared to their originator molecules in the management of RA patients and examine the clinical consequences of switching from biologics to biosimilars for RA patients.
Of note, our two systematic reviews will apply an integrated approach by combining previously accumulated data and clinical expert information, which will enable us to evaluate the equivalence between biosimilars and reference biologics for the treatment of RA with a clinically oriented perspective.
Furthermore, by examining the randomized evidence on the effects of a wide range of biosimilars, we will also be able to address whether switching from reference biologics to biosimilars, or vice versa, generally results in similar clinical benefits with acceptable immunogenicity and safety profiles. As a result, our findings hold great potential to affect not only the therapeutic regimen of RA patients who will use a DMARD for the first time but also the treatment of those whose therapy will be substituted from a biologic to a biosimilar, or from a biosimilar to a biologic.
Overall, we expect that our results will inform clinicians, researchers, decision-makers, stakeholders, and policymakers about the efficacy, safety, immunogenicity, substitution, and interchangeability of currently marketed biosimilars for the treatment of RA patients, and help health care systems use scarce resources more efficiently.