We describe the study methods in seven steps adapted from the 2015 Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)-P reporting guidelines for systematic reviews and meta-analysis protocols [41]. The PRISMA-P checklist is provided as an additional file (see Additional file 1). As this review focuses on methodological issues rather than on health-related outcomes, it was not eligible for inclusion in the PROSPERO registry [42]. In the event of protocol amendments, we will provide the date of each amendment, a description of the change, and the rationale for the change.
Eligibility criteria
Studies will be selected from the peer-reviewed scientific literature according to the following study and report characteristics.
Study characteristics
Study designs
We will include all CRTs. For the purposes of this review, a CRT is defined as a trial in which intact social units or clusters of individuals, rather than independent individuals, are randomly allocated to intervention groups [38]. CRTs may include trials employing parallel group, stepped wedge, or factorial designs; cluster randomised adaptive trials; and cluster randomised pragmatic trials, among others. CRTs with an adaptive design allow modifications based on data accumulated following trial start, while preserving the integrity of the trial [37]. Pragmatic trials are designed to evaluate the effectiveness of an intervention in routine clinical practice in order to maximise the applicability and generalisability of the study results [43, 44].
Participants
Study participants will be human adults or children living in LMICs. LMICs will be defined according to the 2016 World Bank country classifications [45].
Interventions
This review focuses on “public health interventions”. We employ the definition of “public health” proposed in the World Health Organization (WHO) health promotion glossary as “The science and art of promoting health, preventing disease, and prolonging life through the organized efforts of society” [46]. Adapting the definition proposed by Rychetnik and colleagues, we define a public health intervention as a disease prevention or health promotion intervention applied to many, most, or all members in a community, which aims to deliver a net benefit to the community or population, as well as benefits to individuals [47, 48]. Public health interventions are distinguished from clinical interventions aimed at preventing or treating diseases at the individual level [47, 48].
In order to operationalise this definition and guide selection of specific studies, we will use the “Intervention Wheel”, a graphic model of population-based public health practice illustrated with specific examples, developed by the Minnesota Department of Health [49]. The Intervention Wheel describes 17 public health interventions, selected to meet five criteria. To be considered public health interventions, interventions should (i) focus on entire populations (or particular subgroups within a population), (ii) be grounded in an assessment of community health, (iii) consider the broad determinants of health, (iv) emphasise health promotion and prevention, and (v) intervene at multiple levels [49]. We will use these five criteria to aid in decisions concerning study inclusion.
According to Rychetnik and colleagues, public health interventions are inherently “complex, programmatic, and context dependent” and these characteristics raise challenges for their evaluation [47]. The assessment of intervention fidelity may be especially important for public health interventions, and this consideration underlies our choice to focus on them in this review.
Comparators
Comparators will be defined as planned per the original CRT. Given the nature of public health interventions and the pragmatic orientation of CRTs in LMICs, we anticipate that a large proportion of studies included in the review will define the comparison group as receiving the “usual care”.
Outcomes
The focus of this methodologically-oriented review is on comparisons of planned and reported outcomes related to IF. For studies that assessed IF in either the trial protocol or the main trial report, we will include both the study protocol and the main trial report. Recognising that word limits for scientific journal articles are highly constrained and that the current CONSORT reporting guidelines for CRTs do not require description of elements related to IF, we also decided to include CRTs reporting the assessment of IF in a complementary document such as a published article, an online appendix to the main paper, or a grey literature report, in lieu of reporting the assessment of IF in the main trial report. These elements will be verified by checking the bibliography for the main trial report and additional sources.
For the purposes of study selection, we considered that studies evaluated IF if they either proposed methods to assess or reported results related to the evaluation of at least one of the four key fidelity components: (1) content, (2) coverage, (3) frequency, and (4) duration. For CRTs taking an adaptive approach, we will consider whether these trials respect pre-established decision rules regarding changes to their design. In addition, we will include all CRTs that reported a per-protocol analysis.
Report characteristics
Setting
Eligible studies will be implemented in LMICs as classified by the World Bank [45].
Availability of the study protocol
To ensure availability of a study protocol, we will include CRTs reporting a registration number in the abstract for any trial registry meeting the WHO criteria [50]. The WHO trial registration data set (TRDS) is an internationally agreed-upon set of items that provide information on the design, conduct, and administration of clinical trials. The WHO International Clinical Trials Registry Platform (ICTRP) facilitates the publication of the TRDS on a publicly accessible website, through a network of partner registries that have agreed to adopt the TRDS as a common standard. The TRDS will be used in this review to evaluate planned assessment of intervention fidelity, either alone, or in conjunction with a published study protocol. TRDS field 20 “Key secondary outcomes” is particularly pertinent for this assessment.
Publication dates
We will include studies for which the main trial report was published on or after January 1, 2012. We chose this date because the last update of the CONSORT Statement to improve reporting of CRTs was published in 2012, and we wanted to analyse current practices. No restriction will be applied to the publication date for the protocol.
Language
We will include studies published in English, Spanish, or French, which are languages known by the research team.
Exclusion criteria
We will exclude studies that (i) are not cluster randomised trials, (ii) do not plan or report the assessment of IF, (iii) do not evaluate public health interventions, (iv) were conducted in a high-income country as defined by the World Bank 2016 country classification [45], (v) were published before 2012, (vi) do not have a publicly available protocol for comparison, or (vii) have a published protocol but no published main trial report. Manuscripts will also be excluded if they are (viii) pilot studies, (ix) secondary reports of a main study for which the relevant findings were published prior to 2012, (x) published in a language other than English, Spanish, or French, or (xi) from the grey literature.
Information sources and search strategy
Literature search strategies were developed in collaboration with an academic librarian experienced in conducting systematic review searches. Search strategies use Medical Subject Headings (MeSH) and text words related to cluster randomised trials, developing countries, and public health interventions. The electronic database search was developed first for MEDLINE (Ovid) (for the full search strategies, see Additional file 2) and then adapted for the following electronic databases: EMBASE (Ovid), CINAHL (Ovid), PubMed, and EBM Reviews (Ovid). Search terms are a combination of “cluster-randomized”, “cluster analysis”, “health program”, “public health service”, “health education”, “public health”, “health promotion”, “health behavior”, “health knowledge/attitudes practice”, “preventive health services”, “health care system”, and “developing countries”. The search strategy will span the time period from January 2012 to May 2016 and will be updated towards the end of the review. Searches will be filtered to articles concerning humans and written in English, French, or Spanish. To augment this list, we will add relevant studies suggested by members of the systematic review team. Identified records will be uploaded into the EndNote reference management software (version X7.5.3, Thomson Reuters, 2016), and duplicates will be eliminated.
Study screening and selection
Study screening and selection will be done manually within EndNote based on the inclusion and exclusion criteria for this systematic review. To ensure the availability of study protocols, we will limit the search to CRTs that have the word stem “regist*” in the abstract and use these results to begin the process of screening and selection. We validated this procedure by examining a subset of excluded articles. Screening and selection will be done in two stages by two independent reviewers (MCP and NM). In the first stage, reviewers will independently screen the titles and abstracts of each identified reference against the inclusion criteria to eliminate irrelevant publications. In the second stage, we will screen the full text of all studies that appear to meet the inclusion criteria or for which there is uncertainty as to eligibility. For each study, we will identify additional articles of potential relevance, such as a published protocol or a process evaluation, by reviewing references from the main trial report, consulting the trial registry record, and searching the PubMed database for new publications by the lead trial author. To aid in article screening and selection, the team will develop and test a screening sheet for full-text review. Any disagreement between reviewers will be resolved through discussion and, as necessary, through arbitration by a third author (MJ). The process of study selection will be documented in a flow diagram describing studies identified and excluded at each stage. We will also provide a summary table describing studies excluded at the stage of full-text review, along with reasons for their exclusion.
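As an illustration only, the word-stem restriction described above could be applied to exported abstracts roughly as follows. This is a minimal sketch, not the procedure used in EndNote: the record structure, the identifiers, and the function name `mentions_registration` are assumptions made for the example, and the abstracts are invented.

```python
import re

# Word stem "regist*": matches "registered", "registration", "registry", ...
# \b anchors the stem at the start of a word; matching is case-insensitive.
REGIST_STEM = re.compile(r"\bregist\w*", re.IGNORECASE)


def mentions_registration(abstract: str) -> bool:
    """Return True if the abstract contains a word starting with 'regist'."""
    return bool(REGIST_STEM.search(abstract))


# Hypothetical records for illustration only (not real studies).
records = [
    {"id": "rec1", "abstract": "Trial registration: ISRCTN00000000."},
    {"id": "rec2", "abstract": "A cluster randomised trial in 20 villages."},
    {"id": "rec3", "abstract": "This study was Registered with a WHO primary registry."},
]

# Keep only records whose abstract contains the word stem.
retained = [r["id"] for r in records if mentions_registration(r["abstract"])]
print(retained)  # ['rec1', 'rec3']
```

A filter of this kind only narrows the candidate set; eligibility is still decided by the two reviewers at the title/abstract and full-text stages.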
Outcomes and prioritisation
The search and selection process for this review is designed to identify two quantities required for the calculation of outcomes based on proportions. (1) Numerator: the studies that meet all the inclusion and exclusion criteria. As for all systematic reviews, these studies are our principal focus and will be included in the review and given detailed analysis. (2) Denominator: the total N for the study, which we define as all studies that satisfy all the inclusion and exclusion criteria, with the exception of the outcome criterion (planned or reported IF assessment). It is essentially the universe of cluster randomised trials of public health interventions in LMICs. Both quantities will be clearly indicated in the study flow diagram.
Primary outcome
The primary outcome for this study will be the proportion of overall agreement between the protocol and trial report concerning occurrence of IF assessment. This corresponds to research question 4a.
Data will be summarised in a two-by-two table comparing the assessment of intervention fidelity in the trial report to that in the protocol. N represents the set of recent CRTs of public health interventions in LMICs that have registered the study protocol in a publicly available trial registry. For each CRT in N, we will determine whether IF was assessed in the registered (or published) protocol or in the trial report (or associated documents). Studies judged to have assessed IF will be coded as “1”; others will be coded as “0”. Judgements will represent reviewer consensus (MCP and NM, with appeal to MJ in case of divergences). The proportion of overall agreement is defined as the proportion of eligible CRTs for which judgements concerning the occurrence of implementation fidelity assessment agree in the protocol and in the trial report (i.e. both positive or both negative). It will be computed as (a + d)/N.
| | | Protocol | | |
| | | + | − | Total |
| Trial report | + | a | b | a + b |
| | − | c | d | c + d |
| | Total | a + c | b + d | N |
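To make the computation of the primary outcome concrete, the following minimal sketch tallies the four cells of the two-by-two table from paired 0/1 codes and computes (a + d)/N. The function names and the example codes are hypothetical and serve only to illustrate the arithmetic; they are not data from the review.

```python
from collections import Counter


def agreement_table(pairs):
    """Tally the 2x2 cells from (protocol_code, report_code) pairs coded 0/1.

    a: IF assessed in both protocol and trial report
    b: IF assessed in trial report only
    c: IF assessed in protocol only
    d: IF assessed in neither
    """
    counts = Counter(pairs)
    return {
        "a": counts[(1, 1)],
        "b": counts[(0, 1)],
        "c": counts[(1, 0)],
        "d": counts[(0, 0)],
    }


def overall_agreement(cells):
    """Proportion of overall agreement, (a + d) / N."""
    n = sum(cells.values())
    if n == 0:
        raise ValueError("No studies in the table (N = 0).")
    return (cells["a"] + cells["d"]) / n


# Hypothetical codes for eight CRTs: (protocol, trial report).
pairs = [(1, 1), (1, 1), (1, 0), (0, 1), (0, 0), (0, 0), (0, 0), (1, 1)]
cells = agreement_table(pairs)
print(cells)                     # {'a': 3, 'b': 1, 'c': 1, 'd': 3}
print(overall_agreement(cells))  # 0.75
```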
Secondary outcomes
To address research questions 1, 2, and 3, we will also calculate the following:
- The frequency and proportion of trial protocols reporting the assessment of intervention fidelity, out of N
- The frequency and proportion of trial reports reporting the assessment of intervention fidelity, out of N
- The proportion of positive agreement among those that agree, computed as a/(a + d)
- The frequency counts and percentages summarising fidelity components examined and data collection methods proposed or employed
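The first three quantities in this list follow directly from the cells of the two-by-two table. The sketch below shows one way to compute them. The function name, the dictionary keys, and the cell counts are assumptions chosen for illustration.

```python
def secondary_outcomes(a: int, b: int, c: int, d: int) -> dict:
    """Marginal frequencies and proportions, plus positive agreement a / (a + d).

    Rows are the trial report (+/−) and columns are the protocol (+/−), so the
    protocol-positive total is a + c and the report-positive total is a + b.
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("No studies in the table (N = 0).")
    agreeing = a + d
    return {
        "protocol_count": a + c,
        "protocol_proportion": (a + c) / n,
        "report_count": a + b,
        "report_proportion": (a + b) / n,
        # Undefined when no study shows agreement, hence None in that case.
        "positive_agreement_among_agreeing": a / agreeing if agreeing else None,
    }


# Hypothetical cell counts (N = 50), not results from the review.
result = secondary_outcomes(a=12, b=5, c=8, d=25)
for name, value in result.items():
    print(name, value)
```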
To address research question 4b, for all studies included in the review, we will also record the authors’ judgments as to whether the intervention was effective. Studies that concluded that the intervention is more effective than the control will be coded as “1”; studies that were unable to reject the null hypothesis that there are no significant differences between groups will be coded as “0”. We will calculate the following:
- The conditional probability that a PP analysis is performed given that the ITT analysis shows no difference between groups.
- The conditional probability that a PP analysis is performed given that the ITT analysis shows a positive intervention effect.
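These measures will be calculated using a standard formula for conditional probabilities, where A denotes the ITT result of interest (no difference between groups, or a positive intervention effect) and B denotes that a PP analysis was performed: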
$$ P\left(B\Big|A\right)=\frac{P\left(A\ \mathrm{and}\ B\right)}{P(A)} $$
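As a worked illustration of this formula, the sketch below takes A to be the ITT result (coded 0 or 1) and B to be the presence of a PP analysis, and estimates P(A and B) and P(A) as simple proportions of the included studies. The field names `itt_effective` and `pp_reported`, the function name, and the six example records are assumptions introduced for the example. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def conditional_probability(studies, event_a, event_b):
    """Estimate P(B | A) = P(A and B) / P(A) from a list of study records.

    Returns None when no study satisfies A, since P(B | A) is then undefined.
    """
    n = len(studies)
    n_a = sum(1 for s in studies if event_a(s))
    if n == 0 or n_a == 0:
        return None
    n_ab = sum(1 for s in studies if event_a(s) and event_b(s))
    p_a = Fraction(n_a, n)
    p_a_and_b = Fraction(n_ab, n)
    return p_a_and_b / p_a


# Hypothetical records: ITT conclusion (1 = favours intervention, 0 = null)
# and whether a per-protocol analysis was reported.
studies = [
    {"itt_effective": 0, "pp_reported": True},
    {"itt_effective": 0, "pp_reported": True},
    {"itt_effective": 0, "pp_reported": False},
    {"itt_effective": 1, "pp_reported": True},
    {"itt_effective": 1, "pp_reported": False},
    {"itt_effective": 1, "pp_reported": False},
]

p_pp_given_null = conditional_probability(
    studies, lambda s: s["itt_effective"] == 0, lambda s: s["pp_reported"]
)
p_pp_given_positive = conditional_probability(
    studies, lambda s: s["itt_effective"] == 1, lambda s: s["pp_reported"]
)
print(p_pp_given_null, p_pp_given_positive)  # 2/3 1/3
```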
To address research questions 4c and 4d, we will examine the subset of trial reports containing both ITT and PP analyses. For studies comparing several interventions (e.g. factorial design), data on each intervention will be extracted separately.
To address research question 4c, we will study the proportion of the overall agreement between the ITT and PP analyses concerning intervention effectiveness.
Data will be summarised in a two-by-two table comparing the assessment of intervention effectiveness in the ITT analysis to that in the PP (intervention fidelity) analysis. T is the total number of included CRTs reporting both an ITT and PP analysis. Studies that concluded in favour of the intervention group will be coded as “1”; those that are unable to reject the null hypothesis that there is no significant difference between groups will be coded as “0”. Judgements will represent reviewer consensus (MCP and NM, with appeal to MJ in case of divergences). The proportion of overall agreement is defined as the proportion of trial reports for which judgements concerning intervention effectiveness agree in ITT and PP analyses (i.e. both positive (favour the intervention group) or both negative (unable to reject the null hypothesis of no difference between groups)). It will be computed as (w + z)/T.
| | | ITT analysis | | |
| | | + | − | Total |
| PP analysis | + | w | x | w + x |
| | − | y | z | y + z |
| | Total | w + y | x + z | T |
We will also calculate:
- The frequency and proportion of ITT analyses that conclude in favour of the intervention, out of T
- The frequency and proportion of PP analyses that conclude in favour of the intervention, out of T
To address research question 4d, we will compare intervention effect sizes reported for ITT and PP analyses. Comparisons will be summarised as the PP effect size expressed as a percentage of the ITT effect size, computed as (effect size for the PP analysis / effect size for the ITT analysis) × 100.
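The computation above (PP effect size divided by ITT effect size, multiplied by 100) can be sketched as follows. The function name and the numerical values are hypothetical. The sketch also assumes effect measures whose null value is 0 (for example, a risk difference or mean difference); ratio measures such as odds ratios, whose null value is 1, would need separate consideration.

```python
def pp_as_percent_of_itt(pp_effect: float, itt_effect: float) -> float:
    """Return the PP effect size as a percentage of the ITT effect size."""
    if itt_effect == 0:
        raise ValueError("ITT effect size is zero; the ratio is undefined.")
    return pp_effect / itt_effect * 100


# Hypothetical effect sizes: the PP estimate is twice the ITT estimate.
example = pp_as_percent_of_itt(pp_effect=0.5, itt_effect=0.25)
print(example)  # 200.0
```

Under this formula, a value of 100 means the two analyses report the same effect size, and subtracting 100 from the result gives the relative difference between the PP and ITT estimates in percent.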
Risk of bias in individual studies
To assess possible risk of bias for included studies, we will use the Cochrane Collaboration tool to assess the risk of bias in randomised trials [51] based on the following factors: random sequence generation, allocation concealment, blinding of participants and personnel, blinding of outcome assessment, incomplete outcome data, selective reporting, and other sources of bias. Because the Cochrane Collaboration tool was developed for individually randomised studies whereas our study focuses on CRTs, we will also include several additional criteria specifically relevant to assessing risk of bias in CRTs, recommended by the Cochrane Collaboration [51] and other key sources [51–53]. These additional criteria will consider issues related to the following: recruitment bias (potential for participant self-selection to occur if individuals are recruited to the trial after the clusters have been randomised); baseline imbalances (because CRTs generally randomise a limited number of clusters, chance imbalances may affect comparability of intervention and control groups); loss of clusters (complete clusters may sometimes be lost from a trial and thus be omitted from the analysis; these missing data may lead to biased outcome assessments); and unit of analysis (failure to properly account for clustering in the analysis) [51]. For each domain or criterion of interest, we will rate the risk of bias as low, high, or uncertain and provide a sample text that illustrates the reasons for this judgement. This evaluation will be done independently by two reviewers (MCP and NM). Disagreements between reviewers will be resolved by consensus or, if consensus cannot be achieved, by consulting a third reviewer (MJ). Judgements related to risk of bias will be summarised graphically using RevMan 5.1 [51]. Risk of bias assessments will be used to create categories of high-, uncertain-, and low-risk studies to be used in subgroup analyses.
Systematic reviews of health outcomes often assess the quality of a body of evidence using standardised tools such as the GRADE system [54]. However, as this review focuses on methodological issues rather than on health-related outcomes, we will not use this tool.
Data extraction and data items
Two review authors will extract data independently (MCP and NM). From each study protocol and trial report, reviewers will extract data on (i) the study characteristics (study location, aims, intervention); (ii) all applicable descriptors of the CRT design (for example, parallel group, stepped wedge, factorial, adaptive, pragmatic); (iii) concepts related to the assessment of IF (assessment of fidelity reported in protocol and/or main study, fidelity components and moderating factors evaluated, data collection methods, and any dimension used by the authors to evaluate intervention fidelity distinct from those proposed by Carroll and Hasson [24, 32]); (iv) whether events taking place in the control group were monitored, as these can influence the effectiveness of the intervention [27, 55, 56]; and (v) information for assessing the risk of bias of included studies. We will also extract (vi) statistical results concerning the intervention effectiveness and the authors’ qualitative conclusions regarding the intervention effect for the primary (generally, ITT) analysis and one or more subgroup analyses relevant for intervention fidelity (generally, the PP analysis). If studies investigate more than one intervention, we will extract data relevant for each comparison. To reduce bias and errors in data extraction, reviewers will use a pre-defined template pilot tested on a subset of studies and a guide for data extraction. To ensure consistency, reviewers will receive training prior to commencing extraction for the review and undertake calibration exercises. Reviewers will resolve disagreements by discussion and by appeal to a third author (MJ) where necessary. All data extraction tools will be available as online supplementary documents.
Data synthesis
Results will be presented in accordance with the PRISMA Statement [41]. A narrative synthesis will be provided, with information presented in tables to summarise key data. The narrative synthesis will explore relationships and findings within and between the included studies. It will highlight the four key dimensions of intervention fidelity identified from the literature (content, coverage, frequency, and duration), moderating factors for intervention fidelity (participant responsiveness, comprehensiveness of policy, strategies to facilitate implementation, quality of delivery, recruitment, and context), any new dimensions explored, and the data collection methods used to evaluate each key dimension.
We will present quantitative data for all primary and secondary outcomes proposed. Where appropriate, data will be presented in tabular form.
We will investigate the possible sources of heterogeneity by performing subgroup analysis. Specifically, we will recompute the main quantitative outcomes for subgroups of studies with high, uncertain, and low risk of bias to better understand potential sources of variation in results. If the data permit, we will conduct a sensitivity analysis to explore whether studies at lower risk of bias undertake more comprehensive assessment of intervention fidelity. Because of the study question and the nature of the outcomes assessed, we do not intend to perform meta-analyses.
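The subgroup recomputation described above could look roughly like the following sketch, which recomputes the proportion of overall agreement between protocol and trial report within each risk-of-bias category. The field names (`risk_of_bias`, `protocol_if`, `report_if`), the function name, and the records are all hypothetical and are used only to show the grouping step.

```python
from collections import defaultdict


def agreement_by_subgroup(studies):
    """Proportion of protocol/report agreement within each risk-of-bias level."""
    groups = defaultdict(list)
    for study in studies:
        groups[study["risk_of_bias"]].append(study)

    result = {}
    for level, members in groups.items():
        # A study agrees when both codes are 1 or both are 0.
        agreeing = sum(1 for s in members if s["protocol_if"] == s["report_if"])
        result[level] = agreeing / len(members)
    return result


# Hypothetical coded studies (1 = IF assessed, 0 = not assessed).
studies = [
    {"risk_of_bias": "low", "protocol_if": 1, "report_if": 1},
    {"risk_of_bias": "low", "protocol_if": 0, "report_if": 0},
    {"risk_of_bias": "low", "protocol_if": 1, "report_if": 0},
    {"risk_of_bias": "low", "protocol_if": 1, "report_if": 1},
    {"risk_of_bias": "uncertain", "protocol_if": 0, "report_if": 0},
    {"risk_of_bias": "uncertain", "protocol_if": 1, "report_if": 1},
    {"risk_of_bias": "high", "protocol_if": 0, "report_if": 1},
    {"risk_of_bias": "high", "protocol_if": 0, "report_if": 0},
]

by_level = agreement_by_subgroup(studies)
print(by_level)  # {'low': 0.75, 'uncertain': 1.0, 'high': 0.5}
```

With small subgroups, proportions of this kind are unstable, so they are best read alongside the underlying counts.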
Planned assessment of meta-biases
We recognize that data may be biased due to non-study-related processes and plan to assess specific meta-biases. This study compares results for protocols and published trial reports, and is thus designed to address potential reporting bias and to investigate potential outcome bias. As our review focuses on methodological issues rather than on outcome assessment, we will not assess potential publication bias.