This planned systematic review was registered a priori with PROSPERO (CRD42021232469). This protocol is reported in accordance with the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) statement (see checklist in Additional file 1). In reporting the methods and results in the final report, we will follow the PRISMA 2020 statement and the extension for reporting network meta-analyses of health care interventions. This protocol has been developed following Cochrane guidance for preparing a protocol for a systematic review with multiple interventions. Any amendments to this protocol will be stated in the report of this systematic review.
We will include randomized controlled trials (RCTs) and cluster-RCTs fulfilling the following criteria.
Types of participants
We will include studies investigating populations of
People within a formal employment-related context. We expect eligible studies to include participants aged 15 or older (reflecting the working population). However, in recognition of the known variability across settings and countries, we will not apply any explicit age limits. No restrictions regarding type of work will be made; however, military settings will be excluded due to their special work requirements and the anticipated difficulty of comparing them with other occupational groups.
People without back pain at baseline. In recognition of the widespread prevalence and recurrent nature of back pain in the working population, we anticipate the inclusion of mixed samples (i.e., participants with and without back pain) in many otherwise eligible studies. We will therefore also consider studies including a proportion of participants with back pain, as long as all participants were able to work at baseline, or, where this is unclear or not reported, the back pain was of mild severity as defined by the study authors. We will exclude studies specifically designed to investigate people with back pain, i.e., studies in which all participants initially had back pain and/or were unable to work.
Types of interventions
We will include studies investigating work-related interventions aimed at preventing back pain and specifically designed for and delivered within a workplace setting. We plan to include the following types of interventions (categorization is based on the findings of two previous systematic reviews [9, 18]).
Exercise programs (e.g., including strengthening exercises, endurance training, or flexibility training) [9, 18]
Education (e.g., back schools, ergonomic training, advice on behavior change, or provision of information material) [9, 18]
Ergonomic aids and adaptations (e.g., workplace adjustments, lifting aids, or scheduled breaks) [9, 18]
Orthoses, bandages, shoe insoles, or similar aids worn on the body [9, 18]
Stress management interventions
Multicomponent interventions including two or more of the interventions mentioned above (e.g., exercise program plus education).
Eligible comparators include no intervention/usual care or any of the interventions mentioned above. We plan to include these interventions in the NMA clustered according to the categories listed above. The interventions may be supervised or unsupervised and may be delivered individually or in groups. We will also include studies comparing different modes (e.g., different durations, frequencies, or intensities) of an intervention. Whether different modes can be considered separately for the NMA will be decided based on the available studies. There may be other interventions relevant to our research question of which we are not aware at the time of writing this protocol. We therefore plan to include relevant additional interventions “post hoc” in our NMA if we consider them to be comparable with the other pre-specified interventions.
Types of outcome measures
The primary outcomes of interest for our review are the following:
Numbers of participants with at least one new episode of non-specific back pain.
Intensity of non-specific back pain, measured with, e.g., a numeric or visual analog scale (NRS or VAS).
Ability to work:
Numbers of participants on sick leave due to their back pain and/or due to any cause.
Numbers of days with work absenteeism due to back pain and/or due to any cause.
We define non-specific back pain as pain in any part of the back, e.g., low back or neck pain, without a known specific pathology such as trauma, inflammatory conditions, or neoplastic causes. The pain may extend from the back to other parts of the body, e.g., from the neck to the shoulder.
The secondary outcomes of interest for this review are the following:
Intervention-related adverse events, such as injuries or temporary soreness, defined as the numbers of participants who experienced an adverse event.
Self-reported satisfaction with the intervention, measured, e.g., with a Likert-type scale.
Information sources and search strategy
We will search the following electronic databases from their inception onwards: Cochrane Library (CENTRAL), MEDLINE via PubMed, Web of Science, CINAHL, PsycINFO, PEDro, SPORTDiscus, and Academic Search Premier. A sensitive search strategy was developed, using search terms (both MeSH terms and relevant keywords) related to the population/health problem of interest (back pain) and intervention setting (workplace). An example search strategy for MEDLINE via PubMed is provided in Additional file 2.
We will conduct all searches without applying any language restrictions. We will, however, restrict the inclusion of studies to reports published in English or German. We will document any potentially relevant studies published in any other language for which an English title or abstract is available that allows for a preliminary judgement. We will exclude studies for which only a conference abstract or poster is available.
Searching other resources
We will check the reference lists of all eligible studies and of relevant existing systematic reviews for further relevant studies. Furthermore, we will search the following sources to identify ongoing or as yet unpublished studies: the International Clinical Trials Registry Platform of the World Health Organization (ICTRP), the German Clinical Trials Register (DRKS), and ClinicalTrials.gov. Additionally, we plan to contact experts in the field to enquire about the availability of further relevant studies.
Data collection and analysis
Selection of studies
The selection process will be conducted in compliance with international standards and using appropriate software (e.g., EndNote for the management of references and Covidence for the screening process). Two review authors will independently assess potentially eligible trials for inclusion through a two-stage approach (screening of titles/abstracts followed by screening of potentially relevant full texts). Any disagreements at either stage will be resolved through discussion, involving a third review author where needed.
Data extraction and management
Two review authors will independently extract data using a purpose-developed and piloted data extraction sheet. We plan to extract the following key details for each included study: first author and publication year (study ID), study design, country, study duration (from the first enrollment of participants to the last follow-up), sample sizes in intervention groups, participants’ age, gender and highest level of education, type of work, work setting, description of interventions and comparisons, outcomes (including information on measurement instruments, assessors and timing of measurement), and results for each outcome of interest and each study group (for continuous outcomes: mean values with standard deviations (SDs) or standard errors (SEs), mean differences (MDs) with 95% confidence intervals (CIs) or p values; for dichotomous outcomes: numbers and percentages of outcome events, risk ratios (RRs) with 95% CIs, or p values). We will further extract data on funding sources and potential conflicts of interest for each included study. Any discrepancies will be resolved through discussion, or, if needed, by involving a third review author. In case of multiple reports for a single study, we will aggregate the available information.
If repeated measurements of the same outcomes are reported, we will extract outcome data for all time points available. We expect high variability in the length of follow-ups between the original studies. As combining outcomes assessed at different time points might not be informative, we plan to divide the follow-up measurements into appropriate categories, i.e., short term (< 6 months from baseline), medium term (6 to < 12 months from baseline), and long term (≥ 12 months from baseline). We plan to conduct our main analysis using the long-term outcomes. The remaining follow-up categories will be considered in sensitivity analyses. Depending on the available measurement points in the original studies, we may decide to choose a different follow-up category for our main analysis in order to include more data. This decision will be made before any meta-analyses are conducted.
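The follow-up categories above can be sketched as a small helper. This is an illustrative Python sketch only (the protocol names R as the analysis software); the function name and the choice to express follow-up in months are assumptions made for the example.

```python
def follow_up_category(months_from_baseline: float) -> str:
    """Map a follow-up time (months from baseline) to the protocol's categories.

    Hypothetical helper: the thresholds come from the protocol text,
    the function itself is only an illustration.
    """
    if months_from_baseline < 0:
        raise ValueError("follow-up time cannot be negative")
    if months_from_baseline < 6:
        return "short term"      # < 6 months
    if months_from_baseline < 12:
        return "medium term"     # 6 to < 12 months
    return "long term"           # >= 12 months


if __name__ == "__main__":
    for months in (3, 6, 11.5, 12, 24):
        print(months, "->", follow_up_category(months))
```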
Assessment of risk of bias in included studies
Each included study will be independently assessed for risk of bias (RoB) by two review authors using the revised Cochrane tool for assessing RoB in randomized trials, RoB 2. The RoB assessment will be conducted and documented separately for each outcome of interest. The RoB 2 tool comprises five RoB domains: (1) bias arising from the randomization process, (2) bias due to deviations from intended interventions, (3) bias due to missing outcome data, (4) bias in measurement of the outcome, and (5) bias in selection of the reported result. The RoB 2 assessment of cluster RCTs includes an additional domain: bias arising from identification or recruitment of individual participants within clusters. RoB is rated for each domain, and the ratings for all domains are then used for an overall RoB judgement, which may be “low RoB”, “some concerns” or “high RoB”. Any discrepancies in ratings between the review authors will be discussed and resolved, if necessary, involving a third review author.
Measures of treatment effect
We will use RRs with 95% CIs for dichotomous data and MDs with SDs for continuous data. In cases where an outcome has been obtained using different measurements (e.g., pain measured using different pain scales), we will use Hedges’ g as standardized mean difference (SMD). Where available, we will give preference to mean change scores from baseline to follow-up over mean scores at follow-up.
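As a worked illustration of these effect measures, the following Python sketch computes a risk ratio with a Wald-type 95% CI on the log scale and Hedges' g with the usual approximate small-sample correction. It is a minimal sketch under stated assumptions: the function names are hypothetical, the CI uses the normal approximation, and the planned analyses themselves are to be run in R.

```python
import math


def risk_ratio(events_int, n_int, events_ctrl, n_ctrl, z=1.96):
    """Risk ratio with a Wald-type CI computed on the log scale."""
    rr = (events_int / n_int) / (events_ctrl / n_ctrl)
    se_log_rr = math.sqrt(
        1 / events_int - 1 / n_int + 1 / events_ctrl - 1 / n_ctrl
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


def hedges_g(mean_int, sd_int, n_int, mean_ctrl, sd_ctrl, n_ctrl):
    """Standardized mean difference with Hedges' small-sample correction."""
    df = n_int + n_ctrl - 2
    pooled_sd = math.sqrt(
        ((n_int - 1) * sd_int ** 2 + (n_ctrl - 1) * sd_ctrl ** 2) / df
    )
    d = (mean_int - mean_ctrl) / pooled_sd
    correction = 1 - 3 / (4 * df - 1)  # approximate correction factor J
    return correction * d


if __name__ == "__main__":
    print(risk_ratio(10, 100, 20, 100))
    print(hedges_g(10.0, 2.0, 20, 8.0, 2.0, 20))
```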
Unit of analysis issues
For cluster RCTs, we will check whether study authors applied an appropriate analysis method to account for clustering, such as an analysis of covariance that takes account of baseline cluster differences. If this is not the case, we will adjust the data as described in the Cochrane Handbook.
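One common adjustment of this kind is the effective sample size approach, in which sample sizes are divided by a design effect derived from the average cluster size and the intracluster correlation coefficient (ICC). The sketch below assumes this is the adjustment applied; the function names and the example ICC are illustrative, not values from this protocol.

```python
def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Design effect: 1 + (M - 1) * ICC, with M the average cluster size."""
    return 1 + (mean_cluster_size - 1) * icc


def effective_sample_size(n: float, mean_cluster_size: float, icc: float) -> float:
    """Sample size (or event count) after dividing by the design effect."""
    return n / design_effect(mean_cluster_size, icc)


if __name__ == "__main__":
    # Illustrative numbers only: 200 participants, clusters of 20, ICC 0.05.
    print(design_effect(20, 0.05))
    print(effective_sample_size(200, 20, 0.05))
```

With these illustrative inputs the design effect is 1.95, so a trial arm of 200 participants counts as roughly 103 participants in the analysis.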
Dealing with missing data
We plan to contact study authors in case of missing or unclear data. In cases where SEs are reported instead of SDs, we will convert them to SDs. In cases where neither SDs nor SEs are available, we will calculate SDs using reported CIs or p values as described in the Cochrane Handbook. In cases where none of the procedures described above are possible, we will estimate missing SDs using a validated imputation technique based on the reported SDs of the other included studies.
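The first two conversions can be sketched as follows for a single group mean. This is a minimal sketch assuming a large-sample 95% CI (normal quantile 1.96); for small groups a t-distribution quantile would be more appropriate, and the function names are hypothetical.

```python
import math


def sd_from_se(se: float, n: int) -> float:
    """SD of a group from the standard error of its mean: SD = SE * sqrt(n)."""
    return se * math.sqrt(n)


def sd_from_ci(lower: float, upper: float, n: int, z: float = 1.96) -> float:
    """SD of a group from the CI of its mean (large-sample approximation).

    The CI width equals 2 * z * SE, so SE = (upper - lower) / (2 * z).
    """
    se = (upper - lower) / (2 * z)
    return sd_from_se(se, n)


if __name__ == "__main__":
    print(sd_from_se(0.5, 100))
    print(sd_from_ci(8.04, 11.96, 100))
```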
Assessment of risk of publication bias
To assess risk of publication bias, we plan to examine funnel plots for each pairwise meta-analysis and comparison-adjusted funnel plots for each NMA. Additionally, we plan to conduct Egger’s linear regression test for funnel plot asymmetry. We will only assess publication bias for outcomes for which at least 10 studies are available.
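In its classical form, Egger's test regresses the standardized effect (effect divided by its SE) on precision (1/SE); an intercept that differs from zero suggests funnel plot asymmetry. The sketch below implements only the ordinary least-squares step in plain Python. The function name is hypothetical, and the p value (from a t distribution with n - 2 degrees of freedom) is deliberately left out because it needs a statistics library.

```python
import math


def egger_regression(effects, ses):
    """Regress effect/SE on 1/SE; return (intercept, slope, se_intercept)."""
    x = [1.0 / s for s in ses]                     # precision
    y = [e / s for e, s in zip(effects, ses)]      # standardized effect
    n = len(x)
    if n < 3:
        raise ValueError("need at least three studies")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance and the standard error of the intercept.
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_intercept = math.sqrt(sse / (n - 2) * (1.0 / n + x_bar ** 2 / sxx))
    return intercept, slope, se_intercept


if __name__ == "__main__":
    ses = [0.05 * k for k in range(1, 11)]         # ten illustrative studies
    biased = [0.3 + 2 * s for s in ses]            # smaller studies, larger effects
    print(egger_regression(biased, ses))
```

The ten-study example mirrors the minimum number of studies required above before publication bias is assessed.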
In pairwise meta-analyses, the effects of an intervention versus a control condition are directly compared. Since work-related preventive interventions are diverse and often contain several intervention components, an alternative approach is needed that allows comparison of multiple intervention strategies. We therefore plan to perform an NMA to combine all available evidence from direct and indirect comparisons across a network of trials.
We plan to conduct three steps of analysis:
First, we will conduct pairwise meta-analyses for all direct comparisons of interventions with other interventions or comparators that have been investigated in at least two original studies. In cases where study effects vary considerably in direction and/or size, we may, however, decide against meta-analysis in favor of a descriptive analysis. Based on the expectation that the intervention effects of the included studies are likely to vary to some extent, we plan to conduct the meta-analyses using a random-effects model. The ultimate decision about the most appropriate approach for each analysis will, however, be made based on the characteristics of the available studies and their observed effects, in particular the number of available studies for the meta-analyses and the extent of variation in the direction and size of the observed effects. Separate meta-analyses will be conducted for each pre-specified outcome. We will present the results of each pairwise meta-analysis using forest plots. Statistical heterogeneity among the studies will be investigated using Cochran’s Q test and the I² statistic. For the interpretation of the I² statistic, we will follow the guidance provided in the Cochrane Handbook. If we find considerable statistical heterogeneity for a direct comparison, we will not perform meta-analysis and instead conduct a descriptive analysis for the respective comparison.
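The quantities named in this step can be illustrated with a short Python sketch of an inverse-variance random-effects meta-analysis. It assumes the DerSimonian-Laird estimator for the between-study variance, which the protocol does not specify; the function name is hypothetical, and the actual analyses are planned in R.

```python
import math


def random_effects_dl(effects, ses):
    """Random-effects pooling with Cochran's Q, I^2 and DerSimonian-Laird tau^2."""
    w = [1 / se ** 2 for se in ses]                # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1 / (se ** 2 + tau2) for se in ses]    # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    return {"pooled": pooled, "se": se_pooled, "Q": q, "I2": i2, "tau2": tau2}


if __name__ == "__main__":
    # Two illustrative studies with strongly differing effects.
    print(random_effects_dl([0.0, 1.0], [0.1, 0.1]))
```

For ratio measures such as RRs, the inputs would be log effects and their standard errors, with the pooled result back-transformed afterwards.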
In the second step, we plan to conduct NMAs using a frequentist approach. We will include all available interventions characterized according to their content (as described above). Comparisons for which no meta-analysis will be performed (for the reasons mentioned above) will be excluded from the NMAs. The network structure for each outcome will be illustrated using network graphs. The size of each node in the network graph will correspond to the total number of participants allocated to the respective intervention, and the width of each line will correspond to the number of studies contributing to the respective direct comparison. Network graphs with edges colored according to quality items will be generated to evaluate potential biases. Quality items will include the domains of RoB 2 described above. In a league table, we will provide the summary effect size along with its 95% CI for each available comparison of interventions. For each NMA, we will calculate P scores for interventions, which are a frequentist version of the surface under the cumulative ranking curve (SUCRA). P scores may take values between 0 and 1, with 0 indicating that an intervention is “always worst” and 1 indicating that an intervention is “always best” compared to the other interventions considered. Interventions will be ranked according to their P scores for each outcome considered.
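To make the ranking step concrete, the sketch below computes P scores from a matrix of network estimates and their standard errors: for each intervention, it averages the one-sided probabilities that the intervention is better than each competitor. The input layout, the function name, and the flag for the direction of benefit are assumptions made for this example.

```python
from statistics import NormalDist


def p_scores(diff, se, smaller_is_better=True):
    """P score per intervention.

    diff[i][j] is the estimated effect of intervention i versus j and
    se[i][j] its standard error (hypothetical input layout).
    """
    phi = NormalDist().cdf
    n = len(diff)
    scores = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if i == j:
                continue
            z = diff[i][j] / se[i][j]
            # Probability that i is better than j given the estimate.
            total += phi(-z) if smaller_is_better else phi(z)
        scores.append(total / (n - 1))
    return scores


if __name__ == "__main__":
    # Illustrative log risk ratios versus a common reference (index 0).
    theta = [0.0, -0.5, -1.0]
    diff = [[a - b for b in theta] for a in theta]
    se = [[0.25] * 3 for _ in range(3)]
    print(p_scores(diff, se))
```

Because the pairwise probabilities for i versus j and j versus i sum to 1, the P scores of a network always average 0.5.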
To evaluate the effects of single components in multicomponent interventions, we plan to perform an additional component network meta-analysis (CNMA). In classical NMA, combined interventions such as exercise program plus education are considered as independent interventions (i.e., exercise program plus education = intervention AB). CNMAs break down interventions into their individual components (e.g., exercise program = intervention component A; education = intervention component B). The additive CNMA model assumes that the effect of a combined intervention AB is the sum of the effects of the individual components A and B. We will apply the additive model to investigate the relative effects of intervention components of the included studies.
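The additivity assumption can be written compactly as follows. The notation is introduced here for illustration only: \theta_X denotes the effect of component X relative to a common reference (e.g., no intervention), and d_{X \text{ vs } Y} the relative effect of intervention X versus intervention Y.

```latex
% Additive CNMA model (illustrative notation, not taken from the protocol)
\theta_{AB} = \theta_A + \theta_B
% A comparison between interventions is the difference of their component sums:
d_{AB \text{ vs } C} = (\theta_A + \theta_B) - \theta_C
% Components shared by both arms cancel out:
d_{AB \text{ vs } A} = (\theta_A + \theta_B) - \theta_A = \theta_B
```

Under this model, a trial comparing exercise plus education with exercise alone contributes information on the effect of education only.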
All analyses will be performed with the statistical software R using the R packages meta and netmeta.
To investigate possible differences in the effects of interventions for specific subgroups, we plan to perform NMAs for the following subgroups:
Job exposure: we plan to determine job exposure in a two-step procedure: first, we will classify the occupations of the study populations using the International Standard Classification of Occupations (ISCO-08). Participants’ overall job exposure (including physical and psychosocial exposure) will then be rated using the Job Exposure Matrices (JEM) for ISCO by Kroll, resulting in ratings of either low, medium, or high. The JEM have been shown to be a valid instrument to classify work demands in the context of health science.
Intervention duration: for this subgroup-analysis, we plan to categorize interventions according to their total duration, e.g., < 1 month, 1–3 months, > 3 months.
Baseline back pain: we plan to differentiate between samples without back pain at baseline and mixed samples including both people with and without back pain at baseline.
Localization of back pain: for this subgroup-analysis, we will subdivide studies according to the reported localizations of back pain, measured as an outcome of the intervention (e.g., low back pain and neck pain).
Gender: this subgroup-analysis will include all studies that exclusively investigated a specific gender or reported gender-specific subgroup-analyses.
Furthermore, we plan to conduct network meta-regressions in WinBUGS for mean age and the proportion of female participants to explore the impact of these possible moderator variables.
To assess the impact of RoB on our results, we plan to conduct sensitivity analyses in which we will exclude studies with high RoB in at least one domain from the NMAs. To account for a possible impact of follow-up duration, we plan to conduct additional NMAs for the remaining follow-up categories described above, i.e., short term and medium term.
Assessment of transitivity
The assumption of transitivity, also referred to as “similarity”, implies that studies comparing different sets of interventions are sufficiently similar to allow indirect comparisons (i.e., comparing two interventions via a third one) [16, 38]. To detect potential intransitivity, we will assess the distribution of possible effect modifiers across all direct comparisons prior to conducting NMA. These include, e.g., age, gender, duration of interventions, mode of delivery of interventions (e.g., supervised or self-directed), and study setting. If distributions are comparable across the available direct comparisons, we will assume that the assumption of transitivity is met.
Assessment of consistency
Consistency indicates that results of direct and indirect comparisons are in agreement so that they can reasonably be combined in an NMA. We will evaluate consistency using the node splitting approach that separates comparisons into direct and indirect information from each node. Furthermore, we will consider a design-by-treatment interaction model to test for design inconsistency in the whole network. Any inconsistency will be explored using subgroup analyses or meta-regressions as described before. In case of substantial unexplained inconsistency, we will not report results of the respective NMA.
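The idea behind comparing direct and indirect evidence can be illustrated on a single triangle of interventions A, B, and C. The sketch below derives an indirect A versus B estimate via the common comparator C and tests its difference from the direct estimate with a z test. This is a simplified stand-in for the node splitting approach, not an implementation of it; the function names are hypothetical and effects are assumed to be on an additive scale (e.g., log RR or MD).

```python
import math
from statistics import NormalDist


def indirect_estimate(d_ac, se_ac, d_bc, se_bc):
    """Indirect A vs B effect via common comparator C, with its SE."""
    return d_ac - d_bc, math.sqrt(se_ac ** 2 + se_bc ** 2)


def inconsistency_test(d_direct, se_direct, d_indirect, se_indirect):
    """Difference between direct and indirect estimates with a two-sided z test."""
    diff = d_direct - d_indirect
    se_diff = math.sqrt(se_direct ** 2 + se_indirect ** 2)
    z = diff / se_diff
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, z, p


if __name__ == "__main__":
    d_ind, se_ind = indirect_estimate(-0.5, 0.3, -0.2, 0.4)
    print(d_ind, se_ind)
    print(inconsistency_test(-0.1, 0.2, d_ind, se_ind))
```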
Rating the certainty of the evidence
We will apply the GRADE (Grading of Recommendations Assessment, Development and Evaluation) approach to rate the certainty of the evidence derived from our NMAs [51, 52]. Following the approach for rating NMA results described by the GRADE working group, the certainty of direct effect estimates will be rated in a first step, considering RoB, inconsistency, indirectness, and publication bias. These ratings will inform the rating of the indirect estimates; additionally, intransitivity will be considered for indirect evidence. Based on the ratings of direct and indirect evidence, the certainty of the network effect estimates will be rated [51, 52]. In case of considerable imprecision or incoherence, we will rate down certainty of the network estimates [51, 52]. Applying the GRADE approach results in four possible levels of certainty: high, moderate, low, and very low [51, 52]. The rating will be conducted independently by two review authors for each outcome considered. Any disagreement will be discussed and resolved by consensus, if necessary with a third review author. If no NMA can be performed, we will only rate the certainty of the evidence of our pairwise meta-analyses.