It was often difficult to identify ‘included’ studies, and considerable deduction was needed to explain why some primary studies may not have been included in a specific review.
We found little overlap of included studies within the eight reviews, despite the similarity of the research question. Studies with multiple publications were more likely to be included in reviews than shorter-term studies that generated single publications. The results of studies with multiple publications were also more likely to be reported differently by different review authors.
Although search strategies in the majority of cases did not meet our quality threshold, the inclusion criteria of the reviews appeared to justify the lack of inclusion of specific primary studies. Unsurprisingly, it was easier to explain the exclusion of studies in better quality reviews, as they had clearer inclusion criteria and search strategies.
Reviews of longitudinal and multi-stage interventions were more likely to find larger studies, but less likely to report their findings comprehensively, because those findings are dispersed across many publications, not all of which were necessarily identified.
Discrepancies in findings did not lead to discrepancies in conclusions. This may be because it is particularly challenging to show an impact arising from complex interventions and reviewers tended to be cautious with their interpretations.
There was little cross-citation between reviews and only the lower quality reviews cited other reviews in our analysis.
It was possible to explain why all non-included studies were absent from the systematic reviews, but more difficult to do so for the non-systematic reviews. (Since two out of the three systematic reviews were ‘empty’ we were unable to compare differences in terms of how reviews of different quality treated their included studies.)
To some extent, we were surprised by our findings. We had expected to find greater overlap between reviews and, where overlap was limited, diversity in findings. The similarity in findings can be explained by the fact that no reviews found compelling evidence of effectiveness in the studies they included; they were all therefore cautious in their conclusions. This finding echoes the results of a similar study, which found that, even though the scope and quality assessment methods employed in health promotion reviews differed, such differences are ‘unlikely to divide opinion radically about effectiveness amongst cautious reviewers’. In contrast, two reviews with a similar research question came to very different conclusions about the effectiveness of interventions for childhood obesity. In those reviews, conclusions were based on the results of randomized controlled trials (RCTs), and it may be that reviewers tend to be more cautious, and therefore their conclusions less divergent, when interpreting observational data.
The lack of overlap of primary studies warrants further examination, because it cannot be explained (entirely) in terms of deficiencies in the search strategies of the reviews, but rather seems to be due to differences in the scope (inclusion criteria) of the reviews, which in turn relates to heterogeneity in their review questions. This finding is consistent with other methodological studies, which found that many apparent inconsistencies in the citation and selection of primary studies, especially non-RCTs, could be attributed to differences in inclusion criteria and outcome assessments of the reviews (rather than being due primarily to problems in their search strategies) [65, 66]. Even though we had selected our sample of reviews to be as similar as possible in scope so that we could investigate overlap, in practice the scope of the reviews did not overlap very much. This has important implications for the utility of reviews to inform policy and practice.
First, in areas where evaluation and impact measurement are known to be difficult and where research and policy interest is relatively recent, it is likely that the findings of reviews will reflect uncertainties in the primary studies and be less enlightening about the substantive topic. Review conclusions can only ever be as good as the available data on the topic; this was certainly the case in the reviews that we examined. Across the topic of community interventions to promote physical activity, reviews were necessarily cautious in their findings because of uncertainties in the evidence base. While this is useful for researchers and research commissioners to know, it is less useful for people involved in determining policy and practice.
Second, dealing with linked publications (multiple publications from the same study) was complicated and confusing, both for ourselves and seemingly for the reviewers of our eight included reviews. To improve fidelity of reporting and ensure that all relevant evidence informs review results and conclusions, it is important to identify all publications from studies with multiple or staged evaluations. We therefore recommend that study authors aid researchers by clearly citing all previous and intended work in each publication, and that editors check this before publication. Larger studies might consider maintaining a website that details all related publications (as some already do). Reviewers can search for multiple publications from a study by searching for papers by the authors, studies and research groups that feature in the provisional list of included studies for the review. In order to build on existing knowledge, review authors should search for existing relevant reviews in the area and use this knowledge to contextualize their aims and findings. Inclusion (and citation) of relevant reviews will also help direct readers to relevant resources.
The study has also highlighted some of the unavoidable complexities that face potential users of systematic reviews. We placed ourselves in a hypothetical situation, but one that is similar to that faced by many policymakers and practitioners who would like their decisions to be informed by evidence; for example, a newly formed Health and Wellbeing board in the UK, tasked with reducing obesity among young people, might well want to examine what works in terms of promoting physical activity. If they used the map of community interventions and identified these eight reviews as being relevant, they would find that: while all the reviews were about the promotion of physical activity, they each had a particular ‘angle’, which determined the range of research they included; where the same studies were included in reviews, their findings were not always reported consistently; the concept of ‘community’ was often discussed in reviews, but there were also differences in its conceptualization; and on the whole, the reviews did not position themselves as contributing to a wider evidence base around the promotion of physical activity (as evidenced by the lack of inter-citation between them).
There was an inevitable tension in this analysis between a narrowness that ensured that all reviews were on exactly the same topic, and a breadth that ensured all potentially relevant reviews were included; this is the same tension concerning homogeneity of focus that exists in many systematic reviews in public health. Given that most public health decisions are about identifying solutions to a problem (in this case, increasing levels of physical activity), obtaining a range of reviews is to be expected; and the question that this paper begins to unpick arises: ‘how coherent is the picture that emerges?’
Reviews that give a limited ‘slice’ of the evidence are extremely valuable if the policy/practice question is closely aligned to the scope of the review, but are less useful if they can offer only a partial picture of the evidence relevant to that question. In our topic area, however, even with the findings of all eight reviews at our disposal, we would not be confident that we were building on the results of all research about community interventions to promote physical activity, because each review contains a limited portion of the evidence and there may well be relevant studies that fall outside the scope of any of our reviews. (We should reiterate the point made above, that systematic review methods are developing quickly, and that some of these ‘gaps’ may now be filled.)
The above points relate to wider and unsolved issues about the amount of ‘work done’ in a review. Some reviews have a relatively narrow focus, undertaking a detailed look at a relatively small area; additional ‘work’ then needs to be done by users in identifying a range of such reviews and ‘synthesizing’ them to inform their particular decision. Other reviews are broader in scope, which means that, potentially, less ‘work’ needs to be done by their users, though there is a tension between achieving both breadth and depth in the same review, the risk being that broad reviews may suffer from a lack of focus and be deficient in essential detail. While a detailed discussion of these issues is beyond the scope of this paper, we have highlighted areas within which review authors might usefully assist potential users.