How explicable are differences between reviews that appear to address a similar research question? A review of reviews of physical activity interventions

Background: Systematic reviews are promoted as being important to inform decision-making. However, when presented with a set of reviews in a complex area, how easy is it to understand how and why they may differ from one another?
Methods: An analysis of eight reviews reporting evidence on effectiveness of community interventions to promote physical activity. We assessed review quality and investigated overlap of included studies, citation of relevant reviews, consistency in reporting, and reasons why specific studies may be excluded.
Results: There were 28 included studies. The majority (n = 22; 79%) were included only in one review. There was little cross-citation between reviews (n = 4/28 possible citations; 14%). Where studies appeared in multiple reviews, results were consistently reported except for complex studies with multiple publications. Review conclusions were similar. For most reviews (n = 6/8; 75%), we could explain why primary data were not included; this was usually due to the scope of the reviews. Most reviews tended to be narrow in focus, making it difficult to gain an understanding of the field as a whole.
Conclusions: In areas where evaluating impact is known to be difficult, review findings often relate to uncertainty of data and methodologies, rather than providing substantive findings for policy and practice. Systematic 'maps' of research can help identify where existing research is robust enough for multiple in-depth syntheses and also show where new reviews are needed. To ensure quality and fidelity, review authors should systematically search for all publications from complex studies. Other relevant reviews should be searched for and cited to facilitate knowledge-building.


Background
One of the principles underpinning evidence-informed policy and practice is that of knowledge accumulation: that we do the most good, and avoid harms, by basing our decisions on systematic reviews of high quality research [1]. Systematic reviews can synthesize a large amount of sometimes conflicting evidence and can therefore be a potentially important influence on the decisions of practitioners and policy-makers [2].
However, how suitable are systematic reviews for informing decision-making when using reviews that were not commissioned for that specific decision? The applicability of review findings has been called into question recently, with some reviews being criticized for lacking the context-specific detail that is essential to translate their findings to specific practical situations [3]. Equally important is the question of whether systematic reviews can be relied upon genuinely to reflect the state of the evidence base. To do this they must: (1) ensure that all relevant studies are identified through the use of exhaustive search strategies; and (2) ensure that their conclusions are based on reliable studies.
In this paper we examine eight reviews of community interventions to promote physical activity in order to investigate the issue of comprehensiveness and reliability in reviews and to consider the problem of applicability; if we were a practitioner wishing to use these reviews to inform our practice, how confident can we be that our decision would be based on all the available evidence and that the conclusions drawn were reliable? And, while the reviews we may seek to draw upon will all appear to address similar questions, are we able to mediate differences between them?
In essence, we placed ourselves in the hypothetical position of wanting to identify evidence about 'what community interventions work' to promote physical activity among children to inform our decision-making. Using a systematic 'map' of reviews we selected a set of reviews that are ostensibly all about the same broad issue: that of community interventions to promote physical activity. Our confidence in the evidence base as portrayed by the reviews would be increased if we could see how they each contributed to an overall understanding of the field; and if reviews addressing the same question identified the same studies and treated them in the same way. If they did not, we might worry that other, equally relevant, studies were missing too, and without considerable effort on our part, we would have no way of mediating between conflicting findings.
We wished to explore any differences between reviews in terms of the studies they included. There may be legitimate reasons for reviews on the same subject not containing the same studies (for example, differences in scope or population). However, it may be difficult to understand the inclusion or exclusion of certain studies purely in terms of the scope of reviews; where differences between reviews cannot be explained by their having different scopes or purposes, these differences might be explained in terms of differential review quality. In addition, we would hope that reviews which included the same studies would report the same results from those studies and, given their similar scope, reach similar conclusions about the effectiveness of community interventions for physical activity.
One of the justifications for systematic reviews is encapsulated by the concept 'knowledge accumulation'; that new research should build on previous work and say how it contributes to existing knowledge. Additionally, locating new systematic reviews in the context of other reviews should facilitate the process of piecing together knowledge from multiple reviews in order to inform practical decisions. To see how far the eight reviews facilitated this, we investigated how far the reviews cited one another, since inter-citation may be taken as evidence that reviewers were both aware of previous work and sought explicitly to advance the state of knowledge in their area.
Since it is not always clear whether a review is systematic or non-systematic, and given that literature reviews are often commissioned to inform decisions, we included all types of literature review in this area (not just systematic reviews) and assessed the relationship between review methods and included data, reporting, and conclusions.
Our research questions were:
1. To what extent do reviews answering a similar research question include the same primary studies?
2. Where reviews do not contain the same studies, is this explicable in terms of differences in their scope?
3. How similarly do reviews answering a similar research question report the results of the primary studies they have in common?
4. To what extent do reviews answering a similar research question draw the same conclusions?
5. To what extent do reviews answering a similar research question cite other reviews on the same topic?
6. Does the methodological quality of reviews answering a similar research question help us to understand any differences between included studies in terms of results and conclusions?

Methods
Identifying reviews which answer similar research questions
In 2008 we published a systematic map of reviews on 'Social and environmental interventions to reduce childhood obesity' which included 33 reviews about the impact of upstream or 'social and environmental' interventions on eating, physical activity, sedentary behavior, and/or associated attitudes [4]. This map included reviews about physical activity (or sedentary behavior), healthy eating (or weight management) with an OECD country focus, and which included children in their topics of focus. In order to investigate study overlap between reviews, we needed a sufficiently large sample of reviews that were as homogenous as possible in terms of their topic areas. We therefore used a subset of the reviews in the above 'map' to be the focus of our investigation. This subset of reviews investigated the effectiveness of community interventions to promote physical activity (either alone or in combination with healthy eating). There were 16 such reviews in the above 'map' but in order to maximize comparability of research question and scope of the reviews, we excluded reviews which:
1. only had very little effectiveness data (for example, were primarily a description of funded interventions);
2. had inclusion criteria that restricted the population in terms of ethnicity, race, or age (for example, only included studies about Aboriginal/Torres Strait Islander people);
3. did not draw any conclusions about physical activity.

Methodological quality of the included reviews
As discussed in the background, systematic reviews are promoted as an important means of ensuring decisions are informed by reliable research evidence. Unfortunately, some reviews that describe themselves as 'systematic' are not; likewise, some reviews are systematic without being described as such. We therefore assessed the quality of the reviews using the AMSTAR quality assessment tool [20] to assess the degree to which they met accepted standards for being systematic reviews (Table 1). Broadly, this meant examining their reporting of: inclusion (and/or exclusion) criteria; search strategy; synthesis methods; quality assessment; details reported about included studies; and quality assurance measures (that is, screening and/or data extraction and/or quality assessment of studies completed independently by two reviewers, at least in part, with differences resolved). This tool was developed as a result of a systematic survey of other review quality assessment tools and a consultation exercise; it therefore identifies what are widely held to be the most important characteristics of systematic reviews. We classified reviews which have clear inclusion criteria, an adequate search strategy and quality appraisal of included studies as 'systematic'. We included non-systematic reviews as they are often used for the same purposes as systematic reviews, and are frequently commissioned to inform policy.
Given the challenges of locating data for public health interventions [21,22], we went beyond the AMSTAR criteria and only judged a search strategy to be adequate if the authors reported all of the following: searching more than two databases using both free text and thesaurus terms; searching at least one topic-specific database or journal (such as those relating to physical activity, obesity, eating or food, or public health, which was the scope of the original map); and using at least one non-database search source (internet searching, website searching, contacting experts, checking reference lists, or handsearching key journals). Where there was no mention of a quality indicator or where it was unclear, we assumed the quality indicator was not present.
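The adequacy rule described above can be read as a simple all-or-nothing check. The following is a minimal sketch of that reading, not the procedure actually used by the authors; the class name `SearchReport`, its field names, and the function `search_is_adequate` are hypothetical and introduced only for illustration. An indicator that is unreported or unclear is stored as `None` and treated as not present.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchReport:
    """What a review reports about its search (hypothetical field names).

    None means the indicator was not mentioned or was unclear.
    """
    n_databases: Optional[int] = None
    free_text_terms: Optional[bool] = None
    thesaurus_terms: Optional[bool] = None
    topic_specific_source: Optional[bool] = None
    non_database_source: Optional[bool] = None


def search_is_adequate(report: SearchReport) -> bool:
    """Return True only if every criterion in the text is explicitly reported."""
    # "More than two databases": strictly greater than 2.
    enough_databases = report.n_databases is not None and report.n_databases > 2
    indicators = (
        report.free_text_terms,
        report.thesaurus_terms,
        report.topic_specific_source,
        report.non_database_source,
    )
    # Unreported or unclear indicators (None) count as not present.
    return enough_databases and all(flag is True for flag in indicators)
```

Under this reading, a review that meets every criterion except reporting thesaurus terms would be returned as not adequate, because the rule requires all indicators to be explicitly reported.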

Identifying the studies included in the reviews
We compiled a list of all the studies that were included in the above reviews. We determined whether a study had been 'included' in a review by assessing whether it had its findings about the effectiveness of a community intervention reported by the review authors. We defined 'findings about effectiveness' as being any report of the impact of a social and environmental change or any report of an observational comparison between populations with and without a specific social and environmental factor (for example, access to walking paths). We were broad in our interpretation of 'social and environmental' and only excluded evaluations of purely educational interventions delivered exclusively in the workplace or classroom. Outcomes relevant to 'physical activity' were defined as any measure of activity, sedentary behavior, knowledge, or beliefs, or body-weight, BMI, or energy intake, following an intervention with a physical activity component.

Analysis
We entered each included study into our review management software EPPI-Reviewer [23] and coded the studies according to: (1) the reviews in which they were included; and (2) whether there was an obvious reason for the study's exclusion from certain reviews, based on information available from the inclusion criteria of each review and the abstract of the included studies. In two cases, the abstract was not available (one study was very old [24] and the other was a conference abstract without any details except the title [25]). We excluded these two studies from the analyses that relied on the abstract. Researcher judgment was needed to determine whether there was a likely reason for exclusion, especially where inclusion criteria were not clearly reported. Despite overlap in scope, the reviews answered different research questions (Table 1). With this in mind, we only classified the reasons for non-included primary studies as 'unclear' (Table 2) if we could not discern any reason at all, based on their date, scope, and inclusion criteria, why they were not included in the review. In addition, and based on a detailed reading of the full text, we described how each review reported the results of the included studies and summarized the conclusions that each review drew about the effectiveness of community interventions for promoting or increasing physical activity. Finally, we checked the reference lists of each review to establish the frequency with which the reviews cited other relevant reviews. As manuscripts are submitted many months before publication, we judged that when publication dates were within 1 year of each other, reviews were not necessarily able to cite one another.
[Table footnote: Although these reviews did not contain any included primary studies at all, the review authors stated that they had intended to carry out a formalized data extraction process and quality assurance measures.]
Multiple publications arising from one study were analyzed as a group (that is, our unit of analysis was the included study rather than publication). We found that three studies that had generated multiple publications were included in the eight reviews (Table 3).
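Treating the study, rather than the publication, as the unit of analysis can be sketched as follows: a review counts as including a study if it cites at least one of that study's publications. This is an illustrative sketch only; the function name `studies_included` and the second study entry are hypothetical, while the reference numbers for the Welsh heart project are those listed in Table 3.

```python
from typing import Dict, Set


def studies_included(cited_refs: Set[int],
                     study_publications: Dict[str, Set[int]]) -> Set[str]:
    """Return the studies for which a review cites at least one publication."""
    return {
        study
        for study, publications in study_publications.items()
        if publications & cited_refs  # non-empty intersection
    }


study_publications = {
    "Welsh heart project": {39, 40},            # reference numbers from Table 3
    "hypothetical single-publication study": {999},  # invented for illustration
}

# Two hypothetical reviews citing different publications from the same study.
review_a_refs = {39}
review_b_refs = {40}

included_a = studies_included(review_a_refs, study_publications)
included_b = studies_included(review_b_refs, study_publications)
```

Both hypothetical reviews are counted as including the Welsh heart project even though they cite different publications, which is also why their reports of its results may differ.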

Quality assurance
Data about review quality, research question, and scope (Tables 1 and 4) were extracted as part of the original project [4]. These data were independently extracted by two researchers and discrepancies resolved by recourse to the original publications or, in some cases, to a third reviewer. Identification of 'included' studies in the eight reviews was also carried out independently by two reviewers and differences resolved by discussion and consensus. All other analyses were conducted by one reviewer with quality assurance checks conducted by a second reviewer on a subset of data.

Results
A total of 28 primary studies in the eight reviews met our criteria for being 'included studies' [24-26,35,39,41-63]. Twenty-six of these studies (93%) had an abstract available. In many cases, especially with lower-quality reviews, it was difficult to judge which studies were 'included' (that is, the review authors used their results to answer questions about effectiveness) and which studies were referenced for another reason.
To what extent do reviews answering a similar research question include the same primary studies?
There was little overlap between data included in the eight reviews: the majority of primary studies (n = 22/28; 79%) were only included in one review; four studies were included in two reviews and two of the studies that had generated multiple publications were included in five reviews (Table 4). Of the six studies which were included in multiple reviews, four [39,61-63] were included in two reviews and two were included in five reviews [26,35].
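The overlap figures above amount to a tally of how many reviews include each study. A minimal sketch of that tally is given below; the function names and the toy inclusion data are hypothetical, whereas the counts in `reported` (22, 4, and 2 studies) are those stated in the text.

```python
from collections import Counter
from typing import Dict, Set


def overlap_distribution(inclusions: Dict[str, Set[str]]) -> Counter:
    """Map 'number of reviews including a study' to 'number of such studies'."""
    return Counter(len(reviews) for reviews in inclusions.values())


def percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


# Hypothetical toy data: study -> set of reviews that include it.
toy_inclusions = {
    "study_a": {"review_1"},
    "study_b": {"review_1", "review_2"},
    "study_c": {"review_3"},
}
toy_distribution = overlap_distribution(toy_inclusions)

# Counts reported in the text: number of reviews -> number of studies.
reported = {1: 22, 2: 4, 5: 2}
total_studies = sum(reported.values())
single_review_share = percent(reported[1], total_studies)
```

With the reported counts, `total_studies` is 28 and `single_review_share` is 79, matching the figures given for studies included in only one review.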
Where reviews do not contain the same studies, can we explain why not?
For most of the 26 included studies with an available abstract, it was possible to justify why primary studies had been excluded from each review, although this involved a high degree of reviewer deduction (Table 2). Systematic reviews had fewer inexplicable exclusions of studies: it was possible to explain the absence of primary studies in the three systematic reviews. The reason for exclusion was usually research design of the primary data (some reviews specified controlled trials, of which there are few in this field) or outcome (Table 2).
As we could usually justify why primary studies were not included in reviews, the limited overlap between included primary studies might also be due to slight variations in scope and inclusion criteria (Table 1) rather than only to inadequacy of search strategies.

Table 4 Number of reviews in which each primary study was included (original caption lost in extraction) a,b
Included in 1 review: 22 [24,25,41-60]
Included in 2 reviews: 4 [39,61-63]
Included in 3 reviews: 0
Included in 4 reviews: 0
Included in 5 reviews: 2 [26,35]
Total studies included in any review: 28
a The unit of analysis is the study (not publication). Where a study has multiple publications and review authors have included at least one of the publications, we have judged this study to be 'included' (Table 3).
b This table is based on 28 primary studies, including the two without an available abstract.

Table 3 Studies with multiple included publications (the first publication in the list is the one that has been used to reference the study in the text and tables above)
[earlier rows lost in extraction] Young, 1996 [38]
Welsh heart project: Tudor-Smith, 1998 [39]; Nutbeam and Catford [40]
This is a list of all publications for each study that are cited by the included reviews. It does not represent a complete list of all publications from each study.
How similarly do reviews answering a similar research question report the results of the primary studies they have in common?
We were able to analyze similarity in reporting of primary study results for six studies which were included multiple times in five reviews (Table 5). Results were reported similarly by different review authors for the three studies which generated only one publication. However, for the remaining three studies (Welsh heart project, Minnesota Heart Health Program, and Stanford Five-City Project; Table 5), there were discrepancies between results reported by different review authors in terms of effectiveness data, subgroup analyses, and emphasis. These studies were conducted over a longer time period, with staged and multiple evaluations and, in one case, adaptation of intervention for subgroups. None of these three reviews referenced the same combination of publications generated by the two studies with multiple publications (Table 5).
To what extent do reviews answering a similar research question draw the same conclusions?
Despite the low levels of overlap of included studies in the eight reviews, the conclusions of the reviews were similar (Table 6). All review authors made cautious claims about the effectiveness of interventions in this field for increasing physical activity behavior. All reviews except for one concluded that there was limited or no evidence of effectiveness for increasing physical activity. This one review concluded that there was evidence of effectiveness in all studies but that the size of the impact was very modest [16]. Where authors discussed subgroup effects it was either to highlight a need for evidence in this area or to suggest that targeting interventions was likely to be a promising avenue for future interventions [12,16]. Five authors drew conclusions specifically relating to the quality and methods of the evidence. Four of these authors reported that good quality evidence was limited or lacking [13-15,18]. Additionally, Dobbins and Beyers suggested that there was good quality but very complex evidence [12]. The three authors who gave clear explanations of their findings [12,13,16] suggested that a lack of strong evidence for the positive impact of community interventions for physical activity might be at least partly due to difficulties in measuring impact and/or design problems such as small sample size. All authors concluded that we should not abandon community interventions to increase physical activity. Instead, they recommended that more research was needed and most gave specific recommendations.
To what extent do reviews answering a similar research question cite other reviews on the same topic?
There was little citation of the eight reviews by one another. Only three reviews [16,18,19] cited any other of the reviews. Of a possible 28 instances where the eight reviews could have cited one another (once date of publication had been taken into account), there were only four instances of citation (Table 7). The four instances of citation were of the same two non-systematic reviews, one of which was cited by three different reviews (Table 7) [17,18].
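The count of possible citations depends on the rule set out in the Methods: reviews published within 1 year of each other were not expected to cite one another. The sketch below shows one way to enumerate possible citing pairs under that rule. It assumes publication dates are available only as years and interprets 'within 1 year' as a gap of one year or less; the function name, that interpretation, and the example years are all assumptions made for illustration and are not the publication dates of the eight reviews.

```python
from itertools import permutations
from typing import Dict, List, Tuple


def possible_citations(pub_years: Dict[str, int],
                       min_gap_years: int = 2) -> List[Tuple[str, str]]:
    """List (citing, cited) pairs where the citing review appeared
    at least min_gap_years after the cited review."""
    return [
        (citing, cited)
        for citing, cited in permutations(pub_years, 2)
        if pub_years[citing] - pub_years[cited] >= min_gap_years
    ]


# Hypothetical publication years, for illustration only.
example_years = {"review_a": 1998, "review_b": 1999, "review_c": 2002}
example_pairs = possible_citations(example_years)

# Figures reported in the text: 4 observed citations out of 28 possible.
observed, possible = 4, 28
citation_rate_percent = round(100 * observed / possible)
```

In the hypothetical example only review_c could have cited the other two, so two citing pairs are possible. With the reported figures, `citation_rate_percent` is 14, matching the proportion stated in the abstract.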
Does the methodological quality of reviews answering a similar research question help us to understand any differences between included studies, results, and conclusions?
We found that the methodological quality of the reviews varied (Table 1). There were three 'systematic reviews' (Table 1) [12,14,15]. Only the two Cochrane reviews [14,15] met our criteria for an adequate search strategy. However, the searches by one other review met all search criteria except reporting that it searched using both free text and thesaurus terms. This can also be thought of as a 'systematic' review [12]. For the three systematic reviews (two of which were 'empty' reviews; that is, they did not contain any included primary studies), it was possible to explain why all non-included primary studies were not included [12,14,15]. However, in the lower quality reviews, it was more difficult to explain reasons for exclusion and almost half the exclusions in one such review could not be explained (n = 12/26 not explained; 46%) [17].
As two of the three 'systematic' reviews were 'empty', we could not meaningfully compare differences between included studies, results and conclusions in systematic and non-systematic reviews.

Main findings
It was often difficult to identify 'included' studies and much deduction was needed in explaining why some primary studies may not have been included in a specific review.
We found little overlap of included studies within the eight reviews, despite the similarity of the research question. Studies with multiple publications were more likely to be included in reviews than shorter term studies which generated single publications. The results of studies with multiple publications were also more likely to be reported differently by different review authors.
Although search strategies in the majority of cases did not meet our quality threshold, the inclusion criteria of the reviews appeared to justify the lack of inclusion of specific primary studies. Unsurprisingly, it was easier to explain the exclusion of studies in better quality reviews, as they had clearer inclusion criteria and search strategies.

Table 5 How the results about physical activity from the seven studies that were included in more than one review were reported in each review (column structure lost in extraction; recoverable cell text follows in source order)
[. . .] Heart Health Study were somewhat more physically active (self-reported) at the end of the follow-up. The increased physical activity was apparently due to an increase in activities with a low intensity'
[. . .] with the intervention communities slightly exceeding the comparison communities at the last follow-up survey'
And 'mass media approaches were most successful in heightening physical activity-related awareness and knowledge whereas setting-specific programs strategies that occurred over a period of time, such as those conducted in schools and worksites, were more cost-effective in increasing actual levels of physical activity participation'; Author referenced two publications [26,31]
and 'the exposure data suggest that the Minnesota Heart Health Program may not have added a great deal to the level of risk reduction activity that would have been expected without the program'; Author also reported an intervention effect for girls in the school based sub-study
For a special school-based element of the study 'Girls in the intervention communities reported significantly greater amounts of exercise than girls in control communities. Boys showed a similar tendency, but the difference [. . .] was smaller.'
Author referenced one publication [31]
Author referenced three publications [29-31]
Stanford 5 city (Table 3)
'One project reported a statistically significant treatment [. . .]
'In the Stanford Five-City Project, the intervention had a positive effect on physical activity in the independent, cross-sectional samples, but not in the cohort survey'; Author referenced two publications [36,37]
'Some positive, albeit modest, treatment effects were found after 6 years of intervention in the physical activity area relative to the control communities'; Author referenced two publications [28,38]
'They clearly represent the feasibility of a community based approach to the promotion of healthy eating and physical activity'; Author referenced one publication [35]
'The educational intervention had little, if any, impact on physical activity'; Author referenced one publication [38]
Different

Woodman et al. Systematic Reviews 2012, 1:37

Table 6 (caption lost in extraction; recoverable cell text on review conclusions follows)
'Heart health interventions not likely to produce statistically significant effects on increasing the percentage of the population who are regularly physically active, or in decreasing the percentage of the population who are physically inactive'
We do not know if heart health interventions had an impact on specific groups of people. There is some evidence to suggest effectiveness for high-risk populations
The community-based heart health literature is methodologically strong, although complex and conflicting
Populations might be growing more physically active regardless of community heart health interventions. Lack of statistically significant results might be due partly to design and measurement
We should examine the differences in the strategies used by those projects that found a positive effect and those that did not. More research is needed, especially on populations at high risk for low physical activity
The results on physical activity were positive in most studies but the effects on body weight were disappointing

Table 7 (caption lost in extraction; recoverable cell text on cross-citation follows)
Column heading: Total cited/total possible
King, 1998: This was the first review of the eight to be published so we would not expect it to cite any of the others. 0/0

Reviews of longitudinal and multi-stage interventions were more likely to find larger studies, but less likely to report their findings comprehensively because these are dispersed across many publications, not all of which were necessarily reported.
Discrepancies in findings did not lead to discrepancies in conclusions. This may be because it is particularly challenging to show an impact arising from complex interventions and reviewers tended to be cautious with their interpretations.
There was little cross-citation between reviews and only the lower quality reviews cited other reviews in our analysis.
It was possible to explain why all non-included studies were absent from the systematic reviews, but more difficult to do so for the non-systematic reviews. (Since two out of the three systematic reviews were 'empty' we were unable to compare differences in terms of how reviews of different quality treated their included studies.)

Strengths of this study
There are several strengths of this study. First, our searches were far-reaching and sensitive and our definition of 'community intervention' was broad. Consequently, the eight reviews analyzed here are likely to represent fully the group of reviews available at the time of the searches which aim to evaluate the effectiveness of community interventions for promoting physical activity. Secondly, by excluding reviews which were mainly descriptive, which did not draw conclusions specifically about physical activity or which restricted their population of interest, we ensured that the scope of the reviews was similar enough to warrant a comparison. Thirdly, we assessed the quality of the reviews and were able to comment on the relationship between review quality and our findings. It was necessary to use high levels of researcher judgment at several key stages of analysis: when classifying primary studies as 'included', when extracting authors' conclusions and when assessing whether exclusion of primary data could be ascertained. We implemented quality assurance measures to minimize the potential for inconsistencies when extracting and analyzing data, especially for the lower-quality reviews which had less defined boundaries.

Weaknesses of this study
Our analyses of reasons for exclusion of primary studies were based on the abstract of the included studies. It is possible that our analyses of the reasons for exclusion would have been different had we used the full text of the included studies and/or had contacted the review authors for data. We assumed that a primary study had been found and excluded by a review if we could justify its exclusion by the inclusion criteria or search/publication date. We cannot quantify how much primary data were never found by the reviews and cannot, therefore, comment on whether it is the scope of the review or the methods used that led to the lack of inclusion of specific primary studies.
We also acknowledge that since the searches for the original review of reviews were carried out in November/December 2007, other reviews on this topic have been published. These may reflect developments in review methods that overcome some of the weaknesses in the reviewed evidence base; however, the general messages of this paper, about understanding how different reviews on the same subject relate to one another, will remain important.

Methodological issues
To some extent, we were surprised by our findings. We had expected to find greater overlap between reviews and, where overlap was limited, diversity in findings. The similarity in findings can be explained by the fact that no reviews found compelling evidence of effectiveness in the studies they included; they were all therefore cautious in their conclusions. This finding echoes the results of a similar study, which found that, even though the scope and quality assessment methods employed in health promotion reviews differed, this is 'unlikely to divide opinion radically about effectiveness amongst cautious reviewers' [64]. In contrast, two reviews with a similar research question came to very different conclusions about the effectiveness of interventions for childhood obesity [65]. In these reviews, conclusions were based on the results of randomized controlled trials (RCTs) and it may be that reviewers tend to be more cautious, and therefore their conclusions less divergent, when interpreting observational data.
The lack of overlap of primary studies warrants further examination, because it cannot be explained (entirely) in terms of deficiencies in the search strategies of the reviews, but rather seems to be due to differences in the scope (inclusion criteria) of the reviews which relates to heterogeneity in their review questions. This finding is consistent with other methodological studies that found that many apparent inconsistencies in the citation and selection of primary studies, especially non-RCTs, could be attributed to differences in inclusion criteria and outcome assessments of the reviews (rather than being due primarily to problems in their search strategies) [65,66]. Even though we had determined our sample of reviews to be as similar as possible in scope so that we could investigate overlap, in practice, the scope of the reviews did not overlap very much. This has important implications for the utility of reviews to inform policy and practice.
First, in areas where evaluation and impact measurement are known to be difficult and where research and policy interest is relatively recent, it is likely that the findings of reviews will reflect uncertainties in the primary studies and be less enlightening about the substantive topic. Review conclusions can only ever be as good as the available data on the topic [67]; this was certainly the case in the reviews that we examined. Across the topic of community interventions to promote physical activity, reviews were necessarily cautious in their findings because of uncertainties in the evidence base. While this is useful for researchers and research commissioners to know, it is less useful for people involved in determining policy and practice.
Second, dealing with linked publications (multiple publications from the same study) was complicated and confusing, both for ourselves and seemingly for the reviewers of our eight included reviews. To improve fidelity of reporting and ensure that all relevant evidence informs review results and conclusions, it is important to identify all publications from studies with multiple or staged evaluations. We therefore recommend that study authors aid researchers by clearly citing all previous and intended work in each publication and that this is also something that editors check before publication. Larger studies might consider keeping a website for the study which details all related publications (as some already do). Reviewers can search for multiple publications from a study by searching for papers by authors, studies and research groups that feature in the provisional list of included studies for the review. In order to build on existing knowledge, review authors should search for existing relevant reviews in the area and use this knowledge to contextualize their aims and findings. Inclusion (and citation) of relevant reviews will also help direct readers to relevant resources.
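As a rough illustration of the author-based search suggested above, the following sketch flags pairs of records in a provisional inclusion list that share at least one author, as candidates for manual checking. The record structure, author names, and the `min_shared` threshold are all hypothetical assumptions; shared authorship is only a prompt for closer inspection and does not establish that two papers report the same study.

```python
from itertools import combinations

# Hypothetical provisional list of included records (illustrative only).
records = [
    {"id": "p1", "authors": {"Smith J", "Lee K"}},
    {"id": "p2", "authors": {"Lee K", "Patel R"}},
    {"id": "p3", "authors": {"Garcia M"}},
]


def candidate_links(records, min_shared=1):
    """Return pairs of records sharing at least `min_shared` authors.

    Each result is (id_a, id_b, sorted list of shared authors). Pairs are
    candidates for manual checking, not confirmed linked publications.
    """
    links = []
    for a, b in combinations(records, 2):
        shared = a["authors"] & b["authors"]
        if len(shared) >= min_shared:
            links.append((a["id"], b["id"], sorted(shared)))
    return links


for id_a, id_b, shared in candidate_links(records):
    print(f"{id_a} and {id_b} share authors: {', '.join(shared)}")
```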
The study has also highlighted some of the unavoidable complexities that face potential users of systematic reviews. We placed ourselves in a hypothetical situation, but one that is similar to that faced by many policymakers and practitioners who would like their decisions to be informed by evidence; for example, a newly formed Health and Wellbeing board in the UK, tasked with reducing obesity among young people, might well want to examine what works in terms of promoting physical activity. If they used the map of community interventions and identified these eight reviews as being relevant, they would find that: while all the reviews were about the promotion of physical activity, they each had a particular 'angle', which determined the range of research they included; where the same studies were included in reviews, their findings were not always reported consistently; the concept of 'community' was often discussed in reviews, but there were also differences in its conceptualization; and on the whole, the reviews did not position themselves as contributing to a wider evidence base around the promotion of physical activity (as evidenced by the lack of inter-citation between them).
There was an inevitable tension in this analysis between a narrowness that ensured that all reviews were on exactly the same topic, and a breadth that ensured all potentially relevant reviews were included; this is the same tension concerning homogeneity of focus that exists in many systematic reviews in public health. Given that most public health decisions are about identifying solutions to a problem (in this case, increasing levels of physical activity), obtaining a range of reviews is to be expected, and this raises the question that this paper begins to unpick: 'how coherent is the picture that emerges?' Reviews which give a limited 'slice' of the evidence are extremely valuable if the policy/practice question is closely aligned to the scope of the review, but are less useful if they give only a partial picture. In our topic area, however, even with the findings of all eight reviews at our disposal, we would not be confident that we were building on the results of all research about community interventions to promote physical activity, because each review contains a limited portion of the evidence and there may well be relevant studies that fall outside the scope of any of our reviews. (We should reiterate the point made above, that systematic review methods are developing quickly, and that some of these 'gaps' may now be filled.) The above points relate to wider and unsolved issues about the amount of 'work done' in a review [68]. Some reviews have a relatively narrow focus, undertaking a detailed look at a relatively small area; there is additional 'work' to be done by users in identifying a range of such reviews and 'synthesizing' them to inform their particular decision. Other reviews are broader in scope, which means that, potentially, less 'work' needs to be done by their users, though there is a tension between achieving both breadth and depth in the same review, the risk being that broad reviews may suffer from a lack of focus and be deficient in essential detail [16].
While a detailed discussion of these issues is beyond the scope of this paper, we have highlighted areas within which review authors might usefully assist potential users.

Conclusions
One possible way forward is to undertake more systematic 'maps' of research activity. Systematic maps find and describe the research on a given topic and help researchers and policymakers to judge where there is and is not sufficient data to justify a narrow and in-depth review which seeks to answer a specific policy or practice question [32]. It is important, however, that systematic maps are kept updated and that funders allocate resources to this end. To maximize access to the knowledge gathered in systematic maps, they should be made freely available to researchers, funders, and policymakers.
Finally, we recommend for further reading the Guidelines for systematic reviews of health promotion and public health interventions [69], written by members of the Cochrane Public Health Review Group. This document discusses many of the issues mentioned above and aims to build reviewing capacity among those working in the difficult areas that create a great deal of the complexity identified in this analysis. Also, for those interested in the substantive topic of the reviews discussed here, we refer readers to a recent Cochrane review on the subject [70].