Skip to main content

Judging the quality of evidence in reviews of prognostic factor research: adapting the GRADE framework



Prognosis research aims to identify factors associated with the course of health conditions. It is often challenging to judge the overall quality of research evidence in systematic reviews about prognosis due to the nature of the primary studies. Standards aimed at improving the quality of primary studies on the prognosis of health conditions have been created, but these standards are often not adequately followed causing confusion about how to judge the evidence.


This article presents a proposed adaptation of Grading of Recommendations Assessment, Development and Evaluation (GRADE), which was developed to rate the quality of evidence in intervention research, to judge the quality of prognostic evidence.


We propose modifications to the GRADE framework for use in prognosis research along with illustrative examples from an ongoing systematic review in the pediatric pain literature. We propose six factors that can decrease the quality of evidence (phase of investigation, study limitations, inconsistency, indirectness, imprecision, publication bias) and two factors that can increase it (moderate or large effect size, exposure-response gradient).


We describe criteria for evaluating the potential impact of each of these factors on the quality of evidence when conducting a review including a narrative synthesis or a meta-analysis. These recommendations require further investigation and testing.

Peer Review reports


Prognosis research examines the progression of a health condition over time in order to identify risk and protective factors that can alter the likelihood of a future event during the course of such a condition [1]. The evidence about the progression of a health condition derived from prognosis research is crucial to make informed decisions about the process to identify individuals who are at risk for poor outcomes, to facilitate early intervention and guide the development of preventive interventions that target modifiable prognostic factors [2]. Synthesizing results from prognosis research for these potential uses is, however, often challenging as primary study results are often inconsistent and difficult to interpret [3]. Individual study differences may be due to, for example, small sample sizes, adjustments for different variables in the analyses, or consideration of different subsets within the same population [47]. Standards have been proposed to guide the design, procedures, analysis and reporting of this research in an attempt to minimize the variability and improve the quality of the primary studies [47]. However, these standards are often not followed causing confusion about the prognostic value of individual factors [8] and limiting research application [9]. Due to the important implications of this research for improving health outcomes, providing guidelines about how to judge the evidence of this widely varied research is necessary. This manuscript presents a system that can help reviewers judge the evidence derived from prognostic factor research [10]. We suggest that Grading of Recommendations Assessment, Development and Evaluation (GRADE) [11] can be adapted for the assessment of evidence derived from prognostic factor research.

GRADE: a framework to guide the judgment about the quality of evidence in a systematic review

GRADE was first developed to provide methodological guidance in reviewing intervention research; specifically, how to rate the quality associated with estimated effects of an intervention on a specific outcome, and how to grade the strength of recommendations regarding the intervention as part of a guideline development process [12]. When making judgments about the quality of evidence, the GRADE approach considers five factors that can decrease our confidence in estimates of effects: (1) study design and limitations in study design, (2) inconsistency of results across studies, (3) indirectness of the evidence, (4) imprecision and (5) publication bias; and three factors that can increase our confidence in estimates of effects from observational studies: (1) large estimates of treatment, (2) a dose–response gradient and (3) plausible confounding that would increase confidence in an estimate. The GRADE framework, widely used by researchers working on reviews and guidelines, and groups providing recommendations for health care professionals such as NICE [13], has also been formally adapted for use in grading the quality of evidence and strength of recommendation for diagnostic research [14]. Recently, Goldsmith and colleagues have also proposed using GRADE as a framework for prognostic studies [15]. They used the framework and definitions of GRADE to rate the quality of evidence for prognostic studies evaluating cold hyperalgesia as a prognostic factor in whiplash-associated disorders. They described the general framework that was followed; however, they failed to address all of GRADE's factors or provide enough information to allow replication of their GRADEing process for prognostic studies. The following sections outline specific aspects of the modified GRADE framework (for example, how risk of bias in primary prognosis studies were assessed), the process used to modify the GRADE framework, and suggestions for how to apply the new adapted framework to systematic reviews where meta-analyses are lacking.

Applying the GRADE framework to prognosis research

Consistent with the GRADE principles, when synthesizing the evidence from prognosis research it is important to estimate the effect of a factor on an outcome and also to report the level of confidence in these findings. Therefore, in a systematic review of prognostic factors it is recommended to assess the quality of evidence for each outcome of interest across studies. Our team has applied the GRADE framework and concepts to assess the quality of a body of evidence from prognosis studies into four quality categories (high, moderate, low, and very low) according to the traditional GRADE framework (Table 1). Along with our descriptions of how the GRADE framework can be adapted and used in this kind of research, we provide examples from a recent systematic review of prognostic studies conducted in the field of recurrent pain in children and adolescents (Table 2) (Huguet A, et al., in preparation) Since there are times when it is not appropriate or possible to conduct meta-analysis of prognostic evidence due to diversity within the studies included in the review or to poor methodological quality, or both, we also describe variations on how to implement the GRADE systematic review framework when conducting a narrative synthesis. We are not presenting a formalized guideline. Our recommendations are based on the discussions between the co-authors and our experience conducting systematic reviews of prognostic research on pain.

Table 1 Definitions of the four quality categories according to the original Grading of Recommendations Assessment, Development and Evaluation (GRADE)[16], applicable to the modified GRADE
Table 2 Summary of a systematic review of prognostic studies in the field of recurrent pain in children and adolescents

GRADE framework for prognosis

We think most of the factors taken into consideration by the GRADE framework to rate the quality of evidence from intervention research are conceptually applicable when applying the GRADE to judge the quality of evidence from prognostic research. However, our proposed assessment approach to determine how much each of these factors influence the quality of evidence is different from the approach originally suggested for intervention research. Table 3 compares the factors that may lead to rating down or up the quality of evidence from intervention research with the factors that may lead to rating down or up the quality of evidence from prognosis research. Notable differences when rating the quality of evidence from prognostic research include the following. (1) When judging the quality of prognostic evidence the study design is not considered since longitudinal research designs are the only acceptable ones that provide prognostic evidence (design characteristics are considered in the risk of bias assessment). (2) Using the GRADE framework to evaluate the quality of evidence from prognosis research should begin with the phase of investigation. (3) Plausible confounding does not need to be considered as an additional factor to rate up the quality of evidence. Two main reasons lead us to this decision. First, the potential effects of confounding in both intervention and prognosis research are not interpreted in the same direction. In intervention research, the assumption is that lack of control for confounding can inflate the reported effect sizes, so that the intervention appears to be more effective than it actually is when evidence is from studies that do not adequately control for confounding. This assumption is not made in prognostic research. It is often unclear how confounders alter effect sizes in prognostic studies. Second, plausible confounding is indirectly considered when assessing risk of bias. When assessing risk of bias, we are not evaluating how this will influence the strength of the effect; rather, we are evaluating the internal validity of the studies. As we describe below in more detail, when considering the potential impact of study limitations on the quality of evidence from prognostic research, the reviewers should consider downgrading the quality of evidence for methodological limitations when analyses are not adequately adjusted for confounders (which may be responsible for spurious or attenuated relations between the factor and the outcome).

Table 3 Factors that may increase and decrease the quality level of evidence

Next we describe the assessment approach for considering the potential effect of each of these factors on the quality of evidence.

Phase of investigation

When evaluating the overall quality of evidence, we suggest that researchers consider the phase of investigation. A high level of evidence for prognosis is derived from a cohort study design that seeks to generate understanding of the underlying processes for the prognosis of a health condition, called a phase 3 explanatory study, or a cohort study design that seeks to confirm independent associations between the prognostic factor and the outcome, called phase 2 explanatory studies [4, 7]. Prospective or retrospective cohort studies that test a fully developed hypothesis and conceptual framework without serious study limitations, and confirmatory studies without serious limitations constitute high-quality evidence on prognosis. These studies should be prioritized as primary studies. In emerging areas of research, there may be a lack of available primary studies that meet this criterion. In this instance, predictive modeling studies or explanatory studies conducted in the earlier phase of investigation to generate a hypothesis (phase 1 explanatory studies) may be included. These studies should be judged as providing weaker evidence. Therefore, we propose that the starting point for the quality level of the evidence should be based on phase of investigation (see Table 4). Table 5 illustrates an example of the effect of phase of investigation as a starting point for judging the quality of evidence.

Table 4 Guide to judge the quality of evidence for prognosis
Table 5 The effect of phase of investigation when judging quality of evidence

Following phase of investigation considerations, the evidence can be upgraded or downgraded according to the following additional criteria.

Reasons for downgrading the quality of evidence

Study limitations

The findings derived from individual prognostic studies are often limited for their methodological shortcomings. There are several tools available for assessing methodological limitations [1820]. We recommend using the Quality in Prognosis Studies (QUIPS) tool [19, 20], which rates individual studies according to the potential risk of bias associated with six domains: (1) study participation, (2) study attrition, (3) prognostic factor measurement, (4) outcome measurement, (5) confounding measurement and account, and (6) analysis. This tool, designed for use in prognostic factor studies to comprehensively assess risk of bias based on epidemiological principles, has demonstrated acceptable reliability [20]. The level of risk of bias associated with each domain can be rated as 'low’, 'moderate’ and 'high’ based on the responses that reviewers give to each item.

We suggest that, when assessing the risk of bias of a prognostic factor across studies for a specific outcome, reviewers should rate the evidence as having: (1) no serious limitations when most evidence is from studies at low risk of bias for most of the bias domains; (2) serious limitations when most evidence is from studies at moderate or unclear risk of bias for most of the bias domains; or (3) very serious limitations when most information is from studies at high risk of bias with respect to almost all of the domains.

Table 6 illustrates an example. As is similar to what happens when GRADE is used for intervention research, the study limitations are outcome and factor specific [21]. Consequently, one study may be associated with higher risk of bias when referring to one outcome or to one prognostic factor than one referring to another outcome or factor of interest. We advise conducting subgroup analyses to explore the impact of studies with high risk of bias on specific domains and from the overall review (that is, serious or very serious limitations). Sensitivity analyses may be conducted to restrict the synthesis to studies with lower risk of bias. These steps will inform how the study limitations influence the size of the effect. If studies with 'serious limitations’ or 'very serious limitations’ are included in the body of evidence to be evaluated, specific justification in a footnote for the relevant tables is suggested.

Table 6 The effect of considering study limitations when judging quality of evidence


Inconsistency occurs when there is unexplained heterogeneity or variability in results across studies. When this happens, the quality of evidence decreases. Different approaches to assess inconsistency can be applied if the systematic review incorporates narrative synthesis or meta-analysis.

To evaluate whether inconsistency exists, we recommend that reviewers employing meta-analysis base their decisions on their judgment of whether a clinically meaningful difference exists between the point estimates and their confidence intervals of primary studies in the review context, as well as statistical parameters derived from their meta-analyses. Reviewers may consider downgrading the quality of evidence for inconsistency when they observe the following statistical parameters as long as these differences are clinically meaningful: (1) estimates of the effect of the prognostic factor on the outcome vary across studies with the points of effect on either side of the line of no effect and their confidence intervals show minimal or no overlap; (2) the statistical test for heterogeneity, which tests the null hypothesis that all studies in a meta-analysis have the same underlying magnitude effect, shows a low P value; and (3) the I2, which quantifies the proportion of the variation in point estimates due to true study variations in effect size from one study to the next, is substantial (that is, 50% or greater [25]).

Potential inconsistency related to differences in the magnitude of effects, as described above, should ideally be explored with a priori defined subgroup analyses. Differences in the population, the duration of follow-up, the outcome, the prognostic factor, or study methods across studies may explain differences. In this case, we propose to estimate separate effects accordingly. If, after running separate subgroup meta-analyses, this hypothesis is supported by the data, we suggest that reviewers consider presenting separate pooled estimates instead of estimating an overall combined effect.

If a meta-analysis is not conducted, we recommend that reviewers consider downgrading for inconsistency when estimates of the prognostic factor association with the outcome vary in direction (for example, some effects appear protective whereas others show risk) and the confidence intervals show no, or minimal overlap. See an example in Table 7.

Table 7 The effect of considering inconsistency when judging quality of evidence

Inconsistency cannot be assessed when only a single study within the existing body of literature has estimated the effect. In these cases, this criterion may be considered as 'not applicable’. However, we still recommend the reviewers to downgrade the quality of evidence since this is an indicator that the literature is not well established in the area. If observed inconsistency is unexplained, reviewers should decide whether the inconsistency is serious or very serious and justify why they have made this decision with a footnote.


Indirectness exists when the participant population, prognostic factor(s) and/or outcomes considered by researchers in the primary studies do not fully represent the review question defined in the systematic review. The judged quality of evidence decreases because the results derived from the primary studies are less generalizable for the purpose of the systematic review.

Regardless of whether reviewers perform a meta-analysis or not, downgrading the quality of evidence for indirectness is appropriate when: (1) the final sample only represents a subset of the population of interest (an example of indirectness in population is displayed in Table 8); (2) when the complete breadth of the prognostic factor that is being considered in the review question is not well represented in the available studies (an example is illustrated in Table 8); or (3) when the outcome that is being considered in the review question is not broadly represented (an example is displayed in Table 8).

Table 8 The effect of considering indirectness when judging quality of evidence

Downgrading the quality of the evidence with respect to indirectness depends on how extreme the differences are and how much these differences can influence the magnitude of effect. Reviewers should make this judgment based on the purpose of their review.


Random error or imprecision exists when the evidence is uncertain, leading to different interpretations about the relationship between the prognostic factor and its associated risk or protective value.

To judge whether the results of meta-analysis have sufficient precision a reviewer should first consider whether the number of participants included in the meta-analysis is appropriate through sample size estimation (similar to sample size estimates for a single study, but accounting for between-study heterogeneity; see Bull [33] for a discussion of the need of an adequate sample size in a meta-analysis). Second, if the number of participants included in the meta-analysis is appropriate, based on their best judgment, reviewers should consider the results precise when the confidence interval around the estimated effect size is not excessively wide while including values implying that the prognostic factor is associated with protection or increased risk.

Evaluating the imprecision when a meta-analysis is not possible is challenging. The best approach is for reviewers to judge the overall precision based on precision of results within each study while taking into account the number of studies and participants involved. If the majority of studies included in the review are precise, regardless of the number of studies and sample size, reviewers should not downgrade the quality of evidence for imprecision. To estimate whether the primary studies included in the review are unpowered or not, reviewers should take into account the sample size that each of the primary studies used. To do this, they should explore whether the authors of the primary studies provided any rationale for the sample size. Authors of prognostic studies should estimate an effect size and select the desired power beforehand, and then calculate the sample size needed in order to achieve that power and to detect the specified effect size [5, 33]. If authors do not provide such rationale, reviewers can consider the sample size appropriate for studies using dichotomous outcomes using the 'rule of thumb’, when there are at least 10 outcome events for each potential prognostic variable considered in the analysis [4, 34]. If insufficient information is available to determine appropriateness of sample size, or studies use continuous outcomes, the sample size can be considered appropriate when there are at least 100 cases that reached the endpoint [35]. If the sample sizes are large enough, the reviewers should then evaluate the width of the interval. If most of the confidence intervals reported in each study include both no effect and appreciable risk and protective values, the evidence derived from that particular study is imprecise. At that stage, to evaluate the overall imprecision for the explored prognostic factor association, we recommend reviewers to also consider the number of studies, and number of participants across studies, because there is likely to be more imprecision with a fewer number of studies and/or participants. Consequently, we recommend downgrading the quality of evidence if the evidence is generated by a few studies involving a small number of participants and most of the studies provide imprecise results. See an example in Table 9.

Table 9 The effect of considering imprecision when judging quality of evidence

Publication bias

This is a very important factor in prognostic study evidence because investigators often fail to report relationships that show no effect between potential prognostic factors and outcomes. This happens when published evidence is restricted to only a portion of the studies conducted on the topic [36]. The current lack of an existing register for prognostic research studies prevents reviewers from making an informed judgment about whether there is evidence that publication bias is a potential problem. Consequently, a prudent default position at this moment is to assume that prognosis research is seriously affected by publication bias until there is evidence to the contrary [6]. Reviewers should consider that publication bias exists across all factors except in those cases in which they find that a determinate prognostic factor has been investigated in a large number of cohort studies. Ideally, most of these large numbers of studies should have been designed to purposefully confirm the hypothesized independent effect of the factor on the outcome (phase 2 study) or to test a conceptual model which explains its underlying mechanisms (phase 3 study). However, since phase of investigation is already taken into account as a factor that can downgrade the overall quality of evidence, we do not recommend downgrading again for publication bias due to only phase of investigation. For these cases, reviewers conducting a systematic review with or without meta-analysis may judge that there is less likely to be publication bias (see Table 10).

Table 10 The effect of considering publication bias when judging quality of evidence

Reasons for rating up the quality of evidence

Moderate or large effect size

Multiple prognostic factors often contribute to the prognosis of health conditions. Therefore, finding a moderate or large effect size is one of the key criteria for rating up the quality of evidence. A moderate or large effect size increases the likelihood that a relationship between the prognostic factor and the outcome does in fact exist.

Reviewers should rate up the quality of evidence when they find a moderate or large pooled effect of the meta-analysis. 'Rules of thumb’ have been proposed to judge the effect moderate or large (for example, standardized mean difference statistic = around 0.5 for moderate effect or around 0.8 or larger for large effect [40], odds ratio = around 2.5 for moderate effect or 4.25 or greater for large effect [41, 42]). However, because these are arbitrary guidelines, a sensible decision should be made and justified by the reviewers taking into account the study context (for example, background risk and unit of measurement).

If it is not possible to conduct a meta-analysis, reviewers should rate up the quality of evidence when they find moderate or large similar effects reported by most of the primary studies (see an example in Table 11).

Table 11 The effect of considering moderate or large effect size when judging quality of evidence

Exposure-response gradient

An exposure-response gradient exists when elevated levels of the prognostic factor (for example, larger amount, longer duration, higher intensity) lead to a larger effect size over lower levels of the factor. The presence of such a gradient increases confidence in the findings that the factor is associated with an increased risk or protective value and therefore raises our rating of the quality of evidence.

When conducting systematic reviews with meta-analysis, the reviewers should observe whether an exposure-response gradient is present between subgroup analyses for factors measured at different doses.

If a meta-analysis or subgroup analyses are not conducted, or only one meta-analysis is conducted for the relationship between a prognostic factor and outcome, reviewers should observe whether a possible exposure-response gradient consistently exists within and between primary studies. The use of the same measures to evaluate prognostic factors and outcomes across studies is a required condition to appropriately evaluate the possible existence of this gradient between studies. Table 12 displays an example.

Table 12 The effect of considering exposure gradient when judging quality of evidence

The findings derived from the GRADE framework to judge the evidence derived from prognostic studies can be presented in the proposed summary of findings tables (see Tables 13 and 14).

Table 13 Example of an adapted Grading of Recommendations Assessment, Development and Evaluation (GRADE) table for systematic reviews with meta-analysis of prognostic studies
Table 14 Example of an adapted Grading of Recommendations Assessment, Development and Evaluation (GRADE) table for narrative systematic reviews of prognostic studies (filled in with examples of our own review illustrated in the boxes throughout this manuscript)


This article is a first attempt to outline how GRADE may be adapted to assess the quality of evidence for prognostic research studies for a systematic review of the literature. To date, a formal system to guide researchers in assessing the evidence for prognostic studies has been lacking. Our adaptation is a first step in developing a systematic approach to evaluate the quality of evidence for prognostic research. We encourage these recommendations to be further developed and tested for GRADEing the quality of evidence when synthesizing findings from prognostic research studies. For instance, further guidance for reviewers is needed to decide when each of these GRADE factors causes the evidence to be down- or upgraded one versus two levels. At this stage, we leave this decision up to the judgment of the review teams to decide how much these factors impact the overall quality of evidence. Empirical research is also needed to explore our hypothesis that risk of bias detrimentally affects the factor-outcome relationship and what the strength of this relationship is.



Grading of recommendations assessment, development and evaluation.


  1. Moons KGM, Royston P, Vergouwe Y, Grobbee DE, Altman DG: Prognosis and prognostic research: what, why, and how?. BMJ. 2009, 338: 1317-1320. 10.1136/bmj.b1317.

    Article  Google Scholar 

  2. Croft PR, Dunn KM, Raspe H: Course and prognosis of back pain in primary care: the epidemiological perspective. Pain. 2006, 122: 1-3. 10.1016/j.pain.2006.01.023.

    Article  PubMed  Google Scholar 

  3. Hemingway H: Prognosis research: why is Dr. Lydgate still waiting?. J Clin Epidemiol. 2006, 59: 1229-1238. 10.1016/j.jclinepi.2006.02.005.

    Article  PubMed  Google Scholar 

  4. Altman DG, Lyman GH: Methodological challenges in the evaluation of prognostic factors in breast cancer. Breast Cancer Res Treat. 1998, 52: 289-303. 10.1023/A:1006193704132.

    Article  CAS  PubMed  Google Scholar 

  5. Simon R, Altman DG: Statistical aspects of prognostic factor studies in oncology. Br J Cancer. 1994, 69: 979-985. 10.1038/bjc.1994.192.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  6. Hemingway H, Riley RD, Altman DG: Ten steps towards improving prognosis research. BMJ. 2009, 339: b4184-10.1136/bmj.b4184.

    Article  PubMed  Google Scholar 

  7. Hayden JA, Côté P, Steenstra IA, Bombardier C, QUIPS-LBP Working Group: Identifying phases of investigation helps planning, appraising, and applying the results of explanatory prognosis studies. J Clin Epidemiol. 2008, 61: 552-560. 10.1016/j.jclinepi.2007.08.005.

    Article  CAS  PubMed  Google Scholar 

  8. Sauerbrei W: Prognostic factors - confusion caused by bad quality of design, analysis and reporting of many studies. Adv Otorhinolaryngol. 2005, 62: 184-200.

    PubMed  Google Scholar 

  9. Riley RD, Hayden JA, Steyerberg EW, Moons KGM, Abrams K, Kyzas PA, Malats N, Briggs A, Schroter A, Altman DG, Hemingway H, PROGRESS Group: Prognosis Research Strategy (PROGRESS) 2: prognostic factor research. PLOS Med. 2013, 10: e1001380-10.1371/journal.pmed.1001380.

    Article  PubMed  PubMed Central  Google Scholar 

  10. Hemingway H, Croft P, Perel P, Hayden JA, Abrams K, Timmis A, Briggs A, Udumyan R, Moons KG, Steyerberg EW, Roberts I, Schroter S, Altman DG, Riley RD, PROGRESS Group: Prognosis research strategy (PROGRESS) 1: a framework for researching clinical outcomes. BMJ. 2013, 346: e5596-

    Article  Google Scholar 

  11. Guyatt GH, Oxman AD, Schunemann HJ, Tugwell P, Knotternerus A: GRADE guidelines: a new series of articles in the Journal of Clinical Epidemiology. J Clin Epidemiol. 2011, 64: 380-382. 10.1016/j.jclinepi.2010.09.011.

    Article  PubMed  Google Scholar 

  12. Guyatt G, Oxman AD, Akl E, Kunz R, Vist G, Brozek J, Norris S, Falck-Ytter Y, Glasziou P, Debeer H, Jaeschke R, Rind D, Meerpohl J, Dahm P, Schünemann HJ: GRADE guidelines 1. Introduction - GRADE evidence profiles and summary of findings tables. J Clin Epidemiol. 2011, 64: 383-394. 10.1016/j.jclinepi.2010.04.026.

    Article  PubMed  Google Scholar 

  13. Thornton J, Alderson P, Tan T, Turner C, Latchem S, Shaw E, Ruiz F, Reken S, Mugglestone MA, Hill J, Neilson J, Westby M, Francis K, Whittington C, Siddiqui F, Sharma T, Kelly V, Ayiku L, Chamberlain K: Introducing GRADE across the NICE clinical guideline program. J Clin Epidemiol. 2013, 66: 124-131. 10.1016/j.jclinepi.2011.12.007.

    Article  PubMed  Google Scholar 

  14. Schünemann HJ, Oxman AD, Brozek J, Glasziou P, Jaeschke R, Vist GE, Williams JW, Kunz R, Craig J, Montori VM, Bossuyt P, Guyatt GH, GRADE Working Group: Grading quality of evidence and strength of recommendations for diagnostic tests and strategies. BMJ. 2008, 336: 1106-1110. 10.1136/bmj.39500.677199.AE.

    Article  PubMed  PubMed Central  Google Scholar 

  15. Goldsmith R, Wright C, Bell SF, Rushton A: Cold hyperalgesia as a prognostic factor in whiplash associated disorders: a systematic review. Man Ther. 2012, 15: 402-410.

    Article  Google Scholar 

  16. Balshem H, Helfand M, Schunemann HJ, Oxman AD, Kunz R, Brozek J, Vist GE, Falck-Yttr Y, Meerpohl J, Norris S, Guyatt GH: GRADE guidelines: 3. Rating the quality of evidence. J Clin Epidemiol. 2011, 64: 401-406. 10.1016/j.jclinepi.2010.07.015.

    Article  PubMed  Google Scholar 

  17. Huguet A, McGrath PJ, Stinson J, Chambers CT, Miró J: Shaping the future of research on chronic pain in children. Pediatric Pain Letter. 2011, 13: 7-12.

    Google Scholar 

  18. Altman DG: Systematic reviews of evaluations of prognostic variables. BMJ. 2001, 323: 224-228. 10.1136/bmj.323.7306.224.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  19. Hayden JA, Côté P, Bombardier C: Evaluation of the quality of prognosis studies in systematic reviews. Ann Intern Med. 2006, 144: 427-437. 10.7326/0003-4819-144-6-200603210-00010.

    Article  PubMed  Google Scholar 

  20. Hayden JA, van der-Windt DA, Cartwright JL, Côté P, Bombardier C: Assessing bias in studies of prognostic factors. Ann Intern Med. 2013, 158: 280-286. 10.7326/0003-4819-158-4-201302190-00009.

    Article  PubMed  Google Scholar 

  21. Guyatt G, Oxman AD, Vist G, Kunz R, Brozek J, Alonso-Coello P, Montori V, Akl EA, Djulbegovic B, Falck-Yttr Y, Norris SL, Williams JW, Atkins D, Meerpohl J, Schunemann HJ: GRADE guidelines: 4. Rating the quality of evidence-study limitations (risk of bias). J Clin Epidemiol. 2011, 64: 407-415. 10.1016/j.jclinepi.2010.07.017.

    Article  PubMed  Google Scholar 

  22. Wang SJ, Fuh JL, Juang KD, Lu SR, Hsu LC, Chen WT, Pwu RF: Evolution of migraine diagnoses in adolescents: a 3-year annual survey. Cephalalgia. 2005, 25: 333-338. 10.1111/j.1468-2982.2004.00859.x.

    Article  CAS  PubMed  Google Scholar 

  23. Siniatchkin M, Jonas A, Baki H, Van-Baalen A, Gerber WD, Stephani U: Developmental changes of the contingent negative variation in migraine and healthy children. J Headache Pain. 2010, 11: 105-113. 10.1007/s10194-009-0180-9.

    Article  PubMed  Google Scholar 

  24. Brna P, Dooley J, Gordon K, Dewan T: The prognosis of childhood headache: a 20-year follow-up. Arch Pediatr Adolesc Med. 2005, 159: 1157-1160. 10.1001/archpedi.159.12.1157.

    Article  PubMed  Google Scholar 

  25. Higgins JP, Thompson SG, Deeks JJ, Altman DG: Measuring inconsistency in meta-analyses. BMJ. 2003, 327: 557-560. 10.1136/bmj.327.7414.557.

    Article  PubMed  PubMed Central  Google Scholar 

  26. Kienbacher C, Wober C, Zesch HE, Hafferl-Gattermayer A, Posch M, Karwautz A, Zormann A, Berger G, Zebenholzer K, Konrad A, Wober-Bingol C: Clinical features, classification and prognosis of migraine and tension-type headache in children and adolescents: a long-term follow-up study. Cephalalgia. 2006, 26: 820-830. 10.1111/j.1468-2982.2006.01108.x.

    Article  CAS  PubMed  Google Scholar 

  27. Larsson B, Sund AM: One-year incidence, course, and outcome predictors of frequent headaches among early adolescents. Headache. 2005, 45: 684-691. 10.1111/j.1526-4610.2005.05137a.x.

    Article  PubMed  Google Scholar 

  28. Wang SJ, Fuh JL, Lu SR: Chronic daily headache in adolescents: an 8-year follow-up study. Neurology. 2009, 73: 416-422. 10.1212/WNL.0b013e3181ae2377.

    Article  PubMed  Google Scholar 

  29. Stovner LJ, Hagen K, Jensen R, Katsarava Z, Lipton RB, Scher AI, Steiner TJ, Zwart J: The global burden of headache: a documentation of headache prevalence and disability worldwide. Cephalalgia. 2007, 27: 193-210. 10.1111/j.1468-2982.2007.01288.x.

    Article  PubMed  Google Scholar 

  30. Guidetti V, Galli F: Evolution of headache in childhood and adolescence: an 8-year follow-up. Cephalalgia. 1998, 18: 449-454. 10.1046/j.1468-2982.1998.1807449.x.

    Article  CAS  PubMed  Google Scholar 

  31. Battistella PA, Fiumana E, Binelli M, Bertossi E, Battista P, Perakis E, Soriani S: Primary headaches in preschool age children: clinical study and follow-up in 163 patients. Cephalalgia. 2006, 26: 162-171. 10.1111/j.1468-2982.2005.01008.x.

    Article  CAS  PubMed  Google Scholar 

  32. Kroner-Herwig B, Heinrich M, Morris L: Headache in German children and adolescents: a population-based epidemiological study. Cephalalgia. 2007, 27: 519-527. 10.1111/j.1468-2982.2007.01319.x.

    Article  CAS  PubMed  Google Scholar 

  33. Bull SB: Sample size and power determination for a binary outcome and an ordinal exposure when logistic regression analysis is planned. Am J Epidemiol. 1993, 137: 676-684.

    CAS  PubMed  Google Scholar 

  34. Harrell FE, Lee KL, Matchar DB, Reichert TA: Regression models for prognostic prediction: advantages, problems, and suggested solutions. Cancer Treat Rep. 1985, 69: 1071-1077.

    PubMed  Google Scholar 

  35. Centre for Reviews and Dissemination: Systematic reviews of clinical tests. Systematic Reviews. CRD’s Guidance for Undertaking Reviews in Health Care. 2008, York: CRD, Centre for Reviews and Dissemination, 109-156.

    Google Scholar 

  36. Dickersin K: The existence of publication bias and risk factors for its occurrence. JAMA. 1990, 263: 1385-1389. 10.1001/jama.1990.03440100097014.

    Article  CAS  PubMed  Google Scholar 

  37. Monastero R, Camarda C, Pipia C, Camarda R: Prognosis of migraine headaches in adolescents: a 10-year follow-up study. Neurology. 2006, 67: 1353-1356. 10.1212/01.wnl.0000240131.69632.4f.

    Article  PubMed  Google Scholar 

  38. Stanford EA, Chambers CT, Biesanz JC, Chen E: The frequency, trajectories and predictors of adolescent recurrent pain: a population-based approach. Pain. 2008, 138: 11-21. 10.1016/j.pain.2007.10.032.

    Article  PubMed  Google Scholar 

  39. Termine C, Ferri M, Livetti G, Beghi E, Salini S, Mongelli A, Blanglardo R, Luoni C, Lanzi G, Balottin U: Migraine with aura with onset in childhood and adolescence: long-term natural history and prognostic factors. Cephalalgia. 2010, 30: 674-681. 10.1177/0333102409351803.

    Article  CAS  PubMed  Google Scholar 

  40. Cohen J: Statistical Power Analysis for the Behavioral Sciences. 1977, New York: Academic

    Google Scholar 

  41. Lipsey MW, Wilson D: Practical Meta-Analysis. 2001, California: SAGE Publications, Inc

    Google Scholar 

  42. Chinn S: A simple method for converting an odds ratio to effect size for use in meta-analysis. Stat Med. 2000, 19: 3127-3131. 10.1002/1097-0258(20001130)19:22<3127::AID-SIM784>3.0.CO;2-M.

    Article  CAS  PubMed  Google Scholar 

  43. Ozge A, Sasmaz T, Cakmak SE, Kaleagasi H, Siva A: Epidemiological-based childhood headache natural history study: after an interval of six years. Cephalalgia. 2010, 30: 703-712. 10.1177/0333102409351797.

    Article  PubMed  Google Scholar 

  44. Anda R, Tietjen G, Schulman E, Felitti V, Croft J: Adverse childhood experiences and frequent headaches in adults. Headache. 2010, 50: 1473-1481. 10.1111/j.1526-4610.2010.01756.x.

    Article  PubMed  Google Scholar 

Download references


This research is funded by the Canadian Institutes of Health Research (grant #226950). PJM’s research is supported by a Canada Research Chair.

Author information

Authors and Affiliations


Corresponding author

Correspondence to Anna Huguet.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

AH is the lead researcher of this project and wrote the first draft of the manuscript. AH, JAH, and MET formulated an initial proposal for the adaptation of the GRADE framework to prognosis research. JS, PJM, and CTC helped to reformulate this first proposal. AH, JAH, JS, PJM, CTC, MET and LW contributed significantly to the methodology, writing and revision of the manuscript. All authors read and approved the final manuscript.

Authors’ original submitted files for images

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and permissions

About this article

Cite this article

Huguet, A., Hayden, J.A., Stinson, J. et al. Judging the quality of evidence in reviews of prognostic factor research: adapting the GRADE framework. Syst Rev 2, 71 (2013).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: