General adult population
Search results
The bibliographic search strategies yielded 9165 records. An additional 49 records were found by scanning systematic review bibliographies, searching for full-text publications of studies identified from published protocols and clinical registries, and searching the gray literature. After de-duplication, 7638 records remained and were screened by title and abstract. A total of 253 records were assessed at full text, and three studies met the inclusion criteria (Fig. 3) [46,47,48]. One of these studies [48] was identified when we re-examined the studies excluded from the previous review that we updated for this project; a clinical expert had also flagged it as a missing study that had been included in other systematic reviews and guidelines [2]. This study had fulfilled the eligibility criteria of the previous review and should have been included in it. It was also included in a related pregnancy and postpartum review. Additional file 1 provides the bibliographic listing of records excluded during full-text assessment, sorted by reason, and a list of ongoing trials.
Study characteristics
Additional file 1 provides study characteristics of the three included studies that assessed screening for depression in the general adult population. Included studies were RCTs conducted in a single health center in the USA [46], general practices in the UK [47], and Maternal and Child Health (MCH) Centers in Hong Kong, China [48]. The respective populations included participants aged 21 years or older with documented acute coronary syndrome within 2 to 12 months of enrollment [46], participants 45 years or older who consulted for osteoarthritis symptoms in primary care [47], and mothers with 2-month-old babies visiting MCH Centers [48]. The UK study differed from the other two in that it was a pragmatic cluster randomized trial in which the general practices were the units of randomization [47]. All studies excluded participants who had a prior history of depression, were receiving treatment for depression, or were participating in other screening programs. Screening interventions and comparators differed among the three studies.
In the US trial, Kronish et al. [46] evaluated systematic screening for depression using the 8-item Patient Health Questionnaire (PHQ-8) compared with usual care; however, baseline data for the screening arm was not recorded. In one intervention arm, 501 participants were screened with notification of primary care clinicians for those with a positive screening result (screen and notify group). In another arm, 499 participants were screened with a primary care clinician notified of any clinically significant depressive symptoms (PHQ-8 score ≥10) and provision of care followed for those with a positive screening result (screen, notify, and treat group); however, this group was not included in the synthesis of this review because the treatment intervention met the predefined exclusion criteria. In the control arm, 500 participants received usual care from their treating clinician and were able to seek mental health screening and/or depression treatment at their own expense (no screening group). The study’s primary outcome of interest was change in quality-adjusted life-years (scores derived from 12-Item Short Form Health Survey (version 2) [SF-12] responses), while the secondary outcome was depression-free days (based on the 10-Item Center for Epidemiologic Studies Depression Scale [CESD-10]). Other reported outcomes included depressive symptoms (measured by the CESD-10 and PHQ-8), harms of depression screening (i.e., loss in appetite, sleep problems, gastrointestinal upset, and bleeding), and mortality (not an outcome of interest for this review). Outcomes were measured at 6, 12, and 18 months.
In the UK trial, Mallen et al. [47] evaluated an electronic template to prompt routine screening for anxiety and depression compared with usual care. In the intervention arm, the electronic template prompted the general practitioner (GP) to ask two questions about depression (Patient Health Questionnaire-2 [PHQ-2]), two questions about anxiety (Generalized Anxiety Disorder-2 [GAD-2]), and one question about pain intensity. With the two PHQ-2 items, the authors utilized a dichotomous yes/no response rather than the standard PHQ-2 scoring. A positive response to either question was deemed a positive screen. In the control arm, the electronic template prompted the GP only to ask the question about pain intensity. In both arms, no additional treatment resources or services for depression, anxiety, or pain management were provided as part of the study. A total of 2042 respondents consented to further contact and were sent post-consultation questionnaires, of whom 1412 returned the questionnaire and were included in the analysis. Study authors reported that participants had broadly similar characteristics at baseline. Participants returned the post-consultation questionnaire on average 24 days after initial consultation in the intervention group (range 9–149 days) and 22 days in the control group (range 3–106 days). In addition, they were sent questionnaires to determine outcomes at 3, 6, and 12 months after the initial appointment with the GP. The authors reported the total number of questionnaires contributing to each outcome, but not specifically for each time point. As the primary outcome of the trial was pain intensity, there is no information provided about post-screening treatment for depression. All adjusted effect estimates and 95% CIs were reported. 
Analyses were adjusted using general practice and repeated measures as cluster-level random effects and fixed-effect covariates at practice level and patient level (age, sex, and time between consultation and post-consultation response) (Additional file 1). The standardized mean difference (SMD) has been calculated, but it is based on the raw results and does not account for clustering as there was insufficient information to be able to calculate adjusted SMDs.
In the Hong Kong trial, Leung et al. [48] evaluated screening for postnatal depression. Two hundred thirty-one participants in the intervention group were screened for postnatal depression using the Edinburgh Postnatal Depression Scale (EPDS), while 231 participants in the control group received usual care by clinical assessment. The EPDS consisted of 10 questions with scores ranging from 0 to 30, and participants with score above the cut-off (9/10) or suicidal ideation (positive answer to question 10) were offered non-directive counseling by Maternal and Child Health nurses or management by the community psychiatric team as appropriate. The Chinese version of the EPDS was validated with Hong Kong women at 6 weeks postnatal, against the structured clinical interview for DSM-III-R. The outcome of interest was maternal mental health as measured by depression scores calculated from the EPDS measured at 2, 6, and 18 months postpartum.
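The screen-positive rule described for the Hong Kong trial can be written out as a short sketch. The function and argument names are hypothetical, chosen only for illustration. The rule itself (a total score above the 9/10 cut-off, i.e., 10 or more, or a positive answer to question 10) follows the description above; how a "positive answer" to question 10 is coded is an assumption here and is passed in as a boolean.

```python
def epds_screen_positive(total_score: int, item10_positive: bool) -> bool:
    """Return True if a participant would be offered follow-up under the rule above.

    total_score: EPDS total (10 questions, possible range 0 to 30).
    item10_positive: whether the suicidal ideation question (question 10)
        was answered positively. The exact coding is an assumption.
    """
    if not 0 <= total_score <= 30:
        raise ValueError("EPDS total score must be between 0 and 30")
    # Positive if the score is above the 9/10 cut-off OR question 10 is positive.
    return total_score >= 10 or item10_positive


# Hypothetical participants, not data from the trial.
print(epds_screen_positive(12, False))  # above the cut-off
print(epds_screen_positive(7, True))    # below the cut-off, question 10 positive
print(epds_screen_positive(9, False))   # below the cut-off, question 10 negative
```

Under this rule a participant below the cut-off can still screen positive through question 10, which is why the trial reports both routes separately.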
Outcomes
Results for all three studies are presented in Additional file 1. Ratings of risk of bias by study are included in Additional file 1.
The GRADE Evidence Profiles and Summary of Findings Tables, including explanations for all ratings, are available in Additional file 1. We did not pool results for any of the outcomes due to substantial differences between study populations, approaches to screening for depression, and time points, and due to high risk of bias.
Benefits of screening
Symptoms of depression
Kronish et al. [46] measured symptoms of depression using the CESD-10 and PHQ-8, change in depressive symptoms using CESD-10 scores, and depression-free days converted from CESD-10 scores. We rated the certainty in the evidence as moderate for these outcomes, except for change in depressive symptoms and depression-free days among women, owing to serious concerns of indirectness: the study only included adults with recently documented acute coronary syndrome, who are not representative of the wider general adult population. For change in depressive symptoms and depression-free days among women, we rated the certainty in the evidence as low due to serious concerns of indirectness, as described above, and serious concerns of imprecision, as the sample size was small (see Additional file 1).
Depression score (CESD-10)
Screening for depression likely results in little to no difference in symptoms of depression at any time point (i.e., baseline, 6, 12, and 18 months) (SMD of 0.06 lower [from 0.18 lower to 0.07 higher] at 18 months)—moderate certainty: serious indirectness. Similarly, screening likely results in little to no difference on the changes in depressive symptoms among men (SMD 0.09 lower [0.24 lower to 0.06 higher]) and women (SMD 0.09 lower [0.32 lower to 0.15 higher])—moderate certainty: serious indirectness and low certainty: serious indirectness, serious imprecision.
Depression score (PHQ-8)
At 18 months, screening for depression likely results in little to no difference in symptoms of depression (SMD of 0.02 lower [from 0.15 lower to 0.10 higher])—moderate certainty: serious indirectness. Screening also likely results in little to no difference in symptoms of depression among men (SMD 0.07 lower [0.22 lower to 0.07 higher]) and women (SMD 0.02 higher [0.21 lower to 0.26 higher]) —moderate certainty: serious indirectness and low certainty: serious indirectness, serious imprecision. Note, baseline data for the screening arm was not measured by study authors.
Depression-free days (CESD-10 score converted to depression day)
Screening for depression likely results in little to no difference in depression-free days (SMD of 0.07 higher [from 0.05 lower to 0.19 higher] at 18 months)—moderate certainty of the evidence: serious indirectness. Similarly, screening likely results in little to no difference in depression-free days among men (SMD 0.08 higher [0.07 lower to 0.23 higher]) and women (SMD 0.06 higher [0.17 lower to 0.30 higher])—moderate certainty: serious indirectness and low certainty: serious indirectness, serious imprecision.
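The 18-month estimates reported above for Kronish et al. can be tabulated to see at a glance which confidence intervals span zero and how wide they are. This is only a reading aid: the values are transcribed from the text with "lower" written as negative, the variable and function names are hypothetical, and the certainty ratings in this review rest on GRADE judgments (indirectness, imprecision) that this sketch does not reproduce.

```python
# Signed SMDs at 18 months as reported above (negative = lower with screening).
# Each entry: (label, SMD, CI lower bound, CI upper bound).
KRONISH_SMDS = [
    ("CESD-10 score, overall", -0.06, -0.18, 0.07),
    ("CESD-10 change, men", -0.09, -0.24, 0.06),
    ("CESD-10 change, women", -0.09, -0.32, 0.15),
    ("PHQ-8 score, overall", -0.02, -0.15, 0.10),
    ("PHQ-8 score, men", -0.07, -0.22, 0.07),
    ("PHQ-8 score, women", 0.02, -0.21, 0.26),
    ("Depression-free days, overall", 0.07, -0.05, 0.19),
    ("Depression-free days, men", 0.08, -0.07, 0.23),
    ("Depression-free days, women", 0.06, -0.17, 0.30),
]


def ci_includes_zero(lower: float, upper: float) -> bool:
    """True when the interval is compatible with no difference."""
    return lower <= 0 <= upper


def ci_width(lower: float, upper: float) -> float:
    """Width of the interval, a rough indicator of precision."""
    return upper - lower


for label, smd, lower, upper in KRONISH_SMDS:
    print(
        f"{label:30s} SMD {smd:+.2f}  "
        f"width {ci_width(lower, upper):.2f}  "
        f"includes 0: {ci_includes_zero(lower, upper)}"
    )
```

Every interval includes zero, and the intervals for women are wider than those for men (roughly 0.47 versus 0.30), which is in line with the additional downgrade for imprecision among women.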
Mallen et al. [47] measured symptoms of depression using the PHQ-8. We rated the certainty in the evidence as very low owing to very serious concerns with risk of bias (a large proportion of the participants who were screened were lost to follow-up) and with indirectness (the study included adults seeking consultation for osteoarthritis who were also screened for anxiety) (see Additional file 1).
Depression score (PHQ-8)
The evidence is very uncertain about the effect of screening for depression on symptoms of depression at any time point (i.e., post-consultation, 3, 6, and 12 months post-consultation) (SMD 0.10 higher [0.03 lower to 0.23 higher] at 12 months)—very low certainty: very serious RoB, very serious indirectness.
Leung et al. [48] measured symptoms of depression with the EPDS and GHQ-12. We rated the certainty in the evidence as very low for these outcomes owing to very serious concerns of risk of bias (lack of blinding and selective outcome reporting), serious concerns of indirectness (limited to postpartum women), and serious concerns of imprecision (small sample size) (see Additional file 1). These concerns are discussed and further explained in the “Discussion” section.
Number identified as depressed among women (EPDS score)
The evidence is very uncertain about the effects of screening for symptoms of depression using EPDS in postpartum women—very low certainty: very serious RoB, serious indirectness, serious imprecision.
At baseline (2 months postpartum), 73 (31.6%) women in the screening arm and 14 (6.0%) women in the no screening arm were assessed as having probable postpartum depression; 58 women in the screening arm scored ≥10 on the EPDS and nine women scored <10 on the EPDS but had a positive response on the suicidal ideation question. Among those with an EPDS score <10 and without suicidal ideation, six were clinically assessed as having probable postpartum depression. All participants were offered treatment; however, 18 (8.0%) in the screening arm (10 defaulted on the recommended treatment, eight were inadvertently discharged) and three (1.0%) in the no screening arm (inadvertently discharged) did not receive treatment. At 6 months postpartum (i.e., 4 months after randomization), it was reported that women in the screening arm had a 41% reduced risk of depression as measured with the EPDS relative to those in the no screening arm (RR 0.59, 95% CI 0.39 to 0.89); this corresponds to a number needed to screen of 11 (95% CI 6 to 50) to prevent one case of postpartum depression at 6 months postpartum. However, after adjustment for the positive predictive value (44%) of the Chinese EPDS for depression ascertained by clinical interview in the Hong Kong population, the number needed to screen increased to 25 (95% CI 14 to 114) [48]. After adjusting for known predictors of postpartum depression using multiple logistic regression (marital relationship at 2 months, history of psychiatric illness, depression during pregnancy, and relationship with mother-in-law), Leung et al. stated that the effect remained statistically significant, but did not report the adjusted RR. Leung et al. also reported results at 18 months postpartum, but the no screening arm had already been screened with the EPDS at 6 months postpartum, and those who scored ≥10 had been offered treatment or follow-up services, thereby removing the screened versus not screened comparison.
Furthermore, there is uncertainty in using a cut-off score of ≥10 on the EPDS with no further clinical/diagnostic assessment [49]. Since we were unable to otherwise define a threshold for a clinically important difference, it is not clear whether an important difference is being observed in Additional file 1.
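The positive predictive value adjustment to the number needed to screen is not spelled out in this section, but the reported figures are consistent with dividing each value by the PPV. The sketch below checks that arithmetic. The function name is hypothetical, and the divide-by-PPV rule with rounding to the nearest whole number is an inference from the reported numbers, not a method stated by the study authors.

```python
def adjust_nns_for_ppv(nns: int, ppv_percent: int) -> int:
    """Scale a number needed to screen by 1/PPV and round to the nearest integer.

    Assumption: adjusted NNS = NNS / PPV. This reproduces the reported
    figures but is not described as the authors' method in the text.
    """
    if not 0 < ppv_percent <= 100:
        raise ValueError("PPV must be a percentage in (0, 100]")
    return round(nns * 100 / ppv_percent)


# Unadjusted values reported above: 11 (95% CI 6 to 50); PPV 44%.
reported = {"estimate": 11, "ci_lower": 6, "ci_upper": 50}
adjusted = {name: adjust_nns_for_ppv(value, 44) for name, value in reported.items()}
print(adjusted)  # {'estimate': 25, 'ci_lower': 14, 'ci_upper': 114}
```

The output matches the adjusted number needed to screen of 25 (95% CI 14 to 114) given in the text.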
Depression score (EPDS)
The evidence is very uncertain about the effect of screening for depression on mean EPDS scores at 6 months postpartum (the mean EPDS score was 1.36 points lower in the screening group [95% CI −2.09 to −0.63; SMD −0.34, 95% CI −0.52 to −0.15]); very low certainty: very serious RoB, serious imprecision and serious indirectness.
Depression score (GHQ-12)
The evidence is very uncertain about the effect of screening for depression on the mean GHQ score at 6 months postpartum (SMD −0.16, 95% CI −0.35 to 0.02); very low certainty: very serious RoB, serious indirectness, serious imprecision.
Health-related quality of life
Kronish et al. [46] measured quality-adjusted life-years (QALYs) and quality-of-life utility scores.
We rated the certainty in the evidence as moderate for these outcomes, except for change in QALYs among women, owing to serious concerns of indirectness, as the study only included adults with recently documented acute coronary syndrome. For change in QALYs among women, we rated the certainty in the evidence as low due to serious concerns of indirectness, as described above, and serious concerns of imprecision, as the sample size was small (see Additional file 1).
Change in mean QALYs
Screening for depression likely results in little to no difference in the change in mean QALYs from baseline to 18 months (SMD of 0 [from 0.12 lower to 0.12 higher]); moderate certainty: serious indirectness. Screening likely results in little to no difference in the change in mean QALYs among men (SMD 0.05 higher [0.09 lower to 0.20 higher]) and women (SMD 0.22 lower [0.45 lower to 0.02 higher]); moderate certainty: serious indirectness (men) and low certainty: serious indirectness, serious imprecision (women).
Change in quality-of-life utility scores
Screening for depression likely results in little to no difference on the change in quality-of-life utility scores at any time point (i.e., baseline, 6, 12, and 18 months) (SMD 0.04 lower [0.17 lower to 0.08 higher] at 18 months)—moderate certainty: serious indirectness.
Mallen et al. [47] measured quality of life using the Medical Outcomes Study Short Form 12 Mental Component (SF-MCS) and Physical Component (SF-PCS) scores. We rated the certainty in the evidence as very low owing to very serious concerns with risk of bias (a large proportion of the participants who were screened were lost to follow-up) and with indirectness (the study included adults seeking consultation for osteoarthritis who were also screened for anxiety) (see Additional file 1).
SF-MCS scores
The evidence is very uncertain about the effect of screening for depression on the mental quality of life at any time point (i.e., post-consultation, 3, 6, and 12 months post-consultation) (SMD 0.04 lower [0.16 lower to 0.09 higher] at 12 months)—very low certainty: very serious RoB, very serious indirectness.
SF-PCS scores
The evidence is very uncertain about the effect of screening for depression on physical health quality of life at post-consultation, 3 months, and 12 months post-consultation (adjusted MD −0.66, 95% CI −2.25 to 0.93; SMD 0.08 lower [0.21 lower to 0.04 higher] at 12 months); very low certainty: very serious RoB, very serious indirectness. At 6 months post-consultation, screening for depression may decrease physical health quality of life (adjusted MD −1.77, 95% CI −3.22 to −0.32; p = 0.017; SMD −0.26, 95% CI −0.38 to −0.13), but the evidence is very uncertain; very low certainty: very serious RoB, very serious indirectness.
Harms of screening
Kronish et al. [46] reported harms attributable to antidepressant medications (i.e., any bleeding, changes in appetite, drowsiness, and gastrointestinal upset) in the screened and unscreened groups.
Bleeding
Screening for depression likely results in little to no difference in bleeding at any time point (i.e., 6, 12, and 18 months) (RR 1.00 [0.69 to 1.44] at 18 months); this corresponds to 0 fewer per 1000 patients (36 fewer to 52 more per 1000 patients)—moderate certainty: serious indirectness.
Changes in appetite
Screening for depression likely results in little to no difference in increased appetite at any time point (i.e., 6, 12, and 18 months) (RR 1.00 [0.75 to 1.34] at 18 months); this corresponds to 0 fewer per 1000 (from 44 fewer to 61 more)—moderate certainty: serious indirectness. Screening for depression may result in a slight reduction in decreased appetite (RR 0.85 [0.63 to 1.15] at 18 months); this corresponds to 27 fewer per 1000 (from 66 fewer to 27 more)—moderate certainty: serious indirectness.
Drowsiness
Screening for depression may result in a slight decrease in drowsiness at any time point (i.e., 6, 12, and 18 months) (RR 0.94 [0.81 to 1.09] at 18 months); this corresponds to 28 fewer per 1000 (from 88 fewer to 42 more)—moderate certainty: serious indirectness.
Gastrointestinal upset
Screening for depression may result in a slight decrease in gastrointestinal upset (RR 0.88 [0.69 to 1.12] at 18 months); this corresponds to 30 fewer per 1000 (from 78 fewer to 30 more)—moderate certainty: serious indirectness.
Leung et al. [48] reported on adverse events, but the effect size was not estimable (very low certainty: very serious RoB, serious indirectness, serious imprecision).
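The absolute effects per 1000 reported for the harms in Kronish et al. follow from applying each relative risk to a risk in the unscreened group. The sketch below shows that conversion. The function name is hypothetical, and the baseline of 250 per 1000 is an illustrative assumption: it is not stated in this section and was chosen only because it reproduces the reported point estimate and upper limit for gastrointestinal upset.

```python
def absolute_effect_per_1000(rr: float, baseline_per_1000: float) -> int:
    """Difference in events per 1000 implied by a relative risk.

    Negative values mean fewer events with screening, positive values more.
    baseline_per_1000 is the assumed risk in the unscreened group.
    """
    if rr < 0 or not 0 <= baseline_per_1000 <= 1000:
        raise ValueError("rr must be >= 0 and baseline within 0 to 1000")
    return round(baseline_per_1000 * (rr - 1))


ASSUMED_BASELINE = 250  # illustrative assumption, not reported in this section

# Gastrointestinal upset at 18 months: RR 0.88, upper confidence limit 1.12.
print(absolute_effect_per_1000(0.88, ASSUMED_BASELINE))  # -30, i.e., 30 fewer per 1000
print(absolute_effect_per_1000(1.12, ASSUMED_BASELINE))  # 30, i.e., 30 more per 1000
```

A relative risk of 1.00, as reported for bleeding and increased appetite, gives an absolute difference of 0 per 1000 whatever the baseline risk.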
Outcomes not reported
Many outcomes of interest for this review (as reported in the PICOs framework), which were established through consultation with the Task Force, their external clinical experts, and patient partners, were not reported in the included studies. Not only is there a paucity of data, but the included studies do not necessarily provide clinically helpful information, as they do not examine outcomes deemed important by guideline panel experts and patient partners. The unreported outcomes include diagnosis of depression using a validated diagnostic interview (e.g., the Structured Clinical Interview for DSM (SCID)) at a follow-up time point, day-to-day functionality, lost time at work/school, impact on lifestyle behavior, suicidality, false positive results, overdiagnosis or overtreatment, and labeling/stigma.
Pregnant and postpartum women
Search results
The search strategies yielded 1225 records, and an additional 33 records were found by scanning systematic review bibliographies, searching for full-text publications of studies identified from published protocols and clinical registries, and searching the gray literature. After de-duplication, 1104 records remained and were screened by title and abstract. A total of 132 records were assessed at full text, with one RCT by Leung et al. included [48] (Fig. 4). Additional file 1 provides a bibliographic list of the studies excluded at full-text assessment, with reasons, and a list of ongoing trials.
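The record counts above imply the number of records removed at each stage of the selection process. The sketch below derives them by subtraction. The function and key names are hypothetical, and the last figure assumes that every record assessed at full text was either included or excluded, which the text does not state explicitly.

```python
def record_flow(database_records, other_records, after_dedup, full_text, included):
    """Derive per-stage counts for a study selection flow from reported totals."""
    identified = database_records + other_records
    return {
        "identified": identified,
        "duplicates_removed": identified - after_dedup,
        "excluded_at_title_abstract": after_dedup - full_text,
        # Assumes no full-text records fall into other categories.
        "excluded_at_full_text": full_text - included,
    }


# Counts reported above for the pregnant and postpartum search.
flow = record_flow(1225, 33, 1104, 132, 1)
for stage, count in flow.items():
    print(f"{stage}: {count}")
```

With the reported totals this gives 1258 records identified, 154 duplicates removed, 972 records excluded at title and abstract, and 131 excluded at full text.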
Study characteristics
Leung et al. [48] performed a randomized controlled trial among 462 women enrolled 2 months postpartum. Women were seen at the Maternal and Child Health Centres in Hong Kong and were randomized to screening with the Edinburgh Postnatal Depression Scale (EPDS) (n = 231) or no screening (n = 231), in addition to usual practice in which nurses carried out clinical assessment. The EPDS consists of 10 questions, and scores range from 0 to 30, with higher scores indicating a higher likelihood of depression [50]. All women, including the control group, received clinical assessment at the 2-month postpartum appointment. Participants identified as potentially depressed (i.e., in the intervention group, by a score of ≥10 on the EPDS or a positive answer to the suicidal ideation question; in either group, through clinical assessment) were offered non-directive counseling by nurses or management by a community psychiatric team (55/73 in the intervention and 11/14 in the control group). Additional file 1 provides the study characteristics table, results table (binary data), and results table (continuous data). Groups were reported to be similar at baseline, although some baseline measures differed (e.g., history of psychiatric illness, separated/divorced/widowed/never married, does not live with child all the time).
Outcomes
Results for the included study are presented in Additional file 1. Ratings of risk of bias by study are included in Additional file 1. The GRADE Evidence Profiles and Summary of Findings Tables, including explanations for all ratings, are available in Additional file 1.
Benefits of screening
Maternal mental health outcomes
We rated the certainty in the evidence as very low for these outcomes owing to very serious concerns of risk of bias (lack of blinding and selective reporting) and serious concerns of imprecision (small sample size) (see Additional file 1); these concerns are further explained in the “Discussion” section.
Number identified as depressed among women (EPDS score)
The evidence is very uncertain about the effect of screening for depression using the EPDS in postpartum women—very low certainty: very serious RoB and serious imprecision.
At baseline (2 months postpartum), 73 of the 231 women in the screening arm (31.6%) were identified as potentially depressed. Of these, 58 women scored ≥10 on the EPDS; nine women scored <10 on the EPDS but had a positive response on the suicidal ideation question; and six, among those with EPDS <10 and without suicidal ideation, were clinically assessed as having probable postpartum depression. In the control group, 14 of the 231 (6%) women were clinically assessed as having probable postpartum depression. Although all women should have been offered treatment, 18 women (8%) in the screening group (10 women defaulted on the recommended treatment, eight were inadvertently discharged) and three women (1%) in the no screening group (all inadvertently discharged) did not receive treatment. At 6 months postpartum (i.e., 4 months after randomization), women in the screening group had a 41% reduced risk of depression as measured with the EPDS relative to those in the unscreened group (RR 0.59, 95% CI 0.39–0.89; Additional file 1). The number needed to screen to prevent one case of postpartum depression at 6 months postpartum was 11 (95% CI 6 to 50), but after adjustment for the positive predictive value (44%) of the Chinese EPDS for depression ascertained by clinical interview in the Hong Kong population, the number needed to screen became 25 (95% CI 14 to 114) [48]. After adjusting for known predictors of postpartum depression (marital relationship at 2 months, history of psychiatric illness, depression during pregnancy, and relationship with mother-in-law) using multiple logistic regression, the authors reported that the effect remained statistically significant (adjusted RR not provided by study authors). The study authors reported results at 18 months postpartum; however, the control group was given the EPDS at 6 months postpartum, and those who scored ≥10 were offered treatment or follow-up services, thereby removing the screened versus not screened comparison.
There is uncertainty in applying a cut-off score of ≥10 on the EPDS with no further clinical/diagnostic assessment [49]. Since we were unable to otherwise define a threshold for a clinically important difference, it is not clear whether an important difference is being observed in Additional file 1 and the evidence was rated as very low certainty. Therefore, given the available information, the evidence is very uncertain for symptoms of depression measured with EPDS score ≥10 due to very serious concerns with risk of bias and serious concerns with imprecision.
Depression score (EPDS)
At 6 months postpartum, the mean EPDS score was 1.36 points lower in the screening group (95% CI −2.09 to −0.63; SMD −0.34, 95% CI −0.52 to −0.15; Additional file 1). The evidence is very uncertain about the effect of screening for depression on the mean EPDS score at 6 months postpartum; very low certainty: very serious RoB and serious imprecision.
Depression score (GHQ-12)
The evidence is very uncertain about the effect of screening for depression on the mean score of the General Health Questionnaire (GHQ) at 6 months postpartum (SMD −0.16, 95% CI −0.35 to 0.02) (Additional file 1)—very low certainty: very serious RoB and serious imprecision.
Parenting and relationship outcomes
We rated the certainty in the evidence as very low for these outcomes owing to very serious concerns of risk of bias (selective reporting and lack of blinding) and serious concerns of imprecision (small sample size) (see Additional file 1).
Parenting stress index-short form (PSI-SF) score
Parenting outcomes were measured with the Parental Stress Index (PSI) tool, which is designed to evaluate the magnitude of stress in the parent-child system.
The evidence is very uncertain about the effect of screening for depression in the total mean score on the PSI (SMD of 0.17 lower [from 0.35 lower to 0.01 higher]) or any of its subscales (Parental Distress, Parent-Child Dysfunctional Interaction, Difficult Child) at 6 months postpartum—very low certainty: very serious RoB and serious imprecision.
Marital satisfaction score
Marital satisfaction was measured using the Chinese Kansas Marital Satisfaction Scale. The evidence is very uncertain about the effect of screening for depression on the marital satisfaction score at 6 months postpartum (SMD of 0.15 higher [from 0.03 lower to 0.34 higher]); very low certainty: very serious RoB and serious imprecision.
Infant outcomes
We rated the certainty in the evidence as low and very low for these two outcomes owing to serious or very serious concerns of risk of bias (selective reporting and/or lack of blinding, respectively) and serious concerns of imprecision (small sample size) (see Additional file 1).
Infant body weight
Screening for depression may result in little to no difference in the mean infant body weight at 6 months postpartum (SMD 0.06, 95% CI −0.12 to 0.24); low certainty: serious RoB and serious imprecision.
Number of infant hospitalizations
The evidence is very uncertain about the effect of screening for depression on the mean number of infant hospitalizations at 6 months postpartum (SMD 0.06, 95% CI −0.13 to 0.24); very low certainty: very serious RoB and serious imprecision.
Harms of screening
No adverse events were reported in either group during the study duration and it is unclear how this information was sought. We have not assigned a level of importance for this outcome, as there was no information provided on how adverse events were collected/reported. The certainty of the evidence was rated as very low due to very serious concerns with risk of bias and serious concerns with imprecision. Therefore, the evidence is very uncertain for the harms of screening for depression.
Outcomes not reported
A key outcome was not reported in the one included study [48]: the diagnosis of depression using a validated diagnostic interview (e.g., the Mini International Neuropsychiatric Interview (MINI)). Other outcomes of interest for this review that were not reported include health-related quality of life, false-positive screens, overdiagnosis or overtreatment, labeling/stigma, mother-child interactions, infant neurodevelopment, and infant responsiveness. Although suicidality was evaluated with question 10 on the EPDS, follow-up results were not presented separately for those identified as depressed based on this question (as they were in the baseline results).