Limitations of A Measurement Tool to Assess Systematic Reviews (AMSTAR) and suggestions for improvement
Systematic Reviews, volume 5, Article number: 58 (2016)
A Measurement Tool to Assess Systematic Reviews (AMSTAR) is a commonly used tool to assess the quality of systematic reviews; however, modifications are needed to improve its usability, reliability, and validity. In this commentary, we summarize our experience and the experiences of others who have used AMSTAR and provide suggestions for its improvement. We propose modifying a number of AMSTAR's individual items, and their instructions and responses, so that they are more congruent with an assessment of the methodologic quality of systematic reviews. We recommend adding new items and modifying existing items to assess the quality of the body of evidence and to address subgroup and sensitivity analyses. More detailed instructions are needed for combining item scores across multiple reviewers, and we recommend that a total score should not be calculated. These suggestions need to be empirically tested prior to implementation.
A Measurement Tool to Assess Systematic Reviews (AMSTAR) is a commonly used tool to assess the methodologic quality of systematic reviews. It has demonstrated satisfactory reliability and construct validity for systematic reviews of randomized controlled trials of treatment interventions. AMSTAR is widely used to assess the quality of systematic reviews. Some users consider it the most appropriate (and best) tool [4–6], while others have found it problematic [7–17] and have therefore modified it [7, 11, 15, 18–30]. In this commentary, we summarize our experience using AMSTAR along with the experiences of others, describe several key issues, and provide suggestions for improvement (Table 1).
The stated objective of AMSTAR is to assess the methodological quality of systematic reviews, which refers to whether the authors of a study (or, presumably, a systematic review) did the best that they could. The items of AMSTAR, however, largely address the quality of reporting (e.g., items 5 and 6) and the risk of bias (e.g., items 8 and 9) rather than methodological quality. Several items should therefore be amended to be consistent with the stated objective.
AMSTAR encompasses most of the key constructs relevant to assessing the methodological quality of systematic reviews; however, as other investigators have also noted [9, 34–36], one critical construct is missing: an explicit and reproducible method for assessing the quality of the body of evidence for each important outcome (i.e., the confidence in the estimates of effect). We suggest revising item 8 to focus on this construct, separating it from the assessment of the quality of individual studies (item 7) (Table 1). AMSTAR also lacks an item that assesses subgroup and sensitivity analyses [9, 36]. Subgroup analyses are important to decision-makers because treatment effects may differ across populations. Similarly, sensitivity analyses specified a priori help to assess the robustness of a review’s findings. Items addressing subgroup and sensitivity analyses should be added (new item 12, Table 1).
Some AMSTAR items and their instructions are unclear and need to be revised (Table 1). For example, item 4, regarding the “status of publication,” might refer to either the inclusion or the exclusion of gray literature. The instructions suggest that gray literature should be included; however, its relevance depends closely on the review question, and including it may not always be necessary. AMSTAR also treats foreign-language publications as gray literature, which is not consistent with commonly used definitions.
The response options (yes, no, cannot answer, not applicable) are problematic [9, 39–43]. For example, “cannot answer” can be difficult to interpret and to distinguish from “no” when no information is provided. A common approach to quality assessment is to assume that if the authors did not report a step, it did not happen, in which case “no” is the appropriate response. The instructions, however, recommend “cannot answer” when the item is “relevant but not described.” As a result, “no” would rarely be used, because authors seldom report explicitly that they did not do something. In addition, “not applicable” is appropriate for only two items (items 9 and 10), when the corresponding analyses are not possible or appropriate; all other items should always be addressed.
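The response rules argued for in the preceding paragraph can be stated precisely as a small decision function. This is a minimal sketch, not official AMSTAR guidance: the function `normalize_response` and its boolean inputs (`reported`, `met`, `applicable`) are hypothetical stand-ins for an assessor's judgement. It encodes two rules: an unreported step is scored “no”, and “not applicable” is accepted only for items 9 and 10.

```python
from enum import Enum


class Response(Enum):
    """The four AMSTAR response options."""
    YES = "yes"
    NO = "no"
    CANNOT_ANSWER = "cannot answer"
    NOT_APPLICABLE = "not applicable"


# Items for which "not applicable" is argued to be appropriate:
# item 9 (methods used to combine findings) and item 10 (publication bias).
NA_ALLOWED_ITEMS = frozenset({9, 10})


def normalize_response(item: int, *, reported: bool, met: bool,
                       applicable: bool = True) -> Response:
    """Assign a response under the rules proposed in the commentary (sketch only).

    `reported`, `met`, and `applicable` are hypothetical inputs describing the
    assessor's judgement; they are not part of AMSTAR itself.
    """
    if not 1 <= item <= 11:
        raise ValueError(f"AMSTAR has 11 items; got item {item}")
    if not applicable:
        # Every item other than 9 and 10 must always be addressed.
        if item not in NA_ALLOWED_ITEMS:
            raise ValueError(
                f"'not applicable' is reserved for items 9 and 10, not item {item}")
        return Response.NOT_APPLICABLE
    if not reported:
        # Assume that a step the authors did not report did not happen.
        return Response.NO
    return Response.YES if met else Response.NO
```

Under these assumptions the function never returns `CANNOT_ANSWER`. That makes the design choice explicit: a relevant but undescribed step is scored “no” rather than “cannot answer”.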
The guidance for scoring individual items and for obtaining a total score is unclear. In AMSTAR, an item for which all criteria are met (i.e., “yes”) receives a score of 1, and the number of “yes” responses gives the total score out of 11. Systematic reviews, however, often only partially meet an item’s criteria. For example, a review may list the databases searched and the search dates but, perhaps because of journal word limits, not provide the search strategies or keywords. Because a single AMSTAR item often evaluates several constructs, investigators have modified its scoring to award points for partially fulfilled items [7, 9, 34, 35, 39]. Kung and colleagues developed R-AMSTAR [44], which subdivides each item into four components and yields a total score ranging from 11 to 44, with higher scores indicating better methodological quality. R-AMSTAR has been used by a number of investigators [5, 45–50], and a comparison with AMSTAR concluded that R-AMSTAR provides greater guidance for each item and is more reliable and useful [51].
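The arithmetic behind the two scoring schemes can be summarized in a short sketch. It assumes responses are stored in a dictionary keyed by item number (1 to 11); the function names are hypothetical. For R-AMSTAR, the sketch assumes only what the 11–44 range implies, namely that each of the 11 items contributes 1 to 4 points. How R-AMSTAR maps its criteria to those points is not modeled here.

```python
from typing import Mapping

# AMSTAR has 11 items, numbered 1-11.
AMSTAR_ITEMS = range(1, 12)


def amstar_total(responses: Mapping[int, str]) -> int:
    """Original AMSTAR scoring: 1 point for each 'yes', 0 for anything else (max 11)."""
    missing = set(AMSTAR_ITEMS) - set(responses)
    if missing:
        raise ValueError(f"missing responses for items {sorted(missing)}")
    return sum(1 for item in AMSTAR_ITEMS if responses[item] == "yes")


def r_amstar_total(item_scores: Mapping[int, int]) -> int:
    """R-AMSTAR-style total, assuming each of the 11 items is scored 1-4 (range 11-44)."""
    total = 0
    for item in AMSTAR_ITEMS:
        score = item_scores[item]  # raises KeyError if an item was not scored
        if not 1 <= score <= 4:
            raise ValueError(f"item {item}: score {score} is outside 1-4")
        total += score
    return total
```

A review that lists its databases and dates but omits its search strategy gets 0 for that item under `amstar_total`. Under the graded scheme, the same item can contribute an intermediate value, which is the partial credit the modified approaches aim to capture.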
In addition, AMSTAR provides no guidance on how to combine item scores from multiple assessors, other than stating that consensus should be reached for each item. We have averaged AMSTAR scores across assessors so that each independent evaluation is reflected [52]. Other investigators have used similar approaches, such as averaging the scores of two assessors when they differ by one or two points and involving a third assessor when they differ by three or more points [53, 54].
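Both combination approaches in the preceding paragraph reduce to simple rules on total scores. This is a minimal sketch under stated assumptions: the function names and the `max_gap_for_averaging` parameter are hypothetical. The source does not say how a third assessor's score is then used, so the sketch returns `None` to flag that case instead of inventing an adjudication rule.

```python
from typing import Optional, Sequence


def average_across_assessors(scores: Sequence[float]) -> float:
    """Average independent assessors' total scores so each evaluation counts equally."""
    if not scores:
        raise ValueError("at least one assessor score is required")
    return sum(scores) / len(scores)


def combine_two_assessors(score_a: float, score_b: float,
                          max_gap_for_averaging: float = 2) -> Optional[float]:
    """Average two assessors' scores when they are close; otherwise flag for a third.

    With integer scores and the default gap of 2, averaging applies when the
    scores differ by 0-2 points, and None is returned when they differ by 3 or
    more points, signalling that a third assessor should be involved.
    """
    if abs(score_a - score_b) <= max_gap_for_averaging:
        return (score_a + score_b) / 2
    return None
```

For example, totals of 7 and 9 are averaged to 8.0, whereas totals of 5 and 8 are flagged for a third assessor.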
AMSTAR was deliberately developed without guidance on how to translate the total score into categorical ratings of a systematic review’s overall quality (e.g., good, fair, poor) [1, 55]. Investigators have used various thresholds to define quality categories (e.g., 0–4 vs. 0–3 for poor quality), which makes it difficult to compare assessments across reviews. AMSTAR was also designed under the assumption that each item carries equal weight in a systematic review’s overall quality. Other investigators have dealt with this issue by assigning different weights to items they consider more important [53, 56–58]. For example, Jacobs and colleagues rated systematic reviews as high quality if items 3, 6, 7, and 8 were met, regardless of the total score [57]. A further problem with the current scoring method is that “not applicable,” “no,” and “cannot answer” are treated as equivalent (all scored as zero), even though an item rated “not applicable” should not count toward the total score. If users of AMSTAR choose to calculate a total score, clearer guidance on how to do so is needed, along with an acknowledgement of the limitations of summing across all items. We believe, however, that obtaining a total score should be avoided, as total scores have been shown to be problematic [59].
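The two scoring problems in the preceding paragraph can be made concrete: scoring “not applicable” as zero, and giving every item equal weight. The sketch below illustrates those difficulties and is not a recommendation to compute a total score. It assumes responses are stored as strings in a dictionary keyed by item number; the function names are hypothetical. The proportion-of-applicable-items alternative is one possible adjustment and is not proposed by AMSTAR or by this commentary. The key-item check follows the rule attributed to Jacobs and colleagues.

```python
from typing import Mapping, Optional

# Items that Jacobs and colleagues required to be met for a "high quality" rating.
JACOBS_KEY_ITEMS = frozenset({3, 6, 7, 8})


def total_na_as_zero(responses: Mapping[int, str]) -> int:
    """Current approach: 'no', 'cannot answer', and 'not applicable' all score 0."""
    return sum(1 for r in responses.values() if r == "yes")


def proportion_of_applicable(responses: Mapping[int, str]) -> Optional[float]:
    """Hypothetical alternative: drop 'not applicable' items from the denominator."""
    applicable = [r for r in responses.values() if r != "not applicable"]
    if not applicable:
        return None  # nothing applicable to score
    return sum(1 for r in applicable if r == "yes") / len(applicable)


def meets_key_items(responses: Mapping[int, str],
                    key_items: frozenset = JACOBS_KEY_ITEMS) -> bool:
    """Key-item rule: every listed item must be 'yes', whatever the total score."""
    return all(responses.get(item) == "yes" for item in key_items)
```

Consider a review with “yes” on items 1 to 8, “not applicable” on items 9 and 10, and “no” on item 11. It scores 8 of 11 when “not applicable” counts as zero, but 8 of 9 applicable items (about 0.89) when those items are excluded. It also meets the key-item rule. The same review can therefore look quite different depending on which convention an assessor adopts.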
AMSTAR is a useful tool for assessing the quality of systematic reviews; however, some modifications would improve its usability, reliability, and validity. The issues discussed in this commentary are not limited to our own experiences but are shared by many investigators who have used this tool. We have provided suggestions for improving AMSTAR; however, any revised tool needs to be empirically tested for reliability and validity, and additional refinements will undoubtedly be needed. We look forward to further dialog on AMSTAR and to subsequent revisions and evaluations.
Shea BJ, Grimshaw JM, Wells GA, Boers M, Andersson N, Hamel C, et al. Development of AMSTAR: a measurement tool to assess the methodological quality of systematic reviews. BMC Med Res Methodol. 2007;7:10.
Shea BJ, Hamel C, Wells GA, Bouter LM, Kristjansson E, Grimshaw J, et al. AMSTAR is a reliable and valid measurement tool to assess the methodological quality of systematic reviews. J Clin Epidemiol. 2009;62(10):1013–20. http://dx.doi.org/10.1016/j.jclinepi.2008.10.009.
Shea BJ, Bouter LM, Peterson J, Boers M, Andersson N, Ortiz Z, et al. External validation of a measurement tool to assess systematic reviews (AMSTAR). PLoS ONE. 2007;2(12), e1350.
Chambrone L, Faggion Jr CM, Pannuti CM, Chambrone LA. Evidence-based periodontal plastic surgery: an assessment of quality of systematic reviews in the treatment of recession-type defects. J Clin Periodontol. 2010;37(12):1110–8. http://dx.doi.org/10.1111/j.1600-051X.2010.01634.x.
Klimo Jr P, Thompson CJ, Ragel BT, Boop FA. Methodology and reporting of meta-analyses in the neurosurgical literature. J Neurosurg. 2014;120(4):796–810. http://dx.doi.org/10.3171/2013.11.JNS13195.
Nicolau I, Ling D, Tian L, Lienhardt C, Pai M. Methodological and reporting quality of systematic reviews on tuberculosis. Int J Tuberc Lung Dis. 2013;17(9):1160–9. http://dx.doi.org/10.5588/ijtld.13.0050.
Aziz T, Compton S, Nassar U, Matthews D, Ansari K, Flores-Mir C. Methodological quality and descriptive characteristics of prosthodontic-related systematic reviews. J Oral Rehabil. 2013;40(4):263–78. http://dx.doi.org/10.1111/joor.12028.
Elangovan S, Avila-Ortiz G, Johnson GK, Karimbux N, Allareddy V. Quality assessment of systematic reviews on periodontal regeneration in humans. J Periodontol. 2013;84(2):176–85. http://dx.doi.org/10.1902/jop.2012.120021.
Fleming PS, Koletsi D, Seehra J, Pandis N. Systematic reviews published in higher impact clinical journals were of higher quality. J Clin Epidemiol. 2014;67(7):754–9. http://dx.doi.org/10.1016/j.jclinepi.2014.01.002.
Kamioka H, Tsutani K, Okuizumi H, Mutoh Y, Ohta M, Handa S, et al. Effectiveness of aquatic exercise and balneotherapy: a summary of systematic reviews based on randomized controlled trials of water immersion therapies. J Epidemiol. 2010;20(1):2–12.
Lang LA, Teich ST. A critical appraisal of the systematic review process: systematic reviews of zirconia single crowns. J Prosthet Dent. 2014;111(6):476–84. http://dx.doi.org/10.1016/j.prosdent.2013.10.007.
Macedo CR, Riera R, Torloni MR. Methodological quality of systematic reviews and clinical trials on women’s health published in a Brazilian evidence-based health journal. Clinics. 2013;68(4):563–7. http://dx.doi.org/10.6061/clinics/2013(04)20.
Remschmidt C, Wichmann O, Harder T. Methodological quality of systematic reviews on influenza vaccination. Vaccine. 2014;32(15):1678–84. http://dx.doi.org/10.1016/j.vaccine.2014.01.060.
Kumar A, Galeb S, Djulbegovic B. Treatment of patients with multiple myeloma: an overview of systematic reviews. Acta Haematol. 2011;125(1–2):8–22. http://dx.doi.org/10.1159/000318880.
Prior M, Guerin M, Grimmer-Somers K. The effectiveness of clinical guideline implementation strategies—a synthesis of systematic review findings. J Eval Clin Pract. 2008;14(5):888–97. http://dx.doi.org/10.1111/j.1365-2753.2008.01014.x.
Seo HJ, Kim KU. Quality assessment of systematic reviews or meta-analyses of nursing interventions conducted by Korean reviewers. BMC Med Res Methodol. 2012;12:129. http://dx.doi.org/10.1186/1471-2288-12-129.
Sequeira-Byron P, Fedorowicz Z, Jagannath VA, Sharif MO. An AMSTAR assessment of the methodological quality of systematic reviews of oral healthcare interventions published in the Journal of Applied Oral Science (JAOS). J Appl Oral Sci. 2011;19(5):440–7.
Andersen JH, Fallentin N, Thomsen JF, Mikkelsen S. Risk factors for neck and upper extremity disorders among computers users and the effect of interventions: an overview of systematic reviews. PLoS ONE. 2011;6(5):e19691. http://dx.doi.org/10.1371/journal.pone.0019691.
Berkhof M, van Rijssen HJ, Schellart AJ, Anema JR, van der Beek AJ. Effective training strategies for teaching communication skills to physicians: an overview of systematic reviews. Patient Educ Couns. 2011;84(2):152–62. http://dx.doi.org/10.1016/j.pec.2010.06.010.
Johnson BT, MacDonald HV, Bruneau Jr ML, Goldsby TU, Brown JC, Huedo-Medina TB, et al. Methodological quality of meta-analyses on the blood pressure response to exercise: a review. J Hypertens. 2014;32(4):706–23. http://dx.doi.org/10.1097/HJH.0000000000000097.
Kelley GA, Kelley KS. Effects of exercise in the treatment of overweight and obese children and adolescents: a systematic review of meta-analyses. J Obes. 2013;2013:783103. http://dx.doi.org/10.1155/2013/783103.
Kelley GA, Kelley KS. Effects of exercise on depressive symptoms in adults with arthritis and other rheumatic disease: a systematic review of meta-analyses. BMC Musculoskelet Disord. 2014;15:121. http://dx.doi.org/10.1186/1471-2474-15-121.
Massougbodji J, Le Bodo Y, Fratu R, De Wals P. Reviews examining sugar-sweetened beverages and body weight: correlates of their quality and conclusions. Am J Clin Nutr. 2014;99(5):1096–104. http://dx.doi.org/10.3945/ajcn.113.063776.
Nuckols TK, Anderson L, Popescu I, Diamant AL, Doyle B, Di Capua P, et al. Opioid prescribing: a systematic review and critical appraisal of guidelines for chronic pain. Ann Intern Med. 2014;160(1):38–47. http://dx.doi.org/10.7326/0003-4819-160-1-201401070-00732.
Panic N, Leoncini E, de Belvis G, Ricciardi W, Boccia S. Evaluation of the endorsement of the preferred reporting items for systematic reviews and meta-analysis (PRISMA) statement on the quality of published systematic review and meta-analyses. PLoS ONE. 2013;8(12):e83138. http://dx.doi.org/10.1371/journal.pone.0083138.
Pieper D, Mathes T, Eikermann M. Can AMSTAR also be applied to systematic reviews of non-randomized studies? BMC Res Notes. 2014;7:609. http://dx.doi.org/10.1186/1756-0500-7-609.
Saokaew S, Oderda GM. Quality assessment of the methods used in published opioid conversion reviews. J Pain Palliat Care Pharmacother. 2012;26(4):341–7. http://dx.doi.org/10.3109/15360288.2012.734904.
Sardanelli F, Bashir H, Berzaczy D, Cannella G, Espeland A, Flor N, et al. The role of imaging specialists as authors of systematic reviews on diagnostic and interventional imaging and its impact on scientific quality: report from the EuroAIM Evidence-based Radiology Working Group. Radiology. 2014;272(2):533–40. http://dx.doi.org/10.1148/radiol.14131730.
Walton DM, Carroll LJ, Kasch H, Sterling M, Verhagen AP, Macdermid JC, et al. An overview of systematic reviews on prognostic factors in neck pain: results from the International Collaboration on Neck Pain (ICON) project. Open Orthop J. 2013;7:494–505. http://dx.doi.org/10.2174/1874325001307010494.
Wiysonge CS, Ngcobo NJ, Jeena PM, Madhi SA, Schoub BD, Hawkridge A, et al. Advances in childhood immunisation in South Africa: where to now? Programme managers’ views and evidence from systematic reviews. BMC Public Health. 2012;12:578. http://dx.doi.org/10.1186/1471-2458-12-578.
Higgins JPT, Green S. Cochrane handbook for systematic reviews of interventions. West Sussex: The Cochrane Collaboration; 2008.
Moher D, Liberati A, Tetzlaff J, Altman DG, PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. J Clin Epidemiol. 2009;62(10):1006–12. http://dx.doi.org/10.1016/j.jclinepi.2009.06.005.
Whiting P, Savovic J, Higgins JP, Caldwell DM, Reeves BC, Shea B, et al. ROBIS: a new tool to assess risk of bias in systematic reviews was developed. J Clin Epidemiol. 2016;69:225–34. http://dx.doi.org/10.1016/j.jclinepi.2015.06.005.
Papageorgiou SN, Papadopoulos MA, Athanasiou AE. Evaluation of methodology and quality characteristics of systematic reviews in orthodontics. Orthod Craniofac Res. 2011;14(3):116–37. http://dx.doi.org/10.1111/j.1601-6343.2011.01522.x.
Papageorgiou SN, Papadopoulos MA, Athanasiou AE. Reporting characteristics of meta-analyses in orthodontics: methodological assessment and statistical recommendations. Eur J Orthod. 2014;36(1):74–85. http://dx.doi.org/10.1093/ejo/cjt008.
Brito JP, Tsapas A, Griebeler ML, Wang Z, Prutsky GJ, Domecq JP, et al. Systematic reviews supporting practice guideline recommendations lack protection against bias. J Clin Epidemiol. 2013;66(6):633–8. http://dx.doi.org/10.1016/j.jclinepi.2013.01.008.
Berkman ND, Lohr KN, Morgan LC, Kuo TM, Morton SC. Interrater reliability of grading strength of evidence varies with the complexity of the evidence in systematic reviews. J Clin Epidemiol. 2013;66(10):1105–17.e1. http://dx.doi.org/10.1016/j.jclinepi.2013.06.002.
Institute of Medicine. Finding what works in health care: standards for systematic reviews. Washington, D.C.: National Academies Press; 2011.
Faggion Jr CM, Listl S, Giannakopoulos NN. The methodological quality of systematic reviews of animal studies in dentistry. Vet J. 2012;192(2):140–7. http://dx.doi.org/10.1016/j.tvjl.2011.08.006.
Kang D, Wu Y, Hu D, Hong Q, Wang J, Zhang X. Reliability and external validity of AMSTAR in assessing quality of TCM systematic reviews. Evid Based Complement Alternat Med. 2012;2012:732195. http://dx.doi.org/10.1155/2012/732195.
Rookmoneea M, Dennis L, Brealey S, Rangan A, White B, McDaid C, et al. The effectiveness of interventions in the management of patients with primary frozen shoulder. J Bone Joint Surg (Br). 2010;92(9):1267–72. http://dx.doi.org/10.1302/0301-620X.92B9.24282.
de Bot CM, Moed H, Berger MY, Roder E, van Wijk RG, van der Wouden JC. Sublingual immunotherapy in children with allergic rhinitis: quality of systematic reviews. Pediatr Allergy Immunol. 2011;22(6):548–58. http://dx.doi.org/10.1111/j.1399-3038.2011.01165.x.
Miyahara M. Meta review of systematic and meta analytic reviews on movement differences, effect of movement based interventions, and the underlying neural mechanisms in autism spectrum disorder. Front Integr Neurosci. 2013;7:16. http://dx.doi.org/10.3389/fnint.2013.00016.
Kung J, Chiappelli F, Cajulis OO, Avezova R, Kossan G, Chew L, et al. From systematic reviews to clinical recommendations for evidence-based health care: validation of Revised Assessment of Multiple Systematic Reviews (R-AMSTAR) for grading of clinical relevance. Open Dent J. 2010;4:84–91. http://dx.doi.org/10.2174/1874210601004020084.
Faggion Jr CM, Giannakopoulos NN. Critical appraisal of systematic reviews on the effect of a history of periodontitis on dental implant loss. J Clin Periodontol. 2013;40(5):542–52. http://dx.doi.org/10.1111/jcpe.12096.
Deckert S, Kopkow C, Schmitt J. Nonallergic comorbidities of atopic eczema: an overview of systematic reviews. Allergy. 2014;69(1):37–45. http://dx.doi.org/10.1111/all.12246.
Kitsiou S, Pare G, Jaana M. Systematic reviews and meta-analyses of home telemonitoring interventions for patients with chronic diseases: a critical assessment of their methodological quality. J Med Internet Res. 2013;15(7):e150. http://dx.doi.org/10.2196/jmir.2770.
Ramchandani M, Siddiqui M, Kanwar R, Lakha M, Phi L, Giacomelli L, et al. Proteomic signature of periodontal disease in pregnancy: predictive validity for adverse outcomes. Bioinformation. 2010;5(7):300–3.
Schmitter M, Sterzenbach G, Faggion Jr CM, Krastl G. A flood tide of systematic reviews on endodontic posts: methodological assessment using of R-AMSTAR. Clin Oral Investig. 2013;17(5):1287–94. http://dx.doi.org/10.1007/s00784-013-0945-z.
Wells C, Kolt GS, Marshall P, Hill B, Bialocerkowski A. Effectiveness of Pilates exercise in treating people with chronic low back pain: a systematic review of systematic reviews. BMC Med Res Methodol. 2013;13:7. http://dx.doi.org/10.1186/1471-2288-13-7.
Popovich I, Windsor B, Jordan V, Showell M, Shea B, Farquhar CM. Methodological quality of systematic reviews in subfertility: a comparison of two different approaches. PLoS ONE. 2012;7(12):e50403. http://dx.doi.org/10.1371/journal.pone.0050403.
Burda BU, Norris SL, Holmer HK, Ogden LA, Smith ME. Quality varies across clinical practice guidelines for mammography screening in women aged 40-49 years as assessed by AGREE and AMSTAR instruments. J Clin Epidemiol. 2011;64(9):968–76. http://dx.doi.org/10.1016/j.jclinepi.2010.12.005.
Weed DL, Althuis MD, Mink PJ. Quality of reviews on sugar-sweetened beverages and health outcomes: a systematic review. Am J Clin Nutr. 2011;94(5):1340–7. http://dx.doi.org/10.3945/ajcn.111.015875.
Monasta L, Batty GD, Cattaneo A, Lutje V, Ronfani L, Van Lenthe FJ, et al. Early-life determinants of overweight and obesity: a review of systematic reviews. Obes Rev. 2010;11(10):695–708. http://dx.doi.org/10.1111/j.1467-789X.2010.00735.x.
Needleman I, Clarkson J, Worthington H. A practitioner’s guide to developing critical appraisal skills: reviews of research. J Am Dent Assoc. 2013;144(5):527–30.
List T, Axelsson S. Management of TMD: evidence from systematic reviews and meta-analyses. J Oral Rehabil. 2010;37(6):430–51. http://dx.doi.org/10.1111/j.1365-2842.2010.02089.x.
Jacobs WC, Rubinstein SM, Willems PC, Moojen WA, Pellise F, Oner CF, et al. The evidence on surgical interventions for low back disorders, an overview of systematic reviews. Eur Spine J. 2013;22(9):1936–49. http://dx.doi.org/10.1007/s00586-013-2823-4.
Jaspers MW, Smeulers M, Vermeulen H, Peute LW. Effects of clinical decision-support systems on practitioner performance and patient outcomes: a synthesis of high-quality systematic review findings. J Am Med Inform Assoc. 2011;18(3):327–34. http://dx.doi.org/10.1136/amiajnl-2011-000094.
Juni P, Witschi A, Bloch R, Egger M. The hazards of scoring the quality of clinical trials for meta-analysis. JAMA. 1999;282(11):1054–60.
Guyatt GH, Oxman AD, Vist GE, Kunz R, Falck-Ytter Y, Alonso-Coello P, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ. 2008;336(7650):924–6. http://dx.doi.org/10.1136/bmj.39489.470347.AD.
Harris RP, Helfand M, Woolf SH, Lohr KN, Mulrow CD, Teutsch SM, et al. Current methods of the US Preventive Services Task Force: a review of the process. Am J Prev Med. 2001;20(3 Suppl):21–35.
The authors thank Carrie D. Patnode, Ph.D. for reviewing the draft manuscript and Lauren A. Ogden, B.A. and Keshia D. Bigler, B.S. for the administrative support.
This manuscript was the result of work performed for the Agency for Healthcare Research and Quality under grant HS018500-01 (S. L. Norris). The funder played no role in drafting this manuscript.
B.U. Burda, H.K. Holmer, and S.L. Norris used and published results of AMSTAR in the assessment of systematic review quality. S.L. Norris is an active member of the GRADE Working Group. The authors have no other conflicts of interest to declare.
BUB, HKH, and SLN conceived the design of the study, collected, analyzed, and interpreted the data, and drafted, reviewed, and approved the manuscript.