Assessment of a method to detect signals for updating systematic reviews
Systematic Reviews volume 3, Article number: 13 (2014)
The Erratum to this article has been published in Systematic Reviews 2014 3:22
Systematic reviews are a cornerstone of evidence-based medicine but are useful only if up-to-date. Methods for detecting signals of when a systematic review needs updating have face validity, but no proposed method has been assessed for predictive validity.
By 2009, the Agency for Healthcare Research and Quality (AHRQ) Comparative Effectiveness Review program had produced 13 comparative effectiveness reviews (CERs), a subcategory of systematic reviews. In 2009, 11 of these were assessed with a surveillance system to determine the degree to which individual conclusions were out-of-date and to assign each report a priority for updating. Four CERs were judged to be a high priority for updating, four a medium priority and three a low priority. AHRQ then commissioned full update reviews for 9 of these 11 CERs. Where possible, we matched each original conclusion with its corresponding conclusion in the update report and compared the congruence between these pairs with our original predictions about which conclusions in each CER remained valid. We then classified the concordance of each pair as good, fair or poor. We also made a summary determination of the priority for updating each CER, based on the actual changes in conclusions in the updated report, and compared these determinations with the earlier assessments of priority.
The nine CERs included 149 individual conclusions, 84% of which had matches in the update reports. Across reports, 83% of matched conclusions had good concordance, and 99% had good or fair concordance. The one instance of poor concordance was partly attributable to the publication of new evidence after the surveillance signal searches had been done. Both CERs originally judged to be a low priority for updating had no substantive changes to their conclusions in the actual updated reports. Agreement between the predicted priority for updating and the priority based on actual changes to conclusions was κ = 0.74.
These results provide some support for the validity of a surveillance system for detecting signals indicating when a systematic review needs updating.
Systematic reviews are a cornerstone of evidence-based care, either by themselves or through their incorporation into practice guidelines, performance measures or other evidence-based practice. To be useful, however, systematic reviews need to be up-to-date.
The science of determining when systematic reviews need updating has been developing for the past decade. Before 2001, no method or criterion existed to determine whether evidence-based products remained valid or whether the evidence underlying them had been superseded by newer work. Since then, several groups have begun developing methods to detect signals for updating reviews [1–5]. Most methods involve some form of limited literature searching and the use of expert opinion, although some use statistical techniques and are applicable only to meta-analytic results [6, 7]. Two of these methods have been formally compared and found to produce similar results. To date, however, no method has been assessed for predictive validity, that is, for whether the presence or absence of signals actually predicts whether a review is out-of-date. Such an assessment must account for false-positives (a signal indicates that a review is out-of-date, but the subsequent update produces no important changes in the conclusions), which are relatively easy to assess, and also for false-negatives, which can be assessed only by updating reviews for which no signals were detected. In 2008, we were asked to determine which of 11 systematic reviews sponsored by the Agency for Healthcare Research and Quality (AHRQ) Comparative Effectiveness Review (CER) program might need updating. We took advantage of a natural experiment to assess the predictive validity of our method for detecting signals for updating.
In this study, we assessed the predictive validity of signals for updating that were detected in 2009 in CERs that have since been updated. We first describe the original process used to detect signals and then describe how we assessed their validity. This original process later evolved into the process described by Ahmadzai et al.; the two are nearly identical.
The 2009 method for detecting signals
Identifying new evidence from published studies
Search strategy. We started with the search strategy used in the original report, which included MEDLINE/PubMed, Cochrane Reviews or both and, on a topic-specific basis, additional databases. However, we limited the search to five top-rated general medical journals (Annals of Internal Medicine, British Medical Journal, Journal of the American Medical Association, The Lancet and New England Journal of Medicine) and to the four to six specialty journals most relevant to the topic, defined as those most highly represented among the references of the original report. We also modified the search terms when needed; for example, if we were aware of new drugs for the condition, we added their names. Search start dates were set 6 to 12 months before the end date of the original CER search to ensure overlap between the searches.
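As a rough illustration of the restricted signal search, the sketch below assembles a PubMed query limited to the five general journals plus topic-specific journals, with a start date set a chosen number of months before the original search's end date. The topic terms, the specialty journal and the dates in the example are hypothetical, and `build_signal_query` and `months_before` are illustrative helpers, not part of any published tool. The `[ta]` (journal title abbreviation) and `[dp]` (publication date range) field tags are standard PubMed syntax, but the sketch covers only the PubMed portion of the search, not Cochrane or other databases, and does not reproduce the original CER search strategies.

```python
import calendar
from datetime import date
from typing import Iterable

# NLM title abbreviations for the five general medical journals named in the text.
GENERAL_JOURNALS = ["Ann Intern Med", "BMJ", "JAMA", "Lancet", "N Engl J Med"]


def months_before(d: date, months: int) -> date:
    """Return the date `months` calendar months before `d`, clamping to month end."""
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return date(year, month_index + 1, min(d.day, last_day))


def build_signal_query(
    topic_terms: Iterable[str],
    specialty_journals: Iterable[str],
    original_search_end: date,
    search_date: date,
    overlap_months: int = 6,
) -> str:
    """Build a journal- and date-restricted PubMed query (hypothetical helper)."""
    if not 6 <= overlap_months <= 12:
        raise ValueError("the text describes an overlap of 6 to 12 months")
    # Start before the original search ended so the two searches overlap.
    start = months_before(original_search_end, overlap_months)
    topic = " OR ".join(f"({term})" for term in topic_terms)
    journals = " OR ".join(
        f'"{j}"[ta]' for j in [*GENERAL_JOURNALS, *specialty_journals]
    )
    dates = f"{start:%Y/%m/%d}:{search_date:%Y/%m/%d}[dp]"
    return f"({topic}) AND ({journals}) AND {dates}"


# Hypothetical example: topic terms and specialty journal chosen for illustration only.
query = build_signal_query(
    ["osteoarthritis", "analgesics"],
    ["Arthritis Rheum"],
    original_search_end=date(2006, 3, 31),
    search_date=date(2008, 6, 1),
)
print(query)
```

Clamping to the end of the month keeps the start date valid when, for example, an original search ended on 31 March and the overlap lands in a 30-day month.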
Study selection and extraction. Using the same general inclusion and exclusion criteria as the original CER, a single reviewer experienced in systematic reviews screened titles and abstracts and obtained the full text of any article deemed relevant to the topic. From the articles that met the inclusion criteria, the reviewer extracted relevant data and constructed an evidence table. These data included the study-level details extracted in the original CER (for example, sample size, study design and outcomes measured) as well as the outcome results themselves.
Identifying new evidence from expert opinion. For each topic, we created a questionnaire matrix that listed the key questions and conclusions from the original executive summary. The matrix was sent to experts in the field, including the original project leader, technical expert panel members and peer reviewers. The experts were asked to indicate whether each conclusion listed in the matrix was, to their knowledge, still valid and, if not, to describe any new evidence and provide citations.
Assessing individual conclusions for signals. Once the study conditions and findings had been abstracted for each newly included study and expert opinions had been received, we assessed, conclusion by conclusion, whether the new findings signaled a need for an update. Table 1 lists the criteria used for these determinations.
For each CER, we constructed a summary table that included the following for each key question: original conclusions, findings of the new literature search, summary of expert assessment, our final assessment of the currency of the conclusions, and the priority for updating.
Determining priority for updating a CER. We needed to make an overall judgment regarding the priority for updating an entire CER. This determination rested on two criteria. (1) How much of the CER is possibly, probably or certainly out-of-date? (2) How out-of-date is that portion of the CER? For example, we asked whether the potential changes to the conclusions would involve only refinement of original estimates or whether the potential changes would include the finding that some therapies are no longer favored or might no longer be in use. Another question was whether the portion of the CER that was probably or certainly out-of-date involved an issue of safety (for example, a drug withdrawn from the market, a US Food and Drug Administration black box warning) or the availability of a new drug within an existing class, with the latter being a less important signal to update than the former. This final determination was a global judgment made by all the individuals working on each particular CER. On the basis of that determination, we classified CERs as being of low, medium or high priority for updating. For high-priority updates, we also provided our rationale.
Assessment of predictive validity
Our 2009 work assessed 11 CERs. We classified four as having a high priority for updating, four as having a medium priority and three as having a low priority (see Table 2). One of the low-priority topics, comparative effectiveness of percutaneous coronary interventions and coronary artery bypass grafting for coronary artery disease, was considered a low priority because AHRQ had already commissioned an individual patient data meta-analysis, which AHRQ considered to be an update of the CER and which was published in 2009.
AHRQ elected to support full updates of all the remaining CERs except the report on clinically localized prostate cancer, for which it judged it prudent to wait for the results of the pending PIVOT trial. This situation presented us with a natural experiment. Because all of the reports, regardless of update priority status, were going to receive the gold standard of a complete update, we could use the 2009 predictions to assess both false-positives (reports classified as high priority but having no major change in conclusions when updated) and false-negatives (reports classified as low priority that, when updated, had major changes in conclusions). To do so, we took each conclusion from the original CER and matched it with the most similar conclusion in the update. We then assessed the degree of concordance between the 2009 prediction and the updated conclusion, using the following criteria.
Good: Concordance was considered good if the original prediction was “still valid” and either there was no new relevant evidence or new evidence continued to support the conclusion; or if the original prediction was “possibly out-of-date”, “probably out-of-date” or “out-of-date” and new evidence substantially changed the conclusions.
Fair: Concordance was considered fair if the original prediction was “still valid” and new evidence supported changes in some conclusions but not others; if the original prediction was “possibly out-of-date” but no new evidence was incorporated into the updated conclusions and there were no substantive changes from the original conclusions; or if the original prediction was “probably out-of-date” or “out-of-date” and some conclusions, or some aspects of the conclusions, had changed but others had not.
Poor: Concordance was considered poor if the original prediction was “still valid” but new evidence substantially changed the conclusions, or if the original prediction was “probably out-of-date” or “out-of-date” but no new evidence was incorporated into the update and the conclusions underwent no substantive changes.
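The good/fair/poor criteria can be read as a small lookup table. The minimal sketch below assumes that each matched pair has already been coded with a hypothetical `change` label ("none", "partial" or "substantial") and a flag recording whether new evidence was incorporated; this coding scheme and the function name `classify_concordance` are our illustrative assumptions, not the authors' instrument. Combinations the published criteria do not address (for example, a “possibly out-of-date” prediction followed by a partial change) return `None` rather than a guessed rating.

```python
from typing import Optional

OUT_OF_DATE_PREDICTIONS = {"possibly out-of-date", "probably out-of-date", "out-of-date"}
CHANGE_CODES = {"none", "partial", "substantial"}


def classify_concordance(prediction: str, change: str, new_evidence_used: bool) -> Optional[str]:
    """Rate concordance between a 2009 prediction and its matched updated conclusion.

    prediction: "still valid" or one of OUT_OF_DATE_PREDICTIONS.
    change: hypothetical coding of the update: "none" (no substantive change),
        "partial" (some conclusions or aspects changed, others not) or "substantial".
    new_evidence_used: whether new evidence was incorporated into the updated conclusion.
    Returns "good", "fair", "poor", or None when the criteria do not cover the case.
    """
    if change not in CHANGE_CODES:
        raise ValueError(f"unknown change code: {change!r}")
    if prediction == "still valid":
        # No substantive change (no new evidence, or supportive evidence) is good,
        # a partial change is fair, and a substantial change was missed, so poor.
        return {"none": "good", "partial": "fair", "substantial": "poor"}[change]
    if prediction not in OUT_OF_DATE_PREDICTIONS:
        raise ValueError(f"unknown prediction: {prediction!r}")
    if change == "substantial":
        return "good"  # any out-of-date prediction confirmed by a substantial change
    if prediction == "possibly out-of-date":
        # Fair only when nothing new was incorporated and nothing changed.
        return "fair" if change == "none" and not new_evidence_used else None
    # Remaining cases: "probably out-of-date" or "out-of-date".
    if change == "partial":
        return "fair"
    return "poor" if not new_evidence_used else None
```

For instance, a conclusion rated “still valid” whose updated counterpart was substantially changed by new evidence, as in the GERD surgery-versus-medical-therapy conclusion discussed in the results, maps to `classify_concordance("still valid", "substantial", True)`, which returns "poor".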
Examples of the degree of concordance analysis are shown in Table 3.
We assessed “concordance” rather than “agreement” because the matching of the original conclusions to updated conclusions was often challenging, and “agreement” implies a more direct comparison of original to updated conclusions than is always possible. For this reason, we refrained from using a 2 × 2 table to make comparisons.
We then made a summary assessment of each CER’s priority for updating, based on the updated conclusions. We used the same criteria as in the prospective assessment: how much of the report was out-of-date and the degree to which it was out-of-date. We used the κ statistic to measure agreement between the original assessment of priority and the priority based on the actual changes.
In the assessment of concordance of individual conclusions, an additional complicating factor was the time lag between our limited literature searches for signals (2008) and the search dates of the update reports (2010 to 2012). Therefore, for conclusions with poor concordance, we examined whether the discordance could be explained by new evidence published after the surveillance signal search.
We performed our assessment of predictive validity for nine CERs comprising 149 individual conclusions. For each CER, we present our assessment of the concordance of individual conclusions (Additional file 1) as well as a full table describing each conclusion and how it was assessed (Additional file 2). We also provide an overall table that sums up the individual conclusion assessments across all CERs (Table 4).
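Summarizing individual conclusion ratings across CERs involves two different denominators: the match rate uses all original conclusions, whereas the concordance percentages use only matched conclusions. The minimal sketch below makes that distinction explicit; `summarize_concordance` is a hypothetical helper, and the ratings in the example are invented for illustration and are not data from this study.

```python
from collections import Counter
from typing import Iterable, Optional


def summarize_concordance(labels: Iterable[Optional[str]]) -> dict:
    """Summarize per-conclusion ratings: "good", "fair", "poor", or None if unmatched."""
    labels = list(labels)
    matched = [label for label in labels if label is not None]
    if not matched:
        raise ValueError("no matched conclusions to summarize")
    counts = Counter(matched)
    n = len(matched)
    return {
        "conclusions": len(labels),
        "matched": n,
        # Match rate: denominator is every original conclusion.
        "pct_matched": 100 * n / len(labels),
        # Concordance rates: denominator is matched conclusions only.
        "pct_good": 100 * counts["good"] / n,
        "pct_good_or_fair": 100 * (counts["good"] + counts["fair"]) / n,
    }


# Hypothetical ratings for a small report (illustrative, not study data).
summary = summarize_concordance(["good"] * 8 + ["fair", "poor", None, None])
print(summary)
```

With these invented ratings, 10 of 12 conclusions are matched, and the good and good-or-fair percentages are computed over those 10 only.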
The great majority of matched conclusions had good concordance, both within each CER and across CERs (83% overall). However, the CER on gastroesophageal reflux disease (GERD) had four “out-of-date” conclusions with only fair concordance, and one conclusion we had assessed as “still valid” was shown to be out-of-date.
The published 2009 updating assessment judged that the conclusion regarding endoscopic treatment for GERD “should be deleted”, meaning that it was out-of-date, because the endoscopic procedures had been withdrawn from the market. However, one of the three endoscopic procedures reviewed in the original report continued to be used, new endoscopic procedures were introduced and one of the two withdrawn procedures was later reintroduced. The update report noted this changing landscape, and we deemed the concordance with the 2009 prediction as only fair. A more appropriate surveillance assessment would have been that the conclusion needed updating because the endoscopic procedures were evolving over time.
Another conclusion in the original GERD report—that surgery and medical therapy were similarly effective—was rated as “still valid” during the surveillance process but had poor concordance with the update review, which concluded that surgery was favored over medical therapy. One of the studies providing new evidence in support of this conclusion was published in 2009, after completion of the surveillance signal search.
Table 5 compares our original predictions of the need for updating with the priority as determined by the actual update. One CER that was predicted in 2009 to be a high priority for updating was judged to have been a medium priority based on the updated report. Conversely, one CER that was predicted to be a medium priority was judged to have been a high priority. The updating priority remained the same for the other seven CERs. Table 6 presents the results of the overall assessment of priority for updating as a 3 × 3 table. The κ statistic for agreement was 0.74 (Table 6).
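To show how the reported κ can arise, the sketch below computes Cohen's κ, optionally with linear weights, from a square table of counts. The table is an assumption, not a copy of Table 6: it is reconstructed from statements elsewhere in the text (four updated CERs predicted to be high priority and two low, and therefore three medium; seven CERs unchanged), and it further assumes that the two discordant CERs moved in opposite directions, one from high to medium and one from medium to high. The text does not state whether a weighted κ was used. Under this assumed table, the unweighted statistic is about 0.65, whereas linear weights, which give half credit to adjacent categories, yield about 0.74.

```python
from typing import Sequence


def cohens_kappa(table: Sequence[Sequence[int]], weights: str = "none") -> float:
    """Cohen's kappa for a square table of counts (rows: first rating, columns: second).

    weights="none" gives the unweighted statistic; weights="linear" gives linearly
    weighted kappa, which credits cells (i, j) with 1 - |i - j| / (k - 1), so adjacent
    ordinal categories count as partial agreement.
    """
    k = len(table)
    if k < 2 or any(len(row) != k for row in table):
        raise ValueError("table must be square with at least two categories")
    if weights not in ("none", "linear"):
        raise ValueError(f"unsupported weights: {weights!r}")

    def agreement_weight(i: int, j: int) -> float:
        if weights == "none":
            return 1.0 if i == j else 0.0
        return 1.0 - abs(i - j) / (k - 1)

    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    cells = [(i, j) for i in range(k) for j in range(k)]

    # Observed (weighted) agreement versus agreement expected from the marginals.
    observed = sum(agreement_weight(i, j) * table[i][j] for i, j in cells) / n
    expected = sum(agreement_weight(i, j) * row_totals[i] * col_totals[j] for i, j in cells) / n**2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Rows: 2009 prediction; columns: priority judged from the update.
# Category order: high, medium, low. ASSUMED arrangement reconstructed from the text.
priority_table = [
    [3, 1, 0],
    [1, 2, 0],
    [0, 0, 2],
]
print(round(cohens_kappa(priority_table), 2))            # 0.65 (unweighted)
print(round(cohens_kappa(priority_table, "linear"), 2))  # 0.74 (linearly weighted)
```

Linear weights suit ordered categories such as low, medium and high priority, because a high-versus-medium disagreement is treated as less serious than a high-versus-low disagreement.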
This assessment of the predictive validity of a method for detecting signals for updating a CER yielded generally favorable results. For the vast majority of individual conclusions, concordance between the 2009 predictions and the subsequent updated conclusions was judged to be good. The one instance of poor concordance involved new evidence published after the surveillance signals had been assessed, and it occurred in a CER already judged to be a high priority for updating on the basis of other out-of-date conclusions.
Our present study has three primary limitations. The first is sample size: we were able to assess only nine CERs. However, these included CERs assessed as being of high, medium or low priority, which allowed us to assess the possibility of false-negatives (that is, CERs assessed as low priority for updating whose full update nevertheless produced substantive changes in conclusions). Opportunities to assess such false-negatives again are likely to be rare, as they require that low-priority CERs be subjected to the gold standard of a full update. Our finding that neither of the CERs judged to be a low priority had any substantive changes in conclusions should reinforce decisions to invest scarce resources in researching other topics rather than commissioning updates of low-priority CERs.
A second limitation is the matching of original conclusions to updated conclusions. In some updated reports, the authors themselves matched the conclusions. In most cases, however, this was not done, and in some circumstances determining the appropriate match to the original conclusion was challenging. Additional file 2 lists each original conclusion and its matching updated conclusion so that readers may judge these matches for themselves.
The third principal limitation of this study is that the 2013 assessment of the 2009 predictions could not be made in a blinded fashion. Our Evidence-based Practice Center (EPC) did both assessments, and even if another group had done the 2013 assessment, blinding could not have been enforced, because the 2009 assessments are in the public domain. We tried to guard against bias by stating explicit reasons for each judgment and presenting these reasons so that readers can judge them for themselves; our reasoning should therefore be transparent.
With the limitation of small sample size in mind, we offer the following preliminary conclusions about the surveillance signal method. (1) Low-priority CERs are unlikely to have any substantive changes in conclusions. (2) Conclusions judged likely to be “still valid” almost certainly are still valid. (3) Conclusions judged to be “out-of-date” almost certainly are out-of-date. (4) Safety concerns and the appearance of new classes of therapies and more efficacious treatments are the best targets for high-priority updates. (5) The classification of individual conclusions as possibly or probably out-of-date owing to new evidence may be slightly too sensitive as a signal; in a number of such instances, the update report’s conclusion did not change, because the new evidence identified in the signal search was either rejected or insufficient to change the original conclusion.
In sum, our assessment provides some support for the predictive validity of this method of assessing CERs for signals of the need for updating. Future research is likely to be confined to assessing updates of systematic reviews judged to be a medium or high priority for updating. Further assessment of the factors leading to changes in individual conclusions may help refine the criteria for distinguishing between high- and medium-priority update topics. However, investing extra time and effort to distinguish “possibly” from “probably” out-of-date conclusions, or to further refine the global assessment to distinguish medium- from high-priority update topics, may begin to make the surveillance process resemble the actual update, which is not the goal of surveillance. In this application, the surveillance process worked well: κ = 0.74 indicates substantial agreement, approaching the threshold of 0.8 above which agreement is conventionally considered nearly perfect. No low-priority CER was judged as having had a substantive change to a conclusion in the update, whereas three of the four high-priority CERs did have substantive changes to their conclusions. These results suggest that new, practice-changing evidence is very unlikely to exist for a systematic review judged to be a low priority for updating, and they support a policy of delaying an update of a systematic review until new evidence is sufficient to warrant assigning it at least a medium priority.
The assessment method described herein is part of the basis for the surveillance method used to assess AHRQ systematic reviews, as described by Ahmadzai et al. That program was designed to assess each AHRQ systematic review every 6 months and to take 3 months to complete. One important result is that no systematic review was judged to be a high priority for updating at the first 6-month assessment, suggesting that it is probably more cost-effective to assess systematic reviews no more often than yearly. Additional work on making surveillance more cost-effective is warranted.
In our present study, we found evidence supporting the predictive validity of a method for assessing AHRQ systematic reviews for their need for updating. One advantage of this method relative to other proposed methods is that it can be applied equally to meta-analytic and narrative reviews. It may also be applicable to systematic reviews produced by other organizations.
Shekelle PG, Newberry SJ, Wu H, Suttorp M, Motala A, Lim YW, Balk EM, Chung M, Yu WW, Lee J, Gaylor JM, Moher D, Ansari MT, Skidmore R, Garritty C, RAND Corporation: Identifying Signals for Updating Systematic Reviews: A Comparison of Two Methods (AHRQ Publication No. 11-EHC042-EF). June 2011, Rockville, MD: Agency for Healthcare Research and Quality, Available at http://effectivehealthcare.ahrq.gov/ehc/products/265/702/Updating-Systematic-Reviews_Final-Report_20110614.pdf (accessed 3 February 2014)
Chung M, Newberry SJ, Ansari MT, Yu WW, Wu H, Lee J, Suttorp M, Gaylor JM, Motala A, Moher D, Balk EM, Shekelle PG: Two methods provide similar signals for the need to update systematic reviews. J Clin Epidemiol. 2012, 65: 660-668. 10.1016/j.jclinepi.2011.12.004.
Newberry SJ, Ahmadzai N, Motala A, Tsertsvadze A, Maglione M, Ansari MT, Hempel S, Tsouros S, Schneider Chafen J, Shanman R, Skidmore B, Moher D, Shekelle PG, RAND Corporation, University of Ottawa Evidence-based Practice Centre: Surveillance and Identification of Signals for Updating Systematic Reviews: Implementation and Early Experience (AHRQ Publication No. 13-EHC088-EF). June 2013, Rockville, MD: Agency for Healthcare Research and Quality, Available at http://www.effectivehealthcare.ahrq.gov/ehc/products/461/1527/Updating-CERs-Surveillance-Signals-methods-130607.pdf (accessed 3 February 2014)
Shojania KG, Sampson M, Ansari MT, Ji J, Doucette S, Moher D: How quickly do systematic reviews go out of date? A survival analysis. Ann Intern Med. 2007, 147: 224-233. 10.7326/0003-4819-147-4-200708210-00179.
Peterson K, McDonagh MS, Fu R: Decisions to update comparative drug effectiveness reviews vary based on type of new evidence. J Clin Epidemiol. 2011, 64: 977-984. 10.1016/j.jclinepi.2010.11.019.
Pattanittum P, Laopaiboon M, Moher D, Lumbiganon P, Ngamjarus C: A comparison of statistical methods for identifying out-of-date systematic reviews. PLoS One. 2012, 7: e48894. 10.1371/journal.pone.0048894.
Tovey D, Marshall R, Bazian L, Hopewell S, Rader T, National Institute for Health Research (NIHR): Fit for Purpose: Centralised Updating Support for High-Priority Cochrane Reviews (National Institute for Health Research: Cochrane-National Health Service Engagement Award Scheme). July 2011, Southampton, UK: NIHR Evaluation, Trials and Studies Coordination Centre, Available at http://www.editorial-unit.cochrane.org/sites/editorial-unit.cochrane.org/files/uploads/10_4000_01%20Fit%20for%20purpose%20-%20centralised%20updating%20support%20for%20high%20priority%20Cochrane%20Reviews%20FINAL%20REPORT.pdf (accessed 3 February 2014)
Ahmadzai N, Newberry SJ, Maglione MA, Tsertsvadze A, Ansari MT, Hempel S, Motala A, Tsouros S, Schneider Chafen JJ, Shanman R, Moher D, Shekelle PG: A surveillance system to assess the need for updating systematic reviews. Syst Rev. 2013, 2: 104. 10.1186/2046-4053-2-104.
Shekelle P, Newberry S, Maglione M, Shanman R, Johnsen B, Carter J, Motala A, Hulley B, Wang Z, Bravata D, Chen M, Grossman J, Southern California Evidence-based Practice Center: Assessment of the Need to Update Comparative Effectiveness Reviews: Report of an Initial Rapid Program Assessment (2005–2009). 9 October 2009, Rockville, MD: Agency for Healthcare Research and Quality, Available at http://effectivehealthcare.ahrq.gov/ehc/products/125/331/2009_0923UpdatingReports.pdf (accessed 3 February 2014)
Hlatky MA, Boothroyd DB, Bravata DM, Boersma E, Booth J, Brooks MM, Carrié D, Clayton TC, Danchin N, Flather M, Hamm CW, Hueb WA, Kähler J, Kelsey SF, King SB, Kosinski AS, Lopes N, McDonald KM, Rodriguez A, Serruys P, Sigwart U, Stables RH, Owens DK, Pocock SJ: Coronary artery bypass surgery compared with percutaneous coronary interventions for multivessel disease: a collaborative analysis of individual patient data from ten randomised trials. Lancet. 2009, 373: 1190-1197. 10.1016/S0140-6736(09)60552-3.
Ip S, Bonis P, Tatsioni A, Raman G, Chew P, Kupelnick B, Fu L, DeVine D, Lau J: Comparative Effectiveness of Management Strategies for Gastroesophageal Reflux Disease (Comparative Effectiveness Review 1). December 2005, Rockville, MD: Agency for Healthcare Research and Quality, Available at http://www.ncbi.nlm.nih.gov/books/NBK42949/pdf/TOC.pdf (accessed 3 February 2014)
Bruening W, Launders J, Pinkney N, Kostinsky H, Schoelles K, Turkelson C: Effectiveness of Noninvasive Diagnostic Tests for Breast Abnormalities (Comparative Effectiveness Review 2). February 2006, Rockville, MD: Agency for Healthcare Research and Quality, Available at http://www.ncbi.nlm.nih.gov/books/NBK42958/pdf/TOC.pdf (accessed 3 February 2014)
Seidenfeld J, Piper M, Bohlius J, Weingart O, Trelle S, Engert A, Skoetz N, Schwarzer G, Wilson J, Brunskill S, Hyde C, Bonnell C, Ziegler KM, Aronson N: Comparative Effectiveness of Epoetin and Darbepoetin for Managing Anemia in Patients Undergoing Cancer Treatment (Comparative Effectiveness Review 3). May 2006, Rockville, MD: Agency for Healthcare Research and Quality, Available at http://www.ncbi.nlm.nih.gov/books/NBK42982/pdf/TOC.pdf (accessed 3 February 2014)
Chou R, Helfand M, Peterson K, Dana T, Roberts C: Comparative Effectiveness and Safety of Analgesics for Osteoarthritis (Comparative Effectiveness Review 4). September 2006, Rockville, MD: Agency for Healthcare Research and Quality, Available at http://www.ncbi.nlm.nih.gov/books/NBK42994/pdf/TOC.pdf (accessed 3 February 2014)
Shekelle P, Maglione M, Bagley S, Suttorp M, Mojica WA, Carter J, Rolon C, Hilton L, Zhou A, Chen S, Glassman P: Comparative Effectiveness of Off-Label Use of Atypical Antipsychotics (Comparative Effectiveness Review 6). January 2007, Rockville, MD: Agency for Healthcare Research and Quality, Available at http://www.effectivehealthcare.ahrq.gov/ehc/assets/File/Atypical_Antipsychotics_Final_Report.pdf (accessed 3 February 2014)
Donahue KE, Gartlehner G, Jonas DE, Lux LJ, Thieda P, Jonas B, Hansen RA, Morgan LC, Williams SC, Lohr KN: Comparative Effectiveness of Drug Therapy for Rheumatoid Arthritis and Psoriatic Arthritis in Adults (Comparative Effectiveness Review 11). November 2007, Rockville, MD: Agency for Healthcare Research and Quality, Available at http://www.ncbi.nlm.nih.gov/books/NBK43126/pdf/TOC.pdf (accessed 3 February 2014)
MacLean C, Alexander A, Carter J, Chen S, Desai SB, Grossman J, Maglione M, McMahon M, McNamara M, Mojica W, Newberry S, Ranganath V, Suttorp M, Timmer M, Tringale C, Valentine D, Zhou A: Comparative Effectiveness of Treatments to Prevent Fractures in Men and Women with Low Bone Density or Osteoporosis (Comparative Effectiveness Review 12). December 2007, Rockville, MD: Agency for Healthcare Research and Quality, Available at http://www.ncbi.nlm.nih.gov/books/NBK43160/pdf/TOC.pdf (accessed 3 February 2014)
Gartlehner G, Hansen RA, Thieda P, DeVeaugh-Geiss AM, Gaynes BN, Krebs EE, Lux LJ, Morgan LC, Shumate JA, Monroe LG, Lohr KN: Comparative Effectiveness of Second-Generation Antidepressants in the Pharmacologic Treatment of Adult Depression (Comparative Effectiveness Review 7). January 2007, Rockville, MD: Agency for Healthcare Research and Quality, Available at http://www.ncbi.nlm.nih.gov/books/NBK43023/pdf/TOC.pdf (accessed 3 February 2014)
Matchar DB, McCrory DC, Orlando LA, Patel MR, Patel UD, Patwardhan MB, Powers B, Samsa GP, Gray RN: Comparative Effectiveness of Angiotensin-Converting Enzyme Inhibitors (ACEIs) and Angiotensin II Receptor Antagonists (ARBs) for Treating Essential Hypertension (Comparative Effectiveness Review 10). November 2007, Rockville, MD: Agency for Healthcare Research and Quality, Available at http://www.ncbi.nlm.nih.gov/books/NBK43073/pdf/TOC.pdf (accessed 3 February 2014)
Wilt TJ, Shamliyan T, Taylor B, MacDonald R, Tacklind J, Rutks I, Koeneman K, Cho CS, Kane RL: Comparative Effectiveness of Therapies for Clinically Localized Prostate Cancer (Comparative Effectiveness Review 13). February 2008, Rockville, MD: Agency for Healthcare Research and Quality, Available at http://www.ncbi.nlm.nih.gov/books/NBK43147/pdf/TOC.pdf (accessed 3 February 2014)
Bravata DM, McDonald KM, Gienger AL, Sundaram V, Perez MV, Varghese R, Kapoor JR, Ardehali R, McKinnon MC, Stave CD, Owens DK, Hlatky MA: Comparative Effectiveness of Percutaneous Coronary Interventions and Coronary Artery Bypass Grafting for Coronary Artery Disease (Comparative Effectiveness Review 9). October 2007, Rockville, MD: Agency for Healthcare Research and Quality, Available at http://www.ncbi.nlm.nih.gov/books/NBK43032/pdf/TOC.pdf (accessed 3 February 2014)
Wilt TJ, Brawer MK, Jones KM, Barry MJ, Aronson WJ, Fox S, Gingrich JR, Wei JT, Gilhooly P, Grob BM, Nsouli I, Iyer P, Cartagena R, Snider G, Roehrborn C, Sharifi R, Blank W, Pandya P, Andriole GL, Culkin D, Wheeler T, Prostate Cancer Intervention versus Observation Trial (PIVOT) Study Group: Radical prostatectomy versus observation for localized prostate cancer. N Engl J Med. 2012, 367: 203-213. 10.1056/NEJMoa1113162. A published erratum appears in N Engl J Med 2012, 367:582
Chou R, McDonagh MS, Nakamoto E, Griffin J: Analgesics for Osteoarthritis: An Update of the 2006 Comparative Effectiveness Review (Comparative Effectiveness Review 38). October 2011, Rockville, MD: Agency for Healthcare Research and Quality, Available at http://effectivehealthcare.ahrq.gov/ehc/products/180/795/Analgesics-Update_CER-38_20111007.pdf (accessed 3 February 2014)
Gartlehner G, Hansen RA, Morgan LC, Thaler K, Lux LJ, Van Noord M, Mager U, Gaynes BN, Thieda P, Strobelberger M, Lloyd S, Reichenpfader U, Lohr KN: Second-Generation Antidepressants in the Pharmacologic Treatment of Adult Depression: An Update of the 2007 Comparative Effectiveness Review (Comparative Effectiveness Review 46). December 2011, Rockville, MD: Agency for Healthcare Research and Quality, Available at http://www.ncbi.nlm.nih.gov/books/NBK83442/pdf/TOC.pdf (accessed 3 February 2014)
Ip S, Chung M, Moorthy D, Yu WW, Lee J, Chan JA, Bonis PA, Lau J: Comparative Effectiveness of Management Strategies for Gastroesophageal Reflux Disease: Update (Comparative Effectiveness Review 29). September 2011, Rockville, MD: Agency for Healthcare Research and Quality, Available at http://www.ncbi.nlm.nih.gov/books/NBK65406/pdf/TOC.pdf (accessed 3 February 2014)
Bruening W, Uhl S, Fontanarosa J, Reston J, Treadwell J, Schoelles K: Noninvasive Diagnostic Tests for Breast Abnormalities: Update of a 2006 Review (Comparative Effectiveness Review 47). February 2012, Rockville, MD: Agency for Healthcare Research and Quality, Available at http://www.ncbi.nlm.nih.gov/books/NBK84530/pdf/TOC.pdf (accessed 3 February 2014)
Grant MD, Piper M, Bohlius J, Tonia T, Robert N, Vats V, Bonnell C, Ziegler KM, Aronson N: Epoetin and Darbepoetin for Managing Anemia in Patients Undergoing Cancer Treatment: Comparative Effectiveness Update (Comparative Effectiveness Review 113). April 2013, Rockville, MD: Agency for Healthcare Research and Quality, Available at http://www.ncbi.nlm.nih.gov/books/NBK143013/pdf/TOC.pdf (accessed 3 February 2014)
Maglione M, Maher AR, Hu J, Wang Z, Shanman R, Shekelle PG, Roth B, Hilton L, Suttorp MJ, Ewing BA, Motala A, Perry T: Off-Label Use of Atypical Antipsychotics: An Update (Comparative Effectiveness Review 43). September 2011, Rockville, MD: Agency for Healthcare Research and Quality, Available at http://www.ncbi.nlm.nih.gov/books/NBK66081/pdf/TOC.pdf (accessed 3 February 2014)
Donahue KE, Jonas DE, Hansen RA, Roubey R, Jonas B, Lux LJ, Gartlehner G, Harden E, Wilkins T, Peravali V, Bangdiwala SI, Yuen A, Thieda P, Morgan LC, Crotty K, Desai R, Van Noord M: Drug Therapy for Rheumatoid Arthritis in Adults: An Update (Comparative Effectiveness Review 55). April 2012, Rockville, MD: Agency for Healthcare Research and Quality, Available at http://www.effectivehealthcare.ahrq.gov/ehc/products/203/1044/CER55_DrugTherapiesforRheumatoidArthritis_FinalReport_20120618.pdf (accessed 3 February 2014)
Crandall CJ, Newberry SJ, Diamant A, Lim YW, Gellad WF, Suttorp MJ, Motala A, Ewing B, Roth B, Shanman R, Timmer M, Shekelle PG: Treatment to Prevent Fractures in Men and Women With Low Bone Density or Osteoporosis: Update of a 2007 Report (Comparative Effectiveness Review 53). March 2012, Rockville, MD: Agency for Healthcare Research and Quality, Available at http://www.ncbi.nlm.nih.gov/books/NBK92566/pdf/TOC.pdf (accessed 3 February 2014)
Sanders GD, Coeytaux R, Dolor RJ, Hasselblad V, Patel UD, Powers B, Yancy WS, Gray RN, Irvine RJ, Kendrick A: Angiotensin-Converting Enzyme Inhibitors (ACEIs), Angiotensin II Receptor Antagonists (ARBs), and Direct Renin Inhibitors for Treating Essential Hypertension: An Update (Comparative Effectiveness Review 34). June 2011, Rockville, MD: Agency for Healthcare Research and Quality, Available at http://www.ncbi.nlm.nih.gov/books/NBK61789/pdf/TOC.pdf (accessed 3 February 2014)
The authors thank Patricia Smith for her administrative assistance in the preparation of the manuscript. This study was funded by the Agency for Healthcare Research and Quality (AHRQ) under contract number HHSA-290-2007-10062I.
The authors declare they have no competing interests.
PGS developed the idea for the study. PGS and SJN developed the original 2009 method and its applications. AM and BJ collected the information from the updated reports and performed the initial matching of conclusions; PGS revised these matches, made the determinations of agreement and concordance, and performed the statistical analysis. All authors read and approved the final manuscript.
An erratum to this article is available at http://dx.doi.org/10.1186/2046-4053-3-22.