Recommendations for reporting of systematic reviews and meta-analyses of diagnostic test accuracy: a systematic review

Background This study is to perform a systematic review of existing guidance on quality of reporting and methodology for systematic reviews of diagnostic test accuracy (DTA) in order to compile a list of potential items that might be included in a reporting guideline for such reviews: Preferred Reporting Items for Systematic Reviews and Meta-Analyses of Diagnostic Test Accuracy (PRISMA-DTA). Methods Study protocol published on EQUATOR website. Articles in full text or abstract form that reported on any aspect of reporting systematic reviews of diagnostic test accuracy were eligible for inclusion. We used the Ovid platform to search Ovid MEDLINE®, Ovid MEDLINE® In-Process & Other Non-Indexed Citations and Embase Classic+Embase through May 5, 2016. The Cochrane Methodology Register in the Cochrane Library (Wiley version) was also searched. Title and abstract screening followed by full-text screening of all search results was performed independently by two investigators. Guideline organization websites, published guidance statements, and the Cochrane Handbook for Diagnostic Test Accuracy were also searched. Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) and Standards for Reporting Diagnostic Accuracy (STARD) were assessed independently by two investigators for relevant items. Results The literature searched yielded 6967 results; 386 were included after title and abstract screening and 203 after full-text screening. After reviewing the existing literature and guidance documents, a preliminary list of 64 items was compiled into the following categories: title (three items); introduction (two items); methods (35 items); results (13 items); discussion (nine items), and disclosure (two items). Conclusion Items on the methods and reporting of DTA systematic reviews in the present systematic review will provide a basis for generating a PRISMA extension for DTA systematic reviews.


Results:
The literature searched yielded 6967 results; 386 were included after title and abstract screening and 203 after full-text screening. After reviewing the existing literature and guidance documents, a preliminary list of 64 items was compiled into the following categories: title (three items); introduction (two items); methods (35 items); results (13 items); discussion (nine items), and disclosure (two items). Conclusion: Items on the methods and reporting of DTA systematic reviews in the present systematic review will provide a basis for generating a PRISMA extension for DTA systematic reviews.

Background
In their 2015 report titled "Improving Diagnosis in Healthcare", the National Academy of Medicine identified a better understanding of the performance of diagnostic tests as an imminent priority for patient safety [1]. Systematic reviews, which incorporate findings from multiple primary studies, can increase confidence in our understanding of the accuracy of diagnostic tests in detecting medical conditions or diseases [2]. Systematic reviews and meta-analyses are cited more than any other study design and are prioritized in clinical practice guidelines [3][4][5]. Consistent with this, the number of systematic reviews, including those on diagnostic test accuracy (DTA), has grown extremely rapidly over the past decade [6,7].
When systematic reviews and meta-analyses are poorly reported, readers are not able to assess the quality of the review and its underlying primary studies or to weigh the applicability of its conclusions. Thus, incomplete or inaccurate reports that do not transparently and completely convey review methods and results may mislead readers, rather than clarify the true value of a test. This contributes to waste of scarce medical research resources [8,9] and hinders efforts to ensure the reproducibility of research. Previous studies have shown that many published DTA systematic reviews are not adequately reported [10,11].
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement is a 27-item checklist and flow diagram that aims to provide guidance on complete and transparent reporting of systematic reviews [12]. Use of reporting guidelines, such as PRISMA, is associated with more informative reporting of medical research [10]. PRISMA was developed primarily for systematic reviews of medical interventions. While DTA systematic reviews share some common elements with intervention reviews, there are important differences. Thus, some items in the original PRISMA checklist may not apply to DTA reviews, and some essential items necessary for reporting DTA systematic reviews may be lacking [2,6,13,14]. Existing guidance for reporting of DTA systematic reviews is limited to nonsystematic "expert opinion" [2,15,16], guidance on specific methodologic items [6,17], or work that is not yet complete [18].
The PRISMA-DTA group is developing an extension for DTA systematic reviews and meta-analyses. As the initial step, we performed a systematic review of existing guidance on reporting of DTA systematic reviews in order to compile a list of potential items that might be included in a reporting guideline for such reviews, the PRISMA extension for DTA (PRISMA-DTA).

Methods
The protocol for this review is available on the EQUA-TOR network's website (http://www.equator-network.org/) in "guidelines under development" [19].

Database search
To identify published articles pertaining to reporting of DTA systematic reviews, an experienced medical information specialist (BS) developed a search strategy through an iterative process in consultation with the review team. The strategy was peer-reviewed prior to execution by another senior information specialist using the PRESS checklist [20]. Using the Ovid platform, we searched Ovid MED-LINE® and Ovid MEDLINE® In-Process & Other Non-Indexed Citations and Embase Classic+Embase on May 5, 2016. We also searched the Cochrane Methodology Register in the Cochrane Library, which contains records published July 2012 and earlier, (Wiley version) on the same date. Strategies used a combination of controlled vocabulary (e.g., "Diagnostic Tests, Routine," "Review Literature as Topic," "Publication Bias") and keywords (e.g., "DTA," "systematic review," "reporting"). Vocabulary and syntax were adjusted across databases. There were no date or language restrictions on any of the searches. Specific details regarding search strategies appear in Appendix 1.
Inclusion/exclusion criteria, study selection, and data extraction We included articles in full-text or abstract form that reported on any aspect of reporting DTA systematic reviews. Specifically, we included studies that evaluated the quality of reporting of any aspect of DTA systematic reviews and studies that provided guidance or suggestions as to how a DTA systematic review should be performed.
Titles and abstracts of all search results were screened independently for potential relevance by two investigators (MA, MDFM). For any citation deemed potentially relevant, full texts were retrieved and independently assessed in duplicate for inclusion with disagreements being resolved by consensus (TAM, MDFM). To facilitate the extraction process, studies were divided into several categories pertaining to the specific reporting topics: assessment of quality of reporting, general guidance on performing or reporting DTA systematic reviews, guidance on search methods for primary DTA studies, assessment of heterogeneity, pooling and metaanalysis methods, assessment of publication bias, risk of bias, and "other." Reference list of included sources is provided in Appendix 2.
In addition to sources related to DTA systematic reviews, the following sources were reviewed: reporting guideline organizations' websites (Enhancing the QUAlity and Transparency of Health Research (EQUATOR) [21]), guidance for reporting systematic reviews and metaanalyses of other types of research (Meta-analysis of Observational Studies in Epidemiology (MOOSE) [22], PRISMA [12], PRISMA extensions [23][24][25][26][27]), guidance for reporting diagnostic test accuracy studies (STARD 2015 [28], STARD for abstracts), guidance for, or tools for assessing the methodologic quality of systematic reviews and meta-analyses (A Measurement Tool to Assess Systematic reviews (AMSTAR) [29], risk of bias in systematic reviews (ROBIS) [30], Methodological Expectations of Cochrane Intervention Reviews (MECIR) [31]), and The Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy (completed chapters) [18]. Post hoc assessment of the following items not included in the initial search was done: the Agency for Healthcare Research and Quality (AHRQ) Methods Guide for Comparative Effectiveness Research, the Institute of Medicine's 2011 Standards for Systematic Reviews and the Centre for Reviews and Dissemination guidance [32][33][34]. No additional items were generated from these sources.
The PRISMA and STARD 2015 checklists were initially assessed independently and in duplicate in order to compile a list of potentially relevant items for the PRISMA-DTA statement. Any item that was deemed possibly relevant to DTA systematic reviews by either investigator was included. Next, all other guidance documents (reporting checklists, The Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy, etc.) and full texts of potentially relevant records were similarly assessed in duplicate for additional potentially relevant items (TAM, MDFM). Again, any item that was deemed possibly relevant to DTA systematic reviews by either investigator was included. Items deemed relevant may have had wording changed from the original source to make them more applicable to systematic reviews of diagnostic test accuracy and/or broken into multiple sub-items to facilitate the Delphi process for PRISMA-DTA. All included items were used to generate a comprehensive summary of existing guidance on reporting of DTA systematic reviews.

Database search
The database search yielded 6967 results. After title and abstract screening, 386 results remained. This was further reduced to 203 results after full-text screening (Fig. 1).

Identification of potentially relevant items
After searching the existing literature and guidance documents, a preliminary list of 64 unique items was compiled and divided into the following categories mirroring the PRISMA statement: title (three items); introduction (two items); methods (35 items); results (13 items); discussion (nine items), and disclosure (two items). The methods section was further divided into eligibility criteria and search strategy (10 items), study selection and data extraction (seven items), primary study data items that should be provided (one item containing 10 sub-items.), risk of bias and heterogeneity (six items), and summary measures and statistics (11 items). The identified items along with citations for the sources from which they were taken are presented in Table 1; shaded items on the table indicate items specific to diagnostic accuracy systematic reviews, while unshaded items represent more general guidance for systematic reviews.

Summary of rationale for relevant items
This section will highlight some of the items that are proposed that have particular relevance to DTA systematic reviews.
Title: The potential items listed in this section aim to clearly identify "big picture" components of study design; this not only allows immediate reader comprehension, but enhances indexing and searchability. Items 1 and 2 are drawn from PRISMA and STARD 2015 and require that the title indicate that the study is a systematic review (item 1) and is a study of diagnostic accuracy (item 2). Item 3 required reporting on whether the study design is comparative (one test vs. another) or non-comparative; comparative design is increasingly important, common, and associated with methodologic challenges [37].
Introduction: Item 4 requires framing the role of the index test in the existing clinical pathway; understanding the clinical role of a test is essential to generalizability of findings. For example, if a test evaluation focuses on a "triage" test (e.g., d-dimer for determination of pre-test probability prior to CT pulmonary angiogram), it may not be appropriate to generalize its use as a "replacement" test (e.g., d-dimer as a replacement for CT). The performance of diagnostic tests is variable depending on the specific clinical scenario [28,47].
Methods-protocol, eligibility, and search: All items in this section are generalizable to all systematic reviews; none were deemed to be specific to DTA systematic reviews.
Methods-study selection and data collection: Multiple items in this section focus on specific details of the search strategy and are aimed at enhancing reproducibility. None of these is of particular specific relevance to DTA reviews; however, detail additional to that recommended by PRISMA has been listed since subsequent systematic review methodologic recommendations have suggested their inclusion [31].
Methods-primary study data items: Item 25 focuses on which characteristics from primary studies included in a review should be reported. Several aspects of this item are unique to DTA systematic reviews, such as  Table 1 Potential relevant items for PRISMA-DTA checklist. Items deemed by the authors to apply specifically to DTA reviews are in Bold Item Ref Title [12] 17 Provide an appendix with studies excluded, with reasons for exclusion, during full-text screening [12] 18 Describe method of data extraction from reports [12] 19 Report which data items were extracted from included studies [12] 20 Report how studies for which only a subgroup of participants is relevant to the review will be handled [31] 21 Report how "indeterminate" or "missing" results for either the index test or reference standard were dealt with in the analysis [40] 22 Report if and how any parameters beyond test accuracy will be evaluated (e.g., cost-effectiveness, mortality) [46] Methods: primary study data items 25 Describe if and how "piloting" the risk of bias tool was done [14] 26 List criteria used for risk of bias ratings applied during the review [31] 27 Describe methods for study quality assessment [12] 28 Provide measures of consistency (e.g., tau 2 ) for each meta-analysis [12]  Describe test used to assess for publication bias [12] Methods: summary measures and statistics 30 State the principal summary measures of diagnostic accuracy to be assessed [28] 31 Report whether summary measures were calculated on a per-patient or per-lesion basis [31] 32 Report pre-defined criteria for minimally acceptable test performance [42] 33 State how multiple readers of an index test were accounted for [17] 34 Report the statistical method used for meta-analysis (e.g., hierarchical model) [2] 35 State which software package and macros was used for meta-analysis [6] 36 Report any programming deviations made from published software packages [6] 37 If comparative design, state the statistical methods used to compare test accuracy [28] 38 Describe methods of additional analyses (e.g., subgroup), indicating whether pre-specified [12] 39 Report how subgroup analyses were performed [31] 40 When performing meta-regression report the form of factors being explored (categorical vs. continuous) and the cut-off points used [41] Results 41 Report studies from screen to inclusion, ideally with a flow diagram [12] 42 For each study, present characteristics for which data were extracted and provide the citations [12] 43 Present data on risk of bias of each study on a per-item or per-domain basis [12,14,35] 44 Present results of any assessment of publication bias [12] 45 Report any adverse events or harms from index test or reference standard [31] 46 For each study report 2 × 2 data (TP, FN, FP, TN) [43,45] 47 For each study report summary estimates of accuracy and confidence intervals [28] 48 Report each meta-analysis including confidence intervals and measures of consistency (e.g., tau2) [12] 49 Graphically display results with an ROC curve or forest plots of sensitivity and specificity [44] 50 Report additional analyses (e.g., meta-regression) [12] 51 Report risk of bias in the synthesis (e.g., analyses stratified by risk of bias) [31] 52 Report summary of findings table with main outcomes and issues re: applicability of results [31] 53 Report "frequency" tables of 2 × 2 data demonstrating potential findings in a patient population based on the prevalence [45] Discussion 54 Summarize findings including implications for practice [12,28] 55 Provide a general interpretation of the results in the context of other evidence and implications for future research [12] 56 For comparative design, report whether conclusions were based on direct vs. indirect comparisons [37] 57 Discuss the implications of any missing data [31] 58 Discuss applicability concerns to different populations/settings [14,45] 59 Discuss quality of included studies when forming conclusions [36] 60 Account for any statistical heterogeneity when interpreting the results [31] 61 Discuss the potential impact of reporting biases [31] 62 Discuss the five GRADE considerations (study limitations, consistency of effect, imprecision, indirectness, and publication bias) [31] Disclosure 63 Describe sources of funding for the review and role of funders [12] 64 Report potential relevant conflicts of interest for review investigators [36] "Ref" = source reference(s) for the item index test, reference standard, target condition definition, test positivity thresholds, and clinical setting. All this information is vital for readers to make an appropriate assessment of the review.
Methods-risk of bias and heterogeneity: Assessment of study quality and heterogeneity are not unique to DTA reviews. However, study quality assessment for diagnostic accuracy studies includes assessment of risk of bias and concerns regarding applicability, thus the quality assessment tool used in DTA reviews should capture and report these issues (item 24) [14]. Additionally, since sensitivity and specificity are correlated, univariate measures of heterogeneity, such as I 2 , are typically not appropriate to report heterogeneity in diagnostic test accuracy reviews. Thus, heterogeneity may be reported either qualitatively or using measures that account for the correlation between sensitivity and specificity (item 28) [2].
Methods-summary statistics: Multiple readers may interpret an index test. How this is accounted for statistically may affect the results and, therefore, should be reported (item 33) [17]. An important difference in DTA meta-analysis from interventions is the correlation between sensitivity and specificity. Thus, it is very important to report the statistical model used for meta-analysis so readers can determine the impact of these methods on the results (item 34) [6].
Results: In order to facilitate reproduction of analyses and to make it clear to the readers which data was metaanalyzed, 2 × 2 data for each study included in meta-analyses should be made available (item 46) [43,45].
Discussion and disclosure: All items in this section are generalizable to all systematic reviews; none was deemed to be specific to DTA systematic reviews.

Discussion
We consulted existing guidance on the reporting of systematic reviews and the published literature related to the conduct and reporting of DTA systematic reviews to identify 64 potential items for reporting DTA systematic reviews. The systematic, comprehensive search categorized by manuscript section builds on prior work, which has been based on non-systematic searches and expert opinion. The items identified will form the basis of a Delphi process that will be conducted to generate the PRISMA-DTA checklist. Items have been broken down into single concepts or descriptors for the Delphi process. During the Delphi process, suggestions from the PRISMA-DTA group will be incorporated. Thus, some items may not appear on the final PRISMA-DTA checklist. Additionally, PRISMA-DTA group members may propose additional items during the Delphi process. Wording of items as presented here may also be adjusted at the PRISMA-DTA consensus meeting. Therefore, it is advised to consult the final checklist after it has been published for use in guiding reporting systematic reviews of diagnostic test accuracy. This evaluation improves on prior work, which has largely been based on non-systematic reviews, and expert opinion. The work is a small but essential step towards a clear reporting guideline for DTA systematic reviews. Future work should not only include creating the PRISMA-DTA checklist, but evaluating for "baseline" adherence to PRISMA-DTA in order to guide knowledge translation interventions aimed at targeted improvements for reporting of DTA systematic reviews.

Strengths and limitations
This systematic review benefits from a comprehensive, expert, peer-reviewed search, duplicate extraction, and categorization of potentially relevant items by manuscript section which mirrors the format of the PRISMA checklist. Limitations of our systematic review are that we did not formally assess the quality of sources for included items, we provide only a qualitative summary, and we may not have identified potentially relevant items from work yet to be published. We believe that many of these shortcomings will be addressed in the process for generation of the PRISMA-DTA checklist as outlined in our complete study protocol [48].

Conclusions
The reporting of DTA systematic reviews is often incomplete [10,11,49]. Incomplete reporting has been identified as a preventable source of waste in biomedical research [43]. Therefore, a reporting guideline specific to DTA systematic reviews is needed to reduce waste, increase utility, and facilitate reproducibility of these reviews. This systematic review is the first step towards gathering all relevant evidence pertinent to reporting of DTA systematic reviews. This step is critical in the EQUATOR network's established guidance for reporting guidelines development [50]. This information will serve as the substrate for a PRISMA-DTA extension to guide reporting of DTA systematic reviews and will complement the more than 300 reporting guidelines indexed by the EQUATOR Network [21].