The relationship between methodological quality and the use of retracted publications in evidence syntheses

Background: Evidence syntheses cite retracted publications. However, citation is not necessarily endorsement, as authors may be criticizing or refuting the publication's findings. We investigated the sentiment of these citations (whether they were critical or supportive) and its associations with the methodological quality of the evidence synthesis, the reason for the retraction, and the time between publication and retraction. Methods: We assessed the methodological quality of a sample of 286 evidence syntheses containing 324 citations to retracted publications in the field of pharmacy using AMSTAR 2. We used scite.ai and a human screener to determine citation sentiment. We conducted a Pearson's chi-square test to assess associations between citation sentiment, methodological quality, and reason for retraction, and one-way ANOVAs to investigate associations between time, methodological quality, and citation sentiment. Results: Almost 70% of the evidence syntheses in our sample were of critically low quality. We found that these critically low-quality evidence syntheses were more associated with positive citations, while high-quality evidence syntheses were more associated with negative citations of retracted publications. Of the 324 citations in our sample, 20.4% noted that the publication had been retracted. Conclusion: The association between high-quality evidence syntheses and recognition of a publication's retracted status may indicate that best practices are sufficient. However, the volume of critically low-quality evidence syntheses ultimately perpetuates the citation of retracted publications with no indication of their retracted status. Strengthening journal requirements around the quality of evidence syntheses may lessen the inappropriate citation of retracted publications.


Introduction
Scientific discovery is not a linear process; it requires verification, replication, and correction. This correction may come in the form of retraction of scientific publications and could be necessitated by a range of reasons, from errors in methodology or reporting, to ethical concerns or authorship disputes, to data falsification and fabrication. Retractions are becoming an increasingly common corrective mechanism, growing from an average of 240 retractions per year in the 2000s to over 1500 retractions per year in the subsequent decade [1]. However, to say that science has been corrected would indicate that researchers understand that a publication has been retracted and the context surrounding the retraction. The continued use of retracted publications can be considered a proxy for the efficacy of this corrective mechanism. If a retracted publication is treated as if it is valid and used as the groundwork for future research, this may indicate that the process of retraction is ineffective.
Citation of retracted publications is not inherently problematic, as citation may not necessarily be an endorsement. The examination of the meaning and function of citations is well-established, with formal approaches dating back to the early 1960s [2,3]. Since then, two primary schools of thought have been established: the social constructivist, in which scholars use citations for persuasive purposes, and the normative, in which scholars use citation as a means of giving credit and acknowledgement of achievement [4,5]. Numerous taxonomies have been created [6–10]. While these taxonomies vary in degrees of granularity, structure, and terminology, the majority consider both the scholar's positioning of the work they are citing (i.e., whether it is an endorsement or a critique) and whether the cited material is treated as core to the scholar's work or receives a more limited level of engagement.
However, taxonomies which are based upon researcher motivations have faced criticism. As Tahamtan and Bornmann note, the choice to cite a particular document may be motivated by factors beyond the control of the author, including external influences such as the recommendations of editors or peer reviewers, or characteristics specific to the author, such as academic background and topic knowledge [11]. Furthermore, attempts to uncover researcher motivation in their citation decisions are often problematic and subject to the limitations of any data gathered through self-report [12].
Emerging approaches to citation sentiment analysis, which are influenced by computational linguistics, natural language processing, and machine learning, attempt to employ broader categories of positive, negative, and neutral [13]. These approaches do not attempt to determine the author's intention, but instead consider the citation in the context of the manuscript and describe how the citation is operating in the scholarly literature [14].
Previous research has shown that retracted publications indeed continue to be cited by other researchers and are often cited without any recognition of the retraction status of the research [15–17]. Such citation is particularly problematic when these citations are occurring in systematic reviews and other types of evidence syntheses. Evidence syntheses are often positioned at the pinnacle of the evidence hierarchy and are intended to be a rigorous examination of the totality of the evidence on a particular topic, with the potential to provide the basis for decisions in policy and practice [18–20]. Evidence syntheses are the preferred research method underpinning patient care decisions, decisions on health insurance provision and coverage, health system policy decisions, and more. A lack of effective evidence synthesis has been directly tied to delays in implementing effective treatment options for patients, perpetuation of ineffective and harmful treatments, unnecessary risks to patients and research participants, and inefficient use of research funding and resources [21]. However, for evidence synthesis to be effective in improving policy and practice, it must be based upon sound science. In contrast, the majority of retracted publications are retracted due to misconduct [22–24]. This may include compromised peer review processes, data falsification, image manipulation, and fabricated results.
Despite the importance of evidence syntheses and their function as a rigorous examination of the evidence, they are not immune to the potential impact of retracted publications. Kataoka et al. examined 587 systematic reviews and clinical practice guidelines (CPGs) that cited retracted randomized controlled trials (RCTs) [25]. They found that of the 252 systematic reviews and CPGs that cited previously retracted publications, 67% made no mention of the retracted status of the RCT. Of the 335 systematic reviews and CPGs that cited RCTs that were later retracted, 3% were later corrected and 11% excluded the RCT at the time, either due to concerns about the study or inclusion criteria. Eighty percent incorporated the retracted RCT and were not subsequently corrected. This reinforces Avenell et al.'s previous findings that of 68 evidence syntheses that cited retracted publications (including 13 whose findings would have been changed by the removal of the retracted publication from the analysis), only one undertook reassessment [26].
When potentially flawed research is incorporated into evidence syntheses, it may raise questions about the conclusions of these syntheses and the rigor of the methods that produced them. Findings on the impact of retracted publications on the statistical findings of meta-analyses have varied significantly. One recent study found that in a sample of 229 meta-analyses that cited retracted publications, only 21 indicated that the retracted publication had been retracted; however, removing the data associated with those retracted publications from the pooled summaries of meta-analyses did not significantly alter the results [15]. In contrast, other studies show substantial influence of retracted publications. A reanalysis of 22 meta-analyses removing data associated with a retracted publication altered the results of over half of the meta-analyses [27], and the exclusion of two publications from a meta-analysis of ivermectin and COVID-19 invalidated that meta-analysis's previous finding of decreased mortality [28].
While the impact of retractions on the findings of specific meta-analyses may vary, treating retracted publications as valid research in the context of evidence syntheses is problematic. Not only does it have the potential to influence findings of the synthesis, but it also perpetuates the use of the retracted publication and its associated findings and undermines the function of retraction as a corrective mechanism. While previous research affirms that retracted publications continue to be cited as valid in evidence syntheses, the methodological quality of these evidence syntheses and its relationship to the use of retracted publications has not been explored. To the best of our knowledge, no research has assessed the methodological quality of evidence syntheses citing retracted publications.
We sought to address the following research questions: 1) What is the methodological quality of evidence syntheses citing retracted publications? 2) Are evidence syntheses citing retracted publications indicating that the publications have been retracted? 3) Is there an association between the methodological quality of the evidence syntheses and whether they indicate that the publications have been retracted? 4) Is the length of time between publication and retraction associated with whether a retracted publication is indicated as such or with the methodological quality of the evidence synthesis?

Methods
A previous research project identified evidence syntheses that cited retracted publications in pharmacy [29]. This project was based on retracted publications identified through the Retraction Watch Database [30]. From a list of retracted publications, we created a subset of 1396 retracted publications in the fields of pharmacology, toxicology, and drug design. These fields were selected to reflect the breadth of the field of pharmacy, which was chosen as it extends from bench research to clinical care and has potential impact on other healthcare specialties, such as surgery, anesthesia, and family medicine. Known item searching was then conducted in Web of Science and Scopus to retrieve all citing publications. A total of 32,559 publications citing these retracted items were retrieved. Titles and abstracts of citing publications were then screened by two independent reviewers using Rayyan to confirm that they were evidence syntheses. This was followed by a full-text screening phase, which was completed by two independent researchers. Any discrepancies were resolved through consensus. Publications were excluded if they were not evidence syntheses, which were defined as systematic reviews, scoping reviews, rapid reviews, meta-analyses, or clinical practice guidelines. Publications were also excluded if they were subsequently retracted. Evidence syntheses published in languages other than English were excluded due to the linguistic nuance necessary to assess citation sentiment. No limitations were placed on year of publication.
This previous project identified 1096 citations to retracted publications in evidence syntheses, including 712 that occurred prior to retraction and 384 that occurred after the publication had been retracted. The present research project focused on the 384 post-retraction citations, which occurred in 310 evidence syntheses. From the original set of 310 evidence syntheses, 24 were excluded because the evidence synthesis was not in English (n = 9), the evidence synthesis was subsequently retracted (n = 4), the citation to the retracted article could not be found in the full text or the references (n = 6), the item was a duplicate or co-publication (n = 4), or it was determined not to be an evidence synthesis (n = 1). Our final sample included 286 evidence syntheses containing 324 citations to retracted research. The process of identifying these evidence syntheses is shown in Fig. 1.
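The sample-flow figures reported above can be cross-checked with simple arithmetic. The short Python sketch below uses only the counts stated in the text; the variable names are illustrative and are not taken from the study's materials.

```python
# Counts are those reported in the text; variable names are illustrative only.
citations_before_retraction = 712
citations_after_retraction = 384
total_citations = citations_before_retraction + citations_after_retraction

syntheses_identified = 310
exclusions = {
    "not in English": 9,
    "evidence synthesis subsequently retracted": 4,
    "citation not found in full text or references": 6,
    "duplicate or co-publication": 4,
    "not an evidence synthesis": 1,
}
total_excluded = sum(exclusions.values())
syntheses_included = syntheses_identified - total_excluded

print(total_citations, total_excluded, syntheses_included)  # 1096 24 286
```

The text does not itemize how the 384 post-retraction citations were reduced to the 324 citations in the final sample, so that step is not reproduced here.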

Fig. 1 Identification of evidence syntheses citing retracted publications
We utilized a previous mapping of the Retraction Watch Database's reasons for retraction to a modified version of Bar-Ilan and Halevi's taxonomy of reasons for retraction [31]. Bar-Ilan and Halevi's taxonomy includes three broad classifications: (1) scientific distortion, including data falsification and errors; (2) ethical misconduct, including plagiarism and IRB issues; and (3) administrative error, such as a journal erroneously publishing the wrong version of an article. We further subdivided scientific distortion into scientific distortion (falsification and manipulation), which refers to instances of intentional distortion, and scientific distortion (concerns or errors), in which scientific distortion occurred but intention was not proven, such as an unintentional error in data collection or analysis. Bar-Ilan and Halevi posited that scientific distortion was the most problematic of the classifications, as the publication's findings may be unsound and could subsequently lead to misdirected research in the future or false conclusions. While ethical misconduct is troubling, it does not necessarily invalidate the findings of the research.
Data collection was completed using a Qualtrics form. We established agreement in our assessment by having all researchers independently code 5% of the total sample. Upon assessment of the sample, we found near perfect agreement and subsequently the remaining evidence syntheses were assessed by one independent reviewer. Methodological quality of the evidence synthesis was assessed using the AMSTAR 2 criteria [32]. The AMSTAR 2 checklist is designed to aid in the assessment of the methodological quality of systematic reviews. The 16 questions in AMSTAR 2 relate to 16 domains or potential weaknesses, 7 of which are critical and 9 of which are non-critical. A critical weakness is one that is thought to have potential impact on the overall validity of findings, while a non-critical weakness is one that is indicative of methodological quality but may not impact overall validity. The AMSTAR 2 criteria result in one of four overall ratings: high quality, moderate quality, low quality, and critically low quality. A high-quality review has no more than one non-critical weakness, a moderate-quality review has more than one non-critical weakness but no critical weaknesses, a low-quality review has one critical weakness, and a critically low-quality review has more than one critical weakness.
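The overall rating rule described above can be expressed as a small decision function. The following Python sketch is an illustration of that rule as stated in this section, not the tool used in the study; the function name `amstar2_rating` and its argument names are hypothetical.

```python
# Illustrative sketch of the AMSTAR 2 overall rating rule as described in the text.
# The function and argument names are hypothetical.
MAX_CRITICAL = 7      # number of critical domains
MAX_NON_CRITICAL = 9  # number of non-critical domains


def amstar2_rating(critical_weaknesses: int, non_critical_weaknesses: int) -> str:
    """Map counts of weaknesses to one of the four overall ratings."""
    if not 0 <= critical_weaknesses <= MAX_CRITICAL:
        raise ValueError("critical_weaknesses must be between 0 and 7")
    if not 0 <= non_critical_weaknesses <= MAX_NON_CRITICAL:
        raise ValueError("non_critical_weaknesses must be between 0 and 9")

    if critical_weaknesses > 1:
        return "critically low"
    if critical_weaknesses == 1:
        return "low"
    # No critical weaknesses from here on.
    if non_critical_weaknesses > 1:
        return "moderate"
    return "high"


print(amstar2_rating(0, 1))  # high
print(amstar2_rating(2, 0))  # critically low
```

Critical weaknesses are checked first because, under this rule, a single critical weakness caps the rating at low regardless of how few non-critical weaknesses are present.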
We used scite.ai to capture the sentiment of the citation, by which we mean the reason the authors were citing the paper. Scite is a web-based tool that uses machine learning algorithms to identify which articles have cited which studies, and whether those citations were supporting, mentioning, or contrasting [33]. Scite refers to this as a classification of rhetorical function, intending to describe whether the citing paper is supporting or contrasting claims made in the paper it is citing. For clarity, we have aligned scite's terminology with the positive, negative, and neutral categories more commonly used in citation sentiment analysis. We used a second Qualtrics form to capture scite's assessment of the sentiment of the citation. Researchers independently noted their agreement or disagreement with the assessment. Where the researcher disagreed with scite's assessment, the researcher's interpretation was recorded. Assessments were reviewed by all researchers to ensure consistency and agreement. The findings of this assessment are outlined elsewhere [34].
To assess for associations between the categorical variables of citation sentiment and methodological quality and reason for retraction, we conducted a Pearson's chi-square test with a simulated p value based on 2000 replicates. To investigate association between the mean time between publication and retraction for publications grouped by methodological quality and grouped by citation sentiment, we conducted one-way ANOVAs and post hoc Tukey tests. All analyses were conducted in R 4.1.1.
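The analyses in this study were run in R. As an illustration only, the Python sketch below shows one common way a simulated p value for a chi-square test of independence can be obtained: one set of labels is repeatedly shuffled, which keeps both sets of marginal totals fixed, and the statistic is recomputed for each replicate. The function names, the seed, and the toy labels are hypothetical and are not the study's code or data. Because the reference distribution is simulated, no degrees of freedom are involved, and with 2000 replicates the smallest attainable p value under this formula is 1/2001, or roughly 0.0005.

```python
# Illustrative sketch of a Monte Carlo (permutation) chi-square test of independence.
# Function names, seed, and toy data are hypothetical; this is not the study's code.
import random
from collections import Counter


def chi_square_stat(rows, cols):
    """Pearson chi-square statistic for two equal-length sequences of labels."""
    if len(rows) != len(cols) or not rows:
        raise ValueError("rows and cols must be non-empty and of equal length")
    n = len(rows)
    row_totals = Counter(rows)
    col_totals = Counter(cols)
    observed = Counter(zip(rows, cols))
    stat = 0.0
    # Sorted iteration keeps the summation order identical across replicates.
    for r in sorted(row_totals):
        for c in sorted(col_totals):
            expected = row_totals[r] * col_totals[c] / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat


def simulated_p_value(rows, cols, replicates=2000, seed=0):
    """Return (observed statistic, simulated p value)."""
    rng = random.Random(seed)
    observed_stat = chi_square_stat(rows, cols)
    shuffled = list(cols)
    at_least_as_extreme = 0
    for _ in range(replicates):
        rng.shuffle(shuffled)  # breaks any association, preserves the margins
        if chi_square_stat(rows, shuffled) >= observed_stat:
            at_least_as_extreme += 1
    # The +1 terms count the observed table as one of the possible outcomes.
    return observed_stat, (at_least_as_extreme + 1) / (replicates + 1)


if __name__ == "__main__":
    # Toy labels only: quality rating paired with citation sentiment.
    quality = ["high"] * 6 + ["critically low"] * 14
    sentiment = ["negative"] * 5 + ["positive"] * 1 + ["negative"] * 2 + ["positive"] * 12
    print(simulated_p_value(quality, sentiment))
```

A simulated p value is often preferred when some expected cell counts are small, since the usual chi-square approximation can be unreliable in that situation.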

Results
Of the 286 syntheses citing retracted publications, the majority (199, 69.6%) were found to be of critically low quality, 49 (17.1%) were found to be of low quality, 21 (7.3%) were found to be of moderate quality, and 17 (5.9%) were found to be of high quality according to AMSTAR 2. Details of the findings of the methodological assessment are outlined in Table 1.
Of the 324 citations to retracted publications, the largest share (140, 43.2%) were positive, while 118 (36.4%) were neutral and 66 (20.4%) were negative. We found a statistically significant association between citation sentiment and methodological quality (χ² = 44.39, df = NA, p = 0.0005). High-quality studies were more associated with negative statements and were less associated with positive statements, while critically low-quality studies were more associated with positive statements and less associated with negative statements. We also found a statistically significant association between citation sentiment and reason for the article's retraction (χ² = 28.405, df = NA, p = 0.001). Articles retracted for scientific distortion due to falsification and manipulation were more associated with negative citations and were less associated with positive citations, while articles retracted due to ethical misconduct were more associated with positive citations. These findings are detailed in Table 2.
Within our sample, the mean time between publication and retraction was 1646 days or 4.5 years. There was no statistically significant association between timing and the quality of the evidence synthesis (p = 0.25). There was a statistically significant association between timing and citation sentiment (F = 3.287, p = 0.0386). On average, positive citations had a period between publication and retraction that was 564 days (1.56 years) shorter than negative citations. Findings are detailed in Table 3. A one-way ANOVA revealed no statistically significant interactions between the reason for retraction and the citation sentiment when considering the time between publication and retraction.
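For readers less familiar with the F value reported above, it is the ratio of between-group to within-group variability in a one-way ANOVA. The Python sketch below computes it from first principles; the function name and the small example values are hypothetical and do not reproduce the study's data or its R analysis.

```python
# Illustrative sketch of the one-way ANOVA F statistic.
# Function name and example values are hypothetical.
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n <= k:
        raise ValueError("need at least two non-empty groups and more values than groups")

    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Variability of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variability of individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variability; F is undefined")

    df_between = k - 1
    df_within = n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Three small made-up groups (for example, days grouped by citation sentiment).
print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))  # (3.0, 2, 6)
```

Converting F to a p value requires the F distribution with the returned degrees of freedom, which statistical software such as R provides directly.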

Discussion
This study found that the vast majority of evidence synthesis studies citing retracted publications were of critically low quality, with only 5.9% being of high quality. These findings are in line with other research investigating the quality of evidence syntheses in general. One study investigating the quality of systematic reviews in dentistry found that 68% of the reviews were of critically low quality and none was of high quality [35], while another in urology found that only 4.2% of reviews were of high quality [36]. We found that high- and moderate-quality evidence syntheses were associated with negative citations to retracted publications, while syntheses of critically low quality were associated with positive citations and did not include negative citations. The association between methodological quality and the ways in which retracted publications were cited may indicate that existing best practices in conducting evidence syntheses are effective in addressing the challenge of retracted publications.
Increasingly robust documentation on best practices in identifying retractions has become available [37], as have technological innovations such as the integration of Retraction Watch data into EndNote and Zotero [38,39]. The recent integration of Retraction Watch data by Third Iron into their LibKey and Browzine products has the potential to further these efforts by alerting library users of a publication's retracted status [40]. Surfacing this information at the point of discovery rather than positioning it as an additional verification step has the potential to further help researchers identify flawed research and avoid integrating it into their own work. Such technical solutions, particularly where they leverage third-party, vendor-agnostic data, can have a tremendous impact in improving the clarity and consistency with which the retracted status of publications is communicated.
While this is heartening, enthusiasm is somewhat tempered when considering the large number of poor-quality evidence syntheses found. It is difficult to determine why low-quality evidence syntheses are undertaken and continue to be published. Previous research has found that the rate at which evidence syntheses are produced has grown dramatically (by 1930% between 2000 and 2019 [41]) but that the citation and usage of evidence syntheses have decreased over time. Halevi and Pinotti found that as the number of systematic reviews increased, the average number of citations, citations in policy documents, downloads, and views decreased [42]. This could suggest that an exponential increase in published evidence syntheses results in poorer quality studies going unnoticed due to the decrease in citations and views.
Despite the overall low quality of the evidence syntheses in this sample, it is possible that the quality is still being overestimated. AMSTAR 2 has several limitations as a tool for methodological assessment. The assessment of the adequacy of the search may not be sufficient. While the AMSTAR 2 criteria do require that "[k]ey words and/or MESH terms should be reported and the full search strategy available upon request" [32], there is no requirement to consider the appropriateness of the terms (subject terms or keywords alone are appropriate) or the comprehensiveness, structure, or replicability of the search. The PRESS Guidelines provide a significantly more in-depth assessment mechanism with requirements for Boolean operators, subject headings and keywords, spelling variations, and filters [43]. The PRISMA-S and PRISMA 2020 Reporting Checklists, both of which were released after AMSTAR 2, provide further tools for assessing the comprehensive reporting of searches [44,45]. PRISMA-S complements PRISMA 2020 and offers guidelines for creating reproducible searches for evidence synthesis. The AMSTAR 2 search criteria may need to be updated to reflect the comprehensiveness and rigor a systematic review search strategy requires.
Previous research on the experience of information professionals engaged in evidence syntheses has found significant challenges in ensuring methodological rigor. Surveys have found that between 28 and 68% of information professionals report challenges with researchers not following appropriate systematic review methodology [46,47]. While low-quality evidence syntheses are regularly published in a range of disciplines, few journals provide guidance on methodological quality of systematic reviews and meta-analyses in their author guidelines [48]. Strengthening journal requirements around the methodological quality of systematic reviews and meta-analyses may aid in lessening the inappropriate citation of retracted publications.
The majority of citations to retracted publications in these evidence syntheses did not indicate that the publication had been retracted. We found that the largest share of our citations were positive, followed by citations that mentioned the retracted publication in passing but did not indicate that it had been retracted. This would indicate that, in our sample, evidence syntheses that cite retracted publications are not identifying that the publication has been retracted. These findings are consistent with previous research, including two studies which independently found that over 94% of citations to retracted publications do not indicate that the publication was retracted [16,49]. Future research could explore the correlations between citing retracted publications in evidence syntheses and having a librarian or information professional as a co-author.
We found an association between publications retracted due to ethical misconduct and positive citation, while publications retracted due to falsification and manipulation were more associated with negative citations. In the context of Bar-Ilan and Halevi's taxonomy, this does indicate that the most potentially damaging science is being recognized as such in our sample. However, the continued positive citation of publications previously retracted for ethical misconduct is nevertheless problematic. While retraction is meant to correct the scientific record, it is also intended to disincentivize unethical behavior. While some previous research has found that retracted publications receive fewer citations than their non-retracted counterparts [50], both citation and self-citation of publications continue following retraction [51]. While it is beyond the scope of this project to investigate the associations between the impact on findings and the reason for retraction, this would be a useful area of future research.
Negative citations were associated with a significantly longer time between publication and retraction. This may initially appear counterintuitive, as one might expect that a publication that has a longer time in the scholarly ecosystem without correction would become more entrenched and therefore more likely to accrue positive or neutral citations. However, it should be noted that of the 66 negative citations in our sample, 28 were associated with 4 primary authors, 3 of whom have the dubious distinction of being in the top 5 on Retraction Watch's Leaderboard [52]. The retractions were generally associated with long-standing ethical and scientific misconduct which spanned multiple years and impacted a cumulative 476 publications. Removal of the 28 publications associated with these 4 researchers reduced the average period between publication and retraction to 1423 days (3.89 years), which does not differ significantly from that of the positive citations (1412 days, 3.86 years). It is possible that the notoriety of these cases and the publicity surrounding these retractions increased the likelihood that they would receive negative citations in comparison to retractions that did not receive as much publicity.
While it is not possible to state conclusively why individuals are citing retracted publications positively based on publication data alone, previous research describes the inconsistency with which the retracted status of publications is displayed. One 2018 study looked at this issue across different bibliographic databases, finding that some platforms display fewer than 5% of retracted publications as retracted [53]. A more recent 2020 white paper reinforced these findings and noted the variability even within a single journal [54]. This inconsistency in the indication of the retracted status of publications has been found in disciplinary journals, including research in emergency medicine and dentistry which found that watermarking of retracted publications ranged from 40 to 57% [55,56]. While we cannot state that an inconsistent representation perpetuates the citation of retracted publications, it does stand to reason that if publications are not clearly marked as being retracted, a reader would be less likely to realize that the publication had been retracted and would therefore be less likely to reflect that understanding in their own work.
Authors of systematic reviews can play a dual role in both modifying their existing practices and in advocating for clearer and more consistent representation by publishers and aggregators. A recently launched NISO Working Group is developing recommended practices for metadata display and transfer "to improve the dissemination of retraction information and to support consistent, timely transmission of that information to the reader" [57]. Such guidelines may ultimately lead to more consistently and accurately represented retractions. Adoption of these forthcoming guidelines, as well as the development and implementation of retraction policies, should be encouraged.
Evidence syntheses are a powerful tool to identify, appraise, and synthesize scholarship and to improve patient care, accelerate research, and contribute to evidence-based policy. However, for evidence synthesis to perform these functions, it must be based upon sound science. Evidence syntheses that include retracted publications without indication of their retracted status perpetuate the citation and use of those publications and raise questions regarding the rigor of these evidence syntheses and the validity of their findings. As evidence syntheses may impact clinical decision making and policy, the downstream impact of evidence syntheses that incorporate retracted publications, including their incorporation into policy and practice guidelines, would be a useful area for future research.
Our study has several limitations. We focus specifically on evidence syntheses in the field of pharmacy due to its breadth and impact. Previous research has established concerns around methodological quality of systematic reviews in a range of fields [35,36] and has found that publications citing retracted items in multiple disciplines do not indicate that these items have been retracted [16,17,49]. However, it is possible that the associations we found between methodological quality and citation sentiment may not be generalizable to other disciplines. Future research could extend this research to other disciplines.
Citation sentiment analysis cannot determine why an author chose to cite a particular article or conversely how they became aware of the retracted status of an article. It also cannot determine whether existing methodological guidance was influential in uncovering the retracted status of an article. Future research should consider how authors become aware of the retractions and which mechanisms and interventions are most effective.

Conclusion
Science requires continual revision; it is a process of adjusting theories in response to contradictory evidence [58]. However, for that adjustment to occur, this contradictory evidence must be observed. Significant progress has been made in providing guidance to facilitate identification of retracted publications in evidence syntheses. Despite the availability of this guidance, the continued use of retracted publications in evidence syntheses is common and may be a consequence of the prevalence of low-quality evidence syntheses. While ongoing efforts to educate users about this issue are valuable, requirements from journals and publishers and the adoption of consistent practices by bibliographic databases have the potential to modify researcher behavior, improving the overall quality of evidence syntheses while ensuring that retracted publications are recognized as such.

Table 1
Assessment of methodological quality according to AMSTAR 2 criteria

Table 2
Factors associated with citation sentiment

Table 3
Days between publication and retraction and its association with methodological quality and citation sentiment