Publication bias in otorhinolaryngology meta-analyses in 2021

Introduction: One concern in meta-analyses is the presence of publication bias (PB), which leads to the dissemination of inflated results. In this study, we assessed the extent to which the meta-analyses in the field of otorhinolaryngology in 2021 evaluated the presence of PB.
Methods: Six of the most influential journals in the field were selected. A search was conducted, and data were extracted from the included studies. In cases where PB was not assessed by the authors, we evaluated the risk of its presence by designing funnel plots and performing statistical tests.
Results: Seventy-five systematic reviews were included. Fifty-one percent of them used at least one method for assessing the risk of PB, with the visual inspection of a funnel plot being the most frequent method used. Twenty-nine percent of the studies reported a high risk of PB presence. We replicated the results of 11 meta-analyses that did not assess the risk of PB and found that 63.6% were at high risk. We also found that a considerable proportion of the systematic reviews that found a high risk of PB did not take it into consideration when making conclusions and discussing their results.
Discussion: Our results indicate that systematic reviews published in some of the most influential journals in the field do not implement enough measures in their search strategies to reduce the risk of PB, nor do they assess the risk of its presence or take that risk into consideration when interpreting their results.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13643-023-02404-0.


Rationale
The Catalogue of Bias, a collaboration dedicated to describing a wide range of biases and outlining their potential impact on research studies, defines publication bias (PB) as "when the likelihood of a study being published is affected by the findings of the study" [1]. Several factors could lead to PB, including selective publication of positive findings, selective publication of statistically significant findings, selective publication of "interesting" findings, and publication according to the quality of the trial or its funding [2]. A study in 2009 demonstrated that randomized controlled trials with positive findings were 3.9 times more likely to be submitted and published than trials with negative or null findings [3]. The OPEN project (Overcome failure to Publish nEgative fiNdings), a project funded by the European Union to investigate the extent and impact of dissemination bias, has encouraged systematic reviewers to follow "the best practices" in conducting systematic reviews (SRs) (especially practices concerning the assessment of the impact of dissemination bias) and to publish the protocol and results of their SRs publicly. Regarding "the best practices" in conducting SRs, the members of the OPEN project have proposed the Cochrane Handbook for Systematic Reviews of Interventions [4] and the standards for SRs stated by the Institute of Medicine of the National Academy of Sciences [5]. The main recommendation in these guidelines for avoiding PB is to search for evidence in bibliographic databases as well as in other sources such as grey literature, citation indexes, trial data, and other unpublished reports. On the other hand, selection models and graph-based methods have been suggested to be the main methods for assessing the presence of PB [6].
Unfortunately, despite the significance of PB, previous studies show that a considerable proportion of systematic reviewers in different healthcare fields, such as oncology [7], anesthesiology [8], dermatology [9], cardiology [10], and gastroenterology [11], did not try to evaluate its possible presence in their SRs and meta-analyses (MAs). Also, a substantial proportion of SRs were found not to search resources other than published materials, hence increasing the risk of PB in their results [12].
Previous reports also show that methodological research biases in the field of otorhinolaryngology (ENT), including biases in conducting randomized trials and gender bias in research and publishing, are still common, and that few improvements have been made in the last few decades despite the significant growth of research methodology awareness in the scientific community [13,14]. In this study, we aimed to assess another form of bias in the field, namely, to what extent the recent SRs with MAs in the field of ENT have taken measures to reduce the risk of PB in their results or have evaluated the probability of its presence in their research. To our knowledge, this is the first study that has evaluated this subject in this field. Our findings may help to understand how well the issue of PB is addressed in the field.

Approaches to deal with publication bias
There are two approaches for dealing with PB: selection models and graph-based methods [6]. Selection models use the weighted distribution theory to model the publication process and, thus, develop estimation procedures that account for the selection process [15]. The Hedges model [16] is the first and best-known selection model used for assessing PB. Unfortunately, these models rely on largely untestable assumptions [17] and, thus, are rarely used in practice for assessing PB. Instead, they are used in sensitivity analyses [15]. On the other hand, graph-based methods are widely used. These methods are based on a funnel plot, which usually presents effect sizes plotted against their standard errors (SE) or precision [18]. In the presence of PB, the plot is expected to be asymmetrical. However, a variety of other factors may also lead to asymmetrical funnel plots, such as inflated effects in smaller studies, true heterogeneity, artefactual effects, and chance [19]. Another issue with the visual inspection of funnel plots for detecting PB is the subjectivity of the approach, which in turn leads to errors in interpretation [20]. Thus, statistical tests have also been proposed to detect funnel plot asymmetries, such as Begg's rank test [21], Egger's regression [19], Harbord's regression [22], Peters' regression [23], and Deeks' regression [24]. The trim-and-fill method, a nonparametric rank-based correction method, was proposed to recover symmetry by "trimming" observed studies and subsequently imputing missing studies [25].
It has been proposed that reviewers use various tests and methods to detect PB in their research because different tests make different assumptions on the association between the effect sizes and their precision measures [26]. The Cochrane Handbook [27] has also made some other recommendations:
• Tests should be used only when there are at least 10 studies included in the meta-analysis.
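As an illustration of how a regression-based asymmetry test works, the following is a minimal sketch of Egger's regression in plain Python. It assumes the classical formulation, in which the standardized effect (effect divided by its SE) is regressed on precision (1/SE) by ordinary least squares and the intercept is examined; the function name egger_regression is hypothetical and this is not the implementation used by any particular package. The sketch returns the t statistic and its degrees of freedom, leaving the P value lookup to a t-distribution table or a statistics library.

```python
import math


def egger_regression(effects, std_errors):
    """Sketch of Egger's regression (assumed classical form).

    Regresses effect/SE on 1/SE by ordinary least squares and returns
    (intercept, intercept_se, t_statistic, degrees_of_freedom).
    An intercept far from zero suggests funnel plot asymmetry.
    """
    n = len(effects)
    if n != len(std_errors) or n < 3:
        raise ValueError("need at least 3 studies with matching effects and SEs")

    x = [1.0 / se for se in std_errors]                  # precision
    y = [e / se for e, se in zip(effects, std_errors)]   # standardized effect

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all studies have the same precision")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Standard error of the intercept from the residual variance.
    df = n - 2
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    intercept_se = math.sqrt((residual_ss / df) * (1.0 / n + mean_x ** 2 / sxx))

    if intercept_se == 0:
        t_stat = 0.0 if intercept == 0 else math.copysign(math.inf, intercept)
    else:
        t_stat = intercept / intercept_se
    return intercept, intercept_se, t_stat, df
```

In this formulation, effects that grow linearly with their SE (a small-study effect) show up as a nonzero intercept, whereas identical effects across studies of different sizes give an intercept of approximately zero.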

Objectives
To what extent have the MAs in the field of ENT in 2021 searched for evidence beyond bibliographic databases and assessed the risk of PB?

SR selection
Due to limited resources (mostly reviewers), only six journals were included in this research. These six journals were selected from the ten journals with the highest impact factor in the field of ENT (identified through Google Scholar) following a consensus between a group of four attending ENT surgeons at the Tehran University of Medical Sciences. The attending ENT surgeons who selected these journals were blinded to the aim of the study and were told to select six of those ten journals that they believed to (1) have been the most influential in the field in the past decade and (2) be known to them and their peers as publishing some of the SRs with the highest impact on their clinical practice. Although this process had the potential to introduce some selection bias, we judged that it might be the best tactic for assessing the presence of PB in some of the most influential SRs in the field, given our limited review resources. The selected journals included the following: Audiology and Neurotology, Ear & Hearing, International Forum of Allergy & Rhinology, Otolaryngology-Head and Neck Surgery, Rhinology, and The Laryngoscope. PubMed was searched for the papers published in these journals in the year 2021. The "Systematic Review" and "Meta-Analysis" filters of PubMed were activated to narrow the search results. Also, those journals were hand-searched to make sure that no related studies were missed in our search.
The results of the search were imported into the Mendeley Reference Manager application, a software designed for the management of citations. Then, using the "Look up metadata by DOI" feature of the application, all the metadata of the records were updated. Two reviewers retrieved the full texts of the records and independently assessed them for eligibility. Discrepancies were resolved through discussion. Studies that had an MA component were included.

Data items
Two reviewers independently extracted the data from eligible studies. Discrepancies were resolved through discussion. The following data were gathered:
• Study characteristics
• Reporting guideline
• Bibliographic databases and citation indexes searched
• Other sources of data (clinical trials registries, grey literature, citation checking, etc.)
• Whether studies in a language other than English were considered eligible for inclusion
• The number of studies included in the SR
• If assessed, the method used to assess PB
• Number of papers that cited the SR (as a measure of the influence the SR had in the field)
• Review type
The review type was determined according to the 10 categories proposed by a typology study [28]: effectiveness, experiential, costs/economic evaluation, prevalence and/or incidence, diagnostic test accuracy (DTA), etiology and/or risk, expert opinion/policy, psychometric, prognostic, and methodology. In cases where one SR fell into more than one category, we considered the main objective of the SR to determine the review type.

Publication bias assessment
We recorded the methods used for assessing PB in each SR. It must be noted that PB is an issue at the outcome level rather than the study level. In this study, we only assessed the primary outcome in each SR for PB. For SRs that did not evaluate the risk of PB, we assessed it by replicating the results of their MAs (using the reported effect measures for each included study), designing funnel plots, using the trim-and-fill method, and performing Begg's rank test and Egger's regression (or Deeks' regression in case of DTA SRs). Only SRs that included at least 10 studies in their MA were assessed in this way because when there are fewer than 10 studies, the power of the statistical tests for assessing the risk of PB is relatively low [4].
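The replication step can be sketched as follows. This is only an illustration: it assumes a common-effect (fixed-effect) inverse-variance model, whereas the text does not state which pooling model was used for each replicated MA, and the names pool_fixed_effect and eligible_for_asymmetry_tests are hypothetical. The 10-study threshold is the one stated above.

```python
import math

# Threshold stated in the text: asymmetry tests only for MAs with >= 10 studies.
MIN_STUDIES_FOR_ASYMMETRY_TESTS = 10


def pool_fixed_effect(effects, std_errors):
    """Inverse-variance pooled estimate under an assumed common-effect model.

    Returns (pooled_effect, pooled_standard_error).
    """
    if len(effects) != len(std_errors) or not effects:
        raise ValueError("effects and std_errors must be non-empty and equal in length")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


def eligible_for_asymmetry_tests(n_studies):
    """True when an MA has enough studies for funnel plot asymmetry tests."""
    return n_studies >= MIN_STUDIES_FOR_ASYMMETRY_TESTS
```

A replicated pooled estimate that matches the published one gives some reassurance that the extracted study-level data are correct before any asymmetry test is run on them.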
We used the "meta" package [30] in R version 4.2 [29] to perform analyses and assess PB in SRs where it was not assessed. The significance level for PB tests was set at P < 0.10, as it is the commonly used threshold for such tests due to their low statistical power.
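To make the testing step concrete, here is a minimal Python sketch of Begg's rank test evaluated at the P < 0.10 threshold. It is not the R "meta" implementation used in the study. It assumes the usual formulation (Kendall's tau between standardized effect deviates and their variances, with a normal approximation, no tie correction, and no continuity correction), and the function name begg_rank_test is hypothetical.

```python
import math
from statistics import NormalDist

ALPHA = 0.10  # significance level for PB tests stated in the text


def begg_rank_test(effects, variances):
    """Sketch of Begg's rank correlation test (assumed standard form).

    Returns (z, two_sided_p) using the normal approximation to Kendall's tau,
    without tie or continuity corrections.
    """
    n = len(effects)
    if n != len(variances) or n < 3:
        raise ValueError("need at least 3 studies with matching effects and variances")

    inv_var_sum = sum(1.0 / v for v in variances)
    pooled = sum(e / v for e, v in zip(effects, variances)) / inv_var_sum

    # Standardized deviates: the variance of (effect_i - pooled) is v_i - 1/sum(1/v_j).
    deviates = [
        (e - pooled) / math.sqrt(v - 1.0 / inv_var_sum)
        for e, v in zip(effects, variances)
    ]

    # Concordant minus discordant pairs between deviates and variances.
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (deviates[i] - deviates[j]) * (variances[i] - variances[j])
            if product > 0:
                s += 1
            elif product < 0:
                s -= 1

    z = s / math.sqrt(n * (n - 1) * (2 * n + 5) / 18.0)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value
```

Under this sketch, an MA would be flagged as being at risk of PB when the returned P value falls below ALPHA. The same threshold would apply to the regression-based tests.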

Study flow
The search on PubMed returned 188 records. After updating the metadata, 57 records were removed as they were not published in 2021. After screening the full-text papers, 56 reports were also excluded. Notably, one SR that was conducted on individual patient data (IPD) was excluded because, although these studies are also at risk of PB [31], the unit of analysis in our research is a study and not an individual patient. Finally, 75 SRs were included in this study (see the Appendix for the characteristics of the included studies). Figure 1 shows the flow of study selection.

Basic characteristics of contributing SRs
The basic characteristics of the included SRs are presented in Table 1.
The SR types were as follows: 34 effectiveness, 24 etiology and/or risk, 10 prevalence and/or incidence, 5 prognostic, and 2 DTA. Most of the SRs followed PRISMA [32] or MOOSE [33] reporting guidelines. Only 9 SRs did not report the use of a standard reporting guideline.
The included SRs had a median of 4 citations per SR (interquartile range (IQR): 1-8). The highest number of citations to an SR was 107 [34], while 9 SRs were not cited by any other paper at all.

Literature search

Bibliographic databases and citation indexes
On average, 3.8 bibliographic databases and citation indexes were searched in the included SRs. The data sources searched included APA PsychINFO, CAB Abstracts, CINAHL, CNKI, Cochrane Library, CQVIP, Embase, EmCare, Google Scholar, LILACS, MEDLINE, OTseeker, ProQuest, PubMed, SciELO, ScienceDirect, Scopus, Wanfang Data, and Web of Science. PubMed was the most searched database with 56 SRs (74.7%) searching it, followed by Embase, which was searched by 51 SRs (68.0%). Figure 2 shows how many SRs searched each of these databases.

Other sources and languages
Backward citation checking was the most prevalent technique used for finding other potentially eligible records (50 SRs, 66.7%). On the other hand, forward citation checking was rarely used (4 SRs, 5.3%). Also, 3 SRs (4.0%) hand-searched the relevant journals or contacted experts in the field. Finally, 1 SR used the "related articles" feature of PubMed to search for other studies.
The search for unpublished studies and grey literature was also less than satisfactory. One study searched for the abstracts from the related conferences, one study searched for theses (using ProQuest), one study searched the National Rehabilitation Information Center, and one study searched the Chulalongkorn Medical Library.
Most of the included SRs also used language restrictions in their search strategy. Only 18 SRs (24.0%) searched for studies in languages other than English, while 6 SRs did not report if they utilized any language restrictions in their search.

Included studies
On average, 24.8 studies were included in the SRs (median 16, IQR 10.5-30.5). The maximum number of studies included in an SR was 98, while the minimum was 3. For MAs, an average of 21.9 studies were included (median 13, IQR 7.5-26.5). The maximum and the minimum number of studies included in an MA were similar to the aforementioned numbers.

Publication bias

Assessment for publication bias
About half of the included SRs used at least one method to assess the potential risk of PB (38/75, 50.7%), while the other SRs failed to acknowledge PB at all. Figure 3 shows the ratio of SRs that assessed PB per journal. Visual inspection of the funnel plot was the most used method for assessing PB across the included SRs (34/38, 89.5%). Among tests used to assess funnel plot asymmetry, Egger's regression was the most widely used (25/38, 65.8%), followed by Begg's rank test (4/38, 10.5%). Two SRs used Harbord's regression and one used Peters' regression. Finally, 4 SRs used the trim-and-fill method to assess the effect of missing studies.
Of 38 SRs that assessed PB, 26 (68.4%) used at least two of the mentioned methods, while 6 (15.8%) used at least three methods. Also, 4 SRs did not design a funnel plot but relied solely on statistical tests for funnel plot asymmetry to assess the potential risk of PB. This finding was notable because it is always advised that the results of tests for funnel plot asymmetry must be interpreted in the light of visual inspection of the plot [4]. Also, both included DTA SRs used Egger's regression test, instead of Deeks' regression, to assess funnel plot asymmetry.

Presence of publication bias
Of the 38 SRs that assessed PB, 11 (28.9%) reported a considerable risk for its presence. We tried to assess the risk of PB in the remaining 37 SRs. In the process, we found that 22 SRs had included fewer than 10 studies in their MAs for any single outcome and, thus, were ineligible for PB risk assessment through statistical tests. Furthermore, replicating the MA for 4 SRs was impossible, mostly due to an incomplete report of the data or statistical methods used. Eventually, we managed to assess the risk of PB in 11 additional SRs. Notably, 7 of these SRs (63.6%) were found to be at a considerable risk of PB. Overall, 49 SRs were assessed for the risk of PB, out of which 18 (36.7%) were found to be at considerable risk.
We also examined in more depth the SRs that assessed PB and found a high risk of its presence, to check whether they took measures to assess its impact on their MA results or considered it when drawing their conclusions. Of the 11 SRs that reported a high risk of PB in their included studies, 3 (27.3%) used further methods to estimate intervention effects "corrected" for the effects of PB, such as the trim-and-fill method or sensitivity analyses. Also, only 7 of those 11 SRs (63.6%) took this risk into consideration when making conclusions, by either downgrading the certainty of the evidence or discussing the potential impact that PB might have on their results.
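The counts reported in the first paragraph of this section can be cross-checked with simple arithmetic. The sketch below uses only the numbers stated in that paragraph. The variable names and the pct helper are hypothetical.

```python
def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


total_srs = 75                 # SRs included in the study
assessed_by_authors = 38       # SRs whose authors assessed PB
high_risk_reported = 11        # of those, SRs reporting considerable risk

not_assessed = total_srs - assessed_by_authors                      # 37
too_few_studies = 22           # fewer than 10 studies per outcome
not_replicable = 4             # incomplete data or methods
newly_assessed = not_assessed - too_few_studies - not_replicable    # 11
high_risk_new = 7              # newly assessed SRs at considerable risk

total_assessed = assessed_by_authors + newly_assessed               # 49
total_high_risk = high_risk_reported + high_risk_new                # 18

print(pct(high_risk_reported, assessed_by_authors))  # 28.9
print(pct(high_risk_new, newly_assessed))            # 63.6
print(pct(total_high_risk, total_assessed))          # 36.7
```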

Possible factors contributing to the risk of publication bias
To test for the possible factors that may have affected the risk of PB, a series of Pearson's chi-square tests and t-tests were conducted. However, none of these tests indicated a statistically significant association. The results of these tests are provided in Table 2.
As these results indicate, the chances of PB were slightly altered following the inclusion of languages other than English and sources other than bibliographic databases in the search strategy, but none of these associations were statistically significant. The number of bibliographic databases and citation indexes searched also did not differ significantly between the studies at low risk of PB and those at high risk of PB.
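For readers unfamiliar with the test, the following sketch shows how a Pearson's chi-square test and an odds ratio could be computed for a 2x2 table, such as language restriction (yes/no) against risk of PB (high/low). It assumes no continuity correction, which the text does not specify, and the function name chi_square_2x2 is hypothetical. The actual cell counts are in Table 2 and are not reproduced here.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson's chi-square test (no continuity correction) for the table

        [[a, b],
         [c, d]]

    Returns (chi2, p_value, odds_ratio).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column must contain at least one observation")

    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (a * d) / (b * c) if b * c != 0 else math.inf
    return chi2, p_value, odds_ratio
```

With small expected cell counts, which are plausible given the number of SRs assessed, the chi-square approximation becomes unreliable and an exact test may be preferable.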

Interpretation of results
In this study, we aimed to evaluate to what extent the SRs and MAs published in some of the most influential journals of ENT use methods to reduce the risk of PB and different techniques to assess the risk of its presence. Our findings revealed that this issue is not addressed optimally in a considerable proportion of the SRs. First, the search strategies used in these SRs were not comprehensive enough to mitigate the risk of PB. Most SRs restricted their search to papers published in English, thus suffering from a great risk of language bias. Although previous studies have shown that the impact of language bias is negligible on the results of an SR in most circumstances [35][36][37], exceptions have also been observed [38][39][40]. As a result, Cochrane recommends that language restrictions should not be used except in the setting of rapid reviews, and even in that setting, their use should be justified by the reviewers [4]. Also, most of the SRs did not search for other sources of data or grey literature. This issue is of great importance as it has been found that such data can seriously affect the results of an SR [41,42]. Specifically, including a grey literature search should be seriously considered when conducting an SR because an association between "statistically significant" results and publication has been documented in previous studies [4].
Another finding of interest was that almost half of the SRs did not assess the risk of PB. This finding becomes more striking knowing that our analyses revealed that the risk of PB was considerably higher in the SRs that did not assess the risk of its presence (63.6% vs 28.9%). The reason behind this phenomenon is unknown, but some of the potential reasons could be as follows: (a) reviewers trying not to downgrade the confidence in their results; (b) lack of methodological expertise and knowledge for assessing the risk of PB, which also resulted in designing poor search strategies; and (c) chance alone. Nevertheless, the journal editors and reviewers should ask the authors to assess the risk of PB in their SRs whenever feasible.
More importantly, we saw that in many of the cases where reviewers found a high risk for PB in their SRs, they did not try to estimate the intervention effect corrected for the impact of PB, take the risk of PB into account when making conclusions, or expand their search to reduce the risk of PB presence. This issue should be specifically noted by journal editors, who should ask the authors to include other sources of data as well when the risk of PB is assessed to be high, in an attempt to avoid publishing inflated results as much as possible.
Another finding of interest was the inappropriate use of methods to assess the risk of PB. Although this problem was not frequent across the SRs, some used inappropriate tests to assess funnel plot asymmetry, such as using Egger's regression test instead of Deeks' regression in the setting of DTA SRs or using statistical tests alone with no visual inspection of the funnel plot beforehand. Both journal editors and reviewers must note that the results of statistical tests for funnel plot asymmetry should be interpreted in light of the visual inspection of the plot, as all these tests are known to have low statistical power [27]. Other factors should also be considered for using such tests, such as the fact that they are not recommended for cases when there are fewer than 10 studies included in the MA or that they should not be used when studies are of similar size [27]. Using contour-enhanced funnel plots is also highly desirable as they help with differentiating the reasons for funnel plot asymmetry [43].
Finally, we assessed some possible factors that might have contributed to the risk of PB presence. Surprisingly though, none of those factors (language restriction, a search of sources other than bibliographic databases, and the number of databases searched) had a statistically significant association with the presence of PB. There are several possible reasons for this. First, it might be due to the small sample size of SRs included in the test. Another reason could be that some risk of PB was inevitable even in the absence of language restriction of the search, seeking other sources of data, and searching a large number of databases. Nevertheless, the results of these tests do not exclude the possibility that implementing these measures will most probably reduce the risk of PB.

Implications
Our findings indicate a lack of methodological rigor in the SRs published in the most influential journals of the field, which in turn might have led to the dissemination of inflated results. We strongly encourage future reviewers and editors of journals to take the issue of PB seriously and to require authors to take measures to reduce its risk and use appropriate methods to assess its possible presence. As PB is an issue at the outcome level, we also encourage future reviewers who want to conduct a study similar to ours in their fields to assess whether the SRs that evaluated the risk of PB for their primary outcome did the same for their secondary outcomes. Finally, if feasible, we encourage future researchers who want to conduct a similar study to use more robust selection criteria for including SRs, as our criteria, which were a necessity due to the lack of enough review resources in our team, might have introduced some degree of selection bias in the results. Overall, PB is a serious issue that can result in the dissemination of inflated results, and thus, the whole scientific community is encouraged to take this phenomenon into more careful consideration, especially when conducting an SR.

Fig. 1 The flow of study selection

Fig. 2 Number of included systematic reviews (SRs) that searched each bibliographic database and citation index

Fig. 3 Ratio of included systematic reviews (SRs) that assessed publication bias per journal

Table 1 Basic characteristics of contributing studies. SR: systematic review

Table 2 The possible sources of publication bias. OR: odds ratio