Our findings demonstrated that SRA-DM identifies substantially more duplicate citations than EndNote and has greater sensitivity [(84% vs 51%), (90% vs 63%), (84% vs 73%), (84 vs 64%)]. The specificity of SRA-DM was 100% with no false positives, whereas the specificity of EndNote was imperfect.
Waste in research occurs for several methodological, legislative and reporting reasons [19–22]. Another form of waste is inefficient labouring, in part, as a consequence of non-standardised citations details across bibliographic databases, perfunctory error checking and absence of a unique trial identification number for it and its associated further multiple reports. If these issues were solved at source, manual duplicate checking would be unnecessary. Until these issues are resolved, deploying the SRA-DM will save information specialists and reviewers valuable time by identifying on average a further 42.86% of duplicate records.
Several citations were wrongly designated as duplicates by EndNote auto-deduplication due to different citations sharing the same authors and title but published in other journals or as conference proceedings. In a recent study by Jiang [23], the authors also found that EndNote, for the same reason, had erroneously assigned unique records as duplicates. It is probable that in most scenarios no important loss of data would occur; although sometimes additional methodological or outcome data are reported, and ideally these need to be retained for inspection. A recent study by Qi [18] examined the content of undetected duplicate records in EndNote and found that errors often occurred due to missing or wrong data in the fields, especially for records retrieved from EMBASE database. This also affected the sensitivity of SRA-DM, with duplicates undetected due to missing or wrong or extraneous data in the fields.
During the training and development stage, the four iterations of SRA-DM achieved sensitivities ranging from 68%, 75%, 84% and 96% with the most sensitive (96%) achieved with a trade-off in specificity (99.75%) with three false positives. For systematic reviews and Health Technology Assessment reports, the aim is to conduct comprehensive searches to ensure all relevant trials are identified [24]; thus, losing even three citations is undesirable. Therefore, the final algorithm (fourth iteration) with the lower sensitivity (84%) but perfect (100%) specificity was preferred. Future developments with SRA-DM may incorporate two algorithms, first using the 100% specific algorithm to automatically remove duplicates and another algorithm with higher sensitivity (albeit with lower specificity) to identify the remaining duplicates for manual verification. If this strategy was implemented on the respiratory dataset using the fourth and second algorithm (Table 3), only 91 out of 1,988 citations would have to be manually checked and only 34 duplicates would remain undetected.
In spite of this major improvement with the SRA-DM, no software can currently detect all duplicate records, and the perfect uncluttered dataset remains elusive. Undetected duplicates in SRA-DM occurred due to discrepancies such as missing page numbers or too much variance with author names. Duplicates were also missed because the OVID MEDLINE platform inserted additional extraneous information into the title field (e.g. [Review] [72 refs]) whereas the same article retrieved from EMBASE or other non-OVID MEDLINE platforms (i.e. PubMed, Web of Knowledge) report only the title. Some of these problems could be overcome in the future with record linkage and citation enrichment techniques to populate blank fields with meta-data to increase the detection rate.
Strengths and weaknesses
The deduplication program was developed to identify duplicate citations from biomedical databases and has not been tested on other bibliographic records such as books and governmental reports and therefore may not perform as well with other bibliographies. However, the deduplication program was developed iteratively to remove problems of false positives and was tested on four different datasets which included comprehensive searches using 14 different databases that are used by information specialists, and therefore, similar efficiencies should occur in other medical specialities. Also, the accuracy of SRA-DM was consistently higher than that of EndNote, and these finding are probably generalizable to other biomedical database searches due to the same records types and fields used. It is possible that some duplicates were not detected during the manual benchmarking process, although the database was screened twice first by author and then by title, and additional cross-checking was performed by manually comparing the benchmark against EndNote auto-deduplication and SRA-DM decisions—thus minimising the possibility of undetected duplicates.
Whilst we compared SRA-DM against the typical default EndNote deduplication setting, we recognise that some information specialists adopt additional steps whilst performing deduplication in EndNote. For example, they may employ multi-stage screening or attempt to replace incomplete citations by updating citation fields with the ‘Find References Update’ feature in EndNote. However, many researchers and information specialists do not employ such techniques, and our aim was to address deduplication with an automated algorithm and compare it against the default deduplication process in EndNote. Qi [18] recommended employing a two-step strategy to address the problem of undetected duplicates by first performing auto-deduplication in EndNote followed by manual hand screening to identify remaining duplicates. This basic strategy is used by some information specialists and systematic reviewers but is inefficient due to the large proportion of unidentified duplicates. Other more complex multi-stage screening strategies have been suggested [25] but are EndNote-specific and not viable for other reference management software.