
Table 1 SRA-DM algorithm changes

From: Better duplicate detection for systematic reviewers: evaluation of Systematic Review Assistant-Deduplication Module

Iterations

Changes to algorithms

First iteration

Matching criteria were based on simple field comparison (ignoring punctuation), with an additional check against the year field. The year field was chosen because it is the least error-prone: it is restricted to the digits 0–9.
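As an illustration only, the first iteration's rule can be sketched in Python. The record layout (a dict of strings), the field names `author` and `title`, and the helpers `normalise` and `is_duplicate_v1` are hypothetical. The table does not say which fields were compared or exactly how punctuation was removed.

```python
import string

# Translation table that deletes ASCII punctuation (assumed normalisation).
_PUNCT = str.maketrans("", "", string.punctuation)


def normalise(value: str) -> str:
    """Lowercase, strip punctuation and collapse whitespace."""
    return " ".join(value.translate(_PUNCT).lower().split())


def is_duplicate_v1(a: dict, b: dict, fields=("author", "title")) -> bool:
    """Sketch of iteration 1: year must agree, then every compared field must match."""
    # Year is checked first because it is treated as the least error-prone field.
    if a.get("year", "").strip() != b.get("year", "").strip():
        return False
    # Hypothetical field list: exact equality after punctuation is ignored.
    return all(normalise(a.get(f, "")) == normalise(b.get(f, "")) for f in fields)
```

Because every compared field must match exactly after normalisation, a single typo in a title is enough to miss a duplicate, which is the kind of sensitivity gap the later iterations address.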

Second iteration

Short-format page numbers were converted to full format (e.g. 221–6 became 221–226), and the algorithm was further modified to increase sensitivity by matching on authors OR title.
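A minimal sketch of the two second-iteration changes follows, assuming records are dicts of strings. The helper names (`expand_pages`, `is_duplicate_v2`), the field names, and keeping the year check from the first iteration are assumptions. The table also does not say how the expanded page ranges feed into matching, so the sketch only shows the conversion itself.

```python
import re
import string

_PUNCT = str.maketrans("", "", string.punctuation)


def normalise(value: str) -> str:
    """Lowercase, strip punctuation and collapse whitespace (assumed normalisation)."""
    return " ".join(value.translate(_PUNCT).lower().split())


def expand_pages(pages: str) -> str:
    """Convert a short-format range such as '221-6' to full format '221-226'."""
    # Accept hyphen, en dash or em dash as the range separator.
    m = re.fullmatch(r"\s*(\d+)\s*[-\u2013\u2014]\s*(\d+)\s*", pages)
    if not m:
        return pages.strip()  # not a simple numeric range; leave unchanged
    start, end = m.groups()
    if len(end) < len(start):
        # Borrow the missing leading digits from the start page.
        end = start[: len(start) - len(end)] + end
    return f"{start}-{end}"


def is_duplicate_v2(a: dict, b: dict) -> bool:
    """Sketch of iteration 2: year must agree, then authors OR title may match."""
    if a.get("year", "").strip() != b.get("year", "").strip():
        return False
    same_authors = normalise(a.get("author", "")) == normalise(b.get("author", ""))
    same_title = normalise(a.get("title", "")) == normalise(b.get("title", ""))
    return same_authors or same_title
```

For example, `expand_pages("1099-102")` yields `"1099-1102"`, while a range that is already in full format is returned unchanged apart from the separator.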

Third iteration

Matching required author AND title, and the non-reference fields checked were extended from ‘year’ alone to year OR volume OR edition.
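The third-iteration rule can be sketched as a conjunction of core fields plus a disjunction of supporting fields. This is a hypothetical illustration: the dict record layout, the field names and the helpers `same_field` and `is_duplicate_v3` are assumptions. Requiring a supporting field to be non-empty in both records is also an assumption, chosen so that two missing values do not count as agreement.

```python
import string

_PUNCT = str.maketrans("", "", string.punctuation)


def normalise(value: str) -> str:
    """Lowercase, strip punctuation and collapse whitespace (assumed normalisation)."""
    return " ".join(value.translate(_PUNCT).lower().split())


def same_field(a: dict, b: dict, field: str) -> bool:
    """True only when the field is non-empty in both records and equal once normalised."""
    x, y = normalise(a.get(field, "")), normalise(b.get(field, ""))
    return bool(x) and x == y


def is_duplicate_v3(a: dict, b: dict) -> bool:
    """Sketch of iteration 3: author AND title, plus year OR volume OR edition."""
    core = same_field(a, b, "author") and same_field(a, b, "title")
    supporting = any(same_field(a, b, f) for f in ("year", "volume", "edition"))
    return core and supporting
```

Widening the supporting check lets a record with a mistyped year still match on volume, while the AND on author and title keeps the rule stricter than the second iteration's OR.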

Fourth iteration

The fourth algorithm extended the matching criteria of the third algorithm with an improved name-matching system. This system was aware of author name variations (initialisation, punctuation and rearranged author listings) and used fuzzy logic to accommodate these differences. For example, the following names are all treated as equivalent and will match as identical authors:

1. William Shakespeare

2. W. Shakespeare

3. W Shakespeare

4. William John Shakespeare

5. William J. Shakespeare

6. W. J. Shakespeare

7. W J Shakespeare

8. Shakespeare, William

9. Shakespeare, W

10. Shakespeare, W, A

11. Shakespeare, W, A, B, C

12. William Shakespeare 1st

13. William Shakespeare 2nd

14. William Shakespeare IV

15. William Adam Bob Charles Shakespeare XVI
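The name variants listed above can be reconciled with a small sketch. It does not reproduce SRA-DM's fuzzy logic. Instead it swaps in a simpler technique: each name is reduced to a (surname, first initial) key, and surnames are then compared with a `difflib.SequenceMatcher` similarity ratio. The helpers `name_key` and `names_match`, the 0.9 threshold, and the suffix rules (ordinals such as "1st" and uppercase Roman numerals such as "XVI") are all hypothetical choices.

```python
import difflib
import re

# Trailing generational suffixes such as "1st", "2nd", "IV" or "XVI" (assumed rules).
_ORDINAL = re.compile(r"\d+(st|nd|rd|th)", re.IGNORECASE)
_ROMAN = re.compile(r"[IVX]+")  # uppercase only, so a surname like "Li" is kept


def _is_suffix(token: str) -> bool:
    return bool(_ORDINAL.fullmatch(token) or _ROMAN.fullmatch(token))


def name_key(name: str) -> tuple[str, str]:
    """Reduce a personal name to (lowercased surname, first initial)."""
    name = name.replace(".", " ")
    if "," in name:
        # Inverted form "Surname, Given, ...": later parts are treated as initials.
        surname_part, *given_parts = name.split(",")
        surname = " ".join(surname_part.split())
        given = " ".join(given_parts).split()
    else:
        tokens = name.split()
        while len(tokens) > 1 and _is_suffix(tokens[-1]):
            tokens.pop()  # drop "1st", "IV", "XVI", ...
        if not tokens:
            return ("", "")
        surname, given = tokens[-1], tokens[:-1]
    initial = given[0][0].lower() if given else ""
    return (surname.lower(), initial)


def names_match(a: str, b: str, threshold: float = 0.9) -> bool:
    """Same first initial and surnames at least `threshold` similar."""
    (sur_a, ini_a), (sur_b, ini_b) = name_key(a), name_key(b)
    if not sur_a or not sur_b or ini_a != ini_b:
        return False
    return difflib.SequenceMatcher(None, sur_a, sur_b).ratio() >= threshold
```

With this sketch all fifteen variants map to the key `("shakespeare", "w")`, and a misspelling such as "Shakespear, W" still matches because its surname ratio is about 0.95. Keying on the first initial only favours sensitivity: "Walter Shakespeare" would also match "William Shakespeare".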