Table 1 SRA-DM algorithm changes

From: Better duplicate detection for systematic reviewers: evaluation of Systematic Review Assistant-Deduplication Module

Iterations Changes to algorithms
First iteration Matching criteria were based on simple field comparison (ignoring punctuation), with checks against the year field. The year field has a lower probability of errors because it is restricted to the digits 0–9, which makes it the field least likely to be mistaken.
Second iteration Short-format page numbers were converted to full format (e.g. 221–6 became 221–226), and the algorithm was further modified to increase sensitivity by incorporating matching criteria on authors OR title.
Third iteration Records were matched on author AND title, with the non-reference fields extended from ‘year’ only to year OR volume OR edition.
Fourth iteration The fourth algorithm extended the matching criteria of the third algorithm with an improved name-matching system. The system was aware of author name variations (use of initials, punctuation and rearranged author listings) and used fuzzy logic so that these differences could be accommodated. For example, the following names are all treated as equivalent and will match as identical authors:
1. William Shakespeare
2. W. Shakespeare
3. W Shakespeare
4. William John Shakespeare
5. William J. Shakespeare
6. W. J. Shakespeare
7. W J Shakespeare
8. Shakespeare, William
9. Shakespeare, W
10. Shakespeare, W, A
11. Shakespeare, W, A, B, C
12. William Shakespeare 1st
13. William Shakespeare 2nd
14. William Shakespeare IV
15. William Adam Bob Charles Shakespeare XVI
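
As a rough illustration of two steps in the table above, the Python sketch below expands short-format page ranges (second iteration) and compares author names in a way that tolerates initials, punctuation, inverted order and generational suffixes (fourth iteration). It is not the SRA-DM implementation, which the table does not describe in detail: the function names `expand_pages`, `parse_name` and `same_author` are hypothetical, and the rule of comparing only the surname and the first given name is an assumption chosen so that the 15 example names match one another.

```python
import re

# Assumption: a suffix is an ordinal ("1st", "2nd") or an upper-case
# Roman numeral of at least two letters ("IV", "XVI").
SUFFIX = re.compile(r"\d+(st|nd|rd|th)|[IVX]{2,}")


def expand_pages(pages: str) -> str:
    """Expand a short-format page range, e.g. '221-6' -> '221-226'."""
    m = re.fullmatch(r"\s*(\d+)\s*[-–]\s*(\d+)\s*", pages)
    if not m:
        return pages.strip()  # not a simple range: leave it alone
    start, end = m.group(1), m.group(2)
    if len(end) < len(start):
        # Borrow the missing leading digits from the start page.
        end = start[: len(start) - len(end)] + end
    return f"{start}-{end}"


def parse_name(name: str) -> tuple[str, list[str]]:
    """Return (surname, given_names), lower-cased, punctuation removed."""
    if "," in name:
        # "Surname, Given ..." form: the surname precedes the first comma.
        surname, _, rest = name.partition(",")
        given = re.findall(r"\w+", rest)
        return surname.strip().lower(), [g.lower() for g in given]
    tokens = re.findall(r"\w+", name)
    # Drop trailing suffixes, but always keep at least two tokens.
    while len(tokens) > 2 and SUFFIX.fullmatch(tokens[-1]):
        tokens.pop()
    if not tokens:
        return "", []
    return tokens[-1].lower(), [t.lower() for t in tokens[:-1]]


def same_author(a: str, b: str) -> bool:
    """Assumed rule: same surname and a compatible first given name."""
    surname_a, given_a = parse_name(a)
    surname_b, given_b = parse_name(b)
    if surname_a != surname_b:
        return False
    if not given_a or not given_b:
        return True  # nothing to contradict the surname match
    first_a, first_b = given_a[0], given_b[0]
    if len(first_a) > 1 and len(first_b) > 1:
        return first_a == first_b  # both spelled out: must agree
    return first_a[0] == first_b[0]  # otherwise compare initials
```

Under these assumptions `same_author("W. J. Shakespeare", "Shakespeare, William")` is true, while two different spelled-out first names (for example William and Walter) do not match. A single-letter suffix such as "V" is not stripped, because it cannot be told apart from an initial, and rearranged lists of several authors are outside the scope of this sketch.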