Better duplicate detection for systematic reviewers: evaluation of Systematic Review Assistant-Deduplication Module

Table 1 SRA-DM algorithm changes

Iterations	Changes to algorithms
First iteration	Matching criteria were based on simple field comparison (ignoring punctuation), with checks against the year field. The year field is the least error-prone because it is restricted to the digits 0–9.
Second iteration	Short-format page numbers were converted to full format (e.g. 221–6 became 221–226), and the algorithm was further modified to increase sensitivity by matching on authors OR title.
Third iteration	Matching required author AND title, and the non-reference fields were extended from only ‘year’ to year OR volume OR edition.
Fourth iteration	The fourth algorithm extended the matching criteria of the third algorithm with an improved name-matching system. Using fuzzy logic, it accommodated author name variations such as initialisation, punctuation and rearranged author listings. For example, the following names are all treated as equivalent and will match as identical authors:
	1. William Shakespeare
	2. W. Shakespeare
	3. W Shakespeare
	4. William John Shakespeare
	5. William J. Shakespeare
	6. W. J. Shakespeare
	7. W J Shakespeare
	8. Shakespeare, William
	9. Shakespeare, W
	10. Shakespeare, W, A
	11. Shakespeare, W, A, B, C
	12. William Shakespeare 1st
	13. William Shakespeare 2nd
	14. William Shakespeare IV
	15. William Adam Bob Charles Shakespeare XVI
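
The rules in Table 1 can be illustrated with a minimal Python sketch. It is not the SRA-DM implementation: the function names, the record keys (`authors`, `title`, `year`, `volume`, `edition`) and the decision to compare authors by surname plus first initial are all assumptions, chosen because that rule reproduces the fifteen equivalences listed above. It uses deterministic normalisation rather than the fuzzy logic the table mentions.

```python
import re

# Generational suffixes such as "1st", "2nd", "IV", "XVI".
# Only upper-case Roman numerals are recognised (an assumption of this sketch).
_SUFFIX = re.compile(r"^(?:\d+(?:st|nd|rd|th)|[IVXLCDM]+)$")


def expand_pages(pages: str) -> str:
    """Expand a short-format page range, e.g. '221-6' to '221-226'."""
    m = re.fullmatch(r"\s*(\d+)\s*[-–]\s*(\d+)\s*", pages)
    if m is None:
        return pages.strip()  # not a simple range; leave unchanged
    start, end = m.groups()
    if len(end) < len(start):
        # Borrow the missing leading digits from the start page.
        end = start[: len(start) - len(end)] + end
    return f"{start}-{end}"


def normalise(text: str) -> str:
    """Lower-case and drop punctuation and whitespace for field comparison."""
    return re.sub(r"[\W_]+", "", text.lower())


def author_key(name: str) -> tuple:
    """Reduce a name to (surname, first initial); hypothetical matching rule."""
    cleaned = name.replace(".", " ")
    if "," in cleaned:
        # "Surname, Given, ..." form
        parts = [p.strip() for p in cleaned.split(",") if p.strip()]
        if not parts:
            return ("", "")
        surname = parts[0]
        given = " ".join(parts[1:]).split()
    else:
        # "Given ... Surname [suffix]" form
        tokens = cleaned.split()
        while len(tokens) > 2 and _SUFFIX.match(tokens[-1]):
            tokens.pop()  # strip trailing "1st", "IV", ...
        surname = tokens[-1] if tokens else ""
        given = tokens[:-1]
    initial = given[0][0] if given else ""
    return (surname.lower(), initial.lower())


def is_duplicate(a: dict, b: dict) -> bool:
    """Author AND title must match, plus at least one of year/volume/edition."""
    same_authors = ({author_key(n) for n in a["authors"]}
                    == {author_key(n) for n in b["authors"]})
    same_title = normalise(a["title"]) == normalise(b["title"])
    same_other = any(
        a.get(f) and normalise(str(a.get(f))) == normalise(str(b.get(f) or ""))
        for f in ("year", "volume", "edition")
    )
    return same_authors and same_title and bool(same_other)
```

Comparing author lists as sets makes the check insensitive to rearranged author order. Empty year, volume or edition fields are never counted as a match, which is a design choice of this sketch rather than something the table specifies.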

ISSN: 2046-4053