Iterations | Changes to algorithms |
---|---|
First iteration | Matching criteria were based on simple field comparison (ignoring punctuation), with checks against the year field. This field has a lower probability of errors because it is restricted to the digits 0–9, which makes it the field least likely to be mistaken. |
Second iteration | Short-format page numbers were converted to full format (e.g. 221–6 became 221–226), and the algorithm was further modified to increase sensitivity by incorporating matching criteria on authors OR title. |
Third iteration | Matching required authors AND title, and the non-reference fields used for checking were extended from year only to year OR volume OR edition. |
Fourth iteration | The fourth algorithm extended the matching criteria of the third algorithm with an improved name-matching system. This system used fuzzy logic to accommodate author name variations, i.e. initialisation, punctuation and rearranged author listings. For example, the following names are all treated as equivalent and will match as identical authors: |
| | 1. William Shakespeare |
| | 2. W. Shakespeare |
| | 3. W Shakespeare |
| | 4. William John Shakespeare |
| | 5. William J. Shakespeare |
| | 6. W. J. Shakespeare |
| | 7. W J Shakespeare |
| | 8. Shakespeare, William |
| | 9. Shakespeare, W |
| | 10. Shakespeare, W, A |
| | 11. Shakespeare, W, A, B, C |
| | 12. William Shakespeare 1st |
| | 13. William Shakespeare 2nd |
| | 14. William Shakespeare IV |
| | 15. William Adam Bob Charles Shakespeare XVI |
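
Two of the steps in the table can be illustrated with a short sketch: the page-range expansion of the second iteration and the author-name matching of the fourth. The code below is not the authors' implementation. It replaces the fuzzy logic described above with a much simpler rule, assumed here only for illustration, that reduces each name to its surname and first initial. The function names `expand_pages`, `author_key` and `same_author`, and the suffix pattern, are hypothetical.

```python
import re

# Assumed suffix pattern: ordinals (1st, 2nd), upper-case Roman numerals, Jr/Sr.
_SUFFIX = re.compile(r"^(?:\d+(?:st|nd|rd|th)|[IVX]+|Jr|Sr)$")


def expand_pages(pages):
    """Expand a short page range such as '221-6' to '221-226'."""
    parts = re.split(r"\s*[-\u2013]\s*", pages.strip(), maxsplit=1)
    if len(parts) != 2:
        return pages.strip()  # single page or unrecognised format
    start, end = parts
    if start.isdigit() and end.isdigit() and len(end) < len(start):
        # Borrow the missing leading digits from the start page.
        end = start[: len(start) - len(end)] + end
    return f"{start}-{end}"  # normalised to a plain hyphen


def author_key(name):
    """Reduce an author name to (surname, first initial), both lower-case."""
    cleaned = name.replace(".", " ").strip()
    if "," in cleaned:
        # "Surname, Given ..." form: the surname precedes the first comma.
        surname, _, rest = cleaned.partition(",")
        surname = surname.strip()
        given = rest.replace(",", " ").split()
    else:
        tokens = cleaned.split()
        # Drop trailing generational suffixes, always keeping two tokens.
        while len(tokens) > 2 and _SUFFIX.match(tokens[-1]):
            tokens.pop()
        surname = tokens[-1] if tokens else ""
        given = tokens[:-1]
    initial = given[0][0] if given else ""
    return surname.lower(), initial.lower()


def same_author(a, b):
    """Treat two names as the same author when their keys agree."""
    return author_key(a) == author_key(b)
```

Under this rule all fifteen variants listed in the table reduce to the same key. The rule is deliberately loose: it would also match "Wilma Shakespeare" to "William Shakespeare", so a real system would need additional fields (title, year, volume) to confirm a duplicate, as the iterations above describe.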