What is Fuzziness?

Fuzziness (Edit/Levenshtein Distance) is a matching technique that allows for small variations in spelling between a search term and the entities returned in the search results. Fuzziness allows one typo per word in the search term; the fuzziness percentage relates mainly to the length of the word. The setting you choose depends entirely on your risk-based approach and on how confident you are that the names you search for are correct (e.g. whether you take the information directly from the customers' IDs, or the customers enter it themselves, which is more prone to error).


First AML's fuzziness setting is set at 30% across all jurisdictions.


Please note that the impact of Edit/Levenshtein Distance is inversely related to the length of the name. As the search term gets longer, a single deviation in spelling becomes relatively less significant.


For example, Leederheimer and Lexderheimer are far more likely to be misspellings of each other than Lee and Lex.
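To make the length effect concrete, here is a minimal Python sketch of the standard Levenshtein distance applied to the two name pairs above. The function names and the "relative change" measure (edit distance divided by the length of the longer name) are illustrative assumptions only; they are not First AML's implementation or the formula behind the fuzziness percentage.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character insertions, deletions, or
    substitutions needed to turn a into b (standard Levenshtein)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def relative_change(a: str, b: str) -> float:
    """Hypothetical measure: edit distance as a share of the longer name."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


for a, b in [("Lee", "Lex"), ("Leederheimer", "Lexderheimer")]:
    print(a, b, levenshtein(a, b), f"{relative_change(a, b):.0%}")
# Lee Lex 1 33%
# Leederheimer Lexderheimer 1 8%
```

Both pairs differ by exactly one character, but that one character is a third of the short name and only about a twelfth of the long one.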


Exact Match

Difference between 0% fuzziness and an exact match:

  • An exact match does not allow for extra words to be added, e.g. Robert Mugabe will not match with Robert Gabriel Mugabe.
  • We allow a +/- 1 year difference in Year of Birth when fuzziness is between 10% and 100%. For an exact match and for 0% fuzziness, the Year of Birth has to match exactly.
  • An exact match does not apply any preprocessing. For example, we do not strip out honorifics or suffixes such as Mr./Ms./Dr./PhD.



Why is fuzziness useful?


It allows for variations in the spelling of the search term. If you misspell a search term or are unsure of its spelling, the results will include entities whose spelling differs from the search term by an inserted, omitted, or replaced character. This is also useful when searching for names written in non-Latin scripts. Fuzziness is not performed on non-Latin characters directly. Instead, the search term is first converted from the native non-Latin text into Latin characters, and fuzziness is conducted on that Latin transliteration. Transliteration can introduce variation in spelling compared with the original non-Latin text, so a higher fuzziness setting for non-Latin names is useful to prevent false negatives.



Impact on false positives 


The Levenshtein distance algorithm has been tested extensively (both internally and by independent third-party consultants) across different names and name variations in our database. To reduce false positives, we have capped the maximum edit distance at one character. This allows for spelling errors and variations without returning large numbers of unnecessary false positives.
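As an illustration of what a one-character cap means in practice, the Python sketch below checks whether two words differ by at most one inserted, omitted, or replaced character. The function name within_one_edit is hypothetical, and the comparison is case-sensitive with no preprocessing; this is a sketch of the general idea, not the production matching logic.

```python
def within_one_edit(a: str, b: str) -> bool:
    """Return True if a and b are identical or differ by exactly one
    inserted, omitted, or replaced character (edit distance <= 1)."""
    if abs(len(a) - len(b)) > 1:
        return False  # a length gap of 2 or more needs at least 2 edits
    if len(a) > len(b):
        a, b = b, a   # make a the shorter (or equal-length) word

    # Skip the common prefix.
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1

    if len(a) == len(b):
        # Same length: at most one replaced character, rest must match.
        return a[i + 1:] == b[i + 1:]
    # b is one character longer: skip the extra character in b.
    return a[i:] == b[i + 1:]


print(within_one_edit("Leederheimer", "Lexderheimer"))  # True (one replaced character)
print(within_one_edit("Mugabe", "Mugab"))               # True (one omitted character)
print(within_one_edit("Robert", "Rupert"))              # False (two replaced characters)
```

Under a cap like this, a name that needs two or more character changes to match the search term is not returned, which is what keeps the number of false positives down.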