MySQL Levenshtein and Damerau-Levenshtein UDFs

Levenshtein distance is a helpful metric when building a “fuzzy search” or “nearest match” query. It also comes in handy when trying to find and eliminate duplicate records. MySQL doesn’t ship with a Levenshtein function, but a few generous souls have written Levenshtein UDFs: Joshua Drew, Sean Collins (Damerau-Levenshtein), and Nicholas Sherlock. I am providing compiled versions and some guidance on how to compile them yourself.
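For readers who have not met the metric before: the Levenshtein distance between two strings is the minimum number of single-character insertions, deletions, and substitutions needed to turn one into the other. The following is a minimal Python sketch of the standard dynamic-programming approach, included only to illustrate what the UDFs compute. The function name `levenshtein` is my own choice for the illustration and is not taken from any of the UDF sources.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertions, deletions, and substitutions cost 1 each."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string so each row stays short

    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

A nearest-match search then amounts to computing this distance between the search term and each candidate and keeping the smallest values, which is what a UDF lets you do inside the query itself.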

Traditional Levenshtein

levenshtein.zip
The zip file contains two UDFs. The first, by Joshua Drew, comes with the original C++ source code and three binaries: Win32 (compiled on Server 2008 with MySQL 5.5, Windows SDK v7.0A, and VC++ 2010 Express), Win64 (compiled on Server 2008 with MySQL 5.5, Windows SDK v7.1, and VC++ 2010 Express), and Ubuntu (10.10 with MySQL 5.5). The second, by Nicholas Sherlock, takes a third argument for a distance limit. It comes with the original Delphi source code and a Win32 binary (I don’t know when or how it was compiled), but unfortunately no Win64 or Ubuntu binary.
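To show why a distance limit is useful, here is a rough Python sketch of a bounded Levenshtein computation that stops as soon as the result can no longer fall within the limit. This is an assumption about how such a limit can work in general, not a description of Nicholas Sherlock’s Delphi code; the function name `levenshtein_limited` and the convention of returning `limit + 1` for “too far” are hypothetical choices made for this example.

```python
def levenshtein_limited(a: str, b: str, limit: int) -> int:
    """Return the edit distance, or limit + 1 once it is known to exceed limit.

    The return convention is an assumption made for this sketch only.
    """
    # The distance is at least the difference in lengths.
    if abs(len(a) - len(b)) > limit:
        return limit + 1

    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        # Row minima never decrease, so the final distance cannot be
        # smaller than the smallest value in the current row.
        if min(current) > limit:
            return limit + 1
        previous = current
    return min(previous[-1], limit + 1)


if __name__ == "__main__":
    print(levenshtein_limited("kitten", "sitting", 5))
    print(levenshtein_limited("abcdef", "uvwxyz", 2))
```

For a fuzzy search over many rows, most candidates are far from the search term, so bailing out early on them can save a lot of work.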

Damerau-Levenshtein UDF

damlev.zip
The Damerau-Levenshtein metric is a slightly modified version of the Levenshtein metric; a description of the differences can be found on Wikipedia and on Sean Collins’ site.
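In short, the main difference is that swapping two adjacent characters counts as a single edit rather than two. The Python sketch below shows the restricted variant (often called optimal string alignment), which is the simplest form to write down. I am assuming this variant purely for illustration; check Sean Collins’ site for what his UDFs actually implement. The name `osa_distance` is hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Extra case compared with plain Levenshtein:
            # two adjacent characters swapped.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


if __name__ == "__main__":
    print(osa_distance("teh", "the"))
```

Plain Levenshtein would score “teh” against “the” as two edits, while this metric scores it as one, which tends to suit typo-heavy data better.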

There are three versions of this metric in the zip file (damlev, damlevlim, and damlevlim256), all created by Sean Collins. Visit his site for information on how to use each version. Three binaries are included: Win32 (compiled on Server 2008 with MySQL 5.5, Windows SDK v7.0A, and VC++ 2010 Express), Win64 (compiled on Server 2008 with MySQL 5.5, Windows SDK v7.1, and VC++ 2010 Express), and Ubuntu (10.10 with MySQL 5.5).
