MySQL Levenshtein and Damerau-Levenshtein UDF’s

Levenshtein distance is a helpful metric to use when creating a “Fuzzy search” or “Nearest match” query. It also comes in handy when trying to find and eliminate duplicate records. MySQL doesn’t come with a Levenshtein function. A few generous souls have created Levenshtein UDF’s: Joshua Drew, Sean Collins (Damerau-Levenshtein), and Nicholas Sherlock. I am providing compiled versions and some guidance on how to compile them yourself.

Traditional Levenshtein

levenshtein.zip
The zip file contains two UDF’s, one by Joshua Drew that comes with original C++ source code, a Win32 (compiled on Server 2008 with MySQL 5.5, Windows SDK v7.0A, and VC+ 2010 Express), a Win64 (compiled on Server 2008 with MySQL 5.5, Windows SDK v7.1, and VC+ 2010 Express) and Ubuntu (10.10 with MySQL 5.5) binary are present. The other, by Nicholas Sherlock, has a third argument for distance limit. This UDF comes with original Delphi code, Win32 binary (don’t know when/how it was compiled), and unfortunately no Win64 or Ubuntu binary.

Damerau-Levenshtein UDF

damlev.zip
The Damerau-Levenshtein metric is a slightly modified version of the Levenshtein metric, a description of the differences can be found on Wikipedia and on Sean Collins’ site.

There are three versions (damlev, damlevlim, damlevlim256) of this metric in the zip file, all created by Sean Collins. Visit his site for information on how to use each version. A Win32 (compiled on Server 2008 with MySQL 5.5, Windows SDK v7.0A, and VC+ 2010 Express), a Win64 (compiled on Server 2008 with MySQL 5.5, Windows SDK v7.1, and VC+ 2010 Express), and Ubuntu (10.10 with MySQL 5.5) binary are present.


Installing UDF as MySQL plugin on Windows

1. Move the .dll binary to your MySQL plugins directory

2. Run the following SQL CREATE FUNCTION commands

If you get the following error:

Make sure that you are using the correct version (for example, a Win32 binary won’t work on Win64), and that Microsoft Visual C++ 2010 Redistributable Package is installed.

3. The functions should now be accessible in MySQL, try:


Installing UDF as MySQL plugin on Ubuntu

1. Move the file to your MySQL plugins directory

2. Run the following SQL CREATE FUNCTION commands

3. The functions should now be accessible in MySQL, try:


Compiling on Windows

Follow Roland Bouman’s guide:

Download my modified version of his guide (with added Module Definitions File step)
compile_mysql_udf_vc_express.pdf

Or visit Bouman’s original post


Compiling on Ubuntu

To compile these binaries on Ubuntu 10.10

1. Get g++ (if you don’t have it already)

2. Get the MySQL development library (if you don’t have it already)

3. cd to the location of the source file

4. Compile the binary

Share

23 thoughts on “MySQL Levenshtein and Damerau-Levenshtein UDF’s

  1. Thanks for the tips & resources !

    How would you use this on whole table entries, ie.
    SELECT Levenshtein(‘my text’,mytable.field, 4) FROM mytable ?

    I would expect in return all entries from the table that are near ‘my text’… is this how it’s supposed to work ?

    Thank you again.

    • Not exactly, it’s giving you the distance between the two words – you can filter based on that distance, for example:

      SELECT * FROM mytable WHERE Levenshtein("cow",mytable.field) < = 5

      This statement will find and display the rows where mytable.field is within a Levenshtein distance of 5 or less of the word "cow"

  2. Hi, great work!!

    Glad you could use my info to get it to work on windows.

    In the pdf, I noticed a typo in my name though – it’s “Roland”, not “Ronald”. Not a biggie, but if you have time to fix the pdf, that’d be great.

    Thanks, and kind regards,

    Roland.

  3. Hello Sam, when i run
    mysql> CREATE FUNCTION levenshtein RETURNS INTEGER SONAME ‘liblevenshtein.dll’;
    its return
    ERROR 1126 (HY000): Can’t open shared library ‘liblevenshtein.dll’ (errno: 0 )
    What could be the problem?

    Thank

    • Hello Silvana,

      I have seen this happen before if the version of MySQL is older than the version in which the UDF was compiled

      Make sure that you have the latest version of MySQL

        • which DLL are you trying (by Josh Drew or..)? did you compile it yourself?

          also make sure liblevenshtein.dll is named correctly and is in the right directory, C:\Program Files\MySQL\MySQL Server 5.5\lib\plugin

          • I’m using Joshua Drew’dll. The name is levenshtein.dll (i’d made a mistake writing the name in the previous post) and it is locate in C:\Program Files\MySQL\MySQL Server 5.5\lib\plugin.

  4. I found the problem: A dependent dll can’t be found. I had to install the Microsoft Visual C++ 2010 Redistributable Package because i could not found msvcr100.dll. Thank for all.

  5. I wish I has scrolled down to see the comment about the “msvcr100.dll” error. -sigh- . I can confirm about the “Microsoft Visual C++ 2010 Redistributable Package” thingy. I used http://www.dependencywalker.com and, sure enough, that dll popped as unavailable. That’s for XP.

    Now on to Ubuntu. 64. Does anybody have a compiled version for x86_64? I’ve tried compiling, but I’m no good with gcc or other scary acronyms.

    This guy seems to have worked it out, though: http://tech.darke.net/2011/05/30/compiling-levenshtein-udf-for-mysql-on-a-64bit-os/

    Drop a link or something of that blasted 64 *.so. I’d appreciate it. A lot.

    • I’ll put a note in the original post about MVC++ Redistributable– probably a good idea

      As far as compiling it on 64 bit, it shouldn’t be too hard. I might get around to it at some point

      Thanks for posting Dependency Walker. It looks like a great tool

      • I tried downloading and installing and get the dreaded 1126 error.

        I have the C++ redistributable package installed.

        I then realised the dll is a 32bit library and I am running 64bit windows. AARGH.

        Has anybody recompiled for 64bit windows????

        Help!

        Michael

  6. Great post, just had to do some fuzzy searching and I used your help to compile and install UDF on my CentOS 5.x box. Here is the compile string I used on my 64 bit CentOS box.

    g++ -fPIC -I /usr/include/mysql/ -o levenshtein.so -shared levenshtein.cc
    cp levenshtein.so /usr/lib64/mysql/plugin/

    Then from within mysql I installed the plugin.
    mysql> CREATE FUNCTION levenshtein RETURNS INTEGER SONAME ‘levenshtein.so’;

  7. Hi,
    I’m trying to install the Win64 version dll on Windows Server 2003 x64 and the create function says can’t open shared library and the process monitor shows that it tries to open damlev.dll.2.Config which it cannot find, which is the reason wy it fails.
    Is the Windows server 2008 complie cannot be used on Windows server 2003?

    • Sorry, found the reason. On this server I didn’t have MSVCR100.dll installed. I had copied it over from elsewhere and then it worked.

  8. Hi

    What about unicode characters? If we compare:
    levenshtein(‘ół,’ol’);
    How to develop your code to recognizing these string as identical? For “ó Ó ł Ł” and others: “ą Ą ć Ć ę Ę ś Ś n Ń ź Ź ż Ż”

    • Unfortunately I’m not sure how to go about doing that

      Maybe try contacting one of the original UDF developers?

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>