MySQL Levenshtein and Damerau-Levenshtein UDF’s

Levenshtein distance is a helpful metric to use when creating a “Fuzzy search” or “Nearest match” query. It also comes in handy when trying to find and eliminate duplicate records. MySQL doesn’t come with a Levenshtein function. A few generous souls have created Levenshtein UDF’s: Joshua Drew, Sean Collins (Damerau-Levenshtein), and Nicholas Sherlock. I am providing compiled versions and some guidance on how to compile them yourself.

Traditional Levenshtein

levenshtein.zip
The zip file contains two UDF’s, one by Joshua Drew that comes with original C++ source code, a Win32 (compiled on Server 2008 with MySQL 5.5, Windows SDK v7.0A, and VC+ 2010 Express), a Win64 (compiled on Server 2008 with MySQL 5.5, Windows SDK v7.1, and VC+ 2010 Express) and Ubuntu (10.10 with MySQL 5.5) binary are present. The other, by Nicholas Sherlock, has a third argument for distance limit. This UDF comes with original Delphi code, Win32 binary (don’t know when/how it was compiled), and unfortunately no Win64 or Ubuntu binary.

Damerau-Levenshtein UDF

damlev.zip
The Damerau-Levenshtein metric is a slightly modified version of the Levenshtein metric, a description of the differences can be found on Wikipedia and on Sean Collins’ site.

There are three versions (damlev, damlevlim, damlevlim256) of this metric in the zip file, all created by Sean Collins. Visit his site for information on how to use each version. A Win32 (compiled on Server 2008 with MySQL 5.5, Windows SDK v7.0A, and VC+ 2010 Express), a Win64 (compiled on Server 2008 with MySQL 5.5, Windows SDK v7.1, and VC+ 2010 Express), and Ubuntu (10.10 with MySQL 5.5) binary are present.


Installing UDF as MySQL plugin on Windows

1. Move the .dll binary to your MySQL plugins directory

1
C:\Program Files\MySQL\MySQL Server 5.5\lib\plugin
C:\Program Files\MySQL\MySQL Server 5.5\lib\plugin

2. Run the following SQL CREATE FUNCTION commands

1
2
3
4
5
6
7
8
9
10
11
-- levenshtein.dll (Joshua Drew)
CREATE FUNCTION levenshtein RETURNS INTEGER SONAME 'levenshtein.dll';
 
-- LevenshteinUDF.dll (with limit arg, Nicholas Sherlock)
-- Note: you can only use either Drew's or Sherlock's Levenshtein, they both have the same function name
CREATE FUNCTION levenshtein RETURNS INTEGER SONAME 'LevenshteinUDF.dll';
 
-- damlev.dll (Sean Collins)
CREATE FUNCTION damlev RETURNS INTEGER SONAME 'damlev.dll';
CREATE FUNCTION damlevlim RETURNS INTEGER SONAME 'damlev.dll';
CREATE FUNCTION damlevlim256 RETURNS INTEGER SONAME 'damlev.dll';
-- levenshtein.dll (Joshua Drew)
CREATE FUNCTION levenshtein RETURNS INTEGER SONAME 'levenshtein.dll';

-- LevenshteinUDF.dll (with limit arg, Nicholas Sherlock)
-- Note: you can only use either Drew's or Sherlock's Levenshtein, they both have the same function name
CREATE FUNCTION levenshtein RETURNS INTEGER SONAME 'LevenshteinUDF.dll';

-- damlev.dll (Sean Collins)
CREATE FUNCTION damlev RETURNS INTEGER SONAME 'damlev.dll';
CREATE FUNCTION damlevlim RETURNS INTEGER SONAME 'damlev.dll';
CREATE FUNCTION damlevlim256 RETURNS INTEGER SONAME 'damlev.dll';

If you get the following error:

1
ERROR 1126 (HY000): Can’t open shared library ‘liblevenshtein.dll’ (errno: 0 )
ERROR 1126 (HY000): Can’t open shared library ‘liblevenshtein.dll’ (errno: 0 )

Make sure that you are using the correct version (for example, a Win32 binary won’t work on Win64), and that Microsoft Visual C++ 2010 Redistributable Package is installed.

3. The functions should now be accessible in MySQL, try:

1
2
3
SELECT Levenshtein('cow','coy'); -- 1
SELECT damlev('tuesday', 'something'); -- 8
SELECT damlevlim('tuesday', 'something', 3); -- 3 (returns early)
SELECT Levenshtein('cow','coy'); -- 1
SELECT damlev('tuesday', 'something'); -- 8
SELECT damlevlim('tuesday', 'something', 3); -- 3 (returns early)

Installing UDF as MySQL plugin on Ubuntu

1. Move the file to your MySQL plugins directory

1
sudo mv <filename>.so /usr/lib/mysql/plugin/
sudo mv <filename>.so /usr/lib/mysql/plugin/

2. Run the following SQL CREATE FUNCTION commands

1
2
3
4
5
6
7
-- levenshtein.so (Joshua Drew)
CREATE FUNCTION levenshtein RETURNS INTEGER SONAME 'levenshtein.so';
 
-- damlev.so (Sean Collins)
CREATE FUNCTION damlev RETURNS INTEGER SONAME 'damlev.so';
CREATE FUNCTION damlevlim RETURNS INTEGER SONAME 'damlev.so';
CREATE FUNCTION damlevlim256 RETURNS INTEGER SONAME 'damlev.so';
-- levenshtein.so (Joshua Drew)
CREATE FUNCTION levenshtein RETURNS INTEGER SONAME 'levenshtein.so';

-- damlev.so (Sean Collins)
CREATE FUNCTION damlev RETURNS INTEGER SONAME 'damlev.so';
CREATE FUNCTION damlevlim RETURNS INTEGER SONAME 'damlev.so';
CREATE FUNCTION damlevlim256 RETURNS INTEGER SONAME 'damlev.so';

3. The functions should now be accessible in MySQL, try:

1
2
3
SELECT Levenshtein('cow','coy'); -- 1
SELECT damlev('tuesday', 'something'); -- 8
SELECT damlevlim('tuesday', 'something', 3); -- 3 (returns early)
SELECT Levenshtein('cow','coy'); -- 1
SELECT damlev('tuesday', 'something'); -- 8
SELECT damlevlim('tuesday', 'something', 3); -- 3 (returns early)

Compiling on Windows

Follow Roland Bouman’s guide:

Download my modified version of his guide (with added Module Definitions File step)
compile_mysql_udf_vc_express.pdf

Or visit Bouman’s original post


Compiling on Ubuntu

To compile these binaries on Ubuntu 10.10

1. Get g++ (if you don’t have it already)

1
sudo apt-get install g++
sudo apt-get install g++

2. Get the MySQL development library (if you don’t have it already)

1
sudo apt-get install libmysql++-dev
sudo apt-get install libmysql++-dev

3. cd to the location of the source file

4. Compile the binary

1
g++ -I /usr/include/mysql/ -o <filename without extension>.so -shared <filename with extension>
g++ -I /usr/include/mysql/ -o <filename without extension>.so -shared <filename with extension>
Share

19 thoughts on “MySQL Levenshtein and Damerau-Levenshtein UDF’s

  1. Thanks for the tips & resources !

    How would you use this on whole table entries, ie.
    SELECT Levenshtein(‘my text’,mytable.field, 4) FROM mytable ?

    I would expect in return all entries from the table that are near ‘my text’… is this how it’s supposed to work ?

    Thank you again.

    • Not exactly, it’s giving you the distance between the two words – you can filter based on that distance, for example:

      1
      
      SELECT * FROM mytable WHERE Levenshtein("cow",mytable.field) < = 5
      SELECT * FROM mytable WHERE Levenshtein("cow",mytable.field) < = 5

      This statement will find and display the rows where mytable.field is within a Levenshtein distance of 5 or less of the word "cow"

  2. Hi, great work!!

    Glad you could use my info to get it to work on windows.

    In the pdf, I noticed a typo in my name though – it’s “Roland”, not “Ronald”. Not a biggie, but if you have time to fix the pdf, that’d be great.

    Thanks, and kind regards,

    Roland.

  3. Hello Sam, when i run
    mysql> CREATE FUNCTION levenshtein RETURNS INTEGER SONAME ‘liblevenshtein.dll’;
    its return
    ERROR 1126 (HY000): Can’t open shared library ‘liblevenshtein.dll’ (errno: 0 )
    What could be the problem?

    Thank

    • Hello Silvana,

      I have seen this happen before if the version of MySQL is older than the version in which the UDF was compiled

      Make sure that you have the latest version of MySQL

        • which DLL are you trying (by Josh Drew or..)? did you compile it yourself?

          also make sure liblevenshtein.dll is named correctly and is in the right directory, C:\Program Files\MySQL\MySQL Server 5.5\lib\plugin

          • I’m using Joshua Drew’dll. The name is levenshtein.dll (i’d made a mistake writing the name in the previous post) and it is locate in C:\Program Files\MySQL\MySQL Server 5.5\lib\plugin.

  4. I found the problem: A dependent dll can’t be found. I had to install the Microsoft Visual C++ 2010 Redistributable Package because i could not found msvcr100.dll. Thank for all.

  5. I wish I has scrolled down to see the comment about the “msvcr100.dll” error. -sigh- . I can confirm about the “Microsoft Visual C++ 2010 Redistributable Package” thingy. I used http://www.dependencywalker.com and, sure enough, that dll popped as unavailable. That’s for XP.

    Now on to Ubuntu. 64. Does anybody have a compiled version for x86_64? I’ve tried compiling, but I’m no good with gcc or other scary acronyms.

    This guy seems to have worked it out, though: http://tech.darke.net/2011/05/30/compiling-levenshtein-udf-for-mysql-on-a-64bit-os/

    Drop a link or something of that blasted 64 *.so. I’d appreciate it. A lot.

    • I’ll put a note in the original post about MVC++ Redistributable– probably a good idea

      As far as compiling it on 64 bit, it shouldn’t be too hard. I might get around to it at some point

      Thanks for posting Dependency Walker. It looks like a great tool

      • I tried downloading and installing and get the dreaded 1126 error.

        I have the C++ redistributable package installed.

        I then realised the dll is a 32bit library and I am running 64bit windows. AARGH.

        Has anybody recompiled for 64bit windows????

        Help!

        Michael

  6. Great post, just had to do some fuzzy searching and I used your help to compile and install UDF on my CentOS 5.x box. Here is the compile string I used on my 64 bit CentOS box.

    g++ -fPIC -I /usr/include/mysql/ -o levenshtein.so -shared levenshtein.cc
    cp levenshtein.so /usr/lib64/mysql/plugin/

    Then from within mysql I installed the plugin.
    mysql> CREATE FUNCTION levenshtein RETURNS INTEGER SONAME ‘levenshtein.so’;

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code lang=""> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre lang="" extra="">