Levenshtein distance is a helpful metric to use when creating a “Fuzzy search” or “Nearest match” query. It also comes in handy when trying to find and eliminate duplicate records. MySQL doesn’t come with a Levenshtein function. A few generous souls have created Levenshtein UDF’s: Joshua Drew, Sean Collins (Damerau-Levenshtein), and Nicholas Sherlock. I am providing compiled versions and some guidance on how to compile them yourself.
Traditional Levenshtein
levenshtein.zip
The zip file contains two UDF’s, one by Joshua Drew that comes with original C++ source code, a Win32 (compiled on Server 2008 with MySQL 5.5, Windows SDK v7.0A, and VC+ 2010 Express), a Win64 (compiled on Server 2008 with MySQL 5.5, Windows SDK v7.1, and VC+ 2010 Express) and Ubuntu (10.10 with MySQL 5.5) binary are present. The other, by Nicholas Sherlock, has a third argument for distance limit. This UDF comes with original Delphi code, Win32 binary (don’t know when/how it was compiled), and unfortunately no Win64 or Ubuntu binary.
Damerau-Levenshtein UDF
damlev.zip
The Damerau-Levenshtein metric is a slightly modified version of the Levenshtein metric, a description of the differences can be found on Wikipedia and on Sean Collins’ site.
There are three versions (damlev, damlevlim, damlevlim256) of this metric in the zip file, all created by Sean Collins. Visit his site for information on how to use each version. A Win32 (compiled on Server 2008 with MySQL 5.5, Windows SDK v7.0A, and VC+ 2010 Express), a Win64 (compiled on Server 2008 with MySQL 5.5, Windows SDK v7.1, and VC+ 2010 Express), and Ubuntu (10.10 with MySQL 5.5) binary are present.
Installing UDF as MySQL plugin on Windows
1. Move the .dll binary to your MySQL plugins directory
1 | C:\Program Files\MySQL\MySQL Server 5.5\lib\plugin |
C:\Program Files\MySQL\MySQL Server 5.5\lib\plugin
2. Run the following SQL CREATE FUNCTION commands
1 2 3 4 5 6 7 8 9 10 11 | -- levenshtein.dll (Joshua Drew) CREATE FUNCTION levenshtein RETURNS INTEGER SONAME 'levenshtein.dll'; -- LevenshteinUDF.dll (with limit arg, Nicholas Sherlock) -- Note: you can only use either Drew's or Sherlock's Levenshtein, they both have the same function name CREATE FUNCTION levenshtein RETURNS INTEGER SONAME 'LevenshteinUDF.dll'; -- damlev.dll (Sean Collins) CREATE FUNCTION damlev RETURNS INTEGER SONAME 'damlev.dll'; CREATE FUNCTION damlevlim RETURNS INTEGER SONAME 'damlev.dll'; CREATE FUNCTION damlevlim256 RETURNS INTEGER SONAME 'damlev.dll'; |
-- levenshtein.dll (Joshua Drew) CREATE FUNCTION levenshtein RETURNS INTEGER SONAME 'levenshtein.dll'; -- LevenshteinUDF.dll (with limit arg, Nicholas Sherlock) -- Note: you can only use either Drew's or Sherlock's Levenshtein, they both have the same function name CREATE FUNCTION levenshtein RETURNS INTEGER SONAME 'LevenshteinUDF.dll'; -- damlev.dll (Sean Collins) CREATE FUNCTION damlev RETURNS INTEGER SONAME 'damlev.dll'; CREATE FUNCTION damlevlim RETURNS INTEGER SONAME 'damlev.dll'; CREATE FUNCTION damlevlim256 RETURNS INTEGER SONAME 'damlev.dll';
If you get the following error:
1 | ERROR 1126 (HY000): Can’t open shared library ‘liblevenshtein.dll’ (errno: 0 ) |
ERROR 1126 (HY000): Can’t open shared library ‘liblevenshtein.dll’ (errno: 0 )
Make sure that you are using the correct version (for example, a Win32 binary won’t work on Win64), and that Microsoft Visual C++ 2010 Redistributable Package is installed.
3. The functions should now be accessible in MySQL, try:
1 2 3 | SELECT Levenshtein('cow','coy'); -- 1 SELECT damlev('tuesday', 'something'); -- 8 SELECT damlevlim('tuesday', 'something', 3); -- 3 (returns early) |
SELECT Levenshtein('cow','coy'); -- 1
SELECT damlev('tuesday', 'something'); -- 8
SELECT damlevlim('tuesday', 'something', 3); -- 3 (returns early)Installing UDF as MySQL plugin on Ubuntu
1. Move the file to your MySQL plugins directory
2. Run the following SQL CREATE FUNCTION commands
1 2 3 4 5 6 7 | -- levenshtein.so (Joshua Drew) CREATE FUNCTION levenshtein RETURNS INTEGER SONAME 'levenshtein.so'; -- damlev.so (Sean Collins) CREATE FUNCTION damlev RETURNS INTEGER SONAME 'damlev.so'; CREATE FUNCTION damlevlim RETURNS INTEGER SONAME 'damlev.so'; CREATE FUNCTION damlevlim256 RETURNS INTEGER SONAME 'damlev.so'; |
-- levenshtein.so (Joshua Drew) CREATE FUNCTION levenshtein RETURNS INTEGER SONAME 'levenshtein.so'; -- damlev.so (Sean Collins) CREATE FUNCTION damlev RETURNS INTEGER SONAME 'damlev.so'; CREATE FUNCTION damlevlim RETURNS INTEGER SONAME 'damlev.so'; CREATE FUNCTION damlevlim256 RETURNS INTEGER SONAME 'damlev.so';
3. The functions should now be accessible in MySQL, try:
1 2 3 | SELECT Levenshtein('cow','coy'); -- 1 SELECT damlev('tuesday', 'something'); -- 8 SELECT damlevlim('tuesday', 'something', 3); -- 3 (returns early) |
SELECT Levenshtein('cow','coy'); -- 1
SELECT damlev('tuesday', 'something'); -- 8
SELECT damlevlim('tuesday', 'something', 3); -- 3 (returns early)Compiling on Windows
Follow Roland Bouman’s guide:
Download my modified version of his guide (with added Module Definitions File step)
compile_mysql_udf_vc_express.pdf
Or visit Bouman’s original post
Compiling on Ubuntu
To compile these binaries on Ubuntu 10.10
1. Get g++ (if you don’t have it already)
2. Get the MySQL development library (if you don’t have it already)
3. cd to the location of the source file
4. Compile the binary
1 | g++ -I /usr/include/mysql/ -o <filename without extension>.so -shared <filename with extension> |
g++ -I /usr/include/mysql/ -o <filename without extension>.so -shared <filename with extension>
Thanks for the tips & resources !
How would you use this on whole table entries, ie.
SELECT Levenshtein(‘my text’,mytable.field, 4) FROM mytable ?
I would expect in return all entries from the table that are near ‘my text’… is this how it’s supposed to work ?
Thank you again.
Not exactly, it’s giving you the distance between the two words – you can filter based on that distance, for example:
SELECT * FROM mytable WHERE Levenshtein("cow",mytable.field) < = 5This statement will find and display the rows where mytable.field is within a Levenshtein distance of 5 or less of the word "cow"
Hi, great work!!
Glad you could use my info to get it to work on windows.
In the pdf, I noticed a typo in my name though – it’s “Roland”, not “Ronald”. Not a biggie, but if you have time to fix the pdf, that’d be great.
Thanks, and kind regards,
Roland.
No problem, thank you
Fixed.
Hello Sam, when i run
mysql> CREATE FUNCTION levenshtein RETURNS INTEGER SONAME ‘liblevenshtein.dll’;
its return
ERROR 1126 (HY000): Can’t open shared library ‘liblevenshtein.dll’ (errno: 0 )
What could be the problem?
Thank
Hello Silvana,
I have seen this happen before if the version of MySQL is older than the version in which the UDF was compiled
Make sure that you have the latest version of MySQL
I’m using mysql-5.5.13-win32, it’s the right version?
which DLL are you trying (by Josh Drew or..)? did you compile it yourself?
also make sure liblevenshtein.dll is named correctly and is in the right directory, C:\Program Files\MySQL\MySQL Server 5.5\lib\plugin
I’m using Joshua Drew’dll. The name is levenshtein.dll (i’d made a mistake writing the name in the previous post) and it is locate in C:\Program Files\MySQL\MySQL Server 5.5\lib\plugin.
I found the problem: A dependent dll can’t be found. I had to install the Microsoft Visual C++ 2010 Redistributable Package because i could not found msvcr100.dll. Thank for all.
Great, thank you for posting that, I’m sure it will help others
I love this. It was literally thousands of times faster than a SQL stored procedure!
Great stuff!
Some systems have apparmor which adds constrains. See http://forums.mysql.com/read.php?117,298997,300655#msg-300655 for more info on that.
I wish I has scrolled down to see the comment about the “msvcr100.dll” error. -sigh- . I can confirm about the “Microsoft Visual C++ 2010 Redistributable Package” thingy. I used http://www.dependencywalker.com and, sure enough, that dll popped as unavailable. That’s for XP.
Now on to Ubuntu. 64. Does anybody have a compiled version for x86_64? I’ve tried compiling, but I’m no good with gcc or other scary acronyms.
This guy seems to have worked it out, though: http://tech.darke.net/2011/05/30/compiling-levenshtein-udf-for-mysql-on-a-64bit-os/
Drop a link or something of that blasted 64 *.so. I’d appreciate it. A lot.
I’ll put a note in the original post about MVC++ Redistributable– probably a good idea
As far as compiling it on 64 bit, it shouldn’t be too hard. I might get around to it at some point
Thanks for posting Dependency Walker. It looks like a great tool
I tried downloading and installing and get the dreaded 1126 error.
I have the C++ redistributable package installed.
I then realised the dll is a 32bit library and I am running 64bit windows. AARGH.
Has anybody recompiled for 64bit windows????
Help!
Michael
Due to request I have compiled Win x64 versions, the zip files have been updated
Great post, just had to do some fuzzy searching and I used your help to compile and install UDF on my CentOS 5.x box. Here is the compile string I used on my 64 bit CentOS box.
g++ -fPIC -I /usr/include/mysql/ -o levenshtein.so -shared levenshtein.cc
cp levenshtein.so /usr/lib64/mysql/plugin/
Then from within mysql I installed the plugin.
mysql> CREATE FUNCTION levenshtein RETURNS INTEGER SONAME ‘levenshtein.so’;
Glad you got it working
Thanks for the comment, I’m sure it will help someone else with CentOS