Levenshtein and Damerau-Levenshtein for MySQL

Levenshtein distance is a useful metric for creating a “Fuzzy” or “Nearest match” search function.

Below you will find resources for compiling and using a Levenshtein user-defined function (UDF) for MySQL on both Windows and Linux.

Credit goes to the following individuals for making this possible:

I have updated the source code so that it can be compiled using Visual C++ 2017

Downloads

Levenshtein distance UDF

Damerau-Levenshtein distance UDF

Compiling the MySQL UDF on Windows

To compile the Visual Studio 2017 solution:

  1. Install Visual Studio Community 2017  and select Visual C++ during installation
  2. Install Windows 10 SDK
  3. Install MySQL Connector/C
  4. Download the Levenshtein or Damerau-Levenshtein source code Visual C++ solution
  5. Open the .sln file
  6. Update the Project ‘Additional Include Directories’ path in Property Pages under CC++/General
    to point to the MySQL Connector’s (downloaded in Step 3) /includes directory for the desired platform (32/64)
  7. Compile

Older resources:

Installing UDF as MySQL plugin on Windows

1. Move the .dll binary to your MySQL plugins directory.  The directory path can be discovered using the following query:

2. Run the following SQL CREATE FUNCTION commands.  Be sure to rename the referenced .dll if necessary.

If you get the following error:

Make sure that you are using the .dll for the correct platform (for example, a Win32 binary won’t work on Win64), and that the correct Visual C++ Redistributable Package (2010, 2015, etc) is installed.

3. The functions should now be accessible in MySQL, try:

Compiling on Linux

To compile these binaries on Linux (tested with Ubuntu)

1. Get gcc (if you don’t have it already)

2. Get the MySQL development library (if you don’t have it already)

3. Change directory (cd) to the location of your Levenshtein source file

4. Compile the binary. Replace ‘levenshtein’ with your source file name if needed (such as damlev.cpp, damlevlim.cpp, etc.)

Installing UDF as MySQL plugin on Linux

1. Move the compiled binary (.so) to your MySQL plugins directory

2. Run the SQL CREATE FUNCTION commands that correspond to your binary

3. The functions should now be accessible in MySQL, try:

49 Comments

  1. Sébastien

    Thanks for the tips & resources !

    How would you use this on whole table entries, ie.
    SELECT Levenshtein(‘my text’,mytable.field, 4) FROM mytable ?

    I would expect in return all entries from the table that are near ‘my text’… is this how it’s supposed to work ?

    Thank you again.

    • sam

      Not exactly, it’s giving you the distance between the two words – you can filter based on that distance, for example:

      This statement will find and display the rows where mytable.field is within a Levenshtein distance of 5 or less of the word “cow”

  2. Roland Bouman

    Hi, great work!!

    Glad you could use my info to get it to work on windows.

    In the pdf, I noticed a typo in my name though – it’s “Roland”, not “Ronald”. Not a biggie, but if you have time to fix the pdf, that’d be great.

    Thanks, and kind regards,

    Roland.

  3. Silvana

    Hello Sam, when i run
    mysql> CREATE FUNCTION levenshtein RETURNS INTEGER SONAME ‘liblevenshtein.dll’;
    its return
    ERROR 1126 (HY000): Can’t open shared library ‘liblevenshtein.dll’ (errno: 0 )
    What could be the problem?

    Thank

    • sam

      Hello Silvana,

      I have seen this happen before if the version of MySQL is older than the version in which the UDF was compiled

      Make sure that you have the latest version of MySQL

        • sam

          which DLL are you trying (by Josh Drew or..)? did you compile it yourself?

          also make sure liblevenshtein.dll is named correctly and is in the right directory, C:\Program Files\MySQL\MySQL Server 5.5\lib\plugin

          • Silvana

            I’m using Joshua Drew’dll. The name is levenshtein.dll (i’d made a mistake writing the name in the previous post) and it is locate in C:\Program Files\MySQL\MySQL Server 5.5\lib\plugin.

  4. Silvana

    I found the problem: A dependent dll can’t be found. I had to install the Microsoft Visual C++ 2010 Redistributable Package because i could not found msvcr100.dll. Thank for all.

  5. nevvermind

    I wish I has scrolled down to see the comment about the “msvcr100.dll” error. -sigh- . I can confirm about the “Microsoft Visual C++ 2010 Redistributable Package” thingy. I used http://www.dependencywalker.com and, sure enough, that dll popped as unavailable. That’s for XP.

    Now on to Ubuntu. 64. Does anybody have a compiled version for x86_64? I’ve tried compiling, but I’m no good with gcc or other scary acronyms.

    This guy seems to have worked it out, though: http://tech.darke.net/2011/05/30/compiling-levenshtein-udf-for-mysql-on-a-64bit-os/

    Drop a link or something of that blasted 64 *.so. I’d appreciate it. A lot.

    • sam

      I’ll put a note in the original post about MVC++ Redistributable– probably a good idea

      As far as compiling it on 64 bit, it shouldn’t be too hard. I might get around to it at some point

      Thanks for posting Dependency Walker. It looks like a great tool

      • Michael

        I tried downloading and installing and get the dreaded 1126 error.

        I have the C++ redistributable package installed.

        I then realised the dll is a 32bit library and I am running 64bit windows. AARGH.

        Has anybody recompiled for 64bit windows????

        Help!

        Michael

  6. Great post, just had to do some fuzzy searching and I used your help to compile and install UDF on my CentOS 5.x box. Here is the compile string I used on my 64 bit CentOS box.

    g++ -fPIC -I /usr/include/mysql/ -o levenshtein.so -shared levenshtein.cc
    cp levenshtein.so /usr/lib64/mysql/plugin/

    Then from within mysql I installed the plugin.
    mysql> CREATE FUNCTION levenshtein RETURNS INTEGER SONAME ‘levenshtein.so’;

  7. Zoltan

    Hi,
    I’m trying to install the Win64 version dll on Windows Server 2003 x64 and the create function says can’t open shared library and the process monitor shows that it tries to open damlev.dll.2.Config which it cannot find, which is the reason wy it fails.
    Is the Windows server 2008 complie cannot be used on Windows server 2003?

    • Zoltan

      Sorry, found the reason. On this server I didn’t have MSVCR100.dll installed. I had copied it over from elsewhere and then it worked.

  8. hashi101

    Hi

    What about unicode characters? If we compare:
    levenshtein(‘ół,’ol’);
    How to develop your code to recognizing these string as identical? For “ó Ó ł Ł” and others: “ą Ą ć Ć ę Ę ś Ś n Ń ź Ź ż Ż”

    • sam

      Unfortunately I’m not sure how to go about doing that

      Maybe try contacting one of the original UDF developers?

  9. Yeva

    I was able to install it locally and on our server (here at work), and it works when I just plug in two arbitrary strings, but it does not return anything when I use it like this:

    SELECT colname FROM table WHERE levenshtein(“string”, colname) <= 5;

    • sam

      Is it possible that none of the colnames are <= 5 in distance? Also try:

      • Yeva

        I’m positive that the distance exists, I tried matching the name ‘Mary’ and our DB has about 500 of them to the string ‘Mary2’. The other query you provided didn’t work either. It seems to only work when I plug in two arbitrary strings, it returns the distance, but selecting fields is not giving me any results.

  10. Zack

    Hi sam,
    i am trying to match a list of names against the company’s DB . I have been using Soundex and was wondering if Levenshtein can be used to find misspelled names in another word compare two fields of cross tables as
    it can be achieved using Soundex :
    select customer.first_name,customer.last_name from customer,records
    WHERE soundex(records.first_name)=soundex(customer.first_name);

    after store Levenshtein implmenation is it possible to do something like this
    select customer.first_name,customer.last_name from customer,records
    WHERE Levenshtein (records.first_name)= Levenshtein (customer.first_name);

    or Levenshtein is just to compare two handy strings.

    • sam

      Zack,

      Unlike Soundex, Levenshtein needs two inputs. Lev is finding the distance between two strings, whereas Soundex is creating a numerical representation of a single string’s sound. You will need a nested query that runs a Lev comparison against every entry in table ‘record’ at every step of ‘customer’. It will also need to specify an acceptable distance to qualify which results appear.

      Here is an example,

      Customers will be displayed when at least one name within a Levenshtein distance of 3 is present in the ‘records’ table.

      For this query to run efficiently, the columns that are being compared should be indexed.

  11. Mark

    Hi Sam,

    Thanks for this. I’ve managed to implement Sean Collins UDFs and it appears to work, but was wondering if you know of how I could combine the fuzzy with a LIKE wildcard search. If I have this data:

    ID | Text
    ———————————————
    1 | let’s find this document
    2 | let’s find this docment
    3 | When the book is closed
    4 | The dcument is locked

    I would want to run a query that would combine the fuzzy (damlev())and wilcard (LIKE) matching together if I wanted all records returned that have the word “document” or variations of it appearing anyway within the Text field. As can be seen the document, or variations of it, can appear anywhere within the Text field.

    Is there a way to do this?

  12. Mario

    Hi,

    I’m trying to install the plugin on a mysql 5.5 server but when i install the plugin using the mysql cli “install plugin damlev SONAME ‘damlev.so’;” i get “ERROR 1127 (HY000): Can’t find symbol ‘_mysql_plugin_interface_version_’ in library” as error. Tried it also on ubuntu 12.10 but the problem is the same. Any hint what could be wrong?

    Thanks for any hint!
    Mario

      • Mario

        Thanks for the reply!

        Yes, I did compile it myself on debian and ubuntu. First thought that it could be a 5.1 -> 5.5 issue, but read that you did compile it for 5.5 successful. So it looks to me, as a non-programmer, that it might miss to include the mysql plugin library or something like that.

        • sam

          It may have something to do with 12.10

          I am able to compile and run it on 12.04.1 LTS

          Although you said you tried it with Debian too? Which version of Debian?

  13. Thanks Sam for this post!
    I was able to compile sources using your tips and mysql plugins works great.
    If it can helps someone, on my Ubuntu 12.04.2 LTS box (x64), I had to add the -fPIC option to compile with no erros.

    Daniele

    P.S.: you should change the .so names in CREATE functions above, for damlevlim and damlevlim256 plugins.

    • sam

      Daniele,

      Glad to hear it

      And thanks for pointing out the naming issue in the CREATE commands- I have amended the original post

  14. Naman

    Hi Sam,

    I am trying to get this UDF up and running on a remote machine, I have copied the dll file to the li/plugin location and I have installed the C++ redistributable package. But I am still getting the following error:

    Error Code : 1126
    Can’t open shared library ‘levenshtein.dll’ (errno: 126 The specified module could not be found.)
    (0 ms taken)

    Please let me know how I can resolve this.

    Thanks for your help

    -Naman

    • sam

      Naman,

      Make sure you are using the correct architecture – 32 or 64 bit. Additionally, make sure the redistributable package that you are using is 2010

      If that is all correct, it’s possible you are missing some other dependency. You can use http://www.dependencywalker.com to try and find that out

      If all else fails you can compile your own DLL using the instructions provided in the original post

      Hope that helps! Let me know what you figure out

      Sam

  15. Peng

    Hi, Sam,

    I am testing UDF in MySQL with your damlev.dll. It works fine. But when I compiled a new dll followed instruction from MySQL manual (using CMake) and your noted Roland PDF file (using VC++ 2010 and Windows 7 SDK), it failed to load and returned ERROR 1126 (HY000): Can’t open shared library ‘xxxx.dll’ (errno 193). Do you have idea what is wrong?

    Thanks,

    Peng

  16. sam

    Hi Peng,

    Look at some of the comments above

    This can mean a few things like wrong architecture or missing MSVC 2010 Redistributable

    You can also use dependency walker to try and track it down

    • sam

      Jose,

      All three functions are actually included in damlev.dll

      That’s a mistake in the original post

      Use these function create commands:

  17. Lars

    Hi Sam

    We are trying to get damlevlim to work with mariadb 10.1 and compiling and initial install works great.
    We run the CREATE FUNCTION part and the plugin is installed and can be used. However we cannot get it to reload when we restart the mariadb server.
    In the error log we get this:
    2016-03-23 10:49:56 140048623790208 [ERROR] Can’t open shared library ‘mysqldamlevlim.so ‘ (errno: 0, /usr/lib64/mysql/plugin/mysqldamlevlim.so )
    Only way then to get it to work is to delete it from mysql.func and run the create again.
    Any hints on what could be wrong?

    • sam

      Hi Lars,

      Unfortunately I haven’t attempted to use any of these with MariaDB

      It doesn’t seem like a compilation issue if you are able to create/run it fine after the service starts. You’ve also checked path and permissions?

      This is kind of a hack but you could try putting the DROP/CREATE FUNCTION in the init_file, so that it runs at service startup

Leave a Reply

Your email address will not be published. Required fields are marked *