Eusebio

Soundex and Levenshtein distance algorithms


jvanegmond

Soundex hasn't been done. We AutoIt users tend to use the Windows alternative of speaking text instead (it's in v3 Scripts n Scraps). So Levenshtein distance hasn't been used either.

Edit: Of course, I don't know anything about any private projects that might have...

Edited by Manadar

Eusebio

I've created the functions, if anybody is interested (any comments are welcome).

;..........................................................................................................................................
; This function returns an integer number which indicates the Levenshtein-Distance between the two
; argument strings or -1, if one of the argument strings is longer than the limit of 255 characters
; (255 should be more than enough for name or dictionary comparison).
;
; The Levenshtein distance is defined as the minimal number of characters you have to replace,
; insert or delete to transform sString1 into sString2.
;
; The greater the Levenshtein-Distance, the more different the strings are.
; Levenshtein-Distance is named after the Russian scientist Vladimir Levenshtein,
; who devised the algorithm in 1965.
; In its simplest form the function will take only the two strings as parameter and will calculate
; just the number of insert, replace and delete operations needed to transform sString1 into sString2.
;
; If you can't spell or pronounce Levenshtein, the metric is also sometimes called 'edit distance'.
; The Levenshtein distance algorithm has been used in:
; spell checking, speech recognition, DNA analysis and plagiarism detection.
;
; Reference: http://www.merriampark.com/ld.htm
;
; I added some character 'cleaning' procedures prior to the specific Levenshtein algorithm.
;
; Eusebio Pérez Hurtado
;..........................................................................................................................................

Func _Levenshtein ($sString1, $sString2)
    $iStrLen1 = StringLen($sString1)
    $iStrLen2 = StringLen($sString2)

    If $iStrLen1=0 Then 
        Return ($iStrLen2)
    EndIf
    If $iStrLen2=0 Then 
        Return ($iStrLen1)
    EndIf

    If ($iStrLen1>255) Then Return (-1) ; see Note at end of function.
    If ($iStrLen2>255) Then Return (-1) ; see Note at end of function.

    ;..........................................................................................................................................
    ; Cleanup procedures: not strictly necessary, but useful.
    $sString1 = StringUpper($sString1)
    $sString1 = _StringClean($sString1,"ÄÅÃÂÁÀ","A")
    $sString1 = _StringClean($sString1,"ËÊÉÈ"  ,"E")
    $sString1 = _StringClean($sString1,"ÏÎÍÌ"  ,"I")
    $sString1 = _StringClean($sString1,"ÒÓÔÕÖ" ,"O")
    $sString1 = _StringClean($sString1,"ÜÛÚÙ"  ,"U")
    $sString1 = _StringClean($sString1,"Ç","C")
    $sString1 = _StringClean($sString1,"Ñ","N")

    $sString2 = StringUpper($sString2)
    $sString2 = _StringClean($sString2,"ÄÅÃÂÁÀ","A")
    $sString2 = _StringClean($sString2,"ËÊÉÈ"  ,"E")
    $sString2 = _StringClean($sString2,"ÏÎÍÌ"  ,"I")
    $sString2 = _StringClean($sString2,"ÒÓÔÕÖ" ,"O")
    $sString2 = _StringClean($sString2,"ÜÛÚÙ"  ,"U")
    $sString2 = _StringClean($sString2,"Ç","C")
    $sString2 = _StringClean($sString2,"Ñ","N")

    $sString1 = _StringClean($sString1,"ABCDEFGHIJKLMNOPQRSTUVWXYZ","",2) ; Note: this also removes the digits
    $sString2 = _StringClean($sString2,"ABCDEFGHIJKLMNOPQRSTUVWXYZ","",2) ; Note: this also removes the digits
    ;..........................................................................................................................................

    ; The Levenshtein algorithm
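
The listing breaks off at this point in this copy of the thread. As a stand-in, here is a minimal sketch of the standard dynamic-programming (Wagner-Fischer) edit distance in AutoIt. It is not Eusebio's original code: the name `_LevenshteinCore` and its variables are hypothetical, and it assumes both strings have already been cleaned and length-checked as in the lines above.

```autoit
; Hypothetical helper, NOT the original post's code: textbook Wagner-Fischer
; edit distance between two strings that are assumed to be already cleaned.
Func _LevenshteinCore($sA, $sB)
    Local $iLenA = StringLen($sA)
    Local $iLenB = StringLen($sB)
    If $iLenA = 0 Then Return $iLenB
    If $iLenB = 0 Then Return $iLenA

    ; $aDist[$i][$j] = distance between the first $i chars of $sA
    ; and the first $j chars of $sB
    Local $aDist[$iLenA + 1][$iLenB + 1]
    For $i = 0 To $iLenA
        $aDist[$i][0] = $i ; delete $i characters
    Next
    For $j = 0 To $iLenB
        $aDist[0][$j] = $j ; insert $j characters
    Next

    Local $iCost, $iBest, $iCandidate
    For $i = 1 To $iLenA
        For $j = 1 To $iLenB
            ; "==" is AutoIt's case-sensitive string comparison
            If StringMid($sA, $i, 1) == StringMid($sB, $j, 1) Then
                $iCost = 0
            Else
                $iCost = 1
            EndIf

            $iBest = $aDist[$i - 1][$j] + 1 ; deletion
            $iCandidate = $aDist[$i][$j - 1] + 1 ; insertion
            If $iCandidate < $iBest Then $iBest = $iCandidate
            $iCandidate = $aDist[$i - 1][$j - 1] + $iCost ; substitution (or match)
            If $iCandidate < $iBest Then $iBest = $iCandidate

            $aDist[$i][$j] = $iBest
        Next
    Next

    Return $aDist[$iLenA][$iLenB]
EndFunc

ConsoleWrite(_LevenshteinCore("KITTEN", "SITTING") & @CRLF) ; prints 3
ConsoleWrite(_LevenshteinCore("FLAW", "LAWN") & @CRLF)      ; prints 2
```

The full table is kept for readability. With the 255-character limit used above it never exceeds 256 x 256 cells, so the two-row optimisation is probably not worth the extra bookkeeping here.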
 
Edited by Eusebio

Zinthose

Shame... Looks like the _StringClean function got corrupted at some point. I was looking forward to playing around with this.

Anyone happen to have retained a copy that wasn't garbled by the site?
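
For anyone in the same position: the original `_StringClean` cannot be recovered from this thread, but its call sites hint at what it did. The sketch below is only a guess. The name `_StringCleanSketch`, the parameter names and the meaning of the fourth argument are all assumptions: by default, characters found in the set are replaced, and with mode 2 characters *not* found in the set are replaced.

```autoit
; Hypothetical reconstruction inferred from the call sites only.
; The original _StringClean is lost; names and the $iMode semantics are assumptions.
;   $iMode = 1 (assumed default): replace each character that IS in $sChars
;   $iMode = 2:                   replace each character that is NOT in $sChars
Func _StringCleanSketch($sString, $sChars, $sReplace, $iMode = 1)
    Local $sResult = "", $sChar, $bInSet
    For $i = 1 To StringLen($sString)
        $sChar = StringMid($sString, $i, 1)
        ; Third argument 1 = case-sensitive search; StringInStr returns 0 if not found
        $bInSet = (StringInStr($sChars, $sChar, 1) > 0)
        If ($iMode = 2 And Not $bInSet) Or ($iMode <> 2 And $bInSet) Then
            $sResult &= $sReplace
        Else
            $sResult &= $sChar
        EndIf
    Next
    Return $sResult
EndFunc

; Keep only A-Z, dropping the apostrophe, the space and the digit:
ConsoleWrite(_StringCleanSketch("O'NEIL 2", "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "", 2) & @CRLF) ; prints ONEIL
; Replace every X or Y with a dash:
ConsoleWrite(_StringCleanSketch("AXBYC", "XY", "-") & @CRLF) ; prints A-B-C
```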


--- TTFN

jchd

This post should be able to help you there.


