
Soundex and Levenshtein distance algorithms



  • 1 month later...

I've written these functions in case anybody is interested (any comments are welcome).

;..........................................................................................................................................
; This function returns an integer number which indicates the Levenshtein-Distance between the two
; argument strings or -1, if one of the argument strings is longer than the limit of 255 characters
; (255 should be more than enough for name or dictionary comparison).
;
; The Levenshtein distance is defined as the minimal number of characters you have to replace,
; insert or delete to transform sString1 into sString2.
;
; The greater the Levenshtein distance, the more different the strings are.
; The Levenshtein distance is named after the Russian scientist Vladimir Levenshtein,
; who introduced this metric in 1965.
; In its simplest form the function takes only the two strings as parameters and counts
; the minimum number of insert, replace and delete operations needed to transform sString1 into sString2.
;
; If you can't spell or pronounce Levenshtein, the metric is also sometimes called 'edit distance'.
; The Levenshtein distance algorithm has been used in spell checking, speech recognition,
; DNA analysis and plagiarism detection.
;
; Reference: http://www.merriampark.com/ld.htm
;
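; I added some character 'cleaning' steps before the Levenshtein algorithm itself: both strings are
; upper-cased, accented letters are folded to their base letter, and everything except A-Z is removed.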
;
; Eusebio Pérez Hurtado
;..........................................................................................................................................

Func _Levenshtein($sString1, $sString2)
    Local $iStrLen1 = StringLen($sString1)
    Local $iStrLen2 = StringLen($sString2)

    If $iStrLen1=0 Then 
        Return ($iStrLen2)
    EndIf
    If $iStrLen2=0 Then 
        Return ($iStrLen1)
    EndIf

    If ($iStrLen1>255) Then Return (-1) ; 255-character limit, see the header comment.
    If ($iStrLen2>255) Then Return (-1) ; 255-character limit, see the header comment.

    ;..........................................................................................................................................
    ; Cleanup steps: not strictly necessary, but useful.
    $sString1 = StringUpper($sString1)
    $sString1 = _StringClean($sString1,"ÄÅÃÂÁÀ","A")
    $sString1 = _StringClean($sString1,"ËÊÉÈ"  ,"E")
    $sString1 = _StringClean($sString1,"ÏÎÍÌ"  ,"I")
    $sString1 = _StringClean($sString1,"ÒÓÔÕÖ" ,"O")
    $sString1 = _StringClean($sString1,"ÜÛÚÙ"  ,"U")
    $sString1 = _StringClean($sString1,"Ç","C")
    $sString1 = _StringClean($sString1,"Ñ","N")

    $sString2 = StringUpper($sString2)
    $sString2 = _StringClean($sString2,"ÄÅÃÂÁÀ","A")
    $sString2 = _StringClean($sString2,"ËÊÉÈ"  ,"E")
    $sString2 = _StringClean($sString2,"ÏÎÍÌ"  ,"I")
    $sString2 = _StringClean($sString2,"ÒÓÔÕÖ" ,"O")
    $sString2 = _StringClean($sString2,"ÜÛÚÙ"  ,"U")
    $sString2 = _StringClean($sString2,"Ç","C")
    $sString2 = _StringClean($sString2,"Ñ","N")

    $sString1 = _StringClean($sString1,"ABCDEFGHIJKLMNOPQRSTUVWXYZ","",2) ; Careful! this also removes digits
    $sString2 = _StringClean($sString2,"ABCDEFGHIJKLMNOPQRSTUVWXYZ","",2) ; Careful! this also removes digits
    ;..........................................................................................................................................

    ; The Levenshtein algorithm
    
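    $iStrLen1 = StringLen($sString1)
    $iStrLen2 = StringLen($sString2)

    ; (len1+1) x (len2+1) matrix: $aArray[i][j] is the distance between the first i characters
    ; of sString1 and the first j characters of sString2
    Local $aArray[$iStrLen1 + 1][$iStrLen2 + 1]

    ; First column: turning the first i characters into the empty string takes i deletions
    For $iRow = 0 To $iStrLen1
        $aArray[$iRow][0] = $iRow
    Next

    ; First row: building the first j characters from the empty string takes j insertions
    For $iCol = 0 To $iStrLen2
        $aArray[0][$iCol] = $iCol
    Next

    Local $iCost
    For $iRow = 1 To $iStrLen1
        For $iCol = 1 To $iStrLen2
            ; Substitution cost: 0 if the characters match, 1 otherwise
            ; (both strings are upper-case at this point, so the case-insensitive <> is safe)
            $iCost = 0
            If StringMid($sString1, $iRow, 1) <> StringMid($sString2, $iCol, 1) Then $iCost = 1
            ; Cheapest of: deletion (cell above), insertion (cell to the left), match/substitution (diagonal)
            $aArray[$iRow][$iCol] = _Min3(1 + $aArray[$iRow - 1][$iCol], 1 + $aArray[$iRow][$iCol - 1], $iCost + $aArray[$iRow - 1][$iCol - 1])
        Next
    Next

    ; The bottom-right cell holds the distance between the two full strings
    Return $aArray[$iStrLen1][$iStrLen2]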
EndFunc
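_Levenshtein calls _StringClean, but that helper is not included in the post. Below is a hypothetical stand-in whose behaviour is only inferred from the call sites: in mode 1 (assumed to be the default) every character listed in $sChars is replaced by $sReplace; in mode 2 every character NOT listed is replaced, which is how the two calls with "ABCDEFGHIJKLMNOPQRSTUVWXYZ" would strip digits, spaces and punctuation. The author's real helper may well differ.

```autoit
; Hypothetical stand-in for the _StringClean helper called by _Levenshtein
; (the original is not in the post; this contract is an assumption inferred from the calls):
;   $iMode = 1 (assumed default): each character of $sString found in $sChars is replaced by $sReplace
;   $iMode = 2                  : each character of $sString NOT found in $sChars is replaced by $sReplace
Func _StringClean($sString, $sChars, $sReplace, $iMode = 1)
    Local $sResult = "", $sChar, $bInSet
    For $i = 1 To StringLen($sString)
        $sChar = StringMid($sString, $i, 1)
        ; casesense = 1 makes the lookup case-sensitive (StringInStr ignores case by default)
        $bInSet = StringInStr($sChars, $sChar, 1) > 0
        If ($iMode = 2 And Not $bInSet) Or ($iMode <> 2 And $bInSet) Then
            $sResult &= $sReplace
        Else
            $sResult &= $sChar
        EndIf
    Next
    Return $sResult
EndFunc
```

With this helper and _Min3 in the same script, _Levenshtein("kitten", "Sitting") should return 3: after upper-casing, K becomes S, E becomes I and a G is inserted.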
Func _Min3($n1, $n2, $n3)
    ; Returns the minimum of 3 numbers
    ; Eusebio Pérez Hurtado
    $min3 = $n1
    If $n2 < $min3 Then $min3 = $n2
    If $n3 < $min3 Then $min3 = $n3
    Return ($min3)
EndFunc

;..........................................................................................................................................
; Soundex manipulation based on
; U.S. Patents 1261167 (1918), 1435663 (1922)
; by Margaret K. Odell and Robert C. Russell.
; Used by the National Archives and Records Administration (NARA)
; (published by Don Knuth [Knuth]).
;
; Russell's method is useful for names from England, America and western European countries,
; but does not apply well to many Slavic and Yiddish surnames,
; and is not independent of several ethnic considerations.
;
; With Soundex, names are coded by their "sound" (their phonetic sound) rather than their exact spelling.
; This is of great help since it avoids most problems of misspellings or alternate spellings.
; For example Scherman, Schurman, Sherman, Shireman and Shurman are indexed together as "S655".
; Soundex indexing of surnames is not alphabetical: names are listed by their letter-and-number code.
; If several surnames have the same code, their cards are arranged alphabetically by given name.
; Example: S655 Arthur, S655 Betsy, S655 Charles.
;
; Russell Soundex Name Matching
; The Russell Soundex Code algorithm is designed primarily for use with English names and is
; a phonetically based name matching method. The algorithm converts each name to a four-character code,
; which can be used to identify equivalent names, and is structured as follows [Knuth]:
; 1. Retain the first letter of the name, and drop all occurrences of a, e, h, i, o, u, w, y in other positions.
; 2. Assign the following numbers to the remaining letters after the first:
;    b, f, p, v = 1
;    c, g, j, k, q, s, x, z = 2
;    d, t = 3
;    l = 4
;    m, n = 5
;    r = 6
; 3. If two or more letters with the same code were adjacent in the original name (before step 1),
;    omit all but the first.
; 4. Convert to the form "letter, digit, digit, digit" by adding trailing zeros
;    (if there are fewer than three digits), or by dropping rightmost digits if there are more than three.
;
; For example, the names Euler, Gauss, Hilbert, Knuth and Lloyd give the respective codes
; E460, G200, H416, K530, L300.
; However, the algorithm also gives the same codes for
; Ellery, Ghosh, Heilbronn, Kant and Ladd [Knuth], which are not related in reality.
;
; [Knuth]: D. E. Knuth, The Art of Computer Programming, Vol. 3: Sorting and Searching, Addison-Wesley, pp. 391-392.
;
; Eusebio Pérez Hurtado
;..........................................................................................................................................
Func _Soundex($sString)
    If StringLen($sString) = 0 Then
        Return ("")
    EndIf
    $sString = StringUpper($sString)

    ; Retain the first letter of the name.
    $sCharFirst = StringMid($sString, 1, 1)

    ; special preprocessing for the German language
    $sString = StringReplace($sString, "SCH", "S") ; German special "sch"
    $sString = StringReplace($sString, "ß", "S") ; German special sharp s "ß"

    ; drop all occurrences of a, e, h, i,
Edited by Eusebio
WhoIs | YouTube Video downloader | Soundex - SoundexEx - Levenshtein Distance (algorithms) | I3osé | AutoIT

  • 8 years later...

This post should be able to help you there.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget to download your copy of the up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation: the latest available release, currently implemented in the AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)
