
Soundex and Levenshtein distance algorithms



#1 Eusebio


Posted 20 July 2006 - 07:50 AM

Has anybody implemented the Soundex algorithm for AutoIt? I'm interested in the Levenshtein distance algorithm, too.

Thanks,

Eusebio.







#2 Manadar


Posted 20 July 2006 - 07:57 AM

Soundex hasn't been implemented. We AutoIt users would rather use the Windows alternative for speaking text (it's in v3 Scripts n Scraps). Therefore, Levenshtein distance hasn't been used either.

Edit: Of course, I don't know about any private projects that might have done it.

Edited by Manadar, 20 July 2006 - 07:58 AM.


#3 Eusebio


Posted 22 August 2006 - 10:15 AM

I've created some functions, if anybody is interested (any comments are welcome):

AutoIt         
;..........................................................................................
; This function returns an integer that indicates the Levenshtein distance between the two
; argument strings, or -1 if either argument string is longer than the limit of 255 characters
; (255 should be more than enough for name or dictionary comparison).
;
; The Levenshtein distance is defined as the minimal number of characters you have to replace,
; insert or delete to transform sString1 into sString2.
;
; The greater the Levenshtein distance, the more different the strings are.
; The Levenshtein distance is named after the Russian scientist Vladimir Levenshtein,
; who introduced it in 1965.
; In its simplest form the function takes only the two strings as parameters and calculates
; just the number of insert, replace and delete operations needed to transform sString1 into sString2.
;
; If you can't spell or pronounce Levenshtein, the metric is also sometimes called 'edit distance'.
; The Levenshtein distance algorithm has been used in:
;  - Spell checking
;  - Speech recognition
;  - DNA analysis
;  - Plagiarism detection
;
; Reference: http://www.merriampark.com/ld.htm
;
; I added some character 'cleaning' procedures prior to the actual Levenshtein algorithm.
;
; Eusebio Pérez Hurtado
;..........................................................................................
Func _Levenshtein($sString1, $sString2)
    $iStrLen1 = StringLen($sString1)
    $iStrLen2 = StringLen($sString2)
    If $iStrLen1 = 0 Then
        Return ($iStrLen2)
    EndIf
    If $iStrLen2 = 0 Then
        Return ($iStrLen1)
    EndIf
    If ($iStrLen1 > 255) Then Return (-1) ; see the 255-character limit in the header comment
    If ($iStrLen2 > 255) Then Return (-1) ; see the 255-character limit in the header comment
    ;......................................................................................
    ; Cleanup procedures: not strictly necessary, but useful.
    $sString1 = StringUpper($sString1)
    $sString1 = _StringClean($sString1, "ÄÅÃÂÁÀ", "A")
    $sString1 = _StringClean($sString1, "ËÊÉÈ", "E")
    $sString1 = _StringClean($sString1, "ÏÎÍÌ", "I")
    $sString1 = _StringClean($sString1, "ÒÓÔÕÖ", "O")
    $sString1 = _StringClean($sString1, "ÜÛÚÙ", "U")
    $sString1 = _StringClean($sString1, "Ç", "C")
    $sString1 = _StringClean($sString1, "Ñ", "N")
    $sString2 = StringUpper($sString2)
    $sString2 = _StringClean($sString2, "ÄÅÃÂÁÀ", "A")
    $sString2 = _StringClean($sString2, "ËÊÉÈ", "E")
    $sString2 = _StringClean($sString2, "ÏÎÍÌ", "I")
    $sString2 = _StringClean($sString2, "ÒÓÔÕÖ", "O")
    $sString2 = _StringClean($sString2, "ÜÛÚÙ", "U")
    $sString2 = _StringClean($sString2, "Ç", "C")
    $sString2 = _StringClean($sString2, "Ñ", "N")
    $sString1 = _StringClean($sString1, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "", 2) ; Careful! This also removes the digits
    $sString2 = _StringClean($sString2, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "", 2) ; Careful! This also removes the digits
    ;......................................................................................
    ; The Levenshtein algorithm
    $iStrLen1 = StringLen($sString1)
    $iStrLen2 = StringLen($sString2)
    Dim $aArray[$iStrLen1 + 1][$iStrLen2 + 1]
    For $iRow = 0 To $iStrLen1
        $aArray[$iRow][0] = $iRow
    Next
    For $iCol = 0 To $iStrLen2
        $aArray[0][$iCol] = $iCol
    Next
    For $iRow = 1 To $iStrLen1
        For $iCol = 1 To $iStrLen2
            ; 1 if the characters differ, 0 if they match
            $iCost = StringMid($sString1, $iRow, 1) <> StringMid($sString2, $iCol, 1)
            $iRowPrev = $iRow - 1
            $iColPrev = $iCol - 1
            $aArray[$iRow][$iCol] = _Min3(1 + $aArray[$iRowPrev][$iCol], 1 + $aArray[$iRow][$iColPrev], $iCost + $aArray[$iRowPrev][$iColPrev])
        Next
    Next
    $iDistance = $aArray[$iStrLen1][$iStrLen2]
    Return ($iDistance)
EndFunc

Func _Min3($n1, $n2, $n3)
    ; Returns the minimum of 3 numbers
    ; Eusebio Pérez Hurtado
    $min3 = $n1
    If $n2 < $min3 Then $min3 = $n2
    If $n3 < $min3 Then $min3 = $n3
    Return ($min3)
EndFunc

;..........................................................................................
; Soundex manipulation based on U.S. Patents 1261167 (1918) and 1435663 (1922)
; by Margaret K. Odell and Robert C. Russell,
; as used by the National Archives and Records Administration (NARA)
; (published by Don Knuth [Knuth]).
;
; Russell's method is usable for names from England, America and western European countries,
; but does not apply well to many Slavic and Yiddish surnames
; and is not independent of several ethnic considerations.
;
; With Soundex, the "sound" of names (the phonetic sound, to be exact) is coded.
; This is of great help since it avoids most problems of misspellings or alternate spellings;
; for example, Scherman, Schurman, Sherman, Shireman and Shurman are indexed together as "S655".
; Surname Soundex indexing is not alphabetical, but is listed by the letter-and-number code.
; If several surnames have the same code, their cards are arranged alphabetically by given name.
; Example: S655 Arthur, S655 Betsy, S655 Charles.
;
; Russell Soundex Name-Matching
; The Russell Soundex Code algorithm is designed primarily for use with English names and is a
; phonetically based name matching method. The algorithm converts each name to a four-character code,
; which can be used to identify equivalent names, and is structured as follows [Knuth]:
;  1. Retain the first letter of the name, and drop all occurrences of a, e, h, i, o, u, w, y
;     in other positions.
;  2. Assign the following numbers to the remaining letters after the first:
;       b, f, p, v                == 1
;       c, g, j, k, q, s, x, z    == 2
;       d, t                      == 3
;       l                         == 4
;       m, n                      == 5
;       r                         == 6
;  3. If two or more letters with the same code were adjacent in the original name
;     (before step 1) (!!!), omit all but the first.
;  4. Convert to the form "letter, digit, digit, digit" by adding trailing zeros
;     (if there are fewer than three digits), or by dropping the rightmost digits
;     if there are more than three.
;
; For example, the names Euler, Gauss, Hilbert, Knuth and Lloyd are given the respective codes
; E460, G200, H416, K530, L300.
; However, the algorithm also gives the same codes for
; Ellery, Ghosh, Heilbronn, Kant and Ladd [Knuth], which are not related in reality.
;
; [Knuth]: D. E. Knuth, The Art of Computer Programming, Vol. 3, Sorting and Searching,
;          Addison-Wesley, pp. 391-392.
;
; Eusebio Pérez Hurtado
;..........................................................................................
Func _Soundex($sString)
    If StringLen($sString) = 0 Then
        Return ("")
    EndIf
    $sString = StringUpper($sString)

    ; Retain the first letter of the name.
    $sCharFirst = StringMid($sString, 1, 1)

    ; Special pre-processing for the German language
    $sString = StringReplace($sString, "SCH", "S") ; German special "sch"
    $sString = StringReplace($sString, "ß", "S")   ; German special sharp-s "ß"

    ; Drop all occurrences of A, E, H, I,

Edited by Eusebio, 22 August 2006 - 10:32 AM.
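The _Soundex function above is cut off after the German pre-processing step, so its remaining code is not available. As a minimal Python sketch (not the author's code) of the four [Knuth] steps quoted in its header comment, under two assumptions: characters outside A-Z are simply ignored, and the optional `german` flag (a hypothetical parameter) mirrors the post's "SCH" and "ß" replacements.

```python
# Digit for each consonant group from step 2; A, E, H, I, O, U, W, Y have no digit.
CODES = {ch: digit
         for digit, group in (("1", "BFPV"), ("2", "CGJKQSXZ"), ("3", "DT"),
                              ("4", "L"), ("5", "MN"), ("6", "R"))
         for ch in group}


def soundex(name: str, german: bool = False) -> str:
    """Four-character Russell Soundex code following the four steps quoted from Knuth."""
    if german:
        # Hypothetical option mirroring the post: "ß" -> "S" (done before upper-casing,
        # because Python upper-cases "ß" to "SS"), then "SCH" -> "S".
        name = name.replace("ß", "S").upper().replace("SCH", "S")
    # Assumption: anything outside A-Z (spaces, hyphens, accented letters) is ignored.
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    digits = []
    # Steps 1-3: code each letter after the first, skipping uncoded letters and any
    # letter whose code equals that of the letter right before it in the name.
    for prev, cur in zip(letters, letters[1:]):
        code = CODES.get(cur)
        if code is not None and code != CODES.get(prev):
            digits.append(code)
    # Step 4: pad with zeros or truncate to "letter, digit, digit, digit".
    return (letters[0] + "".join(digits) + "000")[:4]


for n in ("Euler", "Gauss", "Hilbert", "Knuth", "Lloyd", "Scherman", "Shurman"):
    print(n, soundex(n))
```

This should reproduce the codes quoted in the header comment (E460, G200, H416, K530, L300) and give S655 for both Scherman and Shurman. Because step 3 compares each letter with its neighbour in the original name, the second L in "Lloyd" and the C in "Scherman" are dropped, as they share a code with the first letter.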




