civilcalc

Find best match of a string

8 posts in this topic

Let's say I have some raw data, extracted from pixels or by OCR:

Applo

Baoon

Chee5e

Hann

and I know for sure that the words are Apple, Bacon, Cheese and Ham.

Is there a function that can compare the known word Apple with "Applo Baoon Chee5e Hann" and find the best match?

Thanks in advance ;-)


civilcalc,

This should work as long as there are not too many errors of the "nn" for "m" type, which change the length of the word and push the letters that follow out of sync: :)

Global $aTestArray[5] = [4, "Applo", "Baoon", "Chee5e", "Hann"]
Global $aBaseArray[5] = [4, "Cheese", "Ham", "Apple", "Bacon"]
; Loop through our known words
For $j = 1 To $aBaseArray[0]
    ; What are we trying to match?
    $sBaseText = $aBaseArray[$j]
    ; Clear the values
    $iBestMatch = 0
    $nBestMatch = 0
    ; Now loop through the unknown words
    For $i = 1 To $aTestArray[0]
        ; Use the shortest length to avoid index errors
        $iLen = StringLen($sBaseText)
        If StringLen($aTestArray[$i]) < $iLen Then
            $iLen = StringLen($aTestArray[$i])
        EndIf
        ; Clear the counter
        $nMatch = 0
        ; Now compare each letter (StringMid positions start at 1, not 0)
        For $k = 1 To $iLen
            If StringMid($sBaseText, $k, 1) = StringMid($aTestArray[$i], $k, 1) Then
                ; And increase the counter if they match
                $nMatch += 100 / $iLen
            EndIf
        Next
        ; If this is the best match so far then reset the Best values
        If $nMatch > $nBestMatch Then
            $nBestMatch = $nMatch
            $iBestMatch = $i
        EndIf
    Next
    ; And display the result
    MsgBox(0, "Best Match", $sBaseText & " matches " & $aTestArray[$iBestMatch])
Next
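
To illustrate the out-of-sync problem mentioned above, here is a minimal sketch. The word pair is hypothetical and not part of the original data: "Salnnon" stands for "Salmon" with the "m" read as "nn". It uses the same position-by-position comparison as the script and marks each position where the letters agree.

```autoit
; Hypothetical example words: "Salnnon" is "Salmon" with "m" misread as "nn"
Local $sKnown = "Salmon", $sOcr = "Salnnon"

; Compare only over the shorter of the two lengths
Local $iLen = StringLen($sKnown)
If StringLen($sOcr) < $iLen Then $iLen = StringLen($sOcr)

Local $sMarks = "", $iHits = 0
For $k = 1 To $iLen
    If StringMid($sKnown, $k, 1) = StringMid($sOcr, $k, 1) Then
        $sMarks &= "=" ; letters agree at this position
        $iHits += 1
    Else
        $sMarks &= "x" ; letters differ at this position
    EndIf
Next

ConsoleWrite($sKnown & @CRLF & $sOcr & @CRLF & $sMarks & @CRLF)
ConsoleWrite($iHits & " of " & $iLen & " positions match" & @CRLF)
```

Only the first three positions line up (the marker line is ===xxx). Everything after the extra letter is shifted by one, so the remaining letters no longer count as matches even though they were read correctly.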

I seem to remember something in the Examples section which did this but I cannot find it for the moment (if indeed it exists! :)). I will keep searching during the day. :mellow:

M23


Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind.

My UDFs:

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 


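$words = "Applo Baoon Chee5e Hann"
$Bestword = _Bestword("Apple", $words)
$Bestword = _Bestword("Bacon", $words)
$Bestword = _Bestword("Cheese", $words)
$Bestword = _Bestword("Ham", $words)

; Returns the word in $words that contains the most letters of $word
Func _Bestword($word, $words)
    Local $w = StringSplit($word, "") ; letters of the known word
    Local $s = StringSplit($words, " ") ; candidate words
    Local $p = 0 ; best score so far
    Local $px = 0 ; score of the current candidate
    Local $pi = 1 ; index of the best candidate (falls back to the first word if nothing matches)
    For $i = 1 To $s[0]
        For $j = 1 To $w[0]
            ; Count each letter of the known word that occurs anywhere in the candidate
            If StringInStr($s[$i], $w[$j]) Then $px += 1
        Next
        If $px > $p Then
            $p = $px
            $pi = $i
        EndIf
        $px = 0
    Next
    Local $Bestword = $s[$pi]
    MsgBox(262144, "", "Bestword for '" & $word & "' in" & @LF & $words & @LF & "is '" & $Bestword & "'" & @LF, 0)
    Return $Bestword
EndFunc   ;==>_Bestword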



I did see a function similar to this, but I couldn't find it either, hence the question. Thanks, I'll give it a whirl.


Here is another method.

#include <String.au3>
#include <Array.au3>
#include <Math.au3>
 
; Ref:-
; http://www.autoitscript.com/forum/topic/113591-compare-strings/page__view__findpost__p__795728
 
Local $sString = "Applo dBaoon Chee5e Hann"
Local $numeros = StringSplit($sString, " ", 2)
 
Global $aBaseArray[5] = [4, "Cheese", "Ham", "Apple", "Bacon"]
 
For $i = 1 To $aBaseArray[0]
    Local $bestMatchIdx = 0, $iDist, $bestMatch = _EditDistance($numeros[0], $aBaseArray[$i])
 
    For $k = 0 To UBound($numeros) - 1
        $iDist = _EditDistance($numeros[$k], $aBaseArray[$i])
        If $iDist < $bestMatch Then
            $bestMatch = $iDist
            $bestMatchIdx = $k
        EndIf
    Next
 
    MsgBox(0, "Results", StringFormat("Best match for '%s' is '%s' with %i different, non-matching character[s].\n", $aBaseArray[$i], $numeros[$bestMatchIdx], $bestMatch))
Next
 
 
 
Func _EditDistance($s1, $s2)
    Local $m[StringLen($s1) + 1][StringLen($s2) + 1], $i, $j
    $m[0][0] = 0; boundary conditions
    For $j = 1 To StringLen($s2)
        $m[0][$j] = $m[0][$j - 1] + 1; boundary conditions
    Next
    For $i = 1 To StringLen($s1)
        $m[$i][0] = $m[$i - 1][0] + 1; boundary conditions
    Next
    For $j = 1 To StringLen($s2); outer loop
        For $i = 1 To StringLen($s1) ; inner loop
            If (StringMid($s1, $i, 1) = StringMid($s2, $j, 1)) Then
                $diag = 0;
            Else
                $diag = 1
            EndIf
            $m[$i][$j] = _Min($m[$i - 1][$j] + 1, _ ; insertion
                    (_Min($m[$i][$j - 1] + 1, _ ; deletion
                    $m[$i - 1][$j - 1] + $diag))) ; substitution
        Next
    Next
    Return $m[StringLen($s1)][StringLen($s2)] ; $m ;
EndFunc   ;==>_EditDistance


Thank you, Malkey. That was the only one that did exactly what I wanted; the other two were close, but not accurate enough. Thanks to everyone, though. ;-)


At least the author knew where it was - I had just found it here: :mellow:

M23



I was still very grateful, Melba ;-) It's just that Malkey's version allows for missing or incorrect letters. Your proposal worked, but if the first and last letters were missing I got incorrect results.
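
For anyone reading later, here is a rough sketch of that difference. It is only an illustration under assumptions: "acon" is a made-up OCR reading of "Bacon" with the first letter missing, and _PosScore and _Lev are hypothetical helper names written for this sketch (a position-by-position score and a compact Levenshtein distance), not the exact functions posted above.

```autoit
#include <Math.au3> ; for _Min

Local $sKnown = "Bacon"
Local $aOcr[4] = ["Applo", "acon", "Chee5e", "Hann"] ; "acon" has lost its first letter

; Pick the best candidate under each scoring method
Local $iBestPos = 0, $iBestLev = 0
For $i = 1 To UBound($aOcr) - 1
    If _PosScore($sKnown, $aOcr[$i]) > _PosScore($sKnown, $aOcr[$iBestPos]) Then $iBestPos = $i
    If _Lev($sKnown, $aOcr[$i]) < _Lev($sKnown, $aOcr[$iBestLev]) Then $iBestLev = $i
Next
ConsoleWrite("Positional pick:    " & $aOcr[$iBestPos] & @CRLF)
ConsoleWrite("Edit-distance pick: " & $aOcr[$iBestLev] & @CRLF)

; Hypothetical helper: fraction of positions (over the shorter length) where the letters agree
Func _PosScore($s1, $s2)
    Local $iLen = _Min(StringLen($s1), StringLen($s2)), $iHits = 0
    If $iLen = 0 Then Return 0
    For $k = 1 To $iLen
        If StringMid($s1, $k, 1) = StringMid($s2, $k, 1) Then $iHits += 1
    Next
    Return $iHits / $iLen
EndFunc   ;==>_PosScore

; Hypothetical helper: Levenshtein distance (insertions, deletions, substitutions)
Func _Lev($s1, $s2)
    Local $n1 = StringLen($s1), $n2 = StringLen($s2), $iCost
    Local $m[$n1 + 1][$n2 + 1]
    For $i = 0 To $n1
        $m[$i][0] = $i ; delete all of $s1 up to position $i
    Next
    For $j = 0 To $n2
        $m[0][$j] = $j ; insert all of $s2 up to position $j
    Next
    For $i = 1 To $n1
        For $j = 1 To $n2
            $iCost = 1
            If StringMid($s1, $i, 1) = StringMid($s2, $j, 1) Then $iCost = 0
            $m[$i][$j] = _Min($m[$i - 1][$j] + 1, _Min($m[$i][$j - 1] + 1, $m[$i - 1][$j - 1] + $iCost))
        Next
    Next
    Return $m[$n1][$n2]
EndFunc   ;==>_Lev
```

With the missing first letter, no position of "acon" lines up with "Bacon", so the positional score ends up preferring "Hann" (the "a" in second place happens to agree). The edit distance sees that "acon" is a single insertion away from "Bacon" and picks it.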
