civilcalc - Posted August 30, 2011

Let's say I have some raw data, extracted by pixels or OCR:

    Applo Baoon Chee5e Hann

and I know for sure that the words are Apple, Bacon, Cheese and Ham. Is there a function that can compare the known word Apple with "Applo Baoon Chee5e Hann" and find the best match?

Thanks in advance ;-)

Melba23 (Moderators) - Posted August 30, 2011

civilcalc,

This should work as long as there are not too many "nn" for "m" type errors where the corresponding letters get out of sync:

    Global $aTestArray[5] = [4, "Applo", "Baoon", "Chee5e", "Hann"]
    Global $aBaseArray[5] = [4, "Cheese", "Ham", "Apple", "Bacon"]

    ; Loop through our known words
    For $j = 1 To $aBaseArray[0]
        ; What are we trying to match?
        $sBaseText = $aBaseArray[$j]
        ; Clear the values
        $iBestMatch = 0
        $nBestMatch = 0
        ; Now loop through the unknown words
        For $i = 1 To $aTestArray[0]
            ; Use the shortest length to avoid index errors
            $iLen = StringLen($sBaseText)
            If StringLen($aTestArray[$i]) < $iLen Then
                $iLen = StringLen($aTestArray[$i])
            EndIf
            ; Clear the counter
            $nMatch = 0
            ; Now compare each letter (StringMid is 1-based, so start at 1)
            For $k = 1 To $iLen
                If StringMid($sBaseText, $k, 1) = StringMid($aTestArray[$i], $k, 1) Then
                    ; And increase the counter if they match
                    $nMatch += 100 / $iLen
                EndIf
            Next
            ; If this is the best match so far then reset the Best values
            If $nMatch > $nBestMatch Then
                $nBestMatch = $nMatch
                $iBestMatch = $i
            EndIf
        Next
        ; And display the result
        MsgBox(0, "Best Match", $sBaseText & " matches " & $aTestArray[$iBestMatch])
    Next

I seem to remember something in the Examples section which did this, but I cannot find it for the moment (if indeed it exists!). I will keep searching during the day.

M23

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind.

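The out-of-sync caveat above can be seen in a small sketch. `_PositionalScore` is a hypothetical helper, not code from the thread. It applies the same letter-by-letter comparison as the loop above, but divides by the longer string's length (an assumption made here so that a short candidate is not favoured).

```autoit
; Hypothetical helper (not from the thread): percentage of positions at which
; two strings hold the same character, measured against the longer string.
Func _PositionalScore($sKnown, $sCandidate)
    Local $iShort = StringLen($sKnown), $iLong = StringLen($sCandidate)
    If $iShort > $iLong Then
        $iShort = StringLen($sCandidate)
        $iLong = StringLen($sKnown)
    EndIf
    If $iLong = 0 Then Return 0 ; nothing to compare
    Local $iHits = 0
    ; StringMid is 1-based, so the loop starts at 1
    For $k = 1 To $iShort
        ; "=" compares strings case-insensitively in AutoIt
        If StringMid($sKnown, $k, 1) = StringMid($sCandidate, $k, 1) Then $iHits += 1
    Next
    Return 100 * $iHits / $iLong
EndFunc   ;==>_PositionalScore

ConsoleWrite(_PositionalScore("Apple", "Applo") & @CRLF)          ; 80: one wrong letter
ConsoleWrite(_PositionalScore("Hamburger", "Hannburger") & @CRLF) ; 20: "nn" for "m" shifts the tail
```

A single substitution costs one position, but the "nn" for "m" error pushes every later letter one place to the right, so only "H" and "a" still line up.
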
Exit - Posted August 30, 2011

    $words = "Applo Baoon Chee5e Hann"
    $Bestword = _Bestword("Apple", $words)
    $Bestword = _Bestword("Bacon", $words)
    $Bestword = _Bestword("Cheese", $words)
    $Bestword = _Bestword("Ham", $words)

    Func _Bestword($word, $words)
        Local $w = StringSplit($word, "")   ; letters of the known word
        Local $s = StringSplit($words, " ") ; candidate words
        Local $p = 0, $px = 0
        Local $pi = 1 ; falls back to the first candidate if no letter matches
        For $i = 1 To $s[0]
            For $j = 1 To $w[0]
                ; Count each letter of $word found anywhere in the candidate
                If StringInStr($s[$i], $w[$j]) Then $px += 1
            Next
            If $px > $p Then
                $p = $px
                $pi = $i
            EndIf
            $px = 0
        Next
        Local $Bestword = $s[$pi]
        MsgBox(262144, "", "Bestword for '" & $word & "' in" & @LF & $words & @LF & "is '" & $Bestword & "'" & @LF, 0)
        Return $Bestword
    EndFunc   ;==>_Bestword

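To see what this measure does and does not capture, here is a minimal sketch that isolates the counting step of `_Bestword`. `_LetterHits` is a hypothetical name introduced only for illustration. It counts letters of the known word that occur anywhere in the candidate, so letter order plays no part.

```autoit
; Hypothetical helper (not from the thread): counts how many letters of $sWord
; occur anywhere in $sCandidate, the same measure _Bestword relies on.
Func _LetterHits($sWord, $sCandidate)
    Local $aLetters = StringSplit($sWord, "") ; "" splits into single characters
    Local $iHits = 0
    For $j = 1 To $aLetters[0]
        ; StringInStr is case-insensitive by default and ignores position
        If StringInStr($sCandidate, $aLetters[$j]) Then $iHits += 1
    Next
    Return $iHits
EndFunc   ;==>_LetterHits

ConsoleWrite(_LetterHits("Apple", "Apple") & @CRLF) ; 5
ConsoleWrite(_LetterHits("Apple", "elppA") & @CRLF) ; 5 as well: order is ignored
ConsoleWrite(_LetterHits("Ham", "Hann") & @CRLF)    ; 2 ("H" and "a")
```

Because a reversed word scores the same as the word itself, this count works best when the candidate words share few letters with each other.
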
civilcalc (Author) - Posted August 30, 2011

I did see a function similar to this, but I couldn't find it either, hence the question. Thanks, I'll give it a whirl.

Malkey - Posted August 30, 2011

Here is another method.

    #include <String.au3>
    #include <Array.au3>
    #include <Math.au3>

    ; Ref:-
    ; http://www.autoitscript.com/forum/topic/113591-compare-strings/page__view__findpost__p__795728

    Local $sString = "Applo dBaoon Chee5e Hann"
    Local $numeros = StringSplit($sString, " ", 2) ; flag 2: no count in element [0]
    Global $aBaseArray[5] = [4, "Cheese", "Ham", "Apple", "Bacon"]

    For $i = 1 To $aBaseArray[0]
        Local $bestMatchIdx = 0, $iDist, $bestMatch = _EditDistance($numeros[0], $aBaseArray[$i])
        For $k = 0 To UBound($numeros) - 1
            $iDist = _EditDistance($numeros[$k], $aBaseArray[$i])
            If $iDist < $bestMatch Then
                $bestMatch = $iDist
                $bestMatchIdx = $k
            EndIf
        Next
        MsgBox(0, "Results", StringFormat("Best match for '%s' is '%s' with %i different, non-matching character[s].\n", $aBaseArray[$i], $numeros[$bestMatchIdx], $bestMatch))
    Next

    Func _EditDistance($s1, $s2)
        Local $m[StringLen($s1) + 1][StringLen($s2) + 1], $i, $j, $diag
        $m[0][0] = 0 ; boundary conditions
        For $j = 1 To StringLen($s2)
            $m[0][$j] = $m[0][$j - 1] + 1 ; boundary conditions
        Next
        For $i = 1 To StringLen($s1)
            $m[$i][0] = $m[$i - 1][0] + 1 ; boundary conditions
        Next
        For $j = 1 To StringLen($s2) ; outer loop
            For $i = 1 To StringLen($s1) ; inner loop
                If (StringMid($s1, $i, 1) = StringMid($s2, $j, 1)) Then
                    $diag = 0
                Else
                    $diag = 1
                EndIf
                $m[$i][$j] = _Min($m[$i - 1][$j] + 1, _ ; insertion
                        (_Min($m[$i][$j - 1] + 1, _ ; deletion
                        $m[$i - 1][$j - 1] + $diag))) ; substitution
            Next
        Next
        Return $m[StringLen($s1)][StringLen($s2)]
    EndFunc   ;==>_EditDistance

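The function above fills a full table of edit distances (the Levenshtein distance). As a sketch only, the same value can be computed while keeping just two rows of that table. `_LevDistance` is a hypothetical name, and the version below assumes nothing beyond built-in AutoIt string functions.

```autoit
; Sketch of a two-row Levenshtein distance; _LevDistance is a hypothetical name.
Func _LevDistance($s1, $s2)
    Local $n1 = StringLen($s1), $n2 = StringLen($s2)
    Local $aPrev[$n2 + 1], $aCurr[$n2 + 1]
    Local $iCost, $iBest
    ; Distance from the empty prefix of $s1 to each prefix of $s2
    For $j = 0 To $n2
        $aPrev[$j] = $j
    Next
    For $i = 1 To $n1
        $aCurr[0] = $i
        For $j = 1 To $n2
            ; "=" compares case-insensitively; use "==" for a case-sensitive match
            $iCost = 1
            If StringMid($s1, $i, 1) = StringMid($s2, $j, 1) Then $iCost = 0
            ; Start from substitution (or a free match), then try the other two edits
            $iBest = $aPrev[$j - 1] + $iCost
            If $aPrev[$j] + 1 < $iBest Then $iBest = $aPrev[$j] + 1 ; drop a character of $s1
            If $aCurr[$j - 1] + 1 < $iBest Then $iBest = $aCurr[$j - 1] + 1 ; add a character of $s2
            $aCurr[$j] = $iBest
        Next
        $aPrev = $aCurr ; AutoIt copies arrays on assignment
    Next
    Return $aPrev[$n2]
EndFunc   ;==>_LevDistance

ConsoleWrite(_LevDistance("Applo", "Apple") & @CRLF)  ; 1: one substitution
ConsoleWrite(_LevDistance("dBaoon", "Bacon") & @CRLF) ; 2: drop "d", replace "o" with "c"
ConsoleWrite(_LevDistance("Hann", "Ham") & @CRLF)     ; 2: replace "n" with "m", drop "n"
```

Because an extra or missing letter costs a single edit instead of shifting every later comparison, "dBaoon" and "Hann" still land close to their intended words.
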
civilcalc (Author) - Posted August 30, 2011

Thank you Malkey, that was the only one that did exactly what I wanted it to do. The other two were close, but not accurate enough. Thanks to everyone though. ;-)

Edited August 30, 2011 by civilcalc

Melba23 (Moderators) - Posted August 30, 2011

At least the author knew where it was - I had just found it here:

M23

civilcalc (Author) - Posted August 30, 2011

I was still very grateful, Melba ;-) It is just that Malkey's version allows for missing or incorrect letters. Your proposal worked, but if the first and last letters were missing I got incorrect results.