Pattern Matching


3 replies to this topic

#1 JRowe

JRowe

    Chasing the white rabbits

  • Active Members
  • 1,764 posts

Posted 30 March 2010 - 04:25 AM

A function I rewrote from some PHP code written by a guy I met on the freenode #ai IRC channel.

AutoIt         
; #FUNCTION# ;===============================================================================
;
; Name...........: _VectorDelta
; Description ...: Returns the Euclidean distance between two lists (lower means more similar)
; Syntax.........: _VectorDelta($aDatasetA, $aDatasetB)
; Parameters ....: $aDatasetA, $aDatasetB
; Return values .: Success - Similarity score.
;                  Failure - potentially encounters division by zero.
; Author ........: JRowe, inspired by php by Timothy Robert Keal, aka "alias Jargon"
; Modified.......:
; Remarks .......: Not to be used directly, for use by _Similarity()
; Related .......: _Similarity
; Link ..........;
; Example .......; Yes
;
; ;==========================================================================================
Func _VectorDelta($aDatasetA, $aDatasetB)
    ;number of element pairs compared
    Local $iCount = 0
    ;running sum of squared differences, then the result
    Local $return = 0
    ;difference of the current pair, and its square
    Local $tempValue = 0
    Local $tempValueSquared = 0
    ;1-based position in $aDatasetA
    Local $index = 0
    ;current element of $aDatasetA
    Local $value = 0
    ;iterate through each value in $aDatasetA and compare to the value at the same position in $aDatasetB
    For $value In $aDatasetA
        $index += 1
        ;only compare while $aDatasetB still has an element at this position
        If $index <= UBound($aDatasetB) Then
            $iCount += 1
            $tempValue = $aDatasetB[$index - 1] - $value
            $tempValueSquared = $tempValue * $tempValue
            $return += $tempValueSquared
        EndIf
    Next
    ;if any pairs were compared, return the square root of the summed squares, or else 0
    If $iCount > 0 Then
        If $return > 0 Then
            $return = Sqrt($return)
        EndIf
    EndIf
    Return $return
EndFunc ;==>_VectorDelta

; #FUNCTION# ;===============================================================================
;
; Name...........: _Similarity
; Description ...: Returns a similarity score between a list of elements and a set of other lists
; Syntax.........: _Similarity($aArrayH, $iIndexA, $iIndexB)
; Parameters ....: $aArrayH, $iIndexA, $iIndexB
; Return values .: Success - Similarity score (0 to 1) of $aArrayH[$iIndexA] and $aArrayH[$iIndexB],
;                            relative to their distances to the other arrays.
;                  Failure - potentially encounters division by zero.
; Author ........: JRowe, inspired by php by Timothy Robert Keal, aka "alias Jargon"
; Modified.......:
; Remarks .......: Compares element to element, doesn't do iterative correlation.
; Related .......: _VectorDelta
; Link ..........;
; Example .......; Yes
;
; ;==========================================================================================
Func _Similarity($aArrayH, $iIndexA, $iIndexB)
    ;count of comparisons in which another list was closer than A is to B
    Local $return = 0
    ;number of lists compared against
    Local $tally = 0
    ;Vector delta of A to B
    Local $similarityOfAToB = _VectorDelta($aArrayH[$iIndexA], $aArrayH[$iIndexB])
    Local $similarityOfAToList = 0
    Local $similarityOfBToList = 0
    Local $index = 0
    ;Iterate through each array, comparing similarity of every array
    For $iIndexC In $aArrayH
        $index += 1
        ;skip self comparisons
        ;note: $index is 1-based here, while $iIndexA and $iIndexB are used as 0-based subscripts above
        If ($index <> $iIndexA) And ($index <> $iIndexB) Then
            ;increment tally of comparisons
            $tally += 1
            ;Get Vector Delta of array[A] and array[index-1]
            $similarityOfAToList = _VectorDelta($aArrayH[$iIndexA], $aArrayH[$index - 1])
            ;Get Vector Delta of array[B] and array[index-1]
            $similarityOfBToList = _VectorDelta($aArrayH[$iIndexB], $aArrayH[$index - 1])
            ;increment $return if the A-to-B delta is greater than the A-to-list delta
            If $similarityOfAToB > $similarityOfAToList Then $return += 1
            ;increment $return if the A-to-B delta is greater than the B-to-list delta
            If $similarityOfAToB > $similarityOfBToList Then $return += 1
        EndIf
    Next
    ;return 1 minus ($return divided by 2, over the number of tallied comparisons)
    Return 1 - ($return / 2 / $tally)
EndFunc ;==>_Similarity



Example:
AutoIt         
#include "_CorrelativeAnalysis.au3"

;Example
;1,2,3,4 representing up(1) down(2) left(3) and right(4) respectively
;[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1] is a line going straight up, for example.

;Dataset for patterns
Global $testSet1[16] = [1,1,1,1,4,4,4,4,2,2,2,2,3,3,3,3]
Global $testSet2[16] = [1,1,1,1,4,4,4,4,4,4,4,4,1,1,1,1]
Global $testSet3[16] = [1,1,1,4,1,1,1,1,1,1,1,1,3,3,3,3]
Global $testSet4[16] = [2,2,2,2,3,3,3,3,3,3,3,3,1,1,1,1]
Global $testSet5[16] = [3,3,3,3,2,2,2,2,4,4,4,4,1,1,1,1]
Global $testSet6[16] = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]

;pattern we want to test
Global $MatchSet[16] = [1,4,1,1,1,4,4,4,3,2,1,2,3,3,4,1]

Global $comparison[7] = [$MatchSet, $testSet1, $testSet2, $testSet3, $testSet4, $testSet5, $testSet6]

ConsoleWrite("Similarity to 1: " & _Similarity($comparison, 0, 1) & @CRLF)
ConsoleWrite("Similarity to 2: " & _Similarity($comparison, 0, 2) & @CRLF)
ConsoleWrite("Similarity to 3: " & _Similarity($comparison, 0, 3) & @CRLF)
ConsoleWrite("Similarity to 4: " & _Similarity($comparison, 0, 4) & @CRLF)
ConsoleWrite("Similarity to 5: " & _Similarity($comparison, 0, 5) & @CRLF)
ConsoleWrite("Similarity to 6: " & _Similarity($comparison, 0, 6) & @CRLF)


This performs element-to-element matching. It doesn't handle nonlinear data sets, only linear clusters. It will detect similarities between pixel colors in the same position, for example, but it won't detect similarities between a pixel and its neighbors. That requires cycling through iterations and transformations of the data.

In the example, I laid out a set of arbitrary paths that could be seen as input from a mouse gesture. $MatchSet is the pattern being tested against the data set. It returns about 91% similarity to the correct match (test set 1) and lower similarity to each other set.
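For anyone who wants to check that number by hand, here is my reading of what the two functions compute, written out as formulas. The symbols (d, r, t, M, T_1) are just my own shorthand, not names from the code.

```latex
% Distance computed by _VectorDelta over the first n = min(|A|, |B|) positions
d(A, B) = \sqrt{\sum_{i=1}^{n} (B_i - A_i)^2}

% Score returned by _Similarity, where t is the number of lists tallied and
% r counts how often d(A, B) > d(A, C) or d(A, B) > d(B, C) over those lists C
\text{score} = 1 - \frac{r}{2t}

% Worked for _Similarity($comparison, 0, 1), with M = $MatchSet and T_1 = $testSet1
d(M, T_1) = \sqrt{9 + 9 + 1 + 1 + 1 + 4} = \sqrt{25} = 5
t = 6, \quad r = 1
\text{score} = 1 - \frac{1}{12} \approx 0.917
```

As the code is posted, the skip test compares the 1-based loop counter with the 0-based arguments, so this call skips only $MatchSet and tallies test sets 1 to 6. The single increment comes from test set 1 compared with itself (distance 0, which is less than 5). Every other distance from $MatchSet or test set 1 to the remaining sets is greater than 5.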

Sets can be weighted by repeated inclusion. You can match against incomplete sets, but the data must be correctly aligned.
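A minimal sketch of the incomplete-set case, assuming _VectorDelta is available through the same include used in the example (the file name is taken from that example; the arrays $aFull, $aPartial and $aShifted are made up for illustration). Because _VectorDelta only compares positions that exist in both lists, a correctly aligned partial list scores a distance of 0, while the same data shifted by one position does not.

```autoit
#include "_CorrelativeAnalysis.au3" ; assumed to define _VectorDelta, as in the example above

; Hypothetical data: a full 16-step pattern and two 8-step partial inputs
Global $aFull[16] = [1,1,1,1,4,4,4,4,2,2,2,2,3,3,3,3]
Global $aPartial[8] = [1,1,1,1,4,4,4,4] ; first half of $aFull, correctly aligned
Global $aShifted[8] = [1,1,1,4,4,4,4,2] ; same gesture, started one step late

; Only the first 8 positions are compared in both calls
ConsoleWrite("Aligned partial: " & _VectorDelta($aPartial, $aFull) & @CRLF) ; distance 0
ConsoleWrite("Shifted partial: " & _VectorDelta($aShifted, $aFull) & @CRLF) ; distance greater than 0
```

This calls _VectorDelta directly only to show the alignment effect; for an actual score you would still go through _Similarity.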

If anyone is interested, I'd really love some help in array manipulation so that this could be used on nonlinear data. Things like facial recognition and feature detection are possible, but I'm not the greatest at matrix manipulation.

Think of this as N-dimensional Venn diagrams. The similarity scores represent the percentage of overlap between each element in each list.

Thanks to Keal for laying this out. This is really a very robust and powerful piece of code.

#2 gazai

gazai

    Seeker

  • Active Members
  • 6 posts

Posted 23 July 2010 - 04:42 PM

Cool!

Can it be used for some nonparametric correlation use (such as ranked correlation)?

Thanks for the script.

#3 Xibalba

Xibalba

    Wayfarer

  • Active Members
  • 52 posts

Posted 29 July 2010 - 10:46 AM

Seems like a nice one,

Would it be possible to break down strings into arrays (with Asc()?), and get a match score?
For example: string1 = "Arnold Schwarzenegger", string2 = "Arnold Shwarseneger" - a match of about 80%?

Or perhaps some other function/UDF would be better for that?

Ty

#4 JRowe

JRowe

    Chasing the white rabbits

  • Active Members
  • 1,764 posts

Posted 04 August 2010 - 01:24 AM

This would be a good use for that. Also, you can cycle through it, so asdArnold Scwharzenjikksdfg would also show a high match (this requires running the comparison in a loop over each offset of the longer string and returning the highest match).
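A rough sketch of that idea, not tested against real data. _StringToCodes and _BestOffsetDelta are hypothetical helper names I made up for illustration, and the include is assumed to provide _VectorDelta as in the first post. It converts each string to an array of Asc() codes, slides the shorter string across the longer one, and keeps the smallest distance, which corresponds to the highest match. It returns a raw distance rather than a percentage.

```autoit
#include "_CorrelativeAnalysis.au3" ; assumed to define _VectorDelta

; Hypothetical helper: convert a string to an array of character codes
Func _StringToCodes($sText)
    Local $iLen = StringLen($sText)
    If $iLen = 0 Then Return SetError(1, 0, 0)
    Local $aCodes[$iLen]
    For $i = 0 To $iLen - 1
        $aCodes[$i] = Asc(StringMid($sText, $i + 1, 1))
    Next
    Return $aCodes
EndFunc ;==>_StringToCodes

; Hypothetical helper: smallest _VectorDelta distance between $sNeedle and
; any window of the same length inside $sHaystack (lower means a closer match)
Func _BestOffsetDelta($sNeedle, $sHaystack)
    Local $iNeedleLen = StringLen($sNeedle)
    Local $iLast = StringLen($sHaystack) - $iNeedleLen
    If $iNeedleLen = 0 Or $iLast < 0 Then Return SetError(1, 0, -1)
    Local $aNeedle = _StringToCodes($sNeedle)
    Local $fBest = -1, $fDelta, $aWindow
    For $iOffset = 0 To $iLast
        ; take the window starting at this offset and compare it code by code
        $aWindow = _StringToCodes(StringMid($sHaystack, $iOffset + 1, $iNeedleLen))
        $fDelta = _VectorDelta($aNeedle, $aWindow)
        If $fBest < 0 Or $fDelta < $fBest Then $fBest = $fDelta
    Next
    Return $fBest
EndFunc ;==>_BestOffsetDelta

ConsoleWrite(_BestOffsetDelta("Arnold Schwarzenegger", "asdArnold Scwharzenjikksdfg") & @CRLF)
```

Since the comparison is position by position, a dropped or extra letter in the middle of the string shifts everything after it, so this only copes with offsets at the start, not with insertions or deletions inside the name.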



