JRowe

Pattern Matching


A pair of functions I rewrote from some PHP code written by a guy I met in #ai on freenode IRC.

; #FUNCTION# ;===============================================================================
;
; Name...........: _VectorDelta
; Description ...: Returns the Euclidean distance between two lists (0 = identical; larger = less similar)
; Syntax.........: _VectorDelta($aDatasetA, $aDatasetB)
; Parameters ....: $aDatasetA, $aDatasetB - 1D arrays of numbers, compared position by position
; Return values .: Success - square root of the summed squared differences over the positions both lists share.
;                  Returns 0 if no positions were compared.
; Author ........: JRowe, inspired by PHP code by Timothy Robert Keal, aka "alias Jargon"
; Modified.......:
; Remarks .......: Not meant to be called directly; used by _Similarity()
; Related .......: _Similarity
; Link ..........:
; Example .......: Yes
;
; ;==========================================================================================
Func _VectorDelta($aDatasetA, $aDatasetB)
    ; number of position pairs compared
    Local $iCount = 0
    ; running sum of squared differences (becomes the distance at the end)
    Local $return = 0
    ; difference at the current position
    Local $tempValue = 0
    ; 1-based position of the current element of $aDatasetA
    Local $index = 0
    ; current element of $aDatasetA
    Local $value = 0

    ; compare each value in $aDatasetA with the value at the same position in $aDatasetB
    For $value In $aDatasetA
        $index += 1
        ; only compare positions that exist in both lists
        If $index <= UBound($aDatasetB) Then
            $iCount += 1
            $tempValue = $aDatasetB[$index - 1] - $value
            $return += $tempValue * $tempValue
        EndIf
    Next

    ; Euclidean distance: square root of the summed squared differences (stays 0 if nothing was compared)
    If $iCount > 0 Then $return = Sqrt($return)

    Return $return
EndFunc ;==>_VectorDelta

; #FUNCTION# ;===============================================================================
;
; Name...........: _Similarity
; Description ...: Returns a similarity score for two lists in $aArrayH, judged against every other list in $aArrayH
; Syntax.........: _Similarity($aArrayH, $iIndexA, $iIndexB)
; Parameters ....: $aArrayH - array whose elements are the lists (1D arrays) to compare
;                  $iIndexA, $iIndexB - 0-based indices of the two lists to score against each other
; Return values .: Success - score from 0 to 1: 1 minus the fraction of comparisons in which another list is closer
;                  to $aArrayH[$iIndexA] (or to $aArrayH[$iIndexB]) than those two lists are to each other.
;                  Failure - if $aArrayH holds no lists besides A and B, the final division is by zero.
; Author ........: JRowe, inspired by PHP code by Timothy Robert Keal, aka "alias Jargon"
; Modified.......:
; Remarks .......: Compares element to element; doesn't do iterative correlation.
; Related .......: _VectorDelta
; Link ..........:
; Example .......: Yes
;
; ;==========================================================================================
Func _Similarity($aArrayH, $iIndexA, $iIndexB)
    ; number of times another list was closer to A or to B than A and B are to each other
    Local $return = 0
    ; number of other lists compared
    Local $tally = 0
    ; distance between list A and list B
    Local $similarityOfAToB = _VectorDelta($aArrayH[$iIndexA], $aArrayH[$iIndexB])
    Local $similarityOfAToList, $similarityOfBToList

    ; compare A and B against every other list (0-based indices, matching $iIndexA and $iIndexB)
    For $index = 0 To UBound($aArrayH) - 1
        ; skip A and B themselves so self comparisons are not counted
        If ($index <> $iIndexA) And ($index <> $iIndexB) Then
            $tally += 1
            ; distance from A to the current list
            $similarityOfAToList = _VectorDelta($aArrayH[$iIndexA], $aArrayH[$index])
            ; distance from B to the current list
            $similarityOfBToList = _VectorDelta($aArrayH[$iIndexB], $aArrayH[$index])
            ; count it when the current list is closer to A than B is
            If $similarityOfAToB > $similarityOfAToList Then $return += 1
            ; count it when the current list is closer to B than A is
            If $similarityOfAToB > $similarityOfBToList Then $return += 1
        EndIf
    Next
    ; each other list was checked twice (against A and against B), so divide by 2 * $tally;
    ; 1 means no other list is closer to A or B than they are to each other
    Return 1 - ($return / 2 / $tally)
EndFunc ;==>_Similarity

Example:

#include "_CorrelativeAnalysis.au3"
;Example

;1,2,3,4 representing up(1) down(2) left(3) and right(4) respectively
;[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1] is a line going straight up, for example.

;Dataset for patterns
Global $testSet1[16] = [1,1,1,1,4,4,4,4,2,2,2,2,3,3,3,3]
Global $testSet2[16] = [1,1,1,1,4,4,4,4,4,4,4,4,1,1,1,1]
Global $testSet3[16] = [1,1,1,4,1,1,1,1,1,1,1,1,3,3,3,3]
Global $testSet4[16] = [2,2,2,2,3,3,3,3,3,3,3,3,1,1,1,1]
Global $testSet5[16] = [3,3,3,3,2,2,2,2,4,4,4,4,1,1,1,1]
Global $testSet6[16] = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]

;pattern we want to test
Global $MatchSet[16] = [1,4,1,1,1,4,4,4,3,2,1,2,3,3,4,1]

Global $comparison[7] = [$MatchSet, $testSet1, $testSet2, $testSet3, $testSet4, $testSet5, $testSet6]

ConsoleWrite("Similarity to 1: " & _Similarity($comparison, 0, 1) & @CRLF)
ConsoleWrite("Similarity to 2: " & _Similarity($comparison, 0, 2) & @CRLF)
ConsoleWrite("Similarity to 3: " & _Similarity($comparison, 0, 3) & @CRLF)
ConsoleWrite("Similarity to 4: " & _Similarity($comparison, 0, 4) & @CRLF)
ConsoleWrite("Similarity to 5: " & _Similarity($comparison, 0, 5) & @CRLF)
ConsoleWrite("Similarity to 6: " & _Similarity($comparison, 0, 6) & @CRLF)

This performs element-to-element matching, so it handles aligned (linear) data but not nonlinear data sets. For example, it will detect similar pixel colors at the same position, but it won't detect a similarity between a pixel and its neighbors. That requires cycling through iterations and transformations of the data.
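One such transformation is a shift. Below is a minimal sketch, assuming the attachment is saved as _CorrelativeAnalysis.au3 next to the script. The helpers _RotateArray and _BestShiftDelta are hypothetical and are not part of the original code. They rotate one list by up to $iMaxShift positions in either direction and keep the smallest _VectorDelta, so a path that is merely offset by a step still scores as close. Rotation, where elements wrap around the ends, is an assumption of this sketch: it keeps both lists the same length. $iMaxShift must be smaller than the list length.

```autoit
#include "_CorrelativeAnalysis.au3"

; a short path and the same path rotated right by one step
Global $aLine[6] = [1, 1, 4, 4, 2, 2]
Global $aShifted[6] = [2, 1, 1, 4, 4, 2]

ConsoleWrite("Position by position: " & _VectorDelta($aLine, $aShifted) & @CRLF)       ; Sqrt(14), about 3.74
ConsoleWrite("Best within one step: " & _BestShiftDelta($aLine, $aShifted, 1) & @CRLF) ; 0, the rotation is undone

; Hypothetical helper: returns a copy of $aData rotated right by $iOffset positions
; (negative offsets rotate left); elements pushed off one end wrap to the other.
; Assumes Abs($iOffset) <= UBound($aData).
Func _RotateArray($aData, $iOffset)
    Local $iSize = UBound($aData)
    Local $aOut[$iSize]
    For $i = 0 To $iSize - 1
        $aOut[Mod($i + $iOffset + $iSize, $iSize)] = $aData[$i]
    Next
    Return $aOut
EndFunc   ;==>_RotateArray

; Hypothetical helper: smallest _VectorDelta between $aA and $aB rotated by
; every offset from -$iMaxShift to +$iMaxShift (offset 0 is the plain comparison)
Func _BestShiftDelta($aA, $aB, $iMaxShift)
    Local $fBest = _VectorDelta($aA, $aB)
    Local $fDelta
    For $iShift = -$iMaxShift To $iMaxShift
        $fDelta = _VectorDelta($aA, _RotateArray($aB, $iShift))
        If $fDelta < $fBest Then $fBest = $fDelta
    Next
    Return $fBest
EndFunc   ;==>_BestShiftDelta
```

Rotation suits cyclic data. For a gesture with a clear start and end, you may prefer to pad or trim instead.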

In the example, I laid out a set of arbitrary paths that could be seen as input from a mouse gesture. $MatchSet is the pattern being tested against the data set. It gives the correct match (test set 1) the highest similarity score and every other set a lower one.
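A quick way to see why set 1 wins is to look at the raw distances. This sketch assumes the attachment is saved as _CorrelativeAnalysis.au3 next to the script. It calls _VectorDelta directly, even though its header marks it as a helper. $aTests is a name introduced here.

Working the sums by hand, the squared distances from $MatchSet are 25, 53, 56, 37, 61 and 59. Set 1 is nearest at exactly 5. Set 4 follows at about 6.08, and the rest fall between about 7.28 and 7.81.

```autoit
#include "_CorrelativeAnalysis.au3"

; same data as the example above
Global $MatchSet[16] = [1,4,1,1,1,4,4,4,3,2,1,2,3,3,4,1]
Global $testSet1[16] = [1,1,1,1,4,4,4,4,2,2,2,2,3,3,3,3]
Global $testSet2[16] = [1,1,1,1,4,4,4,4,4,4,4,4,1,1,1,1]
Global $testSet3[16] = [1,1,1,4,1,1,1,1,1,1,1,1,3,3,3,3]
Global $testSet4[16] = [2,2,2,2,3,3,3,3,3,3,3,3,1,1,1,1]
Global $testSet5[16] = [3,3,3,3,2,2,2,2,4,4,4,4,1,1,1,1]
Global $testSet6[16] = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]
Global $aTests[6] = [$testSet1, $testSet2, $testSet3, $testSet4, $testSet5, $testSet6]

; _VectorDelta is a distance, so the smallest value marks the nearest pattern
For $i = 0 To UBound($aTests) - 1
    ConsoleWrite("Distance to " & ($i + 1) & ": " & _VectorDelta($MatchSet, $aTests[$i]) & @CRLF)
Next
```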

Sets can be weighted by including them more than once. You can match against incomplete sets (only the positions both lists share are compared), but the data must be correctly aligned.
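Here is a minimal sketch of matching against an incomplete set, assuming the attachment is saved as _CorrelativeAnalysis.au3 next to the script. $aFirstHalf is a hypothetical fragment holding the first eight moves of test set 1. _VectorDelta skips every position that does not exist in both lists, so only those eight positions are compared.

```autoit
#include "_CorrelativeAnalysis.au3"

Global $MatchSet[16] = [1,4,1,1,1,4,4,4,3,2,1,2,3,3,4,1]
; hypothetical incomplete set: only the first eight moves of test set 1
Global $aFirstHalf[8] = [1,1,1,1,4,4,4,4]

; positions 8 to 15 of $MatchSet have no partner in $aFirstHalf, so they are skipped
ConsoleWrite("Distance to first half: " & _VectorDelta($MatchSet, $aFirstHalf) & @CRLF) ; Sqrt(18), about 4.24
```

Because only eight positions are summed, this distance is not directly comparable with one computed over all sixteen. The fragment also has to be aligned to the start of the pattern. _VectorDelta always pairs element i of one list with element i of the other, so a fragment taken from the end of a gesture would be compared against its beginning.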

If anyone is interested, I'd really love some help with array manipulation so that this could be used on nonlinear data. Things like facial recognition and feature detection are possible, but I'm not the greatest at matrix manipulation.

Think of this as an N-dimensional Venn diagram: the similarity score reflects how much two lists overlap compared with how much each of them overlaps the other lists.
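Written out, the score _Similarity is meant to compute is given below. This is a restatement of the code, not a formula from Keal's original. The symbols are:

- $d$ is the _VectorDelta distance.
- $A$ and $B$ are the two lists being scored.
- $C$ is the set of the other lists in $aArrayH (the code's comment says self comparisons are excluded).
- $[\,\cdot\,]$ is 1 when the condition holds and 0 otherwise.

$$
d(U,V) = \sqrt{\sum_{i} (V_i - U_i)^2}, \qquad
S(A,B) = 1 - \frac{1}{2\,|C|} \sum_{X \in C} \Big( [\, d(A,B) > d(A,X) \,] + [\, d(A,B) > d(B,X) \,] \Big)
$$

So the score is 1 when no other list is closer to $A$ or to $B$ than they are to each other. It drops by $1/(2|C|)$ each time one is.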

Thanks to Keal for laying this out. This is really a very robust and powerful piece of code.

_CorrelativeAnalysis.au3




Cool!

Can it be used for nonparametric correlation (such as rank correlation)?

Thanks for the script.


Seems like a nice one.

Would it be possible to break strings down into arrays (with Asc()?) and get a match score?

For example: string1 = "Arnold Schwarzenegger", string2 = "Arnold Shwarseneger" - a match of about 80%?

Or perhaps some other function/UDF would be better for that?

Ty


This would be a good use for that. You can also cycle through it, so "asdArnold Scwharzenjikksdfg" would also show a high match (this requires putting the comparison in a loop, iterating over offsets into the string, and returning the highest match).
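Here is a minimal sketch of that approach, assuming the attachment is saved as _CorrelativeAnalysis.au3 next to the script. The helpers _StringToCodes and _BestOffsetDelta are hypothetical and are not part of the original code. The first builds an array of Asc() codes. The second slides the needle across the haystack and keeps the smallest _VectorDelta, so lower is better and 0 means an exact substring match. Both strings are assumed to be non-empty, and the needle must be no longer than the haystack (otherwise -1 is returned).

```autoit
#include "_CorrelativeAnalysis.au3"

ConsoleWrite(_BestOffsetDelta("Arnold", "asdArnold Scwharzenjikksdfg") & @CRLF) ; 0: "Arnold" starts at offset 3

; Hypothetical helper: converts a non-empty string into an array of character codes
Func _StringToCodes($sText)
    Local $iLen = StringLen($sText)
    Local $aCodes[$iLen]
    For $i = 1 To $iLen
        $aCodes[$i - 1] = Asc(StringMid($sText, $i, 1))
    Next
    Return $aCodes
EndFunc   ;==>_StringToCodes

; Hypothetical helper: compares $sNeedle with every same-length window of $sHaystack
; and returns the smallest _VectorDelta (-1 if the needle is longer than the haystack)
Func _BestOffsetDelta($sNeedle, $sHaystack)
    Local $iLen = StringLen($sNeedle)
    Local $aNeedle = _StringToCodes($sNeedle)
    Local $fBest = -1, $fDelta
    For $iOffset = 0 To StringLen($sHaystack) - $iLen
        $fDelta = _VectorDelta($aNeedle, _StringToCodes(StringMid($sHaystack, $iOffset + 1, $iLen)))
        If $fBest < 0 Or $fDelta < $fBest Then $fBest = $fDelta
    Next
    Return $fBest
EndFunc   ;==>_BestOffsetDelta
```

Keep in mind that Asc() codes make neighboring letters such as "s" and "t" look closer than "s" and "z", which says nothing about spelling. A dropped letter (as in "Shwarseneger") also shifts every later character out of alignment within a window.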
