; #FUNCTION# ;===============================================================================
;
; Name...........: _VectorDelta
; Description ...: Returns the Euclidean distance between two lists (lower means more similar)
; Syntax.........: _VectorDelta($aDatasetA, $aDatasetB)
; Parameters ....: $aDatasetA, $aDatasetB
; Return values .: Success - Distance score (0 when the compared elements are identical).
;                  Failure - None defined; returns 0 if no element pairs were compared.
; Author ........: JRowe, inspired by php by Timothy Robert Keal, aka "alias Jargon"
; Modified.......:
; Remarks .......: Not to be used directly, for use by _Similarity()
; Related .......: _Similarity
; Link ..........;
; Example .......; Yes
;
; ;==========================================================================================
Func _VectorDelta($aDatasetA, $aDatasetB)
    ;count
    Local $iCount = 0
    ;return
    Local $return = 0
    ;temp value
    Local $tempValue = 0
    ;temp value squared
    Local $tempValueSquared = 0
    ;index
    Local $index = 0
    ;value
    Local $value = 0
    ;iterate through each value in $aDatasetA and compare to values in $aDatasetB
    ;Iterate comparisons from here...
    For $value In $aDatasetA
        ;increment index
        $index += 1
        ;check if index is less than or equal to the size of $aDatasetB
        If $index <= UBound($aDatasetB) Then
            $iCount += 1
            $tempValue = $aDatasetB[$index - 1] - $value
            $tempValueSquared = $tempValue * $tempValue
            $return += $tempValueSquared
        EndIf
    Next
    ;... to here.
    ;Check the count of compared dataset pairs, return the square root of the summed comparisons or else 0
    If $iCount > 0 Then
        If $return > 0 Then
            $return = Sqrt($return)
        EndIf
    EndIf
    ;Return the result.
    Return $return
EndFunc   ;==>_VectorDelta

; #FUNCTION# ;===============================================================================
;
; Name...........: _Similarity
; Description ...: Returns a similarity score between a list of elements and a set of other lists
; Syntax.........: _Similarity($aArrayH, $iIndexA, $iIndexB)
; Parameters ....: $aArrayH, $iIndexA, $iIndexB
; Return values .: Success - Similarity score for $aArrayH[$iIndexA] and $aArrayH[$iIndexB], relative to their distances to the other arrays.
;                  Failure - potentially encounters division by zero (when no other arrays are tallied).
; Author ........: JRowe, inspired by php by Timothy Robert Keal, aka "alias Jargon"
; Modified.......:
; Remarks .......: Compares element to element, doesn't do iterative correlation.
; Related .......: _VectorDelta
; Link ..........;
; Example .......; Yes
;
; ;==========================================================================================
Func _Similarity($aArrayH, $iIndexA, $iIndexB)
    ;return
    Local $return = 0
    ;tally
    Local $tally = 0
    ;Vector delta of A to B
    Local $similarityOfAToB = _VectorDelta($aArrayH[$iIndexA], $aArrayH[$iIndexB])
    Local $similarityOfAToList = 0
    Local $similarityOfBToList = 0
    Local $index = 0
    ;Iterate through each array, comparing similarity of every array
    For $iIndexC In $aArrayH
        $index += 1
        ;don't include self comparisons in $return
        ;note: $index is 1-based here, while $iIndexA and $iIndexB are used as 0-based subscripts
        If ($index <> $iIndexA) And ($index <> $iIndexB) Then
            ;increment tally of comparisons
            $tally += 1
            ;Get Vector Delta of array[A] and array[index-1]
            $similarityOfAToList = _VectorDelta($aArrayH[$iIndexA], $aArrayH[$index - 1])
            ;Get Vector Delta of array[B] and array[index-1]
            $similarityOfBToList = _VectorDelta($aArrayH[$iIndexB], $aArrayH[$index - 1])
            ;increment $return if the A to B delta is greater than A to list
            If $similarityOfAToB > $similarityOfAToList Then $return += 1
            ;increment $return if the A to B delta is greater than B to list
            If $similarityOfAToB > $similarityOfBToList Then $return += 1
        EndIf
    Next
    ;return 1 minus ($return divided by 2 over the number of tallied comparisons)
    Return 1 - ($return / 2 / $tally)
EndFunc   ;==>_Similarity
Example:
#include "_CorrelativeAnalysis.au3"

;Example
;1,2,3,4 representing up(1) down(2) left(3) and right(4) respectively
;[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1] is a line going straight up, for example.

;Dataset for patterns
Global $testSet1[16] = [1,1,1,1,4,4,4,4,2,2,2,2,3,3,3,3]
Global $testSet2[16] = [1,1,1,1,4,4,4,4,4,4,4,4,1,1,1,1]
Global $testSet3[16] = [1,1,1,4,1,1,1,1,1,1,1,1,3,3,3,3]
Global $testSet4[16] = [2,2,2,2,3,3,3,3,3,3,3,3,1,1,1,1]
Global $testSet5[16] = [3,3,3,3,2,2,2,2,4,4,4,4,1,1,1,1]
Global $testSet6[16] = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]

;pattern we want to test
Global $MatchSet[16] = [1,4,1,1,1,4,4,4,3,2,1,2,3,3,4,1]

Global $comparison[7] = [$MatchSet, $testSet1, $testSet2, $testSet3, $testSet4, $testSet5, $testSet6]

ConsoleWrite("Similarity to 1: " & _Similarity($comparison, 0, 1) & @CRLF)
ConsoleWrite("Similarity to 2: " & _Similarity($comparison, 0, 2) & @CRLF)
ConsoleWrite("Similarity to 3: " & _Similarity($comparison, 0, 3) & @CRLF)
ConsoleWrite("Similarity to 4: " & _Similarity($comparison, 0, 4) & @CRLF)
ConsoleWrite("Similarity to 5: " & _Similarity($comparison, 0, 5) & @CRLF)
ConsoleWrite("Similarity to 6: " & _Similarity($comparison, 0, 6) & @CRLF)
This performs element-to-element matching. It doesn't handle nonlinear data sets, only linear clusters. It will detect similarities between pixel colors in the same position, for example, but it won't detect similarities between a pixel and its neighbors. That requires cycling through iterations and transformations of the data.
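To make the "same position" point concrete, here is a minimal sketch. `_SketchDistance` is a hypothetical helper written only for this illustration (it is not part of the UDF); it computes the same kind of sum-of-squares distance as `_VectorDelta`. The two short arrays are made-up data.

```autoit
; Hypothetical helper for illustration only (not part of the posted UDF).
; Square root of the summed squared differences over the positions both arrays share.
Func _SketchDistance($aA, $aB)
    Local $iLimit = UBound($aA)
    If UBound($aB) < $iLimit Then $iLimit = UBound($aB)
    Local $fSum = 0, $fDiff = 0
    For $i = 0 To $iLimit - 1
        $fDiff = $aB[$i] - $aA[$i]
        $fSum += $fDiff * $fDiff
    Next
    Return Sqrt($fSum)
EndFunc   ;==>_SketchDistance

Local $aPattern[4] = [1, 2, 3, 4]
Local $aSame[4] = [1, 2, 3, 4]
Local $aShifted[4] = [4, 1, 2, 3] ; same values, each moved one position to the right

; Identical values in identical positions give a distance of 0
ConsoleWrite("Same positions:    " & _SketchDistance($aPattern, $aSame) & @CRLF)
; The shifted copy differs at every position: 9 + 1 + 1 + 1 = 12, so Sqrt(12), about 3.46
ConsoleWrite("Shifted positions: " & _SketchDistance($aPattern, $aShifted) & @CRLF)
```

The shifted array holds exactly the same values, but because only matching positions are compared it scores as quite different.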
In the example, I laid out a set of arbitrary paths that could be seen as input from a mouse gesture. $MatchSet is the pattern being tested against the data set. It returns about 91% similarity to the correct match (test set 1) and lower similarity to each other set.
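For anyone wondering where the 91% comes from, here is a rough hand trace of the first call, `_Similarity($comparison, 0, 1)`. `_SketchDistance` is a hypothetical helper for this sketch only, not part of the UDF. The tally of 6 and the single hit come from tracing the loop as posted (as far as I can tell, the one hit is test set 1 compared with itself, which has distance 0), so treat those two numbers as assumptions rather than output of the UDF.

```autoit
; Hypothetical helper for illustration only (not part of the posted UDF).
; Square root of the summed squared differences over the positions both arrays share.
Func _SketchDistance($aA, $aB)
    Local $iLimit = UBound($aA)
    If UBound($aB) < $iLimit Then $iLimit = UBound($aB)
    Local $fSum = 0, $fDiff = 0
    For $i = 0 To $iLimit - 1
        $fDiff = $aB[$i] - $aA[$i]
        $fSum += $fDiff * $fDiff
    Next
    Return Sqrt($fSum)
EndFunc   ;==>_SketchDistance

; Data copied from the example
Local $aMatch[16] = [1, 4, 1, 1, 1, 4, 4, 4, 3, 2, 1, 2, 3, 3, 4, 1]
Local $aSet1[16] = [1, 1, 1, 1, 4, 4, 4, 4, 2, 2, 2, 2, 3, 3, 3, 3]

; Six positions differ, with squared differences 9 + 9 + 1 + 1 + 1 + 4 = 25
Local $fAToB = _SketchDistance($aMatch, $aSet1)
ConsoleWrite("Distance, match to set 1: " & $fAToB & @CRLF) ; 5

; Assumed counts from the hand trace (not produced by the UDF here)
Local $iTally = 6 ; sets that pass the index test in the loop
Local $iHits = 1  ; checks where the A-to-B distance exceeded another distance
ConsoleWrite("Score: " & (1 - ($iHits / 2 / $iTally)) & @CRLF) ; about 0.9167
```

The score drops by 1 / (2 * tally) for every other set that sits closer to A or to B than A and B sit to each other.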
Sets can be weighted by repeated inclusion. You can match against incomplete sets, but the data must be correctly aligned.
If anyone is interested, I'd really love some help with array manipulation so that this could be used on nonlinear data. Things like facial recognition and feature detection are possible, but I'm not the greatest at matrix manipulation.
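Here is a small sketch of what "incomplete but aligned" means in practice. `_SketchDistance` is again a hypothetical stand-in for `_VectorDelta`, and the arrays are made-up data. Like `_VectorDelta`, it only compares the positions that exist in both arrays, starting from the first element.

```autoit
; Hypothetical helper for illustration only (not part of the posted UDF).
; Square root of the summed squared differences over the positions both arrays share.
Func _SketchDistance($aA, $aB)
    Local $iLimit = UBound($aA)
    If UBound($aB) < $iLimit Then $iLimit = UBound($aB)
    Local $fSum = 0, $fDiff = 0
    For $i = 0 To $iLimit - 1
        $fDiff = $aB[$i] - $aA[$i]
        $fSum += $fDiff * $fDiff
    Next
    Return Sqrt($fSum)
EndFunc   ;==>_SketchDistance

Local $aFull[6] = [1, 1, 1, 4, 4, 4]
Local $aPartialAligned[3] = [1, 1, 1] ; first half of $aFull, starting at the same position
Local $aPartialShifted[3] = [4, 4, 4] ; second half of $aFull, but stored from index 0

; Only the first three positions are compared, and they match: distance 0
ConsoleWrite("Aligned partial: " & _SketchDistance($aFull, $aPartialAligned) & @CRLF)
; The same data out of alignment: 9 + 9 + 9 = 27, so Sqrt(27), about 5.2
ConsoleWrite("Shifted partial: " & _SketchDistance($aFull, $aPartialShifted) & @CRLF)
```

A partial set that really is part of the full pattern still scores badly if it doesn't start at the same offset, so padding or trimming to line things up has to happen before the comparison.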
Think of this as N-dimensional Venn diagrams. The similarity scores represent the percentage of overlap between each element in each list.
Thanks to Keal for laying this out. This is really a very robust and powerful piece of code.







