# _datadifference() Get the real difference between two sets of data. (Supports colors, binary + more!)

Hey guys, I saw a couple of posts in the help section recently asking for a way to compare images and find out how different the two actually are. So I wrote a function that directly compares two sets of data, character by character, and measures how far each character is from its counterpart. In other words, this function won't tell you the percentage of data that changed; what it does tell you is how different the two sets of data are from each other.

You may ask, what's the difference? Put simply, if you have the strings 001 and 021, the percentage of change is 33% (one character out of three), but the difference between the two is only about 7.4% in base-10 (a distance of 2 out of a possible 9, on one of three digits). Similarly, with 001 and 051 the percentage of change is still 33%, but the difference is now about 18.5%. We're not checking what has changed, but rather how much it has changed from its original value.
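As a sketch of the formula that `_datacompare` (posted below) applies, using shorthand symbols introduced here rather than names from the code: $N$ is the string length, $m$ is the largest digit value (the `$decimal` argument, so 9 for base-10 and 15 for base-16), and $a_i$, $b_i$ are the digit values at position $i$.

$$
D = \frac{100}{N\,m}\sum_{i=1}^{N}\lvert a_i - b_i\rvert
$$

Worked for base-10 ($m = 9$, $N = 3$):

$$
D(001, 021) = \frac{100 \cdot 2}{3 \cdot 9} \approx 7.4\%, \qquad
D(001, 051) = \frac{100 \cdot 5}{3 \cdot 9} \approx 18.5\%
$$

With the default `$decimal = 15`, the same two inputs come out at roughly 4.4% and 11.1%, so the base you pass matters when you read the result.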

While this was primarily designed to compare the difference between two colors, you could get the relative difference between any two sets of data.

This can also be used to see how random an output is. I'm sure others will think of interesting uses too!

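The example below builds two random images of 2000 pixels each (6 hex digits per pixel, so roughly 45x45) and compares them. You'll notice the output is always around 34-35%. That is simply the average distance you'd expect between two independent, uniformly random hex digits; it doesn't mean that Random() is only 34% random.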
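A quick sanity check on that figure, as a sketch under the assumption that every hex digit is drawn independently and uniformly (the symbols $n$, $a$, $b$ are shorthand introduced here): for two such digits in $\{0, \dots, n-1\}$ the expected distance is

$$
E\,\lvert a - b\rvert = \frac{1}{n^2}\sum_{d=1}^{n-1} 2\,(n-d)\,d = \frac{n^2 - 1}{3n}
$$

For base-16 ($n = 16$) that is $255/48 \approx 5.31$, and dividing by the maximum distance of 15 gives

$$
D_{\text{expected}} = 100 \cdot \frac{5.3125}{15} \approx 35.4\%
$$

So a result in the mid-30s is what independent uniform digits should produce. A result far away from that value would be the interesting one.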

Example:

```
Global $pixelseed = StringSplit('0123456789abcdef', ""), $image1 = "", $image2 = ""

; Build two random "images": 2000 pixels of 6 hex digits each (roughly 45x45)
For $i = 1 To 2000
    For $j = 1 To 6
        $image1 &= $pixelseed[Random(1, $pixelseed[0], 1)]
        $image2 &= $pixelseed[Random(1, $pixelseed[0], 1)]
    Next
Next

Global $timer = TimerInit()
Global $compare = _datacompare($image1, $image2)
ConsoleWrite($compare & '% Different' & @CRLF & 'Took ' & (TimerDiff($timer) / 1000) & ' seconds.' & @CRLF)
```

Functions:

```
#cs
Function _datacompare($data1, $data2, [$decimal])
-$data1: A string of data, any length.
-$data2: A string of data, must be the same length as $data1
-$decimal: Highest digit value: 1=Binary, 9=Base-10, 15=Base-16, 31=Base-32

Note: If you just want to compare two sets of binary data
you probably want to use base-16, unless you are sure your
binary is in 1's and 0's.

Returns: A float containing the percentage of difference,
or -1 with @error = 1 if the two strings differ in length.
#ce
Func _datacompare($data1, $data2, $decimal = 15)
    Local $difference = 0
    $data1 = StringSplit($data1, "") ; one character per element, count in [0]
    $data2 = StringSplit($data2, "")
    If $data1[0] <> $data2[0] Then Return SetError(1, 0, -1)
    For $i = 1 To $data1[0]
        If $data1[$i] <> $data2[$i] Then
            $difference += Abs(_tonum($data1[$i]) - _tonum($data2[$i]))
        EndIf
    Next
    ; average distance per character, as a percentage of the largest possible distance
    Return (($difference / $data1[0]) / $decimal) * 100
EndFunc

#cs
Function _tonum($info)
-$info: A single digit or character.

Returns: A 0-based value (0-9 for digits, 10-35 for a-z).
#ce
Func _tonum($info)
    If $info + 0 > 0 Then Return Number($info) ; digits 1-9
    Local $return = Asc(StringLower($info)) - 87 ; 'a' (code 97) becomes 10
    If $return = -39 Then Return 0 ; '0' (code 48)
    Return $return
EndFunc
```

Comments, questions, criticisms, improvements? Post them!

This is pretty slow right now, about 60 seconds on a 1000x1000 image.

I would really like to get it working faster for more practical uses. Any help is much appreciated!

Edited by nullschritt

##### Share on other sites

"This is pretty fast, it only takes about 0.2 seconds to compare two 1000x1000 pixel images, which is around 15.6 mb of data."

This example simulates around a 45 x 45 pixel image and took 0.1 seconds.

A 1000 x 1000 (1,000,000) pixel simulation took 89 seconds.

Monkey's are, like, natures humans.

##### Share on other sites

@jhon you're right, I wasn't fully thinking when making the example. While it seems fast enough for comparing data, it doesn't seem fast enough to compare images in real time. For that it would need to take under a second.

Any recommendations for speed? From anyone?
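One possible direction, offered as an untimed sketch rather than a measured fix: most of the per-character cost in `_datacompare` is probably the two `_tonum()` calls, so a lookup table plus `StringToASCIIArray()` removes them from the loop. The name `_datacompare_fast` and its parameter names are hypothetical, and it assumes the input only contains 0-9 and a-z/A-Z (any other character counts as 0, which differs from `_tonum`).

```autoit
; Hypothetical variant of _datacompare: same formula, no per-character function calls.
Func _datacompare_fast($sData1, $sData2, $iMax = 15)
    Local $iLen = StringLen($sData1)
    If $iLen = 0 Or $iLen <> StringLen($sData2) Then Return SetError(1, 0, -1)

    ; Lookup table built once per call: character code -> digit value.
    Local $aValue[256]
    For $c = 0 To 255
        $aValue[$c] = 0 ; assumption: unknown characters count as 0
    Next
    For $c = 0 To 9
        $aValue[Asc("0") + $c] = $c
    Next
    For $c = 0 To 25
        $aValue[Asc("a") + $c] = 10 + $c
        $aValue[Asc("A") + $c] = 10 + $c
    Next

    ; Encoding 1 = ANSI, so every code is in the range 0-255.
    Local $a1 = StringToASCIIArray($sData1, 0, $iLen, 1)
    Local $a2 = StringToASCIIArray($sData2, 0, $iLen, 1)
    If UBound($a1) <> UBound($a2) Then Return SetError(2, 0, -1)

    Local $iSum = 0
    For $i = 0 To UBound($a1) - 1
        $iSum += Abs($aValue[$a1[$i]] - $aValue[$a2[$i]])
    Next
    Return ($iSum / $iLen / $iMax) * 100
EndFunc
```

It takes the same arguments as the original, e.g. `_datacompare_fast($image1, $image2)`. Whether it gets anywhere near real time on a 1000x1000 image would have to be timed; an interpreted loop over 6 million characters may still be too slow.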

##### Share on other sites

Not my code, if memory serves this was from monoceres from MANY years back.

```
#include <GDIPlus.au3>

$Pic1 = @DesktopDir & "\untitled.jpg"
$Pic2 = @DesktopDir & "\untitled2.jpg"

$StartTimer = TimerInit()
$Check = _Check($Pic1, $Pic2)
$EndTimer = TimerDiff($StartTimer)
ConsoleWrite("images are the same -> " & $Check & ", time taken (ms) " & $EndTimer & @CR)

Func _Check($fname1, $fname2)
    _GDIPlus_Startup()
    ; load both image files
    Local $bm1 = _GDIPlus_ImageLoadFromFile($fname1)
    Local $bm2 = _GDIPlus_ImageLoadFromFile($fname2)
    ; True if the pixel data is identical, False otherwise
    Local $Same = _CompareBitmaps($bm1, $bm2)
    _GDIPlus_ImageDispose($bm1)
    _GDIPlus_ImageDispose($bm2)
    _GDIPlus_Shutdown()
    Return $Same
EndFunc   ;==>_Check

Func _CompareBitmaps($bm1, $bm2)
    ; lock the first bitmap and get a pointer to its raw 32-bit pixel data
    Local $Bm1W = _GDIPlus_ImageGetWidth($bm1)
    Local $Bm1H = _GDIPlus_ImageGetHeight($bm1)
    Local $BitmapData1 = _GDIPlus_BitmapLockBits($bm1, 0, 0, $Bm1W, $Bm1H, $GDIP_ILMREAD, $GDIP_PXF32RGB)
    Local $Stride = DllStructGetData($BitmapData1, "Stride")
    Local $ptr1 = DllStructGetData($BitmapData1, "Scan0")
    Local $size1 = ($Bm1H - 1) * $Stride + ($Bm1W - 1) * 4

    ; same for the second bitmap
    Local $Bm2W = _GDIPlus_ImageGetWidth($bm2)
    Local $Bm2H = _GDIPlus_ImageGetHeight($bm2)
    Local $BitmapData2 = _GDIPlus_BitmapLockBits($bm2, 0, 0, $Bm2W, $Bm2H, $GDIP_ILMREAD, $GDIP_PXF32RGB)
    $Stride = DllStructGetData($BitmapData2, "Stride")
    Local $ptr2 = DllStructGetData($BitmapData2, "Scan0")
    Local $size2 = ($Bm2H - 1) * $Stride + ($Bm2W - 1) * 4

    ; compare only as many bytes as the smaller buffer holds
    Local $smallest = $size1
    If $size2 < $smallest Then $smallest = $size2
    Local $call = DllCall("msvcrt.dll", "int:cdecl", "memcmp", "ptr", $ptr1, "ptr", $ptr2, "int", $smallest)

    _GDIPlus_BitmapUnlockBits($bm1, $BitmapData1)
    _GDIPlus_BitmapUnlockBits($bm2, $BitmapData2)

    Return ($call[0] = 0) ; memcmp returns 0 when the buffers match
EndFunc   ;==>_CompareBitmaps
```

For comparison, I took 2 screenshots of my desktop at 3360x1080 resolution. Ran the test with both images the same: result was True, 98.1 milliseconds. Made a slight change to the original image and ran it again: result was False, 88.8 milliseconds.

Ian

Edited by llewxam

My projects:

• IP Scanner - Multi-threaded ping tool to scan your available networks for used and available IP addresses, shows ping times, resolves IPs in to host names, and allows individual IPs to be pinged.
• INFSniff - Great technicians tool - a tool which scans DriverPacks archives for INF files and parses out the HWIDs to a database file, and rapidly scans the local machine's HWIDs, searches the database for matches, and installs them.
• PPK3 (Persistent Process Killer V3) - Another for the techs - suppress running processes that you need to keep away, helpful when fighting spyware/viruses.
• Sync Tool - Folder sync tool with lots of real time information and several checking methods.
• USMT Front End - Front End for Microsoft's User State Migration Tool, including all files needed for USMT 3.01 and 4.01, 32 bit and 64 bit versions.
• Audit Tool - Computer audit tool to gather vital hardware, Windows, and Office information for IT managers and field techs. Capabilities include creating a customized site agent.
• CSV Viewer - Displays CSV files with automatic column sizing and font selection. Lines can also be copied to the clipboard for data extraction.
• MyDirStat - Lists number and size of files on a drive or specified path, allows for deletion within the app.
• 2048 Game - My version of 2048, fun tile game.
• Juice Lab - Ecigarette liquid making calculator.
• Data Protector - Secure notes to save sensitive information.
• VHD Footer - Add a footer to a forensic hard drive image to allow it to be mounted or used as a virtual machine hard drive.
• Find in File - Searches files containing a specified phrase.

##### Share on other sites

@llewxam, your post is a bit off topic here. While it has to do with comparing two images, it has nothing to do with getting their relative difference as a percentage (to check lighting effects and such).

I'm looking for anyone who could point out a way to make my code faster. It doesn't seem bloated to me, so I'm not sure what would really speed it up.

PS: A thought: if the image were scaled down to 100x100 pixels, couldn't it still be compared accurately? My thinking is that everything is preserved in proportion, so the same visual differences should be equally measurable. Correct me if I'm wrong. (That doesn't mean I wouldn't love any advice on making the algorithm faster anyway.)
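A rough way to try that idea without actually resizing anything is to compare only every Nth pixel. This is a sketch under assumptions: `_datacompare_sampled` and `$iStep` are hypothetical names, each pixel is taken to be 6 hex digits as in the example above, and skipping pixels is not the same as a proper downscale (which averages neighbouring pixels), so small localized changes can be missed.

```autoit
; Hypothetical helper: apply the same per-digit distance, but only on every $iStep-th pixel.
Func _datacompare_sampled($sData1, $sData2, $iStep = 10)
    Local $iPixels = Floor(StringLen($sData1) / 6) ; assumption: 6 hex digits per pixel
    If $iPixels = 0 Or StringLen($sData1) <> StringLen($sData2) Then Return SetError(1, 0, -1)
    If $iStep < 1 Then $iStep = 1

    Local $iSum = 0, $iCount = 0, $iPos
    For $p = 0 To $iPixels - 1 Step $iStep
        For $k = 1 To 6
            $iPos = $p * 6 + $k ; StringMid positions are 1-based
            $iSum += Abs(Dec(StringMid($sData1, $iPos, 1)) - Dec(StringMid($sData2, $iPos, 1)))
            $iCount += 1
        Next
    Next
    ; same scaling as _datacompare in base-16: average distance over the maximum of 15
    Return ($iSum / $iCount / 15) * 100
EndFunc
```

With `$iStep = 100`, a 1000x1000 image is reduced to 10,000 sampled pixels, about the amount of data in a 100x100 image. For random data the result should stay close to the full comparison; for real images it depends on how evenly the differences are spread.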

Edited by nullschritt

##### Share on other sites

LOL, sorry, saw the part about comparing images and my brain leapt in that direction!

Ian
