
_WinAPI_FileCompareBinary v0.70 build 2024-03-22 beta - reinventing the wheel ^^



  • UEZ changed the title to _WinAPI_FileCompareBinary build 2024-03-05 beta - reinventing the wheel ^^

Hi, thanks for the release. I believe one wouldn't need an external DLL to compare bytes. The trick to making the comparison faster is not letting AutoIt do it x). Simply loading the files into memory (not as strings) and comparing them byte by byte in asm will be faster:

#include <WinAPI.au3>
#include <Memory.au3>
#include <Array.au3>

$iTimer = TimerInit()
$pBuff1 = ReadFile("big file.dat")
$pBuff2 = ReadFile("similarly big file.dat")
$aCmp = CompareFiles($pBuff1,$pBuff2,0)
$aBindex = CompareByteIndex($pBuff1,$pBuff2,$aCmp)
$iTimer = TimerDiff($iTimer)
_MemVirtualFree(DllStructGetPtr($pBuff1),0,$MEM_RELEASE)
_MemVirtualFree(DllStructGetPtr($pBuff2),0,$MEM_RELEASE)
_ArrayDisplay($aBindex,$iTimer & "ms")
Exit

Func CompareByteIndex($pBuffer1,$pBuffer2,$aCmp)
    Local $aReturn[$aCmp[0]+1][3]
    For $i = 1 To $aCmp[0]
        $iOffset = $aCmp[$i]
        $aReturn[$i][0] = Hex($iOffset,8)
        $aReturn[$i][1] = Hex(DllStructGetData($pBuffer1,1,$iOffset),2)
        $aReturn[$i][2] = Hex(DllStructGetData($pBuffer2,1,$iOffset),2)
    Next
    Return $aReturn
EndFunc

Func ReadFile($sPath)
    $hFile = _WinAPI_CreateFile($sPath,2,2) ; 2 = open existing file, 2 = read access
    $iLen = _WinAPI_GetFileSizeEx($hFile)
    Local $pAlloc = _MemVirtualAlloc(Null,$iLen,$MEM_COMMIT,$PAGE_READWRITE) ; Allocate my own pages so AutoIt doesn't get in the way when I free them later
    $pBuffer = DllStructCreate("Byte[" & $iLen & "]",$pAlloc)
    $iRead = 0
    _WinAPI_ReadFile($hFile,$pBuffer,$iLen,$iRead)
    _WinAPI_CloseHandle($hFile)
    Return $pBuffer
EndFunc

Func CompareFiles($pBuffer1,$pBuffer2,$iStart,$iEnd=-1)
    $iSizeBuffer1 = DllStructGetSize($pBuffer1)
    $iSizeBuffer2 = DllStructGetSize($pBuffer2)
    $iSizeMin = ($iSizeBuffer1 < $iSizeBuffer2) ? $iSizeBuffer1 : $iSizeBuffer2
    If $iEnd > $iSizeMin OR $iEnd = -1 Then $iEnd = $iSizeMin ; Cases for mismatching sizes
    $pAlloc = _MemVirtualAlloc(Null, 128, $MEM_COMMIT, $PAGE_EXECUTE_READWRITE)
    $pOut = DllStructCreate("uint_ptr[" & 0x4000 & "]") ; Slot 1 receives the count, so at most 0x3FFF mismatches fit; enlarge if you expect more
    If @AutoItX64 Then
        $pFunc = DllStructCreate("Byte[128]",$pAlloc)
        DllStructSetData($pFunc, 1, "0x55488BEC4883EC10C745F001000000C745F400000000498BF848897DF8488BC14801F84C8BD24901FA488B7DF84C39CF0F8332000000418A1238100F841C0000004C8B5D30488B7DF048C1E7034901FB488B7DF848FFC749893BFF45F048FFC049FFC2FF45F8EBC14C8B5DF0488B7D304C891F488BE55DC3")
    Else
        $pFunc = DllStructCreate("Byte[90]",$pAlloc)
        DllStructSetData($pFunc, 1, "0x558BEC83EC0CC745F8010000008B7D10897DFC8B450801F88B4D0C01F98B7DFC3B7D140F83230000008A1138100F84120000008B7DFC478B75F8C1E602037518893EFF45F84041FF45FCEBD18B75F88B7D1889378BE55DC21400")
    EndIf
    DllCallAddress("int",DllStructGetPtr($pFunc),"UINT_PTR",DllStructGetPtr($pBuffer1),"UINT_PTR",DllStructGetPtr($pBuffer2),"int",$iStart,"int",$iEnd,"UINT_PTR",DllStructGetPtr($pOut))
    $iLen = DllStructGetData($pOut,1,1)
    Dim $aReturn[$iLen]
    $aReturn[0] = $iLen-1
    For $i = 1 To $aReturn[0]
        $aReturn[$i] = DllStructGetData($pOut,1,$i+1)
    Next
    Return $aReturn
EndFunc

The x64 code is slightly different, as I don't usually deal with it.

I've tried comparing files of 1.5 GB, and as long as there's enough memory to load the files this should work. AutoIt is large-address-aware by default, so even on x86 you should be able to handle quite a bit. Feel free to improve it; I have no real use for it, so I can't come up with any improvements myself x)

Cheers


Posted (edited)
6 hours ago, yuser said:

I believe one wouldn't need an external DLL to compare bytes.

Why not? Inline assembler is also possible, as you have shown, and the code is much shorter, but it is too complicated and not easy to maintain. The DLL is also assembler code, just outsourced to a DLL.

As far as I know, x86 code can allocate only 2 GB of memory.

I will compare large files with your version and my DLL. I assume the results should not differ much, as your asm code doesn't use MMX/SSE. The files are read into memory the same way, using CreateFileW / ReadFile, aka _WinAPI_CreateFile / _WinAPI_ReadFile.

Edit: I tried to compare ffmpeg.exe (129 MB) with ffplay.exe (128 MB), running it as x86 and as x64, but it crashes (C0000005 - memory access violation).

Edited by UEZ

Please don't send me any personal message and ask for support! I will not reply!

Selection of finest graphical examples at Codepen.io

The own fart smells best!
Her 'sikim hıyar' diyene bir avuç tuz alıp koşma!
¯\_(ツ)_/¯  ٩(●̮̮̃•̃)۶ ٩(-̮̮̃-̃)۶ૐ


3 hours ago, UEZ said:

Why not? Inline assembler is also possible, as you have shown, and the code is much shorter, but it is too complicated and not easy to maintain. The DLL is also assembler code, just outsourced to a DLL.

As far as I know, x86 code can allocate only 2 GB of memory.

I will compare large files with your version and my DLL. I assume the results should not differ much, as your asm code doesn't use MMX/SSE. The files are read into memory the same way, using CreateFileW / ReadFile, aka _WinAPI_CreateFile / _WinAPI_ReadFile.

Edit: I tried to compare ffmpeg.exe (129 MB) with ffplay.exe (128 MB), running it as x86 and as x64, but it crashes (C0000005 - memory access violation).

 

I recommend reading this page from Microsoft; it's informative and I hope it helps.
https://learn.microsoft.com/en-us/windows/win32/memory/memory-limits-for-windows-releases

#include <Memory.au3>
While 1
    $dwPage = _MemVirtualAlloc(Null,0x8000 * 0x1000,$MEM_COMMIT,0x40)
    ConsoleWrite(Hex($dwPage,8) & @CRLF)
    If $dwPage = 0 Then Exit
WEnd

Try running this and you'll see that you can use more than 2 GB, since LAA is enabled. You can use GlobalMemoryStatusEx to get the number of available bytes.
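A minimal sketch of such a GlobalMemoryStatusEx check, calling kernel32 directly through DllCall. The helper name _GetAvailVirtualBytes is made up for this example; the struct layout follows the documented MEMORYSTATUSEX.

```autoit
; Hypothetical helper (not part of any UDF): asks Windows how much user-mode
; address space is still free in the calling process.
Func _GetAvailVirtualBytes()
    ; MEMORYSTATUSEX as documented by Microsoft (64 bytes in total).
    Local $tStatus = DllStructCreate( _
            "dword dwLength;dword dwMemoryLoad;" & _
            "uint64 ullTotalPhys;uint64 ullAvailPhys;" & _
            "uint64 ullTotalPageFile;uint64 ullAvailPageFile;" & _
            "uint64 ullTotalVirtual;uint64 ullAvailVirtual;" & _
            "uint64 ullAvailExtendedVirtual")
    ; The API requires dwLength to be filled in before the call.
    DllStructSetData($tStatus, "dwLength", DllStructGetSize($tStatus))
    Local $aCall = DllCall("kernel32.dll", "bool", "GlobalMemoryStatusEx", "struct*", $tStatus)
    If @error Or Not $aCall[0] Then Return SetError(1, 0, 0)
    Return DllStructGetData($tStatus, "ullAvailVirtual")
EndFunc

ConsoleWrite("Free virtual address space: " & Round(_GetAvailVirtualBytes() / 2^20) & " MB" & @CRLF)
```

Checking this value before loading two large files should tell you up front whether both buffers have a chance of fitting.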

 

As for the crash, you ran out of room in $pOut (you can see this by debugging it or checking a crash dump). If you plan to compare files that are completely different from each other (which defeats the entire purpose of this code), then you need to allocate a uint_ptr for every single byte. To handle that many mismatches, the code could be rewritten so it puts even less pressure on AutoIt (adding two bytes after each uint_ptr as a container for the compared bytes, etc.). Keep in mind that if you put the output for files with very little similarity into arrays, you might hit the maximum array size on the way. I hope I'm being clear enough.
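The sizing rule can be sketched like this. Going by how CompareFiles reads $pOut back, the first slot holds the count and every mismatch takes one more slot, so a range of N bytes needs at most N + 1 slots. _CreateMismatchBuffer is a hypothetical helper, not something from the script above.

```autoit
; Hypothetical helper: worst-case output buffer for comparing the byte range
; from $iStart (inclusive) to $iEnd (exclusive).
Func _CreateMismatchBuffer($iStart, $iEnd)
    If $iEnd <= $iStart Then Return SetError(1, 0, 0)
    ; One slot for the count plus one slot per byte that could differ.
    Local $iSlots = ($iEnd - $iStart) + 1
    Local $tOut = DllStructCreate("uint_ptr[" & $iSlots & "]")
    If @error Then Return SetError(2, 0, 0) ; allocation failed
    Return $tOut
EndFunc

; 16 compared bytes need 17 slots (4 bytes each on x86, 8 bytes each on x64).
Local $tOut = _CreateMismatchBuffer(0, 16)
ConsoleWrite(DllStructGetSize($tOut) & " bytes reserved" & @CRLF)
```

Note the cost: with 4 bytes per slot on x86, covering a whole 128 MB file in the worst case takes roughly 512 MB for the output alone.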

Cheers x)


1 hour ago, yuser said:

As for the crash, you ran out of room in $pOut (you can see this by debugging it or checking a crash dump). If you plan to compare files that are completely different from each other (which defeats the entire purpose of this code), then you need to allocate a uint_ptr for every single byte. To handle that many mismatches, the code could be rewritten so it puts even less pressure on AutoIt (adding two bytes after each uint_ptr as a container for the compared bytes, etc.). Keep in mind that if you put the output for files with very little similarity into arrays, you might hit the maximum array size on the way. I hope I'm being clear enough.

From an application's point of view, it doesn't matter whether you have to look at some differences or possibly all differences - the application should handle this without crashing, without questioning the meaning.

To avoid the limitation of the array size, you should also use structs, which are of course more difficult to handle.

If I use the x86 version of my DLL and the memory is written over the 2GB limit, the application crashes. With the x64 version there is no such limitation.

Of course, to minimize memory usage, you could read and compare blocks of files instead of storing everything in memory.
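A rough sketch of the block-wise idea in plain AutoIt, assuming all you need is the offset of the first difference. The function name and the 64 KB default block size are made up for this example, and the byte scan is done in AutoIt only inside the one block that differs; the asm routine or the DLL could take over that part.

```autoit
#include <FileConstants.au3>

; Hypothetical helper: returns the 0-based offset of the first difference,
; or -1 if both files are identical. On failure @error is set (1 or 2).
Func _FirstDifferenceChunked($sPath1, $sPath2, $iChunk = 65536)
    Local $hFile1 = FileOpen($sPath1, $FO_READ + $FO_BINARY)
    If $hFile1 = -1 Then Return SetError(1, 0, -1)
    Local $hFile2 = FileOpen($sPath2, $FO_READ + $FO_BINARY)
    If $hFile2 = -1 Then
        FileClose($hFile1)
        Return SetError(2, 0, -1)
    EndIf

    Local $iBase = 0, $iResult = -1
    Local $dBlock1, $dBlock2, $iLen1, $iLen2, $iMin
    While 1
        ; Only two blocks are held in memory at any time.
        $dBlock1 = FileRead($hFile1, $iChunk)
        $dBlock2 = FileRead($hFile2, $iChunk)
        $iLen1 = BinaryLen($dBlock1)
        $iLen2 = BinaryLen($dBlock2)
        If $iLen1 = 0 And $iLen2 = 0 Then ExitLoop ; both files exhausted

        If Not ($dBlock1 == $dBlock2) Then
            $iMin = ($iLen1 < $iLen2) ? $iLen1 : $iLen2
            ; If the shared part matches, the difference is where the shorter file ends.
            $iResult = $iBase + $iMin
            For $i = 1 To $iMin
                If Not (BinaryMid($dBlock1, $i, 1) == BinaryMid($dBlock2, $i, 1)) Then
                    $iResult = $iBase + $i - 1
                    ExitLoop
                EndIf
            Next
            ExitLoop
        EndIf
        $iBase += $iLen1
    WEnd

    FileClose($hFile1)
    FileClose($hFile2)
    Return $iResult
EndFunc

ConsoleWrite(_FirstDifferenceChunked("big file.dat", "similarly big file.dat") & @CRLF)
```

Memory use stays at two blocks no matter how large the files are, at the price of more read calls.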

For me it was instructive to find out how to read strings generated by the DLL in AutoIt, especially since the comparison is much faster.

 


3 hours ago, UEZ said:

From an application's point of view, it doesn't matter whether you have to look at some differences or possibly all differences - the application should handle this without crashing, without questioning the meaning.

To avoid the limitation of the array size, you should also use structs, which are of course more difficult to handle.

If I use the x86 version of my DLL and the memory is written over the 2GB limit, the application crashes. With the x64 version there is no such limitation.

Of course, to minimize memory usage, you could read and compare blocks of files instead of storing everything in memory.

For me it was instructive to find out how to read strings generated by the DLL in AutoIt, especially since the comparison is much faster.

 

I showed how you can do it in asm, including freeing the pages that were created. You simply need to split the files to be compared into parts that fit the memory you have available. Otherwise, on x64 you can't compare files over 2 GB each.

I see that you haven't thoroughly read the link I posted. Simply compile your DLL to handle unsigned pointers properly and it will work above 2 GB on x86.
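To illustrate what goes wrong with signed pointers: the same 32 bits read as a negative number once the address is above 0x7FFFFFFF. A small sketch, with a made-up address and helper name:

```autoit
; Hypothetical helper: reinterprets an unsigned 32-bit value as a signed one
; by overlaying two structs on the same 4 bytes.
Func _ReinterpretAsSigned($iUnsigned)
    Local $tUnsigned = DllStructCreate("uint value")
    Local $tSigned = DllStructCreate("int value", DllStructGetPtr($tUnsigned))
    DllStructSetData($tUnsigned, "value", $iUnsigned)
    Return DllStructGetData($tSigned, "value")
EndFunc

; 2415919104 is 0x90000000, an example address above the 2 GB line.
ConsoleWrite(_ReinterpretAsSigned(2415919104) & @CRLF) ; prints a negative number
```

Any range check or pointer arithmetic done with signed integers can misbehave on such values, which is presumably what happens in a DLL that treats addresses as signed.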

I'm trying to contribute, and my contribution doesn't necessarily have to be code that fits all sizes as I have no intention of spoon feeding anyone who doesn't want to learn anything. If you think this doesn't refer to you, please ignore this post.


Posted (edited)
On 3/6/2024 at 5:37 PM, yuser said:

I see that you haven't thoroughly read the link I posted. Simply compile your DLL to handle unsigned pointers properly and it will work above 2 GB on x86.

Thank you for the link. I haven't had time to read it in detail today, but I will. Maybe I can fix the problem with x86...

By the way, I had also coded parts in ASM to speed up the code instead of using DLLs:

 

Anyhow, thank you for your feedback and no need to feed me with a spoon.

Edited by UEZ
forgot "no"


  • UEZ changed the title to _WinAPI_FileCompareBinary v0.70 build 2024-03-22 beta - reinventing the wheel ^^
Posted (edited)

Updated the DLL (old DLL is not compatible with current version) and added UDF version.

Current DLL functions:

  • _WinAPI_FileCompareBinary
  • _WinAPI_FileCompareBinaryString
  • _WinAPI_FileComparePrint

 

If you are interested, please check out the first post for more details and download options.

Edited by UEZ


2 minutes ago, Andreik said:

What flavor of BASIC is the source?

I assume you are asking about the source code of the DLL -> FreeBASIC.
