
Fast file Compare


I had to compare two files with more than one million lines each.
I've tested several examples, but all of them are too slow.
Most of them run for several hours to compare 1 million lines.
I have written a script that compares 2 text files with 1 million lines in less than 5 minutes (after the files are loaded into arrays).
It writes the missing lines to 2 text files.
On my laptop it compares 10,000 lines in 1.8 sec, 100,000 lines in 21 sec, and 1,000,000 lines in 250 sec.
The example script creates 2 arrays with 1,000,000 lines and then removes some entries.
At the end it writes 2 text files with the missing lines per array.
Please test it and give comments.
#include <array.au3>
#include <Timers.au3>
#include <file.au3>

Local $NrOfRows = 1000000 ; Set number of rows to test

Local $delString1 = 0
Local $delString2 = 0
Local $Array1[$NrOfRows]
Local $Array2[$NrOfRows]
$StartTime = _Timer_Init()
$Timer = _Timer_Init()

; Creating 2 arrays
For $i = 0 To $NrOfRows - 1
    $Array1[$i] = "Just some tekst to emulate data to compare " & $i
Next
$Array2 = $Array1
ConsoleWrite("Array's created in " & Round(_Timer_Diff($Timer)) & " milliseconds" & @CRLF)
$Timer = _Timer_Init()

; Removing some entries from both arrays to show functionality
_ArrayDelete($Array1, "333;5555;7777")
_ArrayDelete($Array2, "222;4444;6666")
ConsoleWrite("Removed some value's in " & Round(_Timer_Diff($Timer)) & " milliseconds" & @CRLF)
$Timer = _Timer_Init()

; You need to sort the array if you use binary search (sorting starts at index 1)
_ArraySort($Array1, 0, 1, 0, 0, 1)
ConsoleWrite("Sorted Array 1 in " & Round(_Timer_Diff($Timer)) & " milliseconds" & @CRLF)
$Timer = _Timer_Init()

; Comparing the 2 arrays (the search starts at index 1, so element 0 is never matched)
For $i = 0 To UBound($Array2) - 1
    $Index = _ArrayBinarySearch($Array1, $Array2[$i], 1)

    ; Add the indexes of equal rows to the delete lists
    If $Index <> -1 Then
        $delString1 &= ";" & $Index
        $delString2 &= ";" & $i
    EndIf
Next
ConsoleWrite("Array's compared in " & Round(_Timer_Diff($Timer)) & " milliseconds" & @CRLF)
$Timer = _Timer_Init()

; Removing the equal rows from the arrays
_ArrayDelete($Array1, $delString1)
_ArrayDelete($Array2, $delString2)
ConsoleWrite("removed equal rows in " & Round(_Timer_Diff($Timer)) & " milliseconds" & @CRLF)
$Timer = _Timer_Init()

; Writing the result to files
_FileWriteFromArray("missing in array 1.txt", $Array2)
_FileWriteFromArray("missing in array 2.txt", $Array1)
ConsoleWrite("Write missing value's to File in " & Round(_Timer_Diff($Timer)) & " milliseconds" & @CRLF)
$Timer = _Timer_Init()

ConsoleWrite("Compare complete in " & Round(_Timer_Diff($StartTime)) & " milliseconds")
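
The example above builds its arrays in memory. For real files, the loading step could look roughly like the sketch below. The file names are hypothetical placeholders, and using _FileReadToArray with its default flag is my assumption about how the arrays get filled, not something shown in the script.

```autoit
#include <File.au3>

; Hypothetical input files; replace with the real paths.
Local $sFile1 = "file1.txt"
Local $sFile2 = "file2.txt"
Local $Array1, $Array2

; With the default $FRTA_COUNT flag, element [0] holds the line count
; and the lines themselves start at index 1.
If Not _FileReadToArray($sFile1, $Array1) Then
    ConsoleWrite("Could not read " & $sFile1 & ", @error = " & @error & @CRLF)
    Exit 1
EndIf
If Not _FileReadToArray($sFile2, $Array2) Then
    ConsoleWrite("Could not read " & $sFile2 & ", @error = " & @error & @CRLF)
    Exit 1
EndIf

ConsoleWrite("Loaded " & $Array1[0] & " and " & $Array2[0] & " lines" & @CRLF)
```

The compare script sorts and searches from index 1 and always has index 0 in its delete lists, so an array with the count in element [0] fits that layout. With arrays loaded this way, the array creation and the demo _ArrayDelete calls at the top of the script are not needed.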
Edited by SnArF

My scripts: _ConsoleWriteLog | _FileArray2D





Do you only need to know whether the 2 files are different, or also what is different (the content)?



Edited by UEZ




The script shows what's different (content).

I have a script that builds an index of 2 servers, with about 1,500,000 files per server.

The results are saved to 2 text files.

Then the text files are compared, and only the files that differ are saved to text files.

The complete process, indexing 2 file servers with 1.5 million files each and comparing them, takes about 11 minutes. I think that's very fast.
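
The indexing script itself is not posted in this thread. As a rough sketch only, one way to produce such an index file is _FileListToArrayRec followed by _FileWriteFromArray. The share path and output file name below are hypothetical, and this is not necessarily how the 11 minute result was reached.

```autoit
#include <File.au3>

; Hypothetical share and output file; adjust to the real environment.
Local $sRoot = "\\server1\share"
Local $sIndexFile = "index server1.txt"

; List files only, recursively, unsorted, with paths relative to $sRoot,
; so the index lines of two servers can be compared with each other.
Local $aFiles = _FileListToArrayRec($sRoot, "*", $FLTAR_FILES, $FLTAR_RECUR, $FLTAR_NOSORT, $FLTAR_RELPATH)
If @error Then
    ConsoleWrite("Indexing failed, @error = " & @error & @CRLF)
    Exit 1
EndIf

; Element [0] holds the number of files, so write from index 1.
_FileWriteFromArray($sIndexFile, $aFiles, 1)
ConsoleWrite("Indexed " & $aFiles[0] & " files" & @CRLF)
```

Relative paths are used here so that the same file on both servers gives an identical line in both index files; with full paths every line would differ by the server name.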

