Jump to content

Which is the fastest method to compare two array


rootx
 Share

Recommended Posts

I write this script to compare to array, well Do you have a better way to do it?

#include <Constants.au3>
#include <Array.au3>
#include <File.au3>
#include <MsgBoxConstants.au3>

$a = FileSelectFolder("seleziona cartella A",@ScriptDir)
$fold_A = _FileListToArrayRec($a,"*",2,1,0,1)

$b = FileSelectFolder("seleziona cartella B",@ScriptDir)

$fold_B = _FileListToArrayRec($b,"*",2,1,0,1)

$tota = UBound($fold_A)-1
$totb = UBound($fold_B)-1

If $tota > $totb Then
    findx($fold_A,$fold_B)
Else
    findx($fold_B,$fold_A)
EndIf


Func findx($min,$max)
    For $i = 0 To UBound($min)-1
        $found = _ArraySearch($max, $min[$i])
        If $found = -1 Then
            ConsoleWrite($min[$i]&@CRLF)
        EndIf
    Next
EndFunc

 

Link to comment
Share on other sites

I'm a bit of an AutoIT novice, so excuse the lack of code examples. I think your problem depends on the relative sizes of the two arrays. If one of the arrays is small (2-10 items), then your method is quite effective, the problem will occur when the two arrays are of significant and similar sizes.

If you have a couple of arrays holding 100 000 items each, then that becomes a very large comparison. You could improve things quite a bit by sorting the arrays first and then dividing them into manageable buckets of data. For example, if your stored data is alpanumeric then make 36 arrays, depending on the first letter of your data 0-9 a-z. Then you only have to compare your "a" bucket to your "a" bucket. The overhead of creating the buckets versus your simplistic search of the entire range will depend on the data... 

Any help?

Edit: You don't need to split the smaller array into buckets, just the large one, then iterate the small one.

Second Edit (sorry!): Obviously this can be more sophisticated, if you don't have a uniform data distribution, change your strategy - you want to try and make the buckets near uniform size.

Edited by SlackerAl

Problem solving step 1: Write a simple, self-contained, running, replicator of your problem.

Link to comment
Share on other sites

Like this..

 

#include <Constants.au3>
#include <Array.au3>
#include <File.au3>
#include <MsgBoxConstants.au3>

$a = FileSelectFolder("seleziona cartella A",@ScriptDir)
$fold_A = _FileListToArrayRec($a,"*",2,1,0,1)

$b = FileSelectFolder("seleziona cartella B",@ScriptDir)

$fold_B = _FileListToArrayRec($b,"*",2,1,0,1)

$tota = UBound($fold_A)-1
$totb = UBound($fold_B)-1

If $tota > $totb Then
    findx($fold_A,$fold_B)
Else
    findx($fold_B,$fold_A)
EndIf


Func findx($min,$max)
    _ArraySort($min, 0, 1)
    _ArraySort($max, 0, 1)
    For $i = 0 To UBound($min)-1
        $found = _ArrayBinarySearch($max, $min[$i], 1)
        If $found = -1 Then
            ConsoleWrite($i&" "&$min[$i]&@CRLF)
        EndIf
    Next
EndFunc

 

Link to comment
Share on other sites

not tested but something like below.

Better way is hard to say as that depends on your requirements, speed wanted etc. maybe powershell is quicker on the cmdline just comparing 2 folders.

#include <Constants.au3>
#include <Array.au3>
#include <File.au3>
#include <MsgBoxConstants.au3>

$a = FileSelectFolder("seleziona cartella A",@ScriptDir)
$fold_A = _FileListToArrayRec($a,"*",2,1,0,1)

$b = FileSelectFolder("seleziona cartella B",@ScriptDir)

$fold_B = _FileListToArrayRec($b,"*",2,1,0,1)

$tota = UBound($fold_A)-1
$totb = UBound($fold_B)-1

_ArraySort($fold_a, 0, 1)
_ArraySort($fold_b, 0, 1)

$i=1
$j=1
while ($i < $totA) or ($j<totB)
    if $fold_a[$i]=$fold_b[$j] then
       consolewrite("Equal")
       $i=$i+1
       $j=$j+1
    elseif $fold_a[$i]<$fold_b[$j] then
            consolewrite("Not Equal")
            $i=$i+1
    else
            consolewrite("Not Equal")
            $j=$j+1
    endif
wend
Link to comment
Share on other sites

It all depends on what you call "equal" applied to arrays.

In my book, [London", "Paris", "Melbourne"] is definitely not equal to [London", "Melbourne", "Paris"]. Both arrays contain the same elements but they are certainly not equal. Similarly to itineraries, a picture on screen isn't equal to the same picture once flipped or with all pixels moved around.

It's only when you consider 1D arrays and call them sets that {4, 31, 9} = {9, 4, 31}

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

I just have to compare the contents of two folders. With a sort of arrays before the loop for the comparison gets fast, _ArrayBinarySearch run very fats, 4 example in this moment I run a diff for to folders with 26.186 files, 172 folders TOT 249 GB I have a correct result in a few seconds.

With winmerge.... after 15 min I have 50% of total process....

Another method could be the C++ memcmp function.

Thx guys.

Link to comment
Share on other sites

?

#include <Constants.au3>
#include <Array.au3>
#include <File.au3>

$a = FileSelectFolder("seleziona cartella A",@ScriptDir)
$fold_A = _FileListToArrayRec($a,"*",2,1,0,1)

$b = FileSelectFolder("seleziona cartella B",@ScriptDir)
$fold_B = _FileListToArrayRec($b,"*",2,1,0,1)

local $big = (UBound($fold_A)>UBound($fold_B)) ? $fold_A : $fold_B
local $small = (UBound($fold_A)<=UBound($fold_B)) ? $fold_A : $fold_B
findx($small, $big)

Func findx($min, $max)
  Local $sd = ObjCreate("Scripting.Dictionary")
  $sd.CompareMode = 1   ; case insensitive
  For $i In $max
      $sd.Item($i)
  Next
  For $i In $min
      If $sd.Exists($i) Then $sd.Remove($i)
  Next
  $asd = $sd.Keys()
  _ArrayDisplay($asd, "$asd")
EndFunc

 

Link to comment
Share on other sites

18 minutes ago, rootx said:

I just have to compare the contents of two folders. With a sort of arrays before the loop for the comparison gets fast, _ArrayBinarySearch run very fats, 4 example in this moment I run a diff for to folders with 26.186 files, 172 folders TOT 249 GB I have a correct result in a few seconds.

With winmerge.... after 15 min I have 50% of total process....

15 minutes?
Does this mean you not just compare the file and directory names but the file contents as well?
Just comparing the file/directory names should happen in under a minute.

My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2022-02-19 - Version 1.6.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2017-07-21 - Version 0.4.0.1) - Download - General Help & Support - Example Scripts
OutlookEX (2021-11-16 - Version 1.7.0.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX_GUI (2021-04-13 - Version 1.4.0.0) - Download
Outlook Tools (2019-07-22 - Version 0.6.0.0) - Download - General Help & Support - Wiki
PowerPoint (2021-08-31 - Version 1.5.0.0) - Download - General Help & Support - Example Scripts - Wiki
Task Scheduler (NEW 2022-07-28 - Version 1.6.0.1) - Download - General Help & Support - Wiki

Standard UDFs:
Excel - Example Scripts - Wiki
Word - Wiki

Tutorials:
ADO - Wiki
WebDriver - Wiki

 

Link to comment
Share on other sites

If the amount of data is really huge then maybe it's worth a try to the SQLite way (untested, possible typos...)

#include <SQLite.au3>
;#include <SQLite.dll.au3>
#include <Array.au3>

$a = FileSelectFolder("seleziona cartella A",@ScriptDir)
$fold_A = _FileListToArrayRec($a,"*",2,1,0,1)

$b = FileSelectFolder("seleziona cartella B",@ScriptDir)
$fold_B = _FileListToArrayRec($b,"*",2,1,0,1)

 $res = _ArrayCompareAndGetData($fold_A, $fold_B, 1)
 _ArrayDisplay($res)
 $res = _ArrayCompareAndGetData($fold_A, $fold_B, 0)
 _ArrayDisplay($res)


;==============================================================
; $flag = 1 : return items from array1 which don't exist in array2 (with index)
; $flag = 2 : return items from array2 which don't exist in array1 (with index)
; $flag = 0 : return both
;==============================================================

Func _ArrayCompareAndGetData($array1, $array2, $flag = 0)
  If (not IsArray($array1) AND $array1<>"") OR _ 
    (not IsArray($array2) AND $array2<>"") Then Return SetError(1, 0, 0)
  If $flag < 0 OR $flag > 2 Then $flag = 0
  If $array2 = "" Then $flag = 1  ;Return SetError(2, 0, $array1)
  If $array1 = "" Then Return SetError(3, 0, $array2)
  Local $array, $aTemp, $iRows, $iColumns
  _SQLite_Startup()
  _SQLite_Open()   ; ':memory:'
  _SQLite_Exec (-1, "CREATE TABLE table1 (id, items1); CREATE TABLE table2 (id, items2);") 
  _SQLite_Exec(-1, "Begin;")
  For $i = 0 to UBound($array1)-1
        _SQLite_Exec(-1, "INSERT INTO table1 VALUES (" & $i & ", " & _SQLite_FastEscape($array1[$i]) & ");")
  Next
  For $i = 0 to UBound($array2)-1
        _SQLite_Exec(-1, "INSERT INTO table2 VALUES (" & $i & ", " & _SQLite_FastEscape($array2[$i]) & ");")
  Next
  _SQLite_Exec(-1, "Commit;")

 Switch $flag
   Case 1
     _SQLite_GetTable2d(-1, "SELECT * FROM table1 WHERE items1 NOT IN (SELECT items2 FROM table2) ;", $array, $iRows, $iColumns)   
      $array[0][0] = UBound($array)-1
      $array[0][1] = ""

   Case 2
     _SQLite_GetTable2d(-1, "SELECT * FROM table2 WHERE items2 NOT IN (SELECT items1 FROM table1) ;", $array, $iRows, $iColumns)  
      $array[0][0] = UBound($array)-1
      $array[0][1] = ""

   Case 0
     _SQLite_GetTable2d(-1, "SELECT * FROM table1 WHERE items1 NOT IN (SELECT items2 FROM table2) ;", $array, $iRows, $iColumns)   
     _SQLite_GetTable2d(-1, "SELECT * FROM table2 WHERE items2 NOT IN (SELECT items1 FROM table1) ;", $aTemp, $iRows, $iColumns)  
     Local $n = UBound($array)-1, $m = UBound($aTemp)-1
      Local $s = ($n > $m) ? $n : $m
      Redim $array[$s+1][4]
      For $i = 0 to $m
        $array[$i][2] = $aTemp[$i][0]
        $array[$i][3] = $aTemp[$i][1]
      Next
      $array[0][0] = $n
      $array[0][1] = ""
      $array[0][2] = $m
      $array[0][3] = ""

 EndSwitch

  _SQLite_Close()
  _SQLite_Shutdown()
    Return $array
EndFunc

 

Link to comment
Share on other sites

@junkew got me thinking powershell

#include <AutoItConstants.au3>

$folderA="'DIRTEST\A'"
$folderB="'DIRTEST\B'"

$iPid = run("cmd /c powershell $dir1 = " & $folderA & " ; $dir2 = " & $folderB & " ; $d1 = get-childitem -path $dir1 -recurse ; $d2 = get-childitem -path $dir2 -recurse ; (compare-object $d1 $d2)" ,"" ,@SW_HIDE, $STDOUT_CHILD)

$sOut = ""

While 1
    $sOut &= StdoutRead($iPID)
    If @error Then ExitLoop
WEnd

msgbox(0, '' , $sOut)

DiffTwoDirectories.PNG

Edited by iamtheky

,-. .--. ________ .-. .-. ,---. ,-. .-. .-. .-.
|(| / /\ \ |\ /| |__ __||| | | || .-' | |/ / \ \_/ )/
(_) / /__\ \ |(\ / | )| | | `-' | | `-. | | / __ \ (_)
| | | __ | (_)\/ | (_) | | .-. | | .-' | | \ |__| ) (
| | | | |)| | \ / | | | | | |)| | `--. | |) \ | |
`-' |_| (_) | |\/| | `-' /( (_)/( __.' |((_)-' /(_|
'-' '-' (__) (__) (_) (__)

Link to comment
Share on other sites

6 hours ago, rootx said:

With a sort of arrays before the loop for the comparison gets fast, _ArrayBinarySearch run very fast.

With initial sorting you don't even have to use dichotomy (_ArrayBinarySearch) for comparing contents, as just a linear scan of both arrays in parallel will do. You gain a log2 N factor.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

If there is only one row in one of the arrays and 100,000 rows in the other array, I doubt that a linear search is faster on average.

Even with the same number of rows in the two arrays, it's easy to make examples where a binary search is faster. It depends on the data.

Link to comment
Share on other sites

I do mean what I wrote: "a linear scan of both arrays in parallel".

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

Lot of focus on the solution comparing 2 arrays not sure if the solution is right to the problem

But looking at the problem it seems to be

  • n folders with m files
  • find the duplicates and report the folder?

Solution direction with 1 array

dir c: /b /s > c_files.txt

or

dir c:\folder1 /b /s > files.txt
dir c:\folder2 /b /s >> files.txt

and then some code as a solution direction to validate (not run/tested)

  • missing part to sort it first (but then i need 2 columns: one with filename and 1 with foldername)
#include <Constants.au3>
#include <Array.au3>
#include <File.au3>
#include <MsgBoxConstants.au3>

Run(@ComSpec & " /k dir c:\tmp")

$sFilePath = @TempDir & "\filedata.txt"

FileReadToArray($sFilePath, $aRetArray)

$totA = ubound($aRetArray)

$i=1
while ($i < $totA-1)
    if $aRetArray[$i]=$aRetArray[$i+1] then
       consolewrite("Duplicate file")
       $i=$i+1
    endif
wend

 

Link to comment
Share on other sites

Comparing two arrays in two ways: Binary search and linear scan:

;#AutoIt3Wrapper_UseX64=y

#AutoIt3Wrapper_Au3Check_Parameters=-d -w 1 -w 2 -w 3 -w 4 -w 5 -w 6

Opt( "MustDeclareVars", 1 )

;#include <Array.au3>

Example()

Func Example()
  ; Read 10,000 unique ascendingly sorted strings into $aAr1
  Local $aAr1 = FileReadToArray( "tst00.txt" )
  Local $iRows = UBound( $aAr1 )
  ;_ArrayDisplay( $aAr1 )

  For $i = 0 To $iRows - 2
    If $aAr1[$i] = $aAr1[$i+1] Then ConsoleWrite( "Strings " & $i & " and " & $i + 1 & " are identical" & @CRLF )
  Next

  ; Copy 100 uniformly distributed strings from $aAr1 into $aAr2
  ; Add a "x" to every second string to make them different
  ; $aAr2 is also unique and ascendingly sorted
  ; 50 identical strings in $aAr1 and $aAr2
  ; 50 different strings in $aAr1 and $aAr2
  Local $aAr2[100]
  For $i = 0 To 99
    $aAr2[$i] = $aAr1[$i*100+99]
    If Mod( $i, 2 ) Then $aAr2[$i] &= "x"
  Next
  ;_ArrayDisplay( $aAr2 )

  Local $hTimer, $t, $t1 = 0, $t2 = 0
  Local $s, $lo, $hi, $mi
  Local $iTests = 10

  For $k = 0 To $iTests - 1

    ; Compare $aAr1 and $aAr2 through
    ; a binary search and a linear scan.
    ; It's utilized that both arrays are sorted.

    ; Binary search
    $mi = -1
    $hTimer = TimerInit()
    For $i = 0 To 99
      $s = $aAr2[$i]
      For $j = $mi + 1 To $iRows - 1
        $lo = $j
        $hi = $iRows - 1
        Do
          $mi = Int( ( $lo + $hi ) / 2 )
          Switch StringCompare( $s, $aAr1[$mi] )
            Case -1
              $hi = $mi - 1
            Case  1
              $lo = $mi + 1
            Case  0
              ;ConsoleWrite( "Found OK: $i, $mi, $s = " & $i & ", " & $mi & ", " & $s & @CRLF )
              ExitLoop 2
          EndSwitch
        Until $lo > $hi
        ;ConsoleWrite( "Found NO: $i, $mi, $s = " & $i & ", " & $mi & ", " & $s & @CRLF )
        ExitLoop
      Next
    Next
    $t = TimerDiff( $hTimer )
    ConsoleWrite( "Test " & $k & " Binary search: " & $t & @CRLF )
    $t1 += $t

    ; Linear scan
    $j = -1
    $hTimer = TimerInit()
    For $i = 0 To 99
      $s = $aAr2[$i]
      For $j = $j + 1 To $iRows - 1
        Switch StringCompare( $aAr1[$j], $s )
          Case  0
            ;ConsoleWrite( "Found OK: $i, $j, $s = " & $i & ", " & $j & ", " & $s & @CRLF )
            ExitLoop
          Case 1
            ;ConsoleWrite( "Found NO: $i, $j, $s = " & $i & ", " & $j & ", " & $s & @CRLF )
            ExitLoop
        EndSwitch
      Next
    Next
    $t = TimerDiff( $hTimer )
    ConsoleWrite( "Test " & $k & " Linear scan:   " & $t & @CRLF )
    $t2 += $t

  Next

  ConsoleWrite( @CRLF )
  ConsoleWrite( "Average Binary search: " & $t1/$iTests & @CRLF )
  ConsoleWrite( "Average Linear scan:   " & $t2/$iTests & @CRLF )
EndFunc

 

Console output:

Test 0 Binary search: 2.74602585342339
Test 0 Linear scan:   10.0892624626688
Test 1 Binary search: 2.6498927853102
Test 1 Linear scan:   9.90807794812693
Test 2 Binary search: 3.73727691310346
Test 2 Linear scan:   12.7233639370786
Test 3 Binary search: 2.68950958283235
Test 3 Linear scan:   10.5594002626344
Test 4 Binary search: 2.70751721806969
Test 4 Linear scan:   10.0164008000931
Test 5 Binary search: 4.07360413122856
Test 5 Linear scan:   9.97927736744995
Test 6 Binary search: 2.5834030552031
Test 6 Linear scan:   9.90918611029538
Test 7 Binary search: 2.49724344660598
Test 7 Linear scan:   9.86818411006267
Test 8 Binary search: 2.49004039251104
Test 8 Linear scan:   9.85100759645166
Test 9 Binary search: 2.56511837942365
Test 9 Linear scan:   9.85322392078857

Average Binary search: 2.87396317577114
Average Linear scan:   10.275738451565

 

Anything wrong with code or results?

tst00.au3 and tst00.txt: Arrays.7z

 

What about 2D arrays?

Edited by LarsJ
Link to comment
Share on other sites

Binary search only works if the elements have been  sorted. Since initial sorting is required, then a linear scan (comparison) should be faster. How can it be otherwise?

Edit: I believe the problem with the code is that you start the linear scan at element 0 on each run. You are testing the same elements over and over again.

Edited by czardas
Link to comment
Share on other sites

From around post 5 it has been assumed that both arrays are sorted.

I do not start the linear scan at element 0 on each run. I'm not testing the same elements over and over again. Read the code.

And I'm pretty sure nothing is wrong with the code. It was a rhetorical question.

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

×
×
  • Create New...