rootx Posted August 18, 2017 Posted August 18, 2017 I write this script to compare to array, well Do you have a better way to do it? #include <Constants.au3> #include <Array.au3> #include <File.au3> #include <MsgBoxConstants.au3> $a = FileSelectFolder("seleziona cartella A",@ScriptDir) $fold_A = _FileListToArrayRec($a,"*",2,1,0,1) $b = FileSelectFolder("seleziona cartella B",@ScriptDir) $fold_B = _FileListToArrayRec($b,"*",2,1,0,1) $tota = UBound($fold_A)-1 $totb = UBound($fold_B)-1 If $tota > $totb Then findx($fold_A,$fold_B) Else findx($fold_B,$fold_A) EndIf Func findx($min,$max) For $i = 0 To UBound($min)-1 $found = _ArraySearch($max, $min[$i]) If $found = -1 Then ConsoleWrite($min[$i]&@CRLF) EndIf Next EndFunc
SlackerAl Posted August 18, 2017 Posted August 18, 2017 (edited) I'm a bit of an AutoIT novice, so excuse the lack of code examples. I think your problem depends on the relative sizes of the two arrays. If one of the arrays is small (2-10 items), then your method is quite effective, the problem will occur when the two arrays are of significant and similar sizes. If you have a couple of arrays holding 100 000 items each, then that becomes a very large comparison. You could improve things quite a bit by sorting the arrays first and then dividing them into manageable buckets of data. For example, if your stored data is alpanumeric then make 36 arrays, depending on the first letter of your data 0-9 a-z. Then you only have to compare your "a" bucket to your "a" bucket. The overhead of creating the buckets versus your simplistic search of the entire range will depend on the data... Any help? Edit: You don't need to split the smaller array into buckets, just the large one, then iterate the small one. Second Edit (sorry!): Obviously this can be more sophisticated, if you don't have a uniform data distribution, change your strategy - you want to try and make the buckets near uniform size. Edited August 18, 2017 by SlackerAl rootx 1 Problem solving step 1: Write a simple, self-contained, running, replicator of your problem.
SlackerAl Posted August 18, 2017 Posted August 18, 2017 Seems an intermediate solution (in terms of effort) would be to sort the large array and then use _ArrayBinarySearch perhaps? rootx 1 Problem solving step 1: Write a simple, self-contained, running, replicator of your problem.
mikell Posted August 18, 2017 Posted August 18, 2017 You might have a look at this thread. For comparing 1D arrays the Scripting.Dictionary way is by far much faster rootx 1
rootx Posted August 18, 2017 Author Posted August 18, 2017 Like this.. #include <Constants.au3> #include <Array.au3> #include <File.au3> #include <MsgBoxConstants.au3> $a = FileSelectFolder("seleziona cartella A",@ScriptDir) $fold_A = _FileListToArrayRec($a,"*",2,1,0,1) $b = FileSelectFolder("seleziona cartella B",@ScriptDir) $fold_B = _FileListToArrayRec($b,"*",2,1,0,1) $tota = UBound($fold_A)-1 $totb = UBound($fold_B)-1 If $tota > $totb Then findx($fold_A,$fold_B) Else findx($fold_B,$fold_A) EndIf Func findx($min,$max) _ArraySort($min, 0, 1) _ArraySort($max, 0, 1) For $i = 0 To UBound($min)-1 $found = _ArrayBinarySearch($max, $min[$i], 1) If $found = -1 Then ConsoleWrite($i&" "&$min[$i]&@CRLF) EndIf Next EndFunc
junkew Posted August 18, 2017 Posted August 18, 2017 not tested but something like below. Better way is hard to say as that depends on your requirements, speed wanted etc. maybe powershell is quicker on the cmdline just comparing 2 folders. #include <Constants.au3> #include <Array.au3> #include <File.au3> #include <MsgBoxConstants.au3> $a = FileSelectFolder("seleziona cartella A",@ScriptDir) $fold_A = _FileListToArrayRec($a,"*",2,1,0,1) $b = FileSelectFolder("seleziona cartella B",@ScriptDir) $fold_B = _FileListToArrayRec($b,"*",2,1,0,1) $tota = UBound($fold_A)-1 $totb = UBound($fold_B)-1 _ArraySort($fold_a, 0, 1) _ArraySort($fold_b, 0, 1) $i=1 $j=1 while ($i < $totA) or ($j<totB) if $fold_a[$i]=$fold_b[$j] then consolewrite("Equal") $i=$i+1 $j=$j+1 elseif $fold_a[$i]<$fold_b[$j] then consolewrite("Not Equal") $i=$i+1 else consolewrite("Not Equal") $j=$j+1 endif wend FAQ 31 How to click some elements, FAQ 40 Test automation with AutoIt, Multithreading CLR .NET Powershell CMDLets
jchd Posted August 18, 2017 Posted August 18, 2017 It all depends on what you call "equal" applied to arrays. In my book, [London", "Paris", "Melbourne"] is definitely not equal to [London", "Melbourne", "Paris"]. Both arrays contain the same elements but they are certainly not equal. Similarly to itineraries, a picture on screen isn't equal to the same picture once flipped or with all pixels moved around. It's only when you consider 1D arrays and call them sets that {4, 31, 9} = {9, 4, 31} This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe hereRegExp tutorial: enough to get startedPCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta. SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)
rootx Posted August 19, 2017 Author Posted August 19, 2017 I just have to compare the contents of two folders. With a sort of arrays before the loop for the comparison gets fast, _ArrayBinarySearch run very fats, 4 example in this moment I run a diff for to folders with 26.186 files, 172 folders TOT 249 GB I have a correct result in a few seconds. With winmerge.... after 15 min I have 50% of total process.... Another method could be the C++ memcmp function. Thx guys.
mikell Posted August 19, 2017 Posted August 19, 2017 ? #include <Constants.au3> #include <Array.au3> #include <File.au3> $a = FileSelectFolder("seleziona cartella A",@ScriptDir) $fold_A = _FileListToArrayRec($a,"*",2,1,0,1) $b = FileSelectFolder("seleziona cartella B",@ScriptDir) $fold_B = _FileListToArrayRec($b,"*",2,1,0,1) local $big = (UBound($fold_A)>UBound($fold_B)) ? $fold_A : $fold_B local $small = (UBound($fold_A)<=UBound($fold_B)) ? $fold_A : $fold_B findx($small, $big) Func findx($min, $max) Local $sd = ObjCreate("Scripting.Dictionary") $sd.CompareMode = 1 ; case insensitive For $i In $max $sd.Item($i) Next For $i In $min If $sd.Exists($i) Then $sd.Remove($i) Next $asd = $sd.Keys() _ArrayDisplay($asd, "$asd") EndFunc rootx 1
rootx Posted August 19, 2017 Author Posted August 19, 2017 Your approach is a bit faster, it would be nice to do a test with a huge amount and variety of data to understand the differences in practical terms.
water Posted August 19, 2017 Posted August 19, 2017 18 minutes ago, rootx said: I just have to compare the contents of two folders. With a sort of arrays before the loop for the comparison gets fast, _ArrayBinarySearch run very fats, 4 example in this moment I run a diff for to folders with 26.186 files, 172 folders TOT 249 GB I have a correct result in a few seconds. With winmerge.... after 15 min I have 50% of total process.... 15 minutes? Does this mean you not just compare the file and directory names but the file contents as well? Just comparing the file/directory names should happen in under a minute. My UDFs and Tutorials: Spoiler UDFs: Active Directory (NEW 2024-07-28 - Version 1.6.3.0) - Download - General Help & Support - Example Scripts - Wiki ExcelChart (2017-07-21 - Version 0.4.0.1) - Download - General Help & Support - Example Scripts OutlookEX (2021-11-16 - Version 1.7.0.0) - Download - General Help & Support - Example Scripts - Wiki OutlookEX_GUI (2021-04-13 - Version 1.4.0.0) - Download Outlook Tools (2019-07-22 - Version 0.6.0.0) - Download - General Help & Support - Wiki PowerPoint (2021-08-31 - Version 1.5.0.0) - Download - General Help & Support - Example Scripts - Wiki Task Scheduler (2022-07-28 - Version 1.6.0.1) - Download - General Help & Support - Wiki Standard UDFs: Excel - Example Scripts - Wiki Word - Wiki Tutorials: ADO - Wiki WebDriver - Wiki
mikell Posted August 19, 2017 Posted August 19, 2017 If the amount of data is really huge then maybe it's worth a try to the SQLite way (untested, possible typos...) expandcollapse popup#include <SQLite.au3> ;#include <SQLite.dll.au3> #include <Array.au3> $a = FileSelectFolder("seleziona cartella A",@ScriptDir) $fold_A = _FileListToArrayRec($a,"*",2,1,0,1) $b = FileSelectFolder("seleziona cartella B",@ScriptDir) $fold_B = _FileListToArrayRec($b,"*",2,1,0,1) $res = _ArrayCompareAndGetData($fold_A, $fold_B, 1) _ArrayDisplay($res) $res = _ArrayCompareAndGetData($fold_A, $fold_B, 0) _ArrayDisplay($res) ;============================================================== ; $flag = 1 : return items from array1 which don't exist in array2 (with index) ; $flag = 2 : return items from array2 which don't exist in array1 (with index) ; $flag = 0 : return both ;============================================================== Func _ArrayCompareAndGetData($array1, $array2, $flag = 0) If (not IsArray($array1) AND $array1<>"") OR _ (not IsArray($array2) AND $array2<>"") Then Return SetError(1, 0, 0) If $flag < 0 OR $flag > 2 Then $flag = 0 If $array2 = "" Then $flag = 1 ;Return SetError(2, 0, $array1) If $array1 = "" Then Return SetError(3, 0, $array2) Local $array, $aTemp, $iRows, $iColumns _SQLite_Startup() _SQLite_Open() ; ':memory:' _SQLite_Exec (-1, "CREATE TABLE table1 (id, items1); CREATE TABLE table2 (id, items2);") _SQLite_Exec(-1, "Begin;") For $i = 0 to UBound($array1)-1 _SQLite_Exec(-1, "INSERT INTO table1 VALUES (" & $i & ", " & _SQLite_FastEscape($array1[$i]) & ");") Next For $i = 0 to UBound($array2)-1 _SQLite_Exec(-1, "INSERT INTO table2 VALUES (" & $i & ", " & _SQLite_FastEscape($array2[$i]) & ");") Next _SQLite_Exec(-1, "Commit;") Switch $flag Case 1 _SQLite_GetTable2d(-1, "SELECT * FROM table1 WHERE items1 NOT IN (SELECT items2 FROM table2) ;", $array, $iRows, $iColumns) $array[0][0] = UBound($array)-1 $array[0][1] = "" Case 2 _SQLite_GetTable2d(-1, "SELECT * FROM table2 WHERE items2 NOT IN (SELECT items1 FROM table1) ;", $array, $iRows, $iColumns) $array[0][0] = UBound($array)-1 $array[0][1] = "" Case 0 _SQLite_GetTable2d(-1, "SELECT * FROM table1 WHERE items1 NOT IN (SELECT items2 FROM table2) ;", $array, $iRows, $iColumns) _SQLite_GetTable2d(-1, "SELECT * FROM table2 WHERE items2 NOT IN (SELECT items1 FROM table1) ;", $aTemp, $iRows, $iColumns) Local $n = UBound($array)-1, $m = UBound($aTemp)-1 Local $s = ($n > $m) ? $n : $m Redim $array[$s+1][4] For $i = 0 to $m $array[$i][2] = $aTemp[$i][0] $array[$i][3] = $aTemp[$i][1] Next $array[0][0] = $n $array[0][1] = "" $array[0][2] = $m $array[0][3] = "" EndSwitch _SQLite_Close() _SQLite_Shutdown() Return $array EndFunc
iamtheky Posted August 19, 2017 Posted August 19, 2017 (edited) @junkew got me thinking powershell #include <AutoItConstants.au3> $folderA="'DIRTEST\A'" $folderB="'DIRTEST\B'" $iPid = run("cmd /c powershell $dir1 = " & $folderA & " ; $dir2 = " & $folderB & " ; $d1 = get-childitem -path $dir1 -recurse ; $d2 = get-childitem -path $dir2 -recurse ; (compare-object $d1 $d2)" ,"" ,@SW_HIDE, $STDOUT_CHILD) $sOut = "" While 1 $sOut &= StdoutRead($iPID) If @error Then ExitLoop WEnd msgbox(0, '' , $sOut) Edited August 19, 2017 by iamtheky ,-. .--. ________ .-. .-. ,---. ,-. .-. .-. .-. |(| / /\ \ |\ /| |__ __||| | | || .-' | |/ / \ \_/ )/ (_) / /__\ \ |(\ / | )| | | `-' | | `-. | | / __ \ (_) | | | __ | (_)\/ | (_) | | .-. | | .-' | | \ |__| ) ( | | | | |)| | \ / | | | | | |)| | `--. | |) \ | | `-' |_| (_) | |\/| | `-' /( (_)/( __.' |((_)-' /(_| '-' '-' (__) (__) (_) (__)
jchd Posted August 19, 2017 Posted August 19, 2017 6 hours ago, rootx said: With a sort of arrays before the loop for the comparison gets fast, _ArrayBinarySearch run very fast. With initial sorting you don't even have to use dichotomy (_ArrayBinarySearch) for comparing contents, as just a linear scan of both arrays in parallel will do. You gain a log2 N factor. This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe hereRegExp tutorial: enough to get startedPCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta. SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)
LarsJ Posted August 20, 2017 Posted August 20, 2017 If there is only one row in one of the arrays and 100,000 rows in the other array, I doubt that a linear search is faster on average. Even with the same number of rows in the two arrays, it's easy to make examples where a binary search is faster. It depends on the data. rootx 1 Controls, File Explorer, ROT objects, UI Automation, Windows Message MonitorCompiled code: Accessing AutoIt variables, DotNet.au3 UDF, Using C# and VB codeShell menus: The Context menu, The Favorites menu. Shell related: Control Panel, System Image ListsGraphics related: Rubik's Cube, OpenGL without external libraries, Navigating in an image, Non-rectangular selectionsListView controls: Colors and fonts, Multi-line header, Multi-line items, Checkboxes and icons, Incremental searchListView controls: Virtual ListViews, Editing cells, Data display functions
jchd Posted August 20, 2017 Posted August 20, 2017 I do mean what I wrote: "a linear scan of both arrays in parallel". This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe hereRegExp tutorial: enough to get startedPCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta. SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)
junkew Posted August 21, 2017 Posted August 21, 2017 Lot of focus on the solution comparing 2 arrays not sure if the solution is right to the problem But looking at the problem it seems to be n folders with m files find the duplicates and report the folder? Solution direction with 1 array dir c: /b /s > c_files.txt or dir c:\folder1 /b /s > files.txt dir c:\folder2 /b /s >> files.txt and then some code as a solution direction to validate (not run/tested) missing part to sort it first (but then i need 2 columns: one with filename and 1 with foldername) #include <Constants.au3> #include <Array.au3> #include <File.au3> #include <MsgBoxConstants.au3> Run(@ComSpec & " /k dir c:\tmp") $sFilePath = @TempDir & "\filedata.txt" FileReadToArray($sFilePath, $aRetArray) $totA = ubound($aRetArray) $i=1 while ($i < $totA-1) if $aRetArray[$i]=$aRetArray[$i+1] then consolewrite("Duplicate file") $i=$i+1 endif wend FAQ 31 How to click some elements, FAQ 40 Test automation with AutoIt, Multithreading CLR .NET Powershell CMDLets
LarsJ Posted August 21, 2017 Posted August 21, 2017 (edited) Comparing two arrays in two ways: Binary search and linear scan: expandcollapse popup;#AutoIt3Wrapper_UseX64=y #AutoIt3Wrapper_Au3Check_Parameters=-d -w 1 -w 2 -w 3 -w 4 -w 5 -w 6 Opt( "MustDeclareVars", 1 ) ;#include <Array.au3> Example() Func Example() ; Read 10,000 unique ascendingly sorted strings into $aAr1 Local $aAr1 = FileReadToArray( "tst00.txt" ) Local $iRows = UBound( $aAr1 ) ;_ArrayDisplay( $aAr1 ) For $i = 0 To $iRows - 2 If $aAr1[$i] = $aAr1[$i+1] Then ConsoleWrite( "Strings " & $i & " and " & $i + 1 & " are identical" & @CRLF ) Next ; Copy 100 uniformly distributed strings from $aAr1 into $aAr2 ; Add a "x" to every second string to make them different ; $aAr2 is also unique and ascendingly sorted ; 50 identical strings in $aAr1 and $aAr2 ; 50 different strings in $aAr1 and $aAr2 Local $aAr2[100] For $i = 0 To 99 $aAr2[$i] = $aAr1[$i*100+99] If Mod( $i, 2 ) Then $aAr2[$i] &= "x" Next ;_ArrayDisplay( $aAr2 ) Local $hTimer, $t, $t1 = 0, $t2 = 0 Local $s, $lo, $hi, $mi Local $iTests = 10 For $k = 0 To $iTests - 1 ; Compare $aAr1 and $aAr2 through ; a binary search and a linear scan. ; It's utilized that both arrays are sorted. ; Binary search $mi = -1 $hTimer = TimerInit() For $i = 0 To 99 $s = $aAr2[$i] For $j = $mi + 1 To $iRows - 1 $lo = $j $hi = $iRows - 1 Do $mi = Int( ( $lo + $hi ) / 2 ) Switch StringCompare( $s, $aAr1[$mi] ) Case -1 $hi = $mi - 1 Case 1 $lo = $mi + 1 Case 0 ;ConsoleWrite( "Found OK: $i, $mi, $s = " & $i & ", " & $mi & ", " & $s & @CRLF ) ExitLoop 2 EndSwitch Until $lo > $hi ;ConsoleWrite( "Found NO: $i, $mi, $s = " & $i & ", " & $mi & ", " & $s & @CRLF ) ExitLoop Next Next $t = TimerDiff( $hTimer ) ConsoleWrite( "Test " & $k & " Binary search: " & $t & @CRLF ) $t1 += $t ; Linear scan $j = -1 $hTimer = TimerInit() For $i = 0 To 99 $s = $aAr2[$i] For $j = $j + 1 To $iRows - 1 Switch StringCompare( $aAr1[$j], $s ) Case 0 ;ConsoleWrite( "Found OK: $i, $j, $s = " & $i & ", " & $j & ", " & $s & @CRLF ) ExitLoop Case 1 ;ConsoleWrite( "Found NO: $i, $j, $s = " & $i & ", " & $j & ", " & $s & @CRLF ) ExitLoop EndSwitch Next Next $t = TimerDiff( $hTimer ) ConsoleWrite( "Test " & $k & " Linear scan: " & $t & @CRLF ) $t2 += $t Next ConsoleWrite( @CRLF ) ConsoleWrite( "Average Binary search: " & $t1/$iTests & @CRLF ) ConsoleWrite( "Average Linear scan: " & $t2/$iTests & @CRLF ) EndFunc Console output: Test 0 Binary search: 2.74602585342339 Test 0 Linear scan: 10.0892624626688 Test 1 Binary search: 2.6498927853102 Test 1 Linear scan: 9.90807794812693 Test 2 Binary search: 3.73727691310346 Test 2 Linear scan: 12.7233639370786 Test 3 Binary search: 2.68950958283235 Test 3 Linear scan: 10.5594002626344 Test 4 Binary search: 2.70751721806969 Test 4 Linear scan: 10.0164008000931 Test 5 Binary search: 4.07360413122856 Test 5 Linear scan: 9.97927736744995 Test 6 Binary search: 2.5834030552031 Test 6 Linear scan: 9.90918611029538 Test 7 Binary search: 2.49724344660598 Test 7 Linear scan: 9.86818411006267 Test 8 Binary search: 2.49004039251104 Test 8 Linear scan: 9.85100759645166 Test 9 Binary search: 2.56511837942365 Test 9 Linear scan: 9.85322392078857 Average Binary search: 2.87396317577114 Average Linear scan: 10.275738451565 Anything wrong with code or results? tst00.au3 and tst00.txt: Arrays.7z What about 2D arrays? Edited August 21, 2017 by LarsJ Controls, File Explorer, ROT objects, UI Automation, Windows Message MonitorCompiled code: Accessing AutoIt variables, DotNet.au3 UDF, Using C# and VB codeShell menus: The Context menu, The Favorites menu. Shell related: Control Panel, System Image ListsGraphics related: Rubik's Cube, OpenGL without external libraries, Navigating in an image, Non-rectangular selectionsListView controls: Colors and fonts, Multi-line header, Multi-line items, Checkboxes and icons, Incremental searchListView controls: Virtual ListViews, Editing cells, Data display functions
czardas Posted August 21, 2017 Posted August 21, 2017 (edited) Binary search only works if the elements have been sorted. Since initial sorting is required, then a linear scan (comparison) should be faster. How can it be otherwise? Edit: I believe the problem with the code is that you start the linear scan at element 0 on each run. You are testing the same elements over and over again. Edited September 12, 2017 by czardas operator64 ArrayWorkshop
LarsJ Posted August 22, 2017 Posted August 22, 2017 From around post 5 it has been assumed that both arrays are sorted. I do not start the linear scan at element 0 on each run. I'm not testing the same elements over and over again. Read the code. And I'm pretty sure nothing is wrong with the code. It was a rhetorical question. Controls, File Explorer, ROT objects, UI Automation, Windows Message MonitorCompiled code: Accessing AutoIt variables, DotNet.au3 UDF, Using C# and VB codeShell menus: The Context menu, The Favorites menu. Shell related: Control Panel, System Image ListsGraphics related: Rubik's Cube, OpenGL without external libraries, Navigating in an image, Non-rectangular selectionsListView controls: Colors and fonts, Multi-line header, Multi-line items, Checkboxes and icons, Incremental searchListView controls: Virtual ListViews, Editing cells, Data display functions
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now