sshrum

Funcs to evaluate 2 arrays into a 3rd?

18 posts in this topic

Are there any functions for analyzing 2 arrays...like to find similar and/or to find dissimilar entries and have that output to a 3rd array short of doing it myself?

I'm creating a DIR-cmd array (array 1) that would be compared to a database-created array (array 2).  I'm trying to create a new 3rd array of any entries not in the database that are in the DIR-cmd array (new files).

Subsequently, I then want to do the same sort of thing but create another array that would list files in the database array that are not in the DIR array (deleted files)

TIA


Sean Shrum :: http://www.shrum.net

All my published AU3-based apps and utilities

'Make it idiot-proof, and someone will make a better idiot'

 




#2 ·  Posted (edited)

Not sure if there is a simpler way than just doing a loop which searches every value in the generated array against your control array. For example:

#include <Array.au3>

Local $control_Array = StringSplit("apple,bread,dog,cat,engine,frog,giant,horse,indigo", ",")
_ArraySort($control_Array, 0, 1) ; sort from index 1 so the count that StringSplit stores in element 0 stays in place

Local $generated_Array = StringSplit("zed,bread,yale,cat,engine,kite,giant,lion,indigo", ",")
Dim $final_Array[1]

Local $j = 0, $i, $r

For $i = 1 to Ubound($generated_Array) - 1
    $r = _ArrayBinarySearch($control_Array, $generated_Array[$i], 1)
    If $r <> - 1 Then
        ReDim $final_Array[Ubound($final_Array)+1]
        $final_Array[$j] = $generated_Array[$i]
        $j += 1
    EndIf
Next

;clean up (remove the last empty entry)
 _ArrayDelete($final_Array, Ubound($final_Array) - 1)


_ArrayDisplay($final_Array, '$final_Array')


Edited by mpower


sshrum,

A quick search brought up this thread. :)

M23


Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind._______My UDFs:

Spoiler

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 


Actually, here is a comparison of a few methods and Scripting Dictionary seems to be extremely fast compared to all other methods. This method is new to me so I am not sure of any limitations.

#include <Array.au3>

Local $control_Array = StringSplit("5,7,8,12,18,19,20,24,25,36,38,40,41,42,46,50,54,61,62,64,65,66,67,68,69,70,71,"& _
                                  "72,76,77,81,84,86,88,89,95,96,99,101,102,103,105,106,113,117,122,125,130,132,137,"& _
                                  "143,145,146,147,156,157,160,163,165,168,172,173,176,178,180,181,193,198,204,205,"& _
                                  "211,218,220,222,225,231,234,235,237,240,241,243,244,249,251,256,257,260,265,267,"& _
                                  "270,271,272,274,275,276,279,280,284,290,291,295,297,299,302,308,309,310,311,324,"& _
                                  "326,327,329,339,345,346,350,356,357,358,360,368,369,370,371,375,376,377,381,390,"& _
                                  "392,393,398,402,403,404,406,407,408,420,421,423,427,430,436,437,446,453,458,460,"& _
                                  "461,467,473,474,475,478,481,484,496,500,502,504,505,507,508,511,514,517,518,520,"& _
                                  "521,529,532,535,541,545,546,547,549,551,552,553,556,557,559,560,573,577,581,582,"& _
                                  "585,587,589,594,604,607,609,615,626,628,634,649,650,655,662,667,668,670,673,675,"& _
                                  "681,683,685,694,696,700,704,712,714,718,720,726,729,732,734,736,737,744,751,760,"& _
                                  "761,763,769,770,776,778,783,795,800,805,806,809,812,814,815,820,822,824,826,827,"& _
                                  "832,841,848,849,850,855,865,868,874,878,879,881,883,885,886,887,888,890,893,894,"& _
                                  "899,901,910,916,917,924,929,934,935,939,942,943,946,947,948,951,953,956,958,959,"& _
                                  "960,966,971,975,980,986,989,990,994,996", ",", 2)
_ArraySort($control_Array)

Local $generated_Array = StringSplit("1,2,3,12,13,15,17,19,23,26,32,35,42,43,47,49,52,56,58,60,67,68,69,72,73,76,78,"& _
                                    "84,85,86,88,90,91,98,103,104,105,108,113,116,118,119,120,122,123,124,125,129,130,"& _
                                    "134,135,138,142,144,145,163,168,173,174,175,180,181,183,188,189,192,193,194,199,"& _
                                    "200,204,208,217,220,224,227,228,234,238,241,251,252,254,260,274,277,283,287,290,"& _
                                    "292,297,302,305,313,316,322,323,324,328,333,334,337,341,342,344,348,355,357,360,"& _
                                    "363,367,371,373,374,381,382,384,385,389,392,407,408,409,411,413,418,422,424,425,"& _
                                    "430,431,434,444,449,450,451,453,454,457,466,469,475,477,479,480,487,491,495,499,"& _
                                    "500,503,508,510,511,512,513,517,544,546,549,556,560,567,569,570,571,572,576,578,"& _
                                    "580,585,587,595,599,600,601,608,615,618,619,624,627,629,633,636,637,639,642,643,"& _
                                    "646,650,651,659,662,665,670,673,686,689,690,692,693,697,702,704,713,718,720,721,"& _
                                    "724,726,727,731,733,736,739,742,743,746,748,749,751,753,754,759,762,767,770,772,"& _
                                    "773,777,781,783,788,790,794,798,800,812,815,820,824,825,826,827,829,834,835,836,"& _
                                    "837,840,841,842,843,844,851,857,858,866,871,874,880,882,888,889,898,899,901,906,"& _
                                    "907,909,913,914,920,921,924,925,926,927,931,936,937,942,943,945,947,949,952,954,"& _
                                    "956,958,959,960,962,965,975,984,986,994,996", ",", 2)

Dim $final_Array[1]

Local $j = 0, $i, $r

$timer = TimerInit()

For $i = 0 to Ubound($generated_Array) - 1
    $r = _ArrayBinarySearch($control_Array, $generated_Array[$i])
    If $r <> - 1 Then
        ReDim $final_Array[Ubound($final_Array)+1]
        $final_Array[$j] = $generated_Array[$i]
        $j += 1
    EndIf
Next

$tdiff_asb = TimerDiff($timer)

_ArrayDisplay($final_Array, '$final_Array _ArrayBinarySearch method')

Local $aBoth[0]

$timer2 = TimerInit()

For $i = ubound($control_Array) - 1 to 0 step -1
    $iMatch = _ArraySearch($generated_Array , $control_Array[$i])
    If $iMatch <> -1 Then _ArrayAdd($aBoth , $control_Array[$i])
Next

For $i = ubound($generated_Array) - 1 to 0 step -1
    $iMatch = _ArraySearch($control_Array , $generated_Array[$i])
    If $iMatch <> -1 Then _ArrayAdd($aBoth , $generated_Array[$i])
Next

$tdiff_as = TimerDiff($timer2)

_ArrayDisplay($aBoth, '$aBoth _ArraySearch method') ; show the result of this method, not $final_Array from the previous one

$timer3 = TimerInit()

_Separate($control_Array, $generated_Array)

$tdiff_sep = TimerDiff($timer3)

Func _Separate(ByRef $in0, ByRef $in1)
    $in0 = _ArrayUnique($in0, 0, Default, Default, 0)
    $in1 = _ArrayUnique($in1, 0, Default, Default, 0)
    Local $z[2] = [UBound($in0), UBound($in1)], $low = 1 * ($z[0] > $z[1]), $aTemp[$z[Not $low]][3], $aOut = $aTemp, $aNdx[3]
    For $i = 0 To $z[Not $low] - 1
        If $i < $z[0] Then $aTemp[$i][0] = $in0[$i]
        If $i < $z[1] Then $aTemp[$i][1] = $in1[$i]
    Next
    For $i = 0 To $z[$low] - 1
        $x = _ArrayFindAll($aTemp, $aTemp[$i][$low], 0, 0, 1, 0, Not $low)
        If Not @error Then ; both
            For $j = 0 To UBound($x) - 1
                $aTemp[$x[$j]][2] = 1
            Next
            $aOut[$aNdx[2]][2] = $aTemp[$i][$low]
            $aNdx[2] += 1
        Else ; only in $low
            $aOut[$aNdx[$low]][$low] = $aTemp[$i][$low]
            $aNdx[$low] += 1
        EndIf
    Next
    For $i = 0 To $z[Not $low] - 1
        If $aTemp[$i][2] <> 1 Then
            $aOut[$aNdx[Not $low]][Not $low] = $aTemp[$i][Not $low]
            $aNdx[Not $low] += 1
        EndIf
    Next
    ReDim $aOut[_ArrayMax($aNdx)][3]
    Return $aOut
EndFunc   ;==>_Separate

$timer4 = TimerInit()

$sda = ObjCreate("Scripting.Dictionary")
$sdb = ObjCreate("Scripting.Dictionary")
$sdc = ObjCreate("Scripting.Dictionary")

For $i In $control_Array
    $sda.Item($i)
Next
For $i In $generated_Array
    $sdb.Item($i)
Next

For $i In $control_Array
    If $sdb.Exists($i) Then $sdc.Item($i)
Next
$asd3 = $sdc.Keys()

$tdiff_scr = TimerDiff($timer4)

_ArrayDisplay($asd3, '$asd3 Scripting Dictionary method')

ConsoleWrite('_ArrayBinarySearch method took '&Round($tdiff_asb, 2)&' ms'&@CRLF)
ConsoleWrite('_ArraySearch method took '&Round($tdiff_as, 2)&' ms'&@CRLF)
ConsoleWrite('_Separate method took '&Round($tdiff_sep, 2)&' ms'&@CRLF)
ConsoleWrite('Scripting Dictionary method took '&Round($tdiff_scr, 2)&' ms'&@CRLF)
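The comparison above collects the entries that both arrays share, while the original question asked for the entries that differ. The same Scripting Dictionary idea can be turned around for that. What follows is only a minimal sketch: the helper name _ArrayOnlyInFirst, the sample file names, and the choice of a case-insensitive compare are all assumptions for illustration, not something taken from the thread.

```autoit
#include <Array.au3>

; Hypothetical sample data standing in for the DIR listing and the database listing
Local $aFiles = StringSplit("a.mp3,b.mp3,c.mp3,d.mp3", ",", 2) ; flag 2 = no count in element 0
Local $aDatabase = StringSplit("b.mp3,c.mp3,e.mp3", ",", 2)

Local $aNew = _ArrayOnlyInFirst($aFiles, $aDatabase)     ; in the DIR listing, not in the database
Local $aDeleted = _ArrayOnlyInFirst($aDatabase, $aFiles) ; in the database, not in the DIR listing

_ArrayDisplay($aNew, "New files")
_ArrayDisplay($aDeleted, "Deleted files")

; Hypothetical helper: returns a 0-based array of the entries of $aFirst that do not occur in $aSecond
Func _ArrayOnlyInFirst(ByRef $aFirst, ByRef $aSecond)
    Local $oSeen = ObjCreate("Scripting.Dictionary")
    Local $oOnly = ObjCreate("Scripting.Dictionary")
    ; Assumption: names differing only in case count as equal (1 = text compare).
    ; CompareMode has to be set while the dictionary is still empty.
    $oSeen.CompareMode = 1
    $oOnly.CompareMode = 1

    For $sItem In $aSecond
        $oSeen.Item($sItem) = True ; the key is what matters, the value is a dummy
    Next
    For $sItem In $aFirst
        If Not $oSeen.Exists($sItem) Then $oOnly.Item($sItem) = True
    Next
    Return $oOnly.Keys()
EndFunc   ;==>_ArrayOnlyInFirst
```

Each array is walked only once and every lookup is a dictionary probe, so no sorting is needed beforehand. Duplicate entries in the first array collapse into a single key in the result.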


The thing to keep in mind is that I'm dealing with 2 arrays with over 70,000 records each.  Most of the search options presented, while effective on small sets of data, tend to choke on this many records.  Even _ArraySearch, or _ArrayBinarySearch after an _ArraySort, still proves too time-consuming (let's just say I haven't sat around long enough for it to complete before breaking it).



sshrum,

If your data-sets are that size it sounds as if you need to think about a database solution - especially as one of your sets comes from a database already. ;)

M23



For now I'm running this. At first it's sluggish, but it gets faster as it goes because I'm removing entries that match, so the search has less and less to deal with over time....

For $i = $aFiles[0] To 1 Step -1 ; elements 1..$aFiles[0] hold the data, element 0 holds the count
    $iMatch = _ArrayBinarySearch($aDatabase , $aFiles[$i], 1)
    If $iMatch <> -1 Then
        _Arraydelete($aDatabase, $iMatch)
        _ArrayDelete($aFiles, $i)
        ConsoleWrite("=")
    Else
        ConsoleWrite("/")
    EndIf
Next
_FileWriteFromArray($sPlayer & "\deleted.txt", $aDatabase, 1)
_FileWriteFromArray($sPlayer & "\new.txt", $aFiles, 1)

Not sure if it's 100% but will see after I wake up tomorrow...hopefully I'll have 2 files with the results I want.



#8 ·  Posted (edited)

The database route is still an option...UNION and all, but my SQL-fu hasn't been used in a while. :-P

 

...actually if someone has a code snippet on doing the union comparisons that would be awesome.  Each array is just 1 field with the full pathname.

Edited by sshrum


#9 ·  Posted (edited)

I just tried this:

#include <Array.au3>
#include <File.au3>

Global $control_Array, $generated_Array
_FileReadToArray(@ScriptDir & '\control_array.txt', $control_Array)
_ArraySort($control_Array)
_FileReadToArray(@ScriptDir & '\generated_array.txt', $generated_Array)
_ArraySort($generated_Array)

ConsoleWrite('$control_Array has '&Ubound($control_Array)-1&' items.'&@CRLF)
ConsoleWrite('$generated_Array has '&Ubound($generated_Array)-1&' items.'&@CRLF)

Dim $final_Array[1]

Local $j = 0, $i, $r

$timer = TimerInit()

For $i = 0 to Ubound($generated_Array) - 1
    $r = _ArrayBinarySearch($control_Array, $generated_Array[$i])
    If $r <> - 1 Then
        ReDim $final_Array[Ubound($final_Array)+1]
        $final_Array[$j] = $generated_Array[$i]
        $j += 1
    EndIf
Next

_ArrayDelete($final_Array, Ubound($final_Array)-1)
$final_Array = _ArrayUnique($final_Array)
_ArraySort($final_Array)

$tdiff_asb = TimerDiff($timer)

$timer2 = TimerInit()

$sda = ObjCreate("Scripting.Dictionary")
$sdb = ObjCreate("Scripting.Dictionary")
$sdc = ObjCreate("Scripting.Dictionary")

For $i In $control_Array
    $sda.Item($i)
Next
For $i In $generated_Array
    $sdb.Item($i)
Next
For $i In $control_Array
    If $sdb.Exists($i) Then $sdc.Item($i)
Next
$asd3 = $sdc.Keys()
$asd3 = _ArrayUnique($asd3)
$tdiff_scr = TimerDiff($timer2)

ConsoleWrite('_ArrayBinarySearch method took '&Round($tdiff_asb/1000, 2)&' seconds. Matches found: '&Ubound($final_Array)-1&@CRLF)
ConsoleWrite('Scripting Dictionary method took '&Round($tdiff_scr/1000, 2)&' seconds. Matches found: '&Ubound($asd3)-1&@CRLF)

My results were:

 

$control_Array has 90000 items.

$generated_Array has 77508 items.
_ArrayBinarySearch method took 54.58 seconds. Matches found: 17008
Scripting Dictionary method took 1.78 seconds. Matches found: 17008

 

generated_array.txt

control_array.txt

Edited by mpower


You'd be better off using the power of your database engine to perform the comparison instead of relying on slower, pedestrian application code.

Here are some topics which you could use to get started:


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


Found it was quick enough for my use to just create a 3rd array and load it with the index values of the entries in the files array that failed _ArrayBinarySearch on the database array.  

I guess you'd call that a key-reference array.

Got the idea from the code snippets above.  Thx.

Searches against my two 70,000+ entry arrays now take ~6 seconds.
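For readers who want to try the same idea, here is a rough sketch of such a key-reference array. It is not the code actually used above. The sample data and the variable names $aNewIdx and $iFound are made up for illustration, and both arrays are assumed to carry their count in element 0.

```autoit
#include <Array.au3>

; Hypothetical sample data; StringSplit stores the count in element 0 by default
Local $aFiles = StringSplit("a.mp3,b.mp3,c.mp3,d.mp3", ",")
Local $aDatabase = StringSplit("b.mp3,c.mp3,e.mp3", ",")
_ArraySort($aDatabase, 0, 1) ; binary search needs sorted data; start at 1 to skip the count

Local $iCount = $aFiles[0]
Local $aNewIdx[$iCount] ; worst case: every file is new
Local $iFound = 0

For $i = 1 To $iCount
    If _ArrayBinarySearch($aDatabase, $aFiles[$i], 1) = -1 Then
        $aNewIdx[$iFound] = $i ; remember only the index, not the string itself
        $iFound += 1
    EndIf
Next
ReDim $aNewIdx[$iFound] ; a single resize after the loop

; Look the entries up again through the stored indexes
For $i = 0 To $iFound - 1
    ConsoleWrite("New: " & $aFiles[$aNewIdx[$i]] & @CRLF)
Next
```

Nothing is deleted from either array during the loop, which avoids the repeated _ArrayDelete calls of the earlier attempt.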



#12 ·  Posted (edited)

Quoting mpower's loop from above:

For $i = 0 to Ubound($generated_Array) - 1
    $r = _ArrayBinarySearch($control_Array, $generated_Array[$i])
    If $r <> - 1 Then
        ReDim $final_Array[Ubound($final_Array)+1]
        $final_Array[$j] = $generated_Array[$i]
        $j += 1
    EndIf
Next

This is very inefficient because ReDim is called inside the loop!

Allocate the final array to the same size as the input array BEFORE the main loop, and AFTER the main loop do just one ReDim to the correct size.

Edited by Zedna


Thanks Zedna! You are absolutely right, moving ReDim outside the loop has increased the function's speed 10-fold!!!

Dim $final_Array[Ubound($generated_Array)] ; full size, so the array cannot overflow even if every entry matches

Local $j = 0, $i, $r

For $i = 0 to Ubound($generated_Array) - 1
    $r = _ArrayBinarySearch($control_Array, $generated_Array[$i])
    If $r <> - 1 Then
        $final_Array[$j] = $generated_Array[$i]
        $j += 1
    EndIf
Next
ReDim $final_Array[$j]

$final_Array = _ArrayUnique($final_Array)
_ArraySort($final_Array)

Now this function can compare the two arrays (one with ~90k rows and the other with ~77k rows) in just 5 seconds (previously nearly 55 seconds)!!

Still though, the Scripting Dictionary method is faster (1.72 seconds).


Zedna, is there a way to include SQLite capability without Administrator Rights ?


For us AutoIt users, SQLite is nothing more than a simple DLL. So running as admin is no more an issue than running any other non-SQLite script, provided of course that you have the required DLL in some user-reachable place.
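As a rough sketch of what the database route could look like with the SQLite UDF that ships with AutoIt: the example below loads both lists into an in-memory database and lets SQLite compute the two differences. It uses EXCEPT (set difference) rather than the UNION mentioned earlier, since that is the operator matching "in one list but not the other". The table names files and db, the sample paths, and the location of sqlite3.dll next to the script are assumptions for illustration.

```autoit
#include <Array.au3>
#include <SQLite.au3>

; Assumption: sqlite3.dll sits next to the script, a place any user can reach
_SQLite_Startup(@ScriptDir & "\sqlite3.dll")
If @error Then Exit MsgBox(16, "SQLite", "sqlite3.dll could not be loaded")

_SQLite_Open() ; no file name = temporary in-memory database
_SQLite_Exec(-1, "CREATE TABLE files (path TEXT); CREATE TABLE db (path TEXT);")

; Hypothetical sample data: one full pathname per entry
Local $aFiles = StringSplit("C:\m\a.mp3|C:\m\b.mp3|C:\m\c.mp3", "|", 2)
Local $aDatabase = StringSplit("C:\m\b.mp3|C:\m\d.mp3", "|", 2)

_SQLite_Exec(-1, "BEGIN;") ; one transaction keeps bulk inserts fast
For $sPath In $aFiles
    ; _SQLite_FastEscape quotes the string and doubles embedded single quotes
    _SQLite_Exec(-1, "INSERT INTO files VALUES (" & _SQLite_FastEscape($sPath) & ");")
Next
For $sPath In $aDatabase
    _SQLite_Exec(-1, "INSERT INTO db VALUES (" & _SQLite_FastEscape($sPath) & ");")
Next
_SQLite_Exec(-1, "COMMIT;")

Local $aNew, $aDeleted, $iRows, $iCols
; Row 0 of each result array holds the column name, the data starts at row 1
_SQLite_GetTable2d(-1, "SELECT path FROM files EXCEPT SELECT path FROM db;", $aNew, $iRows, $iCols)
_SQLite_GetTable2d(-1, "SELECT path FROM db EXCEPT SELECT path FROM files;", $aDeleted, $iRows, $iCols)

_ArrayDisplay($aNew, "New files")
_ArrayDisplay($aDeleted, "Deleted files")

_SQLite_Close()
_SQLite_Shutdown()
```

If one of the lists already lives in a SQLite database file, only the other list would need to be loaded, into a temporary table of that database.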



mpower, you can speed up your _ArrayBinarySearch loop a little more by making a local, modified copy of Func _ArrayBinarySearch() and removing all unnecessary checks from the beginning of that function (IsArray, UBound, $iStart, $iEnd). Do whatever checking you need (for example, the array boundaries) only once before the main loop instead of inside the function on every call.
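A stripped-down search of that kind might look roughly like this. It is only a sketch of the idea, not the UDF's actual source: the function name _BinarySearchLean and the sample array are hypothetical, and the caller is responsible for passing a sorted 1D array and valid bounds.

```autoit
; Hypothetical sample data, already in ascending order
Local $aControl[5] = ["apple", "bread", "cat", "dog", "engine"]
Local $iLast = UBound($aControl) - 1 ; the bounds are worked out once, outside any loop

ConsoleWrite(_BinarySearchLean($aControl, "cat", 0, $iLast) & @CRLF) ; writes 2
ConsoleWrite(_BinarySearchLean($aControl, "zed", 0, $iLast) & @CRLF) ; writes -1

; Binary search without IsArray/UBound/range checks inside the function.
; Returns the index of $vValue in $aSorted, or -1 if it is not present.
Func _BinarySearchLean(ByRef $aSorted, $vValue, $iLow, $iHigh)
    Local $iMid
    While $iLow <= $iHigh
        $iMid = Int(($iLow + $iHigh) / 2)
        If $aSorted[$iMid] < $vValue Then
            $iLow = $iMid + 1  ; target lies in the upper half
        ElseIf $aSorted[$iMid] > $vValue Then
            $iHigh = $iMid - 1 ; target lies in the lower half
        Else
            Return $iMid
        EndIf
    WEnd
    Return -1
EndFunc   ;==>_BinarySearchLean
```

The saving per call is small, so this only pays off when the function is called tens of thousands of times, as in the loops above.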


#18 ·  Posted (edited)

sshrum, you asked for a code snippet doing the union comparisons. I have used this problem to post a possible solution, as an example of using my ArraySQL UDF, in a post on the "Example Scripts" forum.

Have a look if you are interested in trying it with SQL.

Edited by Chimp

small minds discuss people average minds discuss events great minds discuss ideas.... and use AutoIt....
