Kapz

Challenge: Array compare script

6 posts in this topic

In my lab I have been asked to compare the directory of one computer to the directory of another to determine what data is missing. As there are over 12,000 "runs" or groups on our main computer and roughly 2500 runs on the other computer, I quickly decided that an automated program would facilitate things enormously. I created a script that builds a large array out of the directory listings of both computers, using "./" as the delimiter to separate each group of data. Within these groups of data, I had to make sure that the same number of lines exist, so I created sub-arrays out of each original element and determined the number of sub-array elements within each. If the first elements of these sub-arrays matched between the two directory listings, I then checked whether the number of elements was the same. If it was, the entire group was recorded in a text file and the next element was compared.

The problem with my script is that it runs far too slowly. After leaving it overnight, it had only managed to get through 200 or so elements of the master list of 2500. Is there a much faster or better way of solving this problem? I have posted my script below, along with a small portion of the text files that I am comparing. Any help would be greatly appreciated.

HotKeySet("{ESC}", "MyExit")

    $delay1 = 200
    $delay2 = 500
    
;;;;;;;;Create a string with slave file
    WinActivate("daw50 - Notepad","")
    Sleep($delay2)
    Send("^a")
    Send("^c")
    $daw50 = ClipGet()
    Sleep($delay2)

;;;;;;;;Create a string with master file
    WinActivate("match - Notepad","")
    Sleep($delay2)
    Send("^a")
    Send("^c")
    $map41 = ClipGet()

;;;;;;;;Strip CR
    $50 = StringStripCR($daw50)
    $41 = StringStripCR($map41)


;;;;;;;;Create Array for slave and master files respectively using ./ as the delimiter
    $50 = StringSplit($50, "./", 1)
    $41 = StringSplit($41, "./", 1)

    WinActivate("new - Notepad","")


;;;;;;;;For loop for elements of master array
    For $x = 1 to UBound($41) - 1

    ;;;;;;;;;Split the master array into subelements by generating a new array for each element     
        $m41 = StringSplit($41[$x], @LF)

        $start = 1


    ;If $x > 100 Then $start = 100
    ;If $x > 300 Then $start = 300
    ;If $x > 500 Then $start = 500
    ;If $x > 700 Then $start = 700
    ;If $x > 900 Then $start = 900


    ;;;;;;;;;For loop for elements of slave array
        For $y = $start to UBound($50) - 1

    ;;;;;;;;;Split the slave array into subelements by generating a new array for each element  
        $d50 = StringSplit($50[$y], @LF)


        ;;;;;;;;;If the first element in the sub-array of each element of the master file
        ;;;;;;;;;is equal to the first element in the sub-array of the element in the slave
        ;;;;;;;;;file, then check that the same number of elements exist; if so,
        ;;;;;;;;;the element is recorded.

            If $m41[1] = $d50[1] Then
                If UBound($d50) = UBound($m41) Then
                    WinActivate("new - Notepad","")
                    ClipPut($41[$x])
                    Send("^v")
                    Send("{ENTER}")
                    ClipPut("----------------------------------------------------------------")
                    Send("^v")
                    Send("{ENTER}")
                    ExitLoop

                Else
                EndIf
            EndIf

    ;;;;;;;;;If it reaches the end of the slave array ($50) without finding a match, record the element as not matching
        If $y = UBound($50) - 1 Then
            WinActivate("nomatch - Notepad","")
            ClipPut($41[$x])
            Send("^v")
            Send("{ENTER}")
            ClipPut("----------------------------------------------------------------")
            Send("^v")
            Send("{ENTER}")
            ExitLoop
        Else
        EndIf
        Next
    Next

      MsgBox(0,"done","Finished!")
  

    Func MyExit()
        Exit
    EndFunc
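If the two listings are saved as text files, FileRead can load them directly. That removes the Notepad windows, the clipboard and the Sleep calls from the loading step, and the script no longer depends on which window has focus. A minimal sketch: _LoadBlocks and the demo file in @TempDir are hypothetical names, and the split is the same StringStripCR / StringSplit(..., "./", 1) combination the script above already uses.

```autoit
; Sketch: load a saved listing from disk instead of copying it out of Notepad.
; _LoadBlocks and the demo file name are hypothetical, for illustration only.

Func _LoadBlocks($sPath)
    Local $sText = FileRead($sPath)           ; whole file as one string
    If @error Then Return SetError(1, 0, 0)   ; the file could not be read
    ; Same split as the original script: remove CRs, then cut on the two-character
    ; string "./" (flag 1 = use the whole delimiter string, not each character).
    Return StringSplit(StringStripCR($sText), "./", 1)
EndFunc

; Demo: write a two-block sample to the temp folder, then load it back.
Local $sDemo = @TempDir & "\listing_demo.txt"
Local $hDemo = FileOpen($sDemo, 2)            ; 2 = write mode, erases old contents
FileWrite($hDemo, "./09031-01:" & @CRLF & "total 6" & @CRLF & @CRLF & _
        "./09031-01/Language:" & @CRLF & "total 2" & @CRLF)
FileClose($hDemo)

Local $aBlocks = _LoadBlocks($sDemo)
If @error Then
    MsgBox(16, "Error", "Could not read " & $sDemo)
    Exit
EndIf
; [0] is the element count and [1] is the empty text before the first "./",
; so the number of directory blocks is $aBlocks[0] - 1.
ConsoleWrite($aBlocks[0] - 1 & " directory blocks" & @CRLF)
```

For the real data, call _LoadBlocks with the paths of the saved master and slave listings; the resulting arrays have the same layout as $41 and $50 in the script above.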

master directory list:

./09031-01:
total 6
drwxrwxrwx 3 test512 Nov3Language/
drwxrwxrwx 3 test512 Nov3MotAttDC/
drwxrwxrwx 3 test512 Nov3languageDC/

./09031-01/Language:
total 2
drwxrwxrwx 5 test512 Nov311%03%05@/

./09031-01/Language/11%03%05@:
total 6
drwxrwxrwx 2 test512 Nov32/
drwxrwxrwx 2 test512 Nov33/
drwxrwxrwx 2 test512 Nov34/

./09031-01/Language/11%03%05@/2:
total 62926
-rw-rw-rw- 1 test223048 Nov3config
-rw-rw-rw- 1 test 27847456 Nov3e,rfhp0.1Hz
-rw-rw-rw- 1 test 1997600 Nov3e,rfhp1.0Hz,COH
-rw-rw-rw- 1 test 1997600 Nov3e,rfhp1.0Hz,COH1
-rw-rw-rw- 1 test 88744 Nov3hs_file

./09031-01/Language/11%03%05@/3:
total 62926
-rw-rw-rw- 1 test223048 Nov3config
-rw-rw-rw- 1 test 27847456 Nov3e,rfhp0.1Hz
-rw-rw-rw- 1 test 1997600 Nov3e,rfhp1.0Hz,COH
-rw-rw-rw- 1 test 1997600 Nov3e,rfhp1.0Hz,COH1
-rw-rw-rw- 1 test 88744 Nov3hs_file

./09031-01/Language/11%03%05@/4:
total 60750
-rw-rw-rw- 1 test223048 Nov3config
-rw-rw-rw- 1 test 26734736 Nov3e,rfhp0.1Hz
-rw-rw-rw- 1 test 1997600 Nov3e,rfhp1.0Hz,COH
-rw-rw-rw- 1 test 1997600 Nov3e,rfhp1.0Hz,COH1
-rw-rw-rw- 1 test 88744 Nov3hs_file

./09031-01/MotAttDC:
total 2
drwxrwxrwx 3 test512 Nov311%03%05@/

./09031-01/MotAttDC/11%03%05@:
total 2
drwxrwxrwx 2 test512 Nov31/

./09031-01/MotAttDC/11%03%05@/1:
total 437870
-rw-rw-rw- 1 test 219724856 Nov3c,rfDC
-rw-rw-rw- 1 test223048 Nov3config
-rw-rw-rw- 1 test 1997600 Nov3e,rfhp1.0Hz,COH
-rw-rw-rw- 1 test 1997600 Nov3e,rfhp1.0Hz,COH1
-rw-rw-rw- 1 test 88744 Nov3hs_file

./09031-01/languageDC:
total 2
drwxrwxrwx 3 test512 Nov311%03%05@/

./09031-01/languageDC/11%03%05@:
total 2
drwxrwxrwx 2 test512 Nov31/

./09031-01/languageDC/11%03%05@/1:
total 116094
-rw-rw-rw- 1 test 55051256 Nov3c,rfDC
-rw-rw-rw- 1 test223048 Nov3config
-rw-rw-rw- 1 test 1997600 Nov3e,rfhp1.0Hz,COH
-rw-rw-rw- 1 test 1997600 Nov3e,rfhp1.0Hz,COH1
-rw-rw-rw- 1 test 88744 Nov3hs_file

./2002b:
total 14
drwxrwxrwx 3 test512 Jun 122002 10min/
drwxrwxrwx 3 test512 Jun 122002 10minC/
drwxrwxrwx 5 test512 Jun 122002 Language/
drwxrwxrwx 5 test512 Aug 302002 SEP/
drwxrwxrwx 3 test512 Aug 302002 SEP1000/
drwxrwxrwx 3 test512 Aug 302002 SEPMotor/
drwxrwxrwx 3 test512 Aug 302002 SEPlongISI/

./2002b/10min:
total 2
drwxrwxrwx 2 test512 Jun 122002 06%12%02@/

./2002b/10min/06%12%02@:
total 0

./2002b/10minC:
total 2
drwxrwxrwx 2 test512 Jun 122002 06%12%02@/

./2002b/10minC/06%12%02@:
total 0

./2002b/Language:
total 6
drwxrwxrwx 3 test512 Jun 112002 06%11%02@/
drwxrwxrwx 2 test512 Jun 122002 06%11%02@/
drwxrwxrwx 2 test512 Jun 122002 06%12%02@/

./2002b/Language/06%11%02@:
total 2
drwxrwxrwx 2 test512 Jun 252002 1/

./2002b/Language/06%11%02@/1:
total 0

./2002b/Language/06%11%02@:
total 0

./2002b/Language/06%12%02@:
total 0

./2002b/SEP:
total 6
drwxrwxrwx 2 test512 Jun 132002 06%13%02@/
drwxrwxrwx 2 test512 Jun 242002 06%13%02@/
drwxrwxrwx 3 test512 Aug 302002 08%30%02@/

./2002b/SEP/06%13%02@:
total 0

./2002b/SEP/06%13%02@:
total 0

./2002b/SEP/08%30%02@:

-----------------------------------------

Slave directory list:

./09031-01:
total 6
drwxrwxrwx 3 test512 Nov3Language/
drwxrwxrwx 3 test512 Nov3languageDC/
drwxrwxrwx 3 test512 Nov3MotAttDC/

./09031-01/Language:
total 2
drwxrwxrwx 5 test512 Nov311%03%05@/

./09031-01/Language/11%03%05@:
total 6
drwxrwxrwx 2 test512 Nov32/
drwxrwxrwx 2 test512 Nov33/
drwxrwxrwx 2 test512 Nov34/

./09031-01/Language/11%03%05@/2:
total 62926
-rw-rw-rw- 1 test 223048 Nov3config
-rw-rw-rw- 1 test27847456 Nov3e,rfhp0.1Hz
-rw-rw-rw- 1 test1997600 Nov3e,rfhp1.0Hz,COH
-rw-rw-rw- 1 test1997600 Nov3e,rfhp1.0Hz,COH1
-rw-rw-rw- 1 test88744 Nov3hs_file

./09031-01/Language/11%03%05@/3:
total 62926
-rw-rw-rw- 1 test 223048 Nov3config
-rw-rw-rw- 1 test27847456 Nov3e,rfhp0.1Hz
-rw-rw-rw- 1 test1997600 Nov3e,rfhp1.0Hz,COH
-rw-rw-rw- 1 test1997600 Nov3e,rfhp1.0Hz,COH1
-rw-rw-rw- 1 test88744 Nov3hs_file

./09031-01/Language/11%03%05@/4:
total 60750
-rw-rw-rw- 1 test 223048 Nov3config
-rw-rw-rw- 1 test26734736 Nov3e,rfhp0.1Hz
-rw-rw-rw- 1 test1997600 Nov3e,rfhp1.0Hz,COH
-rw-rw-rw- 1 test1997600 Nov3e,rfhp1.0Hz,COH1
-rw-rw-rw- 1 test88744 Nov3hs_file

./09031-01/languageDC:
total 2
drwxrwxrwx 3 test512 Nov311%03%05@/

./09031-01/languageDC/11%03%05@:
total 2
drwxrwxrwx 2 test512 Nov31/

./09031-01/languageDC/11%03%05@/1:
total 116094
-rw-rw-rw- 1 test55051256 Nov3c,rfDC
-rw-rw-rw- 1 test 223048 Nov3config
-rw-rw-rw- 1 test1997600 Nov3e,rfhp1.0Hz,COH
-rw-rw-rw- 1 test1997600 Nov3e,rfhp1.0Hz,COH1
-rw-rw-rw- 1 test88744 Nov3hs_file

./09031-01/MotAttDC:
total 2
drwxrwxrwx 3 test512 Nov311%03%05@/

./09031-01/MotAttDC/11%03%05@:
total 2
drwxrwxrwx 2 test512 Nov31/

./09031-01/MotAttDC/11%03%05@/1:
total 437870
-rw-rw-rw- 1 test219724856 Nov3c,rfDC
-rw-rw-rw- 1 test 223048 Nov3config
-rw-rw-rw- 1 test1997600 Nov3e,rfhp1.0Hz,COH
-rw-rw-rw- 1 test1997600 Nov3e,rfhp1.0Hz,COH1
-rw-rw-rw- 1 test88744 Nov3hs_file

./09038-01:
total 4
drwxrwxrwx 3 test512 Oct 26Language/
drwxrwxrwx 3 test512 Oct 26MotAttDC/

./09038-01/Language:
total 2
drwxrwxrwx 3 test512 Oct 2610%26%05@/

./09038-01/Language/10%26%05@:
total 2
drwxrwxrwx 2 test512 Oct 261/

./09038-01/Language/10%26%05@/1:
total 73856
-rw-rw-rw- 1 test 223048 Oct 26config
-rw-rw-rw- 1 test33411056 Oct 26e,rfhp0.1Hz
-rw-rw-rw- 1 test1997600 Oct 26e,rfhp1.0Hz,COH
-rw-rw-rw- 1 test1997600 Oct 26e,rfhp1.0Hz,COH1
-rw-rw-rw- 1 test 107056 Oct 26hs_file

./09038-01/MotAttDC:
total 2
drwxrwxrwx 3 test512 Oct 2610%26%05@/

./09038-01/MotAttDC/10%26%05@:
total 2
drwxrwxrwx 2 test512 Oct 261/

./09038-01/MotAttDC/10%26%05@/1:
total 688128
-rw-rw-rw- 1 test347761016 Oct 26c,rfDC
-rw-rw-rw- 1 test 223048 Oct 26config
-rw-rw-rw- 1 test1997600 Oct 26e,rfhp1.0Hz,COH
-rw-rw-rw- 1 test1997600 Oct 26e,rfhp1.0Hz,COH1
-rw-rw-rw- 1 test 107056 Oct 26hs_file

./09042-01:
total 4
drwxrwxrwx 3 test512 Jul 21Language/
drwxrwxrwx 3 test512 Jul 21Motor/

./09042-01/Language:
total 2
drwxrwxrwx 4 test512 Jul 2107%21%05@/

./09042-01/Language/07%21%05@:
total 4
drwxrwxrwx 2 test512 Jul 212/
drwxrwxrwx 2 test512 Jul 213/

./09042-01/Language/07%21%05@/2:
total 60976
-rw-rw-rw- 1 test 223048 Jul 21config
-rw-rw-rw- 1 test26734736 Jul 21e,rfhp0.1Hz
-rw-rw-rw- 1 test1997600 Jul 21e,rfhp1.0Hz,COH
-rw-rw-rw- 1 test1997600 Jul 21e,rfhp1.0Hz,COH1
-rw-rw-rw- 1 test 189352 Jul 21hs_file

./09042-01/Language/07%21%05@/3:
total 60976
-rw-rw-rw- 1 test 223048 Jul 21config
-rw-rw-rw- 1 test26734736 Jul 21e,rfhp0.1Hz
-rw-rw-rw- 1 test1997600 Jul 21e,rfhp1.0Hz,COH
-rw-rw-rw- 1 test1997600 Jul 21e,rfhp1.0Hz,COH1
-rw-rw-rw- 1 test 189352 Jul 21hs_file

I wrote this real fast

it works in seconds

and it works

$loc_master = "C:\temp\master.txt"
$loc_slave = "C:\temp\slave.txt"
$loc_differ = "C:\temp\differ.txt"

; Keep the handles: FileReadLine on a handle reads the next line,
; instead of reopening the file on every call
$h_master = FileOpen($loc_master, 0)   ; 0 = read
$h_slave = FileOpen($loc_slave, 0)
$h_differ = FileOpen($loc_differ, 1)   ; 1 = append
If $h_master = -1 Or $h_slave = -1 Or $h_differ = -1 Then Exit

FileWriteLine($h_differ, "File size difference")
FileWriteLine($h_differ, FileGetSize($loc_master) & " - master")
FileWriteLine($h_differ, FileGetSize($loc_slave) & " - slave")
FileWriteLine($h_differ, "  ")

$x = 0
While 1
    $m_read = FileReadLine($h_master)
    If @error Then ExitLoop            ; end of master file
    $s_read = FileReadLine($h_slave)
    If @error Then ExitLoop            ; end of slave file
    $x += 1

    ; "==" is case-sensitive; "<>" would ignore case
    If Not ($m_read == $s_read) Then
        FileWriteLine($h_differ, " Difference found at line number " & $x)
        FileWriteLine($h_differ, $m_read)
        FileWriteLine($h_differ, $s_read)
        FileWriteLine($h_differ, "  ")
    EndIf
WEnd

FileClose($h_master)
FileClose($h_slave)
FileClose($h_differ)

Hope you like it

8)


Thank you for your reply. I'm afraid that the script you created, although much cleaner than my own, doesn't accomplish the same thing. The two files I am comparing are of very different lengths (one is roughly six times larger than the other), and your script compares the files line by line, which only works when both have the same lines in the same positions. Also, I need the lines to be split into groups using ./ as the delimiter, because those groups are what I'm actually comparing, not the individual lines. I really do appreciate your help, although I think this problem is more complex than a simple text-file comparison.


Hi,

I'm interpreting what you refer to as "runs" or "groups" as meaning folders or directories. If that is correct, why not run DIR /b /s *.* > filelist.txt on each drive of interest on each machine? Having done that, merge all the filelist.txt files so that you have one big file list for each machine.

Next, compare each entry of each list to the other list to see which filepaths are missing on either machine. Note: for all comparisons you'd need to note, but ignore, drive letters, unless you plan to make the computers exact mirror images of each other. Then, for all the filepaths that are on both machines, compare the file/directory attributes from each machine for each matching filepath. While this processing takes place, you would write a log file of all the mismatches and the type of each mismatch. You could then use the content of the log file as the basis of an XCOPY script to make the actual file lists match.
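The first step of this suggestion, finding filepaths that exist on one machine but not the other while ignoring drive letters, could look roughly like the sketch below. It assumes each list is plain text with one full path per line (the DIR /b /s format) and that paths should match case-insensitively, as they do on Windows. _StripDrive and _MissingPaths are hypothetical names, and the attribute and content comparisons are not covered.

```autoit
; Sketch: which paths in list A are absent from list B, ignoring drive letters?
; Both lists are plain text, one full path per line (DIR /b /s style).
; _StripDrive and _MissingPaths are hypothetical helper names.

Func _StripDrive($sPath)
    ; "C:\data\run1" -> "\data\run1"; a path without a drive letter is returned unchanged
    Return StringRegExpReplace(StringStripWS($sPath, 3), "^[A-Za-z]:", "")
EndFunc

Func _MissingPaths($sListA, $sListB)
    Local $oInB = ObjCreate("Scripting.Dictionary")   ; Windows COM hash table
    Local $aA = StringSplit(StringStripCR($sListA), @LF)
    Local $aB = StringSplit(StringStripCR($sListB), @LF)
    Local $sKey, $sMissing = ""
    ; Index list B once. Windows paths are case-insensitive, so keys are lower-cased.
    For $i = 1 To $aB[0]
        $sKey = StringLower(_StripDrive($aB[$i]))
        If $sKey <> "" And Not $oInB.Exists($sKey) Then $oInB.Add($sKey, 1)
    Next
    ; One pass over list A: each lookup is a hash probe, not a scan of list B.
    For $i = 1 To $aA[0]
        $sKey = StringLower(_StripDrive($aA[$i]))
        If $sKey <> "" And Not $oInB.Exists($sKey) Then $sMissing &= $aA[$i] & @CRLF
    Next
    Return $sMissing
EndFunc

Local $sListA = "C:\data\run1" & @CRLF & "C:\data\run2" & @CRLF & "C:\data\run3"
Local $sListB = "D:\data\run1" & @CRLF & "D:\DATA\run3"
ConsoleWrite(_MissingPaths($sListA, $sListB))   ; prints C:\data\run2
```

Calling _MissingPaths a second time with the two lists swapped gives the paths missing on the other machine.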

This is not elegant, but it is much less complex: it addresses the content of each folder and each instance where same-named filepaths have different content based on attributes, and it is probably faster too.

I forgot to mention above that each filepath that mismatched on attributes should be subjected to a file compare operation too. Someone would have to make a knowledge-based judgement call for each of them. There is also a small but distinct possibility that a filepath could match on name and attributes and still have differing content.

I may have overlooked something, but if my stated assumption is correct, this with maybe some tweaking would get it done.

B)


Thanks for the response. Gene
Yes, I know the punctuation is not right...

That seems logical; however, I'm dealing with Sun computers running an old and very outdated version of Unix, which prevents me from listing the files in any format other than the one it used when I created the directory tree files on both boxes.

The master list consists of 2500 elements; this is the only list that I care about. The other list contains roughly 12,500 elements, and all I need to do is check that every single element on the 2500 list is on the list with 12,500 elements, so it's really a one-way check. I need two lists: what is and what isn't on the 12,500 list... Hmm, now that I think of it, it would seem logical to call the larger list the master list and the smaller one the slave list, even though I'm only concerned with the list of 2500. For the files and directories within each element (or group, or folder, or whatever you call it), their specific file names don't matter, since I'm just checking that the size of that array is correct. I might be able to create a new two-dimensional array with the title of every directory in one dimension and the corresponding number of elements within that directory in the other, although aside from organizing things a bit better, I don't really see it speeding up the process.

I was thinking about eliminating the Notepad and window activity, but I don't think that will shave much time off the process in the end, and this way I can see what it's doing. I know there is some great software for Excel called Synkronizer, which checks Excel spreadsheets with a huge number of cells for discrepancies. I could convert each directory into a name with the number of elements appended to make it unique, and then compare all 2500 against the other 12,500 in a matter of seconds with that software. However, I would then have to write a program to go back through both of the original files to track down the element titles within each directory and list them as they were originally listed (before I converted them to a number). Is this my best option?
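The "directory name plus element count" key described above can also be built inside AutoIt, and keeping each block's original text next to its key removes the need to trace the converted names back afterwards. A nested loop that re-splits every block of one list for every block of the other can approach 2,500 × 12,000 = 30 million splits; with a dictionary, each list is walked once and each lookup is a single hash probe. A minimal sketch, assuming the 2,500-run listing is passed as $sSmall and the 12,500-run listing as $sBig, that a block is everything between two "./" markers (as in the original script), and that "same header and same number of non-blank lines" is an acceptable match test. _Blocks and _OneWayCheck are hypothetical names.

```autoit
; Sketch of the "directory name plus element count" idea, done inside AutoIt.
; _Blocks and _OneWayCheck are hypothetical names; counts ignore blank lines.

; Split a listing on "./" into blocks. Returns a 2-D array where [0][0] is the number
; of blocks, [n][0] is the key "<header>|<line count>" and [n][1] the original block text.
Func _Blocks($sText)
    Local $aParts = StringSplit(StringStripCR($sText), "./", 1)
    Local $aOut[$aParts[0] + 1][2]
    Local $aLines, $iCount, $iN = 0
    For $i = 1 To $aParts[0]
        If StringStripWS($aParts[$i], 8) = "" Then ContinueLoop   ; e.g. text before the first "./"
        $aLines = StringSplit($aParts[$i], @LF)
        $iCount = 0
        For $j = 2 To $aLines[0]                                   ; line 1 is the header
            If StringStripWS($aLines[$j], 3) <> "" Then $iCount += 1
        Next
        $iN += 1
        $aOut[$iN][0] = "./" & StringStripWS($aLines[1], 3) & "|" & $iCount
        $aOut[$iN][1] = "./" & StringReplace(StringStripWS($aParts[$i], 2), @LF, @CRLF)
    Next
    $aOut[0][0] = $iN
    Return $aOut
EndFunc

; Sort every block of $sSmall into $sFound or $sMissing, depending on whether a block
; with the same header and the same number of lines exists anywhere in $sBig.
Func _OneWayCheck($sSmall, $sBig, ByRef $sFound, ByRef $sMissing)
    Local $aBig = _Blocks($sBig), $aSmall = _Blocks($sSmall)
    Local $sSep = "----------------------------------------------------------------" & @CRLF
    ; Scripting.Dictionary compares keys case-sensitively by default, like Unix paths
    ; (AutoIt's own "=" operator ignores case for strings).
    Local $oKeys = ObjCreate("Scripting.Dictionary")
    For $i = 1 To $aBig[0][0]
        If Not $oKeys.Exists($aBig[$i][0]) Then $oKeys.Add($aBig[$i][0], 1)
    Next
    $sFound = ""
    $sMissing = ""
    For $i = 1 To $aSmall[0][0]          ; one pass, one hash lookup per block
        If $oKeys.Exists($aSmall[$i][0]) Then
            $sFound &= $aSmall[$i][1] & @CRLF & $sSep
        Else
            $sMissing &= $aSmall[$i][1] & @CRLF & $sSep
        EndIf
    Next
EndFunc

; Demo on two tiny listings.
Local $sBig = "./09031-01:" & @LF & "total 6" & @LF & "a/" & @LF & "b/" & @LF & @LF & _
        "./09038-01:" & @LF & "total 4" & @LF & "c/" & @LF
Local $sSmall = "./09031-01:" & @LF & "total 6" & @LF & "a/" & @LF & "b/" & @LF & @LF & _
        "./2002b:" & @LF & "total 14" & @LF & "d/" & @LF
Local $sFound, $sMissing
_OneWayCheck($sSmall, $sBig, $sFound, $sMissing)
ConsoleWrite("FOUND:" & @CRLF & $sFound & "MISSING:" & @CRLF & $sMissing)
```

To run it on the real files, load each listing with FileRead, call _OneWayCheck, and write $sFound and $sMissing to two output files (for example with FileOpen in mode 2, FileWrite and FileClose). Because the key ignores what each line says, spacing differences between the two listings (such as "test223048" versus "test 223048" in the samples above) do not matter. Exact duplicate blocks collapse into one key, which does not affect a membership test, while two blocks with the same header but different line counts stay distinct.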

I don't know squat about Sun stations or Unix, but if your last suggestion will simplify your process, it may be quicker and it should reduce your opportunities for error. B)

