Champak

Problem with comparing files

8 posts in this topic

#1 ·  Posted (edited)

Help please: this pretty much works, with one exception. If there is a long number in "FILE A", say "123456789", then numbers from "FILE B" such as "123", "456", "234", "789", "2345" and so on are not appended to "FILE A", because StringInStr finds them inside "123456789". Every other number that doesn't appear as a run of digits inside a longer number is appended fine.

Func Compare_F();///////////////////////////////////////////////////////////////////////////////////////
Local $differences

    $BackupFText = FileRead($BackupF , FileGetSize($BackupF ))
    $Comparing = FileOpen($CompareF, 0)
    $DifferenceF = FileOpen($BackupF, 1)

    While 1
        $sWord = FileReadLine($Comparing)
        If @error Then ExitLoop
        If Not StringInStr($BackupFText, $sWord) Then
            FileWriteLine($DifferenceF, $sWord)
            $differences = $differences + 1
        Else            
            ;
        EndIf
    WEnd

    FileClose($Comparing)
    FileClose($DifferenceF)
    
    If $differences > 0 Then
        MsgBox(48, "Accounts", $differences & " new accounts were appended.")
    Else
        MsgBox(48, "Accounts", "There are no new accounts to append.")  
    EndIf
    
EndFunc

Thanks

Edited by Champak




    While 1
        $sWord = FileReadLine($Comparing)
        If @error Then ExitLoop
        ; Wrap the search term in line breaks so only a whole line can match.
        ; The text is wrapped too, so its first and last lines can also match.
        If Not StringInStr(@CRLF & $BackupFText & @CRLF, @CRLF & $sWord & @CRLF) Then
            FileWriteLine($DifferenceF, $sWord)
            $differences = $differences + 1
        EndIf
    WEnd
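The same whole-line idea can be pulled out into two small helpers. This is a minimal sketch, not the code from this reply. `_PrepareLines` and `_HasLine` are hypothetical names. The sketch assumes the file may use CRLF, LF, or CR line endings, so it normalises everything to @LF once before any searching.

```autoit
; Hypothetical helpers: whole-line search instead of substring search.

; Normalise line endings to @LF once, and add a leading and trailing @LF so
; that every line, including the first and the last, is enclosed by delimiters.
Func _PrepareLines($sText)
    $sText = StringReplace($sText, @CRLF, @LF)
    $sText = StringReplace($sText, @CR, @LF)
    Return @LF & $sText & @LF
EndFunc

; True only if $sLine occurs as a complete line of the prepared text.
; ByRef avoids copying a multi-megabyte string on every call.
Func _HasLine(ByRef $sPrepared, $sLine)
    Return StringInStr($sPrepared, @LF & $sLine & @LF) > 0
EndFunc

Local $sBackup = _PrepareLines("123456789" & @CRLF & "46626" & @CRLF)
ConsoleWrite(_HasLine($sBackup, "123") & @CRLF)     ; "123" is only part of a line
ConsoleWrite(_HasLine($sBackup, "46626") & @CRLF)   ; "46626" is a whole line
```

In the comparison loop, call `_PrepareLines` once on the FileRead result and `_HasLine` once per line read from the other file. Each call still scans the whole text, so this fixes the false matches but does not make the loop faster.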

:)


Valuater's AutoIt 1-2-3, Class... Is now in Session!For those who want somebody to write the script for them: RentACoder"Any technology distinguishable from magic is insufficiently advanced." -- Geek's corollary to Clarke's law


Thank you.


Can someone show me how to make this faster? I tested it last night, comparing a 3MB file against a 1MB file, and it is still running eight hours later.



One thing would be to sort/index the data into arrays, so everything is done at memory speed instead of through disk operations.
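Here is a minimal sketch of the in-memory part. It reads each file once with FileRead and splits it into an array, instead of calling FileReadLine in a loop. `_ReadLines` and the file name in the usage are hypothetical. The resulting array is 1-based, with the line count in element [0], which is what StringSplit returns by default.

```autoit
; Hypothetical helper: load a whole text file into a 1-based array of lines.
; Returns the StringSplit array ([0] = line count), or sets @error = 1 and
; returns 0 if the file cannot be read.
Func _ReadLines($sFile)
    Local $sText = FileRead($sFile)
    If @error Then Return SetError(1, 0, 0)
    $sText = StringStripCR($sText)      ; CRLF -> LF, so @LF is the only separator
    $sText = StringStripWS($sText, 3)   ; drop leading/trailing blank lines (3 = both ends)
    Return StringSplit($sText, @LF)
EndFunc

; Usage (file name is hypothetical):
Local $aLines = _ReadLines(@ScriptDir & "\compare.txt")
If Not @error Then
    For $i = 1 To $aLines[0]
        If $aLines[$i] = "" Then ContinueLoop   ; skip blank lines inside the file
        ; ... compare $aLines[$i] here ...
    Next
EndIf
```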

The StringInStr() test is slow because it has to test against every position in the string, even though you are only interested in whole "words" or lines.
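A hash lookup turns each check into a whole-line test and avoids rescanning the text. This swaps in a technique not mentioned in the thread. The sketch below assumes Windows' Scripting.Dictionary COM object, reached through ObjCreate, and `_BuildLineSet` is a hypothetical helper. Building the set costs one pass over the big file. After that, each Exists() call is roughly constant time instead of a search through megabytes of text.

```autoit
; Hypothetical helper: put every non-blank line of a 1-based array
; ([0] = count, as StringSplit returns) into a Scripting.Dictionary.
; Only whole lines become keys, so "123" never matches inside "123456789".
Func _BuildLineSet(ByRef $aLines)
    Local $oSet = ObjCreate("Scripting.Dictionary")
    If Not IsObj($oSet) Then Return SetError(1, 0, 0)
    Local $sKey
    For $i = 1 To $aLines[0]
        $sKey = StringStripWS($aLines[$i], 3)   ; trim stray spaces around the number
        If $sKey <> "" And Not $oSet.Exists($sKey) Then $oSet.Add($sKey, 0)
    Next
    Return $oSet
EndFunc

Local $aBackup = StringSplit("123456789" & @LF & "46626", @LF)
Local $oBackup = _BuildLineSet($aBackup)
ConsoleWrite($oBackup.Exists("123") & @CRLF)     ; not a key: only part of a line
ConsoleWrite($oBackup.Exists("46626") & @CRLF)   ; a key: a complete line
```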

Another real loss is in the number of compares. You need to do fewer compares by sorting/indexing your list of numbers, or by using other features of the data set to limit how much you have to compare against. What are the features of the data set? How many lines? All integer numbers? How short is the shortest? How long is the longest? Are they decimal equivalents of ASCII strings? etc.

For example, if you were testing "12345", and your list was sorted, once you tested against a value higher than that you wouldn't have to bother testing against the rest. If the list was indexed it might be done even quicker. It all depends on the data type.
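The early-exit test in that example might look like the minimal sketch below. It assumes the list is already a 1-based array ([0] = count) sorted in ascending numeric order, and `_InSortedList` is a hypothetical name. Values are compared with Number() because lines read from a file are strings, and as strings "46626" would sort after "101111195".

```autoit
; Hypothetical helper: linear search that stops as soon as it passes the
; place where $sValue would have to be in an ascending numeric list.
Func _InSortedList(ByRef $aSorted, $sValue)
    Local $nValue = Number($sValue), $nCur
    For $i = 1 To $aSorted[0]
        $nCur = Number($aSorted[$i])
        If $nCur = $nValue Then Return True
        If $nCur > $nValue Then Return False   ; every later entry is larger too
    Next
    Return False
EndFunc

Local $aList[5] = [4, 46626, 6688618, 13645093, 101111195]
ConsoleWrite(_InSortedList($aList, "6688618") & @CRLF)   ; found
ConsoleWrite(_InSortedList($aList, "12345") & @CRLF)     ; stops at the first entry
```

On average a miss stops about halfway through the list. An index or a binary search would cut the number of compares much further.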

With better descriptions of the data you'll get more helpful answers on this.

:)




#7 ·  Posted (edited)

Sorry, I didn't realise that so many things impacted how fast it sorted.

Well, "FILE A", the one I'm going to compare AGAINST "FILE B" to see if "FILE B" already has the numbers, will be anywhere from 500KB to 1MB. All entries are numbers only, with a minimum of 5 digits and a maximum of 10 digits, up to 40,000 lines.

"FILE B", the one being compared against and then appended to, has the same digit type and length, except this one will keep growing, for now at least. Currently it is 5MB and over 400,000 lines.

The files on both sides will basically look like this:

79130858
46075058
64525311
46626
21204985
101111195
61128134
13645093
24617639
34854757
6688618
92174011
45234026
82940182
21892994

Sorry, but I just don't get arrays. I hope this helps you help me. Thanks.

Edited by Champak


#8 ·  Posted (edited)


First things first. This is a big enough job to justify a real database engine instead of just some AutoIt scripting. It doesn't have to be hugely complicated or even cost anything. Some options:

OpenOffice Base

SQLite (Edit: found an interesting thread about using SQLite in AutoIt.)

SmallSQL

MySQL
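Here is a rough idea of the SQLite route, as a minimal sketch using the SQLite UDF that ships with AutoIt (SQLite.au3). It assumes sqlite3.dll is installed where _SQLite_Startup() can find it. `_AddNewNumbers`, the table name `nums`, and the file names in the usage comment are hypothetical. A PRIMARY KEY column lets the database reject duplicates and maintain its own index, so the script never searches the text itself. Because the column is an INTEGER, any leading zeroes would be dropped.

```autoit
#include <SQLite.au3>

; Hypothetical helper: insert every numeric line of a 1-based array
; ([0] = count) into table "nums"; returns how many were actually new,
; or -1 with @error set if SQLite could not be started or opened.
Func _AddNewNumbers($sDbFile, ByRef $aNumbers)
    _SQLite_Startup()
    If @error Then Return SetError(1, 0, -1)
    Local $hDB = _SQLite_Open($sDbFile)
    If @error Then
        _SQLite_Shutdown()
        Return SetError(2, 0, -1)
    EndIf
    ; INTEGER PRIMARY KEY: each value can be stored only once, and it is indexed
    _SQLite_Exec($hDB, "CREATE TABLE IF NOT EXISTS nums (n INTEGER PRIMARY KEY);")
    Local $iAdded = 0, $sNum
    _SQLite_Exec($hDB, "BEGIN;")   ; one transaction for the whole batch is much faster
    For $i = 1 To $aNumbers[0]
        $sNum = StringStripWS($aNumbers[$i], 3)
        If Not StringIsDigit($sNum) Then ContinueLoop   ; skip blank or malformed lines
        _SQLite_Exec($hDB, "INSERT OR IGNORE INTO nums (n) VALUES (" & $sNum & ");")
        $iAdded += _SQLite_Changes($hDB)   ; 1 if the row was new, 0 if it was ignored
    Next
    _SQLite_Exec($hDB, "COMMIT;")
    _SQLite_Close($hDB)
    _SQLite_Shutdown()
    Return $iAdded
EndFunc

; Usage (file names are hypothetical):
; Local $aNew = StringSplit(StringStripCR(FileRead(@ScriptDir & "\compare.txt")), @LF)
; Local $iNew = _AddNewNumbers(@ScriptDir & "\numbers.db", $aNew)
```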

That said, things can be greatly improved by simply sorting your list file. After the initial processing required to sort it the first time, and provided you insert new numbers in their proper sorted locations later so the list stays sorted, that should cut your search time by just under 50%.
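Keeping the list sorted while adding new numbers could look like the minimal sketch below. It swaps in a binary search (halving the range on each step) to find the right spot, instead of a front-to-back scan. Each check therefore costs about log2(n) comparisons, roughly 19 for 400,000 lines. `_InsertSorted` is a hypothetical name, and the sketch assumes a 1-based array ([0] = count) already sorted in ascending numeric order.

```autoit
; Hypothetical helper: insert $sValue into an ascending numeric 1-based array
; ([0] = count) unless it is already there. Returns True if it was inserted.
Func _InsertSorted(ByRef $aSorted, $sValue)
    Local $nValue = Number($sValue)
    Local $iLo = 1, $iHi = $aSorted[0], $iMid, $nMid
    While $iLo <= $iHi
        $iMid = Int(($iLo + $iHi) / 2)
        $nMid = Number($aSorted[$iMid])
        If $nMid = $nValue Then Return False   ; already present: nothing to insert
        If $nMid < $nValue Then
            $iLo = $iMid + 1
        Else
            $iHi = $iMid - 1
        EndIf
    WEnd
    ; $iLo is now the first slot holding a larger value (or one past the end)
    ReDim $aSorted[$aSorted[0] + 2]
    For $i = $aSorted[0] + 1 To $iLo + 1 Step -1
        $aSorted[$i] = $aSorted[$i - 1]   ; shift larger values up one slot
    Next
    $aSorted[$iLo] = $nValue
    $aSorted[0] += 1
    Return True
EndFunc

Local $aList[4] = [3, 46626, 13645093, 101111195]
_InsertSorted($aList, "6688618")   ; lands between 46626 and 13645093
```

The ReDim and the shift still take time proportional to the list size. For a large batch of new numbers, it is cheaper to append them all and sort once.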

The other thing is an index file for your list, but since you are already afraid of arrays, that is probably asking too much. Maybe use a self-indexing INI file instead: use the first two digits as section names and (assuming no leading zeroes) you have 90 sections to hold your numbers in, so each lookup only has to search roughly 1/90th of the data. But this too will become unwieldy once enough lines are entered, bringing you back to databases that do it for you with behind-the-curtain magic.
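The self-indexing INI idea might be sketched as shown below. `_IniHasNumber`, `_IniAddNumber` and the .ini path are hypothetical. The section is the first two digits and the key is the whole number, so IniRead only has to look inside one section. IniRead and IniWrite go through the Windows profile-string functions, which are not designed for files this large. Treat this as an illustration of the bucketing idea rather than a fast solution for 400,000 lines.

```autoit
; Hypothetical helpers for an INI file bucketed by the first two digits:
;   [79]
;   79130858=1

Func _IniHasNumber($sIni, $sNum)
    ; IniRead returns the default ("") when the key is missing
    Return IniRead($sIni, StringLeft($sNum, 2), $sNum, "") <> ""
EndFunc

Func _IniAddNumber($sIni, $sNum)
    If _IniHasNumber($sIni, $sNum) Then Return False   ; already stored
    IniWrite($sIni, StringLeft($sNum, 2), $sNum, "1")
    Return True
EndFunc

Local $sIni = @ScriptDir & "\numbers.ini"   ; hypothetical file, full path
If _IniAddNumber($sIni, "79130858") Then ConsoleWrite("79130858 added" & @CRLF)
```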

Sorting and indexing are both things a real database engine would handle well for you. Your task is not too big to do in AutoIt, but there are easier, better ways using an actual database program, and zero chance of doing it practically without using arrays extensively.

Hope that helps... :)

Edited by PsaltyDS

