How to compare 2 large text files?


Nubie
Nubie,

This may be of some use. It returns all lines in file2 that do not exist in file1.

It handles files of 10,000 lines in approx. 8 seconds (running on an ancient, memory-constrained PC).

#include <array.au3>

local $fl_name1 = @scriptdir & '\f11.txt'   ; test file #1
local $fl_name2 = @scriptdir & '\f12.txt'   ; test file #2
local $hfl, $ix = 10000                     ; file handle; number of lines for test files

; generate two test files (line format = random uppercase letter, '=', random digit 0-9, '0', e.g. 'K=40')
; file #1 never contains lines starting with 'Z', so the resulting string will only hold lines starting with 'Z'

for $i = 1 to 2
    consolewrite(eval('fl_name' & $i) & @lf)
    $hfl = fileopen(eval('fl_name' & $i),2)
    if $hfl = -1 then exit msgbox(0,'ERROR','Error opening file = ' & eval('fl_name' & $i))   ; stop instead of writing to a bad handle
    for $k = 1 to $ix                                                                         ; write exactly $ix lines
        if $i = 1 then
            filewrite($hfl,chr(random(65,89,1)) & '=' & chr(random(48,57,1)) & '0' & @CRLF)
        Else
            filewrite($hfl,chr(random(65,90,1)) & '=' & chr(random(48,57,1)) & '0' & @CRLF)
        endif
    Next
    fileclose($hfl)
    $hfl = 0
next

local $st = timerinit()
local $sDiff = _FindUniqueInFile2(fileread($fl_name1),fileread($fl_name2))
consolewrite('time to run func = ' & round(timerdiff($st)/1000,2) & @lf)

local $aDiff = stringsplit($sDiff,'|',2)    ; flag 2 = no count in element 0
_arraydisplay($aDiff)

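func _FindUniqueInFile2($str1,$str2)

    ; split file2 into lines (flag 3 = treat @crlf as one delimiter + no count in element 0)
    local $a10 = stringsplit($str2,@crlf,3), $out_str = ''
    for $i = 0 to ubound($a10) - 1
        if $a10[$i] = '' then continueloop   ; skip the empty element after the final @crlf
        ; substring test: fine for this fixed-width test data, but 'A=1' would also match inside 'BA=10'
        if not stringinstr($str1,$a10[$i]) then $out_str &= $a10[$i] & '|'
    Next
    return stringtrimright($out_str,1)       ; drop the trailing '|'

endfunc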
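
A hedged aside on the stringinstr test above: it matches substrings, which is harmless for the fixed-width test data but could give false hits on real files. One possible way to force whole-line matches is to wrap both strings in @CRLF. The helper name _LineExists and the sample data are made up for illustration.

```autoit
; Sketch only: _LineExists() is a hypothetical helper, not part of the code above.
; Wrapping both the haystack and the needle in @CRLF makes StringInStr match
; complete lines only, so 'A=1' no longer matches inside 'BA=10'.
Func _LineExists($sText, $sLine)
    ; third parameter 1 = case-sensitive comparison
    Return StringInStr(@CRLF & $sText & @CRLF, @CRLF & $sLine & @CRLF, 1) > 0
EndFunc

Local $sFile1 = 'BA=10' & @CRLF & 'C=20' & @CRLF
ConsoleWrite(_LineExists($sFile1, 'A=1') & @LF)    ; whole-line test for 'A=1'
ConsoleWrite(_LineExists($sFile1, 'C=20') & @LF)   ; whole-line test for 'C=20'
```

For large files it would probably be better to build the wrapped haystack once outside the loop, since the concatenation copies the whole string on every call.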

Good Luck,

kylomas

"I like pigs.  Dogs look up to us.  Cats look down on us.  Pigs treat us as equals."

- Sir Winston Churchill


Nubie,

This one uses dynamically created variables (assign/isdeclared) and handles two files of 100,000 lines each in under 5 seconds (again on a rowboat of a PC).

#include <array.au3>

local $fl_name1 = @scriptdir & '\f11.txt'   ; test file #1
local $fl_name2 = @scriptdir & '\f12.txt'   ; test file #2
local $hfl, $ix = 100000                    ; file handle; number of lines for test files

; generate two test files (line format = random uppercase letter, '=', random digit 0-9, '0', e.g. 'K=40')
; file #1 never contains lines starting with 'Z', so the resulting string will only hold lines starting with 'Z'

for $i = 1 to 2
    consolewrite(eval('fl_name' & $i) & @lf)
    $hfl = fileopen(eval('fl_name' & $i),2)
    if $hfl = -1 then exit msgbox(0,'ERROR','Error opening file = ' & eval('fl_name' & $i))   ; stop instead of writing to a bad handle
    for $k = 1 to $ix                                                                         ; write exactly $ix lines
        if $i = 1 then
            filewrite($hfl,chr(random(65,89,1)) & '=' & chr(random(48,57,1)) & '0' & @CRLF)
        Else
            filewrite($hfl,chr(random(65,90,1)) & '=' & chr(random(48,57,1)) & '0' & @CRLF)
        endif
    Next
    fileclose($hfl)
    $hfl = 0
next

local $st = timerinit()
local $sDiff = _FindUniqueInFile2(fileread($fl_name1),fileread($fl_name2))
consolewrite('time to run func = ' & round(timerdiff($st)/1000,2) & @lf)

local $aDiff = stringsplit($sDiff,'|',2)    ; flag 2 = no count in element 0
_arraydisplay($aDiff,ubound($aDiff))        ; the window title shows the number of lines found

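func _FindUniqueInFile2($str1,$str2)

    local $afile1 = stringsplit($str1,@crlf,3)   ; flag 3 = treat @crlf as one delimiter + no count in element 0
    local $afile2 = stringsplit($str2,@crlf,3)
    local $out_str = ''

    ; pass 1: create one dynamic variable per file1 line, so the variable table acts as a lookup set
    for $i = 0 to ubound($afile1) - 1
        assign('s' & $afile1[$i],1)
    Next
    ; pass 2: keep every non-empty file2 line that has no matching variable
    for $i = 0 to ubound($afile2) - 1
        if $afile2[$i] = '' then continueloop
        if not isdeclared('s' & $afile2[$i]) then $out_str &= $afile2[$i] & '|'
    Next
    return stringtrimright($out_str,1)           ; drop the trailing '|'

endfunc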

kylomas


What do you think should be here?

No, you misunderstood what I mean. I want File2 to be the main file; I don't care whether the strings in File1 exist or not. With your example, the File3 result I want would be empty: A=10 from File1 is not needed, because File2 has no A=? entry.

Nubie

Try or

No thanks! I also need to learn something for my own project: a test with my questions (for teachers testing students) in which players can answer anything. I then need to compare their answers with mine to get the correct result. Please understand that players may write anything or nothing at all, so I must be able to tell whether they are right or wrong according to my answers.

Is it impossible?


Thanks, but it's not what I want :(

Nubie,

You've been shown several techniques for this; it's time to adapt one for yourself. We also seem to be shooting at a moving target. The code that I offered was for a very specific input (as defined by you) and a very specific output (again, as defined by you). It is meant as an example of how you might approach your task.

Give one of these techniques a try; if you run into trouble, post again. If you do post again, please define all possible inputs and expected outcomes.

Good Luck,

kylomas


Oh sorry, my mistake...

First:

I want a way to check files on the clients in my project. Clients may have junk files, viruses, etc.; I don't care about those. I will send a list of files from the server to the clients, which read it, check all the files I want on the client, and compare them with my list. If a client is missing a file, or a check result does not match my list, I'll have the client download the files it needs.

I'm stuck at comparing the results: I don't know which files they will need. I have tried a lot of ways here, but none does what I want.

Second:

In my questions/answers project, I'm stuck on cases where they answer something like "1+1=blah blah" or "how many days in a year=don't know".

Sorry, I'm really bad at this :(


Nubie,

You have two lists (files). There are 5 possible outcomes:

elements in list1 that are not in list2

elements in list2 that are not in list1

elements that are common to both lists

elements that are not common to both lists

all elements
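
As a rough sketch of how the first three outcomes might be computed, the lines of each file can be loaded into a Scripting.Dictionary (the standard Windows COM dictionary) and used as a set. This swaps in a different lookup technique from the assign/isdeclared code earlier in the thread; the _ListToSet, _ListDiff and _ListCommon names and the sample data are made up for illustration.

```autoit
; Sketch only: the _List* helper names are hypothetical.

; Build a dictionary whose keys are the non-empty lines of $sText.
Func _ListToSet($sText)
    Local $oSet = ObjCreate('Scripting.Dictionary')   ; default compare mode is binary (case-sensitive)
    Local $aLines = StringSplit(StringStripCR($sText), @LF, 2)   ; flag 2 = no count element
    For $i = 0 To UBound($aLines) - 1
        If $aLines[$i] = '' Then ContinueLoop
        If Not $oSet.Exists($aLines[$i]) Then $oSet.Add($aLines[$i], 1)
    Next
    Return $oSet
EndFunc

; Keys of $oA that are absent from $oB (outcome 1 or 2, depending on argument order).
Func _ListDiff($oA, $oB)
    Local $sOut = ''
    For $sKey In $oA.Keys()
        If Not $oB.Exists($sKey) Then $sOut &= $sKey & '|'
    Next
    Return StringTrimRight($sOut, 1)
EndFunc

; Keys present in both dictionaries (outcome 3).
Func _ListCommon($oA, $oB)
    Local $sOut = ''
    For $sKey In $oA.Keys()
        If $oB.Exists($sKey) Then $sOut &= $sKey & '|'
    Next
    Return StringTrimRight($sOut, 1)
EndFunc

Local $oList1 = _ListToSet('A=10' & @CRLF & 'B=20' & @CRLF)
Local $oList2 = _ListToSet('B=20' & @CRLF & 'C=30' & @CRLF)
ConsoleWrite('only in list1: ' & _ListDiff($oList1, $oList2) & @LF)
ConsoleWrite('only in list2: ' & _ListDiff($oList2, $oList1) & @LF)
ConsoleWrite('in both:       ' & _ListCommon($oList1, $oList2) & @LF)
```

The fourth outcome is the two _ListDiff results joined together, and the fifth is those plus the _ListCommon result.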

I have no idea what you want to do with the questions/answers project. If you need to compare all elements on the right side to all elements on the left side then the answer is the same as above. Just split the expression and treat them as list1 and list2.
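
One possible reading of "split the expression", sketched under assumptions: each line is question=answer, the question is everything before the first '=', and the answer key is looked up per question. The _SplitPair helper and the sample questions are hypothetical, and how strictly an answer should be compared is a decision only the project can make.

```autoit
; Sketch only: _SplitPair() and the sample data are hypothetical.
; Split 'question=answer' at the first '=' only, so an answer may itself contain '='.
Func _SplitPair($sLine, ByRef $sLeft, ByRef $sRight)
    Local $iPos = StringInStr($sLine, '=')
    If $iPos = 0 Then Return False                              ; no '=' in the line
    $sLeft = StringStripWS(StringLeft($sLine, $iPos - 1), 3)    ; 3 = strip leading + trailing whitespace
    $sRight = StringStripWS(StringMid($sLine, $iPos + 1), 3)
    Return True
EndFunc

Local $oKey = ObjCreate('Scripting.Dictionary')   ; question -> correct answer
$oKey.CompareMode = 1                             ; 1 = text compare, so questions match case-insensitively
Local $sQ, $sA

; my answers (list1)
Local $aMine = StringSplit('1+1=2|how many days in a year=365', '|', 2)
For $sLine In $aMine
    If _SplitPair($sLine, $sQ, $sA) And Not $oKey.Exists($sQ) Then $oKey.Add($sQ, $sA)
Next

; the player's answers (list2)
Local $aTheirs = StringSplit('1+1=blah blah|how many days in a year=365', '|', 2)
For $sLine In $aTheirs
    If Not _SplitPair($sLine, $sQ, $sA) Then ContinueLoop       ; ignore lines without '='
    If Not $oKey.Exists($sQ) Then
        ConsoleWrite($sQ & ' -> unknown question' & @LF)
    ElseIf $oKey.Item($sQ) = $sA Then                           ; '=' compares strings case-insensitively
        ConsoleWrite($sQ & ' -> correct' & @LF)
    Else
        ConsoleWrite($sQ & ' -> wrong' & @LF)
    EndIf
Next
```

With this shape a free-text answer such as "blah blah" or "don't know" simply falls into the wrong branch, and a question the player skipped never appears in the loop at all.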

There is enough info here to do any of the above.

kylomas

