ter-pierre Posted February 14, 2005

Hi guys. I posted a topic before, but I think I didn't express myself correctly. I have 2 files: the first file contains ID;USER (4,400 lines), the second file contains USER;GROUPMEMBER (more than 65,000 lines). I need to merge these 2 files into one file with ID;USER;GROUPMEMBER. I read file 1 line by line and look the USER field up in file 2. It works, but the job takes too long (more than 40 hours). I use the code below:

$file2 = FileOpen("C:\tmp1-2.txt", 0)
While 1
    $ID_USER = FileReadLine($file2)
    If @error = -1 Then ExitLoop
    $SPLIT = StringSplit($ID_USER, ",")  ; was StringSplit($CAD_USER, ...), but $CAD_USER is never assigned
    $ID = $SPLIT[2]
    $USER = $SPLIT[1]
    GRUPO($USER)
WEnd
Exit

Func GRUPO($USER)
    ; Re-reads the whole second file for every single user
    $file1 = FileOpen("C:\tmp1-4.txt", 0)
    While 1
        $USER_GRP = FileReadLine($file1)
        If @error = -1 Then ExitLoop
        $SPLIT2 = StringSplit($USER_GRP, ";")
        $USER1 = $SPLIT2[1]
        $GROUP = $SPLIT2[2]
        If $USER = $USER1 Then FileWriteLine("C:\TMP3-1.TXT", $ID & "|" & $USER & "|" & $GROUP)
    WEnd
    FileClose($file1)
EndFunc

Does someone have a better idea? Thanks
Andre Posted February 14, 2005

Hi, I did not try this myself, but try the _FileReadToArray function?

Andre
What about Windows without using AutoIt? It would be the same as driving a car without a steering wheel!
ter-pierre Posted February 14, 2005 (Author)

_FileReadToArray is a UDF? Where can I find it?
Andre Posted February 14, 2005

Hi,

#include <file.au3>
_FileReadToArray($sFilePath, ByRef $aArray)

It is included in the latest AutoIt version.

Andre
ter-pierre Posted February 14, 2005 (Author)

Thanks Andre, but using this function takes just as long (or longer). I tried this code:

#include <file.au3>
Dim $FILE
$file1 = FileOpen("C:\tmp1-2.txt", 0)
_FileReadToArray("C:\tmp1-4.txt", $FILE)
While 1
    $USER_ID = FileReadLine($file1)
    If @error = -1 Then ExitLoop
    $SPLIT2 = StringSplit($USER_ID, ",")
    $USER = $SPLIT2[1]
    For $n = 1 To $FILE[0]
        $FILE[$n] = StringReplace($FILE[$n], $USER, $USER_ID)
        FileWrite("c:\tmp6.txt", $FILE[$n])
        If $n < $FILE[0] Then FileWrite("tempfilename.txt", @CR & @LF)
    Next
    If $USER = "aabreu" Then ExitLoop
WEnd
Exit

Is that the right way to do it? Is there another way to use the function? Thanks
Andre Posted February 14, 2005

Hi, as far as I can see this is done correctly. Perhaps someone else on the forum has experience with such large files.

Andre
lupusbalo Posted February 14, 2005 (edited)

In fact you're "reading" more and more of file 2 for each file 1 record, which is inefficient. I think you're looking for an algorithm — am I right? Are your files sorted? If not, the very first step is TO SORT the 2 files on the same "key". Here the master key will be USER: sort both files on that key. When that is done (or if it already is), what you need is the very classical algorithm to "merge" 2 sequential files:

- you can proceed using FileReadLine or _FileReadToArray
- file1 will be the "master file"
- I assumed there is at least one, but only one, record for each ID/USER in file1 (if not, you need to review the USER_file1 > USER_file2 case)

Pseudocode for the algorithm:

open "result file" or create "new result array"
initialize EOF flags
initialize "other processing" (log file, counters, ...)
access the first record of each file/array; either read it (or set $ifile1 = 1, $ifile2 = 1)
While not (end of both files/arrays)
    Select
        Case USER_file1 = USER_file2  ; "normal case"
            write correct data (aggregated from both files) to the result file (or array)
            "other processing"  ; e.g. increment the counter for USER_file1 records
            access the next record in file2 (or increment $ifile2)
            if no more records in file2, set the "EOF file2" flag
        Case USER_file1 < USER_file2  ; no (or no more) data to process for USER_file1
            some "other processing"  ; e.g. log the number of result records for the USER_file1 key
            access the next record in file1 (or increment $ifile1)
            if no more records in file1, set the "EOF file1" flag
            else initialize "other processing" for the new USER key from file1
        Case USER_file1 > USER_file2  ; should probably not occur
            if it is an error, issue an error message and exit, or continue (ignore; see note 1)
            write a message to the log: USER_file2 key with "no record in file 1"
            access the next record in file2 (or increment $ifile2)
            if no more records in file2, set the "EOF file2" flag
    EndSelect
WEnd
write final info (e.g. counters, ...) to the log, etc.

EDIT> Note 1: depending on the "quality" of your files, I suggest you proceed with the next records; this gives you the opportunity to browse the 2 files entirely and find potential structure errors in them. <EDIT

Hope this helps. Edited February 14, 2005 by lupusbalo
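The merge lupusbalo describes can be sketched in a few lines. This is an illustration in Python rather than AutoIt, purely to show the control flow; it assumes both inputs are already sorted on USER and uses the separators seen in the thread ("," in file 1, ";" in file 2):

```python
def merge_sorted(file1_lines, file2_lines):
    """Classic merge of two USER-sorted inputs.

    file1_lines: one "user,id" line per user (the master file).
    file2_lines: one or more "user;group" lines per user.
    Returns "id;user;group" lines.
    """
    out = []
    i, j = 0, 0
    while i < len(file1_lines) and j < len(file2_lines):
        user1, ident = file1_lines[i].split(",", 1)
        user2, group = file2_lines[j].split(";", 1)
        if user1 == user2:        # normal case: emit and advance file2
            out.append(f"{ident};{user1};{group}")
            j += 1
        elif user1 < user2:       # no more groups for this master record
            i += 1
        else:                     # file2 key has no master record: skip it (note 1)
            j += 1
    return out
```

Each line of each input is touched exactly once, which is why this runs in linear time instead of re-scanning file 2 for every user.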
ter-pierre Posted February 14, 2005 (Author)

Yes, the files are sorted. I attach a file with parts of the 2 files: the first 5 lines are the first five lines of tmp1-2.txt, and the following lines are the corresponding portion of tmp1-4.txt. Thanks, lupusbalo.

tmp.txt
ter-pierre Posted February 14, 2005 (Author)

My problem is that my tmp1-2.txt file has just 1 line for each USER, while my tmp1-4.txt file has more than 1 line for each USER.
lupusbalo Posted February 14, 2005 (edited)

"my problem is that my tmp1-2.txt file have just 1 line with each USER, and tmp1-4.txt file have more than 1 line with each USER."

OK, so the algorithm is fine: file1 (or array) should be your TMP1-2, and file2 (or array) should be your TMP1-4. Let me explain a little more why your algorithm is so loooooooooooooong. In your algorithm, for each record (1, then 2, then ...) in TMP1-2, you read 1, then 2, then 3, ..., then 59998, then 59999, then 60000 records in TMP1-4 just to skip to the right record. That means, at the end of the day, roughly 1800 MILLION reads/array accesses. EDIT> which is just the sum of the well-known series of the first 60000 integers: SIGMA(i, i=1 to 60000) <EDIT. You probably understand why it could be long! The algorithm I gave you reads each record of file2 only once.

Edited February 14, 2005 by lupusbalo
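The difference in work can be illustrated with a quick back-of-the-envelope calculation. The figures below are the thread's approximate file sizes; the triangular count models scanning a sorted file up to the matching record each time, as lupusbalo describes:

```python
n_small, n_big = 4_400, 65_000

# Re-reading all of file 2 for every file 1 record (the original script):
nested_reads = n_small * n_big          # 286 million line reads

# Scanning up to record k for the k-th key, summed over ~60,000 keys
# (the SIGMA(i, i=1 to 60000) figure from the post):
k = 60_000
triangular_reads = k * (k + 1) // 2     # roughly 1.8 billion accesses

# A single merge pass touches each line of each file exactly once:
merge_reads = n_small + n_big           # 69,400 line reads
```

Either way you model the quadratic approach, it is four to five orders of magnitude more work than a single merge pass, which is why the runtime drops from hours to seconds.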
ter-pierre Posted February 14, 2005 (Author)

Lupusbalo, thank you very much, but I don't know about EOF flags, I don't know about arrays, I don't know about strings... Really, I have no experience with this kind of code. If you can help me with this code, when you come I'll pay for the beer... please :">
Jos (Developers) Posted February 14, 2005

"when you come i pay the beer... please :">" — Just 1??

Func _Appendfile($Source, $Target)
    FileWriteLine($Target, FileRead($Source, FileGetSize($Source)))
EndFunc   ;==>_Appendfile
normeus Posted February 14, 2005 (edited)

So here is the deal: I figured that since it took 40 hours to run your program, you wouldn't mind doing a little extra work (learning something new) to run your merge in 10 seconds (for 200,000 records on a 500 MHz computer). Download GAWK (it is a UNIX tool ported to Windows, a.k.a. AWK). Type this program and save it as "match101.awk". MAKE SURE YOU CHANGE THE INPUT LINE FROM "C:/TMP1.TXT" to your file containing "a.ribeiro,1001020558" etc. MAKE SURE YOU CHANGE THE OUTPUT LINE FROM "C:/TMP3.TXT" (forward slash). Run the program like this from a DOS box (if you installed awk in \gnu, your program is saved in c:\ and the input file is c:\tmp2.txt):

c:\gnu\gawk -fc:/match101.awk c:/tmp2.txt

# code starts here
BEGIN {
    print "BEGIN TIME.." strftime("%H:%M:%S")   # TIME FOR YOUR BENCHMARK
    # This loads file 1 into memory; 20,000 names in RAM shouldn't be a big deal
    while ((getline loadrec < "C:/TMP1.TXT") > 0) {
        recnum++
        split(loadrec, temparr, ",")   # I am using "," to split the record
        table[temparr[1]] = temparr[2]
    }
}
{
    numfield = split($0, temparr, ";")   # using ";" to split the record; change as you need it
    if (temparr[1] in table) {
        line = table[temparr[1]] ";" temparr[1] ";" temparr[2]
        print line > "C:/TMP3.TXT"
    } else {
        badname++
        print $0
    }
}
END {
    print "temp1.txt records " recnum
    print "temp2.txt records " NR
    print "names not found " badname+0
    print "END TIME......... " strftime("%H:%M:%S")   # TIME FOR YOUR BENCHMARK
}
# code ends here

If you are willing to do this (should be about an hour), you will be doing your match in seconds. I LOVE AUTOIT3, BUT THERE ARE TOOLS THAT HANDLE FILES BETTER.

Edited February 14, 2005 by normeus
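The gawk script is a hash join: it loads the small file into an in-memory lookup table, then streams the big file once, so neither file needs to be sorted. The same idea can be sketched in Python (file names and the helper name are illustrative; separators follow the thread's samples, "," in the small file and ";" in the big one):

```python
def merge_files(small_path, big_path, out_path):
    """Hash join: small file of "user,id" lines against a big file of
    "user;group" lines, writing "id;user;group" to out_path.
    Returns the number of big-file users with no match."""
    table = {}
    with open(small_path) as f:
        for line in f:                       # e.g. "a.ribeiro,1001020558"
            user, ident = line.strip().split(",", 1)
            table[user] = ident

    not_found = 0
    with open(big_path) as f, open(out_path, "w") as out:
        for line in f:                       # e.g. "a.ribeiro;some_group"
            user, group = line.strip().split(";", 1)
            if user in table:
                out.write(f"{table[user]};{user};{group}\n")
            else:
                not_found += 1
    return not_found
```

Unlike the sorted merge, this costs memory proportional to the small file, but one pass over each file is all it takes.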
ter-pierre Posted February 14, 2005 (Author)

Thank you all — Andre, lupusbalo, JdeB, normeus. I solved my problem with your help. Below is the code I used (surely not the best, but...):

#include <file.au3>
Dim $ARRAY12, $ARRAY14
$n = 1
$nn = 1
_FileReadToArray("C:\tmp1-2.txt", $ARRAY12)
_FileReadToArray("C:\tmp1-4.txt", $ARRAY14)
$file3 = FileOpen("C:\tmp6.txt", 2)
For $n = 1 To $ARRAY12[0]
    $SPLIT_FILE12 = StringSplit($ARRAY12[$n], ";")
    $USER_FILE12 = $SPLIT_FILE12[1]
    $SPLIT_FILE14 = StringSplit($ARRAY14[$nn], ";")
    $USER_FILE14 = $SPLIT_FILE14[1]
    Select
        Case $USER_FILE12 = $USER_FILE14
            ; normal case: write every group line for this user
            While $nn <= $ARRAY14[0]
                FileWriteLine($file3, $SPLIT_FILE12[2] & ";" & $SPLIT_FILE12[1] & ";" & $SPLIT_FILE14[2] & ";")
                $nn = $nn + 1
                $SPLIT_FILE14 = StringSplit($ARRAY14[$nn], ";")
                $USER_FILE14 = $SPLIT_FILE14[1]
                If $USER_FILE12 <> $USER_FILE14 Then ExitLoop
            WEnd
        Case $USER_FILE12 < $USER_FILE14
            While 1
                $nn = $nn + 1
                If $USER_FILE12 <> $USER_FILE14 Then ExitLoop
            WEnd
        Case $USER_FILE12 > $USER_FILE14
            MsgBox(0, "test", "ERRO!!!")
    EndSelect
Next

When some of you guys come over, I'll pay for all the beers!!!
lupusbalo Posted February 14, 2005 (edited)

merge_stuff.zip

@ter-pierre
EDIT> I didn't see your last post before sending mine!
1 - congratulations
2 - the following may help you anyway
3 - I forgot the beer!! (it should be many!!!!)
4 - disregard the immediately following line; as you seem to actually understand something about programming, I'm sorry <EDIT

It would be rather difficult if you didn't know anything about programming :"> — so, while it is unusual, I'll provide the solution: the final merge script, a script to generate test files, the final result and the associated logfile. The files had 5000 and 75000 records respectively (approximately YOUR figures). The final merge runs in a bit more than a minute on a P4 2.6 GHz — far from your 40 hours! File generation is actually 4 times longer, because of a lot of "random" calls (it could have been done better, but that's it for now!).

EDIT> I used file processing (vs arrays) because it also works for HUGE files (10^6 or more records), where array processing will probably run out of memory; the algorithms are identical, only how you get the data changes. <EDIT

Attached:
- test file generation script
- extract of test file file1 (out of 5000 records)
- extract of test file file2 (out of 75000 records)
- merge script
- extract of the resulting file
- extract of the logfile

Edited February 14, 2005 by lupusbalo
lupusbalo Posted February 15, 2005 (edited)

Not a "common day-to-day" need, but just for fun: a test merge on big files.

start logfile>
2005-02-15 00:05:16 : for key:AAADIUSKAU records processed: 12
2005-02-15 00:05:16 : for key:AAAFUMZOGSQJFZ records processed: 26
.................... rest of logfile
2005-02-15 00:39:04 : for key:ZZZNYLUSFDUNZ records processed: 29
2005-02-15 00:39:04 : for key:ZZZOBGOEUOIWVAK records processed: 4
2005-02-15 00:39:04 : for key:ZZZORBMJCONR records processed: 22
2005-02-15 00:39:04 : for key:ZZZSBUSKP records processed: 3
2005-02-15 00:39:05 : C:\Mes documents\Auto_IT scripts\TMP1-2.txt: 149 999 records processed (*)
2005-02-15 00:39:15 : C:\Mes documents\Auto_IT scripts\TMP1-4.txt: 2 249 048 records processed (*)
2005-02-15 00:39:15 : merge duration: 2028 SEC. (*)
<End Logfile

(*) "real file"; only some editing (red, bold, 000 separators) to improve clarity.

2.4 million records in ~34 minutes — who says AutoIt is slow??

Edited February 15, 2005 by lupusbalo
SlimShady Posted February 15, 2005

Every (big) script can be sped up by coding it efficiently.