Tankbuster, posted July 27, 2009 (edited)

I tried the forum search but was not able to find a matching hint for my problem.

Let me first describe my problem. I have a very large ASCII text file (let's say 900 MB) with a line pattern like:

    X1:11111111:0100
    X1:22222222:0200
    X1:33333333:0300
    X2:11111111:0200

This is a simplified version of my file, but I guess you can see the format. The first field is a sub key to the second field (example: binder number, 1-9). The major key is the second field, a reference number (example: personnel number).

My problem: I would like to read from that file all lines that match on field two. An example call:

    $arrayMatchingLines = getFileContent("11111111")

and it should return:

    [0]: X1:11111111:0100
    [1]: X2:11111111:0200

As the file could be very large, I doubt that reading line by line with FileReadLine is a good idea, and reading the whole file with FileRead does not look promising either. Of course I tried it and it worked, but because I read the complete file, memory performance was not that good.

So is there a wonderful UDF out there that can read a file in a fast and memory-friendly way? A RegExpReadFileContent UDF?

For a different way of understanding: in UNIX I would do something like this (do not correct my syntax, I use this just for describing):

    grep '11111111' file_a > tmpfile
    foreach line ( tmpfile )
        do_some_thing_wonderfull_with $line
    end

Maybe this makes it clear. I tried FileReadLine and FileRead, but maybe I just missed a simple step to get a small and fast RegExpReadFile.

Thank you very much. (I hope I described my problem clearly enough, and if I missed something in the forum search, just slap me for that.)
Rarst, posted July 27, 2009

I assume from the format example that all lines have a fixed length? You can try getting a handle with FileOpen and then reading record by record with FileRead, passing the handle and a character count equal to one line.
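A minimal sketch of the fixed-length idea, not tested against the real file. It assumes every record is exactly 16 characters plus a 2-character @CRLF (as in the sample lines) and that the file is plain ASCII, so one character equals one byte. The function name _ReadFixedRecords and the constant $iFixedRecordLen are made up for this illustration.

```autoit
; Assumption: 16 characters of data ("X1:11111111:0100") + @CRLF per record.
Global Const $iFixedRecordLen = 16 + 2

; Hypothetical helper: returns an array whose element 0 is the match count.
Func _ReadFixedRecords($sFile, $sKey)
    Local $aResult[1] = [0]
    Local $hFile = FileOpen($sFile, 0) ; 0 = read mode
    If $hFile = -1 Then Return SetError(1, 0, $aResult)
    While 1
        ; Read exactly one record per call, using the open handle.
        Local $sLine = FileRead($hFile, $iFixedRecordLen)
        If StringLen($sLine) = 0 Then ExitLoop ; nothing left to read
        ; Field two starts at character 4 and is 8 characters long.
        If StringMid($sLine, 4, 8) == $sKey Then
            $aResult[0] += 1
            ReDim $aResult[$aResult[0] + 1]
            $aResult[$aResult[0]] = StringStripWS($sLine, 2) ; 2 = strip trailing whitespace
        EndIf
    WEnd
    FileClose($hFile)
    Return $aResult
EndFunc
```

This still issues one FileRead call per record, so on a 900 MB file it is likely to be slow; it mainly shows how the handle and the character count fit together.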
Authenticity, posted July 27, 2009

    ProcessSetPriority(@AutoItPID, 3) ; 3 = above normal

    Global Const $sFile = @ScriptDir & '\test.txt'
    ; 32MB
    Global Const $iBuffSize = 0x100000*32

    Global $hFile, $iSize, $iRead, $sText, $sTemp
    Global $avArray[1] = [0], $aMatch

    $iSize = FileGetSize($sFile)
    $iRead = 0
    $sTemp = ''
    $sText = ''

    $hFile = FileOpen($sFile, 0)

    If $hFile <> -1 Then
        While $iRead < $iSize
            ; Prepend the incomplete line carried over from the previous chunk.
            $sText = $sTemp & FileRead($hFile, $iBuffSize)

            If StringRight($sText, 2) <> @CRLF Then
                ; The chunk ends mid-line: keep the tail for the next pass.
                Local $iLen = StringLen($sText)
                Local $iPos = StringInStr($sText, @CRLF, 0, -1)
                $sTemp = StringRight($sText, $iLen - $iPos - 1)
                ; Trim the tail from $sText itself (not from $sTemp).
                $sText = StringTrimRight($sText, $iLen - $iPos - 1)
            Else
                $sTemp = ''
            EndIf

            $aMatch = StringRegExp($sText, '(?m)^[^:]++:(1+):', 3)

            If IsArray($aMatch) Then
                Local $iUpperBound = UBound($aMatch)
                ReDim $avArray[$avArray[0]+$iUpperBound+1]

                ; $aMatch is 0-based, so start at index 0.
                For $i = 0 To $iUpperBound-1
                    $avArray[0] += 1
                    $avArray[$avArray[0]] = $aMatch[$i]
                Next
            EndIf

            $iRead += $iBuffSize
        WEnd

        ; Note: a final line without a trailing @CRLF stays in $sTemp and is not searched.
        FileClose($hFile)
        ReDim $avArray[$avArray[0]+1]
    EndIf
GEOSoft, posted July 27, 2009

I'm not sure if this will be faster on a large file or not, but you can give it a try. C:\Test.txt was just your example.

    ;
    #include<array.au3> ;; For _ArrayDisplay() only
    $sFileIn = "C:\Test.txt"
    $sFileOut = @ScriptDir & "\results.txt"
    $sFind = "11111111"
    ; Let findstr do the search and redirect its output to a temporary file.
    $hRun = RunWait(@Comspec & " /c Findstr.exe " & '"' & $sFind & '" ' & '"' & _
            $sFileIn & '"' & ' > "' & $sFileOut & '"', @ScriptDir, @SW_Hide)
    $sHold = FileRead($sFileOut)
    FileDelete($sFileOut)
    ; Split the result into one array element per matching line.
    $aRegExp = StringRegExp($sHold, "(?m:^)(.*\d)", 3)
    If NOT @Error Then
        _ArrayDisplay($aRegExp, "Results")
    EndIf
    ;

I couldn't get StdOutRead() to display the proper results with a Run() command, so I did it this way (creating the file) instead.

George
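For reference, a hedged sketch of the variant without a temporary file, where the output of findstr is collected through StdoutRead(). It is untested here; the function name _FindstrToString is made up for the illustration, and the literal 2 stands for $STDOUT_CHILD so that no include file is needed. The /C: switch asks findstr for a literal string search.

```autoit
; Hypothetical helper: returns all matching lines as one string.
Func _FindstrToString($sFind, $sFileIn)
    Local $sCmd = @ComSpec & ' /c findstr.exe /C:"' & $sFind & '" "' & $sFileIn & '"'
    ; 2 = $STDOUT_CHILD: give the script a handle to the child's STDOUT.
    Local $iPID = Run($sCmd, @ScriptDir, @SW_HIDE, 2)
    If $iPID = 0 Then Return SetError(1, 0, "")
    Local $sOut = ""
    While 1
        ; Keep appending until the stream is closed.
        $sOut &= StdoutRead($iPID)
        If @error Then ExitLoop ; child has exited and the pipe is empty
        Sleep(10)
    WEnd
    Return $sOut
EndFunc
```

A common pitfall with this pattern is reading only once right after Run(); the loop is there because the child process delivers its output over time.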
Tankbuster (author), posted July 27, 2009 (edited)

Rarst wrote: "I assume from format example all lines have fixed length? You can try to get handle with FileOpen, then read line by line with FileRead, using handle and count of characters equal to one line."

Thanks for the suggestion. But reading line by line would be a horror and a waste of time (at least in my case). Yes, you are right that a file handle is better than using the filename (I guess passing the filename opens and closes the file on every call, which is even worse), but as far as I understand it, your approach fits smaller files better. Thanks for the helping hand.

Authenticity wrote: "[autoit]ProcessSetPriority(@AutoItPID, 3) ..."

This looks like a good approach and I will try it. Just from looking at your code, it seems you are reading a chunk of 32 MB at a time and searching it for the pattern. That may be less fast for smaller files, but for big files (which is what I need) it should fit perfectly. It looks like a good balance between speed and resource usage. Thank you very much. Maybe I will size the chunk based on the local system memory.

GEOSoft wrote: "I'm not sure if this will be faster on a large file or not but you can give it a try. C:\Test.txt was just your example. ..."

I will give this a try too. During my planning I actually skipped the Run() command because I thought it would be slow. But hey, you wrote it, so I will test it.

Maybe I will post the speed results here when I'm done :-)

Thanks to all!
Tankbuster (author), posted July 29, 2009

So here is my interim result (based on my project), which gives you some sort of comparison. I used the two functions with the same file and the same calling code, so my debug messages are identical for both. I used a very small file for testing and called each function several times in a loop (because that is the real usage); only the name of the called function was changed.

Search for 13 different values:

    Authenticity's solution: 0.322 s
    GEOSoft's solution:      1.193 s

So if I calculate correctly, Authenticity's solution beats the Run() command by a factor of roughly 3.7 (1.193 s / 0.322 s).

After this very quick test I tried it with the big file. Authenticity's solution finished after 10 minutes, and the Run() command was aborted after 3 hours. So my guess was correct: on big files, Authenticity's solution benefits more from the chunked reading.

When you search for only one value in a big file:

    Authenticity's solution: 0.240 s
    GEOSoft's solution:      0.297 s

But thanks for both approaches; having both made it easier for me to compare.
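A comparison like the one described (same keys, same calling code, only the function name swapped) could be timed with a small harness such as the following sketch. The names _SearchChunked, _SearchFindstr and _TimeSearch are hypothetical placeholders, not the code that produced the numbers in this thread.

```autoit
; Hypothetical placeholders: replace the bodies with the two real searches.
Func _SearchChunked($sKey)
    Return $sKey ; placeholder for the chunked FileRead search
EndFunc

Func _SearchFindstr($sKey)
    Return $sKey ; placeholder for the findstr-based search
EndFunc

; Runs the named function once per key and returns the elapsed seconds.
Func _TimeSearch($sFuncName, $aKeys)
    Local $hTimer = TimerInit()
    For $i = 0 To UBound($aKeys) - 1
        Call($sFuncName, $aKeys[$i])
    Next
    Return TimerDiff($hTimer) / 1000 ; TimerDiff reports milliseconds
EndFunc

Local $aKeys[3] = ["11111111", "22222222", "33333333"]
ConsoleWrite("Chunked: " & Round(_TimeSearch("_SearchChunked", $aKeys), 3) & " s" & @CRLF)
ConsoleWrite("Findstr: " & Round(_TimeSearch("_SearchFindstr", $aKeys), 3) & " s" & @CRLF)
```

Passing the function name as a string keeps the loop and the timing code identical for both candidates, which matches the test setup described in the post.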
GEOSoft, posted July 29, 2009 (in reply to Tankbuster's results)

Good to know. Thanks

George
Tankbuster (author), posted July 29, 2009 (edited)

But as I did further tests, I found this is not a general statement about speed. It really depends on what you are trying to do.

There is also room for improvement in the accepted solution. Because my lines have a fixed length, I could choose a read size that is a multiple of the line length, so that I no longer need to search from the right for the last @CRLF. Of course that only fits my file; with lines of varying length the backward search is still needed. Using ConsoleWrite output I found that the backward string operation took most of the time.

Anyway, I now have some ideas about when to use one function or the other.

I found another thread with some useful material that may also be relevant here: http://www.autoitscript.com/forum/index.php?showtopic=97494
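The fixed-length improvement mentioned above might look roughly like this sketch: the chunk size is rounded down to a whole number of records, so every chunk ends on a line boundary and no carry-over or backward search is needed. This is an untested illustration; the function name _SearchAligned, the default record length of 18 (16 characters plus @CRLF) and the 32 MB target are assumptions, and it presumes a plain ASCII file where one character equals one byte.

```autoit
; Hypothetical helper: returns an array whose element 0 is the match count.
Func _SearchAligned($sFile, $sKey, $iRecordLen = 18, $iTargetSize = 0x2000000)
    Local $aResult[1] = [0]
    ; Round the chunk size down to a whole number of records.
    Local $iBuffSize = Int($iTargetSize / $iRecordLen) * $iRecordLen
    Local $hFile = FileOpen($sFile, 0) ; 0 = read mode
    If $hFile = -1 Then Return SetError(1, 0, $aResult)
    ; \Q...\E quotes the key so that it is matched literally in field two.
    Local $sPattern = '(?m)^[^:\r\n]+:\Q' & $sKey & '\E:[^\r\n]*'
    While 1
        Local $sText = FileRead($hFile, $iBuffSize)
        If StringLen($sText) = 0 Then ExitLoop ; nothing left to read
        ; Flag 3 returns every full match in the chunk as a 0-based array.
        Local $aMatch = StringRegExp($sText, $sPattern, 3)
        If IsArray($aMatch) Then
            Local $iOld = $aResult[0]
            $aResult[0] += UBound($aMatch)
            ReDim $aResult[$aResult[0] + 1]
            For $i = 0 To UBound($aMatch) - 1
                $aResult[$iOld + 1 + $i] = $aMatch[$i]
            Next
        EndIf
    WEnd
    FileClose($hFile)
    Return $aResult
EndFunc
```

A call such as _SearchAligned(@ScriptDir & '\test.txt', "11111111") would then return whole matching lines rather than only the key, which is closer to the getFileContent() call asked for in the first post.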