SpookMeister Posted January 21, 2009 (edited)
I'm occasionally dealing with some seriously large (900+ MB) text files. So far I have been using FileOpen and FileReadLine to first get a count of the lines, just so I can show some kind of progress on where in the file the "work portion" of my script is. Just looping through the file and incrementing a counter this way can take several minutes, and this is before I even start my search/data gathering/whatever. Has someone worked out a faster method to get a line count?

Paulie Posted January 21, 2009
_FileCountLines()

KaFu Posted January 21, 2009
I guess that won't work, or make it any faster, for 900+ MB files, as the UDF contains this line:

    $sFileContent = StringStripWS(FileRead($hFile), 2)

Hmmm, and I fear there is no really superior method to count line breaks, as they are just character occurrences of @LF, i.e. Chr(10)...

Cheers
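Since the objection to the UDF is that it reads the whole file in one FileRead call, one way around it is to count line feeds a chunk at a time. The following is only a minimal sketch under stated assumptions: the function name `_CountLinesChunked`, the 4 MB chunk size and the example path are all made up for illustration, and the text is assumed to be ANSI or UTF-8 (in UTF-16 a line feed is two bytes, so this byte-wise count would not be reliable there).

```autoit
; Hypothetical helper, not part of any UDF: counts lines by scanning the file in fixed-size chunks.
Func _CountLinesChunked($sPath, $iChunkBytes = 4194304) ; 4 MB per read is an arbitrary assumption
    Local $hFile = FileOpen($sPath, 16) ; 16 = binary mode, the bytes are read as-is
    If $hFile = -1 Then Return SetError(1, 0, -1)

    Local $iLines = 0, $bData, $sChunk, $sLastChar = ""
    While 1
        $bData = FileRead($hFile, $iChunkBytes)
        If BinaryLen($bData) = 0 Then ExitLoop ; nothing left to read
        $sChunk = BinaryToString($bData) ; one character per byte
        StringReplace($sChunk, @LF, @LF, 0, 1) ; only @extended (the number of matches) is of interest
        $iLines += @extended
        $sLastChar = StringRight($sChunk, 1)
    WEnd
    FileClose($hFile)

    ; A final line without a trailing line feed still counts as a line.
    If $sLastChar <> "" And $sLastChar <> @LF Then $iLines += 1
    Return $iLines
EndFunc   ;==>_CountLinesChunked

ConsoleWrite(_CountLinesChunked("C:\tmp\test.txt") & @CRLF) ; hypothetical path
```

Memory use stays at roughly one chunk instead of the whole file. Whether it is actually faster than a FileReadLine loop on a given machine would have to be measured.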

SpookMeister Posted January 21, 2009 (edited)
I did a little creative thinking and came up with a workable solution for my needs. Because I only need the number of lines for progress information, and in my case this does not need to be exact, I grabbed a sample of the file and averaged the number of characters per line in the sample. Then, by multiplying the current line number by that average, I can gauge where I am in the file by comparing the result to the file size.

Here is an example of searching a large file for a string of text.

    #include <array.au3>
    #include <string.au3>

    HotKeySet("{ESC}", "Terminate")

    Dim $a_Results[1]
    $a_Results[0] = 0

    ; Select file
    $Path = FileOpenDialog("Select the file to search", @WorkingDir & "\", "All Files (*.*)")
    If @error Then
        MsgBox(0, "Error", "Failed to locate file")
        Exit
    EndIf

    ; Request search string from the user
    $s_SearchString = InputBox("Search String", "Enter the string that you want to search for:")
    If @error Then
        MsgBox(0, "Error", "No search string entered")
        Exit
    EndIf

    ; Find the Average number of Characters (Bytes) Per Line from a sample of the file
    $i_FileSize = FileGetSize($Path)
    $h_file = FileOpen($Path, 0)
    If $h_file = -1 Then
        MsgBox(0, "Error", "Unable to open file.")
        Exit
    EndIf
    $i_bytes = 0
    For $i = 1 To 50000 ; the 50k is arbitrary, but I found it took less than half a second to complete
        $line = FileReadLine($h_file)
        If @error = -1 Then ExitLoop
        $i_bytes += StringLen($line) ; note: FileReadLine does not return the line terminator
    Next
    FileClose($h_file)
    ; $i is one more than the number of lines actually read, whether the loop finished or hit end-of-file
    $n_ABPL = $i_bytes / ($i - 1) ; Average Bytes Per Line

    ; Re-open the file and begin search
    $h_file = FileOpen($Path, 0)
    $i_LineCount = 0
    $i_sub = 0
    While 1
        $line = FileReadLine($h_file)
        If @error = -1 Then ExitLoop
        $i_LineCount += 1
        $i_sub += 1
        Select
            Case StringInStr($line, $s_SearchString) ; if string is found add it to the array
                _ArrayAdd($a_Results, $line)
                $a_Results[0] += 1
            Case $i_sub >= 5000 ; every 5k lines update the tooltip
                $n_Estimate = Int($i_LineCount * $n_ABPL)
                $prog = _StringAddThousandsSep($n_Estimate) & " / " & _StringAddThousandsSep($i_FileSize)
                $msg = "Searching: " & @LF & _
                        $Path & @LF & @LF & _
                        "For the string:" & @LF & _
                        $s_SearchString & @LF & @LF & _
                        "Estimated Progress: " & $prog
                ToolTip($msg)
                $i_sub = 0
        EndSelect
    WEnd
    FileClose($h_file)
    ToolTip("")
    _ArrayDisplay($a_Results)

    Func Terminate()
        MsgBox(0, "Abort", "Search aborted by user")
        Exit
    EndFunc   ;==>Terminate
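The estimate described in this post can be written out as a short formula. The symbols are introduced here for illustration and do not appear in the script: $m$ is the number of sampled lines, $B$ the total of their StringLen values, $S$ the file size in bytes, $n$ the current line number, and $t$ the length of a line terminator (2 for CRLF, 1 for LF).

```latex
\bar{b} = \frac{B}{m}, \qquad
p(n) \approx \frac{n\,\bar{b}}{S}
% FileReadLine returns each line without its terminator, so B leaves out t bytes
% per line and p(n) stops short of 1 at the end of the file. Counting the
% terminator gives the adjusted estimate:
p_{\text{adj}}(n) \approx \frac{n\,(\bar{b} + t)}{S}
```

With made-up numbers, a sample averaging $\bar{b} = 12$ characters per line, $t = 2$, and a 900 MB file ($S = 943{,}718{,}400$ bytes) puts line $n = 1{,}000{,}000$ at $14{,}000{,}000 / 943{,}718{,}400$, or about 1.5% of the file. The formula also treats one character as one byte, which holds for ANSI text but not for multi-byte encodings.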

wiela Posted January 4, 2010 (edited)
Quote: "I did a little creative thinking and came up with a workable solution for my needs. <...>"

Well, in such cases (huge text files, various bulk text/data processing) you could consider using dedicated utilities, such as the word count (wc) utility and the others from GnuWin32 CoreUtils. (I don't think portability is an issue here; AutoIt is not efficient, if suitable at all, for such cases either...)

    #include <Timer.au3> ; _Timer_Init() and _Timer_Diff()

    Local $starttime = _Timer_Init()
    ; wc -l prints the count followed by the file name; cut keeps only the count
    RunWait(@ComSpec & " /c wc -l C:\tmp\test.txt | cut -d ' ' -f 1 > C:\tmp\line_cnt.txt")
    Local $fh = FileOpen("C:\tmp\line_cnt.txt", 0)
    ConsoleWrite(@CRLF & "File size (MB): " & FileGetSize("C:\tmp\test.txt")/1048576)
    ConsoleWrite(@CRLF & "Line count: " & FileReadLine($fh))
    ConsoleWrite(@CRLF & "Count took (ms): " & _Timer_Diff($starttime) & @CRLF)
    FileClose($fh)

Results:

    File size (MB): 888.109111785889
    Line count: 71214500
    Count took (ms): 107967.818535596

Note the line count, and the fact that the test was run on an old laptop with a slow disk... The overall speed increase could be up to 100x if the number of lines is around a few million, and up to 10x in "extreme" cases like this test.
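The temporary file in the wc approach can probably be avoided by reading the child process's output directly. This is a sketch only, assuming wc.exe (for example from GnuWin32 CoreUtils) is on the PATH; the function name `_WcLineCount` and the example path are hypothetical.

```autoit
#include <Constants.au3> ; provides $STDOUT_CHILD

; Hypothetical helper: asks an external wc for the line count and reads its answer from stdout.
Func _WcLineCount($sPath)
    ; Redirecting the file into wc makes it print the count alone, so cut is not needed.
    Local $iPID = Run(@ComSpec & ' /c wc -l < "' & $sPath & '"', "", @SW_HIDE, $STDOUT_CHILD)
    If $iPID = 0 Then Return SetError(1, 0, -1)

    Local $sOut = ""
    While 1
        $sOut &= StdoutRead($iPID)
        If @error Then ExitLoop ; the child has closed its output stream
        Sleep(10)
    WEnd
    Return Number(StringStripWS($sOut, 3)) ; 3 = strip leading and trailing whitespace
EndFunc   ;==>_WcLineCount

ConsoleWrite("Line count: " & _WcLineCount("C:\tmp\test.txt") & @CRLF) ; hypothetical path
```

Note that wc counts line feed characters, so a final line without a trailing line break is not included in its result.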

GEOSoft Posted January 4, 2010
Quote: "I did a little creative thinking and came up with a workable solution for my needs. <...>"

That could have been done much more easily using a regular expression. I'm just writing this on the fly, without even testing the SRE:

    #include <array.au3> ;; For _ArrayDisplay() only

    $Path = FileOpenDialog("Select the file to search", @WorkingDir & "\", "All Files (*.*)")
    If @error Then
        MsgBox(0, "Error", "Failed to locate file")
        Exit
    EndIf
    $s_SearchString = InputBox("Search String", "Enter the string that you want to search for:")
    If @error Then
        MsgBox(0, "Error", "No search string entered")
        Exit
    EndIf
    $aFound = _BuildArray($s_SearchString)
    If Not @error Then
        _ArrayDisplay($aFound)
    Else
        MsgBox(0, "Ooops", "Something is amiss")
    EndIf

    Func _BuildArray($sFind, $iCase = 0) ;; If not 0 then match is case-sensitive
        $sCase = "(?i)"
        If $iCase Then $sCase = ""
        ;; FileRead() loads the entire file into memory; $Path is the global set above
        $aResults = StringRegExp(FileRead($Path), $sCase & "(?m:^).*" & $sFind & ".*(?:\v|$)+", 3)
        If @error Then Return SetError(1)
        Return $aResults
    EndFunc