buymeapc Posted July 30, 2013

Hi all,

I have several large text files, 500MB or more, and I need to add a string to the beginning of each line in each file. It's a pretty easy task, but it takes a long time the way I'm currently doing it. I would love to use _FileReadToArray(), but I get memory allocation errors. Is there a way to make this quicker? Here's my sample code. Thank you!

```autoit
$sFileName = @ScriptDir & "\File.txt"
$sNewFileName = @ScriptDir & "\NewFile.txt"

; If a test file doesn't exist, create one that's 10mb
If FileExists($sFileName) = 0 Then
    Do
        FileWriteLine($sFileName, "Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here.")
        $iSize = FileGetSize($sFileName)
    Until ($iSize / 1048576) > 10
EndIf

; start a timer
$iTimer = TimerInit()

; what I want to add to the beginning of each line
$sPrePend = "1234"

$hFile = FileOpen($sFileName, 0)
While 1
    $sLine = FileReadLine($hFile)
    If @error = -1 Then ExitLoop ; If eof, exitloop...
    ; ...otherwise, prepend the line and write it to the new file
    $sLine = $sPrePend & $sLine
    FileWriteLine($sNewFileName, $sLine)
WEnd
MsgBox(0, "Time this took", TimerDiff($iTimer))
FileClose($hFile)
```
DW1 Posted July 30, 2013 (edited)

Open the new file for writing instead of passing a filename to FileWriteLine(). Currently, each call to FileWriteLine() does a FileOpen(), FileWriteLine(), FileClose(), which is a lot more expensive than passing a handle that is already open for writing.

```autoit
$sFileName = @ScriptDir & "\File.txt"
$sNewFileName = @ScriptDir & "\NewFile.txt"

; If a test file doesn't exist, create one that's 10mb
If FileExists($sFileName) = 0 Then
    Do
        FileWriteLine($sFileName, "Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here.")
        $iSize = FileGetSize($sFileName)
    Until ($iSize / 1048576) > 10
EndIf

; start a timer
$iTimer = TimerInit()

; what I want to add to the beginning of each line
$sPrePend = "1234"

$hFile = FileOpen($sFileName, 0)       ; 0 = read
$hNewFile = FileOpen($sNewFileName, 1) ; 1 = write, appending to the end of the file
While 1
    $sLine = FileReadLine($hFile)
    If @error = -1 Then ExitLoop ; If eof, exitloop...
    ; ...otherwise, prepend the line and write it to the new file
    $sLine = $sPrePend & $sLine
    FileWriteLine($hNewFile, $sLine) ; write through the open handle
WEnd
MsgBox(0, "Time this took", TimerDiff($iTimer))
FileClose($hFile)
FileClose($hNewFile)
```

Edited July 30, 2013 by danwilli

AutoIt3 Online Help
buymeapc (author) Posted July 30, 2013

OK, I'm impressed. Creating the 10MB output file went from 45 seconds to less than 1 second. Thanks for the help!!
Edano Posted July 30, 2013

IIRC, it's even faster to read/write the entire file at once instead of reading/writing it line by line. I'll run some tests...

FukuLeaks
JLogan3o13 (Moderator) Posted July 30, 2013

I would normally do it through an array, but it seems (at 10MB anyway) that danwilli's method is a skosh faster (1.22 s vs. 1.5 s).

```autoit
;Assuming file already exists
#include <File.au3>
Local $aArray
$sFileName = @ScriptDir & "\File.txt"
$sNewFileName = @ScriptDir & "\NewFile.txt"

; start a timer
$iTimer = TimerInit()

; what I want to add to the beginning of each line
$sPrePend = "1234"

$hNewFile = FileOpen($sNewFileName, 1)
_FileReadToArray($sFileName, $aArray) ; $aArray[0] holds the line count
For $i = 1 To $aArray[0]
    FileWriteLine($hNewFile, $sPrePend & $aArray[$i])
Next
MsgBox(0, "Time this took", TimerDiff($iTimer))
FileClose($hNewFile)
```

"Profanity is the last vestige of the feeble mind. For the man who cannot express himself forcibly through intellect must do so through shock and awe" - Spencer W. Kimball
How to get your question answered on this forum!
Edano Posted July 30, 2013

```autoit
;http://www.autoitscript.com/forum/topic/153108-prepending-each-line-of-a-large-text-file/
;Post #2
;by danwilli
;Script grabbed by SLICER by Edano here: http://www.autoitscript.com/forum/topic/152402-slicer-autoit-forum-script-grabber/?p=1093575
$sFileName = @ScriptDir & "\File.txt"
$sNewFileName = @ScriptDir & "\NewFile.txt"

; If a test file doesn't exist, create one that's 10mb
If FileExists($sFileName) = 0 Then
    Do
        FileWriteLine($sFileName, "Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here.")
        $iSize = FileGetSize($sFileName)
    Until ($iSize / 1048576) > 10
EndIf

; start a timer
$iTimer = TimerInit()
; what I want to add to the beginning of each line
$sPrePend = "1234"
$hFile = FileOpen($sFileName, 0)
$hNewFile = FileOpen($sNewFileName, 1)
While 1
    $sLine = FileReadLine($hFile)
    If @error = -1 Then ExitLoop ; If eof, exitloop...
    ; ...otherwise, prepend the line and write it to the new file
    $sLine = $sPrePend & $sLine
    FileWriteLine($hNewFile, $sLine)
WEnd
MsgBox(0, "Time this took", TimerDiff($iTimer))
FileClose($hFile)
FileClose($hNewFile)

; start a timer
$iTimer = TimerInit()
; what I want to add to the beginning of each line
$sPrePend = "1234"
$hFile = FileOpen($sFileName, 0)
$hNewFile = FileOpen($sNewFileName, 2)
;While 1
$sLine = FileRead($hFile) ; read the whole file into one string
; If @error = -1 Then ExitLoop; If eof, exitloop...
; ...otherwise, prepend the line and write it to the new file
; $sLine = $sPrePend & $sLine
; prefix the first line and every line after a @CRLF; the file ends with @CRLF,
; so trim the one extra prefix StringReplace adds after the final line break
FileWrite($hNewFile, StringTrimRight($sPrePend & StringReplace($sLine, @CRLF, @CRLF & $sPrePend), StringLen($sPrePend)))
;WEnd
MsgBox(0, "Time this took", TimerDiff($iTimer))
FileClose($hFile)
FileClose($hNewFile)
```
JLogan3o13 (Moderator) Posted July 30, 2013

On my test machine, that script takes much longer (2.589 s) than either the original posted by danwilli or the array method.
Edano Posted July 30, 2013

Quote (JLogan3o13): On my test machine, that script takes much longer than either the original posted by danwilli, or the array method (2.589)

Hm, not on mine. And the thread owner mentioned that the array method is not viable.
JLogan3o13 (Moderator) Posted July 30, 2013

Not to hijack the thread too much, but what OS are you running it on? I get the following results on Win7 x64 (times in ms):

Danwilli's: 1141.31422588465
Mine: removed - I missed his statement about memory allocation errors
Yours: 2542.42430448358
DW1 Posted July 30, 2013

OP has a working solution, so hijack away. We should be testing against a 500MB+ file, though, as the OP only used 10MB for a quick repro.
Edano Posted July 30, 2013

WinXP SP3:
line by line: 1190 ms
whole file: 1150 ms

But a series of trials shows that they're pretty much equal.
Edano Posted July 30, 2013

Quote (DW1): OP has a working solution, so hijack away. We should be testing against a 500mb+ file though, as the OP only used 10mb for a quick repro.

I wonder if StringReplace can handle a 500MB file?
Edano Posted July 30, 2013

Tested 100MB: 17,704 ms for the FileReadLine (line by line) method, 16,291 ms for the FileRead (whole file) method.
DW1 Posted July 30, 2013 (edited)

On a 500MB file:
line by line: <3MB memory, 37 seconds
whole file: memory inflated to 2GB until "Error allocating memory"

There is probably a way to do this in chunks, perhaps with StringRegExpReplace, that might be quicker, but 37 seconds for 500MB seems reasonable.

Quote (Edano): tested 100 mb: 17,704 ms for readline, 16,291 ms for readfile.

What was the memory usage of the readfile method?

Edited July 30, 2013 by danwilli
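DW1's chunk idea could look something like the sketch below. It is a minimal sketch under stated assumptions, not code from the thread: lines are assumed to end with @CRLF, the 1M-character chunk size is arbitrary, and each chunk goes through StringReplace (the call Edano used on the whole file) rather than StringRegExpReplace. Only complete lines are written out; a partial last line is carried over to the next read, so memory use should track the chunk size rather than the file size. A likely reason the whole-file version fails at 500MB is that AutoIt holds strings as UTF-16, so the file already needs roughly 1GB as a string before StringReplace builds a second, larger copy, which is too much for a 32-bit process.

```autoit
; Sketch: prefix every line of a large file while reading it in fixed-size chunks.
; Assumes @CRLF line endings. File names and chunk size are illustrative only.
Local $sFileName = @ScriptDir & "\File.txt"
Local $sNewFileName = @ScriptDir & "\NewFile.txt"
Local $sPrePend = "1234"
Local Const $iChunkChars = 1048576 ; characters per FileRead call (arbitrary)

; create a small test file if none exists (written through a handle, so it is fast)
If Not FileExists($sFileName) Then
    Local $hTmp = FileOpen($sFileName, 2)
    For $i = 1 To 1000
        FileWriteLine($hTmp, "Lots of text goes here. Line " & $i)
    Next
    FileClose($hTmp)
EndIf

Local $hFile = FileOpen($sFileName, 0)       ; 0 = read
Local $hNewFile = FileOpen($sNewFileName, 2) ; 2 = write, erasing previous contents
If $hFile = -1 Or $hNewFile = -1 Then
    ConsoleWrite("Could not open the input or output file" & @CRLF)
    Exit 1
EndIf

Local $iTimer = TimerInit()
Local $sCarry = "" ; incomplete last line left over from the previous chunk
Local $sChunk, $iErr, $sBuffer, $iPos, $sLines
While 1
    $sChunk = FileRead($hFile, $iChunkChars)
    $iErr = @error ; -1 = end of file reached, 1 = read error
    $sBuffer = $sCarry & $sChunk
    $iPos = StringInStr($sBuffer, @CRLF, 1, -1) ; position of the last @CRLF
    If $iPos Then
        $sLines = StringLeft($sBuffer, $iPos + 1)     ; complete lines, incl. the final @CRLF
        $sCarry = StringTrimLeft($sBuffer, $iPos + 1) ; partial line kept for the next round
        ; prefix the first line and every line after a @CRLF, then drop the
        ; extra prefix that StringReplace adds after the final @CRLF
        FileWrite($hNewFile, StringTrimRight($sPrePend & StringReplace($sLines, @CRLF, @CRLF & $sPrePend), StringLen($sPrePend)))
    Else
        $sCarry = $sBuffer ; no complete line yet: keep accumulating
    EndIf
    If $iErr Or $sChunk = "" Then ExitLoop
WEnd
; a last line without a trailing @CRLF still gets its prefix
If $sCarry <> "" Then FileWrite($hNewFile, $sPrePend & $sCarry)

FileClose($hFile)
FileClose($hNewFile)
ConsoleWrite("Time this took: " & TimerDiff($iTimer) & " ms" & @CRLF)
```

Whether this beats the 37-second line-by-line time would have to be measured; what the design mainly buys is a memory ceiling set by $iChunkChars instead of by the file size.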
Edano Posted July 30, 2013

Quote (DW1): What was the memory usage of the readfile method?

Whoa! 4MB for the line-by-line read, 400-600MB for the whole-file read, on the 100MB file.
Edano Posted July 30, 2013

Since this is hijacked anyway, another question: what do IniRead/IniWrite do? Do they also open and close the file for each call?
Melba23 (Moderator) Posted July 30, 2013

Edano,

I believe so - how would it work otherwise? But as you normally only read and write once per script, it is not too much of an overhead. And as INI files are limited to 32KB per section, there is no problem at all compared to a 500MB file!

M23

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind

My UDFs:
ArrayMultiColSort - Sort arrays on multiple columns
ChooseFileFolder - Single and multiple selections from specified path treeview listing
Date_Time_Convert - Easily convert date/time formats, including the language used
ExtMsgBox - A highly customisable replacement for MsgBox
GUIExtender - Extend and retract multiple sections within a GUI
GUIFrame - Subdivide GUIs into many adjustable frames
GUIListViewEx - Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx - Check/clear parent and child checkboxes in a TreeView
Marquee - Scrolling tickertape GUIs
NoFocusLines - Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify - Small notifications on the edge of the display
Scrollbars - Automatically sized scrollbars with a single command
StringSize - Automatically size controls to fit text
Toast - Small GUIs which pop out of the notification area
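If, as Melba23 believes, each IniRead/IniWrite call touches the file separately, one way to cut the number of calls when many keys are involved is to handle a whole section at once with IniWriteSection and IniReadSection. A minimal sketch; the file name Example.ini and the section and key names are hypothetical, and the file is created here only so the sketch can run on its own:

```autoit
; Hypothetical INI file, recreated from scratch for the sketch
Local $sIni = @ScriptDir & "\Example.ini"
FileDelete($sIni)

; one call writes the whole section (key=value pairs separated by @LF)
IniWriteSection($sIni, "General", "Name=Test" & @LF & "Width=800" & @LF & "Height=600")

; one call reads the whole section back into a 2D array:
; [0][0] = number of keys, [n][0] = key name, [n][1] = value
Local $aSection = IniReadSection($sIni, "General")
If @error Then
    ConsoleWrite("Section not found or file could not be read" & @CRLF)
Else
    For $i = 1 To $aSection[0][0]
        ConsoleWrite($aSection[$i][0] & " = " & $aSection[$i][1] & @CRLF)
    Next
EndIf
```

This does not show how IniRead works internally; it only replaces several per-key calls with one call per section.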
Edano Posted July 30, 2013

Gary Frost gave me this once:

```autoit
; Reads all key/value pairs of section $y from INI file $x with a single FileRead().
; On success $Key and $Val are 1-based arrays with the count in element [0].
Func _IniReadSection($x, $y, ByRef $Key, ByRef $Val)
    Local $t = StringRegExp(@CRLF & FileRead($x) & @CRLF & "[", "(?s)(?i)\n\s*\[\s*" & $y & "\s*\]\s*\r\n(.*?)\[", 3)
    If @error Then Return SetError(1, 0, 0) ; section not found
    $Key = StringRegExp(@LF & $t[0], "\n\s*(.*?)\s*=", 3)
    $Val = StringRegExp(@LF & $t[0], "\n\s*.*?\s*=(.*?)\r", 3)
    Local $n = UBound($Key), $i_key[$n + 1], $i_val[$n + 1]
    $i_key[0] = $n
    $i_val[0] = $n
    For $i = 0 To $n - 1
        $i_key[$i + 1] = $Key[$i]
        $i_val[$i + 1] = $Val[$i]
    Next
    ; hand the count-prefixed arrays back through the ByRef parameters
    $Key = $i_key
    $Val = $i_val
    Return 1
EndFunc ;==>_IniReadSection
```
MarkRobbins Posted July 30, 2013

Write the line to a new file, use DOS COPY to join the existing file onto it, then delete the original and rename the new file.
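Taken literally, the COPY suggestion can be sketched as below. The file names Data.txt, Header.txt and Combined.txt are hypothetical, and sample data is written first so the sketch runs on its own. Note what it produces: ONE new line in front of the whole file, with the existing lines left unchanged.

```autoit
; Sketch of the COPY approach: puts ONE new line in front of a whole file.
; Data.txt, Header.txt and Combined.txt are hypothetical names.
Local $sDir = @ScriptDir
Local $sData = $sDir & "\Data.txt"
Local $sHeader = $sDir & "\Header.txt"
Local $sCombined = $sDir & "\Combined.txt"

; sample data so the sketch is self-contained (mode 2 = overwrite)
Local $h = FileOpen($sData, 2)
FileWrite($h, "line one" & @CRLF & "line two" & @CRLF)
FileClose($h)

; 1. write the new line to a file of its own
$h = FileOpen($sHeader, 2)
FileWriteLine($h, "1234")
FileClose($h)

; 2. let cmd.exe join header + data into a third file (/b = binary copy)
RunWait(@ComSpec & ' /c copy /b "' & $sHeader & '" + "' & $sData & '" "' & $sCombined & '"', $sDir, @SW_HIDE)

; 3. delete the original and rename the combined file to take its place
If FileExists($sCombined) Then
    FileDelete($sData)
    FileMove($sCombined, $sData)
EndIf
FileDelete($sHeader)
```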
JLogan3o13 (Moderator) Posted July 31, 2013

How does that solve the OP's problem, as he wants to prepend the new data to every line?