buymeapc

Prepending Each Line of a LARGE Text File?

25 posts in this topic

Hi all,

So, I have several large text files (500MB or more), and I need to add a string to the beginning of each line in each file. It's a pretty easy task, but my current approach takes a long time. I would love to use _FileReadToArray(), but I get memory allocation errors on files this size.

Is there a way to make this quicker? Here's my sample code.

Thank you!

$sFileName = @ScriptDir & "\File.txt"
$sNewFileName = @ScriptDir & "\NewFile.txt"

; If a test file doesn't exist, create one that's 10mb
If FileExists($sFileName) = 0 Then
    Do
        FileWriteLine($sFileName, "Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here.")
        $iSize = FileGetSize($sFileName)
    Until ($iSize / 1048576) > 10
EndIf

; start a timer
$iTimer = TimerInit()

; what I want to add to the beginning of each line
$sPrePend = "1234"

$hFile = FileOpen($sFileName, 0)
While 1
    $sLine = FileReadLine($hFile)
    If @error = -1 Then ExitLoop; If eof, exitloop...
    ; ...otherwise, prepend the line and write it to the new file
    $sLine = $sPrePend & $sLine
    FileWriteLine($sNewFileName, $sLine)
WEnd
MsgBox(0, "Time this took", TimerDiff($iTimer))
FileClose($hFile)




#2 ·  Posted (edited)

Open the new file for writing once and pass the handle to FileWriteLine(), instead of passing a filename.

Currently, each call to FileWriteLine() does a FileOpen(), a write, and a FileClose(), which is a lot more expensive than writing to a handle that is already open.

$sFileName = @ScriptDir & "\File.txt"
$sNewFileName = @ScriptDir & "\NewFile.txt"

; If a test file doesn't exist, create one that's 10mb
If FileExists($sFileName) = 0 Then
    Do
        FileWriteLine($sFileName, "Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here.")
        $iSize = FileGetSize($sFileName)
    Until ($iSize / 1048576) > 10
EndIf

; start a timer
$iTimer = TimerInit()

; what I want to add to the beginning of each line
$sPrePend = "1234"

$hFile = FileOpen($sFileName, 0)
$hNewFile = FileOpen($sNewFileName, 1)
While 1
    $sLine = FileReadLine($hFile)
    If @error = -1 Then ExitLoop; If eof, exitloop...
    ; ...otherwise, prepend the line and write it to the new file
    $sLine = $sPrePend & $sLine
    FileWriteLine($hNewFile, $sLine)
WEnd
MsgBox(0, "Time this took", TimerDiff($iTimer))
FileClose($hFile)
FileClose($hNewFile)
Edited by danwilli


OK, I'm impressed. The time went from 45 seconds to create a 10mb file to less than 1 second.

Thanks for the help!!


iirc, it's even faster to read/write the entire file instead of reading/writing it line by line. will make some tests...


FukuLeaks


I would normally do it through an array, but it seems (at 10mb anyway) that danwilli's method is a skosh faster (1.22s over 1.5).

;Assuming file already exists

#include <File.au3>

Local $aArray

$sFileName = @ScriptDir & "\File.txt"
$sNewFileName = @ScriptDir & "\NewFile.txt"

; start a timer
$iTimer = TimerInit()

; what I want to add to the beginning of each line
$sPrePend = "1234"
$hNewFile = FileOpen($sNewFileName, 1)

_FileReadToArray($sFileName, $aArray)

For $i = 1 To $aArray[0]
    FileWriteLine($hNewFile, $sPrePend & $aArray[$i])
Next

MsgBox(0, "Time this took", TimerDiff($iTimer))

FileClose($hNewFile)

√-1 2^3 ∑ π, and it was delicious!


;http://www.autoitscript.com/forum/topic/153108-prepending-each-line-of-a-large-text-file/
;Post #2
;by danwilli

;Script grabbed by SLICER by Edano here: http://www.autoitscript.com/forum/topic/152402-slicer-autoit-forum-script-grabber/?p=1093575

$sFileName = @ScriptDir & "\File.txt"
$sNewFileName = @ScriptDir & "\NewFile.txt"

; If a test file doesn't exist, create one that's 10mb
If FileExists($sFileName) = 0 Then
    Do
        FileWriteLine($sFileName, "Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here.")
        $iSize = FileGetSize($sFileName)
    Until ($iSize / 1048576) > 10
EndIf

; start a timer
$iTimer = TimerInit()

; what I want to add to the beginning of each line
$sPrePend = "1234"

$hFile = FileOpen($sFileName, 0)
$hNewFile = FileOpen($sNewFileName, 1)
While 1
    $sLine = FileReadLine($hFile)
    If @error = -1 Then ExitLoop; If eof, exitloop...
    ; ...otherwise, prepend the line and write it to the new file
    $sLine = $sPrePend & $sLine
    FileWriteLine($hNewFile, $sLine)
WEnd
MsgBox(0, "Time this took", TimerDiff($iTimer))
FileClose($hFile)
FileClose($hNewFile)

; start a timer
$iTimer = TimerInit()

; what I want to add to the beginning of each line
$sPrePend = "1234"

$hFile = FileOpen($sFileName, 0)
$hNewFile = FileOpen($sNewFileName, 2)
; read the whole file in one go instead of looping over lines
$sLine = FileRead($hFile)
; prefix the first line, then every line that follows a CRLF; the file ends with
; a CRLF, so trim the dangling CRLF + prefix that the replace leaves at the end
FileWrite($hNewFile, StringTrimRight($sPrePend & StringReplace($sLine, @CRLF, @CRLF & $sPrePend), StringLen($sPrePend) + 2))
MsgBox(0, "Time this took", TimerDiff($iTimer))
FileClose($hFile)
FileClose($hNewFile)


FukuLeaks


On my test machine, that script takes much longer (2.589 s) than either the original posted by danwilli or the array method.


√-1 2^3 ∑ π, and it was delicious!



hm, not on mine. and the threadowner mentioned that the array method is not viable.


FukuLeaks


Not to hijack the thread too much, but what OS are you running it on? I get the following results on WIN7 x64:

Danwilli's:

   1141.31422588465

Mine:

   Removed - I missed his statement about memory allocation errors

Yours:

   2542.42430448358


√-1 2^3 ∑ π, and it was delicious!


OP has a working solution, so hijack away :P

We should be testing against a 500mb+ file though, as the OP only used 10mb for a quick repro.
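
For anyone who wants to build a larger test file, here is a rough sketch that writes through a single open handle rather than calling FileWriteLine() with a filename and FileGetSize() on every iteration, as the OP's generator does. The function name _MakeTestFile and its parameters are made up for illustration, and the size estimate assumes one byte per character plus a CRLF per line.

```autoit
; Hypothetical helper (not from the thread): write roughly $iTargetMB megabytes
; of sample lines to $sPath through one open file handle.
Func _MakeTestFile($sPath, $iTargetMB)
    Local $sLine = "Lots of text goes here. Lots of text goes here. Lots of text goes here."
    Local $hOut = FileOpen($sPath, 2) ; 2 = open for writing, erase previous contents
    If $hOut = -1 Then Return SetError(1, 0, 0)

    ; FileWriteLine() appends a CRLF, so count 2 extra bytes per line
    ; (assumption: one byte per character in the output encoding)
    Local $iLineBytes = StringLen($sLine) + 2
    Local $iLines = Ceiling($iTargetMB * 1048576 / $iLineBytes)

    For $i = 1 To $iLines
        FileWriteLine($hOut, $sLine)
    Next

    FileClose($hOut)
    Return $iLines ; number of lines written
EndFunc   ;==>_MakeTestFile
```

Something like `_MakeTestFile(@ScriptDir & "\File.txt", 500)` would then produce a file of about 500MB to run the line and file methods against.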


winxp sp3

line: 1190 ms

file: 1150 ms

but a series of trials show that it's pretty much equal


FukuLeaks



i wonder if stringreplace can do a 500 mb file ?


FukuLeaks


tested 100 mb: 17,704 ms for readline, 16,291 ms for readfile.


FukuLeaks


#14 ·  Posted (edited)

On a 500mb file

line: <3mb memory, 37 seconds

file: inflated to 2GB memory until "Error allocating memory"

there is probably a way to do this in chunks with perhaps StringRegExpReplace that might be quicker, but 37 seconds for 500mb seems reasonable.
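
A rough, untimed sketch of the chunked idea, for what it's worth. It uses plain StringReplace() rather than StringRegExpReplace(), reads a fixed number of characters per pass, and carries any incomplete last line over to the next pass so a line split across two chunks is still prefixed only once. The function name _PrependChunked, its parameters, and the 4M-character chunk size are all made up for illustration. It assumes lines end in CRLF or LF and that the default FileOpen() encodings are acceptable.

```autoit
; Hypothetical sketch (not from the thread): prepend $sPrefix to every line of
; $sInPath, writing the result to $sOutPath, reading $iChunkChars characters at a time.
Func _PrependChunked($sInPath, $sOutPath, $sPrefix, $iChunkChars = 4194304)
    Local $hIn = FileOpen($sInPath, 0) ; 0 = read
    If $hIn = -1 Then Return SetError(1, 0, 0)
    Local $hOut = FileOpen($sOutPath, 2) ; 2 = write, erase previous contents
    If $hOut = -1 Then
        FileClose($hIn)
        Return SetError(2, 0, 0)
    EndIf

    Local $sCarry = "", $sChunk, $iErr, $iLastLF
    While 1
        $sChunk = FileRead($hIn, $iChunkChars)
        $iErr = @error ; save it: later function calls reset @error
        If StringLen($sChunk) > 0 Then
            $sChunk = $sCarry & $sChunk
            ; split at the last line feed; the incomplete tail waits for the next pass
            $iLastLF = StringInStr($sChunk, @LF, 1, -1)
            $sCarry = StringMid($sChunk, $iLastLF + 1)
            $sChunk = StringLeft($sChunk, $iLastLF)
            If $iLastLF > 0 Then
                ; $sChunk now ends with a line feed, so the replace leaves one
                ; dangling prefix at the very end; trim it off again
                FileWrite($hOut, StringTrimRight($sPrefix & StringReplace($sChunk, @LF, @LF & $sPrefix, 0, 1), StringLen($sPrefix)))
            EndIf
        EndIf
        If $iErr Then ExitLoop ; -1 = end of file, anything else = read error
    WEnd

    ; a final line without a trailing line break is still sitting in the carry
    If StringLen($sCarry) > 0 Then FileWrite($hOut, $sPrefix & $sCarry)

    FileClose($hIn)
    FileClose($hOut)
    Return 1
EndFunc   ;==>_PrependChunked
```

Called as `_PrependChunked(@ScriptDir & "\File.txt", @ScriptDir & "\NewFile.txt", "1234")`, memory use should stay in the range of a few chunk-sized strings instead of growing with the file. Whether it actually beats the 37 seconds of the line method would need to be measured.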

 

Re: "tested 100 mb: 17,704 ms for readline, 16,291 ms for readfile."

What was the memory usage of the readfile method?

Edited by danwilli



hui !

4 mb on the lineread

400-600 mb on the fileread

for the 100 mb file


FukuLeaks


since this is hijacked, another question:

what do IniRead/IniWrite do? do they also open and close the file for each call?


FukuLeaks


Edano,

I believe so - how would it work otherwise? But as you normally only read and write once per script it is not too much of an overhead. ;)

And as ini files are limited to 32kb per section, there is no problem at all when compared to a 500MB file! :D

M23



gary frost gave me this once:

.

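Func _IniReadSection($x, $y, ByRef $Key, ByRef $Val)
    ; $x = ini file path, $y = section name; grab the text between [section] and the next "["
    Local $t = StringRegExp(@CRLF & FileRead($x) & @CRLF & "[", "(?s)(?i)\n\s*\[\s*" & $y & "\s*\]\s*\r\n(.*?)\[", 3)
    If @error Then Return SetError(1, 0, 0) ; section not found
    $Key = StringRegExp(@LF & $t[0], "\n\s*(.*?)\s*=", 3)
    $Val = StringRegExp(@LF & $t[0], "\n\s*.*?\s*=(.*?)\r", 3)

    ; build 1-based copies with the count in element [0]
    ; (as posted, $i_key and $i_val are never handed back to the caller)
    Local $iCount = UBound($Key), $i_key[$iCount + 1], $i_val[$iCount + 1]
    $i_key[0] = $iCount
    $i_val[0] = $iCount
    For $i = 0 To $iCount - 1
        $i_key[$i + 1] = $Key[$i]
        $i_val[$i + 1] = $Val[$i]
    Next
EndFunc   ;==>_IniReadSection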

FukuLeaks


Write the line to a new file, use Dos COPY to join the existing, then delete and rename.


How does that solve the OP's problem? He wants to prepend the new data to every line.


√-1 2^3 ∑ π, and it was delicious!

