
Prepending Each Line of a LARGE Text File?



Hi all,

So, I have several large text files, 500 MB or larger, and I need to add a string to the beginning of each line. It's a simple task, but the way I'm currently doing it takes a long time. I would love to use _FileReadToArray(), but I get memory allocation errors.

Is there a way to make this quicker? Here's my sample code.

Thank you!

$sFileName = @ScriptDir & "\File.txt"
$sNewFileName = @ScriptDir & "\NewFile.txt"

; If a test file doesn't exist, create one that's 10mb
If FileExists($sFileName) = 0 Then
    Do
        FileWriteLine($sFileName, "Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here.")
        $iSize = FileGetSize($sFileName)
    Until ($iSize / 1048576) > 10
EndIf

; start a timer
$iTimer = TimerInit()

; what I want to add to the beginning of each line
$sPrePend = "1234"

$hFile = FileOpen($sFileName, 0)
While 1
    $sLine = FileReadLine($hFile)
    If @error = -1 Then ExitLoop; If eof, exitloop...
    ; ...otherwise, prepend the line and write it to the new file
    $sLine = $sPrePend & $sLine
    FileWriteLine($sNewFileName, $sLine)
WEnd
MsgBox(0, "Time this took", TimerDiff($iTimer))
FileClose($hFile)

Open the new file for writing once and pass its handle to FileWriteLine(), instead of passing a filename.

Currently, each call to FileWriteLine() with a filename does its own FileOpen(), write and FileClose(), which is far more expensive than writing to a handle that is already open.

$sFileName = @ScriptDir & "\File.txt"
$sNewFileName = @ScriptDir & "\NewFile.txt"

; If a test file doesn't exist, create one that's 10mb
If FileExists($sFileName) = 0 Then
    Do
        FileWriteLine($sFileName, "Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here.")
        $iSize = FileGetSize($sFileName)
    Until ($iSize / 1048576) > 10
EndIf

; start a timer
$iTimer = TimerInit()

; what I want to add to the beginning of each line
$sPrePend = "1234"

$hFile = FileOpen($sFileName, 0)
$hNewFile = FileOpen($sNewFileName, 2) ; 2 = write mode, erasing any previous output
While 1
    $sLine = FileReadLine($hFile)
    If @error Then ExitLoop ; -1 = end of file, 1 = read error (e.g. the file could not be opened)
    ; ...otherwise, prepend the line and write it to the new file
    $sLine = $sPrePend & $sLine
    FileWriteLine($hNewFile, $sLine)
WEnd
MsgBox(0, "Time this took", TimerDiff($iTimer))
FileClose($hFile)
FileClose($hNewFile)
Edited by danwilli

IIRC, it's even faster to read and write the entire file at once instead of line by line. I will run some tests...

FukuLeaks


  • Moderators

I would normally do it through an array, but it seems (at 10 MB anyway) that danwilli's method is a skosh faster (1.22 s versus 1.5 s for the array).

;Assuming file already exists

#include <File.au3>

Local $aArray

$sFileName = @ScriptDir & "\File.txt"
$sNewFileName = @ScriptDir & "\NewFile.txt"

; start a timer
$iTimer = TimerInit()

; what I want to add to the beginning of each line
$sPrePend = "1234"
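$hNewFile = FileOpen($sNewFileName, 2) ; 2 = write mode, erasing any previous output

; Reads the whole file into memory; this is the step that fails on very large files
If Not _FileReadToArray($sFileName, $aArray) Then
    MsgBox(16, "Error", "_FileReadToArray failed, @error = " & @error)
    Exit
EndIf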

For $i = 1 To $aArray[0]
    FileWriteLine($hNewFile, $sPrePend & $aArray[$i])
Next

MsgBox(0, "Time this took", TimerDiff($iTimer))

FileClose($hNewFile)

"Profanity is the last vestige of the feeble mind. For the man who cannot express himself forcibly through intellect must do so through shock and awe" - Spencer W. Kimball

How to get your question answered on this forum!


;http://www.autoitscript.com/forum/topic/153108-prepending-each-line-of-a-large-text-file/
;Post #2
;by danwilli


$sFileName = @ScriptDir & "\File.txt"
$sNewFileName = @ScriptDir & "\NewFile.txt"

; If a test file doesn't exist, create one that's 10mb
If FileExists($sFileName) = 0 Then
    Do
        FileWriteLine($sFileName, "Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here.")
        $iSize = FileGetSize($sFileName)
    Until ($iSize / 1048576) > 10
EndIf

; start a timer
$iTimer = TimerInit()

; what I want to add to the beginning of each line
$sPrePend = "1234"

$hFile = FileOpen($sFileName, 0)
$hNewFile = FileOpen($sNewFileName, 1)
While 1
    $sLine = FileReadLine($hFile)
    If @error = -1 Then ExitLoop; If eof, exitloop...
    ; ...otherwise, prepend the line and write it to the new file
    $sLine = $sPrePend & $sLine
    FileWriteLine($hNewFile, $sLine)
WEnd
MsgBox(0, "Time this took", TimerDiff($iTimer))
FileClose($hFile)
FileClose($hNewFile)

; Method 2: read the whole file at once

; start a timer
$iTimer = TimerInit()

; what I want to add to the beginning of each line
$sPrePend = "1234"

$hFile = FileOpen($sFileName, 0)
$hNewFile = FileOpen($sNewFileName, 2)
; read the entire file into one string
$sText = FileRead($hFile)
; prefix the first line and the start of every line after a @CRLF; the file's final @CRLF
; also gets a prefix, so trim StringLen($sPrePend) characters to keep the trailing line break
; (assumes the file ends with @CRLF, as FileWriteLine() leaves it)
FileWrite($hNewFile, StringTrimRight($sPrePend & StringReplace($sText, @CRLF, @CRLF & $sPrePend), StringLen($sPrePend)))
MsgBox(0, "Time this took", TimerDiff($iTimer))
FileClose($hFile)
FileClose($hNewFile)


  • Moderators

On my test machine, that script takes much longer than either danwilli's original or the array method (2.589 s).


On my test machine, that script takes much longer than either the original posted by danwilli, or the array method (2.589)

.

Hm, not on mine. And the thread owner mentioned that the array method is not viable.


  • Moderators

Not to hijack the thread too much, but what OS are you running it on? I get the following results (ms) on Win7 x64:

Danwilli's: 1141.31422588465

Mine: removed, I missed his statement about memory allocation errors

Yours: 2542.42430448358


OP has a working solution, so hijack away :P

We should be testing against a 500 MB+ file though, as the OP only used 10 MB for a quick repro.

.

I wonder if StringReplace can handle a 500 MB file?
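For a 500 MB benchmark, the 10 MB generator used in the scripts above is slow, because FileWriteLine() with a filename reopens the file for every line and FileGetSize() runs on every pass. A sketch of a faster generator, assuming any repeated text is good enough as test data: the helper name _MakeTestFile is hypothetical (not from the thread or a UDF), and it writes whole lines through one open handle in blocks of roughly 1-2 MB.

```autoit
; Hypothetical helper (not from the thread): writes whole copies of $sLine & @CRLF to
; $sPath until the file is at least $iMB megabytes, through one handle in ~1-2 MB writes.
; Returns the file size in bytes, or sets @error = 1 if the file cannot be opened.
Func _MakeTestFile($sPath, $iMB, $sLine = "Lots of text goes here. Lots of text goes here.")
    Local $sBlock = $sLine & @CRLF
    ; Double the block until it holds at least 1 MB of complete lines
    While StringLen($sBlock) < 1048576
        $sBlock &= $sBlock
    WEnd
    ; Enough whole blocks to reach the requested size
    Local $iWrites = Ceiling($iMB * 1048576 / StringLen($sBlock))

    Local $hFile = FileOpen($sPath, 2) ; 2 = write, erasing any previous contents
    If $hFile = -1 Then Return SetError(1, 0, 0)
    For $i = 1 To $iWrites
        FileWrite($hFile, $sBlock)
    Next
    FileClose($hFile)
    Return FileGetSize($sPath)
EndFunc   ;==>_MakeTestFile
```

For the 500 MB case that would be _MakeTestFile(@ScriptDir & "\File.txt", 500); the file ends up slightly larger than requested because only whole blocks are written.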


On a 500 MB file:

Line by line: < 3 MB memory, 37 seconds

Whole file: memory grew to 2 GB, then "Error allocating memory"

There is probably a quicker way to do this in chunks, perhaps with StringRegExpReplace, but 37 seconds for 500 MB seems reasonable.
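The chunked idea could be sketched like this. The helper _PrependLinesChunked is hypothetical (not from the thread or from AutoIt's UDFs): it reads a fixed number of characters at a time with FileRead($hFile, $iCount), processes only the complete lines in each chunk, and carries the cut-off start of the last line into the next chunk, so memory should stay roughly proportional to the chunk size. It uses StringReplace on @LF, like the whole-file version above, rather than StringRegExpReplace, and assumes lines end in @CRLF or @LF. Whether it actually beats the 37-second line-by-line loop is untested.

```autoit
; Hypothetical helper: prepends $sPrePend to every line of $sIn and writes the result to $sOut,
; reading $iChunk characters at a time so memory use stays bounded.
; Assumes lines end in @CRLF or @LF; lone @CR line endings are not handled.
; Returns 1 on success, or sets @error = 1 if either file cannot be opened.
Func _PrependLinesChunked($sIn, $sOut, $sPrePend, $iChunk = 16777216) ; 16 MB per read is an arbitrary choice
    Local $hIn = FileOpen($sIn, 0) ; 0 = read
    If $hIn = -1 Then Return SetError(1, 0, 0)
    Local $hOut = FileOpen($sOut, 2) ; 2 = write, erasing any previous contents
    If $hOut = -1 Then
        FileClose($hIn)
        Return SetError(1, 0, 0)
    EndIf

    Local $sCarry = "" ; start of a line that was cut off at the end of the previous chunk
    Local $sChunk, $iErr, $sBuf, $iLast, $sDone
    While 1
        $sChunk = FileRead($hIn, $iChunk)
        $iErr = @error ; -1 = end of file reached
        If StringLen($sChunk) > 0 Then
            $sBuf = $sCarry & $sChunk
            ; Only process up to the last complete line; keep the rest for the next chunk
            $iLast = StringInStr($sBuf, @LF, 1, -1)
            If $iLast = 0 Then
                $sCarry = $sBuf
            Else
                $sCarry = StringMid($sBuf, $iLast + 1)
                $sDone = StringLeft($sBuf, $iLast - 1) ; complete lines without the final @LF
                ; Prefix the first line and the line after every remaining @LF, then restore the final @LF
                FileWrite($hOut, $sPrePend & StringReplace($sDone, @LF, @LF & $sPrePend) & @LF)
            EndIf
        EndIf
        If $iErr Or StringLen($sChunk) = 0 Then ExitLoop
    WEnd
    ; A last line without a trailing line break still gets the prefix
    If StringLen($sCarry) > 0 Then FileWrite($hOut, $sPrePend & $sCarry)

    FileClose($hIn)
    FileClose($hOut)
    Return 1
EndFunc   ;==>_PrependLinesChunked
```

Usage would be _PrependLinesChunked(@ScriptDir & "\File.txt", @ScriptDir & "\NewFile.txt", "1234"). A larger $iChunk means fewer reads but more memory per pass; StringReplace on a 16 MB string is a different trade-off from one huge replace on the whole file.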

 

Tested 100 MB: 17,704 ms for FileReadLine, 16,291 ms for FileRead.

What was the memory usage of the readfile method?

Edited by danwilli

On a 500mb file

line: <3mb memory, 37 seconds

file: inflated to 2GB memory until "Error allocating memory"

there is probably a way to do this in chunks with perhaps StringRegExpReplace that might be quicker, but 37 seconds for 500mb seems reasonable.

 

What was the memory usage of the readfile method?

.

Whoa!

For the 100 MB file: about 4 MB with line-by-line reading, 400-600 MB with FileRead.


Since this is hijacked, another question:

What do IniRead/IniWrite do? Do they also open and close the file for each call?


  • Moderators

Edano,

I believe so - how would it work otherwise? But as you normally only read and write once per script, it is not much of an overhead. ;)

And as ini files are limited to 32kb per section, there is no problem at all when compared to a 500MB file! :D
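If a script does need many keys, the built-in IniReadSection() and IniWriteSection() handle a whole section per call instead of one key per call. A minimal sketch; the file name, section and keys are invented for illustration.

```autoit
; Example only: the file, section and key names are made up
Local $sIni = @TempDir & "\example.ini"
FileDelete($sIni) ; start from a clean file

; Write two key=value pairs in one call (pairs in a string are separated by @LF)
IniWriteSection($sIni, "Settings", "Width=800" & @LF & "Height=600")

; Read the whole section in one call: [0][0] is the count, [n][0] the key, [n][1] the value
Local $aSection = IniReadSection($sIni, "Settings")
If @error Then
    ConsoleWrite("Section not found" & @CRLF)
Else
    For $i = 1 To $aSection[0][0]
        ConsoleWrite($aSection[$i][0] & " = " & $aSection[$i][1] & @CRLF)
    Next
EndIf
```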

M23

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind

My UDFs:

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ---------- Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 


Gary Frost gave me this once:

.

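; Reads a whole INI section with one FileRead() and regular expressions. On return $Key and
; $Val are 1-based arrays with the count in element [0]; returns the count, or sets @error = 1
; if the section is not found. $y is used in the pattern as-is, so escape regex metacharacters.
Func _IniReadSection($x, $y, ByRef $Key, ByRef $Val)
    ; Pad the text so the first and last sections match like any other
    Local $t = StringRegExp(@CRLF & FileRead($x) & @CRLF & "[", "(?s)(?i)\n\s*\[\s*" & $y & "\s*\]\s*\r\n(.*?)\[", 3)
    If @error Then Return SetError(1, 0, 0)
    Local $aK = StringRegExp(@LF & $t[0], "\n\s*(.*?)\s*=", 3)   ; text before "=" on each line
    Local $aV = StringRegExp(@LF & $t[0], "\n\s*.*?\s*=(.*?)\r", 3) ; text after "=" up to the line break

    ; Copy into 1-based arrays with the count in [0], like the built-in IniReadSection()
    Local $iCount = UBound($aK), $i_key[$iCount + 1], $i_val[$iCount + 1]
    $i_key[0] = $iCount
    $i_val[0] = $iCount
    For $i = 0 To $iCount - 1
        $i_key[$i + 1] = $aK[$i]
        $i_val[$i + 1] = $aV[$i]
    Next
    ; Hand the arrays back through the ByRef parameters
    $Key = $i_key
    $Val = $i_val
    Return $iCount
EndFunc   ;==>_IniReadSection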


  • Moderators

How does that solve the OP's problem? He wants to prepend the new data to every line.

