Posted

Hi all,

I have several large text files, 500 MB or more each, and I need to add a string to the beginning of every line. It's an easy task, but my current approach takes a long time. I would love to use _FileReadToArray(), but I get memory allocation errors.

Is there a way to make this quicker? Here's my sample code.

Thank you!

$sFileName = @ScriptDir & "\File.txt"
$sNewFileName = @ScriptDir & "\NewFile.txt"

; If a test file doesn't exist, create one that's 10mb
If FileExists($sFileName) = 0 Then
    Do
        FileWriteLine($sFileName, "Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here.")
        $iSize = FileGetSize($sFileName)
    Until ($iSize / 1048576) > 10
EndIf

; start a timer
$iTimer = TimerInit()

; what I want to add to the beginning of each line
$sPrePend = "1234"

$hFile = FileOpen($sFileName, 0)
While 1
    $sLine = FileReadLine($hFile)
    If @error = -1 Then ExitLoop; If eof, exitloop...
    ; ...otherwise, prepend the line and write it to the new file
    $sLine = $sPrePend & $sLine
    FileWriteLine($sNewFileName, $sLine)
WEnd
MsgBox(0, "Time this took", TimerDiff($iTimer))
FileClose($hFile)
Posted (edited)

Open the new file for writing once and pass the handle to FileWriteLine() instead of a filename.

When given a filename, each call to FileWriteLine() does a FileOpen(), a write, and a FileClose(), which is far more expensive than writing through a handle that is already open.

$sFileName = @ScriptDir & "\File.txt"
$sNewFileName = @ScriptDir & "\NewFile.txt"

; If a test file doesn't exist, create one that's 10mb
If FileExists($sFileName) = 0 Then
    Do
        FileWriteLine($sFileName, "Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here.")
        $iSize = FileGetSize($sFileName)
    Until ($iSize / 1048576) > 10
EndIf

; start a timer
$iTimer = TimerInit()

; what I want to add to the beginning of each line
$sPrePend = "1234"

$hFile = FileOpen($sFileName, 0)
$hNewFile = FileOpen($sNewFileName, 1)
While 1
    $sLine = FileReadLine($hFile)
    If @error = -1 Then ExitLoop; If eof, exitloop...
    ; ...otherwise, prepend the line and write it to the new file
    $sLine = $sPrePend & $sLine
    FileWriteLine($hNewFile, $sLine)
WEnd
MsgBox(0, "Time this took", TimerDiff($iTimer))
FileClose($hFile)
FileClose($hNewFile)
Edited by danwilli
Posted

iirc, it's even faster to read/write the entire file instead of reading/writing it line by line. will make some tests...

FukuLeaks

  • Moderators
Posted

I would normally do it through an array, but it seems (at 10 MB anyway) that danwilli's method is a skosh faster (1.22 s versus 1.5 s).

;Assuming file already exists

#include <File.au3>

Local $aArray

$sFileName = @ScriptDir & "\File.txt"
$sNewFileName = @ScriptDir & "\NewFile.txt"

; start a timer
$iTimer = TimerInit()

; what I want to add to the beginning of each line
$sPrePend = "1234"
$hNewFile = FileOpen($sNewFileName, 1)

_FileReadToArray($sFileName, $aArray)

For $i = 1 To $aArray[0]
    FileWriteLine($hNewFile, $sPrePend & $aArray[$i])
Next

MsgBox(0, "Time this took", TimerDiff($iTimer))

FileClose($hNewFile)

"Profanity is the last vestige of the feeble mind. For the man who cannot express himself forcibly through intellect must do so through shock and awe" - Spencer W. Kimball

How to get your question answered on this forum!

Posted

;http://www.autoitscript.com/forum/topic/153108-prepending-each-line-of-a-large-text-file/
;Post #2
;D:\DOKUME~1\ADMINI~1\LOKALE~1\Temp\SLICER\Avatar\photo-thumb-22566.jpg
;by danwilli

;Script grabbed by SLICER by Edano here: http://www.autoitscript.com/forum/topic/152402-slicer-autoit-forum-script-grabber/?p=1093575

$sFileName = @ScriptDir & "\File.txt"
$sNewFileName = @ScriptDir & "\NewFile.txt"

; If a test file doesn't exist, create one that's 10mb
If FileExists($sFileName) = 0 Then
    Do
        FileWriteLine($sFileName, "Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here. Lots of text goes here.")
        $iSize = FileGetSize($sFileName)
    Until ($iSize / 1048576) > 10
EndIf

; start a timer
$iTimer = TimerInit()

; what I want to add to the beginning of each line
$sPrePend = "1234"

$hFile = FileOpen($sFileName, 0)
$hNewFile = FileOpen($sNewFileName, 1)
While 1
    $sLine = FileReadLine($hFile)
    If @error = -1 Then ExitLoop; If eof, exitloop...
    ; ...otherwise, prepend the line and write it to the new file
    $sLine = $sPrePend & $sLine
    FileWriteLine($hNewFile, $sLine)
WEnd
MsgBox(0, "Time this took", TimerDiff($iTimer))
FileClose($hFile)
FileClose($hNewFile)

; start a timer
$iTimer = TimerInit()

; what I want to add to the beginning of each line
$sPrePend = "1234"

$hFile = FileOpen($sFileName, 0)
$hNewFile = FileOpen($sNewFileName, 2)
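; Read the whole file in one go, prefix every line with a single StringReplace(),
; and write the result back out with one FileWrite().
$sText = FileRead($hFile)
; The replace leaves an extra @CRLF & prefix after the last line (assuming the file
; ends with @CRLF), so trim that many characters off the end.
FileWrite($hNewFile, StringTrimRight($sPrePend & StringReplace($sText, @CRLF, @CRLF & $sPrePend), StringLen($sPrePend) + 2))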
MsgBox(0, "Time this took", TimerDiff($iTimer))
FileClose($hFile)
FileClose($hNewFile)


Posted

On my test machine, that script takes much longer than either the original posted by danwilli, or the array method (2.589)

.

hm, not on mine. and the threadowner mentioned that the array method is not viable.


  • Moderators
Posted

Not to hijack the thread too much, but what OS are you running it on? I get the following results on WIN7 x64:

Danwilli's:

   1141.31422588465 ms

Mine:

   Removed - I missed his statement about memory allocation errors

Yours:

   2542.42430448358 ms


Posted

winxp sp3

line: 1190 ms

file 1150 ms

but a series of trials show that it's pretty much equal


Posted

OP has a working solution, so hijack away :P

We should be testing against a 500mb+ file though, as the OP only used 10mb for a quick repro.

.

i wonder if stringreplace can do a 500 mb file ?
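For anyone who wants to repeat the timings at the larger size, here is a rough sketch of a generator that writes the test file through a single open handle rather than reopening it for every line. The file name `BigFile.txt` and the 500 MB target are placeholders, and the size estimate assumes one byte per character.

```autoit
; Sketch: build a test file of roughly $iTargetMB megabytes through one open handle.
Local $sTestFile = @ScriptDir & "\BigFile.txt" ; hypothetical file name
Local $iTargetMB = 500
Local $sLine = "Lots of text goes here. Lots of text goes here. Lots of text goes here."

; Estimate the line count up front instead of calling FileGetSize() in the loop.
; +2 accounts for the @CRLF that FileWriteLine() appends (assumes 1 byte per character).
Local $iLineBytes = StringLen($sLine) + 2
Local $iLines = Ceiling($iTargetMB * 1048576 / $iLineBytes)

Local $hFile = FileOpen($sTestFile, 2) ; write mode, erase previous contents
If $hFile = -1 Then Exit 1

For $i = 1 To $iLines
    FileWriteLine($hFile, $sLine)
Next
FileClose($hFile)
```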


Posted

tested 100 mb: 17,704 ms for readline, 16,291 ms for readfile.


Posted (edited)

On a 500mb file

line: <3mb memory, 37 seconds

file: inflated to 2GB memory until "Error allocating memory"

there is probably a way to do this in chunks with perhaps StringRegExpReplace that might be quicker, but 37 seconds for 500mb seems reasonable.
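A chunked version might look something like the sketch below. It uses plain StringReplace() rather than StringRegExpReplace(), has not been timed against the line-by-line loop, and assumes the file uses @CRLF line endings. The 4 MB chunk size is an arbitrary starting point, not a tuned value.

```autoit
; Sketch: prepend $sPrePend to every line, processing the file in fixed-size chunks.
Local $sPrePend = "1234"
Local $iChunk = 4 * 1024 * 1024 ; characters per read (arbitrary)

Local $hIn = FileOpen(@ScriptDir & "\File.txt", 0)     ; read mode
Local $hOut = FileOpen(@ScriptDir & "\NewFile.txt", 2) ; write mode, erase previous contents
If $hIn = -1 Or $hOut = -1 Then Exit 1

Local $sCarry = "" ; incomplete line left over from the previous chunk
Local $sBuf, $iLastLF

While 1
    $sBuf = FileRead($hIn, $iChunk)
    If StringLen($sBuf) = 0 Then ExitLoop ; nothing more to read (end of file or read error)
    $sBuf = $sCarry & $sBuf

    ; Cut after the last line feed so neither a line nor a CR/LF pair is split
    $iLastLF = StringInStr($sBuf, @LF, 1, -1)
    If $iLastLF = 0 Then
        $sCarry = $sBuf ; no complete line yet, keep reading
        ContinueLoop
    EndIf
    $sCarry = StringMid($sBuf, $iLastLF + 1)
    $sBuf = StringLeft($sBuf, $iLastLF)

    ; Prefix the first line, then every line that follows a @CRLF.
    ; The replace also adds one prefix after the final @CRLF, so trim it off again.
    $sBuf = $sPrePend & StringReplace($sBuf, @CRLF, @CRLF & $sPrePend, 0, 1)
    FileWrite($hOut, StringTrimRight($sBuf, StringLen($sPrePend)))
WEnd

; A last line without a trailing line break is still sitting in $sCarry
If StringLen($sCarry) > 0 Then FileWrite($hOut, $sPrePend & $sCarry)

FileClose($hIn)
FileClose($hOut)
```

Memory use should stay at a few multiples of the chunk size, since only one chunk and its prefixed copy are held at a time.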

 

tested 100 mb: 17,704 ms for readline, 16,291 ms for readfile.

What was the memory usage of the readfile method?

Edited by danwilli
Posted

hui !

4 mb on the lineread

400-600 mb on the fileread

for the 100 mb file


Posted

since this is hijacked, another question :

what do IniRead/IniWrite do? do they also open and close the file for each call?


  • Moderators
Posted

Edano,

I believe so - how would it work otherwise? But as you normally only read and write once per script it is not too much of an overhead. ;)

And as ini files are limited to 32kb per section, there is no problem at all when compared to a 500MB file! :D

M23
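If a script does need many values from the same section, the built-in IniReadSection() fetches them in one call instead of one IniRead() per key. A minimal sketch, where `settings.ini` and the section name `General` are made up for illustration:

```autoit
; Sketch: read a whole section in one call instead of one IniRead() per key.
Local $sIni = @ScriptDir & "\settings.ini" ; hypothetical file
Local $aSection = IniReadSection($sIni, "General") ; hypothetical section name

If @error Then
    ConsoleWrite("Section not found or file unreadable" & @CRLF)
Else
    ; $aSection[0][0] holds the number of key/value pairs
    For $i = 1 To $aSection[0][0]
        ConsoleWrite($aSection[$i][0] & " = " & $aSection[$i][1] & @CRLF)
    Next
EndIf
```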

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind

My UDFs:

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 

Posted

gary frost gave me this once:

.

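Func _IniReadSection($x, $y, ByRef $Key, ByRef $Val)
    ; Grab the body of section [$y] from file $x with a single FileRead()
    Local $t = StringRegExp(@CRLF & FileRead($x) & @CRLF & "[", "(?s)(?i)\n\s*\[\s*" & $y & "\s*\]\s*\r\n(.*?)\[", 3)
    If @error Then Return SetError(1, 0, 0) ; section not found
    $Key = StringRegExp(@LF & $t[0], "\n\s*(.*?)\s*=", 3)
    $Val = StringRegExp(@LF & $t[0], "\n\s*.*?\s*=(.*?)\r", 3)

    ; Build 1-based copies with the element count in [0]
    Local $iCount = UBound($Key), $i_key[$iCount + 1], $i_val[$iCount + 1]
    $i_key[0] = $iCount
    $i_val[0] = $iCount
    For $i = 0 To $iCount - 1
        $i_key[$i + 1] = $Key[$i]
        $i_val[$i + 1] = $Val[$i]
    Next
    ; Assumption: the counted arrays were meant to be handed back to the caller
    $Key = $i_key
    $Val = $i_val
EndFunc   ;==>_IniReadSection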

