SiteMaze

Text file splitter by line

9 posts in this topic

I have a text file with more than half a million lines, and I want to split it into two files at a given line.

I wrote a simple script, shown below. It works for small files but not for big ones. What gives?

The original text file is around 500 MB, but my computer has ample free memory.

$file = FileOpen("C:\CD8M.csv",0)

$i = 1
$oldstring = ""
$newstring = ""
$j = 50000 ; the line where to split

While 1
    $line = FileReadLine($file, $i)
    If @error = -1 Then ExitLoop
    If $i < $j Then
        $oldstring = $oldstring & $line & @CRLF
    Else
        $newstring = $newstring & $line & @CRLF
    EndIf
    $i = $i + 1
WEnd

FileClose($file)


$file = FileOpen("C:\first.csv",1)
    FileWrite($file,$oldstring)
FileClose($file)

$file = FileOpen("C:\second.csv",1)
    FileWrite($file,$newstring)
FileClose($file)




Well, either they moved it or I'm going crazy and it was always in General Help and Support.

Nah, I saw it there too. Mods are fast.


#6 ·  Posted (edited)

Nah, I saw it there too. Mods are fast.

Ok good... Wasn't too sure about myself for a second there.

Anyways, since this is my third post in this thread I decided it should be helpful.

I found this in the helpfile which might be part of your problem:

From a performance standpoint it is a bad idea to read line by line specifying a "line" parameter whose value is incrementing by one. This forces AutoIt to reread the file from the beginning until it reaches the specified line.

I'm guessing that there is an AutoIt limit or something here (even though it's not mentioned that I can see).

What do you mean when you say it doesn't work? Does AutoIt crash or does the script execute successfully and not produce the desired files?

If you're positive it works on small files and it doesn't work on your huge file, then logically there is nothing wrong with your code, but rather a limit to either AutoIt or your computer.
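That help-file note points at the likely culprit: reading line $i of an N-line file costs O($i), so the whole loop is O(N²). A minimal sketch of the implied fix (reusing the OP's path) is to drop the line parameter so each FileReadLine continues from where the previous call stopped:

```autoit
; Sequential read: each FileReadLine call continues where the last
; one stopped, so the loop is O(N) instead of O(N^2).
Local $file = FileOpen("C:\CD8M.csv", 0)
Local $line
While 1
    $line = FileReadLine($file) ; no line index
    If @error = -1 Then ExitLoop
    ; ... process $line ...
WEnd
FileClose($file)
```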

Edited by Piano_Man



#7 ·  Posted (edited)

Hi,

All of the above is true;

1. Additionally, you can optimise your string concatenation with "&=";

2. It is about 3x faster to work on a single FileRead (StringSplit or StringInStr), if you have the memory:

; filelargesplit.au3
#include <File.au3>
Local $file1 = "C:\CD8M.csv" ; the large file to split
ConsoleWrite(_FileCountLines($file1) & @LF)

Local $time1 = TimerInit()
Local $file = FileOpen($file1, 0)
Local $line, $i = 1
Local $oldstring = ""
Local $newstring = ""
Local $j = 50000 ; the line where to split
While 1
    $line = FileReadLine($file) ; no line parameter - sequential read
    If @error = -1 Then ExitLoop
    If $i < $j Then
        $oldstring &= $line & @CRLF
    Else
        $newstring &= $line & @CRLF
    EndIf
    $i += 1
WEnd
FileClose($file)
$file = FileOpen(@ScriptDir & "\first.csv", 1)
FileWrite($file, $oldstring)
FileClose($file)
$file = FileOpen(@ScriptDir & "\second.csv", 1)
FileWrite($file, $newstring)
FileClose($file)
ConsoleWrite("filelargesplit.au3=" & Round(TimerDiff($time1)) & " msec" & @LF)
#cs
    606268 lines
    filelargesplit.au3=18325 msec
#ce
;========================================================================
FileDelete(@ScriptDir & "\first.csv")
FileDelete(@ScriptDir & "\second.csv")
$time1 = TimerInit()
Local $fileRead = FileRead($file1) ; read the whole file into memory
Local $i_pos = StringInStr($fileRead, @CRLF, 0, 50000) ; 50,000th @CRLF
FileWrite(@ScriptDir & "\first.csv", StringLeft($fileRead, $i_pos - 1))
FileWrite(@ScriptDir & "\second.csv", StringMid($fileRead, $i_pos + 2))
ConsoleWrite("fileread=" & Round(TimerDiff($time1)) & " msec" & @LF)
#cs
    606268 lines
    filelargesplit.au3=18736 msec
    fileread=5623 msec
#ce
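For completeness, the StringSplit variant mentioned in point 2 (the timed code above locates the split with StringInStr instead) might look like this sketch, assuming the file fits in memory and uses @CRLF line endings:

```autoit
; Sketch of the StringSplit approach: split the whole file into an
; array of lines, then rebuild the two halves.
Local $aLines = StringSplit(FileRead("C:\CD8M.csv"), @CRLF, 1) ; flag 1 = split on the whole @CRLF
Local $sFirst = "", $sSecond = ""
For $n = 1 To $aLines[0] ; $aLines[0] holds the line count
    If $n < 50000 Then
        $sFirst &= $aLines[$n] & @CRLF
    Else
        $sSecond &= $aLines[$n] & @CRLF
    EndIf
Next
FileWrite(@ScriptDir & "\first.csv", $sFirst)
FileWrite(@ScriptDir & "\second.csv", $sSecond)
```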
Best, randall

Edited by randallc


What do you mean when you say it doesn't work? Does AutoIt crash or does the script execute successfully and not produce the desired files?

If you're positive it works on small files and it doesn't work on your huge file, then logically there is nothing wrong with your code, but rather a limit to either AutoIt or your computer.

For small files, it gives the correct output. For large files, it keeps on processing with the CPU at 100%. I waited up to 24 hours before ending the process. I wouldn't say it crashed; I think it is just inefficient coding.

I believe there must be a more efficient way of handling large text files, because freeware programs can split my 300 MB text file within 5 minutes. It shouldn't be a limit of my computer, but rather of inefficient coding.

I will try out randallc's suggestion and report back.

Thanks a lot.


#9 ·  Posted (edited)

Thanks @RandallC

$file1 = "C:\largefile.csv"
$i = 50000 ; the line where to split
$fileRead = FileRead($file1) ; read the whole file into memory
$i_pos = StringInStr($fileRead, @CRLF, 0, $i) ; position of the $i-th @CRLF
FileWrite(@ScriptDir & "\first.csv", StringLeft($fileRead, $i_pos - 1))
FileWrite(@ScriptDir & "\second.csv", StringMid($fileRead, $i_pos + 2)) ; skip the @CRLF itself

I split a 150 MB text file in 25 seconds. That's a proper implementation of text manipulation.

Edited by SiteMaze

