Text file splitter by line



I have a text file with more than half a million lines, and I want to split it at a given line.

I wrote a simple script, shown below. It works for small files but not for big files. What gives?

The original text file is around 500 MB, but my computer has ample free memory.

$file = FileOpen("C:\CD8M.csv",0)

$i = 1
$oldstring = ""
$newstring = ""
$j = 50000 ; the line where to split

While 1
    $line = FileReadLine($file, $i)
    If @error = -1 Then ExitLoop
    If $i < $j Then
        $oldstring = $oldstring & $line & @CRLF
    Else
        $newstring = $newstring & $line & @CRLF
    EndIf
    $i = $i + 1
WEnd

FileClose($file)


$file = FileOpen("C:\first.csv",1)
    FileWrite($file,$oldstring)
FileClose($file)

$file = FileOpen("C:\second.csv",1)
    FileWrite($file,$newstring)
FileClose($file)

Nah, I saw it there too. Mods are fast.

OK, good... wasn't too sure about myself for a second there.

Anyway, since this is my third post in this thread, I decided it should be helpful.

I found this in the help file, which might be part of your problem:

From a performance standpoint, it is a bad idea to read line by line with a "line" parameter whose value increments by one. This forces AutoIt to reread the file from the beginning until it reaches the specified line.
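In other words, with the line parameter each call rescans the file from the top, so the total work grows with the square of the line count; for half a million lines that is over a hundred billion line scans. Omitting the parameter lets AutoIt carry on from its current position. A minimal sketch of the fast pattern (the path is just a placeholder):

$file = FileOpen("C:\CD8M.csv", 0)

; Slow pattern, forces a rescan from line 1 on every call:
; $line = FileReadLine($file, $i)

; Fast pattern: no line parameter, AutoIt just reads the next line
While 1
    $line = FileReadLine($file)
    If @error = -1 Then ExitLoop
    ; ... process $line here ...
WEnd

FileClose($file)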

I'm guessing there is an AutoIt limit or something here (even though I can't see it mentioned anywhere).

What do you mean when you say it doesn't work? Does AutoIt crash, or does the script execute successfully without producing the desired files?

If you're positive it works on small files and doesn't work on your huge file, then logically there is nothing wrong with your code but rather a limit of either AutoIt or your computer.


Hi,

All of the above is true:

1. Additionally, you can optimise your string building with "&=";

2. It is about 3x faster to split the result of a single FileRead, if you have the memory:

; filelargesplit.au3
#include <File.au3>
Local $file1 = "C:\CD8M.csv"
ConsoleWrite(_FileCountLines($file1) & @LF)

; Method 1: sequential FileReadLine (no line parameter), building two strings
Local $time1 = TimerInit()
Local $file = FileOpen($file1, 0)
Local $i = 1
Local $oldstring = ""
Local $newstring = ""
Local $j = 50000 ; the line where to split
While 1
    $line = FileReadLine($file)
    If @error = -1 Then ExitLoop
    If $i < $j Then
        $oldstring &= $line & @CRLF
    Else
        $newstring &= $line & @CRLF
    EndIf
    $i += 1
WEnd
FileClose($file)
$file = FileOpen(@ScriptDir & "\first.csv", 1)
FileWrite($file, $oldstring)
FileClose($file)
$file = FileOpen(@ScriptDir & "\second.csv", 1)
FileWrite($file, $newstring)
FileClose($file)
ConsoleWrite("filelargesplit.au3=" & Round(TimerDiff($time1)) & " msec" & @LF)
#cs
    606268
    filelargesplit.au3=18325 msec
#ce
;========================================================================
; Method 2: one FileRead, then split the string at the 50000th @CRLF
FileDelete(@ScriptDir & "\first.csv")
FileDelete(@ScriptDir & "\second.csv")
$time1 = TimerInit()
Local $fileRead = FileRead($file1)
Local $i_pos = StringInStr($fileRead, @CRLF, 0, 50000)
FileWrite(@ScriptDir & "\first.csv", StringLeft($fileRead, $i_pos - 1))
FileWrite(@ScriptDir & "\second.csv", StringMid($fileRead, $i_pos + 2))
ConsoleWrite("fileread=" & Round(TimerDiff($time1)) & " msec" & @LF)
#cs
    606268 lines
    filelargesplit.au3=18736 msec
    fileread=5623 msec
#ce
Best, randall

5 weeks later...

What do you mean when you say it doesn't work? Does AutoIt crash, or does the script execute successfully without producing the desired files?

If you're positive it works on small files and doesn't work on your huge file, then logically there is nothing wrong with your code but rather a limit of either AutoIt or your computer.

For small files it gives the correct output. For large files it keeps processing with the CPU at 100%; I waited up to 24 hours before killing the process. I wouldn't say it crashed, I think it is just inefficient coding.

I believe there must be a more efficient way of handling large text files, because freeware programs can split my 300 MB text file within 5 minutes. It isn't a limit of my computer but of the coding.
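A splitter can also stream: read each line sequentially and write it straight to the output, so memory stays flat even for files bigger than RAM. A minimal sketch, reusing the 50000-line split point from this thread (paths are placeholders):

$in = FileOpen("C:\CD8M.csv", 0)                 ; 0 = read mode
$out1 = FileOpen(@ScriptDir & "\first.csv", 2)   ; 2 = write mode, erase existing contents
$out2 = FileOpen(@ScriptDir & "\second.csv", 2)
$i = 1
While 1
    $line = FileReadLine($in)   ; sequential read: no line parameter, no rescan
    If @error = -1 Then ExitLoop
    If $i < 50000 Then
        FileWriteLine($out1, $line)
    Else
        FileWriteLine($out2, $line)
    EndIf
    $i += 1
WEnd
FileClose($in)
FileClose($out1)
FileClose($out2)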

I will try out RandallC's suggestion and report back.

Thanks a lot.

Link to comment
Share on other sites

Thanks @RandallC

$file1 = "C:\largefile.csv"
$i = 50000
$fileRead = FileRead($file1)
$i_pos = StringInStr($fileRead, @CRLF, 0, $i)
FileWrite(@ScriptDir & "\first.csv", StringLeft($fileRead, $i_pos - 1))
FileWrite(@ScriptDir & "\second.csv", StringMid($fileRead, $i_pos + 2))

I split a 150 MB text file in 25 seconds. Now that's a proper implementation of text manipulation.
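The same FileRead/StringInStr approach also generalises to more than two pieces by walking the @CRLF occurrences in steps of the chunk size. A hedged sketch (chunk size, paths, and file names are illustrative, not from this thread):

$file1 = "C:\largefile.csv"
$chunkLines = 50000                 ; illustrative chunk size
$fileRead = FileRead($file1)
$start = 1
$part = 1
While 1
    ; find the @CRLF that ends this chunk (occurrence counted from the start of the string)
    $i_pos = StringInStr($fileRead, @CRLF, 0, $part * $chunkLines)
    If $i_pos = 0 Then ExitLoop     ; fewer than a full chunk remains
    FileWrite(@ScriptDir & "\part" & $part & ".csv", StringMid($fileRead, $start, $i_pos - $start))
    $start = $i_pos + 2             ; skip past the two @CRLF characters
    $part += 1
WEnd
; whatever is left becomes the final part
If $start <= StringLen($fileRead) Then FileWrite(@ScriptDir & "\part" & $part & ".csv", StringMid($fileRead, $start))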

