
Split a text file into multiple files



Hello Everybody,

I'm new here, and I hope to learn and share ideas around here.

First, I'm not an AutoIt programmer yet, and my problem is simple: I need to split a text file into multiple files (based on the number of lines),

and output something like

text.txt

to

text_1.txt

text_2.txt

text_3.txt

and so on.

I searched Google and this whole forum for a way to do it, but I didn't find any solution at all. I'm trying to code something here, but without success yet:

$file = "file.txt"
$hFile = FileOpen($file,0)
$sRead = FileRead($hFile)
FileClose($hFile)

Local $a = StringSplit($sRead, "|")
For $element In $a
    ConsoleWrite($element & @CRLF)
Next

Can someone guide me?

Many Thanks Folks!!


You need to be a lot more specific on the criteria of the split.

_FileReadToArray() will split the file into individual lines, and the [0] element will hold the total number of lines in the file as well.
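A minimal sketch of that suggestion, assuming a hypothetical input file named test.txt in the script's folder. It only shows how the array returned by _FileReadToArray() is laid out; the variable names are illustrative.

```autoit
#include <File.au3>

; "test.txt" is a hypothetical input file in the script's folder
Local $aLines
If Not _FileReadToArray(@ScriptDir & "\test.txt", $aLines) Then
    ConsoleWrite("Could not read the file" & @CRLF)
    Exit 1
EndIf

; element [0] holds the line count, elements [1]..[n] hold the lines
ConsoleWrite("Lines in file: " & $aLines[0] & @CRLF)
For $i = 1 To $aLines[0]
    ConsoleWrite($i & ": " & $aLines[$i] & @CRLF)
Next
```

Once the lines are in an array like this, the split criterion (every N lines, on a marker line, and so on) is just a matter of how the loop walks the array.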

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.


...I need to split a text file into multiple files (based on the number of lines)...

If you promise not to laugh too hard - you can see one way to do it:
$lines_per_output_file = 10

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;generate a fake input file for this test
$junk = ""
For $i = 1 To 100
    $junk = $junk & $i & @CRLF
Next

$file = FileOpen("test.txt", 2)
FileWrite($file, $junk)
FileClose($file)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;read and split file into an array
$array_of_whole_file = StringSplit(FileRead("test.txt"), @CRLF, 1)

$filecnt = 1
$linecnt = 1
While 1
    ;open a numbered output file
    FileOpen("test_" & $filecnt & ".txt", 2)

    ;write x number of lines to that file
    For $i = 1 To $lines_per_output_file
        FileWriteLine("test_" & $filecnt & ".txt", $array_of_whole_file[$linecnt])
        $linecnt = $linecnt + 1
        If $array_of_whole_file[0] = $linecnt Then
            FileClose("test_" & $filecnt & ".txt")
            Exit
        EndIf
    Next
    FileClose("test_" & $filecnt & ".txt")
    $filecnt = $filecnt + 1
WEnd
...but SmOke_N is right - there is more than one way to interpret your request.

Edit1: Doh! - Welcome to the forums!

Edit2: added comments to my messy code
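For comparison, here is a hedged sketch of the same chunking idea that keeps the handle returned by FileOpen() and passes it to FileWriteLine() and FileClose(). When FileWriteLine() is given a filename instead of a handle, the file is opened and closed on every call, which is slower on large files. The variable names and the test.txt input are assumptions, not part of the code above.

```autoit
#include <File.au3>

Local $sInput = @ScriptDir & "\test.txt" ; hypothetical input file
Local $iLinesPerFile = 10
Local $aLines

If Not _FileReadToArray($sInput, $aLines) Then Exit 1

Local $iFileCnt = 0, $hOut = -1
For $i = 1 To $aLines[0]
    ; start a new numbered output file every $iLinesPerFile lines
    If Mod($i - 1, $iLinesPerFile) = 0 Then
        If $hOut <> -1 Then FileClose($hOut)
        $iFileCnt += 1
        $hOut = FileOpen(@ScriptDir & "\test_" & $iFileCnt & ".txt", 2)
        If $hOut = -1 Then Exit 2
    EndIf
    FileWriteLine($hOut, $aLines[$i])
Next

; close the last (possibly shorter) output file
If $hOut <> -1 Then FileClose($hOut)
```

Driving the loop from the line index also means a final chunk shorter than $iLinesPerFile is still written and closed.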

Edited by herewasplato


...and output something

lol, you know that when the word "something" is used, the request isn't specific enough...

Here is what I'll assume you meant though, since SmOke_N brought it up:

#include <File.au3>

Dim $FilePath = "c:\path\test.txt", $BaseName = "test_", $Lines
_FileReadToArray($FilePath, $Lines)
For $i = 1 To $Lines[0]
    ; one output file per input line: test_1.txt, test_2.txt, ...
    $File = FileOpen(@ScriptDir & "\" & $BaseName & $i & ".txt", 2)
    FileWrite($File, $Lines[$i])
    FileClose($File)
Next

This will make a new file for every line of the base file.

However, I doubt that's what you meant...

You need to be a lot more specific on the criteria of the split.

_FileReadToArray() will split the file into individual lines, and [0] element will have the total number of lines in the file as well.

SmOke_N, I guess the StringSplit() call in herewasplato's code above does the same thing.

I confess, the "something" wasn't clear. Thank you herewasplato, it did the trick!

The fact is, I have 20000 lines to save in chunks of 2000 each, and herewasplato's code did the trick.

The only weird thing I found was that the AVG antivirus resident shield (latest version, on Vista) was blocking the script from creating the new files. The processor was hitting 100% (on the antivirus process instead of the AutoIt one) and the files were never saved. Even after compiling the script to an EXE, the problem remained.

After closing it several times and disabling it, the script began to create the files.

This is off topic and I don't know if it will help AutoIt programmers, but some people may use this antivirus (the most widely used free one, I guess) and be unable to run programs created with AutoIt because of this issue.

Anyway, thank you guys for the help, you rock!

Edited by mite

...Even after compiling the script to an EXE, the problem remained...

Lucky you - you are being protected from yourself :-)

My guess is that the Heuristics function in AVG sees the rapid creation of multiple files with similar names as a bad thing.

You can try to:

Compile the script without the UPX compression.*

List the compiled script as an exception - not to be monitored by AVG.

*Start > Programs > AutoIt v3 > Compile Script to .exe

Select "Compression" from the GUI's menu bar

Uncheck "UPX Compress .exe stub"

[The resulting file will be about twice the size it would be if compressed, but this will lessen the chance that an anti-virus product will decide to quarantine all compiled scripts at some point in the future. AVG, AVAST, Symantec and many other AV products have all done this at some point in time.]

I'm glad that the code worked for you... Enjoy AutoIt :-)


Hello herewasplato,

Thanks, I checked here and AVG is fine now!

Yes, I love AutoIt!

Just one more thing, now about performance.

I found a little old tool for splitting files called csplit.exe (it seems to be a UNIX split command-line tool converted to Win32): http://man.root.cz/1/csplit/

It doesn't support UTF-16 as AutoIt does, but when the text is in ASCII it works just fine.

The point is, this tool splits the text in 1-2 seconds, while the AutoIt script does it in 7-10 seconds. I think it is possible to optimize your script to run a little bit faster.

What do you think: instead of using FileWriteLine(), create a temporary array with the split content and save it once using FileWrite().

Or slice out the part of the array between where the chunk begins and where it ends. Fewer loops would make it faster.
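A rough sketch of the first idea, under stated assumptions: each chunk is collected in a string (rather than a temporary array) and written with a single FileWrite() call per output file. The test.txt input, the output names, and all variable names are hypothetical, and whether this is actually faster would need to be measured.

```autoit
Local $iLinesPerFile = 2000
; same split as in the thread: flag 1 treats @CRLF as one delimiter
Local $aLines = StringSplit(FileRead(@ScriptDir & "\test.txt"), @CRLF, 1)

; ignore the empty last element left by a trailing @CRLF
Local $iTotal = $aLines[0]
If $iTotal > 0 And $aLines[$iTotal] == "" Then $iTotal -= 1

Local $iFileCnt = 0, $sChunk = "", $hOut
For $i = 1 To $iTotal
    $sChunk &= $aLines[$i] & @CRLF
    ; flush when the chunk is full or the input is exhausted
    If Mod($i, $iLinesPerFile) = 0 Or $i = $iTotal Then
        $iFileCnt += 1
        $hOut = FileOpen(@ScriptDir & "\test_" & $iFileCnt & ".txt", 2)
        If $hOut = -1 Then Exit 1
        FileWrite($hOut, $sChunk)
        FileClose($hOut)
        $sChunk = ""
    EndIf
Next
```

With 20000 input lines and 2000 lines per file this performs ten file writes in total instead of one write per line.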

I am an ActionScript and PHP programmer, not an AutoIt programmer, so I am not sure how to do things like that yet.

Do you think that would make it faster?

I'm trying to do it right now; if I finish, I'll post the script here. You saved my day today.

Thanks again!

Edited by mite

SmOke_N, I guess this StringSplit() as used below did the same thing.

Well, you can do it a ton of ways really... _FileReadToArray() does in fact use StringSplit :) ... you being new, I figured I'd give you the simple means.

Here's just another example (pseudo example as I haven't tested it) ... to show you what I meant.

#include <Array.au3> ; needed for _ArrayDisplay()

$a_each_element_is_the_file_to_write = _SplitFileToArray("output.txt", 2000)
_ArrayDisplay($a_each_element_is_the_file_to_write)

Func _SplitFileToArray($s_file, $n_split_lines)

    If FileExists($s_file) = 0 Then Return SetError(1, 0, 0)

    Local $s_read = FileRead($s_file)
    ; each alternative matches up to $n_split_lines lines for one line-ending style
    Local $a_sre = StringRegExp($s_read, "(?s)" & _
            "((?:.+?(?:\z|\r\n)){0," & $n_split_lines & "}|" & _
            "(?:.+?(?:\z|\n)){0," & $n_split_lines & "}|" & _
            "(?:.+?(?:\z|\r)){0," & $n_split_lines & "})", 3)

    If IsArray($a_sre) = 0 Then Return SetError(2, 0, 0)

    ; drop the last element of the match array
    Local $n_ubound = UBound($a_sre) - 1
    ReDim $a_sre[$n_ubound]

    Return $a_sre
EndFunc

Edit:

Interestingly enough, in StringRegExp, {number,number} reads left to right as minimum to maximum, the max here being $n_split_lines. I decided to test the above. It seems that 789 is the max allowed... anything over that failed.

Edit2:

Doesn't seem to be an AutoIt limitation but a PCRE one, I just replicated it in another language as well, even changing the oveccount doesn't help.

Edited by SmOke_N


...What do you think: instead of using FileWriteLine(), create a temporary array with the split content and save it once using FileWrite().

Or slice out the part of the array between where the chunk begins and where it ends. Fewer loops would make it faster....

I actually thought of several ways to code it, but I was having a bad coding day. I could not concentrate on what I was supposed to be doing - so, I came to the forum. I had a hard time writing the code above as it was... I kept barfing the syntax.

If I hit another lull, I'll attempt to code it better/faster. I'm not a programmer by training or trade... so I might never arrive at the best approach and I'm not so good at error checking. Using the _FileReadToArray() UDF will handle the split better than the one line that I threw in. Take a look at the code. It is on line 175, in this file: C:\Program Files\AutoIt3\Include\File.au3 in a typical install of v3.2.12.0. You can also make use of the error checking that is built into that UDF.

If you decide to keep this part:

If $array_of_whole_file[0] = $linecnt Then
    FileClose("test_" & $filecnt & ".txt")
    Exit
EndIf

You can change it to:

If $array_of_whole_file[0] = $linecnt Then Exit

The help file says that AutoIt will close all open files upon exiting (but hints that it is better to code the close)... but a single line "If" is quicker than "If/EndIf". If you want to "code the close" - look in the help file for OnAutoItExit.
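A small sketch of "coding the close" with an exit handler, assuming a newer AutoIt release where OnAutoItExitRegister() is available (in v3.2.12.0 the equivalent was a function literally named OnAutoItExit). The file name and the _CloseOutput function are hypothetical.

```autoit
; hypothetical output file; the handle is global so the exit handler can see it
Global $g_hOut = FileOpen(@ScriptDir & "\test_1.txt", 2)
If $g_hOut = -1 Then Exit 1

; register the cleanup function; AutoIt calls it when the script exits
OnAutoItExitRegister("_CloseOutput")

FileWriteLine($g_hOut, "first line")
Exit ; _CloseOutput() still runs here

Func _CloseOutput()
    FileClose($g_hOut)
EndFunc
```

This keeps the one-line "If ... Then Exit" inside the loop while still closing the file explicitly.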

Lines like this:

$filecnt = $filecnt + 1

should be

$filecnt += 1

for speed... I just never remember that syntax.

I often wonder how those in this forum that are real programmers put up with seeing code like mine - perhaps it is through sheer moral fortitude that they don't go postal on me.

...later...

Edit: WYSI not WYG

Edited by herewasplato


Hey Guys,

I'll check your new posts tomorrow because I am tired today after 18 hours of coding :)

This is the current code I am working with. It takes its values from the command line (mode, filename, encoding, number of lines)...

It also takes the content from the clipboard, which is the reason I'm not using _FileReadToArray().

In tests, it saved 100,000 lines into 25 files in approximately 45 seconds.

I'm happy with this result and I won't need more than that (100,000).

I am too lazy to add comments.

If $CmdLine[1] = "simple" Then
    ; write the clipboard to a single file; $CmdLine[3] is added to write mode 2
    $a = FileOpen($CmdLine[2], Number($CmdLine[3]) + 2)
    $ok = FileWrite($a, ClipGet())
    FileClose($a)
    ConsoleWrite($ok)
EndIf

If $CmdLine[1] = "split" Then

    $lines = Number($CmdLine[4])

    $file = StringSplit(ClipGet(), @CRLF, 1)

    $filecnt = 1
    $linecnt = 1
    While 1
        ; each output file wraps its lines in a <node> element
        $handle = FileOpen($CmdLine[2] & $filecnt & ".xml", Number($CmdLine[3]) + 2)
        FileWriteLine($handle, "<node>")

        For $i = 1 To $lines
            FileWriteLine($handle, $file[$linecnt])
            $linecnt = $linecnt + 1
            If $file[0] = $linecnt Then
                FileWriteLine($handle, "</node>")
                FileClose($handle)
                ConsoleWrite($filecnt)
                Exit
            EndIf
        Next
        FileWriteLine($handle, "</node>")
        FileClose($handle)
        $filecnt = $filecnt + 1
    WEnd

EndIf
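A short sketch of how the FileOpen() mode in the script above is put together: the write mode (2) is added to an encoding flag passed on the command line. The file name and the chosen encoding value below are hypothetical stand-ins for $CmdLine[2] and $CmdLine[3].

```autoit
; hypothetical values standing in for $CmdLine[2] and $CmdLine[3]
Local $sOutFile = @ScriptDir & "\chunk_1.xml"
Local $iEncoding = 128 ; 32 = UTF-16 LE, 64 = UTF-16 BE, 128 = UTF-8 with BOM

; 2 = open for writing and erase any previous content
Local $hOut = FileOpen($sOutFile, 2 + $iEncoding)
If $hOut = -1 Then Exit 1

FileWriteLine($hOut, "<node>")
FileWriteLine($hOut, "</node>")
FileClose($hOut)
```

Passing 0 as the encoding value leaves plain write mode, which matches the ASCII case mentioned earlier in the thread.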

See ya


Hey SmOke_N,

Very interesting. I don't understand StringRegExp very well, but I'll check the documentation.


I decided not to change the code for now. Since my project is very big, I can't lose time on performance issues right now. The most important thing is that it works, isn't it?

After everything is finished, I'll return to improve that.

Thanks for your help, and good luck with your code too. Actually, I'm not a programmer by training either.
