Jump to content
PureNewb

SPLIT .CSV

Recommended Posts

PureNewb

Hello,I have a very large .csv file that i want to split but not by lines or size(kb,mb...) like https://www.gdgsoft.com/gsplit/  but according to the starting character of lines.
I am a little newb on autoit and i tried  find something in forum but i didnt find something about that parametre splitting.
I found only that script in this post to splitting by lines but not good in my case,also i dont know how to work with StringRegExp so any help would be blessing.

#include <File.au3>


$sFilePath = @ScriptDir & "\test.csv"
SplitCSVtoNewFile($sFilePath, 50)

Func SplitCSVtoNewFile($sFilePath, $iLine)
    Local $aCSV_Lines = StringSplit(StringStripCR(FileRead($sFilePath)), @LF, $STR_CHRSPLIT)
    If @error Then Return SetError(1, 0, 0)
    Local $i = 1, $c = 1, $sLine, $hFile, $d = StringLen(UBound($aCSV_Lines))
    Do
        $sLine &= $aCSV_Lines[$i] & @CRLF
        If Not Mod($i, $iLine) Then
            $hFile = FileOpen(StringTrimRight($sFilePath, 4) & StringFormat("%0" & $d & "i", $c) & ".csv", $FO_OVERWRITE)
            FileWrite($hFile, StringTrimRight($sLine, 2))
            FileClose($hFile)
            $sLine = ""
            $c += 1
        EndIf
        $i += 1
    Until $i = $aCSV_Lines[0]
    If $sLine <> "" Then
        $hFile = FileOpen(StringTrimRight($sFilePath, 4) & StringFormat("%0" & $d & "i", $c) & ".csv", $FO_OVERWRITE)
        FileWrite($hFile, StringTrimRight($sLine, 2))
        FileClose($hFile)
    EndIf
EndFunc

 


EXAMPLE OF CSV FILE :

What i want is to split in to 2 files,the file of September and the second one the file of October.So the script should read every line and find all the lines that starts with the $variable of September,then copy those lines and create a file with that name(September) and "paste" them.Then the same again for the $variable of October.

.CSV File
September,1
September,2
September,3
September,4
September,5
September,6
September,7
September,8
September,9
September,10
September,11
September,12
September,13
September,14
September,15
September,16
September,17
September,18
September,19
September,20
September,21
September,22
September,23
September,24
September,25
September,26
September,27
September,28
September,29
September,30
October,1
October,2
October,3
October,4
October,5
October,6
October,7
October,8
October,9
October,10
October,11
October,12
October,13
October,14
October,15
October,16
October,17
October,18
October,19
October,20
October,21
October,22
October,23
October,24
October,25
October,26
October,27
October,28
October,29
October,30

EXAMPLE OF RESULT FILES

1) File "September.csv"
September,1
September,2
September,3
September,4
September,5
September,6
September,7
September,8
September,9
September,10
September,11
September,12
September,13
September,14
September,15
September,16
September,17
September,18
September,19
September,20
September,21
September,22
September,23
September,24
September,25
September,26
September,27
September,28
September,29
September,30

2) File "October.csv"
October,1
October,2
October,3
October,4
October,5
October,6
October,7
October,8
October,9
October,10
October,11
October,12
October,13
October,14
October,15
October,16
October,17
October,18
October,19
October,20
October,21
October,22
October,23
October,24
October,25
October,26
October,27
October,28
October,29
October,30

 

Edited by Melba23
Added code tags

Share this post


Link to post
Share on other sites
mikell

?

$sFilePath = @ScriptDir & "\test.csv"

SplitCSVtoNewFile($sFilePath, "September")

Func SplitCSVtoNewFile($sFilePath, $var)
    Local $content = FileRead($sFilePath)
    $new = StringRegExpReplace($content, '(?m)^\Q' & $var & '\E.*\R?(*SKIP)(*F)|^.*\R?', "")
    $hFile = FileOpen(@ScriptDir & "\" & $var & ".csv", 2)
    FileWrite($hFile, $new)
    FileClose($hFile)
EndFunc

This SRER deletes all lines which don't begin with the wanted starting characters  :)

Edited by mikell
  • Like 1

Share this post


Link to post
Share on other sites
PureNewb

AWESOME!
Thank u!

Share this post


Link to post
Share on other sites
Melba23

PureNewb,

When you post code or large text dumps please use Code tags - see here how to do it.  Then you get a scrolling box and syntax colouring as you can see above now I have added the tags.

M23

 


Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind._______My UDFs:

Spoiler

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 

Share this post


Link to post
Share on other sites
PureNewb
1 hour ago, Melba23 said:

PureNewb,

When you post code or large text dumps please use Code tags - see here how to do it.  Then you get a scrolling box and syntax colouring as you can see above now I have added the tags.

M23

 

ok

 

3 hours ago, mikell said:

?

$sFilePath = @ScriptDir & "\test.csv"

SplitCSVtoNewFile($sFilePath, "September")

Func SplitCSVtoNewFile($sFilePath, $var)
    Local $content = FileRead($sFilePath)
    $new = StringRegExpReplace($content, '(?m)^\Q' & $var & '\E.*\R?(*SKIP)(*F)|^.*\R?', "")
    $hFile = FileOpen(@ScriptDir & "\" & $var & ".csv", 2)
    FileWrite($hFile, $new)
    FileClose($hFile)
EndFunc

This SRER deletes all lines which don't begin with the wanted starting characters  :)

also can i get the list of starting characters(September,October...etc) somehow?

Share this post


Link to post
Share on other sites
mikell

Sure. Just extract all the first elements to an array and then use _ArrayUnique
The example below will work if these elements don't include a comma (which is the delimiter)

$content = FileRead($sFilePath)
$a = StringRegExp($content, '(?m)^[^,]+', 3)
$b = _ArrayUnique($a)
_ArrayDisplay($b)

 

2 hours ago, PureNewb said:

 please use Code tags

Melba, I'm sorry. I should have mentioned this

Share this post


Link to post
Share on other sites
czardas

Perhaps this might be useful: _ArrayToSubItemCSV()

It produces a 1D array of (2D) CSV results which can then be written to separate files.

Edit: looking again at your request; you stated that you only want to specify the starting character, so this won't work because it splits on exact full matches within the specified column. I'm not sure how you had intended to deal with january, June and July: which all begin with the same letter. My function would be overkill in this case because, to solve your problem, you only need to split the csv without the need to search for matching (out of sequence) entries. @mikell's solution is probably best.

Edited by czardas

Share this post


Link to post
Share on other sites
PureNewb
16 hours ago, mikell said:

Sure. Just extract all the first elements to an array and then use _ArrayUnique
The example below will work if these elements don't include a comma (which is the delimiter)

$content = FileRead($sFilePath)
$a = StringRegExp($content, '(?m)^[^,]+', 3)
$b = _ArrayUnique($a)
_ArrayDisplay($b)

 

Melba, I'm sorry. I should have mentioned this

oh thanks again,it seems everything easy for you :P

 

8 hours ago, czardas said:

Perhaps this might be useful: _ArrayToSubItemCSV()

It produces a 1D array of (2D) CSV results which can then be written to separate files.

Edit: looking again at your request; you stated that you only want to specify the starting character, so this won't work because it splits on exact full matches within the specified column. I'm not sure how you had intended to deal with january, June and July: which all begin with the same letter. My function would be overkill in this case because, to solve your problem, you only need to split the csv without the need to search for matching (out of sequence) entries. @mikell's solution is probably best.

thanks for replay but as you said @mikell solution matching exactly in my needs.about january concept i will put the whole word"january".

  • Like 1

Share this post


Link to post
Share on other sites
PureNewb
On 11/27/2017 at 6:21 PM, mikell said:

?

$sFilePath = @ScriptDir & "\test.csv"

SplitCSVtoNewFile($sFilePath, "September")

Func SplitCSVtoNewFile($sFilePath, $var)
    Local $content = FileRead($sFilePath)
    $new = StringRegExpReplace($content, '(?m)^\Q' & $var & '\E.*\R?(*SKIP)(*F)|^.*\R?', "")
    $hFile = FileOpen(@ScriptDir & "\" & $var & ".csv", 2)
    FileWrite($hFile, $new)
    FileClose($hFile)
EndFunc

This SRER deletes all lines which don't begin with the wanted starting characters  :)

Hello,and what if the csv file is 5gb and giving an error ram "Error allocating memory."?

Share this post


Link to post
Share on other sites
Earthshine

Get more ram. 😋

File editor’s face this problem too when files become too large. What you have to do is you have to kind of build it so it scrolls through the file chunks at a time not the whole file at once. 

Edited by Earthshine

My resources are limited. You must ask the right questions

 

Share this post


Link to post
Share on other sites
Earthshine

I like my buffered approach better but that’s just me. I’m sure you could read a chunk of it into an array process it read the next chalk process it


My resources are limited. You must ask the right questions

 

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×