Jump to content

SPLIT .CSV


PureNewb
 Share

Recommended Posts

Hello,I have a very large .csv file that i want to split but not by lines or size(kb,mb...) like https://www.gdgsoft.com/gsplit/  but according to the starting character of lines.
I am a little newb on autoit and i tried  find something in forum but i didnt find something about that parametre splitting.
I found only that script in this post to splitting by lines but not good in my case,also i dont know how to work with StringRegExp so any help would be blessing.

#include <File.au3>


$sFilePath = @ScriptDir & "\test.csv"
SplitCSVtoNewFile($sFilePath, 50)

Func SplitCSVtoNewFile($sFilePath, $iLine)
    Local $aCSV_Lines = StringSplit(StringStripCR(FileRead($sFilePath)), @LF, $STR_CHRSPLIT)
    If @error Then Return SetError(1, 0, 0)
    Local $i = 1, $c = 1, $sLine, $hFile, $d = StringLen(UBound($aCSV_Lines))
    Do
        $sLine &= $aCSV_Lines[$i] & @CRLF
        If Not Mod($i, $iLine) Then
            $hFile = FileOpen(StringTrimRight($sFilePath, 4) & StringFormat("%0" & $d & "i", $c) & ".csv", $FO_OVERWRITE)
            FileWrite($hFile, StringTrimRight($sLine, 2))
            FileClose($hFile)
            $sLine = ""
            $c += 1
        EndIf
        $i += 1
    Until $i = $aCSV_Lines[0]
    If $sLine <> "" Then
        $hFile = FileOpen(StringTrimRight($sFilePath, 4) & StringFormat("%0" & $d & "i", $c) & ".csv", $FO_OVERWRITE)
        FileWrite($hFile, StringTrimRight($sLine, 2))
        FileClose($hFile)
    EndIf
EndFunc

 


EXAMPLE OF CSV FILE :

What i want is to split in to 2 files,the file of September and the second one the file of October.So the script should read every line and find all the lines that starts with the $variable of September,then copy those lines and create a file with that name(September) and "paste" them.Then the same again for the $variable of October.

.CSV File
September,1
September,2
September,3
September,4
September,5
September,6
September,7
September,8
September,9
September,10
September,11
September,12
September,13
September,14
September,15
September,16
September,17
September,18
September,19
September,20
September,21
September,22
September,23
September,24
September,25
September,26
September,27
September,28
September,29
September,30
October,1
October,2
October,3
October,4
October,5
October,6
October,7
October,8
October,9
October,10
October,11
October,12
October,13
October,14
October,15
October,16
October,17
October,18
October,19
October,20
October,21
October,22
October,23
October,24
October,25
October,26
October,27
October,28
October,29
October,30

EXAMPLE OF RESULT FILES

1) File "September.csv"
September,1
September,2
September,3
September,4
September,5
September,6
September,7
September,8
September,9
September,10
September,11
September,12
September,13
September,14
September,15
September,16
September,17
September,18
September,19
September,20
September,21
September,22
September,23
September,24
September,25
September,26
September,27
September,28
September,29
September,30

2) File "October.csv"
October,1
October,2
October,3
October,4
October,5
October,6
October,7
October,8
October,9
October,10
October,11
October,12
October,13
October,14
October,15
October,16
October,17
October,18
October,19
October,20
October,21
October,22
October,23
October,24
October,25
October,26
October,27
October,28
October,29
October,30

 

Edited by Melba23
Added code tags
Link to comment
Share on other sites

?

$sFilePath = @ScriptDir & "\test.csv"

SplitCSVtoNewFile($sFilePath, "September")

Func SplitCSVtoNewFile($sFilePath, $var)
    Local $content = FileRead($sFilePath)
    $new = StringRegExpReplace($content, '(?m)^\Q' & $var & '\E.*\R?(*SKIP)(*F)|^.*\R?', "")
    $hFile = FileOpen(@ScriptDir & "\" & $var & ".csv", 2)
    FileWrite($hFile, $new)
    FileClose($hFile)
EndFunc

This SRER deletes all lines which don't begin with the wanted starting characters  :)

Edited by mikell
Link to comment
Share on other sites

  • Moderators

PureNewb,

When you post code or large text dumps please use Code tags - see here how to do it.  Then you get a scrolling box and syntax colouring as you can see above now I have added the tags.

M23

 

Public_Domain.png.2d871819fcb9957cf44f4514551a2935.png Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind

Open spoiler to see my UDFs:

Spoiler

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 

Link to comment
Share on other sites

1 hour ago, Melba23 said:

PureNewb,

When you post code or large text dumps please use Code tags - see here how to do it.  Then you get a scrolling box and syntax colouring as you can see above now I have added the tags.

M23

 

ok

 

3 hours ago, mikell said:

?

$sFilePath = @ScriptDir & "\test.csv"

SplitCSVtoNewFile($sFilePath, "September")

Func SplitCSVtoNewFile($sFilePath, $var)
    Local $content = FileRead($sFilePath)
    $new = StringRegExpReplace($content, '(?m)^\Q' & $var & '\E.*\R?(*SKIP)(*F)|^.*\R?', "")
    $hFile = FileOpen(@ScriptDir & "\" & $var & ".csv", 2)
    FileWrite($hFile, $new)
    FileClose($hFile)
EndFunc

This SRER deletes all lines which don't begin with the wanted starting characters  :)

also can i get the list of starting characters(September,October...etc) somehow?

Link to comment
Share on other sites

Sure. Just extract all the first elements to an array and then use _ArrayUnique
The example below will work if these elements don't include a comma (which is the delimiter)

$content = FileRead($sFilePath)
$a = StringRegExp($content, '(?m)^[^,]+', 3)
$b = _ArrayUnique($a)
_ArrayDisplay($b)

 

2 hours ago, PureNewb said:

 please use Code tags

Melba, I'm sorry. I should have mentioned this

Link to comment
Share on other sites

Perhaps this might be useful: _ArrayToSubItemCSV()

It produces a 1D array of (2D) CSV results which can then be written to separate files.

Edit: looking again at your request; you stated that you only want to specify the starting character, so this won't work because it splits on exact full matches within the specified column. I'm not sure how you had intended to deal with january, June and July: which all begin with the same letter. My function would be overkill in this case because, to solve your problem, you only need to split the csv without the need to search for matching (out of sequence) entries. @mikell's solution is probably best.

Edited by czardas
Link to comment
Share on other sites

16 hours ago, mikell said:

Sure. Just extract all the first elements to an array and then use _ArrayUnique
The example below will work if these elements don't include a comma (which is the delimiter)

$content = FileRead($sFilePath)
$a = StringRegExp($content, '(?m)^[^,]+', 3)
$b = _ArrayUnique($a)
_ArrayDisplay($b)

 

Melba, I'm sorry. I should have mentioned this

oh thanks again,it seems everything easy for you :P

 

8 hours ago, czardas said:

Perhaps this might be useful: _ArrayToSubItemCSV()

It produces a 1D array of (2D) CSV results which can then be written to separate files.

Edit: looking again at your request; you stated that you only want to specify the starting character, so this won't work because it splits on exact full matches within the specified column. I'm not sure how you had intended to deal with january, June and July: which all begin with the same letter. My function would be overkill in this case because, to solve your problem, you only need to split the csv without the need to search for matching (out of sequence) entries. @mikell's solution is probably best.

thanks for replay but as you said @mikell solution matching exactly in my needs.about january concept i will put the whole word"january".

Link to comment
Share on other sites

  • 1 month later...
On 11/27/2017 at 6:21 PM, mikell said:

?

$sFilePath = @ScriptDir & "\test.csv"

SplitCSVtoNewFile($sFilePath, "September")

Func SplitCSVtoNewFile($sFilePath, $var)
    Local $content = FileRead($sFilePath)
    $new = StringRegExpReplace($content, '(?m)^\Q' & $var & '\E.*\R?(*SKIP)(*F)|^.*\R?', "")
    $hFile = FileOpen(@ScriptDir & "\" & $var & ".csv", 2)
    FileWrite($hFile, $new)
    FileClose($hFile)
EndFunc

This SRER deletes all lines which don't begin with the wanted starting characters  :)

Hello,and what if the csv file is 5gb and giving an error ram "Error allocating memory."?

Link to comment
Share on other sites

Get more ram. 😋

File editor’s face this problem too when files become too large. What you have to do is you have to kind of build it so it scrolls through the file chunks at a time not the whole file at once. 

Edited by Earthshine

My resources are limited. You must ask the right questions

 

Link to comment
Share on other sites

  • Developers

Just change the process to read a record at a time. This will take longer but will be independent of the filesize.

Jos

SciTE4AutoIt3 Full installer Download page   - Beta files       Read before posting     How to post scriptsource   Forum etiquette  Forum Rules 
 
Live for the present,
Dream of the future,
Learn from the past.
  :)

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...