Jump to content

Manipulating a CSV after import to array


Chimaera
 Share

Recommended Posts

I had a chat with the man and he says i cant save the data with excel as it damages phone numbers in the csv.. formatting etc

Another bit i dont understand is this bit

2009-09-03 00:00:00.000

So

dd/mm/yyyy hh:mm:ss.???  but what is the last set for milliseconds?

@czardas moving the width doesn't change it and after save it ends up 00.00.00 no date etc

Am i thinking about this the wrong way i wonder?

Shouldnt all items between the "," be treated as a string? then AutoIt would deal with it properly when imported.

Edited by Chimaera
Link to comment
Share on other sites

  • Replies 56
  • Created
  • Last Reply

Top Posters In This Topic

Top Posters In This Topic

Posted Images

Yes i opened it with excel and saved it etc and it didnt work for me

I used this in excel

dd/mm/yyyy hh:mm:ss.000

 

Do you mean with your code?

Edited by Chimaera
Link to comment
Share on other sites

works for me:

#include <excel.au3>
$oExcel = _ExcelBookNew()
_ExcelWriteCell ($oExcel, "2013-10-15 00:00:00.000", 1, 1)
MsgBox(1,1,1)
$oExcel.Columns("A:A").Select
$oExcel.Selection.NumberFormat = "m/d/yy;@"

excel representation of the 'date' before message box: 00:00.000

after messagebox

10/15/13

Edited by jdelaney
IEbyXPATH-Grab IE DOM objects by XPATH IEscriptRecord-Makings of an IE script recorder ExcelFromXML-Create Excel docs without excel installed GetAllWindowControls-Output all control data on a given window.
Link to comment
Share on other sites

I had a chat with the man and he says i cant save the data with excel as it damages phone numbers in the csv.. formatting etc .........

.......... @czardas moving the width doesn't change it and after save it ends up 00.00.00 no date etc

 

I'm quite taken aback by these statements. If I were Microsoft this would be a huge embarrassment. They create all these amazing features but still have to make simple things so complicated.

When I use csv I always load the data into an array and run all queries, mathematical formulas etc... using AutoIt.

Link to comment
Share on other sites

I'm quite taken aback by these statements. If I were Microsoft this would be a huge embarrassment. They create all these amazing features but still have to make simple things so complicated.

When I use csv I always load the data into an array and run all queries, mathematical formulas etc... using AutoIt.

As i was trying to do with this Autoit shows the dates properly in the array window but they cant be used as the column to sort by

Im going to have to do a load of testing like the code jdelaney posted and also revisit the different scripts including yours to see if there are differences that can be exploited

If i can process the script using a different udf then i may get different results..

Time will tell

Link to comment
Share on other sites

I can write a date sort for you with that format. Give me an hour or two because I have to raid the supermarket first. ;)

Hang on a minute, what is the format? The array display image you posted looks ready to sort on column 0 without any formatting required at all.

As i was trying to do with this Autoit shows the dates properly in the array window but they cant be used as the column to sort by

 

Why not?

Edited by czardas
Link to comment
Share on other sites

Because when it accesses the rest of the code

It doesn't split the files if you select that column with the dates in

it works away for the sort of right amount of time then exits and there is no splitting of files other than the header

#Region ;**** Directives created by AutoIt3Wrapper_GUI ****
#AutoIt3Wrapper_res_requestedExecutionLevel=requireAdministrator
#AutoIt3Wrapper_Icon=compile\chimaera_red.ico
#AutoIt3Wrapper_Outfile=Simple CSV Split.exe
#AutoIt3Wrapper_UseUpx=n
#AutoIt3Wrapper_Run_Obfuscator=y
#EndRegion ;**** Directives created by AutoIt3Wrapper_GUI ****
#cs ----------------------------------------------------------------------------
    AutoIt Version: 3.3.8.0
    Author:         Chimaera
    Script Function: Simple CSV Splitter
#ce ----------------------------------------------------------------------------
#include <GUIConstantsEx.au3>
#include <WindowsConstants.au3>
#include <StaticConstants.au3>

#include <Array.au3>
; ------------------------------------------------------------------------------
#AutoIt3Wrapper_AU3Check_Parameters=-d -w 1 -w 2 -w 3 -w- 4 -w 5 -w 6 -w- 7
; ------------------------------------------------------------------------------
Global $SplitButton, $OpenButton, $OpenLabel, $label1, $input1, $ColumnRead, $FilterRead, $label2, $input2, $sProgress
; ------------------------------------------------------------------------------
Local $GUI_Start = GUICreate(" Simple CSV Split", 500, 300, -1, -1, BitXOR($GUI_SS_DEFAULT_GUI, $WS_MINIMIZEBOX))

$OpenButton = GUICtrlCreateButton(" Open CSV ", 20, 20, 130, 35)
GUICtrlSetFont(-1, 11, 550, -1, "Tahoma")
$OpenLabel = GUICtrlCreateLabel("No File Selected", 160, 30, 350, 60, $SS_LEFT)
GUICtrlSetFont(-1, 11, 550, -1, "Tahoma")
$label1 = GUICtrlCreateLabel("Choose Column For Split", 20, 80, 240)
GUICtrlSetFont(-1, 11, 550, -1, "Tahoma")
$input1 = GUICtrlCreateInput("", 240, 77, 60, 25)
GUICtrlSetFont(-1, 11, 550, -1, "Tahoma")
$label2 = GUICtrlCreateLabel("Choose Column For Sort", 20, 120, 240)
GUICtrlSetFont(-1, 11, 550, -1, "Tahoma")
$input2 = GUICtrlCreateInput("", 240, 127, 60, 25)
GUICtrlSetFont(-1, 11, 550, -1, "Tahoma")
$sProgress = GUICtrlCreateProgress(20, 160, 460)
$SplitButton = GUICtrlCreateButton(" Split CSV ", 20, 220, 130, 35)
GUICtrlSetFont(-1, 11, 550, -1, "Tahoma")
GUISetState()

Local $nMsg = 0
While 1
    $nMsg = GUIGetMsg()
    Switch $nMsg
        Case $GUI_EVENT_CLOSE
            Exit
        Case $OpenButton
            Local $sFileOpen = FileOpenDialog("Choose A File To Split.", @ScriptDir & "\", "Csv (*.csv)")
            Local $sSourceSize = FileGetSize( $sFileOpen)
;~          ConsoleWrite($sSourceSize & @crlf)
            Local $sFileSelect = _GetFileName( $sFileOpen)
            Local $sFolderName = StringTrimRight( $sFileSelect, 4)
            If @error = 1 Or $sFileSelect = "" Then
                MsgBox(48, "Cancelled", "Cancelled By User", 4)
                Exit
            Else
                GUICtrlSetData( $OpenLabel, $sFileSelect)
;~              MsgBox(48, "", $sFolderName, 4)
                Global $MainArray = _ParseCSV($sFileOpen, ",")
                _ArrayDisplay($MainArray)
            EndIf
        Case $SplitButton
            $ColumnRead = GUICtrlRead($input1) + 1
            $FilterRead = GUICtrlRead($input2)
            _IsolateSupplier()
    EndSwitch
WEnd

ConsoleWrite(_GetFileName(@ScriptFullPath) & @CRLF)

; Get the filename including the extension of a filepath. Idea from _PathSplit.
Func _GetFileName($sFilePath)
    Return StringTrimLeft($sFilePath, StringInStr($sFilePath, "\", 2, -1))
EndFunc   ;==>_GetFileName

Func _IsolateSupplier()
    Global $aSuppliers, $FindSupplier, $sDestFolder
;~      Local $sDestSize = DirGetSize( $sDestFolder)
        Local $sDestSize = $sSourceSize  * 1.20
        Local $ipct = Int(($sSourceSize / $sDestSize) * 100)
;~      ConsoleWrite($ipct & @CRLF)
    If $ColumnRead <> "" And $FilterRead <> "" Then
        _ArraySort($MainArray, Default, Default, Default, $FilterRead) ; sort by column after split
;~      _ArrayDisplay($MainArray)
        $aSuppliers = _ArrayUnique($MainArray, $ColumnRead)
;~      _ArrayDisplay($aSuppliers)
        $sDestFolder = DirCreate(@ScriptDir & "\" & $sFolderName)
        If IsArray($aSuppliers) Then
            For $i = 1 To $aSuppliers[0]
                If IsArray($aSuppliers) Then
                    $FindSupplier = _IsolateRecord($i)
                    Sleep(100)
                EndIf
            Next
        Else
            MsgBox(64, "Error", "No Cells to Process")
        EndIf
    ElseIf $ColumnRead <> "" And $FilterRead = "" Then
        $aSuppliers = _ArrayUnique($MainArray, $ColumnRead)
;~      _ArrayDisplay($aSuppliers)
        $sDestFolder = DirCreate(@ScriptDir & "\" & $sFolderName)
        If IsArray($aSuppliers) Then
            For $i = 1 To $aSuppliers[0]
                If IsArray($aSuppliers) Then
                    $FindSupplier = _IsolateRecord($i)
                        GUICtrlSetData($sProgress, $ipct)
                        ConsoleWrite($ipct & @CRLF)
                    Sleep(100)
                EndIf
            Next
        Else
            MsgBox(64, "Error", "No Cells to Process")
        EndIf
    EndIf
    MsgBox(64, "All Done", "Splitting Finished")
EndFunc   ;==>_IsolateSupplier

Func _IsolateRecord($i)
    Local $FoundRecords = _ArrayFindAll($MainArray, $aSuppliers[$i], 0, 0, 0, 0, $ColumnRead - 1)
    Local $NewArray[UBound($FoundRecords)][UBound($MainArray, 2)]
    For $x = 0 To UBound($FoundRecords) - 1
        For $y = 0 To UBound($MainArray, 2) - 1
            $NewArray[$x][$y] = $MainArray[$FoundRecords[$x]][$y]
        Next
;~      _ArrayDisplay($NewArray)
        _WriteCSV(@ScriptDir & "\" & $sFolderName & "\" & $aSuppliers[$i] & ".csv", $NewArray, ",")
    Next
EndFunc   ;==>_IsolateRecord

Func _ParseCSV($sFile, $sDelimiters = ',;', $sQuote = '"', $iFormat = 0)
    Local Static $aEncoding[6] = [0, 0, 32, 64, 128, 256]
    If $iFormat < -1 Or $iFormat > 6 Then
        Return SetError(3, 0, 0)
    ElseIf $iFormat > -1 Then
        Local $hFile = FileOpen($sFile, $aEncoding[$iFormat]), $sLine, $aTemp, $aCSV[1], $iReserved, $iCount
        #forceref  $sLine, $aTemp, $aCSV, $iReserved, $iCount
        If @error Then Return SetError(1, @error, 0)
        $sFile = FileRead($hFile)
        FileClose($hFile)
    EndIf
    If $sDelimiters = "" Or IsKeyword($sDelimiters) Then $sDelimiters = ',;'
    If $sQuote = "" Or IsKeyword($sQuote) Then $sQuote = '"'
    $sQuote = StringLeft($sQuote, 1)
    Local $srDelimiters = StringRegExpReplace($sDelimiters, '[\\\^\-\[\]]', '\\\0')
    Local $srQuote = StringRegExpReplace($sQuote, '[\\\^\-\[\]]', '\\\0')
    Local $sPattern = StringReplace(StringReplace('(?m)(?:^|[,])\h*(["](?:[^"]|["]{2})*["]|[^,\r\n]*)(\v+)?', ',', $srDelimiters, 0, 1), '"', $srQuote, 0, 1)
    Local $aREgex = StringRegExp($sFile, $sPattern, 3)
    If @error Then Return SetError(2, @error, 0)
    $sFile = '' ; save memory
    Local $iBound = UBound($aREgex), $iIndex = 0, $iSubBound = 1, $iSub = 0
    Local $aResult[$iBound][$iSubBound]
    For $i = 0 To $iBound - 1
        Select
            Case StringLen($aREgex[$i]) < 3 And StringInStr(@CRLF, $aREgex[$i])
                $iIndex += 1
                $iSub = 0
                ContinueLoop
            Case StringLeft(StringStripWS($aREgex[$i], 1), 1) = $sQuote
                $aREgex[$i] = StringStripWS($aREgex[$i], 3)
                $aResult[$iIndex][$iSub] = StringReplace(StringMid($aREgex[$i], 2, StringLen($aREgex[$i]) - 2), $sQuote & $sQuote, $sQuote, 0, 1)
            Case Else
                $aResult[$iIndex][$iSub] = $aREgex[$i]
        EndSelect
        $aREgex[$i] = 0 ; save memory
        $iSub += 1
        If $iSub = $iSubBound Then
            $iSubBound += 1
            ReDim $aResult[$iBound][$iSubBound]
        EndIf
    Next
    If $iIndex = 0 Then $iIndex = 1
    ReDim $aResult[$iIndex][$iSubBound]
    Return $aResult
EndFunc   ;==>_ParseCSV

Func _WriteCSV($sFile, Const ByRef $aData, $sDelimiter = ',', $sQuote = '"', $iFormat = 0)
    Local Static $aEncoding[6] = [2, 2, 34, 66, 130, 258]
    If $sDelimiter = "" Or IsKeyword($sDelimiter) Then $sDelimiter = ','
    If $sQuote = "" Or IsKeyword($sQuote) Then $sQuote = '"'
    Local $iBound = UBound($aData, 1), $iSubBound = UBound($aData, 2)
    If Not $iSubBound Then Return SetError(2, 0, 0)
    Local $hFile = FileOpen($sFile, $aEncoding[$iFormat])
    If @error Then Return SetError(2, @error, 0)
    For $i = 0 To $iBound - 1
        For $j = 0 To $iSubBound - 1
            FileWrite($hFile, $sQuote & StringReplace($aData[$i][$j], $sQuote, $sQuote & $sQuote, 0, 1) & $sQuote)
            If $j < $iSubBound - 1 Then FileWrite($hFile, $sDelimiter)
        Next
        FileWrite($hFile, @CRLF)
    Next
    FileClose($hFile)
    Return True
EndFunc   ;==>_WriteCSV

Here is the latest version

And a test file

Run the program and select the csv and put zero "0" in the choose column for split box and split it and see what happens for you

Then run it with "1" for the next column and it will work it takes a short while...

Its a work in progress so the progress doesnt work right but you will be able to get the gist

Im beginning to suspect it may have to more to do with the udf's than the excel stuff thats why i need to test different ones

Edited by Chimaera
Link to comment
Share on other sites

Okay I just got back from the supermarket. I figured out the problem while I was shopping. I checked when I got back I was right. The problem is with saving the files. The timestamp uses colon in the format and this character is not allowed in file names.

---------------------------
msgbox
---------------------------
C:Documents and SettingsNickDesktopchimaeraJH_SQL_Nom Acc2009-06-12 00:00:00.000.csv
---------------------------
OK   
---------------------------
 

You need to change the timestamp format to get this to work, however I advise the timestamp be omitted from the split: simply because you will end up with a stupidly large number of csv files containing only one entry. At the moment they are all 00:00:00:000 so the timestamp is irrelevant anyway, but I believe this is an artificial sample. I will now look at the code.

BTW I think it's a good idea of yours and worthwhile spending time on, so I'll try to produce something we can both make use of (this weekend). :)

Edited by czardas
Link to comment
Share on other sites

The timestamp is needed because of tracking down irregularities and pinpointing it to a time and person

thats what they use it for in the checking, i am trying to simplyfy things so he can break it down and chose one particular set of info

per company name or per operator or per timeslot etc etc

and you would end up with 1 file with lots of entrys until the time changes, they use the milliseconds because there can be hundreds? per second transactions

my test file is only a very small taster, i believe some of these csv goto 500,000 lines maybe

Edited by Chimaera
Link to comment
Share on other sites

500,000 x 5 columns is 2.5 million array elements. This will require some testing.

The timestamp can not be used as a file name unless you replace the colon delimiter ( : ) with a compatible file name character. There is no choice in this.

BTW 500,000 miliseconds is just over 8 minutes. I think rounding to the nearest including all transctions per second would make more sense, but still too small a sample (per minute, hour or  day maybe). You don't want to create 500,000 csv files (I think). :think: If you have a lot of transactions at 000 miliseconds then maybe I'm wrong (seems strange). Even at intervals of 1 second with an average of 100 transactions per second, you will be creating 5,000 csv files. You could limit this using a time frame.

I suggest sorting the new arrays by the milisecond time stamp after you split the csv into reasonable time periods.

Edited by czardas
Link to comment
Share on other sites

The transactions are within a MySql dbase and the csv is created from the dbase when its needed maybe days after the problem has occured.

The split only occurs to narrow down the data

Link to comment
Share on other sites

Have a play with this. I have to teach guitar right now so any questions will be answered later. Tested on 500,000 rows similar to the csv you posted. It seems to be working. More work is needed on the time stamp. ATM it just kicks out an error.

Before running the script you will need to replace JH_SQL_Nom Acc.csv with your own csv file. Place the csv file in the same folder as the script.

;

#include <Array.au3>

Local $sFilePath = @ScriptDir & "\JH_SQL_Nom Acc.csv"
Local $iColumn = 1 ; Columns are zero based
Local $sDelimiter = ","
Local $bColHeader = True

Local $hFile = FileOpen($sFilePath)
If $hFile = -1 Then
    MsgBox(0, "", "Unable to open file")
    Exit
EndIf

Local $sCSV = FileRead($hFile)
If @error Then
    MsgBox(0, "", "Unable to read file")
    FileClose($hFile)
    Exit
EndIf
FileClose($hFile)

Local $aCSV = _CSVSplit($sCSV) ; Create the main array

; Create an array of csv strings
Local $aSubItemCSV = _ArrayToSubItemCSV($aCSV, $iColumn, $sDelimiter, $bColHeader)

; Now we write each csv to file
Local $sFolderName = StringTrimRight(StringReplace($sFilePath, @ScriptDir & "\", ""), 4)
Local $sNewPath, $iSuccess

For $i = 0 To UBound($aSubItemCSV) -1
    If StringRegExp($aSubItemCSV[$i][0], '[\/\?<>\\\:\|"]') Then
        MsgBox(0, "Invalid file name", "You can not use the following characters in a file name" & @CRLF & '/ ? < > \ : | "')
        Exit
    EndIf
    $sNewPath = @ScriptDir & "\" & $sFolderName & "\" & $aSubItemCSV[$i][0] & ".csv"
    $hFile = FileOpen($sNewPath, BitOr(8,1))
    If $hFile = -1 Then
        MsgBox(0, "", "Unable to open file")
        Exit
    EndIf

    $iSuccess = FileWrite($hFile, $aSubItemCSV[$i][1])
    If $iSuccess = 0 Then
        MsgBox(0, "", "Unable to write to file")
        Exit
    EndIf
Next


; Function name ==> _ArrayToSubItemCSV
; Returns an array with two columns
; The first column contains the unique item name from the input criteria (see param 2 below)
; The second column contains the csv data ready to write to file

; param 1 = Array to parse (created from a csv)
; param 2 = Input array column used to select unique items from
; param 3 = delimiter for the returned csv strings (default = comma)
; param 4 = Include csv column headers

Func _ArrayToSubItemCSV($aCSV, $iCol, $sDelim = ",", $bHeaders = False)
    If Not IsArray($aCSV) Or UBound($aCSV, 0) <> 2 Then Return SetError(1) ; Incorrect Array

    Local $iBound = UBound($aCSV), $iNumCols = UBound($aCSV, 2)
    If $iBound = 1 Then Return SetError(2) ; Nothing to parse
    If $iCol > $iNumCols Then Return SetError(3) ; $iCol is out of range

    Local $aTempCSV[1][$iNumCols], $iTempIndex, $sTestItem, $iStart = 0

    If $bHeaders Then
        If $iBound = 2 Then Return SetError(2) ; Nothing to parse
        $iStart = 1
    EndIf
    _ArraySort($aCSV, 0, $iStart, 0, $iCol) ; Sort on the selected column

    Local $aSubItemCSV[$iBound][2], $iItems = 0, $sTestItem = Not $aCSV[$iBound -1][$iCol]

    For $i = $iBound -1 To $iStart Step -1
        If $sTestItem <> $aCSV[$i][$iCol] Then ; Start a new csv instance
            If $iItems > 0 Then ; Write to main array
                ReDim $aTempCSV[$iTempIndex][$iNumCols]
                $aSubItemCSV[$iItems -1][0] = $sTestItem
                $aSubItemCSV[$iItems -1][1] = _ArrayToCSV($aTempCSV, $sDelim, @CRLF)
            EndIf

            ReDim $aTempCSV[$iBound][$iNumCols] ; Create new csv template
            $iTempIndex = 0
            $sTestItem = $aCSV[$i][$iCol]

            If $bHeaders Then
                For $j = 0 To $iNumCols -1
                    $aTempCSV[0][$j] = $aCSV[0][$j]
                Next
                $iTempIndex = 1
            EndIf
            $iItems += 1
        EndIf

        For $j = 0 To $iNumCols -1 ; Continue writing to csv
            $aTempCSV[$iTempIndex][$j] = $aCSV[$i][$j]
        Next
        $iTempIndex += 1
    Next
    ReDim $aTempCSV[$iTempIndex][$iNumCols]
    $aSubItemCSV[$iItems -1][0] = $sTestItem
    $aSubItemCSV[$iItems -1][1] = _ArrayToCSV($aTempCSV, $sDelim, @CRLF)

    ReDim $aSubItemCSV[$iItems][2]
    Return $aSubItemCSV
EndFunc

; #FUNCTION# ============================================================================
; Name...........: _CSVSplit
; Description ...: Converts a string in CSV format to a two dimensional array (see comments)
; Syntax.........: _ArrayToCSV ( $aArray [, $sDelim ] )
; Parameters ....: $aArray  - The array to convert
;                  $sDelim  - Optional - Delimiter set to comma by default (see 2nd comment)
; Return values .: Success  - Returns a two dimensional array or a one dimensional array (see 1st comment)
;                  Failure  - Sets @error to:
;                 |@error = 1 - First parameter is not a valid string
;                 |@error = 2 - Second parameter is not a valid string
;                 |@error = 3 - Could not find suitable delimiter replacements
; Author ........: czardas
; Comments ......; Returns a one dimensional array if the input string does not contain the delimiter string
;                ; Some CSV formats use semicolon as a delimiter instead of a comma
;                ; Set the second parameter to @TAB To convert to TSV
; ========================================================================================

Func _CSVSplit($string, $sDelim = ",") ; Parses csv string input and returns a one or two dimensional array
    If (Not IsString($string)) Or ($string = "") Then Return SetError(1, 0, 0) ; Invalid string
    If (Not IsString($sDelim)) Or ($sDelim = "") Then Return SetError(2, 0, 0) ; Invalid string

    $string = StringRegExpReplace($string, "[(\r\n)\r\n]+\z", "") ; [Line Added] Remove training breaks
    Local $iOverride = 255, $asDelim[3] ; $asDelim => replacements for comma, new line and double quote
    For $i = 0 To 2
        $asDelim[$i] = _GetSubstitute($string, $iOverride, $sDelim) ; Choose a suitable substitution character
        If @error Then Return SetError(3, 0, 0) ; String contains too much variety
    Next
    $iOverride = 0

    Local $aArray = StringRegExp($string, '\A[^"]+|("+[^"]+)|"+\z', 3) ; Split string using double quotes delim ... largest match
    $string = ""

    Local $iBound = UBound($aArray)
    For $i = 0 To $iBound -1
        $iOverride += StringInStr($aArray[$i], '"', 0, -1) ; Increment by the number of adjacent double quotes per element
        If Mod ($iOverride +2, 2) = 0 Then ; Acts as an on/off switch
            $aArray[$i] = StringReplace($aArray[$i], $sDelim, $asDelim[0]) ; Replace comma delimeters
            $aArray[$i] = StringRegExpReplace($aArray[$i], "(\r\n)|[\r\n]", $asDelim[1]) ; Replace new line delimeters
        EndIf
        $aArray[$i] = StringReplace($aArray[$i], '""', $asDelim[2]) ; Replace double quote pairs
        $aArray[$i] = StringReplace($aArray[$i], '"', '') ; Delete enclosing double quotes - not paired
        $aArray[$i] = StringReplace($aArray[$i], $asDelim[2], '"') ; Reintroduce double quote pairs as single characters
        $string &= $aArray[$i] ; Rebuild the string, which includes two different delimiters
    Next
    $iOverride = 0

    $aArray = StringSplit($string, $asDelim[1], 2) ; Split to get rows
    $iBound = UBound($aArray)
    Local $aCSV[$iBound][2], $aTemp
    For $i = 0 To $iBound -1
        $aTemp = StringSplit($aArray[$i], $asDelim[0]) ; Split to get row items
        If Not @error Then
            If $aTemp[0] > $iOverride Then
                $iOverride = $aTemp[0]
                ReDim $aCSV[$iBound][$iOverride] ; Add columns to accomodate more items
            EndIf
        EndIf
        For $j = 1 To $aTemp[0]
            If StringLen($aTemp[$j]) Then
                If Not StringRegExp($aTemp[$j], '[^"]') Then ; Field only contains double quotes
                    $aTemp[$j] = StringTrimLeft($aTemp[$j], 1) ; Delete enclosing double quote single char
                EndIf
                $aCSV[$i][$j -1] = $aTemp[$j] ; Populate each row
            EndIf
        Next
    Next

    If $iOverride > 1 Then
        Return $aCSV ; Multiple Columns
    Else
        For $i = 0 To $iBound -1
            If StringLen($aArray[$i]) And (Not StringRegExp($aArray[$i], '[^"]')) Then ; Only contains double quotes
                $aArray[$i] = StringTrimLeft($aArray[$i], 1) ; Delete enclosing double quote single char
            EndIf
        Next
        Return $aArray ; Single column
    EndIf
EndFunc ;==> _CSVSplit


; #FUNCTION# ============================================================================
; Name...........: _ArrayToCSV
; Description ...: Converts a two dimensional array to CSV format
; Syntax.........: _ArrayToCSV ( $aArray [, $sDelim [, $sNewLine ]] )
; Parameters ....: $aArray  - The array to convert
;                  $sDelim  - Optional - Delimiter set to comma by default (see comments)
;                  $sNewLine  - Optional - New Line set to @LF by default (see comments)
; Return values .: Success  - Returns a string in CSV format
;                  Failure  - Sets @error to:
;                 |@error = 1 - First parameter is not a valid array
;                 |@error = 2 - Second parameter is not a valid string
;                 |@error = 3 - Third parameter is not a valid string
; Author ........: czardas
; Comments ......; One dimensional arrays are returned as multiline text (without delimiters)
;                ; Some users may need to set the second parameter to semicolon to return the prefered CSV format
;                ; To convert to TSV use @TAB for the second parameter
;                ; Some users may wish to set the third parameter to @CRLF
; ========================================================================================

Func _ArrayToCSV($aArray, $sDelim = ",", $sNewLine = @LF)
    If (Not IsArray($aArray)) Or (Ubound($aArray, 0) > 2) Then Return SetError(1, 0 ,"")
    If Not IsString($sDelim) Then Return SetError(2, 0 ,"")
    If Not IsString($sNewLine) Then Return SetError(3, 0 ,"")

    Local $iRows = UBound($aArray), $string = ""
    If Ubound($aArray, 0) = 2 Then ; Check if the array has two dimensions
        Local $iCols = UBound($aArray, 2)
        For $i = 0 To $iRows -1
            For $j = 0 To $iCols -1
                If StringRegExp($aArray[$i][$j], '["' & $sDelim & ']') Then
                    $aArray[$i][$j] = '"' & StringReplace($aArray[$i][$j], '"', '""') & '"'
                EndIf
                $string &= $aArray[$i][$j] & $sDelim
            Next
            $string = StringTrimRight($string, StringLen($sDelim)) & $sNewLine
        Next
    Else ; The delimiter is not needed
        For $i = 0 To $iRows -1
            If StringInStr($aArray[$i], '"') Then
                $aArray[$i] = '"' & StringReplace($aArray[$i], '"', '""') & '"'
            EndIf
            $string &= $aArray[$i] & $sNewLine
        Next
    EndIf
    Return StringTrimRight($string, StringLen($sNewLine)) ; Delete any newline characters added to the end of the string
EndFunc ;==> _ArrayToCSV


; #FUNCTION# ============================================================================
; Name...........: _GetSubstitute
; Description ...: Searches for an AscII char to be used for substitution, ie one not contained within the string
; Syntax.........: _GetSubstitute($string, ByRef $iCountdown)
; Parameters ....: $string   - The string of characters to avoid
;                  $iCountdown - The AscII number to begin checking => Best set to 255 on the first run
;                  $sAvoid   - Optional string of characters to avoid
; Return values .: Success   - Returns a suitable substitution character not found within the first parameter
;                  Failure   - Sets @error to 1 => No substitution character available
; Author ........: czardas
; Comments ......; This function is connected to the function _CSVSplit and was not intended for general use
;                ; $iCountdown is returned ByRef to avoid selecting the same character on subsequent calls
; ========================================================================================

Func _GetSubstitute($string, ByRef $iCountdown, $sAvoid = "")
    If $iCountdown < 1 Then Return SetError(1, 0, "") ; Out of options
    Local $sTestChar
    For $i = $iCountdown To 1 Step -1
        $sTestChar = Chr($i)
        $iCountdown -= 1
        If Not StringInStr($string, $sTestChar, 2) Then ; Some characters may interfere with parsing =>
            If ($i = 34) Or ($i = 13) Or ($i = 10) Or StringInStr($sAvoid, $sTestChar) Then ContinueLoop
            Return $sTestChar
        EndIf
    Next
    Return SetError(1, 0, "") ; Out of options
EndFunc ;==> _GetSubstitute
Edited by czardas
Link to comment
Share on other sites

The above function _ArrayToSubItemCSV() returns an array, which allows you to test if each file name is valid and make replacements where necessary before writing files. This also allows you to select exactly which CSV files you wish to write, for example all those occuring between specific time periods. It seems reasonably fast and the option to include column headers is definately useful.

With your timestamp issue, you could add an additional column to the original csv array using Redim. Fill that column with a trimmed down version of the timestamp. Trim off the last 4 characters to get intervals of one second, 7 characters to get minute by minute results etc... Then use the added column with _ArrayToSubItemCSV() to create the split. Delete the added column using Redim after the splitting has occured. For that you will need to create an array using _CSVSplit() for each new CSV formatted string before converting back to CSV format with _ArrayToCSV() and then finally write the results to file. This is specific (only) to your project, and just one of several possible solutions: such as modifying this line to suit your needs (preferable because it would be much faster).

;

If $sTestItem <> $aCSV[$i][$iCol] Then ; Start a new csv instance

;

Replacing the colon with underscore (in the new column) before splitting will solve the file naming problem. Alternatively just use a different delimiter in the first column. You can always change it back again later. However milisecond intervals will produce a very large number of CSV formatted strings so I recommend you add the extra column or modify the code, as suggested in the paragraph above. You have to decide how you want to handle this.

I have not tested ProgAndy's CSV functions on 500,000 rows with 5 columns in each row. You can give that a try. How you create the array is up to you. Parsing the large CSV took about 45 seconds using _CSVSplit() with 2GB RAM.

_ArrayToSubItemCSV() is not case sensitive and uses @CRLF for line breaks in the newly created CSV strings.


If I knew more about databases, I would probably tell you this is all unecessary because you can use some amazing command to create a batch of csv files. Anyway I don't know that. :shhh:

Edited by czardas
Link to comment
Share on other sites

That is one quick function m8

Ill need to read up and digest it to understand what it does compared to the original one

Many thx

Just for info - The dbases are not tweakable so it has to be done after the fact like we are looking at

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...