czardas

CSVSplit

30 posts in this topic

#1 ·  Posted (edited)

This UDF introduces one or two small tweaks to the functions I posted in Snippet Dump (nothing major). Terminating the CSV with a break is now the default behaviour (although I think it's just a lazy convention). The inclusion of a new function _ArrayToSubItemCSV() got me excited. Thanks for an awesome idea Chimaera. :) I have included the option to sort the returned CSV formatted strings ascending using any column.

;

#include-once
#include <Array.au3>

; #INDEX# =======================================================================================================================
; Title .........: CSVSplit
; AutoIt Version : 3.3.8.1
; Language ......: English
; Description ...: CSV related functions
; Notes .........: CSV format does not have a general standard format, however these functions allow some flexibility.
;                  The default behaviour of the functions applies to the most common formats used in practice.
; Author(s) .....: czardas
; ===============================================================================================================================

; #CURRENT# =====================================================================================================================
;_ArrayToCSV
;_ArrayToSubItemCSV
;_CSVSplit
; ===============================================================================================================================

; #INTERNAL_USE_ONLY#============================================================================================================
; __GetSubstitute
; ===============================================================================================================================

; #FUNCTION# ====================================================================================================================
; Name...........: _ArrayToCSV
; Description ...: Converts a two dimensional array to CSV format
; Syntax.........: _ArrayToCSV ( $aArray [, $sDelim [, $sNewLine [, $bFinalBreak ]]] )
; Parameters ....: $aArray      - The array to convert
;                  $sDelim      - Optional - Delimiter set to comma by default (see comments)
;                  $sNewLine    - Optional - New Line set to @LF by default (see comments)
;                  $bFinalBreak - Set to true in accordance with common practice => CSV Line termination
; Return values .: Success  - Returns a string in CSV format
;                  Failure  - Sets @error to:
;                 |@error = 1 - First parameter is not a valid array
;                 |@error = 2 - Second parameter is not a valid string
;                 |@error = 3 - Third parameter is not a valid string
;                 |@error = 4 - 2nd and 3rd parameters must be different characters
; Author ........: czardas
; Comments ......; One dimensional arrays are returned as multiline text (without delimiters)
;                ; Some users may need to set the second parameter to semicolon to return the prefered CSV format
;                ; To convert to TSV use @TAB for the second parameter
;                ; Some users may wish to set the third parameter to @CRLF
; ===============================================================================================================================

Func _ArrayToCSV($aArray, $sDelim = Default, $sNewLine = Default, $bFinalBreak = True)
    If Not IsArray($aArray) Or Ubound($aArray, 0) > 2 Or Ubound($aArray) = 0 Then Return SetError(1, 0 ,"")
    If $sDelim = Default Then $sDelim = ","
    If $sDelim = "" Then Return SetError(2, 0 ,"")
    If $sNewLine = Default Then $sNewLine = @LF
    If $sNewLine = "" Then Return SetError(3, 0 ,"")
    If $sDelim = $sNewLine Then Return SetError(4, 0, "")

    Local $iRows = UBound($aArray), $sString = ""
    If Ubound($aArray, 0) = 2 Then ; Check if the array has two dimensions
        Local $iCols = UBound($aArray, 2)
        For $i = 0 To $iRows -1
            For $j = 0 To $iCols -1
                If StringRegExp($aArray[$i][$j], '["\r\n' & $sDelim & ']') Then
                    $aArray[$i][$j] = '"' & StringReplace($aArray[$i][$j], '"', '""') & '"'
                EndIf
                $sString &= $aArray[$i][$j] & $sDelim
            Next
            $sString = StringTrimRight($sString, StringLen($sDelim)) & $sNewLine
        Next
    Else ; The delimiter is not needed
        For $i = 0 To $iRows -1
            If StringRegExp($aArray[$i], '["\r\n' & $sDelim & ']') Then
                $aArray[$i] = '"' & StringReplace($aArray[$i], '"', '""') & '"'
            EndIf
            $sString &= $aArray[$i] & $sNewLine
        Next
    EndIf
    If Not $bFinalBreak Then $sString = StringTrimRight($sString, StringLen($sNewLine)) ; Delete any newline characters added to the end of the string
    Return $sString
EndFunc ;==> _ArrayToCSV

; #FUNCTION# ====================================================================================================================
; Name...........: _ArrayToSubItemCSV
; Description ...: Converts an array to multiple CSV formated strings based on the content of the selected column
; Syntax.........: _ArrayToSubItemCSV($aCSV, $iCol [, $sDelim [, $bHeaders [, $iSortCol [, $bAlphaSort ]]]])
; Parameters ....: $aCSV       - The array to parse
;                  $iCol       - Array column used to search for unique content
;                  $sDelim     - Optional - Delimiter set to comma by default
;                  $bHeaders   - Include csv column headers - Default = False
;                  $iSortCol   - The column to sort on for each new CSV (sorts ascending) - Default = False
;                  $bAlphaSort - If set to true, sorting will be faster but numbers won't always appear in order of magnitude.
; Return values .: Success   - Returns a two dimensional array - col 0 = subitem name, col 1 = CSV data
;                  Failure   - Returns an empty string and sets @error to:
;                 |@error = 1 - First parameter is not a 2D array
;                 |@error = 2 - Nothing to parse
;                 |@error = 3 - Invalid second parameter Column number
;                 |@error = 4 - Invalid third parameter - Delimiter is an empty string
;                 |@error = 5 - Invalid fourth parameter - Sort Column number is out of range
; Author ........: czardas
; Comments ......; @CRLF is used for line breaks in the returned array of CSV strings.
;                ; Data in the sorting column is automatically assumed to contain numeric values.
;                ; Setting $iSortCol equal to $iCol will return csv rows in their original ordered sequence.
; ===============================================================================================================================

Func _ArrayToSubItemCSV($aCSV, $iCol, $sDelim = Default, $bHeaders = Default, $iSortCol = Default, $bAlphaSort = Default)
    If Not IsArray($aCSV) Or UBound($aCSV, 0) <> 2 Then Return SetError(1, 0, "") ; Not a 2D array

    Local $iBound = UBound($aCSV), $iNumCols = UBound($aCSV, 2)
    If $iBound < 2 Then Return SetError(2, 0, "") ; Nothing to parse

    If IsInt($iCol) = 0 Or $iCol < 0 Or $iCol > $iNumCols -1 Then Return SetError(3, 0, "") ; $iCol is out of range

    If $sDelim = Default Then $sDelim = ","
    If $sDelim = "" Then Return SetError(4, 0, "") ; Delimiter can not be an empty string

    If $bHeaders = Default Then $bHeaders = False

    If $iSortCol = Default Or $iSortCol == False Then $iSortCol = -1
    If IsInt($iSortCol) = 0 Or $iSortCol < -1 Or $iSortCol > $iNumCols -1 Then Return SetError(5, 0, "") ; $iSortCol is out of range

    If $bAlphaSort = Default Then $bAlphaSort = False

    Local $iStart = 0

    If $bHeaders Then
        If $iBound = 2 Then Return SetError(2, 0, "") ; Nothing to parse
        $iStart = 1
    EndIf

    Local $sTestItem, $iNewCol = 0

    If $iSortCol <> -1 And ($bAlphaSort = False Or $iSortCol = $iCol) Then ; In this case we need an extra Column for sorting
        ReDim $aCSV [$iBound][$iNumCols +1]

        ; Populate column
        If $iSortCol = $iCol Then
            For $i = $iStart To $iBound -1
                $aCSV[$i][$iNumCols] = $i
            Next
        Else
            For $i = $iStart To $iBound -1
                $sTestItem = StringRegExpReplace($aCSV[$i][$iSortCol], "\A\h+", "") ; Remove leading horizontal WS
                If StringIsInt($sTestItem) Or StringIsFloat($sTestItem) Then
                    $aCSV[$i][$iNumCols] = Number($sTestItem)
                Else
                    $aCSV[$i][$iNumCols] = $aCSV[$i][$iSortCol]
                EndIf
            Next
        EndIf
        $iNewCol = 1
        $iSortCol = $iNumCols
    EndIf

    _ArraySort($aCSV, 0, $iStart, 0, $iCol) ; Sort on the selected column

    Local $aSubItemCSV[$iBound][2], $iItems = 0, $aTempCSV[1][$iNumCols + $iNewCol], $iTempIndex
    $sTestItem = Not $aCSV[$iBound -1][$iCol]

    For $i = $iBound -1 To $iStart Step -1
        If $sTestItem <> $aCSV[$i][$iCol] Then ; Start a new csv instance
            If $iItems > 0 Then ; Write to main array
                ReDim $aTempCSV[$iTempIndex][$iNumCols + $iNewCol]

                If $iSortCol <> -1 Then _ArraySort($aTempCSV, 0, $iStart, 0, $iSortCol)
                If $iNewCol Then ReDim $aTempCSV[$iTempIndex][$iNumCols]

                $aSubItemCSV[$iItems -1][0] = $sTestItem
                $aSubItemCSV[$iItems -1][1] = _ArrayToCSV($aTempCSV, $sDelim, @CRLF)
            EndIf

            ReDim $aTempCSV[$iBound][$iNumCols + $iNewCol] ; Create new csv template
            $iTempIndex = 0
            $sTestItem = $aCSV[$i][$iCol]

            If $bHeaders Then
                For $j = 0 To $iNumCols -1
                    $aTempCSV[0][$j] = $aCSV[0][$j]
                Next
                $iTempIndex = 1
            EndIf
            $iItems += 1
        EndIf

        For $j = 0 To $iNumCols + $iNewCol -1 ; Continue writing to csv
            $aTempCSV[$iTempIndex][$j] = $aCSV[$i][$j]
        Next
        $iTempIndex += 1
    Next

    ReDim $aTempCSV[$iTempIndex][$iNumCols + $iNewCol]

    If $iSortCol <> -1 Then _ArraySort($aTempCSV, 0, $iStart, 0, $iSortCol)
    If $iNewCol Then ReDim $aTempCSV[$iTempIndex][$iNumCols]

    $aSubItemCSV[$iItems -1][0] = $sTestItem
    $aSubItemCSV[$iItems -1][1] = _ArrayToCSV($aTempCSV, $sDelim, @CRLF)

    ReDim $aSubItemCSV[$iItems][2]
    Return $aSubItemCSV
EndFunc ;==> _ArrayToSubItemCSV

; #FUNCTION# ====================================================================================================================
; Name...........: _CSVSplit
; Description ...: Converts a string in CSV format to a two dimensional array (see comments)
; Syntax.........: CSVSplit ( $aArray [, $sDelim ] )
; Parameters ....: $aArray  - The array to convert
;                  $sDelim  - Optional - Delimiter set to comma by default (see 2nd comment)
; Return values .: Success  - Returns a two dimensional array or a one dimensional array (see 1st comment)
;                  Failure  - Sets @error to:
;                 |@error = 1 - First parameter is not a valid string
;                 |@error = 2 - Second parameter is not a valid string
;                 |@error = 3 - Could not find suitable delimiter replacements
; Author ........: czardas
; Comments ......; Returns a one dimensional array if the input string does not contain the delimiter string
;                ; Some CSV formats use semicolon as a delimiter instead of a comma
;                ; Set the second parameter to @TAB To convert to TSV
; ===============================================================================================================================

Func _CSVSplit($string, $sDelim = ",") ; Parses csv string input and returns a one or two dimensional array
    If Not IsString($string) Or $string = "" Then Return SetError(1, 0, 0) ; Invalid string
    If Not IsString($sDelim) Or $sDelim = "" Then Return SetError(2, 0, 0) ; Invalid string

    $string = StringRegExpReplace($string, "[\r\n]+\z", "") ; [Line Added] Remove training breaks
    Local $iOverride = 63743, $asDelim[3] ; $asDelim => replacements for comma, new line and double quote
    For $i = 0 To 2
        $asDelim[$i] = __GetSubstitute($string, $iOverride) ; Choose a suitable substitution character
        If @error Then Return SetError(3, 0, 0) ; String contains too many unsuitable characters
    Next
    $iOverride = 0

    Local $aArray = StringRegExp($string, '\A[^"]+|("+[^"]+)|"+\z', 3) ; Split string using double quotes delim - largest match
    $string = ""

    Local $iBound = UBound($aArray)
    For $i = 0 To $iBound -1
        $iOverride += StringInStr($aArray[$i], '"', 0, -1) ; Increment by the number of adjacent double quotes per element
        If Mod ($iOverride +2, 2) = 0 Then ; Acts as an on/off switch
            $aArray[$i] = StringReplace($aArray[$i], $sDelim, $asDelim[0]) ; Replace comma delimeters
            $aArray[$i] = StringRegExpReplace($aArray[$i], "(\r\n)|[\r\n]", $asDelim[1]) ; Replace new line delimeters
        EndIf
        $aArray[$i] = StringReplace($aArray[$i], '""', $asDelim[2]) ; Replace double quote pairs
        $aArray[$i] = StringReplace($aArray[$i], '"', '') ; Delete enclosing double quotes - not paired
        $aArray[$i] = StringReplace($aArray[$i], $asDelim[2], '"') ; Reintroduce double quote pairs as single characters
        $string &= $aArray[$i] ; Rebuild the string, which includes two different delimiters
    Next
    $iOverride = 0

    $aArray = StringSplit($string, $asDelim[1], 2) ; Split to get rows
    $iBound = UBound($aArray)
    Local $aCSV[$iBound][2], $aTemp
    For $i = 0 To $iBound -1
        $aTemp = StringSplit($aArray[$i], $asDelim[0]) ; Split to get row items
        If Not @error Then
            If $aTemp[0] > $iOverride Then
                $iOverride = $aTemp[0]
                ReDim $aCSV[$iBound][$iOverride] ; Add columns to accomodate more items
            EndIf
        EndIf
        For $j = 1 To $aTemp[0]
            If StringLen($aTemp[$j]) Then
                If Not StringRegExp($aTemp[$j], '[^"]') Then ; Field only contains double quotes
                    $aTemp[$j] = StringTrimLeft($aTemp[$j], 1) ; Delete enclosing double quote single char
                EndIf
                $aCSV[$i][$j -1] = $aTemp[$j] ; Populate each row
            EndIf
        Next
    Next

    If $iOverride > 1 Then
        Return $aCSV ; Multiple Columns
    Else
        For $i = 0 To $iBound -1
            If StringLen($aArray[$i]) And (Not StringRegExp($aArray[$i], '[^"]')) Then ; Only contains double quotes
                $aArray[$i] = StringTrimLeft($aArray[$i], 1) ; Delete enclosing double quote single char
            EndIf
        Next
        Return $aArray ; Single column
    EndIf

EndFunc ;==> _CSVSplit

; #INTERNAL_USE_ONLY# ===========================================================================================================
; Name...........: __GetSubstitute
; Description ...: Searches for a character to be used for substitution, ie one not contained within the input string
; Syntax.........: __GetSubstitute($string, ByRef $iCountdown)
; Parameters ....: $string   - The string of characters to avoid
;                  $iCountdown - The first code point to begin checking
; Return values .: Success   - Returns a suitable substitution character not found within the first parameter
;                  Failure   - Sets @error to 1 => No substitution character available
; Author ........: czardas
; Comments ......; This function is connected to the function _CSVSplit and was not intended for general use
;                  $iCountdown is returned ByRef to avoid selecting the same character on subsequent calls to this function
;                  Initially $iCountown should be passed with a value = 63743
; ===============================================================================================================================

Func __GetSubstitute($string, ByRef $iCountdown)
    If $iCountdown < 57344 Then Return SetError(1, 0, "") ; Out of options
    Local $sTestChar
    For $i = $iCountdown To 57344 Step -1
        $sTestChar = ChrW($i)
        $iCountdown -= 1
        If Not StringInStr($string, $sTestChar) Then
            Return $sTestChar
        EndIf
    Next
    Return SetError(1, 0, "") ; Out of options
EndFunc ;==> __GetSubstitute

;

See examples in post #8 below.

;

Edited by czardas

Share this post


Link to post
Share on other sites



This is incorrect use of a character class.

[(\r\n)\r\n]

It's basically

[\r\n]

_AdapterConnections()_AlwaysRun()_AppMon()_AppMonEx()_BinaryBin()_CheckMsgBox()_CmdLineRaw()_ContextMenu()_ConvertLHWebColor()/_ConvertSHWebColor()_DesktopDimensions()_DisplayPassword()_DotNet_Load()/_DotNet_Unload()_Fibonacci()_FileCompare()_FileCompareContents()_FileNameByHandle()_FilePrefix/SRE()_FindInFile()_GetBackgroundColor()/_SetBackgroundColor()_GetConrolID()_GetCtrlClass()_GetDirectoryFormat()_GetDriveMediaType()_GetFilename()/_GetFilenameExt()_GetHardwareID()_GetIP()_GetIP_Country()_GetOSLanguage()_GetSavedSource()_GetStringSize()_GetSystemPaths()_GetURLImage()_GIFImage()_GoogleWeather()_GUICtrlCreateGroup()_GUICtrlListBox_CreateArray()_GUICtrlListView_CreateArray()_GUICtrlListView_SaveCSV()_GUICtrlListView_SaveHTML()_GUICtrlListView_SaveTxt()_GUICtrlListView_SaveXML()_GUICtrlMenu_Recent()_GUICtrlMenu_SetItemImage()_GUICtrlTreeView_CreateArray()_GUIDisable()_GUIImageList_SetIconFromHandle()_GUIRegisterMsg()_GUISetIcon()_Icon_Clear()/_Icon_Set()_IdleTime()_InetGet()_InetGetGUI()_InetGetProgress()_IPDetails()_IsFileOlder()_IsGUID()_IsHex()_IsPalindrome()_IsRegKey()_IsStringRegExp()_IsSystemDrive()_IsUPX()_IsValidType()_IsWebColor()_Language()_Log()_MicrosoftInternetConnectivity()_MSDNDataType()_PathFull/GetRelative/Split()_PathSplitEx()_PrintFromArray()_ProgressSetMarquee()_ReDim()_RockPaperScissors()/_RockPaperScissorsLizardSpock()_ScrollingCredits_SelfDelete()_SelfRename()_SelfUpdate()_SendTo()_ShellAll()_ShellFile()_ShellFolder()_SingletonHWID()_SingletonPID()_Startup()_StringCompact()_StringIsValid()_StringRegExpMetaCharacters()_StringReplaceWholeWord()_StringStripChars()_Temperature()_TrialPeriod()_UKToUSDate()/_USToUKDate()_WinAPI_Create_CTL_CODE()_WinAPI_CreateGUID()_WMIDateStringToDate()/_DateToWMIDateString()Au3 script parsingAutoIt SearchAutoIt3 PortableAutoIt3WrapperToPragmaAutoItWinGetTitle()/AutoItWinSetTitle()CodingDirToHTML5FileInstallrFileReadLastChars()GeoIP databaseGUI - Only Close ButtonGUI ExamplesGUICtrlDeleteImage()GUICtrlGetBkColor()GUICtrlGetStyle()GUIEventsGUIGetBkColor()Int_Parse() & Int_TryParse()IsISBN()LockFile()Mapping CtrlIDsOOP in AutoItParseHeadersToSciTE()PasswordValidPasteBinPosts Per DayPreExpandProtect GlobalsQueue()Resource UpdateResourcesExSciTE JumpSettings INISHELLHOOKShunting-YardSignature CreatorStack()Stopwatch()StringAddLF()/StringStripLF()StringEOLToCRLF()VSCROLLWM_COPYDATAMore Examples...

Updated: 04/09/2015

Share this post


Link to post
Share on other sites

This is incorrect use of a character class.

[(\r\n)\r\n]

It's basically

[\r\n]

 

Thanks guinness.

Share this post


Link to post
Share on other sites

No probs -_0


_AdapterConnections()_AlwaysRun()_AppMon()_AppMonEx()_BinaryBin()_CheckMsgBox()_CmdLineRaw()_ContextMenu()_ConvertLHWebColor()/_ConvertSHWebColor()_DesktopDimensions()_DisplayPassword()_DotNet_Load()/_DotNet_Unload()_Fibonacci()_FileCompare()_FileCompareContents()_FileNameByHandle()_FilePrefix/SRE()_FindInFile()_GetBackgroundColor()/_SetBackgroundColor()_GetConrolID()_GetCtrlClass()_GetDirectoryFormat()_GetDriveMediaType()_GetFilename()/_GetFilenameExt()_GetHardwareID()_GetIP()_GetIP_Country()_GetOSLanguage()_GetSavedSource()_GetStringSize()_GetSystemPaths()_GetURLImage()_GIFImage()_GoogleWeather()_GUICtrlCreateGroup()_GUICtrlListBox_CreateArray()_GUICtrlListView_CreateArray()_GUICtrlListView_SaveCSV()_GUICtrlListView_SaveHTML()_GUICtrlListView_SaveTxt()_GUICtrlListView_SaveXML()_GUICtrlMenu_Recent()_GUICtrlMenu_SetItemImage()_GUICtrlTreeView_CreateArray()_GUIDisable()_GUIImageList_SetIconFromHandle()_GUIRegisterMsg()_GUISetIcon()_Icon_Clear()/_Icon_Set()_IdleTime()_InetGet()_InetGetGUI()_InetGetProgress()_IPDetails()_IsFileOlder()_IsGUID()_IsHex()_IsPalindrome()_IsRegKey()_IsStringRegExp()_IsSystemDrive()_IsUPX()_IsValidType()_IsWebColor()_Language()_Log()_MicrosoftInternetConnectivity()_MSDNDataType()_PathFull/GetRelative/Split()_PathSplitEx()_PrintFromArray()_ProgressSetMarquee()_ReDim()_RockPaperScissors()/_RockPaperScissorsLizardSpock()_ScrollingCredits_SelfDelete()_SelfRename()_SelfUpdate()_SendTo()_ShellAll()_ShellFile()_ShellFolder()_SingletonHWID()_SingletonPID()_Startup()_StringCompact()_StringIsValid()_StringRegExpMetaCharacters()_StringReplaceWholeWord()_StringStripChars()_Temperature()_TrialPeriod()_UKToUSDate()/_USToUKDate()_WinAPI_Create_CTL_CODE()_WinAPI_CreateGUID()_WMIDateStringToDate()/_DateToWMIDateString()Au3 script parsingAutoIt SearchAutoIt3 PortableAutoIt3WrapperToPragmaAutoItWinGetTitle()/AutoItWinSetTitle()CodingDirToHTML5FileInstallrFileReadLastChars()GeoIP databaseGUI - Only Close ButtonGUI ExamplesGUICtrlDeleteImage()GUICtrlGetBkColor()GUICtrlGetStyle()GUIEventsGUIGetBkColor()Int_Parse() & Int_TryParse()IsISBN()LockFile()Mapping CtrlIDsOOP in AutoItParseHeadersToSciTE()PasswordValidPasteBinPosts Per DayPreExpandProtect GlobalsQueue()Resource UpdateResourcesExSciTE JumpSettings INISHELLHOOKShunting-YardSignature CreatorStack()Stopwatch()StringAddLF()/StringStripLF()StringEOLToCRLF()VSCROLLWM_COPYDATAMore Examples...

Updated: 04/09/2015

Share this post


Link to post
Share on other sites

#5 ·  Posted (edited)

I can still see some ways to clean it up: which I will do shortly (I wrote much of this code a while ago). For all intents and purposes the resulting code will not behave differently. I am using 'private use' code point range characters as delimiters (first time implementation by me).

Edited by czardas

Share this post


Link to post
Share on other sites

#6 ·  Posted (edited)

I have updated the first post. The only changes made were in error checking.

To get a good idea of what's going on in the example, make sure your csv has duplicated entries in col 1 (2nd column) otherwise you will get a new csv for every row - aargh.

Edited by czardas

Share this post


Link to post
Share on other sites

#7 ·  Posted (edited)

This replacement for the function _ArrayToSubItemCSV (see first post) still needs testing and perhaps a tweak or two. I have made some tricky choices and I wanted to know your opinions. Because csv fields commonly contain numeric data, the preferance has been set to sort strings by magnitude when numeric content is encountered. This is the default behaviour for sorting in the function below, although it involes more parsing. Setting $bAlphaSort to True negates the default behaviour, and the code will run slightly faster.

One problem encountered is that number conversions sometimes alter string format so an extra column is needed to do the sorting. This avoids any potential data corruption within the newly created csv strings, but again adds extra overhead.

Finally I considered placing the rows (in each new csv after parsing) back in the same order they were originally encountered in the source array. One column that is guaranteed to contain the same value in every row after splitting is the splitting column itself. So setting the sorting column parameter equal to the splitting column parameter is basically saying do nothing at all. I didn't want to add another parameter and I am already using the value $iSortCol = False when any order will do - in which case $iSortCol becomes -1 in the code (so as not to be confused with zero). Sorry it's hard to explain.

Basically there are 4 choices:

1. Return data in any order ==> $iSortCol = False (fastest default method)

2. Quick Sort ==> Set $bAlphaSort = True (sorting method)

3. Original sequence ==> Set $iSortCol = $iCol (slower)

4. Numeric Sort ==> $bAlphaSort = False (slowest default sorting method)

If _ArraySort() were to return duplicates in the order in which they are encountered (this is not a criticism), most of these complications wouldn't occur. Of course there are other ways to parse the data.


CODE MOVED TO THE UDF (1st post)


The modifications are the result of a feature request; so any alternative methods, suggestions about default behaviour or other ideas would be greatly appreciated.

Edited by czardas

Share this post


Link to post
Share on other sites

#8 ·  Posted (edited)

Finished testing and there were one or two bugs needed fixing. The default behaviour is non conventional but suitable for practical reasons, as mentioned above - although I expected differences of opinion.

Examples to illustrate the functionality of the new version of _ArrayToSubItemCSV()

Here is test.csv used for the example:

Date, Details,Amount
01-02-2013,contra, 10
03-02-2013,withdrawal, 2
04-02-2013,lodgement,12.0
05-02-2013,transfer,.56
08-02-2013,contra,2000
21-02-2013,lodgement,0.20
04-03-2013,withdrawal,8
09-03-2013,contra,2
18-03-2013,lodgement,0.20
24-03-2013,lodgement,8
31-03-2013,transfer,10
04-04-2013,withdrawal, .01
05-04-2013,transfer,.56
08-05-2013,transfer,20.00
21-07-2013,lodgement,0.20
04-08-2013,withdrawal,8
04-09-2013,lodgement,12.0
05-09-2013,transfer,.56
08-09-2013,contra,2000

;

And here is the code used for testing:

#include <CSVSplit.au3>

Local $sFilePath = @ScriptDir & "\test.csv" ; Change this to your own csv file (in the same directory as the script)

Local $hFile = FileOpen($sFilePath)
If $hFile = -1 Then
    MsgBox(0, "", "Unable to open file")
    Exit
EndIf

Local $sCSV = FileRead($hFile)
If @error Then
    MsgBox(0, "", "Unable to read file")
    FileClose($hFile)
    Exit
EndIf
FileClose($hFile)

Local $aCSV = _CSVSplit($sCSV) ; Create the main array
If @error Then
    ConsoleWrite("Error = " & @error & @LF)
    Exit
EndIf

; Examples 0f _ArrayToSubItemCSV()
Local $aParam[4][5] = [ _ ; Test parameters
[1,Default,True,2,Default], _ ; Test 1 = Numeric sort - with headers
[1,Default,True,2,True], _ ; Test 2 = Alpha Sort - with headers
[1,Default,Default,Default,Default], _ ; Test 3 = Any order - no headers
[1,Default,True,1,Default]] ; Test 4 = Original sequence - with headers

Local $aSubItemCSV, $sFolderName, $sNewPath, $iSuccess

For $i = 0 To 3
    ; Create an array of csv strings
    $aSubItemCSV = _ArrayToSubItemCSV($aCSV, $aParam[$i][0], $aParam[$i][1], $aParam[$i][2], $aParam[$i][3], $aParam[$i][4])
    If @error Then
        ConsoleWrite("Error = " & @error & @LF)
        Exit
    EndIf

    ; Now let's write each csv to file
    $sFolderName = StringTrimRight(StringReplace($sFilePath, @ScriptDir & "\", ""), 4) & " " & $i +1

    For $j = 0 To UBound($aSubItemCSV) -1
        If StringRegExp($aSubItemCSV[$j][0], '[\/\?<>\\\:\|\*"]') Then
            MsgBox(0, "Invalid file name", "You can not use the following characters in a file name" & @CRLF & '/ ? < > \ : | * "')
            Exit
        EndIf
        $sNewPath = @ScriptDir & "\" & $sFolderName & "\" & $aSubItemCSV[$j][0] & ".csv"
        $hFile = FileOpen($sNewPath, BitOr(8,1))
        If $hFile = -1 Then
            MsgBox(0, "", "Unable to open file")
            Exit
        EndIf

        $iSuccess = FileWrite($hFile, $aSubItemCSV[$j][1])
        FileClose($hFile)
        If $iSuccess = 0 Then
            MsgBox(0, "", "Unable to write to file")
            Exit
        EndIf
    Next
Next

;

Notice the differences in the output on each test. Look at the Date and Amount columns. Notice that if you use Excel to view the files, numbers are formatted for display purposes - tricky and misleading. No formatting or data corruption actually occurs in the csv. - Excel just does its own thing.

Test 1 = Numeric sort on the Amount column with headers
Test 2 = Quick sort on the Amount column with headers
Test 3 = Any order without headers
Test 4 = Original sequence with headers

Edited by czardas

Share this post


Link to post
Share on other sites

Perhaps it would be an idea to ask the Windows locale for default split character ? Some parts of the world (at least mine) uses ; to split since we have comma as decimal.

So we have

Bob;Doe;1234,49
John;Doe;9948,12
Jane;Doe;1999,00

 

Welcomed UDF though !

1 person likes this

I am just a hobby programmer, and nothing great to publish right now.

Share this post


Link to post
Share on other sites

#10 ·  Posted (edited)

Myicq, thanks for trying it. Change the parameter $sDelim to anything you want - although I have only seen semicolon used as an alternative. Use @TAB if you wish tsv. I think it should be the developer's job to make sure the delimiter is suitable.

Edited by czardas

Share this post


Link to post
Share on other sites

Update to first post: Bugfix in _ArrayToCSV().

New lines contained within fields were being ignored - ouch! :doh: Thanks Chimaera for reporting this.

Share this post


Link to post
Share on other sites

#13 ·  Posted (edited)

You're welcome. The bug was a stupid oversight on my part. I've been using some of this code for a while already without noticing it. When I get time, I will probably make some trivial (non script breaking) changes. I can see a difference between the code I wrote a while back and the code I wrote recently. It  feels good to finally create something useful to someone else, besides myself. :)

Edited by czardas
1 person likes this

Share this post


Link to post
Share on other sites

Been looking at this again

Is it possible to have a second sort

So

Main Split / First Sort Column / Second Sort Column

 

Company_code,Company_name,Stock_code,Short_desc,Discontinued,Preferred_supplier,Stock_type_l2,Stock_id,Line Number,26,Page_No,2128768,91914

ABRUTT,Abrutt Ltd,SKR3187,Swingback stepladder 4 tread,0,-1,Facility,27037,1,,560,,0
ABRUTT,Abrutt Ltd,SKR3188,Swingback stepladder 5 tread,0,-1,Facility,27038,2,,560,,0
ABRUTT,Abrutt Ltd,SKR3189,Swingback stepladder 6 tread,0,-1,Facility,27039,3,,560,,0
ABRUTT,Abrutt Ltd,SKR3190,Swingback stepladder 8 tread,0,-1,Facility,27040,4,,560,,0
ABRUTT,Abrutt Ltd,SKR3191,Swingback stepladder 10 tread,0,-1,Facility,27041,5,,560,,0
ABRUTT,Abrutt Ltd,SKR3192,Swingback stepladder 12 tread,0,-1,Facility,27042,6,,560,,0
ABRUTT,Abrutt Ltd,SKR3193,Swingback stepladder 14 tread,0,-1,Facility,27043,7,,560,,0
ABRUTT,Abrutt Ltd,SKR3194,Platform stepladder 4 tread,0,-1,Facility,27044,8,,561,,0
ABRUTT,Abrutt Ltd,SKR3195,Platform stepladder 5 tread,0,-1,Facility,27045,9,,561,,0
ABRUTT,Abrutt Ltd,SKR3196,Platform stepladder 6 tread,0,-1,Facility,27046,10,,561,,0
ABRUTT,Abrutt Ltd,SKR3197,Platform stepladder 8 tread,0,-1,Facility,27047,11,,561,,0
ABRUTT,Abrutt Ltd,SKR3198,Platform stepladder 10 tread,0,-1,Facility,27048,12,,561,,0
ABRUTT,Abrutt Ltd,SKR3199,Platform stepladder 12 tread,0,-1,Facility,27049,13,,561,,0
 
In this instance
Page_No / Company_Code / Line_Number
 
As im trying to refine the data to make it easier for the booklet im producing
 
Just a thought thx

Share this post


Link to post
Share on other sites

#15 ·  Posted (edited)

I consider a second sort option to be more of an array, or database specific, type function. This UDF can be used for that purpose already, but it may be more convenient to simply use a multi-column array sorting function. The whole idea of creating an array from csv is to use array functions to manipulate the data.

Here is a method using _ArrayToSubItemCSV(). Once you have done one split, you can split each csv string again using the main sorting column as the split, and sort each new csv on the second column. Then concatenate the results in the correct sequence.

Remember there are potential issues with numeric data when sorting arrays. This is taken into consideration in the example below. I had to corrupt your example csv because it was too ordely to use as a test. The example below splits on column 10, followed by sorting on column 2, followed by sorting on column 7. The headers are removed and put back later. If there are no headers, then that part of the code will need to be modified.

;

#include 'CSVSplit.au3'

Local $sCSV = _
"Company_code,Company_name,Stock_code,Short_desc,Discontinued,Preferred_supplier,Stock_type_l2,Stock_id,Line Number,26,Page_No,2128768,91914" & @LF & _
"ABRUTT,Abrutt Ltd,SKR3188,Swingback stepladder 5 tread,0,-1,Facility,27038,2,,560,,0" & @LF & _
"ABRUTT,Abrutt Ltd,SKR3193,Platform stepladder 6 tread,0,-1,Facility,27076,10,,561,,0" & @LF & _
"ABRUTT,Abrutt Ltd,SKR3195,Swingback stepladder 10 tread,0,-1,Facility,2741,5,,560,,0" & @LF & _
"ABRUTT,Abrutt Ltd,SKR3188,Swingback stepladder 12 tread,0,-1,Facility,27092,6,,560,,0" & @LF & _
"ABRUTT,Abrutt Ltd,SKR3193,Swingback stepladder 14 tread,0,-1,Facility,27043,7,,560,,0" & @LF & _
"ABRUTT,Abrutt Ltd,SKR3195,Swingback stepladder 6 tread,0,-1,Facility,27036,3,,560,,0" & @LF & _
"ABRUTT,Abrutt Ltd,SKR3195,Swingback stepladder 8 tread,0,-1,Facility,27049,4,,560,,0" & @LF & _
"ABRUTT,Abrutt Ltd,SKR3188,Platform stepladder 12 tread,0,-1,Facility,2049,13,,561,,0" & @LF & _
"ABRUTT,Abrutt Ltd,SKR3188,Platform stepladder 4 tread,0,-1,Facility,2744,8,,561,,0" & @LF & _
"ABRUTT,Abrutt Ltd,SKR3195,Platform stepladder 5 tread,0,-1,Facility,70495,9,,561,,0" & @LF & _
"ABRUTT,Abrutt Ltd,SKR3187,Swingback stepladder 4 tread,0,-1,Facility,27437,1,,560,,0" & @LF & _
"ABRUTT,Abrutt Ltd,SKR3197,Platform stepladder 8 tread,0,-1,Facility,27037,11,,561,,0" & @LF & _
"ABRUTT,Abrutt Ltd,SKR3195,Swingback stepladder 14 tread,0,-1,Facility,27043,7,,560,,0" & @LF & _
"ABRUTT,Abrutt Ltd,SKR3195,Swingback stepladder 6 tread,0,-1,Facility,27036,3,,560,,0" & @LF & _
"ABRUTT,Abrutt Ltd,SKR3195,Swingback stepladder 8 tread,0,-1,Facility,27019,4,,560,,0" & @LF & _
"ABRUTT,Abrutt Ltd,SKR3188,Platform stepladder 12 tread,0,-1,Facility,20492,13,,561,,0" & @LF & _
"ABRUTT,Abrutt Ltd,SKR3187,Platform stepladder 4 tread,0,-1,Facility,2744,8,,561,,0" & @LF & _
"ABRUTT,Abrutt Ltd,SKR3188,Platform stepladder 10 tread,0,-1,Facility,27068,12,,561,,0"

Local $iSplitCol = 10, $iFirstSortCol = 2, $iSecondSortCol = 7

Local $aFirstSplit = _CSVSplit($sCSV)

; Get rid of the headers to simplify the multiple splitting operations.
Local $aHeader[1][UBound($aFirstSplit, 2)]
For $i = 0 To UBound($aFirstSplit, 2) -1
    $aHeader[0][$i] = $aFirstSplit[0][$i]
Next

Local $sCSVHeader = _ArrayToCSV($aHeader) ; We'll add the headers back later
_ArrayDelete($aFirstSplit, 0) ; Delete Headers

$aFirstSplit = _ArrayToSubItemCSV($aFirstSplit, $iSplitCol) ; Do not sort on the first run

Local $aSecondSplit, $vTestItem, $vOutputDisplay

For $i = 0 To UBound($aFirstSplit) -1
    $aSecondSplit = _CSVSplit($aFirstSplit[$i][1])
    ; The sub sort occurs here
    $aSecondSplit = _ArrayToSubItemCSV($aSecondSplit, $iFirstSortCol, Default, Default, $iSecondSortCol)

    ; Now we test to see if the data in the main sort is numeric.
    For $j = 0 To UBound($aSecondSplit) -1
        $vTestItem = StringRegExpReplace($aSecondSplit[$j][0], "\A\h+", "") ; Remove leading horizontal WS
        If StringIsInt($vTestItem) Or StringIsFloat($vTestItem) Then
            $aSecondSplit[$j][0] = Number($aSecondSplit[$j][0]) ; Conversion to numeric data.
        EndIf
    Next

    _ArraySort($aSecondSplit) ; The main sort occurs here.

    ; Now combine the sorted csv strings in the correct order
    $vOutputDisplay = $sCSVHeader
    For $j = 0 To UBound($aSecondSplit) -1
        $vOutputDisplay &= $aSecondSplit[$j][1] ; The final csv string is assembled.
    Next

    ; Add your code here below
    ; Write $vOutputDisplay string to file

    ; Remove the next two lines which are for testing purposes only.
    $vOutputDisplay = _CSVSplit($vOutputDisplay)
    _ArrayDisplay($vOutputDisplay)
Next
Edited by czardas

Share this post


Link to post
Share on other sites

#16 ·  Posted (edited)

I discovered a bug in CSVSplit() and added a patch. In some instances fields only containing double quotes were returning incorrect results.

This bug turned out to be a ghost in the machine.

Edited by czardas

Share this post


Link to post
Share on other sites

#20 ·  Posted (edited)

This line seems not to work:

$sLine = '10,20,"Blah,Blah",50,"Test1,10" Monitor,40",50",100'

Output should be

10|20|Blah,Blah|50|Test1,10" Monitor,40|50"|100

But is

10|20|Blah,Blah|50|Test1,10 Monitor|40,50|100
Edited by UEZ

Please don't send me any personal message and ask for support! I will not reply!

Selection of finest graphical examples at Codepen.io

The own fart smells best!
Her 'sikim hıyar' diyene bir avuç tuz alıp koşma!
¯\_(ツ)_/¯  ٩(●̮̮̃•̃)۶ ٩(-̮̮̃-̃)۶ૐ

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!


Register a new account

Sign in

Already have an account? Sign in here.


Sign In Now