Jump to content

Simple, quick CSV parser


ed0
 Share

Recommended Posts

Hi,

Here is a CSV file parser that reads Excel style CSV files. It doesn't include all the functionality of Ivan Perez' CSV library, just a parser and a couple of mostly superfluous get row/column functions. Why use it instead of Ivan's code? Well, maybe you won't. He has some good stuff in there. But this version does have a few virtues:

- Like Ivan's code, it parses Excel style CSV files

- It parses the CSV into a 2D array, so you can access the fields directly

- It is lightning quick

In case you think CSV parsing is trivial, here's an example file and its parsed representation:

a,b,c,d,"e",f,g
,b,"","""d""",,'f',
"a",b, ,,"""e",f',g
a,"b
""",c,d,,f,""

[a][b][c][d][e][f][g]
[][b][]["d"][]['f'][]
[a][b][ ][]["e][f'][g]
[a][b
"][c][d][][f][]

See http://en.wikipedia.org/wiki/Comma-separated_values for a description of the format.

I think the code is interesting also, mainly because it parses the entire file in a single StringRegExp() call. Check it out and let me know what you think!

Edit: Heh, just noticed I didn't fully document the get row/column parameters. There's always something...

LibCsv2.au3

Edited by ed0
Link to comment
Share on other sites

where's the script or file containing the script? :)

haha it was added right after my page reloaded when posting my msg! :-D

Sorry John, you probably caught me in mid-edit, fixing my parameter documentation.

Ed

Link to comment
Share on other sites

  • 2 months later...

Hi ed0, thanks for sharing :)

I haven't tested this yet, but there's an error with it.

Line 57:

Die( 'Error: ' & @error )

--> Unknown function.

Maybe replace the line with:

ConsoleWrite('!>StringRegExp() Error: ' & @error & @CRLF)

or something like that.

Cheers,

footswitch

Link to comment
Share on other sites

  • 9 months later...

Great parsing solution. But from the sample post provided, one thing that is lacking that people (like me) could really use is a CSV parser that can parse CSV fields with embedded commas, treating the comma as part of a value and not a delimiter.

You do cover the other part, where a CSV field can span a line with embedded newlines, so that's good.

Link to comment
Share on other sites

  • 5 months later...
  • 3 months later...

I seem have to have some problems with your parser. I have this CSV content:

"name",20090501,123456,"50,50"
"name",20090401,123456,"60,50"
"name",20090301,123456,"70,50"

I have also tried this content (changed comma to dot, last number is a float):

"name",20090501,123456,"50.50"
"name",20090401,123456,"60.50"
"name",20090301,123456,"70.50"

If I have it parse this content, I get the following message:

csv.au3 (165) : ==> Array variable has incorrect number of subscripts or subscript dimension range exceeded.:
$aRow[$i] = $aCSV[$rowNum][$i]
$aRow[$i] = ^ ERROR

I open the file using:

$ImportFileName = FileOpenDialog("Open CSV", $Location, "CSV Files (*.csv)", 1+2)
    $CSVData = _CSVReadFile($ImportFileName, ",")
    _ArrayDisplay($CSVData)
    ConsoleWrite(UBound($CSVData,1) & " ")
    For $counter = 0 to UBound($CSVData,1)
        $row = _CSVGetRow($CSVData, $counter)
                _ArrayDisplay($row)
        Next

Am I doing something wrong or is it misreading something from the CSV?

Link to comment
Share on other sites

  • 2 years later...

Hi All, I'm long time user of Autoit but a first time poster. Thanks all who have contributed to this great tool.

I put a fair effort in to testing and checking forums to find an answer before posting so I'm hoping I've got all the information in this post and the right person needs to help me out.... many thanks in advance.

I'm using LibCsv2.au3 and if I run a test with my csv using the _CSVTest function it works as expected. When I run it using the _CSVReadFile function no values are returned. my 'test.csv' is just a copy of the example used in the original post.

a,b,c,d,"e",f,g

,b,"","""d""",,'f',

"a",b, ,,"""e",f',g

a,"b

""",c,d,,f,""

I've used MsgBox at certain points to capture whats going on. I used MsgBox to show $aCsvData at the end to see what it is about to return and it shows nothing. If I MsgBox to show $aRawData at the start it returns the expected text. I also know it is working out the row and columns correctly too, as if I put a MsgBox for each time it loops:

Local $aCsvData[$rowCount][$maxCols]

MsgBox(4096,$rowCount,$maxCols)

Which shows 4 and 7. This is correct. But when I try to show the value its up to the MsgBox shows that it found nothing.

$aMatch = $aRawData[$i]

MsgBox(4096,"This is what aRawDatais",$aRawData[$i])

Any ideas why is is reading them as blank? Oh and if you are wondering why I am using MsgBox instead of 'ConsoleWrite' its because I dont have install rights on my pc and I dont have a program to look at its output. If anyone knows of a standalone utility I can use I'd be keen to use it.

Thanks all!!!!

Link to comment
Share on other sites

The variable $aCsvData is an array, are you trying to display it as an array (ex. using _ArrayDisplay()) or as a string (ex. MsgBox(0, "", $aCsvData))?

If I posted any code, assume that code was written using the latest release version unless stated otherwise. Also, if it doesn't work on XP I can't help with that because I don't have access to XP, and I'm not going to.
Give a programmer the correct code and he can do his work for a day. Teach a programmer to debug and he can do his work for a lifetime - by Chirag Gude
How to ask questions the smart way!

I hereby grant any person the right to use any code I post, that I am the original author of, on the autoitscript.com forums, unless I've specifically stated otherwise in the code or the thread post. If you do use my code all I ask, as a courtesy, is to make note of where you got it from.

Back up and restore Windows user files _Array.au3 - Modified array functions that include support for 2D arrays.  -  ColorChooser - An add-on for SciTE that pops up a color dialog so you can select and paste a color code into a script.  -  Customizable Splashscreen GUI w/Progress Bar - Create a custom "splash screen" GUI with a progress bar and custom label.  -  _FileGetProperty - Retrieve the properties of a file  -  SciTE Toolbar - A toolbar demo for use with the SciTE editor  -  GUIRegisterMsg demo - Demo script to show how to use the Windows messages to interact with controls and your GUI.  -   Latin Square password generator

Link to comment
Share on other sites

Oh and if you are wondering why I am using MsgBox instead of 'ConsoleWrite' its because I dont have install rights on my pc and I dont have a program to look at its output. If anyone knows of a standalone utility I can use I'd be keen to use it.

Thanks all!!!!

For an easy answer to this question check out the example script for a portable version of AutoIT3 with a portable version of the Autoit3wrapper program. Install it on a flash drive and it won't need anything installed on the computer you're using it on.

If I posted any code, assume that code was written using the latest release version unless stated otherwise. Also, if it doesn't work on XP I can't help with that because I don't have access to XP, and I'm not going to.
Give a programmer the correct code and he can do his work for a day. Teach a programmer to debug and he can do his work for a lifetime - by Chirag Gude
How to ask questions the smart way!

I hereby grant any person the right to use any code I post, that I am the original author of, on the autoitscript.com forums, unless I've specifically stated otherwise in the code or the thread post. If you do use my code all I ask, as a courtesy, is to make note of where you got it from.

Back up and restore Windows user files _Array.au3 - Modified array functions that include support for 2D arrays.  -  ColorChooser - An add-on for SciTE that pops up a color dialog so you can select and paste a color code into a script.  -  Customizable Splashscreen GUI w/Progress Bar - Create a custom "splash screen" GUI with a progress bar and custom label.  -  _FileGetProperty - Retrieve the properties of a file  -  SciTE Toolbar - A toolbar demo for use with the SciTE editor  -  GUIRegisterMsg demo - Demo script to show how to use the Windows messages to interact with controls and your GUI.  -   Latin Square password generator

Link to comment
Share on other sites

Nice! should probs go on the excel UDF if they dont already have something like this.

ongoing projects:-firestorm: Largescale P2P Social NetworkCompleted Autoit Programs/Scripts: Variable Pickler | Networked Streaming Audio (in pure autoIT) | firenet p2p web messenger | Proxy Checker | Dynamic Execute() Code Generator | P2P UDF | Graph Theory Proof of Concept - Breadth First search

Link to comment
Share on other sites

Hey guys thanks heaps for looking at my post. I understand $aCsvData is a array, however looking at the code my understanding is that $aRawData[$i] should show the value for the data at the point of the loop its at. And also that the final result the function returns should be one formatted string, which can then used to create an new array. I'm really keen to use this code as it looks great but I don't seem to be able to get it to work with a 2D array.

So save any confusion all I just want to confirm to everyone I haven't altered the original code other than doing the following.

Calling the _CSVReadFile function (for 2D Array)

Using test.csv file based on a copy and paste of the example csv given by author.

Using MsgBox to show final return result (which shows as blank)

I think it would take a couple of minutes to replicate what I've done. I guess I'm just curious to see if anyone has the same problem. I can see the script has been downloaded over 800 times so I realize this point toward the problem being me but I'm really scratching my head as I don't feel like I've done anything to the code. Is there any other example code anyone else has seen to turn a csv file in to an array?

Cheeers!!!

Link to comment
Share on other sites

Hi kingsmiley,

You seem to think that $aRawData[$i] contains a string value. It actually contains an array as it puts the array into $aMatch, which becomes an array for further processing. Confused? StringRegExp() with flag 4 returns an array of arrays. Each element of the array is an array and thus $aRawData[$i] is one such element. If you want to look at the values then you would look at $aMatch[n] in which "n" is the index of the element to view.

Here is an example of usage

#include "LibCsv2.au3"
#include <array.au3>

$teststring = _
    'a,b,c,d,"e",f,g' & @CRLF & _
    ',b,"","""d""",,' & "'" & 'f' & "'" & ',' & @CRLF & _
    '"a",b, ,,"""e",f' & "'" & ',g' & @CRLF & _
    'a,"b' & @CRLF & _
    '""",c,d,,f,""'
If Not FileExists('test.txt') Then
    ; create a test file to read later
    FileWrite('test.txt', $teststring)
EndIf

$array = _CSVReadFile('test.txt')

SplashTextOn('Looping', '', Default, 50)

For $i1 = 0 To UBound($array) -1
    For $i2 = 0 To UBound($array, 2) -1
        ControlSetText('Looping', '', 'Static1', $array[$i1][$i2])
        Sleep(500)
    Next
Next

SplashOff()

_ArrayDisplay($array)

Edit: Do note that _CSVReadFile() returns a 2D array and not a one formatted string that you state.

Edited by MHz
Link to comment
Share on other sites

Hi MHz thanks heaps for your reply and thanks also to BrewManNH. You are right I misunderstood the return value as being a string as I didn't know functions could return Arrays as I'm not that familiar with Arrays. I've managed to get the script working for what I'm doing which is great because it means I can get people to run it based on their own list without having to bother me!!!!! Awesome - Thanks again to the Autoit community! :huh2:

Link to comment
Share on other sites

  • 8 months later...

Nice! should probs go on the excel UDF if they dont already have something like this.

I don't think this is related to Excel just because it can open and view such .csv files.

But from the sample post provided, one thing that is lacking that people (like me) could really use is a CSV parser that can parse CSV fields with embedded commas, treating the comma as part of a value and not a delimiter. You do cover the other part, where a CSV field can span a line with embedded newlines, so that's good.

Yeah, should be mentioned, but regarding the other examples one can imagine it also working that way.

Also, I found this function pretty useful for my uses and added a few modifications. (See top lines in attached and cited script.)

Line 57: Die( 'Error: ' & @error ) --> Unknown function.

Deleted that line

I seem have to have some problems with your parser. I have this CSV content:

"name",20090501,123456,"50,50"
"name",20090401,123456,"60,50"
"name",20090301,123456,"70,50"

I open the file using:

$ImportFileName = FileOpenDialog("Open CSV", $Location, "CSV Files (*.csv)", 1+2)
$CSVData = _CSVReadFile($ImportFileName, ",")
_ArrayDisplay($CSVData)
ConsoleWrite(UBound($CSVData,1) & " ")
For $counter = 0 to UBound($CSVData,1)
  $row = _CSVGetRow($CSVData, $counter)
  _ArrayDisplay($row)
Next
Am I doing something wrong or is it misreading something from the CSV?
You forgot to substract UBound($CSVData,1) by 1 as arrays are 0-based. I added a few error handlings to prevent the script from crashing when the row or col param is "out of range".

; PREG-based CSV file parser.
; Copyright 2007, Ed Fletcher

; Modifications by FichteFoll, (02-2012):
;   - added $cEnclose parameter for defining the enclosing character (default: ")
;   - have _CSVGetColumn and _CSVGetRow use Const ByRef for the first param to save ressources
;   - allow _CSVGetColumn and _CSVGetRow to use negative numbers as index (parsing backwards)
;    and add some error handling for invalid parameters
;   - fix use of "Die" as debug function since this is not defined
;   - modified _CSVTest to accept the same parameters as _CSVReadFile

; #AutoIt3Wrapper_AU3Check_Parameters=-d -w 3 -w 4 -w 5 -w 6
#include-once

;===============================================================================
;
; Description:    Reads a CSV file into a 2D array
; Parameter(s):  $sPath    - Path to the CSV file to read
;                  $cSeparator  - Separator character, default is comma (,)
;                  $cEnclose    - Character used in enclosings, default is "
; Requirement(s):   None
; Return Value(s):  On Success - 2D CSV array
;                  On Failure - 0  and Set
;                                  @ERROR to:  1 - File not found/openable
;                                              2 - File read error
;                                              3 - CSV format error
; Author(s):        Ed Fletcher
; Note(s):        Pattern based on work by Jeffrey E. F. Friedl in
;                  "Mastering Regular Expressions, 2nd Edition"
;===============================================================================
Func _CSVReadFile( $path, $separator = ',', $enclose = '"' )

    ;; open the file and read the entire CSV dataset into one string.

    Local $hFile = FileOpen( $path, 0 )
    If $hFile == -1 Then
        SetError( 1 )
        Return 0
    EndIf

    Local $sRawData = FileRead( $hFile )
    If @error > 0 Then
        FileClose( $hFile )
        SetError( 2 )
        Return 0
    EndIf

    FileClose( $hFile )


    ;; parse the string into an array of matched fields

    Local $pattern = '(?m)'                   ; multi-line search mode
    $pattern &= 'G(?:^|[' & $separator & '])'   ; start of line or start of field
    $pattern &= '(?:'                           ; one of two options:
    $pattern &= $enclose                         ;   a field starting with at double quote
    $pattern &= StringFormat('([^%s]*+(?:%s%s[^%s]*+)*+)', $enclose, $enclose, $enclose, $enclose)
    ;                                           ;   (quote-pairs and any non-quote chars)
    $pattern &= $enclose                         ;   a double quote ending the field
    $pattern &= '(r?n?)'                       ;   (any sort of line ending here?)
    $pattern &= '|'                           ; or:
    $pattern &= '([^"' & $separator & 'rn]*+)' ;   (a simple CSV field, no quotes or commas)
    $pattern &= '(r?n?)'                       ;   (any sort of line ending here?)
    $pattern &= ')'                           ; note that we should have 4 captures per CSV element

    Local $aRawData = StringRegExp( $sRawData, $pattern, 4 )
    If @error <> 0 Then
        SetError( 3 )
        Return 0
    EndIf

    $sRawData = ''  ; we're done with this, and it might be large

    ; $aRawData is a 1D array containing every field in the CSV file.  Each element
    ; in $aRawData is an array of 5 strings, like so:
    ; 0 - all of the characters consumed while matching this field
    ; 1 - field contents, if the field was double quoted
    ; 2 - a line ending, if the field was double quoted and this is the end of the line
    ; 3 - field contents, if the field was *not* double quoted
    ; 4 - a line ending, if the field was *not* double quoted and this is the end of the line


    ;; pass through the results once to determine the number of rows and the max number of columns

    Local $i, $aMatch
    Local $colCount = 0, $maxCols = 0
    Local $rowCount = 0

    For $i=0 To UBound($aRawData)-1
        $aMatch = $aRawData[$i]

        If $colCount == 0 Then
            $rowCount += 1          ; we're looking at the first field on the current row
        EndIf

        $colCount += 1

        If $colCount > $maxCols Then
            $maxCols = $colCount    ; longest row so far...
        EndIf

        If $aMatch[2] <> '' OR (UBound($aMatch) > 3 AND $aMatch[4] <> '') Then
            $colCount = 0           ; row complete, we might start a new one
        EndIf
    Next

    ;; we now know how large to make our 2D output array

    Local $aCsvData[$rowCount][$maxCols]


    ;; finally, populate our output array

    Local $row = 0, $col = 0

    For $i=0 To UBound($aRawData)-1
        $aMatch = $aRawData[$i]

        If UBound($aMatch) > 3 AND $aMatch[3] <> '' Then
            ; It was a simple field, no processing required
            $aCsvData[$row][$col] = $aMatch[3]
        Else
            ; It was a quoted value, so take care of embedded double quotes
            $aCsvData[$row][$col] = StringReplace($aMatch[1], '""', '"')
        EndIf

        $col += 1

        ; now look for a line ending that ends the current data row
        If $aMatch[2] <> '' OR (UBound($aMatch) > 3 AND $aMatch[4] <> '') Then
            $row += 1
            $col = 0
        EndIf
    Next

    Return $aCsvData
EndFunc

;===============================================================================
;
; Description:    Pulls a single column out of a 2D array
; Parameter(s):  $aCSV    - 2D array to work with; Const ByRef
;                  $colNum  - Column index, 0-based;
;                               Negative numbers for backwards parsing are allowed
; Requirement(s):   None
; Return Value(s):  On Success - An array of columnar data
;                  On Failure - 0  and Set
;                                  @ERROR to:  1 - Dimension mismatch; only 2D arrays!
;                                              2 - $colNum is invalid
;                                              3 - $colNum exceeds column count
; Note(s):
;
;===============================================================================
Func _CSVGetColumn( Const ByRef $aCSV, $colNum )
    ; test array dimensions
    If UBound($aCSV, 0) <> 2 Then
        SetError( 1 )
        Return 0
    EndIf
    ; test second parameter for validity
    $colNum = Int($colNum) ; cast strings
    If Not IsInt($colNum) Then
        SetError( 2 )
        Return 0
    EndIf
    Local $aBounds[2] = [UBound($aCSV, 1), UBound($aCSV, 2)]
    ; test second parameter for validity (2)
    If $colNum < 0 Then $colNum = $aBounds[1] + $colNum
    If $colNum < 0 Or $colNum > ($aBounds[1] - 1) Then
        SetError( 3 )
        Return 0
    EndIf

    ; start with defining the return array
    Local $aColumn[$aBounds[0]]
    Local $i

    For $i=0 To $aBounds[0]-1
        $aColumn[$i] = $aCSV[$i][$colNum]
    Next

    Return $aColumn
EndFunc

;===============================================================================
;
; Description:    Pulls a single row out of a 2D array
; Parameter(s):  $aCSV    - 2D array to work with; Const ByRef
;                  $rowNum  - Row index, 0-based;
;                                Negative numbers for backwards parsing are allowed
; Requirement(s):   None
; Return Value(s):  On Success - An array of row data
;                  On Failure - 0  and Set
;                                  @ERROR to:  1 - Dimension mismatch; only 2D arrays!
;                                              2 - $rowNum is invalid
;                                              3 - $rowNum exceeds column count
; Note(s):
;
;===============================================================================
Func _CSVGetRow( Const ByRef $aCSV, $rowNum )
    ; test array dimensions
    If UBound($aCSV, 0) <> 2 Then
        SetError( 1 )
        Return 0
    EndIf
    ; test second parameter for validity
    $colNum = Int($rowNum) ; cast strings
    If Not IsInt($rowNum) Then
        SetError( 2 )
        Return 0
    EndIf
    Local $aBounds[2] = [UBound($aCSV, 1), UBound($aCSV, 2)]
    ; test second parameter for validity (2)
    If $rowNum < 0 Then $rowNum = $aBounds[0] + $rowNum
    If $rowNum < 0 Or $rowNum > ($aBounds[0] - 1) Then
        SetError( 3 )
        Return 0
    EndIf

    ; start with defining the return array
    Local $aRow[$aBounds[1]]
    Local $i

    For $i=0 To $aBounds[1]-1
        $aRow[$i] = $aCSV[$rowNum][$i]
    Next

    Return $aRow
EndFunc

;===============================================================================
;
; Description:    Test function for _CSVReadFile()
; Parameter(s):  $sPath   - Path to the file to read, default is 'test.csv'
;                  $cSeparator  - Separator character, default is comma (,)
;                  $cEnclose    - Character used in enclosings, default is "
; Requirement(s):   None
; Return Value(s):  None
; Note(s):          Dumps out array data via ConsoleWrite()
;
;===============================================================================
Func _CSVTest( $file = "test.csv", $separator = ',', $enclose = '"' )
    Local $output = _CSVReadFile( $file, $separator, $enclose )

    If @error <> 0 Then
        ConsoleWriteError( "Error " & @error & " reading file [" & $file & "]" & @CR )
        Exit
    EndIf

    Local $i, $j
    For $i=0 To UBound($output)-1
        For $j=0 To UBound($output, 2)-1
            ConsoleWrite('[' & $output[$i][$j] & ']')
        Next
        ConsoleWrite(@CR)
    Next
EndFunc

;~ If @ScriptName == 'LibCsv2_mod.au3' Then
;~   _CSVTest("ABRS.csv")
;~ EndIf

I hope this is of use to someone.

LibCsv2_mod.au3

Edited by FichteFoll
Link to comment
Share on other sites

  • 6 months later...

FichteFoll has a very nice script, and I thank him. If anybody has trouble with it, as I did, the original Fiedl expression from his book seems to have a backslash before the G near the start of the expression. Once I put the backslash in, it worked great for me.

(the google books link below may or may not take you to the exact page)

http://books.google.com/books?id=sshKXlr32-AC&pg=PA213&lpg=PA213&dq=friedl+csv+regular+expression&source=bl&ots=d9O_1PafGi&sig=UuVi-teJFE1LDC82HnDg17yIj5Y&hl=en&sa=X&ei=NwckUOnIA8HlqgHR-YHQDA&ved=0CFQQ6AEwBA#v=onepage&q=friedl%20csv%20regular%20expression&f=false

Link to comment
Share on other sites

  • 4 years later...

I found that a few CSV files I have contain a space between the comma delimiter and and fields containing double quotes. This caused the Regular Expression to choke.  I was able to fix the regular expression in FichteFoll's modified library by changing  the line $pattern &= '(?:'    to   $pattern &= '(?: *'  in _CSVReadFile

(Note, I just added a space and an asterisk after the colon in the bolded text above.  This may not show up well in my post so I figured I'd spell it out)

This eats any extra spaces that may exist.

Here's an example of data that caused the issue:

-76.38470000, 40.54840000, "AA3RG/145.17000-", "Pine Grove PL:110.9 DCS: IRLP: ECHO:"
-76.37270000, 40.50400000, "AA3RG-R/146.64000-", "Brookside PL: DCS: IRLP: ECHO:149493"
-79.18292900, 41.85703200, "AB3AA/145.27000-", "Warren PL:173.8 DCS: IRLP: ECHO:"

 

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...