Jump to content

Parse CSV crash


Go to solution Solved by czardas,

Recommended Posts

I'm having some trouble debugging this issue. I use the following ParseCSV script:

Func _ParseCSV($sFile, $sDelimiters = ',;', $sQuote = '"', $iFormat = 0, $iAddIndex = 0, $AddHeader = 0)
    Local Static $aEncoding[6] = [0, 0, 32, 64, 128, 256]
    If $iFormat < -1 Or $iFormat > 6 Then
        Return SetError(3, 0, 0)
    ElseIf $iFormat > -1 Then
        Local $hFile = FileOpen($sFile, $aEncoding[$iFormat]), $sLine, $aTemp, $aCSV[1], $iReserved, $iCount
        If @error Then Return SetError(1, @error, 0)
        $sFile = FileRead($hFile)
        FileClose($hFile)
    EndIf
    If $sDelimiters = "" Or IsKeyword($sDelimiters) Then $sDelimiters = ',;'
    If $sQuote = "" Or IsKeyword($sQuote) Then $sQuote = '"'
    $sQuote = StringLeft($sQuote, 1)
    $iAddIndex = Number($iAddIndex=True)
    $AddHeader = Number($AddHeader=True)
    Local $srDelimiters = StringRegExpReplace($sDelimiters, '[\\\^\-\[\]]', '\\\0')
    Local $srQuote = StringRegExpReplace($sQuote, '[\\\^\-\[\]]', '\\\0')
    Local $sPattern = StringReplace(StringReplace('(?m)(?:^|[,])\h*(["](?:[^"]|["]{2})*["]|[^,\r\n]*)(\v+)?', ',', $srDelimiters, 0, 1), '"', $srQuote, 0, 1)
    Local $aREgex = StringRegExp($sFile, $sPattern, 3)
    If @error Then Return SetError(2, @error, 0)
    $sFile = '' ; save memory
    Local $iBound = UBound($aREgex), $iIndex = $AddHeader, $iSubBound = 1+$iAddIndex, $iSub = $iAddIndex, $sLast='' ;changed
    If $iBound Then $sLast = $aREgex[$iBound-1]
    Local $aResult[$iBound + $iAddIndex][$iSubBound] ;changed
    For $i = 0 To $iBound - 1
        If $iSub = $iSubBound Then
            $iSubBound += 1
            ReDim $aResult[$iBound][$iSubBound]
        EndIf
        Select
            Case StringLeft(StringStripWS($aREgex[$i], 1), 1) = $sQuote
                $aREgex[$i] = StringStripWS($aREgex[$i], 3)
                $aResult[$iIndex][$iSub] = $aREgex[$i]
;~                 $aResult[$iIndex][$iSub] = StringReplace(StringMid($aREgex[$i], 2, StringLen($aREgex[$i])-2), $sQuote&$sQuote, $sQuote, 0, 1)
            Case StringRegExp($aREgex[$i], '^\v+$') ; StringLen($aREgex[$i]) < 3 And StringInStr(@CRLF, $aREgex[$i]) ;new line found
                StringReplace($aREgex[$i], @LF, "", 0, 1)
                $iIndex += @extended
                $iSub = $iAddIndex ;changed
                ContinueLoop
            Case Else
                $aResult[$iIndex][$iSub] = $aREgex[$i]
        EndSelect
        $aREgex[$i] = 0 ; save memory
        $iSub += 1
        If $iAddIndex Then $aResult[$iIndex][0] = $iIndex ;added
    Next
    If Not StringRegExp($sLast, '^\v+$') Then $iIndex+=1
    ReDim $aResult[$iIndex][$iSubBound - 1]
    If $iAddIndex Then $aResult[0][0] = "Index" ;added
    If $AddHeader Then
        For $i = 1 To $iSubBound - 2
            $aResult[0][$i] = "Col" & $i
        Next
    EndIf
    Return $aResult
EndFunc   ;==>_ParseCSV

but on some of the files I am importing to an array using this method, the script crashes (No error in SCITE but a windows crash dialog) giving the following:

 

Problem signature:

  Problem Event Name: APPCRASH
  Application Name: autoit3.exe
  Application Version: 3.3.13.19
  Application Timestamp: 53fa0936
  Fault Module Name: autoit3.exe
  Fault Module Version: 3.3.13.19
  Fault Module Timestamp: 53fa0936
  Exception Code: c00000fd
  Exception Offset: 000102f9
  OS Version: 6.1.7601.2.1.0.256.48
  Locale ID: 1033
  Additional Information 1: 4181
  Additional Information 2: 41810812715e7a6496820cf28371c79c
  Additional Information 3: 9615
  Additional Information 4: 9615c990bcbe602452124ea23ad11713
 
Read our privacy statement online:
 
If the online privacy statement is not available, please read our privacy statement offline:
  C:\Windows\System32\en-USerofflps.txt
 

I have tried deleting some of the rows in the CSV file and sometimes it works, other times it doesn't. I think the issue might be with the amount of memory I have because if I keep it to 100 records (out of 1600) it never crashes. One of the records is a very long text field that may also be causing the issue. Any suggestions on how to figure out what the issue is here so I can actually handle the exception?

Link to comment
Share on other sites

Still errors on the Prod version:

 

Problem signature:

  Problem Event Name: APPCRASH
  Application Name: AutoIt3.exe
  Application Version: 3.3.12.0
  Application Timestamp: 538b6694
  Fault Module Name: AutoIt3.exe
  Fault Module Version: 3.3.12.0
  Fault Module Timestamp: 538b6694
  Exception Code: c00000fd
  Exception Offset: 0000b6d9
  OS Version: 6.1.7601.2.1.0.256.48
  Locale ID: 1033
  Additional Information 1: 4181
  Additional Information 2: 41810812715e7a6496820cf28371c79c
  Additional Information 3: f2de
  Additional Information 4: f2deb8544d46ccbdca5fc1495bc67d73
 
Read our privacy statement online:
 
If the online privacy statement is not available, please read our privacy statement offline:
  C:\Windows\System32\en-USerofflps.txt
 
Link to comment
Share on other sites

_FileReadToArray now processes CSV files as well.

My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2022-02-19 - Version 1.6.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2017-07-21 - Version 0.4.0.1) - Download - General Help & Support - Example Scripts
OutlookEX (2021-11-16 - Version 1.7.0.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX_GUI (2021-04-13 - Version 1.4.0.0) - Download
Outlook Tools (2019-07-22 - Version 0.6.0.0) - Download - General Help & Support - Wiki
PowerPoint (2021-08-31 - Version 1.5.0.0) - Download - General Help & Support - Example Scripts - Wiki
Task Scheduler (NEW 2022-07-28 - Version 1.6.0.1) - Download - General Help & Support - Wiki

Standard UDFs:
Excel - Example Scripts - Wiki
Word - Wiki

Tutorials:
ADO - Wiki
WebDriver - Wiki

 

Link to comment
Share on other sites

Just out of interest: are you always using the same $dDelimiters and $sQuote, and if so, which? Because if you're pushing pipes or tabs or some other interesting character into those regexes, things could get really ugly really quickly with those replacements :)

Roses are FF0000, violets are 0000FF... All my base are belong to you.

Link to comment
Share on other sites

Also, would you happen to have a data file with which to reproduce? Maybe you can replace all characters in values by "x" if the data is sensitive or something. Maybe that's a good test anyway...

Roses are FF0000, violets are 0000FF... All my base are belong to you.

Link to comment
Share on other sites

@SadBunny

Here is a link to an example file. I do use formatting in my text field, but on most of the records it doesn't seem to be an issue and it works fine. 

@water

I tried _FileReadToArray but it always has a return value of 0, 0 for the error value, and 0 for the extended value

@czardas

I'm not sure I understand how to use your script. They seem to be parsing arrays to CSV and not parsing CSVs to arrays... am I missing something?

Link to comment
Share on other sites

Looks like a stack overflow error

If I posted any code, assume that code was written using the latest release version unless stated otherwise. Also, if it doesn't work on XP I can't help with that because I don't have access to XP, and I'm not going to.
Give a programmer the correct code and he can do his work for a day. Teach a programmer to debug and he can do his work for a lifetime - by Chirag Gude
How to ask questions the smart way!

I hereby grant any person the right to use any code I post, that I am the original author of, on the autoitscript.com forums, unless I've specifically stated otherwise in the code or the thread post. If you do use my code all I ask, as a courtesy, is to make note of where you got it from.

Back up and restore Windows user files _Array.au3 - Modified array functions that include support for 2D arrays.  -  ColorChooser - An add-on for SciTE that pops up a color dialog so you can select and paste a color code into a script.  -  Customizable Splashscreen GUI w/Progress Bar - Create a custom "splash screen" GUI with a progress bar and custom label.  -  _FileGetProperty - Retrieve the properties of a file  -  SciTE Toolbar - A toolbar demo for use with the SciTE editor  -  GUIRegisterMsg demo - Demo script to show how to use the Windows messages to interact with controls and your GUI.  -   Latin Square password generator

Link to comment
Share on other sites

  • Solution

CSVSplit creates an array from a csv formatted string. As I write the example I discover a bug with enclosed double quotes ==> ,"""", The function needs rewriting.

;

#include <array.au3>

Local $aArray = _CSVSplit('"1","""""""""", hello world' & @LF & '(*), "1,2,3,4,5", "@"' & @LF)
_ArrayDisplay($aArray)


; #FUNCTION# ====================================================================================================================
; Name...........: _CSVSplit
; Description ...: Converts a string in CSV format to a two dimensional array (see comments)
; Syntax.........: CSVSplit ( $aArray [, $sDelim ] )
; Parameters ....: $aArray  - The array to convert
;                  $sDelim  - Optional - Delimiter set to comma by default (see 2nd comment)
; Return values .: Success  - Returns a two dimensional array or a one dimensional array (see 1st comment)
;                  Failure  - Sets @error to:
;                 |@error = 1 - First parameter is not a valid string
;                 |@error = 2 - Second parameter is not a valid string
;                 |@error = 3 - Could not find suitable delimiter replacements
; Author ........: czardas
; Comments ......; Returns a one dimensional array if the input string does not contain the delimiter string
;                ; Some CSV formats use semicolon as a delimiter instead of a comma
;                ; Set the second parameter to @TAB To convert to TSV
; ===============================================================================================================================

Func _CSVSplit($string, $sDelim = ",") ; Parses csv string input and returns a one or two dimensional array
    If Not IsString($string) Or $string = "" Then Return SetError(1, 0, 0) ; Invalid string
    If Not IsString($sDelim) Or $sDelim = "" Then Return SetError(2, 0, 0) ; Invalid string

    $string = StringRegExpReplace($string, "[\r\n]+\z", "") ; [Line Added] Remove training breaks
    Local $iOverride = 63743, $asDelim[3] ; $asDelim => replacements for comma, new line and double quote
    For $i = 0 To 2
        $asDelim[$i] = __GetSubstitute($string, $iOverride) ; Choose a suitable substitution character
        If @error Then Return SetError(3, 0, 0) ; String contains too many unsuitable characters
    Next
    $iOverride = 0

    Local $aArray = StringRegExp($string, '\A[^"]+|("+[^"]+)|"+\z', 3) ; Split string using double quotes delim - largest match
    $string = ""

    Local $iBound = UBound($aArray)
    For $i = 0 To $iBound -1
        $iOverride += StringInStr($aArray[$i], '"', 0, -1) ; Increment by the number of adjacent double quotes per element
        If Mod ($iOverride +2, 2) = 0 Then ; Acts as an on/off switch
            $aArray[$i] = StringReplace($aArray[$i], $sDelim, $asDelim[0]) ; Replace comma delimeters
            $aArray[$i] = StringRegExpReplace($aArray[$i], "(\r\n)|[\r\n]", $asDelim[1]) ; Replace new line delimeters
        EndIf
        $aArray[$i] = StringReplace($aArray[$i], '""', $asDelim[2]) ; Replace double quote pairs
        $aArray[$i] = StringReplace($aArray[$i], '"', '') ; Delete enclosing double quotes - not paired
        $aArray[$i] = StringReplace($aArray[$i], $asDelim[2], '"') ; Reintroduce double quote pairs as single characters
        $string &= $aArray[$i] ; Rebuild the string, which includes two different delimiters
    Next
    $iOverride = 0

    $aArray = StringSplit($string, $asDelim[1], 2) ; Split to get rows
    $iBound = UBound($aArray)
    Local $aCSV[$iBound][2], $aTemp
    For $i = 0 To $iBound -1
        $aTemp = StringSplit($aArray[$i], $asDelim[0]) ; Split to get row items
        If Not @error Then
            If $aTemp[0] > $iOverride Then
                $iOverride = $aTemp[0]
                ReDim $aCSV[$iBound][$iOverride] ; Add columns to accomodate more items
            EndIf
        EndIf
        For $j = 1 To $aTemp[0]
            If StringLen($aTemp[$j]) Then
                If Not StringRegExp($aTemp[$j], '[^"]') Then ; Field only contains double quotes
                    $aTemp[$j] = StringTrimLeft($aTemp[$j], 1) ; Delete enclosing double quote single char
                EndIf
                $aCSV[$i][$j -1] = $aTemp[$j] ; Populate each row
            EndIf
        Next
    Next

    If $iOverride > 1 Then
        Return $aCSV ; Multiple Columns
    Else
        For $i = 0 To $iBound -1
            If StringLen($aArray[$i]) And (Not StringRegExp($aArray[$i], '[^"]')) Then ; Only contains double quotes
                $aArray[$i] = StringTrimLeft($aArray[$i], 1) ; Delete enclosing double quote single char
            EndIf
        Next
        Return $aArray ; Single column
    EndIf

EndFunc ;==> _CSVSplit

; #INTERNAL_USE_ONLY# ===========================================================================================================
; Name...........: __GetSubstitute
; Description ...: Searches for a character to be used for substitution, ie one not contained within the input string
; Syntax.........: __GetSubstitute($string, ByRef $iCountdown)
; Parameters ....: $string   - The string of characters to avoid
;                  $iCountdown - The first code point to begin checking
; Return values .: Success   - Returns a suitable substitution character not found within the first parameter
;                  Failure   - Sets @error to 1 => No substitution character available
; Author ........: czardas
; Comments ......; This function is connected to the function _CSVSplit and was not intended for general use
;                  $iCountdown is returned ByRef to avoid selecting the same character on subsequent calls to this function
;                  Initially $iCountown should be passed with a value = 63743
; ===============================================================================================================================

Func __GetSubstitute($string, ByRef $iCountdown)
    If $iCountdown < 57344 Then Return SetError(1, 0, "") ; Out of options
    Local $sTestChar
    For $i = $iCountdown To 57344 Step -1
        $sTestChar = ChrW($i)
        $iCountdown -= 1
        If Not StringInStr($string, $sTestChar) Then
            Return $sTestChar
        EndIf
    Next
    Return SetError(1, 0, "") ; Out of options
EndFunc ;==> __GetSubstitute
Edited by czardas
Link to comment
Share on other sites

  • Moderators

Jewtus,

_FileReadToArray works fine on that file. Alhough as the lines do not have the same number of elements when split you need to create an "array of arrays" to hold the data - and then do a bit more work to get it all into one array which you expand as necessary to fit the max line size: :)

#include <Array.au3>
#include <File.au3>

; Read file
Global $aContent
_FileReadToArray("test.csv", $aContent, $FRTA_INTARRAYS + $FRTA_COUNT, ",")

; Create final array
Global $aFinal[$aContent[0]][1]
; And fill - expanding the array aas necessary
For $i = 1 To $aContent[0]
    If ($aContent[$i])[0] > UBound($aFinal, 2) Then
        ReDim $aFinal[$aContent[0]][($aContent[$i])[0]]
    EndIf
    For $j = 1 To ($aContent[$i])[0]
        $aFinal[$i - 1][$j - 1] = ($aContent[$i])[$j]
    Next
Next

; Display the result
_ArrayDisplay($aFinal, "Final", Default, 8)
M23

Public_Domain.png.2d871819fcb9957cf44f4514551a2935.png Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind

Open spoiler to see my UDFs:

Spoiler

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 

Link to comment
Share on other sites

I'm going to keep messing with this, but its Friday and I don't feel like trying to get this fixed today haha.

Thanks for the suggestions all. I'll let you know what I come up with on Monday.

Link to comment
Share on other sites

@Jewtus

Can you please test this function whether it works properly?

#include-once

; #FUNCTION# ========================================================================================================================================
; Name .................:   StringSplit2D()
; Description ..........:   Splits  a string into columns instead of rows as it is done by SplitString(), like a csv file to a 2d array ;-)
; Syntax ...............:   StringSplit2D($sString[, $sDelimiter = ";"[, $bStripWS = True[, $sQuotationMark = '"'[, $sDummy = "¦"[, $iWidthLen = 256]]]]])
; Parameters ...........:   $sString             - A string value.
;                           $sDelimiter          - [optional] A string value. Default is ";".
;                           $bStripWS            - [optional] A binary value. Default is True.
;                           $sQuotationMark      - [optional] A string value. Default is '"'.
;                           $sDummy              - [optional] A string value. Default is "¦".
;                           $iWidthLen           - [optional] An integer value. Default is 256.
; Return values .......:    Success - 2d array
;                           Error 1 - $sStringis not set
;                           Error 2 - $delimter is not set
;                           Error 3 - error splitting string
;                           Error 4 - array width exceeded
;
; Version .............:    v0.97 build 2015-03-03 beta
; Author ..............:    UEZ
; Modified ............:
; Comment .............:    former StringSplitW
; Remarks .............:    RegEx take from http://stackoverflow.com/questions/4476812/regular-expressions-how-to-replace-a-character-within-quotes
; Related .............:    StringSplit, StringReplace, StringRegExpReplace, StringLen, StringStripCR
; ===================================================================================================================================================
Func StringSplit2D($sString, $sDelimiter = ";", $bStripWS = True, $sQuotationMark = '"', $sDummy = "¦", $iWidthLen = 256)
    If $sString = "" Then Return SetError(1, 0, 0)
    If $sDelimiter = "" Then Return SetError(2, 0, 0)
    Local $aSplit =  StringSplit(StringStripCR($sString), @LF)
    If @error Then Return SetError(3, 0, 0)
    Local $chk, $iWidth, $i, $j, $k, $iLen, $iMax = 1, $iMaxWidth, $iSum = 0
    Local $aPos[1], $l = 0
    Local $aVertical[$aSplit[0]][$iWidthLen], $iDelimiterLen = StringLen($sDelimiter) - 1, $sLine
    For $k = 1 To $aSplit[0]
        $iLen = StringLen($aSplit[$k])
        If $iLen > 1 Then
            Switch $bStripWS
                Case True
                    $sLine = StringRegExpReplace($aSplit[$k], '(?m)\' & $sDelimiter & '(?=[^' & $sQuotationMark & ']*' & $sQuotationMark & '(?:[^' & $sQuotationMark & '\r\n]*' & $sQuotationMark & '[^' & $sQuotationMark & ']*' & $sQuotationMark & ')*[^' & $sQuotationMark & '\r\n]*$)', $sDummy)
                Case Else
                    $sLine = StringRegExpReplace(StringStripWS($aSplit[$k], 4), '(?m)\' & $sDelimiter & '(?=[^' & $sQuotationMark & ']*' & $sQuotationMark & '(?:[^' & $sQuotationMark & '\r\n]*' & $sQuotationMark & '[^' & $sQuotationMark & ']*' & $sQuotationMark & ')*[^' & $sQuotationMark & '\r\n]*$)', $sDummy)
            EndSwitch
            $chk = StringReplace($sLine, $sDelimiter, $sDelimiter)
            $iWidth = @extended
            $iSum += $iWidth
            If $iWidth > $iWidthLen Then Return SetError(4, 0, 0)
            If $iWidth >= $iMax Then $iMax = $iWidth + 1
            Switch $iWidth
                Case 0
                    $aVertical[$l][0] = $sLine
                Case Else
                    Dim $aPos[$iWidth * 2 + 2]
                    $j = 1
                    $aPos[0] = 1
                    For $i = 0 To $iWidth - 1
                        $aPos[$j] = StringInStr($sLine, $sDelimiter, 0, $i + 1) - 1
                        $aPos[$j + 1] = $aPos[$j] + 2 + $iDelimiterLen
                        $j += 2
                    Next
                    $aPos[UBound($aPos) - 1] = StringLen($sLine)
                    $j = 0
                    For $i = 0 To UBound($aPos) - 1 Step 2
                        $aVertical[$l][$j] = StringMid(StringReplace($sLine, $sDummy, $sDelimiter), $aPos[$i], $aPos[$i + 1] - $aPos[$i] + 1)
                        $j += 1
                    Next
                EndSwitch
                $l += 1
        EndIf
    Next
    ReDim $aVertical[$l][$iMax]
    Return $aVertical
EndFunc

Your test.csv is confusing...

THANKS!

Edit: seems that also my function fails on a single double quote...

Edited by UEZ

Please don't send me any personal message and ask for support! I will not reply!

Selection of finest graphical examples at Codepen.io

The own fart smells best!
Her 'sikim hıyar' diyene bir avuç tuz alıp koşma!
¯\_(ツ)_/¯  ٩(●̮̮̃•̃)۶ ٩(-̮̮̃-̃)۶ૐ

Link to comment
Share on other sites

#include <array.au3>

Global $aFinal[0][0]
Global $aFinTemp

$sString = '"1", """""""""", hello world' & @LF & '(*), "1,2,3,4,5", "@", "4TH"' & @LF & '"2", """""""""", hello 2 world , "4TH" , "5TH"' & @LF

Global $sColumns = ", "
Global $sRows = @LF

_ArrayDisplay(_Splitter($sString , $sColumns , $sRows))

Func _Splitter($sString , $sColDelim , $sRowDelim)

$aString = stringsplit($sString , $sRowDelim , 2)

for $i = 0 to ubound($aString) - 1
    $aLine = stringsplit($aString[$i] , $sColDelim , 1)
    If ubound($aFinal , 2) < ubound($aLine) - 1 then
        $aFinTemp = $aFinal
        redim $aFinal[0][ubound($aLine) - 1]
        _ArrayAdd($aFinal , $aFinTemp)
        EndIf
    _ArrayAdd($aFinal , _ArrayToString($aLine , "|" , 1))
Next

Return $aFinal

EndFunc ;_Splitter

edit:  in Func form

Edited by boththose

,-. .--. ________ .-. .-. ,---. ,-. .-. .-. .-.
|(| / /\ \ |\ /| |__ __||| | | || .-' | |/ / \ \_/ )/
(_) / /__\ \ |(\ / | )| | | `-' | | `-. | | / __ \ (_)
| | | __ | (_)\/ | (_) | | .-. | | .-' | | \ |__| ) (
| | | | |)| | \ / | | | | | |)| | `--. | |) \ | |
`-' |_| (_) | |\/| | `-' /( (_)/( __.' |((_)-' /(_|
'-' '-' (__) (__) (_) (__)

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...