Jump to content
detrophy

Help needed to understand csv to array but with splitting

Recommended Posts

Hey fellow people,

I'm trying to create a script that does the following:

1. import a csv with comma separated values.
2. put every line of it into a 1D array
3. read array line 1 (not 0)
4. create for every word that is separated with a comma a new column
5. repeat with next line

A 2D array should be created by now and its filled with values like in an excel sheet.

 

This is my code atm, but I don't exactly understand why it won't work. May someone explain to me what I'm doing wrong? Thank you!

 

Func opensheet()

Dim $i, $a_temp, $aRowsF, $aRowsS, $n = 1, $aLFC = 0, $aLFR = 0, $aLSC = 0, $aLSR = 0
Local $aLinesF[$aLFR][$aLFC], $aLinesS[$aLSR][$aLSC], $aStringSplitted

$sheet = FileOpenDialog("Open File",@DesktopDir,"All(*.*)",$FD_FILEMUSTEXIST)


_FileReadToArray($sheet, $aLinesF,$FRTA_COUNT)
$aRowCountF = UBound($aLinesF, $UBOUND_ROWS)
_ArrayDisplay($aLinesF,"test")
        For $aLFR = 1 to $aRowCountF Step +1
            $aStringSplitted = StringSplit($aLinesF[$aLFR][$aLFC], ",")
            $aRowsS = UBound($aStringSplitted, $UBOUND_ROWS)
            ;MsgBox(0,"",$aRowsS & "BLUB!")
                For $iS = 1 to $aRowsS Step +1
                    _ArrayColInsert($aLinesF,$n)
                    $n = $n +1
                    ;MsgBox(0,"",$n)
                Next
        Next
_ArrayDisplay($aLinesF,"test")
_ArrayDisplay($aStringSplitted,"test")
    ;MsgBox(0,"",$aLines[3])
EndFunc

 

Share this post


Link to post
Share on other sites

I forgot some essential informations:

The array has to be dynamic, as the csv table will be dynamic as well. The expected range is somewhat between 50 to 50k lines and its uncertain if they will contain more columns in the future. I cannot give you an example of this csv, as they contain customer data. But it contains atm about 60 values (between commas) per line.

It doesn't matter if the script has to work 2 or more minutes to go through all the data, as doing it manually consumes as much time (some formatting afterwards is necessary).

Share this post


Link to post
Share on other sites

Are the number of columns identical from row to row inside one file ?  If so you could use the additional parameter $sDelimiter = "," of _FileReadToArray.

From your script, you seem to create new columns for each and every row, not a good idea...

Edited by Nine

Share this post


Link to post
Share on other sites

detrophy,

        Not sure if your problem to create a 2D array comes from unequal columns per line.   If that is the issue then use this:

Global $aArray

_FileReadToArray('PathtoyourFile\yourfilename.csv', $aArray, $FRTA_INTARRAYS , ",")
$col = 0

For $i = 0 to ubound($aArray) - 1
     $col = ubound($aArray[$i]) > $col ?  ubound($aArray[$i]) : $col
next

local $aFinal[0][$col]

For $j = 0 to ubound($aArray) - 1
    _ArrayAdd($aFinal , _ArrayToString($aArray[$j] , @CRLF) , 0 , @CRLF , ",")
Next

_ArrayDisplay($aFinal)

This will create a 2D array from your csv even if each line is of different lengths. 

Share this post


Link to post
Share on other sites
9 hours ago, Nine said:

Are the number of columns identical from row to row inside one file ?  If so you could use the additional parameter $sDelimiter = "," of _FileReadToArray.

From your script, you seem to create new columns for each and every row, not a good idea...

The number of columns is identical for every row, yes. But I want to be flexible in the case some vaules per row get removed or are added permanently.

Like this as an example:

A-B-C-D-E-F
1-2-3-4-5-6
4-5-2-2-3-4

To:
A-B-C-E-F
1-2-3-4-5
3-4-5-2-1

The script has to adapt to this situation.

 

I want to do this:
CSV

1,2,3,4,5,67,8,9,0
2,3,4,5,6,7,8,9,0,1

table

[1][2][3][4][5] ...
[2][3][4][5][6] ...

To be able to do this, I create for every Row (starting with 1) a new 1D array. Overwrite the first value and create a new column for every row.
Repeat.

It's somewhat inefficient, but I can't imagine a better way.

8 hours ago, xcaliber13 said:

detrophy,

        Not sure if your problem to create a 2D array comes from unequal columns per line.   If that is the issue then use this:

Global $aArray

_FileReadToArray('PathtoyourFile\yourfilename.csv', $aArray, $FRTA_INTARRAYS , ",")
$col = 0

For $i = 0 to ubound($aArray) - 1
     $col = ubound($aArray[$i]) > $col ?  ubound($aArray[$i]) : $col
next

local $aFinal[0][$col]

For $j = 0 to ubound($aArray) - 1
    _ArrayAdd($aFinal , _ArrayToString($aArray[$j] , @CRLF) , 0 , @CRLF , ",")
Next

_ArrayDisplay($aFinal)

This will create a 2D array from your csv even if each line is of different lengths. 

 

 

To clarify what I want to do:
I want to replicate the function "Text to columns" from excel in which it splits text into new cells.

Share this post


Link to post
Share on other sites

How's this? _ArrayAdd does most of the work for you :) (It assumes that the header line will contain the correct number of cells, but the number can change)

#include <Array.au3>

Main()

Func Main() 
    
    Local $sFile = @ScriptDir & "\test.csv"
    ; Get the file contents
    Local $sFileData = FileRead($sFile)
    ; Get the first line
    Local $sFirstLine = FileReadLine($sFile, 1)
    ; Create the array using the number of commas in the first line
    Local $asCSV[0][UBound(StringSplit($sFirstLine, ",", 3))]
    ; Fill the array
    _ArrayAdd($asCSV, $sFileData, 0, ",")
    
    ; For debugging only...
    _ArrayDisplay($asCSV)

EndFunc

All my code provided is Public Domain... but it may not work. ;) Use it, change it, break it, whatever you want.

Share this post


Link to post
Share on other sites

Still not clear to me as your examples show the same number of columns per row.  This will work even if the number of columns changes from file to file . Try this :

#include <Constants.au3>
#include <File.au3>

Local $sFile = @ScriptDir & "\test.csv"
Local $aArray
_FileReadToArray ($sFile,$aArray,$FRTA_COUNT,",")
If @error Then Exit MsgBox ($MB_SYSTEMMODAL,"","Error reading file")
_ArrayDisplay ($aArray)

But if the number of columns varies from row to row into a single file, you could do this instead :

#include <Constants.au3>
#include <File.au3>
#include <Math.au3>

Local Const $MAXIMUM_OF_COLUMNS = 100 ; this is the largest value that it will never exceed
Local $sFile = @ScriptDir & "\test.csv"
Local $aArray, $aTemp, $iMax = 0
_FileReadToArray ($sFile,$aArray,$FRTA_COUNT)
If @error Then Exit MsgBox ($MB_SYSTEMMODAL,"","Error reading file")
Local $aFinalArray[$aArray[0]][$MAXIMUM_OF_COLUMNS]
For $i = 1 to $aArray[0]
  $aTemp = StringSplit ($aArray[$i],",")
  $iMax = _Max ($iMax, $aTemp[0])
  For $j = 1 to $aTemp[0]
    $aFinalArray[$i-1][$j-1]=$aTemp[$j]
  Next
Next
ReDim $aFinalArray[$aArray[0]][$iMax]
_ArrayDisplay ($aFinalArray)

ReDim is very fast compare to creating dynamically columns and rows (even if you do not care much about speed).  Try it you will see...

Share this post


Link to post
Share on other sites

Thank you @Nine, your second example works, with a few little changes three times faster.

For a 1500 x 60 array, the code works for about 9 seconds.

Can someone point me to a good reference where I can learn, how to filter this -> ", <- but not this -> ," <- ?

I found some nice Regex tutorials and documentation, but they did not help me much.

Share this post


Link to post
Share on other sites
3 hours ago, detrophy said:

Can someone point me to a good reference where I can learn

Help file and wiki autoit are good places to start.  But there is not (that I know of) a reference on "Not Good Way".  I suppose it is a question of trial and error (in other words, with experience)...Good luck !

Share this post


Link to post
Share on other sites

I've recently learned more about .csv files while building an import file for Jira.  If you just split the file by , you may get messed up as you can actually add multi-line  or commas enclosed within double quotes.  if it's just a very simple csv file you don't have to worry about this.  I'm not aware of a proper AutoIt reader for you to use as I was just building files.

http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm#EmbedBRs

 


hmm... I guess I have to have a signature...

Share this post


Link to post
Share on other sites

Also note, that as the link BigDaddyO posted states, not all CSV files are created equal. I was working with one today (from Bank of America) and found that it double quoted all fields and didn't include any double quotes in the fields, so I could use the following regular expression: (See it here)

Local $asRow = StringRegExp($sCsvContents, '"(.*?)",', 3)

It only works in very rare cases, but study your case and see if you can find a simple solution.


All my code provided is Public Domain... but it may not work. ;) Use it, change it, break it, whatever you want.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×
×
  • Create New...