Sign in to follow this  
Followers 0
Guest Michael.J.Curley

Multi-Dimensional array creation

5 posts in this topic

I am trying to write a script that will convert rather large (up to 20 mb) text files into a different format... I import this data into access to create reports, but the number/date formats/null values cause several import errors in access,meaning I lose some of the data. For some reason I decided the best approach would be to convert the data into a multi-dimensional array. However, I am having 3 problems with this approach.

First, the program seems to get exponentially slower the larger the $ZSE16In... I learned this by using two single dimensional arrays. I figured reducing the number of copies of the huge variable would help to speed the process.

Second, in a medium sized file (about 250K), the further the progress, the slower it gets.

Third, I have no idea how to create a multi-dimensional array (without dim'ing it).

Example input text....

Table : T001W

Displayed fields: 4 of 4 Fixed columns: 1 List width 0250

------------------------------------------------------------------

| |WERKS|NAME1 |KUNNR |LIFNR |

------------------------------------------------------------------

| |0001 |WERKS | | |

------------------------------------------------------------------

First 2 rows would be wasted space. Same with the third. fourth row is the column names, sixth row is the actual data. There can be up to several hundreds of thousands of rows (and as many as 30 columns)

Text is normally from a clipget(), which is placed there by the SAP transaction ZSE16 (hence the name) :D . In the code snippet below, I am using a fileread to get the text from some previously saved files.

$InputFile = FileOpenDialog("Select Input File", @ScriptDir, "Text Files (*.txt)", 1)

$InputString = FileRead($InputFile, FileGetSize($InputFile))

FormatZSE16($InputString)

$OutPutFile = Fileopen(ValExt(FileSaveDialog("Select output File", @ScriptDir, "Text Files (*.txt)", 27), "txt"),2)

FileWrite($OutPutFile, $InPutString)

Exit

Func ValExt($FileName, $Extension)

If StringRight($FileName,StringLen($Extension)+1) = "." & $Extension Then
  Return $FileName
Else
  Return $FileName & "." & $Extension
EndIf

EndFunc

Func FormatZSE16(ByRef $Zse16In)

ProgressOn("Reformatting Text File","The Moo Cow Says...", "Splitting file into seperate lines...",0,0)

$ZSE16In = StringSplit($ZSE16In, @LF)
$Timer = TimerStart()

$TotalRecords = UBound($ZSE16In)-1

For $i = 1 to $TotalRecords
  
  $Progress = Round(($i / $TotalRecords)*100, 0)
  
   ProgressSet($Progress, "Currently Processing Record " & $i & " of " & $TotalRecords, Remaining($Timer, $TotalRecords, $i) & " Remaining")
   
  If StringInStr($ZSE16In[$i], "| |") Then
    
    $ZSE16In[$i] = StringReplace($ZSE16In[$i], "| |", "")
    $ZSE16In[$i] = StringLeft($ZSE16In[$i], StringLen($ZSE16In[$i]) - 2)
    $ZSE16In[$i] = StringSplit($ZSE16In[$i], "|")
    
    $TotalColumns = UBound($ZSE16In[$i])-1
    For $j = 1 to $TotalColumns
    MsgBox(0, $i & " - " & $j, "$i -  $j",1)
      
      $ZSE16In[$i][$j] = StringStripWS($ZSE16In[$i][$j], 7)
      
      If StringInStr($ZSE16In[$i][$j], ".") Then
        $ZSE16In[$i][$j] = StringReplace($ZSE16In[$i][$j], ",", "")
      EndIf

      If $ZSE16In[$i][$j] = "00000" Then
        $ZSE16In[$i][$j] = ""
      EndIf
      
      If $ZSE16In[$i][$j] = "0000.00.00" Then
        $ZSE16In[$i][$j] = ""
      EndIf
      
      If $ZSE16In[$i][$j] = ". ." Then
        $ZSE16In[$i][$j] = ""
      EndIf
      
      If $j = 1 Then
        $Temp = $ZSE16In[$i][$j]
      Else
        $Temp = $Temp & @TAB & $ZSE16In[$i][$j]
      EndIf
      
    Next
  
  $TempOut = $TempOut & $Temp & @CRLF
  
  EndIf
Next

ProgressOff()

$ZSE16In = $TempOut

EndFunc

Func Remaining($TimerInUse, $TotalRecords, $CurrentRecord)
  
  $TimeNow = TimerStop($TimerInUse) 
  $TimePerRec = ($TimeNow  / $CurrentRecord) 
  $TotalTime = $TimePerRec * $TotalRecords
  $TimeLeft = $TotalTime - $TimeNow

  $Hours = Int($TimeLeft / 3600000)
  $TimeLeft = $TimeLeft - ($Hours * 3600000)

  $Minutes = Int($TimeLeft / 60000)
  $TimeLeft = $TimeLeft - ($Minutes * 60000)

  $Seconds = Round($TimeLeft / 1000, 0)
  
  Return StringRight( "00" & $Hours, 2) & ":" & StringRight( "00" & $Minutes, 2) & ":" &  StringRight( "00" & $Seconds, 2)

EndFunc

I think I included all the udfs... Hope so...

$ZSE16In[$i][$j] = StringStripWS($ZSE16In[$i][$j], 7) is the line that is throwing an error about subscript out of range. That is the first time I try to pull the value out of what I thought was a multidimensional array. Strangely, the line 2 above it ($TotalColumns = UBound($ZSE16In[$i])-1) does not fail, and produces the correct number of "columns" in the file.

Also, for a 2 mb file (and almost 20 minutes), script uses about 33-45 mb of memory, as reported by task manager. Does that seem a little odd to anyone else?

BTW- the reason I am trying to do this with autoit is that another script will be responsible for executing the transaction and downloading these text files.

Share this post


Link to post
Share on other sites



#2 ·  Posted (edited)

I am trying to write a script that will convert rather large (up to 20 mb) text files into a different format...  I import this data into access to create reports, but the number/date formats/null values cause several import errors in access,meaning I lose some of the data.  For some reason I decided the best approach would be to convert the data into a  multi-dimensional array.  However, I am having 3 problems with this approach.

Autoit won't handle null values in a file. it assumes the input ends there.

First, the program seems to get exponentially slower the larger the $ZSE16In...  I learned this by using two single dimensional arrays.  I figured reducing the number of copies of the huge variable would help to speed the process.

Doesn't it make more sence to process the file record by record with a file of this size. There have been posts a while back, explaining the issue why it gets slower....

Second, in a medium sized file (about 250K), the further the progress, the slower it gets.

Third, I have no idea how to create a multi-dimensional array (without dim'ing it).

You can only do that with dim. The stringsplit you use will return a single dimention array...

Something like this (Untested):

$INPUTFILE = FileOpenDialog("Select Input File", @ScriptDir, "Text Files (*.txt)", 1)
$H_OUTPUTFILE = FileOpen(ValExt(FileSaveDialog("Select output File", @ScriptDir, "Text Files (*.txt)", 27), "txt"), 2)
$H_INPUTFILE = FileOpen($INPUTFILE,0)
While 1
   $INPUTSTRING = FileReadLine($H_INPUTFILE)
   If @error then ExitLoop
   FormatZSE16($INPUTSTRING)
   FileWriteLine($H_OUTPUTFILE, $INPUTSTRING)
Wend
FileClose($H_OUTPUTFILE)
FileClose($H_InputFILE)
Exit
Func ValExt($FILENAME, $EXTENSION)
   If StringRight($FILENAME, StringLen($EXTENSION) + 1) = "." & $EXTENSION Then
      Return $FILENAME
   Else
      Return $FILENAME & "." & $EXTENSION
   EndIf
EndFunc  ;==>ValExt
Func FormatZSE16(ByRef $ZSE16IN)
   If StringInStr($ZSE16IN[$I], "| |") Then
      $ZSE16IN = StringReplace($ZSE16IN, "| |", "")
      $ZSE16IN = StringLeft($ZSE16IN, StringLen($ZSE16IN) - 2)
      $ZSE16IN = StringSplit($ZSE16IN, "|")
      For $J = 1 To $ZSE16IN[0]
         MsgBox(0, $I & " - " & $J, "$i -  $j", 1)
         $ZSE16IN[$j] = StringStripWS($ZSE16IN[$J], 7)
         If StringInStr($ZSE16IN[$J], ".") Then
            $ZSE16IN[$J] = StringReplace($ZSE16IN[$J], ",", "")
         EndIf
         If $ZSE16IN[$J] = "00000" Then
            $ZSE16IN[$J] = ""
         EndIf
         If $ZSE16IN[$J] = "0000.00.00" Then
            $ZSE16IN[$J] = ""
         EndIf
         If $ZSE16IN[$J] = ". ." Then
            $ZSE16IN[$J] = ""
         EndIf
         If $J = 1 Then
            $TEMP = $ZSE16IN[$J]
         Else
            $TEMP = $TEMP & @TAB & $ZSE16IN[$J]
         EndIf
      Next
      $TEMPOUT = $TEMPOUT & $TEMP & @CRLF
   EndIf
   $ZSE16IN = $TEMPOUT
EndFunc  ;==>FormatZSE16
Func Remaining($TIMERINUSE, $TOTALRECORDS, $CURRENTRECORD)
   $TIMENOW = TimerStop($TIMERINUSE) 
   $TIMEPERREC = ($TIMENOW / $CURRENTRECORD) 
   $TOTALTIME = $TIMEPERREC * $TOTALRECORDS
   $TIMELEFT = $TOTALTIME - $TIMENOW
   $HOURS = Int($TIMELEFT / 3600000)
   $TIMELEFT = $TIMELEFT - ($HOURS * 3600000)
   $MINUTES = Int($TIMELEFT / 60000)
   $TIMELEFT = $TIMELEFT - ($MINUTES * 60000)
   $SECONDS = Round($TIMELEFT / 1000, 0)
   Return StringRight( "00" & $HOURS, 2) & ":" & StringRight( "00" & $MINUTES, 2) & ":" & StringRight( "00" & $SECONDS, 2)
EndFunc  ;==>Remaining
Edited by JdeB

Visit the SciTE4AutoIt3 Download page for the latest versions        Beta files                                                          Forum Rules
 
Live for the present,
Dream of the future,
Learn from the past.
  :)

Share this post


Link to post
Share on other sites

#3 ·  Posted (edited)

Autoit won't handle null values in a file. it assumes the input ends there.

There are no ascii null values in the text. Just dates (where date format = yyyy.mm.dd) like "0000.00.00" and " . . ". These are the primary cause of my import errors in MS Access.

Doesn't it make more sence to process the file record by record with a file of this size. There have been posts a while back, explaining the issue why it gets slower....

Yes and no. At what size is it faster to take the contents off the clipboard, write them to a file on the hard disk.... That whole fileread/write portion of the script is just there for testing purposes. Once I get the script running right, FormatZSE16(ByRef $ZSE16IN) will be a #included UDF, and $ZSE16In will be ClipGet(). I guess I was under the impressions that filereadline(file, 100000) would fileread a byte at a time until arrived at the 100000th line (and then do it all over again for the 100001th file, when in a loop). But that may not be the case. I will do more testing.

You can only do that with dim. The stringsplit you use will return a single dimention array...

Something like this (Untested):

You suggestion does not break the input down to "columns" so and the validation rules have to work on column by column basis... EDIT: yes it does... nevermind.... Trying... Edited by Michael.J.Curley

Share this post


Link to post
Share on other sites

If no line number to read is given, the "next" line will be read. ("Next" for a newly opened file is initially the first line.)

That is what I was not getting.. silly me. Here I was trying to count the lines first, so I could setup the loop to read lines forcing it to each line number (and report progress). I still need to count the lines first (just for the progress bar), but I dont need to feed that to the loop.

Share this post


Link to post
Share on other sites

#5 ·  Posted (edited)

No need to count the lines for the progressbar first..

Just do a FileGetSize to get the total size of the file and keep the total lenght processed by adding the lenght of each record to a variable...

Edited by JdeB

Visit the SciTE4AutoIt3 Download page for the latest versions        Beta files                                                          Forum Rules
 
Live for the present,
Dream of the future,
Learn from the past.
  :)

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!


Register a new account

Sign in

Already have an account? Sign in here.


Sign In Now
Sign in to follow this  
Followers 0