Guest Michael.J.Curley Posted June 27, 2004 Share Posted June 27, 2004 I am trying to write a script that will convert rather large (up to 20 mb) text files into a different format... I import this data into access to create reports, but the number/date formats/null values cause several import errors in access,meaning I lose some of the data. For some reason I decided the best approach would be to convert the data into a multi-dimensional array. However, I am having 3 problems with this approach.First, the program seems to get exponentially slower the larger the $ZSE16In... I learned this by using two single dimensional arrays. I figured reducing the number of copies of the huge variable would help to speed the process.Second, in a medium sized file (about 250K), the further the progress, the slower it gets.Third, I have no idea how to create a multi-dimensional array (without dim'ing it).Example input text.... Table : T001WDisplayed fields: 4 of 4 Fixed columns: 1 List width 0250------------------------------------------------------------------| |WERKS|NAME1 |KUNNR |LIFNR |------------------------------------------------------------------| |0001 |WERKS | | |------------------------------------------------------------------ First 2 rows would be wasted space. Same with the third. fourth row is the column names, sixth row is the actual data. There can be up to several hundreds of thousands of rows (and as many as 30 columns)Text is normally from a clipget(), which is placed there by the SAP transaction ZSE16 (hence the name) . In the code snippet below, I am using a fileread to get the text from some previously saved files.expandcollapse popup$InputFile = FileOpenDialog("Select Input File", @ScriptDir, "Text Files (*.txt)", 1) $InputString = FileRead($InputFile, FileGetSize($InputFile)) FormatZSE16($InputString) $OutPutFile = Fileopen(ValExt(FileSaveDialog("Select output File", @ScriptDir, "Text Files (*.txt)", 27), "txt"),2) FileWrite($OutPutFile, $InPutString) Exit Func ValExt($FileName, $Extension) If StringRight($FileName,StringLen($Extension)+1) = "." & $Extension Then Return $FileName Else Return $FileName & "." & $Extension EndIf EndFunc Func FormatZSE16(ByRef $Zse16In) ProgressOn("Reformatting Text File","The Moo Cow Says...", "Splitting file into seperate lines...",0,0) $ZSE16In = StringSplit($ZSE16In, @LF) $Timer = TimerStart() $TotalRecords = UBound($ZSE16In)-1 For $i = 1 to $TotalRecords $Progress = Round(($i / $TotalRecords)*100, 0) ProgressSet($Progress, "Currently Processing Record " & $i & " of " & $TotalRecords, Remaining($Timer, $TotalRecords, $i) & " Remaining") If StringInStr($ZSE16In[$i], "| |") Then $ZSE16In[$i] = StringReplace($ZSE16In[$i], "| |", "") $ZSE16In[$i] = StringLeft($ZSE16In[$i], StringLen($ZSE16In[$i]) - 2) $ZSE16In[$i] = StringSplit($ZSE16In[$i], "|") $TotalColumns = UBound($ZSE16In[$i])-1 For $j = 1 to $TotalColumns MsgBox(0, $i & " - " & $j, "$i - $j",1) $ZSE16In[$i][$j] = StringStripWS($ZSE16In[$i][$j], 7) If StringInStr($ZSE16In[$i][$j], ".") Then $ZSE16In[$i][$j] = StringReplace($ZSE16In[$i][$j], ",", "") EndIf If $ZSE16In[$i][$j] = "00000" Then $ZSE16In[$i][$j] = "" EndIf If $ZSE16In[$i][$j] = "0000.00.00" Then $ZSE16In[$i][$j] = "" EndIf If $ZSE16In[$i][$j] = ". ." Then $ZSE16In[$i][$j] = "" EndIf If $j = 1 Then $Temp = $ZSE16In[$i][$j] Else $Temp = $Temp & @TAB & $ZSE16In[$i][$j] EndIf Next $TempOut = $TempOut & $Temp & @CRLF EndIf Next ProgressOff() $ZSE16In = $TempOut EndFunc Func Remaining($TimerInUse, $TotalRecords, $CurrentRecord) $TimeNow = TimerStop($TimerInUse) $TimePerRec = ($TimeNow / $CurrentRecord) $TotalTime = $TimePerRec * $TotalRecords $TimeLeft = $TotalTime - $TimeNow $Hours = Int($TimeLeft / 3600000) $TimeLeft = $TimeLeft - ($Hours * 3600000) $Minutes = Int($TimeLeft / 60000) $TimeLeft = $TimeLeft - ($Minutes * 60000) $Seconds = Round($TimeLeft / 1000, 0) Return StringRight( "00" & $Hours, 2) & ":" & StringRight( "00" & $Minutes, 2) & ":" & StringRight( "00" & $Seconds, 2) EndFuncI think I included all the udfs... Hope so...$ZSE16In[$i][$j] = StringStripWS($ZSE16In[$i][$j], 7) is the line that is throwing an error about subscript out of range. That is the first time I try to pull the value out of what I thought was a multidimensional array. Strangely, the line 2 above it ($TotalColumns = UBound($ZSE16In[$i])-1) does not fail, and produces the correct number of "columns" in the file.Also, for a 2 mb file (and almost 20 minutes), script uses about 33-45 mb of memory, as reported by task manager. Does that seem a little odd to anyone else?BTW- the reason I am trying to do this with autoit is that another script will be responsible for executing the transaction and downloading these text files. Link to comment Share on other sites More sharing options...
Developers Jos Posted June 27, 2004 Developers Share Posted June 27, 2004 (edited) I am trying to write a script that will convert rather large (up to 20 mb) text files into a different format... I import this data into access to create reports, but the number/date formats/null values cause several import errors in access,meaning I lose some of the data. For some reason I decided the best approach would be to convert the data into a multi-dimensional array. However, I am having 3 problems with this approach. Autoit won't handle null values in a file. it assumes the input ends there. First, the program seems to get exponentially slower the larger the $ZSE16In... I learned this by using two single dimensional arrays. I figured reducing the number of copies of the huge variable would help to speed the process.Doesn't it make more sence to process the file record by record with a file of this size. There have been posts a while back, explaining the issue why it gets slower.... Second, in a medium sized file (about 250K), the further the progress, the slower it gets. Third, I have no idea how to create a multi-dimensional array (without dim'ing it). You can only do that with dim. The stringsplit you use will return a single dimention array... Something like this (Untested): expandcollapse popup$INPUTFILE = FileOpenDialog("Select Input File", @ScriptDir, "Text Files (*.txt)", 1) $H_OUTPUTFILE = FileOpen(ValExt(FileSaveDialog("Select output File", @ScriptDir, "Text Files (*.txt)", 27), "txt"), 2) $H_INPUTFILE = FileOpen($INPUTFILE,0) While 1 $INPUTSTRING = FileReadLine($H_INPUTFILE) If @error then ExitLoop FormatZSE16($INPUTSTRING) FileWriteLine($H_OUTPUTFILE, $INPUTSTRING) Wend FileClose($H_OUTPUTFILE) FileClose($H_InputFILE) Exit Func ValExt($FILENAME, $EXTENSION) If StringRight($FILENAME, StringLen($EXTENSION) + 1) = "." & $EXTENSION Then Return $FILENAME Else Return $FILENAME & "." & $EXTENSION EndIf EndFunc ;==>ValExt Func FormatZSE16(ByRef $ZSE16IN) If StringInStr($ZSE16IN[$I], "| |") Then $ZSE16IN = StringReplace($ZSE16IN, "| |", "") $ZSE16IN = StringLeft($ZSE16IN, StringLen($ZSE16IN) - 2) $ZSE16IN = StringSplit($ZSE16IN, "|") For $J = 1 To $ZSE16IN[0] MsgBox(0, $I & " - " & $J, "$i - $j", 1) $ZSE16IN[$j] = StringStripWS($ZSE16IN[$J], 7) If StringInStr($ZSE16IN[$J], ".") Then $ZSE16IN[$J] = StringReplace($ZSE16IN[$J], ",", "") EndIf If $ZSE16IN[$J] = "00000" Then $ZSE16IN[$J] = "" EndIf If $ZSE16IN[$J] = "0000.00.00" Then $ZSE16IN[$J] = "" EndIf If $ZSE16IN[$J] = ". ." Then $ZSE16IN[$J] = "" EndIf If $J = 1 Then $TEMP = $ZSE16IN[$J] Else $TEMP = $TEMP & @TAB & $ZSE16IN[$J] EndIf Next $TEMPOUT = $TEMPOUT & $TEMP & @CRLF EndIf $ZSE16IN = $TEMPOUT EndFunc ;==>FormatZSE16 Func Remaining($TIMERINUSE, $TOTALRECORDS, $CURRENTRECORD) $TIMENOW = TimerStop($TIMERINUSE) $TIMEPERREC = ($TIMENOW / $CURRENTRECORD) $TOTALTIME = $TIMEPERREC * $TOTALRECORDS $TIMELEFT = $TOTALTIME - $TIMENOW $HOURS = Int($TIMELEFT / 3600000) $TIMELEFT = $TIMELEFT - ($HOURS * 3600000) $MINUTES = Int($TIMELEFT / 60000) $TIMELEFT = $TIMELEFT - ($MINUTES * 60000) $SECONDS = Round($TIMELEFT / 1000, 0) Return StringRight( "00" & $HOURS, 2) & ":" & StringRight( "00" & $MINUTES, 2) & ":" & StringRight( "00" & $SECONDS, 2) EndFunc ;==>Remaining Edited June 27, 2004 by JdeB SciTE4AutoIt3 Full installer Download page - Beta files Read before posting How to post scriptsource Forum etiquette Forum Rules Live for the present, Dream of the future, Learn from the past. Link to comment Share on other sites More sharing options...
Guest Michael.J.Curley Posted June 27, 2004 Share Posted June 27, 2004 (edited) Autoit won't handle null values in a file. it assumes the input ends there.There are no ascii null values in the text. Just dates (where date format = yyyy.mm.dd) like "0000.00.00" and " . . ". These are the primary cause of my import errors in MS Access.Doesn't it make more sence to process the file record by record with a file of this size. There have been posts a while back, explaining the issue why it gets slower....Yes and no. At what size is it faster to take the contents off the clipboard, write them to a file on the hard disk.... That whole fileread/write portion of the script is just there for testing purposes. Once I get the script running right, FormatZSE16(ByRef $ZSE16IN) will be a #included UDF, and $ZSE16In will be ClipGet(). I guess I was under the impressions that filereadline(file, 100000) would fileread a byte at a time until arrived at the 100000th line (and then do it all over again for the 100001th file, when in a loop). But that may not be the case. I will do more testing.You can only do that with dim. The stringsplit you use will return a single dimention array... Something like this (Untested):You suggestion does not break the input down to "columns" so and the validation rules have to work on column by column basis... EDIT: yes it does... nevermind.... Trying... Edited June 27, 2004 by Michael.J.Curley Link to comment Share on other sites More sharing options...
Guest Michael.J.Curley Posted June 27, 2004 Share Posted June 27, 2004 If no line number to read is given, the "next" line will be read. ("Next" for a newly opened file is initially the first line.)That is what I was not getting.. silly me. Here I was trying to count the lines first, so I could setup the loop to read lines forcing it to each line number (and report progress). I still need to count the lines first (just for the progress bar), but I dont need to feed that to the loop. Link to comment Share on other sites More sharing options...
Developers Jos Posted June 27, 2004 Developers Share Posted June 27, 2004 (edited) No need to count the lines for the progressbar first.. Just do a FileGetSize to get the total size of the file and keep the total lenght processed by adding the lenght of each record to a variable... Edited June 27, 2004 by JdeB SciTE4AutoIt3 Full installer Download page - Beta files Read before posting How to post scriptsource Forum etiquette Forum Rules Live for the present, Dream of the future, Learn from the past. Link to comment Share on other sites More sharing options...
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now