gruntydatsun Posted August 2, 2017 Share Posted August 2, 2017 (edited) I am trying to generate a 2d array out of an imported text file with fixed length fields on each line. File format for each line is below. The is no delimiter, just need to capture 10 chars then 35 chars then 4 chars etc etc into the 2d array. 10, 35, 4, 2, 8, 30, 1, 14, 10, 16, 50, 10, 8 The data file I'm importing is pretty big, 140MB or so and 810k lines. This is how I'm doing it now. It's a bit slow though. Any suggestions for a better way much appreciated guys as my regexfu is weak #include <array.au3> ;read in data file $raw = FileRead(@Scriptdir & "\datafile.dat") ;create 1d array of all values in file (would love to just have a regex to made a 2d array in one hit... sigh.. i suck) $array1d = StringRegExp($raw,"(.{10})(.{35})(.{4})(.{2})(.{8})(.{1})(.{15})(.{10})(.{16})(.{50})(.{10})(.{8})",3) $rows = UBound($array1d)/12 ;12 fields per line in data file & regex makes 1d array so this is the line count of file $cols = 12 ;12 fields per line local $array2d[$rows][$cols] ;2d array to hold results local $counter = 0 ;like it says for $x = 0 to $rows-1 for $y = 0 to $cols-1 $array2d[$x][$y] = $array1d[$counter] ;insert 12 elements per row from 1d array into 2d array $counter = $counter + 1 Next Next _ArrayDisplay($array2d,"Results") edit: sorry, inserted broken code, it works now Edited August 2, 2017 by gruntydatsun Link to comment Share on other sites More sharing options...
AspirinJunkie Posted August 2, 2017 Share Posted August 2, 2017 (edited) expandcollapse popup#include <Array.au3> $s_File = FileRead(@Scriptdir & "\datafile.dat") $aA_RegExReturn = StringRegExp($s_File, "(?m)^(.{10})(.{35})(.{4})(.{2})(.{8})(.{1})(.{15})(.{10})(.{16})(.{50})(.{10})(.{8})", 4) $a_Result = _ArrayAinATo2d($aA_RegExReturn) _ArrayDisplay($a_Result) ; #FUNCTION# ====================================================================================== ; Name ..........: _ArrayAinATo2d() ; Description ...: Convert a Arrays in Array into a 2D array ; Syntax ........: _ArrayAinATo2d(ByRef $A) ; Parameters ....: $A - the arrays in array which should be converted ; Return values .: Success: a 2D Array build from the input array ; Failure: False ; @error = 1: $A is'nt an 1D array ; = 2: $A is empty ; = 3: first element isn't a array ; Author ........: AspirinJunkie ; ================================================================================================= Func _ArrayAinATo2d(ByRef $A) If UBound($A, 0) <> 1 Then Return SetError(1, UBound($A, 0), False) Local $N = UBound($A) If $N < 1 Then Return SetError(2, $N, False) Local $u = UBound($A[0]) If $u < 1 Then Return SetError(3, $u, False) Local $a_Ret[$N][$u] For $i = 0 To $N - 1 Local $t = $A[$i] If UBound($t) > $u Then ReDim $a_Ret[$N][UBound($t)] For $j = 0 To UBound($t) - 1 $a_Ret[$i][$j] = $t[$j] Next Next Return $a_Ret EndFunc ;==>_ArrayAinATo2d If the datatypes are necessary or you want to convert some columns or anything then maybe this is an alternative for you: expandcollapse popup#include <Array.au3> $s_File = FileRead(@ScriptDir & "\datafile.dat") $a_Data = String_HandleTable($s_File, "int 10; leftString 35; myOwnFunc 4; Int 2;") _ArrayDisplay($a_Data, "parsed") Func myOwnFunc(ByRef $A) Return String($A) EndFunc ;==>myOwnFunc ; convert a string with fixed column widths into an 2D-array + user defined type conversions Func String_HandleTable(ByRef Const $s_String, Const $s_Format) ; by AspirinJunkie ; create format array Local $a_Split = StringRegExp($s_Format, '\s*(\w+)\s*(\d+)', 4) If @error Then Return SetError(1, @error, "") Local $a_Cols[UBound($a_Split)][3] Local $x = 1, $j = 0 For $i In $a_Split $x += $i[2] If $i[1] = "void" Then ContinueLoop $a_Cols[$j][0] = $i[1] $a_Cols[$j][1] = $x - $i[2] $a_Cols[$j][2] = $i[2] $j += 1 Next If $j <> UBound($a_Cols) Then ReDim $a_Cols[$j][3] ; process string Local $a_Split = StringRegExp($s_String, "([^\r\n|\n|\r]+)", 3) Local $a_Ret[UBound($a_Split)][UBound($a_Cols)] Local $s_rawVal, $s_Val, $x = 0 For $s_Line In $a_Split For $i = 0 To UBound($a_Cols) - 1 $s_rawVal = StringMid($s_Line, $a_Cols[$i][1], $a_Cols[$i][2]) Switch $a_Cols[$i][0] Case "leftString" $s_Val = StringStripWS($s_rawVal, 2) Case "rightString" $s_Val = StringStripWS($s_rawVal, 1) Case Else $s_Val = Execute($a_Cols[$i][0] & '("' & $s_rawVal & '")') If @error Then ContinueLoop EndSwitch $a_Ret[$x][$i] = $s_Val Next $x += 1 Next Return $a_Ret EndFunc ;==>String_HandleTable Edited August 2, 2017 by AspirinJunkie Link to comment Share on other sites More sharing options...
AndyG Posted August 2, 2017 Share Posted August 2, 2017 (edited) Hi, 18 hours ago, gruntydatsun said: The data file I'm importing is pretty big, 140MB or so and 810k lines. This is how I'm doing it now. It's a bit slow though. The question is, what do you want to do? Nobody reads 140MB data to display them within an _ArrayDisplay()! Some profiling shows that reading a 169MB testfile takes about 4 seconds, regex takes 6 seconds, transforming 1D_>2D-array takes 17 seconds....all in all 27 seconds on my machine (without "showing" the data via _ArrayDisplay() which takes 8-10 extra seconds) How should the data be further processed? I am almost sure that there is a faster solution for your "real" problem... Edited August 2, 2017 by AndyG Link to comment Share on other sites More sharing options...
Malkey Posted August 2, 2017 Share Posted August 2, 2017 Further to AndyG's comments If you want to view data after the 65,525th row (index 65,524) , then _ArrayDisplay() is not the way to go. (See _ArrayDisplay - 2nd paragraph of "Remarks") #include <array.au3> ;read in data file Local $raw ;= FileRead(@Scriptdir & "\data_file.txt") ; Create test data For $i = 0 To 169 * 70000 ; 850000 $raw &= Chr(Mod($i, 26) + 97) Next ConsoleWrite("Test data created " & 169 * 850000 / 1000000 & " Mb" & @CRLF) ; Returns 143.65 Mb ;create 1d array of all values in file (would love to just have a regex to made a 2d array in one hit... sigh.. i suck) $array = StringRegExp($raw, "(.{10})(.{35})(.{4})(.{2})(.{8})(.{1})(.{15})(.{10})(.{16})(.{50})(.{10})(.{8})", 3) _ArrayDisplay($array) $rows = Ceiling(UBound($array) / 12) ;12 fields per line in data file & regex makes 1d array so this is the line count of file $cols = 12 ;12 fields per line Local $array2d[$rows][$cols] ;2d array to hold results ; 1D array, $array, to 2D, array, $array2d. For $i = 0 To UBound($array) - 1 $array2d[Int($i / $cols)][Mod($i, $cols)] = $array[$i] Next _ArrayDisplay($array2d, "Results", "70000") ; Note: Only 65525 rows shown, not 70,000. An alternative method is: Local $raw = FileRead(@ScriptDir & "\data_file.txt") Local $sTabSeparatedFields = StringRegExpReplace($raw, "(.{10})(.{35})(.{4})(.{2})(.{8})(.{1})(.{15})(.{10})(.{16})(.{50})(.{10})(.{8})", _ "\1" & @TAB & "\2" & @TAB & "\3" & @TAB & "\4" & @TAB & "\5" & @TAB & "\6" & @TAB & "\7" & @TAB & "\8" & @TAB & _ "\9" & @TAB & "\10" & @TAB & "\11" & @CRLF) ; <-- Trailing " & @CRLF" may not be needed if data fields already are in lines 169 characters long. ConsoleWrite($sTabSeparatedFields & @CRLF) Link to comment Share on other sites More sharing options...
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now