
fastest build to array from TAB delimited file?


gcue

I would recommend you post which UDFs you've tried. Also, I am thinking you meant 60 MB, not 60k?

I would personally try the following...

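#include <Array.au3>
; Read the whole file into one string ("FilePath/File.ext" is a placeholder)
Local $strFileContents = FileRead("FilePath/File.ext")
; AutoIt has no '\t' escape sequence, so use the @TAB macro. Flag 2 omits the count in element 0.
Local $aryItems = StringSplit($strFileContents, @TAB, 2)
_ArrayDisplay($aryItems)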

The above code is 100% untested, but I think it will give you the desired results. I don't know that it will be faster than anything else, but it's worth a shot.

Please show what you've done so we know in the future. Also if you make your posts detailed, it allows other people who may need this exact answer in the future to find what you used, and know not to head in that direction. This isn't just for your benefit that we ask for details here. It's for posterity.

Thanks,

Jarvis

AutoIt Links

File-String Hash Plugin Updated! 04-02-2008 Plugins have been discontinued. I just found out.

ComputerGetInfo UDF's Updated! 11-23-2006

External Links

Vortex Revolutions Engineer / Inventor (Web, Desktop, and Mobile Applications, Hardware Gizmos, Consulting, and more)


I'm sorry, I meant 60k lines, not 60 KB.

Here's the most promising UDF I tried; it still took ~7 minutes:

http://www.autoitscript.fr/forum/viewtopic.php?p=16275#p16275

Sorry about that.

JS: the lines you wrote won't work for me because the array needs a second dimension (the tab-delimited file has 31 columns).


Okay... so you have tried the ArrayFileToArray() function. I don't know that I have seen that code; it might be possible to optimize it. Does this file always have 31 columns? So you want to split the file by line and by tab? 60k lines is a decent amount, but it should be possible to process it. Arrays can get tough, though, as you would have a $aryData[60000][31].

Can I ask what your end goal is? There may be a better way to complete the task at hand.

I hope I can help,

Jarvis


Thanks for your help Jarvis =)

Yes, the file always has 31 columns.

Yes, I'd like to split line by line and tab by tab (so each line is a record, and each tab-separated field is a new column).

My end goal is this:

We have an inventory database that gets exported to a text file (they won't allow us to query a live production database because it does much more than just inventory). Anyway, my script allows us to pull inventory information based on the computer asset number, which is $array[$x][0]. So, if I wanted to get the registered owner of any asset, I'd pull $array[$x][1]. There are 31 fields I can query for now; later we may add more, although I doubt it for now.

thanks again!
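
For the lookup described above (asset number in column 0, owner in column 1), one option is to build an index once so that repeated queries do not scan all 60k rows. This is only a sketch: _BuildAssetIndex and the sample data are made up for illustration, and it assumes the Windows Scripting.Dictionary COM object is available and that data rows start at index 0.

```autoit
; Hypothetical helper (not from the thread): map each asset number in column 0 to its row index.
Func _BuildAssetIndex(Const ByRef $aData)
    Local $oIndex = ObjCreate("Scripting.Dictionary")
    If Not IsObj($oIndex) Then Return SetError(1, 0, 0)
    For $i = 0 To UBound($aData, 1) - 1
        ; Keep the first row if an asset number appears more than once
        If Not $oIndex.Exists($aData[$i][0]) Then $oIndex.Add($aData[$i][0], $i)
    Next
    Return $oIndex
EndFunc

; Made-up sample data standing in for the 60k x 31 array
Local $aSample[3][2] = [["A1001", "Smith"], ["A1002", "Jones"], ["A1003", "Lee"]]
Local $oIndex = _BuildAssetIndex($aSample)
If $oIndex.Exists("A1002") Then
    Local $iRow = $oIndex.Item("A1002")
    ConsoleWrite("Owner: " & $aSample[$iRow][1] & @CRLF) ; prints "Owner: Jones"
EndIf
```

Dictionary keys are compared case-sensitively by default, so "a1002" and "A1002" would count as different assets here.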


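#include <Array.au3>

Global Const $sFile = @ScriptDir & '\file.txt'
Global Const $iCols = 31 ; number of tab-delimited columns per line
Global $hFile
Global $sText
Global $aMatch

Global $iInit = TimerInit()
$hFile = FileOpen($sFile, 0)
$sText = FileRead($hFile)
FileClose($hFile)

; Capture every field: a run of non-tab, non-CR characters followed by a tab or by line break(s).
; The pattern assumes CRLF line endings. Flag 3 returns all matches in one flat array.
$aMatch = StringRegExp($sText, '([^\t\r]*+)(?>\t|(?:\r\n)*)', 3)

Global Const $iUpperBound = UBound($aMatch)
Global Const $iRows = Int($iUpperBound / $iCols)
Global $avCSVArray[$iRows][$iCols]
Global $iCounter = 0

; Copy the flat match list into the 2D array, $iCols fields per row
For $i = 0 To $iUpperBound - $iCols Step $iCols
    For $j = 0 To $iCols - 1
        $avCSVArray[$iCounter][$j] = $aMatch[$i + $j]
    Next

    $iCounter += 1
Next

ConsoleWrite(TimerDiff($iInit) & @LF) ; elapsed milliseconds
; Print the first three rows as a quick check
For $i = 0 To 2
    For $j = 0 To $iCols - 1
        ConsoleWrite($avCSVArray[$i][$j] & @TAB)
    Next
    ConsoleWrite(@CRLF)
Next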


I wrote a small function which could be faster than this ArrayFileToArray() function, but I didn't test it >_<

;~   flag = 1 (default), data is taken to be ANSI
;~   flag = 2, data is taken to be UTF16 Little Endian
;~   flag = 3, data is taken to be UTF16 Big Endian
;~   flag = 4, data is taken to be UTF8
Func _FileReadToArray2D($File, $Delim=@TAB, $flag=1)
    ; Author: Prog@ndy
    If $flag < 0 Or $flag > 4 Then Return SetError(1,0,0)
    Local $flags[5] = [0,0,32,64,128]
    Local $hFile = FileOpen($File, $flags[$flag])
    ; FileOpen reports failure by returning -1, not by setting @error
    If $hFile = -1 Then Return SetError(2,0,0)
    Local $Line = FileReadLine($hFile)
    If @error Then Return SetError(3,0,0*FileClose($hFile))

    $Line = StringSplit($Line, $Delim, 1)
    ; $Count starts at 0 explicitly so it is a valid row index
    Local $Count = 0, $MaxCount=100, $Length=$Line[0], $Array[$MaxCount][$Length]
    While 1
        If $Length < $Line[0] Then 
            $Length = $Line[0]
            ReDim $Array[$MaxCount][$Length]
        EndIf
        If $Count >= $MaxCount Then 
            $MaxCount += 100
            ReDim $Array[$MaxCount][$Length]
        EndIf
        For $i = 1 To $Line[0]
            $Array[$Count][$i-1] = $Line[$i]
        Next
        
        $Count += 1
        $Line = FileReadLine($hFile)
        If @error Then ExitLoop
        $Line = StringSplit($Line, $Delim, 1)
    WEnd
    ReDim $Array[$Count][$Length]
    FileClose($hFile)
    Return $Array
EndFunc

#include<Array.au3>
$Array = _FileReadToArray2D(@DesktopDir&"\test.txt")
_ArrayDisplay($Array)
Edited by ProgAndy

*GERMAN* [note: you are not allowed to remove author / modified info from my UDFs]My UDFs:[_SetImageBinaryToCtrl] [_TaskDialog] [AutoItObject] [Animated GIF (GDI+)] [ClipPut for Image] [FreeImage] [GDI32 UDFs] [GDIPlus Progressbar] [Hotkey-Selector] [Multiline Inputbox] [MySQL without ODBC] [RichEdit UDFs] [SpeechAPI Example] [WinHTTP]UDFs included in AutoIt: FTP_Ex (as FTPEx), _WinAPI_SetLayeredWindowAttributes


Authenticity: yours was pretty fast (13948 milliseconds), so yes, faster than before.

ProgAndy: still waiting for it to complete (I think it's because you didn't test) =)

Hmm, well...

I think FileReadLine and the ReDims are not too fast; StringRegExp should be better. I had just read about a few problems with big strings and StringRegExp, so I didn't use it.
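
On the ReDim cost mentioned above: ReDim has to preserve the existing values each time, so growing by a fixed 100 rows means about 600 ReDim calls for 60,000 lines. A possible mitigation, sketched here with a made-up helper named _EnsureRow (not part of the UDF above), is to double the row count instead, which needs roughly 10 calls for the same file.

```autoit
; Hypothetical helper (not from the thread): make sure row index $iRow exists,
; doubling the row count instead of adding a fixed 100 rows each time.
Func _EnsureRow(ByRef $aArray, $iRow)
    Local $iRows = UBound($aArray, 1)
    If $iRow < $iRows Then Return
    If $iRows < 1 Then $iRows = 1
    While $iRows <= $iRow
        $iRows *= 2
    WEnd
    ReDim $aArray[$iRows][UBound($aArray, 2)]
EndFunc

; Usage with made-up data: start at 100 rows and fill 1000
Local $aRows[100][31]
For $i = 0 To 999
    _EnsureRow($aRows, $i)
    $aRows[$i][0] = "row " & $i
Next
ReDim $aRows[1000][31] ; trim to the number of rows actually used
```

Whether this alone makes the line-by-line approach competitive is untested; FileReadLine may still dominate the run time.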


Have you thought about using an in-memory database with SQLite instead of an array? It seems like it would be much better suited to your situation than an array.

zorphnog: I've thought about memory, but

1. I don't have the foggiest idea how - lol

2. I'm scared of what else that would imply - would the PC lose performance because I'm using up its resources?
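
As a rough idea of what the in-memory route could look like, here is a minimal sketch using the SQLite UDF that ships with AutoIt. It assumes sqlite3.dll can be located by _SQLite_Startup() on the machine; the table name, the two columns and the sample values are made up (the real export has 31 columns). The data lives in RAM either way, much like the array does.

```autoit
#include <SQLite.au3>

; Assumption: sqlite3.dll can be found by _SQLite_Startup() on this machine
_SQLite_Startup()
If @error Then Exit MsgBox(16, "SQLite", "Could not load the SQLite DLL")

_SQLite_Open() ; no file name = in-memory database
; Hypothetical two-column table standing in for the 31-column export
_SQLite_Exec(-1, "CREATE TABLE inventory (asset TEXT PRIMARY KEY, owner TEXT);")

; Wrapping many INSERTs in one transaction is usually much faster than committing each one
_SQLite_Exec(-1, "BEGIN;")
_SQLite_Exec(-1, "INSERT INTO inventory VALUES (" & _SQLite_FastEscape("A1001") & "," & _SQLite_FastEscape("Smith") & ");")
_SQLite_Exec(-1, "COMMIT;")

; Look up the owner of one asset
Local $aRow
Local $sQuery = "SELECT owner FROM inventory WHERE asset = " & _SQLite_FastEscape("A1001") & ";"
If _SQLite_QuerySingleRow(-1, $sQuery, $aRow) = $SQLITE_OK Then
    ConsoleWrite("Owner: " & $aRow[0] & @CRLF)
EndIf

_SQLite_Close()
_SQLite_Shutdown()
```

In a real script the INSERT would sit inside the loop that reads the export file, between BEGIN and COMMIT.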

Edited by gcue

  • Moderators

This took about 6 seconds on my PC with a 43.3 MB file and the required 6 columns and 60,000 rows. The first dimension starts at 1 (element [0][0] holds the row count); however, the second dimension starts at zero.

#include <Array.au3>
MsgBox(0, "Ready", "Go")
Global $i_t = TimerInit()
Global $a_ = _DelimFile_To_Array2D("Test60kLines.log", @TAB, 6)
Global $i_d = TimerDiff($i_t)
ConsoleWrite("Total time in seconds = " & Round($i_d / 1000, 2) & @CRLF & "UB 1d = " & UBound($a_, 1) - 1 & @CRLF & "UB 2d = " & UBound($a_, 2) & @CRLF)
_ArrayDisplay($a_)

Func _DelimFile_To_Array2D($s_file, $s_delim = @TAB, $i_max_2d = 0)
    
    Local $s_str = $s_file
    If FileExists($s_str) Then $s_str = FileRead($s_file)
    
    
    Local $i_enum_max = False
    If Int($i_max_2d) < 1 Then
        $i_enum_max = True
        $i_max_2d = 1
    EndIf
    
    Local $a_split = StringSplit(StringStripCR($s_str), @LF)
    Local $a_ret[$a_split[0] + 1][$i_max_2d] = [[$a_split[0]]], $a_delim
    
    For $i = 1 To $a_split[0]
        $a_delim = StringSplit($a_split[$i], $s_delim, 1)
        If $i_enum_max And $i_max_2d < $a_delim[0] Then
            ReDim $a_ret[$a_split[0] + 1][$a_delim[0]]
            $i_max_2d = $a_delim[0]
        EndIf
        For $j = 1 To $a_delim[0]
            $a_ret[$i][$j - 1] = $a_delim[$j]
        Next
    Next
    
    Return $a_ret
EndFunc
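
To make the indexing of the returned array concrete, here is a small sketch that walks an array laid out the same way: the row count in [0][0], data in rows 1 to N, columns starting at 0. The $aDemo contents are made up for illustration.

```autoit
; Made-up array in the same layout as the function's return value:
; [0][0] = number of data rows, data in rows 1..N, columns 0-based
Local $aDemo[3][2] = [[2, ""], ["A1001", "Smith"], ["A1002", "Jones"]]

For $i = 1 To $aDemo[0][0]
    For $j = 0 To UBound($aDemo, 2) - 1
        ConsoleWrite($aDemo[$i][$j] & @TAB)
    Next
    ConsoleWrite(@CRLF)
Next
```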

Edit:

Fixed Edit!

Edited by SmOke_N

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.


This took about 6 secs on my PC with a 43.3 mb file and the required 6 columns and 60,000 rows. The first dimension starts at 1, however the 2nd dimension starts at zero.

#include <Array.au3>
MsgBox(0, "Ready", "Go")
Global $i_t = TimerInit()
Global $a_ = _DelimFile_To_Array2D("Test60kLines.log", @TAB, 6)
Global $i_d = TimerDiff($i_t)
ConsoleWrite("Total time in seconds = " & Round($i_d / 1000, 2) & @CRLF & "UB 1d = " & UBound($a_, 1) - 1 & @CRLF & "UB 2d = " & UBound($a_, 2) & @CRLF)
_ArrayDisplay($a_)

Func _DelimFile_To_Array2D($s_file, $s_delim = @TAB, $i_max_2d = 0)
    
    Local $s_str = $s_file
    If FileExists($s_str) Then $s_str = FileRead($s_file)
    
    
    Local $i_enum_max = False
    If Int($i_max_2d) < 1 Then
        $i_enum_max = True
        $i_max_2d = 1
    EndIf
    
    Local $a_split = StringSplit(StringStripCR($s_str), @LF)
    Local $a_ret[$a_split[0] + 1][$i_max_2d] = [[$a_split[0]]], $a_delim
    
    For $i = 1 To $a_split[0]
        $a_delim = StringSplit($a_split[$i], $s_delim, 1)
        If $i_enum_max And $i_max_2d < $a_delim[0] Then
            ReDim $a_ret[$a_split[0] + 1][$a_delim[0]]
            $i_max_2d = $a_delim[0]
        EndIf
        For $j = 1 To $i_max_2d
            $a_ret[$i][$j - 1] = $a_delim[$j]
        Next
    Next
    
    Return $a_ret
EndFunc

Wow, that does sound fast. I tried to plug in my file but am getting an error (maybe because I'm using 31 columns?)

Global $a_ = _DelimFile_To_Array2D("\\server\export.txt", @TAB, 31) ; made this change - not sure if need to change anything else.

Build\4. arraytest.au3 (33) : ==> Array variable has incorrect number of subscripts or subscript dimension range exceeded.:

$a_ret[$i][$j - 1] = $a_delim[$j]

$a_ret[$i][$j - 1] = ^ ERROR


Sorry, working...

Try not setting the dimensions (2nd and 3rd param are optional):

Global $a_ = _DelimFile_To_Array2D("\\server\export.txt")

What result do you get then?

Edit:

Also note a small change in the 2nd For/Next loop in the previous code I posted.

Edited by SmOke_N


weird... still getting an error:

U:\Build\4. arraytest.au3 (33) : ==> Array variable has incorrect number of subscripts or subscript dimension range exceeded.:

$a_ret[$i][$j - 1] = $a_delim[$j]

^ ERROR

#include <Array.au3>

$begin = TimerInit()

Global $a_ = _DelimFile_To_Array2D("\\server\Extract.txt")

$dif = TimerDiff($begin)
MsgBox(0,"Time Difference",$dif)

_ArrayDisplay($a_)

Func _DelimFile_To_Array2D($s_file, $s_delim = @TAB, $i_max_2d = 0)
    
    Local $s_str = $s_file
    If FileExists($s_str) Then $s_str = FileRead($s_file)    
    
    Local $i_enum_max = False
    If Int($i_max_2d) < 1 Then
        $i_enum_max = True
        $i_max_2d = 1
    EndIf
    
    Local $a_split = StringSplit(StringStripCR($s_str), @LF)
    Local $a_ret[$a_split[0] + 1][$i_max_2d] = [[$a_split[0]]], $a_delim
    
    For $i = 1 To $a_split[0]
        $a_delim = StringSplit($a_split[$i], $s_delim, 1)
        If $i_enum_max And $i_max_2d < $a_delim[0] Then
            ReDim $a_ret[$a_split[0] + 1][$a_delim[0]]
            $i_max_2d = $a_delim[0]
        EndIf
        For $j = 1 To $a_delim[$j]
            $a_ret[$i][$j - 1] = $a_delim[$j]
        Next
    Next
    
    Return $a_ret
EndFunc

$j = 1 To $a_delim[$j]

Should be:

$j = 1 To $a_delim[0]

I'll check the posted code, if it was my mistake I apologize.
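
A short illustration of why the loop limit has to be element 0: with the flags used here, StringSplit stores the number of fields in $a_delim[0] and the fields themselves from index 1 on. Writing $a_delim[$j] as the limit reads a data field instead of the count, which presumably sends the loop past the end of the arrays whenever that field is a large number. The sample line below is made up.

```autoit
; Made-up line with three tab-separated fields
Local $sLine = "A1001" & @TAB & "Smith" & @TAB & "Laptop"
Local $a_delim = StringSplit($sLine, @TAB, 1)

ConsoleWrite($a_delim[0] & @CRLF)      ; field count: 3
ConsoleWrite(UBound($a_delim) & @CRLF) ; array size: 4 (the count plus three fields)

; The loop limit comes from element 0, never from $a_delim[$j]
For $j = 1 To $a_delim[0]
    ConsoleWrite($j & ": " & $a_delim[$j] & @CRLF)
Next
```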
