gcue

Fastest way to build an array from a tab-delimited file?


gcue

I've tried a few UDFs out there, but most take pretty long (7 min or so).

Does anyone know of anything that's fast?

Many thanks in advance.

JSThePatriot

I would recommend you post which UDFs you've tried. Also, I'm thinking you meant 60 MB, not 60k?

I would personally try the following...

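#include <Array.au3>
Local $strFileContents = FileRead("FilePath/File.ext")
; AutoIt strings have no escape sequences ('\t' is a literal backslash and "t"), so use the @TAB macro.
; Flag 2 = don't return the element count in [0].
Local $aryItems = StringSplit($strFileContents, @TAB, 2)
_ArrayDisplay($aryItems)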

The above code is 100% untested, but I think it will give you the desired results. I don't know that it will be faster than anything else, but it's worth a shot.

Please show what you've done so we know in the future. Also, if you make your posts detailed, other people who need this exact answer later can find what you used and know not to head in that direction. It isn't just for your benefit that we ask for details here; it's for posterity.

Thanks,

Jarvis


AutoIt Links

File-String Hash Plugin Updated! 04-02-2008 Plugins have been discontinued. I just found out.

ComputerGetInfo UDF's Updated! 11-23-2006

External Links

Vortex Revolutions Engineer / Inventor (Web, Desktop, and Mobile Applications, Hardware Gizmos, Consulting, and more)

gcue

I'm sorry, I meant 60k lines, not 60 KB.

Here's the most promising UDF I tried; it still took ~7 minutes:

http://www.autoitscript.fr/forum/viewtopic.php?p=16275#p16275

Sorry about that.

JS: the lines you wrote won't work for me because they produce a flat list, and I need a 2-D array: the tab-delimited file has 31 columns.

JSThePatriot


Okay, so you have tried the ArrayFileToArray() function. I don't think I have seen that code; it might be possible to optimize it. Does this file always have 31 columns? So you want to split the file by line and by tab? 60k lines is a decent amount, but it should be able to process it. Arrays can get tough, though, as you would have a $aryData[60000][31] (1,860,000 elements).

Can I ask what your end goal is? There may be a better way to complete the task at hand.

I hope I can help,

Jarvis


gcue


Thanks for your help, Jarvis =)

Yes, the file always has 31 columns.

Yes, I'd like to split line by line and tab by tab (so each line is a record and each tab-separated field is a new column).

My end goal is this:

We have an inventory database that gets exported to a text file (they won't allow us to query the live production database because it does much more than just inventory). Anyway, my script lets us pull inventory information based on the computer asset number, which is $array[$x][0]. So if I wanted to get the registered owner of any asset, I'd pull $array[$x][1]. There are 31 fields I can query for now; we may add more later, although I doubt it.

thanks again!
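Since every lookup goes through the asset number in column 0, one option is to build an index once and then jump straight to the row instead of scanning up to 60k rows per query. This is a minimal sketch, assuming a zero-based 2-D array with the asset number in column 0 and that the Scripting.Dictionary COM object (Microsoft Scripting Runtime) is available, as it normally is on Windows. _BuildAssetIndex, _GetAssetField and the sample data are hypothetical.

```autoit
; Hypothetical helper: map each asset number (column 0) to its row index.
; Pass $iStart = 1 for arrays whose row 0 holds a count instead of data.
Func _BuildAssetIndex(ByRef $aData, $iStart = 0)
    Local $oIndex = ObjCreate("Scripting.Dictionary")
    If Not IsObj($oIndex) Then Return SetError(1, 0, 0)
    For $i = $iStart To UBound($aData, 1) - 1
        ; Keep the first row if an asset number appears more than once
        If Not $oIndex.Exists($aData[$i][0]) Then $oIndex.Add($aData[$i][0], $i)
    Next
    Return $oIndex
EndFunc

; Hypothetical helper: return one field for an asset; @error = 1 if the asset is unknown.
Func _GetAssetField(ByRef $aData, $oIndex, $sAsset, $iCol)
    If Not $oIndex.Exists($sAsset) Then Return SetError(1, 0, "")
    Local $iRow = $oIndex.Item($sAsset)
    Return $aData[$iRow][$iCol]
EndFunc

; Usage with a tiny made-up table; column 1 = registered owner
Local $aInv[2][2] = [["A12345", "jdoe"], ["A67890", "asmith"]]
Local $oIdx = _BuildAssetIndex($aInv)
ConsoleWrite(_GetAssetField($aInv, $oIdx, "A12345", 1) & @CRLF) ; prints jdoe
```

Building the index is one pass over the array; after that, each lookup is a dictionary hit rather than a loop. Scripting.Dictionary compares keys case-sensitively by default.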

Authenticity

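#include <Array.au3>

Global Const $sFile = @ScriptDir & '\file.txt'
Global Const $iCols = 31 ; fields per line
Global $hFile
Global $sText
Global $aMatch

Global $iInit = TimerInit()
; Read the whole file in one call
$hFile = FileOpen($sFile, 0)
$sText = FileRead($hFile)
FileClose($hFile)

; Capture each field (a run of characters other than tab/CR) followed by a tab or line break(s).
; Flag 3 returns every capture in one flat 1-D array.
$aMatch = StringRegExp($sText, '([^\t\r]*+)(?>\t|(?:\r\n)*)', 3)

Global Const $iUpperBound = UBound($aMatch)
; Whole rows only: an incomplete trailing group (such as the empty match at the end of the text) is dropped
Global Const $iRows = Int($iUpperBound / $iCols)
Global $avCSVArray[$iRows][$iCols]
Global $iCounter = 0

; Copy the flat list into the 2-D array, $iCols fields per row
For $i = 0 To $iUpperBound-$iCols Step $iCols
    For $j = 0 To $iCols-1
        $avCSVArray[$iCounter][$j] = $aMatch[$i+$j]
    Next
    
    $iCounter += 1
Next

; Elapsed milliseconds, then the first 3 rows as a sanity check
ConsoleWrite(TimerDiff($iInit) & @LF)
For $i = 0 To 2
    For $j = 0 To $iCols-1
        ConsoleWrite($avCSVArray[$i][$j] & @TAB)
    Next
    ConsoleWrite(@CRLF)
Next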

ProgAndy

I wrote a small function which could be faster than this ArrayFileToArray() function, but I didn't test it >_<

;~   flag = 1 (default), data is taken to be ANSI
;~   flag = 2, data is taken to be UTF16 Little Endian
;~   flag = 3, data is taken to be UTF16 Big Endian
;~   flag = 4, data is taken to be UTF8
Func _FileReadToArray2D($File, $Delim=@TAB, $flag=1)
    ; Author: Prog@ndy
    If $flag < 0 Or $flag > 4 Then Return SetError(1,0,0)
    ; Matching FileOpen modes: 0 = read (ANSI), 32 = UTF16 LE, 64 = UTF16 BE, 128 = UTF8
    Local $flags[5] = [0,0,32,64,128]
    Local $hFile = FileOpen($File, $flags[$flag])
    If $hFile = -1 Then Return SetError(2,0,0) ; FileOpen returns -1 on failure
    ; The first line determines the initial column count
    Local $Line = FileReadLine($hFile)
    If @error Then Return SetError(3,0,0*FileClose($hFile)) ; closes the file and still returns 0
    
    $Line = StringSplit($Line, $Delim, 1)
    Local $Count = 0, $MaxCount=100, $Length=$Line[0], $Array[$MaxCount][$Length]
    While 1
        ; Widen the array if this line has more fields than any before it
        If $Length < $Line[0] Then 
            $Length = $Line[0]
            ReDim $Array[$MaxCount][$Length]
        EndIf
        ; Grow by 100 rows when the buffer is full
        If $Count >= $MaxCount Then 
            $MaxCount += 100
            ReDim $Array[$MaxCount][$Length]
        EndIf
        ; StringSplit is 1-based ([0] = count); the output row is 0-based
        For $i = 1 To $Line[0]
            $Array[$Count][$i-1] = $Line[$i]
        Next
        
        $Count += 1
        ; Read and split the next line; @error is set at end of file
        $Line = FileReadLine($hFile)
        If @error Then ExitLoop
        $Line = StringSplit($Line, $Delim, 1)
    WEnd
    ; Trim the unused rows
    ReDim $Array[$Count][$Length]
    FileClose($hFile)
    Return $Array
EndFunc

#include<Array.au3>
$Array = _FileReadToArray2D(@DesktopDir&"\test.txt")
_ArrayDisplay($Array)

*GERMAN* [note: you are not allowed to remove author / modified info from my UDFs]My UDFs:[_SetImageBinaryToCtrl] [_TaskDialog] [AutoItObject] [Animated GIF (GDI+)] [ClipPut for Image] [FreeImage] [GDI32 UDFs] [GDIPlus Progressbar] [Hotkey-Selector] [Multiline Inputbox] [MySQL without ODBC] [RichEdit UDFs] [SpeechAPI Example] [WinHTTP]UDFs included in AutoIt: FTP_Ex (as FTPEx), _WinAPI_SetLayeredWindowAttributes

gcue

Let me test and time them.

Thanks, Authenticity and ProgAndy!

gcue

Authenticity, yours was pretty fast (13,948 ms, about 14 seconds), so yes, faster than before.

ProgAndy: still waiting for yours to complete (I think it's because you didn't test it) =)

zorphnog

Have you thought about using an in-memory SQLite database instead of an array? It seems much better suited to your situation than an array.
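A minimal sketch of the in-memory SQLite idea, assuming AutoIt's standard SQLite.au3 UDF and that _SQLite_Startup() can find sqlite3.dll. The helper _LoadTabFileToSQLite, the table name inventory, the column names c0 to c30 and the asset number in the usage part are all hypothetical; the 31 columns and the server path come from this thread.

```autoit
#include <SQLite.au3>

; Hypothetical helper: load a tab-delimited file into an in-memory table
; "inventory" with columns c0 .. c(N-1). Returns the DB handle.
Func _LoadTabFileToSQLite($sFile, $iCols)
    Local $sText = FileRead($sFile)
    If @error > 0 Then Return SetError(1, 0, 0)
    Local $aLines = StringSplit(StringStripCR($sText), @LF)

    Local $hDB = _SQLite_Open() ; no filename = ":memory:" database
    If @error Then Return SetError(2, 0, 0)

    ; Build the column list "c0, c1, ..., c(N-1)"
    Local $sCols = "c0"
    For $j = 1 To $iCols - 1
        $sCols &= ", c" & $j
    Next
    _SQLite_Exec($hDB, "CREATE TABLE inventory (" & $sCols & ");")

    ; One transaction around all INSERTs instead of one per row
    _SQLite_Exec($hDB, "BEGIN;")
    Local $aFields, $sValues
    For $i = 1 To $aLines[0]
        If $aLines[$i] = "" Then ContinueLoop ; skip blank lines (e.g. a trailing newline)
        $aFields = StringSplit($aLines[$i], @TAB, 1)
        $sValues = ""
        For $j = 1 To $iCols
            If $j > 1 Then $sValues &= ", "
            ; Missing fields become ''; fields beyond $iCols are ignored
            If $j <= $aFields[0] Then
                $sValues &= _SQLite_FastEscape($aFields[$j]) ; adds quotes and escapes the text
            Else
                $sValues &= "''"
            EndIf
        Next
        _SQLite_Exec($hDB, "INSERT INTO inventory VALUES (" & $sValues & ");")
    Next
    _SQLite_Exec($hDB, "COMMIT;")

    ; Index the asset-number column for fast lookups
    _SQLite_Exec($hDB, "CREATE INDEX idx_asset ON inventory (c0);")
    Return $hDB
EndFunc

; Usage: registered owner (column 1) for one made-up asset number
Local $aRow
_SQLite_Startup()
If @error Then
    ConsoleWrite("sqlite3.dll could not be loaded" & @CRLF)
    Exit 1
EndIf
Local $hInv = _LoadTabFileToSQLite("\\server\Extract.txt", 31)
If Not @error Then
    If _SQLite_QuerySingleRow($hInv, "SELECT c1 FROM inventory WHERE c0 = " & _SQLite_FastEscape("A12345") & ";", $aRow) = $SQLITE_OK Then ConsoleWrite($aRow[0] & @CRLF)
    _SQLite_Close($hInv)
EndIf
_SQLite_Shutdown()
```

The single BEGIN/COMMIT keeps SQLite from treating each INSERT as its own transaction, and the index on c0 lets the WHERE c0 = ... lookup avoid scanning the whole table.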

ProgAndy


Hmm, well...

I think the FileReadLine calls and ReDims are not very fast; StringRegExp should be better. I had just read about a few problems with big strings and StringRegExp, so I didn't use it.
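The loop above grows the buffer 100 rows at a time, so a 60k-line file triggers roughly 600 ReDims, each of which reallocates the array and copies the rows already stored. A common mitigation is to double the row capacity whenever it fills and trim once at the end; starting at 100 rows, that takes about 10 ReDims for 60,000 rows. This is a sketch, not ProgAndy's code: _AppendRow is a hypothetical helper, and the 31-column width and test.txt path are assumptions.

```autoit
; Hypothetical helper: append one StringSplit result as a row of $aBuf,
; doubling the row capacity when the buffer is full.
Func _AppendRow(ByRef $aBuf, ByRef $iCount, $aFields)
    If $iCount >= UBound($aBuf, 1) Then ReDim $aBuf[UBound($aBuf, 1) * 2][UBound($aBuf, 2)]
    Local $iCols = UBound($aBuf, 2)
    ; StringSplit result: [0] = count, [1..n] = fields
    For $j = 1 To $aFields[0]
        If $j > $iCols Then ExitLoop ; ignore fields beyond the column count
        $aBuf[$iCount][$j - 1] = $aFields[$j]
    Next
    $iCount += 1
EndFunc

; Usage: read line by line as above, but grow geometrically
Local $aData[100][31], $iRows = 0, $sLine
Local $hFile = FileOpen(@DesktopDir & "\test.txt", 0)
If $hFile <> -1 Then
    While 1
        $sLine = FileReadLine($hFile)
        If @error Then ExitLoop
        _AppendRow($aData, $iRows, StringSplit($sLine, @TAB, 1))
    WEnd
    FileClose($hFile)
    If $iRows > 0 Then ReDim $aData[$iRows][31] ; trim unused capacity once
EndIf
```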


gcue


zorphnog, I've thought about an in-memory database, but:

1. I don't have the foggiest idea how, lol.

2. I'm worried about what else that would involve: would the PC lose performance because I'm using up its resources?

SmOke_N

This took about 6 seconds on my PC with a 43.3 MB file of 6 columns and 60,000 rows. The first dimension (rows) starts at 1, since $a_[0][0] holds the row count; the 2nd dimension (columns) starts at zero.

#include <Array.au3>
MsgBox(0, "Ready", "Go")
Global $i_t = TimerInit()
Global $a_ = _DelimFile_To_Array2D("Test60kLines.log", @TAB, 6)
Global $i_d = TimerDiff($i_t)
ConsoleWrite("Total time in seconds = " & Round($i_d / 1000, 2) & @CRLF & "UB 1d = " & UBound($a_, 1) - 1 & @CRLF & "UB 2d = " & UBound($a_, 2) & @CRLF)
_ArrayDisplay($a_)

Func _DelimFile_To_Array2D($s_file, $s_delim = @TAB, $i_max_2d = 0)
    
    ; Accept either a file path or the delimited text itself
    Local $s_str = $s_file
    If FileExists($s_str) Then $s_str = FileRead($s_file)
    
    ; No column count given: start at 1 and widen as longer lines appear
    Local $i_enum_max = False
    If Int($i_max_2d) < 1 Then
        $i_enum_max = True
        $i_max_2d = 1
    EndIf
    
    ; One row per line; $a_ret[0][0] holds the line count
    Local $a_split = StringSplit(StringStripCR($s_str), @LF)
    Local $a_ret[$a_split[0] + 1][$i_max_2d] = [[$a_split[0]]], $a_delim
    
    For $i = 1 To $a_split[0]
        $a_delim = StringSplit($a_split[$i], $s_delim, 1)
        If $i_enum_max And $i_max_2d < $a_delim[0] Then
            ReDim $a_ret[$a_split[0] + 1][$a_delim[0]]
            $i_max_2d = $a_delim[0]
        EndIf
        For $j = 1 To $a_delim[0]
            If $j > $i_max_2d Then ExitLoop ; with a fixed column count, ignore extra fields instead of erroring
            $a_ret[$i][$j - 1] = $a_delim[$j]
        Next
    Next
    
    Return $a_ret
EndFunc

Edit:

Fixed Edit!


Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.

gcue


Wow, that does sound fast. I tried to plug in my file but am getting an error (maybe because I'm using 31 columns?)

Global $a_ = _DelimFile_To_Array2D("\\server\export.txt", @TAB, 31) ; I made this change; not sure if I need to change anything else.

Build\4. arraytest.au3 (33) : ==> Array variable has incorrect number of subscripts or subscript dimension range exceeded.:

$a_ret[$i][$j - 1] = $a_delim[$j]

$a_ret[$i][$j - 1] = ^ ERROR

SmOke_N

Sorry, I was working...

Try leaving out the optional 2nd and 3rd parameters (delimiter and column count):

Global $a_ = _DelimFile_To_Array2D("\\server\export.txt")

What result do you get then?

Edit:

Also note a small change in the 2nd For/Next loop in the previous code I posted.

gcue

Weird... I'm still getting an error:

U:\Build\4. arraytest.au3 (33) : ==> Array variable has incorrect number of subscripts or subscript dimension range exceeded.:

$a_ret[$i][$j - 1] = $a_delim[$j]

^ ERROR

#include <Array.au3>

$begin = TimerInit()

Global $a_ = _DelimFile_To_Array2D("\\server\Extract.txt")

$dif = TimerDiff($begin)
MsgBox(0,"Time Difference",$dif)

_ArrayDisplay($a_)

Func _DelimFile_To_Array2D($s_file, $s_delim = @TAB, $i_max_2d = 0)
    
    Local $s_str = $s_file
    If FileExists($s_str) Then $s_str = FileRead($s_file)    
    
    Local $i_enum_max = False
    If Int($i_max_2d) < 1 Then
        $i_enum_max = True
        $i_max_2d = 1
    EndIf
    
    Local $a_split = StringSplit(StringStripCR($s_str), @LF)
    Local $a_ret[$a_split[0] + 1][$i_max_2d] = [[$a_split[0]]], $a_delim
    
    For $i = 1 To $a_split[0]
        $a_delim = StringSplit($a_split[$i], $s_delim, 1)
        If $i_enum_max And $i_max_2d < $a_delim[0] Then
            ReDim $a_ret[$a_split[0] + 1][$a_delim[0]]
            $i_max_2d = $a_delim[0]
        EndIf
        For $j = 1 To $a_delim[$j]
            $a_ret[$i][$j - 1] = $a_delim[$j]
        Next
    Next
    
    Return $a_ret
EndFunc

SmOke_N

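$j = 1 To $a_delim[$j]

Should be:

$j = 1 To $a_delim[0]

$a_delim[0] holds the number of fields StringSplit returned, while $a_delim[$j] is the text of a field, so it can't serve as the loop limit. I'll check the posted code; if it was my mistake, I apologize.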


gcue

No problem, man.

I'll test it when I go to work tomorrow... many, many thanks!

gcue

I have to use the file from the server, so I think there's a delay because of that.

Still pretty good though, SmOke_N =) (16,173 ms, about 16 seconds)
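One way to check whether the server read is the slow part is to time the read and the split separately. A minimal sketch, assuming 31 columns and the server path used in this thread; _TimeReadAndSplit is a hypothetical helper.

```autoit
; Hypothetical helper: returns [read ms, split ms] for a tab-delimited file,
; or sets @error = 1 if the file cannot be read.
Func _TimeReadAndSplit($sPath, $iCols)
    Local $aTimes[2]

    ; 1) Time only the file read (network latency shows up here)
    Local $hTimer = TimerInit()
    Local $sText = FileRead($sPath)
    If @error > 0 Then Return SetError(1, 0, 0)
    $aTimes[0] = TimerDiff($hTimer)

    ; 2) Time only the in-memory split into a zero-based 2-D array
    $hTimer = TimerInit()
    Local $aLines = StringSplit(StringStripCR($sText), @LF)
    Local $aData[$aLines[0]][$iCols], $aFields
    For $i = 1 To $aLines[0]
        $aFields = StringSplit($aLines[$i], @TAB, 1)
        For $j = 1 To $aFields[0]
            If $j > $iCols Then ExitLoop ; ignore extra fields
            $aData[$i - 1][$j - 1] = $aFields[$j]
        Next
    Next
    $aTimes[1] = TimerDiff($hTimer)
    Return $aTimes
EndFunc

Local $aT = _TimeReadAndSplit("\\server\Extract.txt", 31)
If @error Then
    ConsoleWrite("Could not read the file" & @CRLF)
Else
    ConsoleWrite("Read: " & Round($aT[0]) & " ms, split: " & Round($aT[1]) & " ms" & @CRLF)
EndIf
```

SmOke_N's _DelimFile_To_Array2D can be measured the same way: it treats its first argument as the text itself when that argument is not an existing file path, so you can pass it the string you already read.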
