squishy

loading large binary file?

5 posts in this topic

Hello!

I am trying to read large binary files (specifically, .dbf files) into an array. However, using my current code, this takes FOREVER! I have started learning about using DLLStructs...would using these be any faster when loading each record?

Any help would be appreciated...

Thanks!

#include <File.au3>
#include <Date.au3>

Local $fDBF, $data, $dLastUpdate, $numfields, $fielddesc[2][6], $rsData[2][2]
Local $x, $y, $numrecs, $numbytesheader, $reclen, $pos, $curfield, $t

$t = _NowCalc()
$fDBF = FileOpen("G:\bvinfo.dbf", 16); binary mode
$data = FileRead($fDBF, 32); read the 32-byte file header
$dLastUpdate = (1900 + Asc(BinaryToString(BinaryMid($data, 2, 1)))) & "/" & Asc(BinaryToString(BinaryMid($data, 3, 1))) & "/" & Asc(BinaryToString(BinaryMid($data, 4, 1))); year byte is years since 1900
$numrecs = Dec(Hex(BinaryMid($data, 8, 1) & BinaryMid($data, 7, 1) & BinaryMid($data, 6, 1) & BinaryMid($data, 5, 1))); little-endian record count
$numbytesheader = Dec(Hex(BinaryMid($data, 10, 1) & BinaryMid($data, 9, 1)))
$numfields = Int($numbytesheader / 32) - 1
$reclen = Dec(Hex(BinaryMid($data, 12, 1) & BinaryMid($data, 11, 1)))
ReDim $fielddesc[$numfields + 1][6]
$pos = 32
For $x = 1 To $numfields; load the 32-byte field descriptors
    $data = FileRead($fDBF, 32)
    $fielddesc[$x][0] = BinaryToString(BinaryMid($data, 1, 11)); field name
    $fielddesc[$x][1] = BinaryToString(BinaryMid($data, 12, 1)); field type
    $fielddesc[$x][2] = BinaryToString(BinaryMid($data, 13, 4)); data address
    $fielddesc[$x][3] = Asc(BinaryToString(BinaryMid($data, 17, 1))); field length
    $fielddesc[$x][4] = BinaryToString(BinaryMid($data, 18, 1)); decimal count
    ;$fielddesc[$x][5] = BinaryToString(BinaryMid($data, 19, 14)); reserved bytes
    $pos = $pos + 32
Next
ReDim $rsData[$numrecs + 1][$numfields + 1]
$data = FileRead($fDBF, $numbytesheader - $pos); skip to the start of the data
For $x = 1 To $numrecs
    $rsData[$x][0] = BinaryToString(FileRead($fDBF, $reclen))
    If StringLeft($rsData[$x][0], 1) <> "*" Then; "*" marks a deleted record
        $pos = 2; first byte of each record is the deleted flag
        For $y = 1 To $numfields
            $curfield = StringMid($rsData[$x][0], $pos, $fielddesc[$y][3])
            Switch $fielddesc[$y][1]
                Case "D"; date, stored as YYYYMMDD
                    $rsData[$x][$y] = StringLeft($curfield, 4) & "/" & StringMid($curfield, 5, 2) & "/" & StringRight($curfield, 2)
                Case "C"; character
                    $rsData[$x][$y] = StringStripWS($curfield, 3)
                Case "N"; numeric
                    $rsData[$x][$y] = Number($curfield)
                Case "L"; logical
                    $rsData[$x][$y] = ($curfield = "T" Or $curfield = "Y")
                Case "@"; timestamp
                    $rsData[$x][$y] = $curfield
                Case "I"; long
                    $rsData[$x][$y] = $curfield
                Case "F"; float
                    $rsData[$x][$y] = Number($curfield)
                Case "O"; double
                    $rsData[$x][$y] = Number($curfield)
                Case "M"; memo
                    $rsData[$x][$y] = $curfield
            EndSwitch
            $pos = $pos + $fielddesc[$y][3]
        Next
    EndIf
Next
MsgBox(0, "Time Diff", _DateDiff("s", $t, _NowCalc())); seconds taken to load the file
FileClose($fDBF)

Troy.

#2 ·  Posted (edited)

Hi,

[EDIT] PS. If it is possible to do a single FileRead and then handle the resulting binary with BinaryMid etc., do that for speed! Or do the functions we already have work on the binary from reading a whole binary file anyway?
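Something like this is what I mean by a single FileRead (untested sketch; the path is a placeholder, and the header offsets just mirror the ones in your script):

```autoit
; Hypothetical sketch: read the whole .dbf once, then slice it with BinaryMid,
; so there is no per-record FileRead. Not tested against a real file.
Local $sPath = "G:\bvinfo.dbf"
Local $hFile = FileOpen($sPath, 16) ; binary mode
Local $bAll = FileRead($hFile)      ; one read for the entire file
FileClose($hFile)

; same little-endian header fields as in the original script
Local $iHeaderLen = Dec(Hex(BinaryMid($bAll, 10, 1) & BinaryMid($bAll, 9, 1)))
Local $iRecLen    = Dec(Hex(BinaryMid($bAll, 12, 1) & BinaryMid($bAll, 11, 1)))
Local $iNumRecs   = Dec(Hex(BinaryMid($bAll, 8, 1) & BinaryMid($bAll, 7, 1) & BinaryMid($bAll, 6, 1) & BinaryMid($bAll, 5, 1)))

For $i = 0 To $iNumRecs - 1
    ; slice record $i straight out of the in-memory binary
    Local $sRec = BinaryToString(BinaryMid($bAll, $iHeaderLen + $i * $iRecLen + 1, $iRecLen))
    ; ...parse $sRec with StringMid as in the original loop...
Next
```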

I've not seen that done before, but I am not sure whether you can get it faster or not.

1. Is it the read part that is slow, or the processing? (The latter could be faster with VBScript subfunctions.)

2. Could you possibly send me a sample .dbf file to check? [I have been trying to get TailRW working for just reading the end of a binary file, and this would help me too.]

3. Is it dBase IV or some other .dbf format?

Sorry I don't have the answer, but I would be interested to have a look.

Best, Randall

[The read part would be 40x as fast on a small file; see the attached example file.]

Edited by randallc

PS -

See the new APITailRW in my signature for the option of a direct binary tail or partial read; I still do not recommend it here, as for this purpose you need to read the whole file anyway, and a single whole-file read would be a lot quicker.

Best, randall


Thanks, randallc, for the responses!

Basically, I need to work on a few different files, all of which are .dbf format. I will need to essentially run queries on each file and return results to custom forms, adding functionality to some outdated software we use. The biggest problem I am running into is the speed factor...the smallest file we work with is around 9 MB, and the largest is over 100 MB.

1. Is it the read part that is slow, or the processing? (The latter could be faster with VBScript subfunctions.)

I have tried reading the whole file first, which works fairly quickly, and then the processing slows it down greatly. I didn't think of using VBScript functions...I just assumed AutoIt would run them just as fast...

2. Could you possibly send me a sample .dbf file to check? [I have been trying to get TailRW working for just reading the end of a binary file, and this would help me too.]

Not a problem...just send me a message with an account that can handle a 9 MB email attachment...

3. Is it dBase IV or some other .dbf format?

Not sure...it does follow the xBase file format for the header and record layouts, though.
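For what it's worth, here is roughly what I had in mind with the DllStruct idea: laying a char-array struct over each raw record so every field comes back with one DllStructGetData() call instead of repeated StringMid(). This is an untested sketch that reuses $fielddesc, $reclen, $numfields, and $fDBF from my script above:

```autoit
; Hypothetical sketch of the DllStruct approach (untested).
; Build one struct tag from the field descriptors: deleted flag + one
; char array per field, named f1..fN.
Local $sTag = "char deleted[1]"
For $y = 1 To $numfields
    $sTag &= ";char f" & $y & "[" & $fielddesc[$y][3] & "]"
Next

Local $tBuf = DllStructCreate("byte[" & $reclen & "]")
Local $tRec = DllStructCreate($sTag, DllStructGetPtr($tBuf)) ; field view over the same memory

Local $bRec = FileRead($fDBF, $reclen) ; one raw record
DllStructSetData($tBuf, 1, $bRec)
If DllStructGetData($tRec, "deleted") <> "*" Then
    For $y = 1 To $numfields
        $curfield = DllStructGetData($tRec, "f" & $y) ; field $y as a string
        ; ...same Switch on $fielddesc[$y][1] as in the original loop...
    Next
EndIf
```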

Once again, thanks for your help! Perhaps we can work together to come up with some mutual solutions!

Troy.


#5 ·  Posted (edited)

Hi,

As you may see, I fixed the TailRW for binary anyway, so I don't need the 9 MB file! [VBScript for string manipulation can be 10-50x faster; I've never used it for binary, though, and I'm not sure if it is supported.]

I wonder if the AutoIt inbuilt SQLiteExe import would import the file [though I doubt it; perhaps Excel would import it, then you could convert it to CSV and use SQLite?]. The SQLiteExe import runs at low-level speed, but I have no idea whether your format would be supported; @Piccaso might know. [Of course, if that worked, you could just query it in Excel anyway?]
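If the import itself won't take .dbf, a fallback might be to parse the records as you do now and then push the rows into an in-memory SQLite database, so lookups become queries rather than array scans. An untested sketch, assuming $rsData/$numrecs from your script; the table name, three columns, and the example query are made up:

```autoit
#include <SQLite.au3>

; Hypothetical sketch: load parsed .dbf rows into an in-memory SQLite table.
_SQLite_Startup()
Local $hDb = _SQLite_Open() ; ":memory:" database
_SQLite_Exec($hDb, "CREATE TABLE bvinfo (f1, f2, f3);")
_SQLite_Exec($hDb, "BEGIN;") ; one transaction makes bulk inserts much faster
For $x = 1 To $numrecs
    _SQLite_Exec($hDb, "INSERT INTO bvinfo VALUES (" & _
            _SQLite_FastEscape($rsData[$x][1]) & "," & _
            _SQLite_FastEscape($rsData[$x][2]) & "," & _
            _SQLite_FastEscape($rsData[$x][3]) & ");")
Next
_SQLite_Exec($hDb, "COMMIT;")

; query instead of scanning the array
Local $aResult, $iRows, $iCols
_SQLite_GetTable2d($hDb, "SELECT * FROM bvinfo WHERE f1 = 'example';", $aResult, $iRows, $iCols)
_SQLite_Close($hDb)
_SQLite_Shutdown()
```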

Best, Randall

Edited by randallc
