
Opening/Parsing Extremely Large Single-line files


I am trying to parse a very large single-line file, and I'm running into the issue that a variable can only hold so much data before it hits its size limit. I have XML files dumped from a search service that contain 20 MB+ of results on a single line. Obviously I have to break that up somehow, but I'm at a loss.

Can anybody help me out with this? Can I read the file byte by byte or word by word?

How can I do this elegantly? Speed is a crucial issue.

Thanks in advance.


I'm trying this function from CPPMAN:

#include <WinAPI.au3>

; Reads the bytes from offset $nStart up to (not including) $nEnd and returns them as a string.
; Returns "" if the file cannot be opened or read.
Func _FileReadBytes($sFileName, $nStart, $nEnd)
    Local $nBytesToRead = $nEnd - $nStart
    Local $ResultBuffer = DllStructCreate("byte[" & $nBytesToRead & "]")
    Local $BytesRead = 0
    Local $OverlappedStruct = DllStructCreate($tagOVERLAPPED)
    Local $ResultString = ""
    ; The Offset field tells ReadFile where in the file to start reading
    DllStructSetData($OverlappedStruct, "Offset", $nStart)

    ; 2, 2, 2 = open existing file, read access, share read
    Local $hFile = _WinAPI_CreateFile($sFileName, 2, 2, 2, 4, 0)
    If $hFile = 0 Then Return ""

    If Not _WinAPI_ReadFile($hFile, DllStructGetPtr($ResultBuffer), $nBytesToRead, $BytesRead, DllStructGetPtr($OverlappedStruct)) Then
        _WinAPI_CloseHandle($hFile)
        Return ""
    EndIf

    ; Convert only the bytes that were actually read (buffer indexes are 1-based)
    For $i = 1 To $BytesRead
        $ResultString &= Chr(DllStructGetData($ResultBuffer, 1, $i))
    Next

    _WinAPI_CloseHandle($hFile)

    Return $ResultString
EndFunc
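
For comparison, reading a byte range can probably be done with AutoIt's built-in file functions alone, without the WinAPI calls. The following is only a sketch under stated assumptions: the helper name `_FileReadChunk` is hypothetical, it assumes an AutoIt version that provides `FileSetPos`, and it treats the bytes as ANSI text when converting them to a string.

```autoit
#include <FileConstants.au3>

; Hypothetical helper (not part of any UDF): returns up to $nCount bytes,
; starting at zero-based byte offset $nStart, converted to a string.
Func _FileReadChunk($sFileName, $nStart, $nCount)
    ; Binary mode so that offsets and counts are measured in bytes
    Local $hFile = FileOpen($sFileName, $FO_READ + $FO_BINARY)
    If $hFile = -1 Then Return SetError(1, 0, "")

    FileSetPos($hFile, $nStart, $FILE_BEGIN)
    Local $dData = FileRead($hFile, $nCount) ; reads at most $nCount bytes
    FileClose($hFile)

    ; BinaryToString defaults to ANSI; this is an assumption about the file's encoding
    Return BinaryToString($dData)
EndFunc

; Usage: read the second kilobyte of a (hypothetical) file
Local $sChunk = _FileReadChunk("results.xml", 1024, 1024)
If Not @error Then ConsoleWrite(StringLen($sChunk) & " characters read" & @CRLF)
```

Converting the whole buffer with one `BinaryToString` call avoids appending to a string one character at a time, which may matter when speed is a concern. Note that a fixed-size chunk can end in the middle of an XML tag, so the caller would have to carry the unfinished tail over to the next chunk.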

What makes you think 20MB is a problem?

An AutoIt string variant can theoretically be 2GB, though about 128MB in 32-bit Windows seems to be a more practical limit. Your 20MB is not really a challenge for the system.
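
To illustrate the point, a file of this size can likely be read in a single call and then split into records in memory. This is a minimal sketch, not a full XML parser: the helper name `_ExtractRecords`, the file name `results.xml` and the element name `result` are all hypothetical, and the pattern assumes the elements are not nested inside elements of the same name and are never written in self-closing form.

```autoit
#include <FileConstants.au3>

; Hypothetical helper: returns an array holding every <$sTag ...>...</$sTag>
; element found in the file. Sets @error to 1 if the file cannot be opened
; and to 2 if nothing matches.
Func _ExtractRecords($sFileName, $sTag)
    Local $hFile = FileOpen($sFileName, $FO_READ)
    If $hFile = -1 Then Return SetError(1, 0, 0)

    Local $sXml = FileRead($hFile) ; the whole file in one string
    FileClose($hFile)

    ; (?s) lets "." match line breaks; ".*?" stops at the first closing tag.
    ; Flag 3 returns an array of all matches.
    Local $aRecords = StringRegExp($sXml, "(?s)<" & $sTag & "\b[^>]*>.*?</" & $sTag & ">", 3)
    If @error Then Return SetError(2, 0, 0)

    Return $aRecords
EndFunc

; Usage with the hypothetical file and element names
Local $aRecords = _ExtractRecords("results.xml", "result")
If Not @error Then ConsoleWrite(UBound($aRecords) & " records found" & @CRLF)
```

Whether a regular expression is good enough depends on how regular the service's output is; for anything irregular, a real XML parser would be the safer choice.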

:D
