help understanding how to parse any file like a hex editor would

sulfurious · May 24, 2013

Hello. Please forgive my terminology; it's been a while since I coded much, and age has fogged things up lol.

I've searched a good deal, but haven't found a thread yet that helps me.

The goal is pretty simple, I think, although I am not sure how to begin. I want to take any file and open it raw, the way you see the data when you open it in a hex editor (although I only want to "read" a chunk of the file).

Next, I would like to parse or step through a certain number of bits or bytes (or characters, however you want to look at it) and do something with each of these chunks of data.

For example, in a hex editor the first row starts at offset 00 and ends at offset 0F. The next "line" (although it's not really a line) starts at offset 10 and ends at offset 1F, and so on until the end of the file.
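To make the row layout concrete, here is a small sketch in Python (used only for illustration; the thread itself is about AutoIt). It assumes the bytes are already in memory and groups them 16 per row, showing each row's starting offset in hex, the byte values, and a printable-ASCII column, roughly the way most hex editors do. The name hex_rows is made up for this example.

```python
def hex_rows(data: bytes, width: int = 16):
    """Yield (offset, hex_text, ascii_text) for each row of `width` bytes."""
    for offset in range(0, len(data), width):
        row = data[offset:offset + width]              # last row may be shorter
        hex_text = " ".join(f"{b:02X}" for b in row)   # e.g. "41 42 43"
        # Printable ASCII is shown as-is, anything else as '.', as many hex editors do
        ascii_text = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        yield offset, hex_text, ascii_text


if __name__ == "__main__":
    sample = b"Hello, hex editor!\r\n"                 # 20 bytes -> 2 rows
    for offset, hex_text, ascii_text in hex_rows(sample):
        print(f"{offset:08X}  {hex_text:<47}  {ascii_text}")
```

Row 0 covers offsets 00 to 0F and row 1 starts at offset 10, matching the layout described above.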

In the past I have opened text files raw and stepped through them looking for characters like 0D 0A, then converted to and from different types. What I want to learn now is how to read a segment or chunk of a given file, store it or do whatever with it, then continue from the next offset to another given offset, read that, do something with it, and keep going until the end of the file.

Now I understand that I could open the file and read it as binary or raw, but I anticipate that I might want to open a large file, let's say 200 MB, and I don't want to load the whole thing into memory. Instead I would want to read one portion, then another portion, and so on.
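A minimal sketch of that read-a-portion-at-a-time loop, again in Python for illustration. It assumes the file is opened in binary mode and uses a fixed chunk size; each read() call returns at most that many bytes, so memory use depends on the chunk size (plus Python's small internal read buffer), not on the file size. iter_chunks is a hypothetical helper name.

```python
import os
import tempfile


def iter_chunks(path: str, chunk_size: int = 16):
    """Yield (offset, chunk) pairs, reading at most chunk_size bytes per step."""
    with open(path, "rb") as f:          # binary mode: raw bytes, no newline translation
        while True:
            offset = f.tell()            # offset of the first byte in this chunk
            chunk = f.read(chunk_size)   # returns at most chunk_size bytes
            if not chunk:                # b"" means end of file
                break
            yield offset, chunk


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.write(fd, bytes(range(40)))       # 40-byte demo file
    os.close(fd)
    for offset, chunk in iter_chunks(path):
        print(f"{offset:08X}  {chunk.hex(' ').upper()}")
    os.remove(path)
```

The same loop works for a 200 MB file; only the number of iterations changes.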

What I am having a hard time understanding is just how I could do this. I assume I would need to find the last addressable offset somehow, then open the first chunk, record my position, then read the next chunk.
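One common way to find the last offset is to seek to the end of the file and ask for the current position: that position equals the file size, and the last addressable offset is the size minus one. This is the same idea as the FileSetPos (origin 2) plus FileGetPos combination used later in this thread. A Python sketch, where file_size is a hypothetical helper (os.path.getsize(path) gives the same number without opening the file):

```python
import io
import os


def file_size(f) -> int:
    """Size in bytes of an open binary file, found by seeking to its end."""
    saved = f.tell()                    # remember the current position
    size = f.seek(0, os.SEEK_END)       # seek() returns the new absolute offset
    f.seek(saved, os.SEEK_SET)          # put the "cursor" back where it was
    return size


if __name__ == "__main__":
    f = io.BytesIO(bytes(200))          # stand-in for open(path, "rb")
    size = file_size(f)
    print(size, "bytes; last offset is", hex(size - 1))
```

Restoring the saved position means the caller can keep reading from where it left off.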

I have thought about doing this many times in the past and finally found a little extra time to learn something new. I don't expect anyone to hand me code; I know full well I need to do the work. But is there anybody who could either give me some good pointers on how to start understanding this, or who understands it (and what I am asking lol) and could explain a few things to me? I am confident that I can do this, but I don't really know what to look for in the way of help. It may just be a matter of getting the terminology right, as I've forgotten much from not using what I knew.

Thanks to any takers.

AZJIO · May 24, 2013

FileSetPos

sulfurious · May 25, 2013

Thank you for the reply. The link you gave showed me how to use Ceiling with 16 to find the number of rows. That's a good start.
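The row arithmetic, sketched in Python under the assumption of 16 bytes per row: the number of rows is the size divided by 16, rounded up, and row r covers offsets r*16 through r*16+15 (row 0 is 00 to 0F, row 1 is 10 to 1F). Integer arithmetic is used instead of a floating-point ceiling so the result stays exact for very large files; row_count and row_range are hypothetical names.

```python
def row_count(size: int, width: int = 16) -> int:
    """Number of rows needed to show `size` bytes, i.e. ceiling(size / width)."""
    return (size + width - 1) // width    # integer ceiling, exact even for huge sizes


def row_range(row: int, width: int = 16):
    """First and last offset in row number `row` (row 0 -> 0x00..0x0F)."""
    start = row * width
    return start, start + width - 1


if __name__ == "__main__":
    size = 200 * 1024 * 1024               # a 200 MB file, taken as 200 * 1024 * 1024 bytes
    rows = row_count(size)
    first, last = row_range(rows - 1)
    print(rows, "rows; last row spans", hex(first), "to", hex(last))
```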

Using FileSetPos, as you indicate, is a good way to start reading at a given point, and by keeping track of your position you can always resume reading from a new point (of course). However, I don't see how to limit what is read to, say, only 16 characters. And, unless I am mistaken, the FileRead function will read the entire contents of the file. That's fine for small files, but not for large ones.
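Limiting a read to a fixed number of bytes comes down to passing a count to the read call after positioning. A Python sketch, where read_at is a hypothetical helper; in AutoIt terms this corresponds roughly to FileSetPos(handle, offset, 0) followed by FileRead(handle, count) on a binary-mode handle, the functions that come up later in this thread.

```python
import io
import os


def read_at(f, offset: int, count: int = 16) -> bytes:
    """Read at most `count` bytes starting at absolute `offset` of a binary file."""
    f.seek(offset, os.SEEK_SET)    # jump straight to the offset
    return f.read(count)           # fewer than `count` bytes only near end of file


if __name__ == "__main__":
    f = io.BytesIO(bytes(range(256)))          # stand-in for open(path, "rb")
    print(read_at(f, 0x10).hex(" ").upper())   # the row at offsets 10..1F
```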

So that helps, but the tricky part is how to find the total range of offsets (or whatever that would be called) in the file, which you have given me a good start on, then read only from offset 0 to offset F, do something in the script, then read from offset 10 to 1F, and so on, only ever actually "opening" one chunk of the file at a time.

Now maybe this is not possible, but I was looking at some C code and, while I don't fully understand it, it seems it can be done.

Of course, since I don't know the exact terms, it's hard for me to communicate what I need or want.

Any other suggestions?

sulfurious · May 25, 2013

Oh, maybe I should ask, since I assumed it here: when you read a file with a count parameter, is only the portion given by that parameter read into memory?

I was under the impression it worked this way, but perhaps not. The help file says the count does this when the file is read as binary (bytes). So maybe that is all I need.

Can anyone tell me: when I look at a file with a hex editor, is each character, whether ASCII or binary, a byte? For example, when you view an ANSI file in hex, a line break appears as 0D 0A. If it is Unicode (UTF-16), each character takes two bytes, so you see 0D 00 0A 00 instead. Is each of these hex values a byte? So there are 16 bytes per "row" in the hex editor?
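A quick way to check this is to encode the same text two ways and print the bytes. In this Python sketch each two-digit hex value is one byte (8 bits, 00 to FF), so a 16-column row in a hex editor is 16 bytes. The code page cp1252 stands in for "ANSI" here, which is an assumption (the actual ANSI code page depends on the system locale), and UTF-16 little-endian stands in for Windows "Unicode" text.

```python
crlf = "\r\n"                        # carriage return + line feed

ansi = crlf.encode("cp1252")         # assumed "ANSI" code page (Western European)
utf16 = crlf.encode("utf-16-le")     # Windows "Unicode" text, little-endian, no BOM

# bytes.hex(sep) needs Python 3.8+; each two-digit group below is one byte
print(len(ansi), ansi.hex(" ").upper())     # 2 0D 0A
print(len(utf16), utf16.hex(" ").upper())   # 4 0D 00 0A 00
```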

Not remembering all that like I used to. And to think I used to do a lot of conversion of registry values of every type with AutoIt. Must have been a long time ago now lol.

mLipok · May 25, 2013

Maybe I'm wrong, but maybe this can help you.

AZJIO · May 25, 2013

FileRead ( "filehandle/filename" [, count] )

There is also a hex viewer.

Gianni · May 25, 2013

Read the help on these commands:

FileOpen

FileSetPos

FileGetPos

FileRead

A simple "experiment" with those commands:

$File = FileOpenDialog('Please choose file', '', 'All files (*.*)', 1)
If @error Then Exit ; dialog was cancelled
$myfile = FileOpen($File, 16) ; open file in binary mode (16 = $FO_BINARY)
If $myfile = -1 Then Exit ; file could not be opened
FileSetPos($myfile, 0, 2) ; position "cursor" at the end of the file (2 = $FILE_END)
$lenoffile = FileGetPos($myfile) ; "cursor" position at end of file = file size in bytes
ConsoleWrite("Dimension of " & $File & " is " & $lenoffile & " bytes" & @CRLF)
FileSetPos($myfile, 0, 0) ; position "cursor" at the first byte (0 = $FILE_BEGIN)
For $x = 0 To 2047 Step 16 ; show the first 2048 bytes of the file, 16 per row
    ; FileSetPos($myfile, $x, 0) ; optional: jump directly to byte nr $x
    ConsoleWrite(Hex($x, 8) & ": ") ; row offset, as in a hex editor
    For $i = 0 To 15 ; 16 bytes per row: offsets 0..F within the row
        $byte = FileRead($myfile, 1) ; read 1 byte and autoincrement position
        If @error Then ExitLoop 2 ; @error is set at end of file
        ConsoleWrite(Hex($byte) & " " & StringStripWS(Chr($byte), 8) & Chr(9))
    Next
    ConsoleWrite(@CRLF)
Next
FileClose($myfile)

Bye

Zedna · May 25, 2013

Here is another old example of how to do it with the Windows API.

Notes:

This approach also allows reading other data types into structures.
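Reading raw bytes into typed fields works much the same way in other languages. As a Python illustration (a different technique from the Windows API approach mentioned above), the struct module unpacks a fixed-size block of bytes into integers and strings. The 16-byte record layout below is purely hypothetical; in a real file you would seek to the record's offset and read RECORD.size bytes before unpacking.

```python
import struct

# Hypothetical 16-byte record, little-endian, no padding ("<"):
#   uint32 id | uint16 flags | 10-byte name, NUL-padded
RECORD = struct.Struct("<IH10s")


def parse_record(raw: bytes):
    """Unpack one record's bytes into (id, flags, name)."""
    rec_id, flags, name = RECORD.unpack(raw)      # raw must be exactly RECORD.size bytes
    return rec_id, flags, name.rstrip(b"\x00").decode("ascii")


if __name__ == "__main__":
    raw = RECORD.pack(7, 0x0102, b"hello")        # pack() pads the name with NULs
    print(RECORD.size, raw.hex(" ").upper())
    print(parse_record(raw))
```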

This topic was created when AutoIt didn't have FileSetPos() implemented yet.

sulfurious · May 25, 2013

I'm at work now, so I can't look into any of these examples yet (thanks for the replies, btw). The first question that comes to mind after a quick look, though, is this: to use FileSetPos and the related functions, the file has to be opened first. What exactly happens when you use FileOpen (with any of the parameters)? Does the entire file get loaded into memory, with the handle then used to point subsequent functions like FileRead to that memory location? So FileSetPos would be used to point FileRead to a memory address?

I ask because I want to learn whether it's possible, and how, to read only a certain portion of a file without loading the entire file into memory. Maybe it isn't possible, but that is why I am here.

I will look at the rest of this tonight after work. Much of what I'm asking could be answered by just building a script and experimenting, but it never hurts to have others' input as well.

Thanks again for the replies thus far.

Zedna · May 25, 2013

FileOpen doesn't load the file content into memory.

Only FileRead loads the appropriate part (or the whole) of the file into a memory buffer.

sulfurious · May 26, 2013

Ah. That solves that then.

My initial reason was to modify files that are created by another app I made; those files are not text. They have a certain structure, so I could modify them in place rather than load them (and decide what needs to be changed) and recreate them (overwrite). It's not a big deal either way, as what I have works, but it would be a good exercise to navigate to a specific offset, read a certain portion of the file, then use that data and perhaps modify it. Learning for the sake of curiosity, I guess.
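For the modify-in-place idea, a minimal Python sketch: open the file for reading and writing in binary mode ("r+b", which does not truncate it), seek to the offset, and overwrite exactly as many bytes as you replace, so the rest of the file stays where it was. patch_bytes and the demo data are hypothetical. Changing the length of a field would shift everything after it, which is when rewriting the whole file becomes the simpler option.

```python
import os
import tempfile


def patch_bytes(path: str, offset: int, new_data: bytes) -> bytes:
    """Overwrite bytes at `offset` in place and return the bytes that were there."""
    with open(path, "r+b") as f:       # read + write, binary, file is NOT truncated
        f.seek(offset)
        old = f.read(len(new_data))
        if len(old) != len(new_data):
            raise ValueError("patch would run past the end of the file")
        f.seek(offset)                 # reading moved the position; go back
        f.write(new_data)              # same length, so nothing after it moves
    return old


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.write(fd, b"HEADER00payload!")
    os.close(fd)
    print(patch_bytes(path, 6, b"42"))  # the two bytes that were replaced
    with open(path, "rb") as f:
        print(f.read())
    os.remove(path)
```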

I was uncertain of the differences involved, as I usually manipulate text files. I will work up a small sample and see if I run into any issues. If it does what I need, then I have to decide whether to pursue it to the point of modifying the file rather than recreating it, as opposed to just learning how to read a portion of it.

Thanks for the info.
