
help understanding how to parse any file like a hex editor would



Hello. Please forgive my terminology; it's been a while since I coded much and age has fogged things up lol.

I've used search a good deal, but haven't found a good thread yet to help me.

The goal is pretty simple, I would think, although I am not sure how to begin. What I want to do is take any file and open it raw, the way you would see the data if you opened it in a hex editor (although I only want to "read" a chunk of the file).

Next, I would like to parse or step through a certain number of bits or bytes, or characters, however you want to look at it, and do something with these chunks of data.

For example, in a hex editor you would start at offset 00 and end at offset 0F. The next "line" (although it's not really a line) starts at offset 10 and ends at offset 1F, and so on until the end of the file.

In the past I have opened text files raw, and stepped through them looking for characters like 0D 0A and then converting to and from different types. But what I am wanting to learn how to do is read a segment or chunk of a given file, store it or whatever, then continue from the next offset to another given offset, read that, do something with it, and keep going until the end of the file.

Now I understand that I could use a file open and read binary or raw, but I anticipate that I might want to open a large file, let's say 200 MB, and I don't want to load the whole thing into memory. Instead I would want to read a portion, then another portion, and so on.

What I am having a hard time understanding is just how I could do this. I assume I would need to find the last addressable offset somehow, then open the first chunk, record my position, then read the next chunk.
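The approach described above could be sketched in AutoIt roughly as follows: get the file size, then read one fixed-size chunk at a time until the offset reaches that size. `_DumpInChunks`, the chunk size of 16, and the demo file name are made up for illustration. The sketch assumes that `FileRead` with a count, on a handle opened in binary mode, returns at most that many bytes, as the help file describes.

```autoit
#include <FileConstants.au3>

; Hypothetical helper (not a built-in): walk a file in $iChunkSize-byte chunks
; and print each chunk hex-editor style. Returns the number of chunks read.
Func _DumpInChunks($sPath, $iChunkSize = 16)
    Local $iSize = FileGetSize($sPath) ; total bytes; the last valid offset is $iSize - 1
    Local $hFile = FileOpen($sPath, $FO_READ + $FO_BINARY)
    If $hFile = -1 Then Return SetError(1, 0, -1)

    Local $iOffset = 0, $iChunks = 0, $bChunk
    While $iOffset < $iSize
        $bChunk = FileRead($hFile, $iChunkSize) ; at most $iChunkSize bytes
        If BinaryLen($bChunk) = 0 Then ExitLoop ; nothing more to read
        ConsoleWrite(Hex($iOffset, 8) & "  " & Hex($bChunk) & @CRLF)
        $iOffset += BinaryLen($bChunk) ; where the next chunk starts
        $iChunks += 1
    WEnd

    FileClose($hFile)
    Return $iChunks
EndFunc   ;==>_DumpInChunks

; Demo on a small throwaway file of 40 bytes (assumed name in the temp folder).
Local $sDemo = @TempDir & "\chunk_demo.bin"
Local $hOut = FileOpen($sDemo, $FO_OVERWRITE + $FO_BINARY)
FileWrite($hOut, StringToBinary("0123456789012345678901234567890123456789"))
FileClose($hOut)
ConsoleWrite("Chunks read: " & _DumpInChunks($sDemo) & @CRLF)
FileDelete($sDemo)
```

Only one chunk is held in a variable at any time, so the same loop should behave the same way on a 200 MB file. The last chunk may be shorter than 16 bytes, which is why the offset advances by `BinaryLen($bChunk)` rather than by a fixed 16.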

I have thought about doing this many times in the past, and finally found a little extra time to learn something new. I don't expect any code to be handed to me. I know full well I need to do the work. But is there anybody who could give me some good pointers on how to start understanding this, or who understands it (and what I am asking lol) and could explain a few things to me? I am confident that I can do this, but don't really know what to look for in the way of help. It may just be a matter of getting the terminology correct, as I've forgotten much from not using what I knew ;)

Thanks to any takers.


Thank you for the reply. That link you gave led me to using Ceiling and 16 to find the number of rows. That's a good start.
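A minimal sketch of that row calculation, assuming 16 bytes per row as in a typical hex editor view. `_HexRows` is a hypothetical helper name, not a built-in function.

```autoit
; Hypothetical helper: number of rows a hex editor needs to show $iBytes bytes.
Func _HexRows($iBytes, $iPerRow = 16)
    Return Ceiling($iBytes / $iPerRow)
EndFunc   ;==>_HexRows

ConsoleWrite(_HexRows(32) & @CRLF) ; 2 rows: offsets 00-0F and 10-1F
ConsoleWrite(_HexRows(33) & @CRLF) ; 3 rows: the last row holds a single byte
```

For a real file the byte count would come from something like `FileGetSize($sPath)`, where `$sPath` stands for whatever file is being inspected.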

The use of FileSetPos as you indicate is a good way to start reading at a given point, and by keeping track of your position you can always read again from a new point. However, I don't see how to limit what is read to, say, only 16 characters. And, unless I am mistaken, the FileRead function will read the entire contents of the file. That is fine for most files, which are small, but not for large files.

So that helps, but the tricky part is how to find the total offset addresses (or whatever that would be called) in the file (as you have provided a good start for), then read only from offset 1 to offset F, do something in the script, then read from offset 10 to 1F, and so on, only ever actually "opening" a chunk of the file.
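One way this could look, as a sketch rather than a definitive answer: `FileSetPos` moves to the starting offset and the optional count parameter of `FileRead` limits how much is read from there. `_ReadAt` is a hypothetical helper, and reading the script's own file via `@ScriptFullPath` is only an assumption so the example has something to open. Note that the first row of a hex view starts at offset 0, not 1.

```autoit
#include <FileConstants.au3>

; Hypothetical helper: return up to $iCount bytes starting at $iOffset, as binary data.
Func _ReadAt($hFile, $iOffset, $iCount = 16)
    If Not FileSetPos($hFile, $iOffset, $FILE_BEGIN) Then Return SetError(1, 0, Binary(""))
    Return FileRead($hFile, $iCount)
EndFunc   ;==>_ReadAt

Local $sPath = @ScriptFullPath ; assumption: any existing file works here
Local $hFile = FileOpen($sPath, $FO_READ + $FO_BINARY)
If $hFile <> -1 Then
    ConsoleWrite("00000000  " & Hex(_ReadAt($hFile, 0x00)) & @CRLF) ; offsets 00-0F
    ConsoleWrite("00000010  " & Hex(_ReadAt($hFile, 0x10)) & @CRLF) ; offsets 10-1F
    FileClose($hFile)
EndIf
```

Because every call sets the position explicitly, the rows can be read in any order, not just front to back.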

Now maybe this is not possible, but I was looking at some C code and, while I don't fully understand it, it seems it can be done.

Of course since I don't know the terms exactly it makes it hard for me to communicate what I need or want.

Any other suggestions?


Oh, maybe I should ask, as I assumed here. When you use a file open with a count parameter, does it only read that portion in the parameter into memory?

I was under the impression it was this way, but perhaps not. Help file says the count, when read as binary (byte) will do this. So maybe that is all I need.

Can anyone tell me, when I look at a file with a hex editor, is each character, whether ASCII or binary, a byte? For example, when you are seeing an ANSI file in hex, you see 0D 00 0A 00. If it is Unicode you have an extended set, so you might see 0A BC 10 DA. Is each of these hex values (which in text would represent a character) a byte? So there are 16 bytes per "row" in the hex editor?
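A small experiment may help with this question. In a hex editor each pair of hex digits is one byte, so a 16-column row shows 16 bytes. How many bytes make up one character depends on the encoding: in ANSI a carriage return plus line feed is 0D 0A, while 0D 00 0A 00 is what the same two characters look like in UTF-16 little endian. The sketch below uses `StringToBinary` with flag 1 (ANSI) and flag 2 (UTF-16 LE); the variable names are made up for illustration.

```autoit
; A carriage return + line feed encoded two ways.
Local $bAnsi = StringToBinary(@CRLF, 1)  ; flag 1 = ANSI: one byte per character
Local $bUtf16 = StringToBinary(@CRLF, 2) ; flag 2 = UTF-16 LE: two bytes per character

ConsoleWrite($bAnsi & " = " & BinaryLen($bAnsi) & " bytes" & @CRLF)   ; 0x0D0A = 2 bytes
ConsoleWrite($bUtf16 & " = " & BinaryLen($bUtf16) & " bytes" & @CRLF) ; 0x0D000A00 = 4 bytes
```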

Not remembering all that like I used to. And to think I used to do a lot of conversion of registry values with autoit for every type. Must have been a long time ago now lol.


Maybe I'm wrong, but maybe this can help you.


Read the help file on these commands:

FileOpen

FileSetPos

FileGetPos

FileRead

A simple "experiment" with those commands:

$File = FileOpenDialog('Please choose file', '', 'All files (*.*)', 1) ; 1 = file must exist
If @error Then Exit ; dialog was cancelled
$myfile = FileOpen($File, 16) ; 16 = open for reading in binary mode
If $myfile = -1 Then Exit ; file could not be opened
FileSetPos($myfile, 0, 2) ; origin 2 = end of file: move the "cursor" to the end
$lenoffile = FileGetPos($myfile) ; the position at the end equals the file size in bytes
ConsoleWrite("Size of " & $File & " is " & $lenoffile & " bytes" & @CRLF)
FileSetPos($myfile, 0, 0) ; origin 0 = beginning: move the "cursor" back to the first byte
For $x = 0 To 2047 Step 16 ; show the first 2048 bytes of the file, 16 per row
    ; FileSetPos($myfile, $x, 0) ; optional: jump straight to offset $x
    For $i = 0 To 15 ; 16 bytes per row
        $byte = FileRead($myfile, 1) ; read 1 byte; the position advances automatically
        If @error Then ExitLoop 2 ; stop at end of file (or on a read error)
        ; print the byte in hex, then its character with whitespace stripped
        ConsoleWrite($byte & " " & StringStripWS(Chr(Dec(Hex($byte))), 8) & Chr(9))
    Next
    ConsoleWrite(@CRLF)
Next
FileClose($myfile)

Bye

 


I'm at work now so I cannot look into any of these examples yet (thanks for the replies btw). The first question that comes to mind after a quick look is this: in order to use FileSetPos and related functions, the file has to be opened first. What exactly happens when you use FileOpen (with any of the parameters)? Does the entire file get loaded into memory, with the handle then used to point subsequent functions like FileRead to the memory location? So FileSetPos would be used to point FileRead to a memory address?

I ask because I wanted to learn whether it's possible, and how, to avoid loading an entire file into memory and read only a certain portion instead. Maybe it isn't possible, but that is why I am here.

I will look at the rest of this tonight after work. Much of what I ask could be answered by just building a script and experimenting, but it never hurts to have others' input either.

Thanks again for the replies thus far.


Ah. That solves that then.

My initial reason was to modify files that are created with another app I made and that are not text. The files have a certain structure to them, so I could modify them in place rather than load them (and decide what would need to be changed) and recreate them (overwrite). It's not a big deal either way as what I have works, but it would be a good exercise to navigate to a specific offset, read a certain portion of the file, then use that data and perhaps modify it. Learning for the sake of curiosity I guess ;)

I was uncertain of the differences involved as I usually manipulate text files. I will work up a small sample and see if I have any issues. If it does what I need, then I have to decide whether I want to pursue it to the point of modifying the file rather than recreating it, as opposed to just learning how to read a portion of it.

Thanks for the info.
