
Read binary file and split as hex




Hi, there.

This is a little "niche".

I have an old computer file which I can load into Notepad, and see the text delimited by a couple of hex codes.

Example
I am a line of text
FEFF
I am another line of text
FEFF
I am yet another line of text

I know I can use StringSplit to split files that it can read in as text.
However, when I load these files in, StringSplit($str, Chr(254) & Chr(255), 1) doesn't split them.

This suggests it's not reading the file in as "normal" text.

I've tried the load-as-binary option (16), but the split still doesn't work.
I've also tried converting the hex to a string and then splitting - still no joy.

Can someone put me out of my misery? What really simple thing am I missing?

 

Thanks!


Tested it, and it is working for me:

#include <Constants.au3>
#include <Array.au3>

; Build a test file whose lines are separated by the two bytes FE FF
Local $hFile = FileOpen("Test.txt", $FO_OVERWRITE)
FileWriteLine($hFile, "Line 1")
FileWrite($hFile, Chr(254) & Chr(255))
FileWriteLine($hFile, "Line 2")
FileWrite($hFile, Chr(254) & Chr(255))
FileWriteLine($hFile, "Line 3")
FileClose($hFile)

; Read it back as text and split on the two-character delimiter
Local $sText = FileRead("Test.txt")
Local $aArray = StringSplit($sText, Chr(254) & Chr(255), $STR_ENTIRESPLIT)
_ArrayDisplay($aArray)
; Show the bytes of the first piece (it still ends with the CRLF written by FileWriteLine)
ConsoleWrite(StringToBinary($aArray[1]) & @CRLF)

You may want to upload a copy of your file, so we can see...
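For reference, when a file is opened in binary mode (flag 16, $FO_BINARY), FileRead returns a Binary variant, and StringSplit then in effect works on its hex text form ("0x...FEFF..."), which would explain why a Chr(254) & Chr(255) delimiter never matches there. A minimal sketch, assuming the delimiter is exactly the bytes FE FF, the text uses a single-byte ANSI codepage such as Windows-1252, and the data contains no 0x00 bytes: decode the bytes with BinaryToString first, then split the result. _SplitBytes is a hypothetical helper name, not something from the posts in this thread.

```autoit
#include <StringConstants.au3>

; Hypothetical helper (not from the thread): split raw bytes on a byte-sequence delimiter.
; Assumes single-byte ANSI text (e.g. Windows-1252) and no 0x00 bytes in the data.
Func _SplitBytes($bData, $bDelim)
    ; Decode the data and the delimiter the same way, so the delimiter string
    ; contains exactly the characters BinaryToString produces for those bytes
    Local $sText = BinaryToString($bData, $SB_ANSI)
    Local $sDelim = BinaryToString($bDelim, $SB_ANSI)
    ; $STR_ENTIRESPLIT: the whole delimiter string is one separator;
    ; element [0] of the returned array holds the number of pieces
    Return StringSplit($sText, $sDelim, $STR_ENTIRESPLIT)
EndFunc

; Demo on in-memory bytes: "Line 1" FE FF "Line 2".
; For a real file, open it with FileOpen($sPath, 16) ($FO_BINARY) and pass FileRead's result.
Local $aParts = _SplitBytes(Binary("0x4C696E652031FEFF4C696E652032"), Binary("0xFEFF"))
For $i = 1 To $aParts[0]
    ConsoleWrite($i & ": " & $aParts[$i] & @CRLF)
Next
```

Decoding the delimiter with BinaryToString as well, instead of hard-coding Chr() values, keeps both sides of the comparison on the same codepage conversion.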

Edited by Nine

I put your sample content in a sample.dat file and ran this code:

Local $sData = FileRead('sample.dat')
Local $aLine = StringSplit($sData, @CRLF, 1)

If IsArray($aLine) Then
    For $Index = 1 To $aLine[0]
        If StringRegExp($aLine[$Index], '(?i)^(0x)?([a-f0-9]{2})+$') Then
            ; Print just the lines that contain hex data
            ConsoleWrite('Line ' & $Index & ': ' & $aLine[$Index] & @CRLF)
        EndIf
    Next
EndIf

It should print just the lines that contain hex data.

When the words fail... music speaks.


16 hours ago, ercicttech said:


I have an old computer file which I can load into Notepad, and see the text delimited by a couple of hex codes.

Example
I am a line of text
FEFF
I am another line of text
FEFF
I am yet another line of text
 

Is this what you see in Notepad?  I don't think Notepad can show hex values, so are you sure these are not the characters "FEFF"?

Rule #1: Always do a backup         Rule #2: Always do a backup (backup of rule #1)


As @ajag already pointed out, Notepad isn't a hex editor.

 

Get Notepad++.

Plugins -> Plugins Admin -> install a hex editor plugin.

Open your file -> Plugins -> view as hex.

 

What do you see?

Earth is flat, pigs can fly, and Nuclear Power is SAFE!


22 hours ago, Andreik said:

Is there a pattern in your data? From your sample it looks like it's a line of text then a line of hex data. Is that true?

I was oversimplifying slightly, but essentially yes.
A block of hex splitting the various bits of text that I'm interested in.
A standard couple of bytes come before each bit of text.

So it's maybe something like

abcdefabcdefFFEEHello I am line 1
df327328491FFEEHello I am line 2

So the plan is to split by FFEE


6 hours ago, ajag said:

Is this what you see in Notepad?  I don't think Notepad can show hex values, so are you sure these are not the characters "FEFF"?

My bad - I meant to say that I'm using a dedicated Hex Editor to look at the file to get the hex codes - not Notepad/Notepad++.
It's definitely hex codes FEFF

Edited by ercicttech

1 minute ago, Nine said:

Did you read my post ?  It is working fine as I showed you.  So upload your file so we can see what's going on.

Hi. I'm currently on my work laptop rather than my personal one - I'll upload it when I get home.

I have another one that's curious.

https://drive.google.com/file/d/10_NJawt5iPEMtUl_AQGFJ4ZKnRPW4Ngi/view?usp=sharing

Using a hex editor, I can see that it starts with hex

03-37-00-0a-00-1e-00 and then essentially it's plain text.

I can read it absolutely fine in Notepad++ (normal mode, not hex edit), crop out the offending rogue characters at the start and save it as a text file.

But I'd rather not have to do that.

If I do a simple

 

$Infile = @ScriptDir & "\15_Q.bin"
$R = FileOpen($InFile, 0)
$Str = FileRead($R)
FileClose($R)
Msgbox(0, "", $Str)

The message box I get is gibberish.

I did manage to get these working using a complete fudge derived from someone else's work here.

 

;https://www.autoitscript.com/forum/topic/206247-binary-replacement-by-byte-position/
#include <FileConstants.au3>

; $Num comes from the surrounding script (not shown)
$AFile = @ScriptDir & "\" & $Num & "_A.bin"

; Read the whole file as raw bytes
$R = FileOpen($AFile, $FO_BINARY)
$Bin_Data = FileRead($R)
FileClose($R)

Local $tByteBuffer

;Create a byte buffer struct and move the binary data to it
$tByteBuffer      = DllStructCreate(StringFormat("byte data[%i]", BinaryLen($Bin_Data)))
$tByteBuffer.data = $Bin_Data

;Display binary data & byte buffer contents before modification
ConsoleWrite("$Bin_Data            = " & $Bin_Data & @CRLF)
ConsoleWrite("$tByteBuffer before  = " & $tByteBuffer.data & @CRLF)

;Overwrite the first five bytes with spaces (0x20)
$tByteBuffer.data(01) = 0x20
$tByteBuffer.data(02) = 0x20
$tByteBuffer.data(03) = 0x20
$tByteBuffer.data(04) = 0x20
$tByteBuffer.data(05) = 0x20

; Write the modified bytes back out
$hFile = FileOpen(@ScriptDir & "\TestNew.txt", $FO_CREATEPATH + $FO_BINARY + $FO_OVERWRITE)
FileWrite($hFile, $tByteBuffer.data)
FileClose($hFile)

But ideally I just want some way to skip these.
Or a way of stripping out a range of hex codes that I'm not interested in.

In case anyone is wondering what on Earth I'm doing - I'm loading quiz games from the old ZX Spectrum, and ripping through the data to extract the questions/answers.
Niche? You betcha.

There are loads of different games, with the questions/answers stored using vaguely similar ways but with different hex codes either getting in the way or being used as markers.

So my starter for 10 is getting AutoIt to handle files like the ones linked above without using the byte-by-byte code I nicked :)
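For a fixed header like this one, a possible alternative to patching bytes in a DllStruct is BinaryMid, which returns the bytes from a given position onward. A minimal sketch, assuming every Q file starts with the same seven bytes 03 37 00 0A 00 1E 00 described above; _StripHeader is a hypothetical helper name, and it leaves the data unchanged when the header is missing:

```autoit
#include <StringConstants.au3>

; Hypothetical helper (not from the thread): drop a known leading byte sequence.
; Returns the bytes after the header, or $bData unchanged if it does not start with $bHeader.
Func _StripHeader($bData, $bHeader)
    Local $iHdrLen = BinaryLen($bHeader)
    ; Compare the leading bytes via their hex text ("0x0337...") rather than
    ; comparing the Binary values directly
    If String(BinaryMid($bData, 1, $iHdrLen)) <> String($bHeader) Then Return $bData
    ; BinaryMid is 1-based: start at the first byte after the header
    Return BinaryMid($bData, $iHdrLen + 1)
EndFunc

; Demo: the 7-byte header 03 37 00 0A 00 1E 00 followed by "Hi".
; For a real file (path assumed): FileOpen(@ScriptDir & "\15_Q.bin", 16), then FileRead.
Local $bBody = _StripHeader(Binary("0x0337000A001E004869"), Binary("0x0337000A001E00"))
ConsoleWrite(BinaryToString($bBody, $SB_ANSI) & @CRLF) ; prints Hi
```

Checking the header first means a file from a different game, with a different prefix, passes through unchanged instead of silently losing its first seven bytes.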

 


And where are these hex codes?? :shocked:

I don't see any data like what you described above. Just use StringReplace() or StringRegExpReplace() if you want to replace some characters.

In your latest example you use a 15_Q.bin, but there is no such file among the files you uploaded to Google Drive. Is it that hard to post a valid question and provide all the required data?

Edited by Andreik



6 hours ago, Andreik said:

And where are these hex codes?? :shocked:

I don't see any data like what you described above. Just use StringReplace() or StringRegExpReplace() if you want to replace some characters.

In your latest example you use a 15_Q.bin, but there is no such file among the files you uploaded to Google Drive. Is it that hard to post a valid question and provide all the required data?

Oops... I linked to the wrong ZIP file - that one has the versions I ripped to text.
Sorry - I was doing this on top of about 100 other things at work. 

The BIN files are attached here. The ZIP should contain 18 BIN files with "Q" in the filename.
These should all start with the same initial run of hex bytes, followed by raw text.
I have already ripped the raw text by manually loading the files into Notepad++ and deleting those leading bytes, but I'd like to be able to do this in AutoIt.


 

RothmanBINFiles.zip


You can do something like this:

$hFile = FileOpen('15_Q.bin', 16)
$dData = FileRead($hFile)
FileClose($hFile)

$dReplace = BinaryReplace($dData, '0x0337000A001E00', '')
MsgBox(0, '', BinaryToString($dReplace))

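Func BinaryReplace($dData, $dHex, $dReplace)
    ; Strip any "0x" prefix so the pattern works on bare hex digits
    $dHex = StringReplace($dHex, '0x', '')
    $dReplace = StringReplace($dReplace, '0x', '')
    ; $dData is matched as its hex text ("0x..."); group 2 lazily consumes whole bytes
    ; (pairs of hex digits), so only the first byte-aligned occurrence of $dHex is replaced
    Return StringRegExpReplace($dData, '(?i)^(0x)?((?:[0-9a-f]{2})*?)(' & $dHex & ')((?:[0-9a-f]{2})*?)$', '${1}${2}' & $dReplace & '${4}')
EndFunc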

But if you are not sure about the byte sequences, you can, for example, remove all null bytes with a slightly modified version:

$hFile = FileOpen('15_Q.bin', 16)
$dData = FileRead($hFile)
FileClose($hFile)

$dReplace = BinaryReplace($dData, '0x00', '', True)
MsgBox(0, '', BinaryToString($dReplace))

Func BinaryReplace($dData, $dHex, $dReplace, $fAll = False)
    ; Strip any "0x" prefix so the pattern works on bare hex digits
    $dHex = StringReplace($dHex, '0x', '')
    $dReplace = StringReplace($dReplace, '0x', '')
    Switch $fAll
        Case False
            ; Replace only the first byte-aligned occurrence
            Return StringRegExpReplace($dData, '(?i)^(0x)?((?:[0-9a-f]{2})*?)(' & $dHex & ')((?:[0-9a-f]{2})*?)$', '${1}${2}' & $dReplace & '${4}')
        Case True
            ; Each pass replaces one occurrence; stop when @extended (replacement count) is 0
            Do
                $dData = StringRegExpReplace($dData, '(?i)^(0x)?((?:[0-9a-f]{2})*?)(' & $dHex & ')((?:[0-9a-f]{2})*?)$', '${1}${2}' & $dReplace & '${4}')
            Until @extended = 0
            Return $dData
    EndSwitch
EndFunc

 



  • Solution
10 hours ago, Andreik said:

You can do something like this:


Superb! That looks absolutely ideal.
The 1st method worked perfectly with the Rothman files I attached yesterday.

And I can easily turn that second method into a Func and use it with a For...Next loop to get rid of a range of hex codes.

Thank you VERY much!
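The For...Next idea for a range of byte values could look roughly like this. It is a sketch, not the method from the posts above: _StripByteRange is a hypothetical helper that copies every byte outside [$iLow, $iHigh] into a DllStruct buffer one byte at a time, which should be fast enough for small files like these. Keep in mind that 0x0A and 0x0D (LF and CR) fall inside the control range 0x00-0x1F, so line breaks are dropped too unless the range is split around them.

```autoit
#include <StringConstants.au3>

; Hypothetical helper (not from the thread): remove every byte whose value lies in [$iLow, $iHigh].
; Walks the data once with For...Next, copying kept bytes into an output buffer.
Func _StripByteRange($bData, $iLow, $iHigh)
    Local $iLen = BinaryLen($bData)
    If $iLen = 0 Then Return Binary("")
    ; Byte buffers for input and output (the output can be at most as long as the input)
    Local $tIn = DllStructCreate("byte[" & $iLen & "]")
    Local $tOut = DllStructCreate("byte[" & $iLen & "]")
    DllStructSetData($tIn, 1, $bData)
    Local $iOut = 0, $iByte
    For $i = 1 To $iLen
        $iByte = DllStructGetData($tIn, 1, $i)
        If $iByte < $iLow Or $iByte > $iHigh Then
            $iOut += 1
            DllStructSetData($tOut, 1, $iByte, $iOut)
        EndIf
    Next
    If $iOut = 0 Then Return Binary("")
    ; Return only the bytes that were actually written
    Return BinaryMid(DllStructGetData($tOut, 1), 1, $iOut)
EndFunc

; Demo: drop control bytes 0x00-0x1F from 03 37 00 0A 00 1E 00 "Hi"
Local $bClean = _StripByteRange(Binary("0x0337000A001E004869"), 0x00, 0x1F)
ConsoleWrite(BinaryToString($bClean, $SB_ANSI) & @CRLF) ; prints 7Hi
```

In the demo the 0x37 byte survives because it is the printable character "7", so stripping a range alone does not remove the whole 03 37 00 0A 00 1E 00 header; removing the exact sequence, as BinaryReplace does, is still needed for that.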

 

Edited by ercicttech
Clarification
