ercicttech Posted March 13, 2024

Hi there. This is a little "niche".

I have an old computer file which I can load into Notepad and see the text delimited by a couple of hex codes. Example:

    I am a line of text
    FEFF
    I am another line of text
    FEFF
    I am yet another line of text

I know I can use StringSplit to split files that can be read in as text. However, when I load these files, StringSplit($str, Chr(254) & Chr(255), 1) does not split anything, which suggests the file is not being read in as "normal" text.

I've tried the load-as-binary option (16), but the split still doesn't work. I've then tried converting the hex to a string and splitting that; still no joy.

Can someone put me out of my misery? What really simple thing am I missing? Thanks!

Andreik Posted March 13, 2024

Is there a pattern in your data? From your sample it looks like it's a line of text then a line of hex data. Is that true?

Nine Posted March 13, 2024 (edited)

Tested it, and it is working for me:

    #include <Constants.au3>
    #include <Array.au3>

    Local $hFile = FileOpen("Test.txt", $FO_OVERWRITE)
    FileWriteLine($hFile, "Line 1")
    FileWrite($hFile, Chr(254) & Chr(255))
    FileWriteLine($hFile, "Line 2")
    FileWrite($hFile, Chr(254) & Chr(255))
    FileWriteLine($hFile, "Line 3")
    FileClose($hFile)

    Local $sText = FileRead("Test.txt")
    Local $aArray = StringSplit($sText, Chr(254) & Chr(255), $STR_ENTIRESPLIT)
    _ArrayDisplay($aArray)
    ConsoleWrite(StringToBinary($aArray[1]) & @CRLF)

You may want to upload a copy of your file, so we can see...

Andreik Posted March 13, 2024

I put your sample content in a sample.dat file and ran this code:

    Local $sData = FileRead('sample.dat')
    Local $aLine = StringSplit($sData, @CRLF, 1)
    If IsArray($aLine) Then
        For $Index = 1 To $aLine[0]
            ; Print just the lines that contain hex data
            If StringRegExp($aLine[$Index], '(?i)^(0x)?([a-f0-9]{2})+$') Then
                ConsoleWrite('Line ' & $Index & ': ' & $aLine[$Index] & @CRLF)
            EndIf
        Next
    EndIf

It should print just the lines that contain hex data.

ajag Posted March 14, 2024

16 hours ago, ercicttech said:
> I have an old computer file which I can load into Notepad, and see the text delimited by a couple of hex codes. Example I am a line of text FEFF I am another line of text FEFF I am yet another line of text

Is this what you see in Notepad? I don't think Notepad can show hex values, so are you sure that these are not the characters "FEFF"?

rudi Posted March 14, 2024

As @ajag already pointed out, Notepad isn't a hex editor.

Get Notepad++, then:
Plugins -> Plugins Admin -> install the HEX plugin
Open your file -> Plugins -> view as hex

What do you see?

ercicttech (Author) Posted March 14, 2024

22 hours ago, Andreik said:
> Is there a pattern in your data? From your sample it looks like it's a line of text then a line of hex data. Is that true?

I was oversimplifying slightly, but essentially yes: a block of hex splits the various bits of text that I'm interested in. A standard couple of bytes comes before each bit of text. So it's maybe something like:

    abcdefabcdefFFEEHello I am line 1
    df327328491FFEEHello I am line 2

So the plan is to split by FFEE.

ercicttech (Author) Posted March 14, 2024 (edited)

6 hours ago, ajag said:
> Is this what you see in Notepad? I don't think Notepad can show hex values, so are you sure that these are not the characters "FEFF"?

My bad. I meant to say that I'm using a dedicated hex editor to look at the file and get the hex codes, not Notepad/Notepad++. It's definitely the hex codes FE FF.

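For a delimiter that really is the raw byte pair FE FF, one approach is to read the file in binary mode, map each byte to one ANSI character, and only then split. The following is a minimal sketch under stated assumptions: the temp file name feff_demo.bin and its contents are made up for the demonstration, and it assumes that Chr(0xFE) and Chr(0xFF) map to the same characters that BinaryToString() produces for those bytes in ANSI mode on the machine running it. It has not been tried against the original poster's files.

```autoit
#include <FileConstants.au3>
#include <StringConstants.au3>

; Build a small demo file (hypothetical name): "One", FE FF, "Two", FE FF, "Three"
Local $sPath = @TempDir & "\feff_demo.bin"
Local $hOut = FileOpen($sPath, $FO_OVERWRITE + $FO_BINARY)
FileWrite($hOut, Binary("0x4F6E65FEFF54776FFEFF5468726565"))
FileClose($hOut)

; Read the raw bytes back without any encoding detection
Local $hIn = FileOpen($sPath, $FO_BINARY)
Local $dData = FileRead($hIn)
FileClose($hIn)

; Map every byte to one ANSI character (assumption: same mapping as Chr())
Local $sText = BinaryToString($dData, $SB_ANSI)

; Split on the two-character delimiter as a whole, not on each character
Local $aParts = StringSplit($sText, Chr(0xFE) & Chr(0xFF), $STR_ENTIRESPLIT)
For $i = 1 To $aParts[0]
    ConsoleWrite($i & ": " & $aParts[$i] & @CRLF)
Next
```

If the split still returns a single element on a real file, dumping StringToBinary() of the text (as in Nine's test above) shows which bytes actually arrived in the string.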
Nine Posted March 14, 2024

Did you read my post? It is working fine as I showed you. So upload your file so we can see what's going on.

ercicttech (Author) Posted March 14, 2024

1 minute ago, Nine said:
> Did you read my post? It is working fine as I showed you. So upload your file so we can see what's going on.

Hi. I'm currently on my work laptop rather than my personal one. I'll upload it when I get home.

I have another one that's curious.

https://drive.google.com/file/d/10_NJawt5iPEMtUl_AQGFJ4ZKnRPW4Ngi/view?usp=sharing

Using a hex editor, I can see that it starts with hex 03-37-00-0a-00-1e-00 and then it's essentially plain text. I can read it absolutely fine in Notepad++ (normal mode, not hex edit), crop out the offending rogue characters at the start and save it off as a text file. But I'd rather not have to do that.

If I do a simple

    $Infile = @ScriptDir & "\15_Q.bin"
    $R = FileOpen($InFile, 0)
    $Str = FileRead($R)
    FileClose($R)
    MsgBox(0, "", $Str)

the message box I get is gibberish.

I did manage to get these working using a complete fudge derived from someone else's work here:

    ;https://www.autoitscript.com/forum/topic/206247-binary-replacement-by-byte-position/
    #include <FileConstants.au3>

    ; $Num comes from the surrounding script (not shown)
    $AFile = @ScriptDir & "\" & $Num & "_A.bin"
    $R = FileOpen($AFile, 16)
    $BIN_DATA = FileRead($R)
    FileClose($R)

    Local $tByteBuffer

    ;Create a byte buffer struct and move the binary data to it
    $tByteBuffer = DllStructCreate(StringFormat("byte data[%i]", BinaryLen($BIN_DATA)))
    $tByteBuffer.data = $BIN_DATA

    ;Display binary data & byte buffer contents before modification
    ConsoleWrite("$BIN_DATA = " & $BIN_DATA & @CRLF)
    ConsoleWrite("$tByteBuffer before = " & $tByteBuffer.data & @CRLF)

    ;Overwrite the first five bytes with spaces
    $tByteBuffer.data(01) = 0x20
    $tByteBuffer.data(02) = 0x20
    $tByteBuffer.data(03) = 0x20
    $tByteBuffer.data(04) = 0x20
    $tByteBuffer.data(05) = 0x20

    ;Write the modified buffer back out
    $hFile = FileOpen(@ScriptDir & "\TestNew.txt", $FO_CREATEPATH + $FO_BINARY + $FO_OVERWRITE)
    FileWrite($hFile, $tByteBuffer.data)
    FileClose($hFile)

But I'm ideally just wanting some way to skip these bytes, or a way of stripping out a range of hex codes that I'm not interested in.
In case anyone is wondering what on Earth I'm doing - I'm loading quiz games from the old ZX Spectrum, and ripping through the data to extract the questions/answers. Niche? You betcha. There are loads of different games, with the questions/answers stored using vaguely similar ways but with different hex codes either getting in the way or being used as markers. So my starter for 10 is getting AutoIt to handle files like the ones linked above without using the byte-by-byte code I nicked
Andreik Posted March 14, 2024 (edited)

And where are these hex codes? I don't see any data like you described above. Just use StringReplace() or StringRegExpReplace() if you want to replace some characters. Your latest example refers to 15_Q.bin, but there is no such file in the files you uploaded to Google Drive. Is it that hard to post a valid question and provide all the required data?

ercicttech (Author) Posted March 14, 2024

6 hours ago, Andreik said:
> And where are these hex codes? I don't see any data like you described above. Just use StringReplace() or StringRegExpReplace() if you want to replace some characters. Your latest example refers to 15_Q.bin, but there is no such file in the files you uploaded to Google Drive. Is it that hard to post a valid question and provide all the required data?

Oops. I linked to the wrong ZIP file; those were the versions I had already ripped to text. Sorry, I was doing this on top of about 100 other things at work.

The BIN files are attached here. The archive should contain 18 BIN files with "Q" in the filename. These should all start with the same initial bit of hex, followed by raw text. I have already ripped the raw text by manually loading the files into Notepad++ and deleting the leading bytes, but I want to be able to do this in AutoIt.

Attachment: RothmanBINFiles.zip

Andreik Posted March 15, 2024

You can do something like this:

    $hFile = FileOpen('15_Q.bin', 16)
    $dData = FileRead($hFile)
    FileClose($hFile)

    $dReplace = BinaryReplace($dData, '0x0337000A001E00', '')
    MsgBox(0, '', BinaryToString($dReplace))

    Func BinaryReplace($dData, $dHex, $dReplace)
        $dHex = StringReplace($dHex, '0x', '')
        $dReplace = StringReplace($dReplace, '0x', '')
        Return StringRegExpReplace($dData, '(?i)^(0x)?((?:[0-9a-f]{2})*?)(' & $dHex & ')((?:[0-9a-f]{2})*?)$', '${1}${2}' & $dReplace & '${4}')
    EndFunc

But if you are not sure about the byte sequences, you can replace, for example, all null bytes with a slightly modified version:

    $hFile = FileOpen('15_Q.bin', 16)
    $dData = FileRead($hFile)
    FileClose($hFile)

    $dReplace = BinaryReplace($dData, '0x00', '', True)
    MsgBox(0, '', BinaryToString($dReplace))

    Func BinaryReplace($dData, $dHex, $dReplace, $fAll = False)
        $dHex = StringReplace($dHex, '0x', '')
        $dReplace = StringReplace($dReplace, '0x', '')
        Switch $fAll
            Case False
                Return StringRegExpReplace($dData, '(?i)^(0x)?((?:[0-9a-f]{2})*?)(' & $dHex & ')((?:[0-9a-f]{2})*?)$', '${1}${2}' & $dReplace & '${4}')
            Case True
                Do
                    $dData = StringRegExpReplace($dData, '(?i)^(0x)?((?:[0-9a-f]{2})*?)(' & $dHex & ')((?:[0-9a-f]{2})*?)$', '${1}${2}' & $dReplace & '${4}')
                Until @extended = 0
                Return $dData
        EndSwitch
    EndFunc

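When the unwanted bytes are a fixed header at the very start of the file, as with the 03-37-00-0a-00-1e-00 prefix described earlier in the thread, a shorter alternative is to slice them off with BinaryMid(). This is a minimal sketch, not something posted in the thread: _SkipHeader is a hypothetical helper name, and the comparison goes through String() so that both sides are compared as "0x..." hex text.

```autoit
#include <FileConstants.au3>

; Hypothetical helper: return $dData without $dHeader if it starts with it
Func _SkipHeader($dData, $dHeader)
    Local $iLen = BinaryLen($dHeader)
    ; Only strip when the data really begins with the expected bytes
    If String(BinaryMid($dData, 1, $iLen)) = String($dHeader) Then
        ; BinaryMid positions are 1-based, so the body starts at $iLen + 1
        Return BinaryMid($dData, $iLen + 1)
    EndIf
    Return $dData
EndFunc

; Usage with the file name from the thread; skipped if the file is missing
Local $hFile = FileOpen(@ScriptDir & "\15_Q.bin", $FO_BINARY)
If $hFile <> -1 Then
    Local $dData = FileRead($hFile)
    FileClose($hFile)
    Local $dBody = _SkipHeader($dData, Binary("0x0337000A001E00"))
    ConsoleWrite(BinaryToString($dBody) & @CRLF)
EndIf
```

Unlike the regular-expression version, this only touches the start of the data, so the same byte sequence appearing later in the file is left alone.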
Solution: ercicttech (Author) Posted March 15, 2024 (edited)

10 hours ago, Andreik said:
> You can do something like this:

Superb! That looks absolutely ideal. The first method worked perfectly with the Rothman files I attached yesterday. And I can easily modify the second method as a function and use it in combination with a For...Next loop to get rid of a range of hex codes.

Thank you VERY much!

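The "range of hex codes" idea mentioned above can also be sketched without regular expressions, by walking the hex text of the data two digits at a time. This is only an illustration under stated assumptions: _StripByteRange is a hypothetical helper name, and it relies on String() of a binary value giving "0x" followed by two hex digits per byte.

```autoit
; Hypothetical helper: drop every byte whose value lies in $iLow..$iHigh
Func _StripByteRange($dData, $iLow, $iHigh)
    ; String() of binary data gives "0x..."; drop the "0x" prefix
    Local $sHex = StringTrimLeft(String($dData), 2)
    Local $sKeep = ""
    For $i = 1 To StringLen($sHex) Step 2
        Local $sPair = StringMid($sHex, $i, 2)
        Local $iByte = Dec($sPair)
        ; Keep only the bytes that fall outside the unwanted range
        If $iByte < $iLow Or $iByte > $iHigh Then $sKeep &= $sPair
    Next
    If $sKeep = "" Then Return Binary("")
    Return Binary("0x" & $sKeep)
EndFunc

; Header bytes from the thread followed by "Hi" and a line feed
Local $dClean = _StripByteRange(Binary("0x0337000A001E0048690A"), 0x00, 0x1F)
ConsoleWrite(String($dClean) & @CRLF)
```

In this sample the 0x37 byte of the header survives, because it is a printable character (the digit 7) and lies outside the 0x00 to 0x1F range. Stripping by value is therefore a looser tool than removing a known header sequence.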