AmrAli

  1. I was playing with the 'test.bin' file you posted before to check different encodings. I used PowerShell to encode the file to UTF-16, because Notepad++ turned out to be buggy for this.

     function EncodeToUtf16($InFile, $Charset, $OutFile) {
         $Encoding = [System.Text.Encoding]::GetEncoding($Charset)
         $BinaryText = [System.IO.File]::ReadAllText($InFile, $Encoding)
         $Utf16LE = New-Object System.Text.UnicodeEncoding -ArgumentList $False, $False
         [System.IO.File]::WriteAllText($OutFile, $BinaryText, $Utf16LE)
     }

     EncodeToUtf16 -InFile "$pwd\test.bin" -Charset "Windows-1252" -OutFile "$pwd\ansi.bin"
     EncodeToUtf16 -InFile "$pwd\test.bin" -Charset "iso-8859-1" -OutFile "$pwd\iso_8859-1.bin"

     I then examined the hex differences in Beyond Compare. This confirms that the first 256 Unicode code points coincide with the character codes of the ISO 8859-1 (Latin-1) charset.

     Wikipedia links:
     https://en.wikipedia.org/wiki/Windows-1252
     https://en.wikipedia.org/wiki/ISO/IEC_8859-1
  2. I appreciate your help. See you.
  3. Strange. I don't know what the problem is with your machine. My results:

     Microsoft Windows x64 [Version 10.0.18363.959]
     AutoIt v3.3.14.5

     D:\AutoIt\BinFind>BinFind test.bin "\x80.."
     Filename: test.bin
     Regex pattern: \x80..
     Offset: 0x00000080 Length: 3 Bytes: 0x80 0x81 0x82 Char: [?ü?]

     D:\AutoIt\BinFind>BinFind test.bin "\x81.."
     Filename: test.bin
     Regex pattern: \x81..
     Offset: 0x00000081 Length: 3 Bytes: 0x81 0x82 0x83 Char: [ü??]

     D:\AutoIt\BinFind>BinFind test.bin "[\x80-\x9F]"
     Filename: test.bin
     Regex pattern: [\x80-\x9F]
     Offset: 0x00000080 Length: 1 Bytes: 0x80 Char: [?]
     Offset: 0x00000081 Length: 1 Bytes: 0x81 Char: [ü]
     Offset: 0x00000082 Length: 1 Bytes: 0x82 Char: [?]
     Offset: 0x00000083 Length: 1 Bytes: 0x83 Char: [?]
     Offset: 0x00000084 Length: 1 Bytes: 0x84 Char: [?]
     Offset: 0x00000085 Length: 1 Bytes: 0x85 Char: [?]
     Offset: 0x00000086 Length: 1 Bytes: 0x86 Char: [?]
     Offset: 0x00000087 Length: 1 Bytes: 0x87 Char: [?]
     Offset: 0x00000088 Length: 1 Bytes: 0x88 Char: [?]
     Offset: 0x00000089 Length: 1 Bytes: 0x89 Char: [?]
     Offset: 0x0000008A Length: 1 Bytes: 0x8A Char: [?]
     Offset: 0x0000008B Length: 1 Bytes: 0x8B Char: [?]
     Offset: 0x0000008C Length: 1 Bytes: 0x8C Char: [?]
     Offset: 0x0000008D Length: 1 Bytes: 0x8D Char: [ì]
     Offset: 0x0000008E Length: 1 Bytes: 0x8E Char: [?]
     Offset: 0x0000008F Length: 1 Bytes: 0x8F Char: [Å]
     Offset: 0x00000090 Length: 1 Bytes: 0x90 Char: [É]
     Offset: 0x00000091 Length: 1 Bytes: 0x91 Char: [?]
     Offset: 0x00000092 Length: 1 Bytes: 0x92 Char: [?]
     Offset: 0x00000093 Length: 1 Bytes: 0x93 Char: [?]
     Offset: 0x00000094 Length: 1 Bytes: 0x94 Char: [?]
     Offset: 0x00000095 Length: 1 Bytes: 0x95 Char: [?]
     Offset: 0x00000096 Length: 1 Bytes: 0x96 Char: [?]
     Offset: 0x00000097 Length: 1 Bytes: 0x97 Char: [?]
     Offset: 0x00000098 Length: 1 Bytes: 0x98 Char: [?]
     Offset: 0x00000099 Length: 1 Bytes: 0x99 Char: [?]
     Offset: 0x0000009A Length: 1 Bytes: 0x9A Char: [?]
     Offset: 0x0000009B Length: 1 Bytes: 0x9B Char: [?]
     Offset: 0x0000009C Length: 1 Bytes: 0x9C Char: [?]
     Offset: 0x0000009D Length: 1 Bytes: 0x9D Char: [¥]
     Offset: 0x0000009E Length: 1 Bytes: 0x9E Char: [?]
     Offset: 0x0000009F Length: 1 Bytes: 0x9F Char: [?]
     Press any key to continue . . .

     I suspect that antivirus software on your machine is blocking access to binary files.

     BTW, the Latin-1 string is in fact a UCS-2 wide-character string, transcoded from ISO Latin-1 to UCS-2. For example, byte 0x80 from the input binary data becomes the code unit 0x0080 (stored little-endian) in the mirror UCS-2 string.

     The \x{hhhh} regex token matches a 16-bit code unit (not a byte) with the given hexadecimal value. The trick here is that for values up to 0xFF you can use the short form \xhh, so \x80 matches the code unit (wchar) 0x0080.

     If the ANSI code page (Windows-1252, for most users) were used for the conversion, the 0x80 byte would be decoded as the Euro sign (code point 0x20AC), and the regex search for \x80 would fail to find the expected 0x0080. Because the AutoIt function BinaryToString() uses the ANSI code page by default, it does not preserve the bytes in the C1 control range (0x80 - 0x9F) when converting the input binary data.

     You can also double-check with the attached script HexFind.au3, which I wrote to validate the results of BinFind. The script uses a linear search algorithm instead of regular expressions.

     HexFind.au3

     #Region ;**** Directives created by AutoIt3Wrapper_GUI ****
     #AutoIt3Wrapper_Change2CUI=y
     #AutoIt3Wrapper_Run_Tidy=y
     #EndRegion ;**** Directives created by AutoIt3Wrapper_GUI ****

     AutoItSetOption("MustDeclareVars", 1)

     ;~ A demonstration to show how to perform search over binary files from command line.
     ;~ https://www.autoitscript.com/forum/topic/188564-use-regexp-on-binary-data

     ;~ Examples:
     ;~ HexFind "C:\Windows\System32\notepad.exe" "0x4D5A"
     ;~ HexFind "C:\Windows\System32\notepad.exe" "0x8984"

     #include <FileConstants.au3>
     #include <StringConstants.au3>

     If $CmdLine[0] <> 2 Then
         ConsoleWrite("Wrong command line arguments." & @CRLF & @CRLF & "Usage: HexFind <filename> <0xFFFF...>" & @CRLF)
         Exit
     EndIf

     Local Const $sFilePath = $CmdLine[1]
     Local Const $dSequence = Binary($CmdLine[2])

     If Not FileExists($sFilePath) Then
         ConsoleWrite("File not found: " & $sFilePath & @CRLF)
         Exit
     EndIf

     ConsoleWrite("Filename: " & $sFilePath & @CRLF)
     ConsoleWrite("Hexadecimal sequence: " & String($dSequence) & @CRLF)

     ; Get the binary data
     Local $hFileOpen = FileOpen($sFilePath, $FO_READ + $FO_Binary)
     If $hFileOpen = -1 Then
         ConsoleWrite("An error occurred when reading the file." & @CRLF)
         Exit
     EndIf
     Local $BinaryData = FileRead($hFileOpen)
     FileClose($hFileOpen)

     ; Perform a linear search over the binary data.
     Local $iOffset = 1, _
             $iMatches = 0
     While 1
         $iOffset = _HexFind($BinaryData, $dSequence, $iOffset)
         If @error Then ExitLoop
         $iMatches += 1

         ConsoleWrite("Offset: 0x" & Hex($iOffset - 1) & " ") ; convert to zero-based file offset
         ConsoleWrite("Length: " & BinaryLen($dSequence) & " ")
         ConsoleWrite("Bytes: ")
         For $j = 1 To BinaryLen($dSequence)
             Local $iByte = BinaryMid($dSequence, $j, 1)
             ConsoleWrite("0x" & Hex($iByte, 2) & " ")
         Next
         ConsoleWrite(@CRLF)

         $iOffset += BinaryLen($dSequence) ; seek to end of match
     WEnd

     If $iMatches = 0 Then
         ConsoleWrite("No matches could be found." & @CRLF)
     EndIf

     ; #FUNCTION# ====================================================================================================================
     ; Name ..........: _HexFind
     ; Description ...: Search for a byte sequence in a binary data and return the position.
     ; Syntax ........: _HexFind($dBinaryData, $dSequence[, $iStart = 1])
     ; Parameters ....: $dBinaryData - The binary data to search.
     ;                  $dSequence   - The byte sequence to search for.
     ;                  $iStart      - [optional] The starting position of the search. Default is 1.
     ; Return values .: Success: The position of the byte sequence.
     ;                  Failure: 0 and sets the @error flag to non-zero.
     ; Remarks .......: The first binary position is 1.
     ; Related .......:
     ; Link ..........:
     ; Example .......: No
     ; ===============================================================================================================================
     Func _HexFind($dBinaryData, $dSequence, $iStart = 1)
         Local $iBinaryLength = BinaryLen($dBinaryData), _
                 $iSeqLength = BinaryLen($dSequence)

         If $iBinaryLength = 0 Or _
                 $iSeqLength = 0 Or _
                 $iStart < 1 Or _
                 $iStart > $iBinaryLength - $iSeqLength + 1 Then
             Return SetError(2, @extended, 0)
         EndIf

         For $iPosition = $iStart To ($iBinaryLength - $iSeqLength + 1)
             For $i = 1 To $iSeqLength
                 Local $iTemp1 = BinaryMid($dBinaryData, $iPosition + $i - 1, 1)
                 Local $iTemp2 = BinaryMid($dSequence, $i, 1)
                 If $iTemp1 <> $iTemp2 Then
                     ContinueLoop 2
                 EndIf
             Next
             Return SetError(0, @extended, $iPosition)
         Next

         Return SetError(1, @extended, 0)
     EndFunc   ;==>_HexFind

     Expected output:

     D:\AutoIt>HexFind "C:\Windows\System32\notepad.exe" "0x4D5A"
     Filename: C:\Windows\System32\notepad.exe
     Hexadecimal sequence: 0x4D5A
     Offset: 0x00000000 Length: 2 Bytes: 0x4D 0x5A
     Offset: 0x00012279 Length: 2 Bytes: 0x4D 0x5A
     Offset: 0x000156D0 Length: 2 Bytes: 0x4D 0x5A
     Offset: 0x00015D27 Length: 2 Bytes: 0x4D 0x5A
     Offset: 0x00019555 Length: 2 Bytes: 0x4D 0x5A
     Offset: 0x00023474 Length: 2 Bytes: 0x4D 0x5A
     Offset: 0x00023C62 Length: 2 Bytes: 0x4D 0x5A

     D:\AutoIt>HexFind "C:\Windows\System32\notepad.exe" "0x8984"
     Filename: C:\Windows\System32\notepad.exe
     Hexadecimal sequence: 0x8984
     Offset: 0x000004A9 Length: 2 Bytes: 0x89 0x84
     Offset: 0x00000D92 Length: 2 Bytes: 0x89 0x84
     Offset: 0x000010AA Length: 2 Bytes: 0x89 0x84
     Offset: 0x0000170F Length: 2 Bytes: 0x89 0x84
     Offset: 0x00001BA0 Length: 2 Bytes: 0x89 0x84
     Offset: 0x00005806 Length: 2 Bytes: 0x89 0x84
     Offset: 0x000077E4 Length: 2 Bytes: 0x89 0x84
     Offset: 0x0000AED0 Length: 2 Bytes: 0x89 0x84
     Offset: 0x0000B6F0 Length: 2 Bytes: 0x89 0x84
     Offset: 0x0000B7B4 Length: 2 Bytes: 0x89 0x84
     Offset: 0x0000E54E Length: 2 Bytes: 0x89 0x84
     Offset: 0x0000E5D2 Length: 2 Bytes: 0x89 0x84
     Offset: 0x0000E5E9 Length: 2 Bytes: 0x89 0x84
     Offset: 0x0000E6C1 Length: 2 Bytes: 0x89 0x84
     Offset: 0x0000E6ED Length: 2 Bytes: 0x89 0x84
     Offset: 0x0000E7F4 Length: 2 Bytes: 0x89 0x84
     Offset: 0x0000E896 Length: 2 Bytes: 0x89 0x84
     Offset: 0x0000EB15 Length: 2 Bytes: 0x89 0x84
     Offset: 0x0000EBDB Length: 2 Bytes: 0x89 0x84
     Offset: 0x0000F1F4 Length: 2 Bytes: 0x89 0x84
     Offset: 0x0001DB88 Length: 2 Bytes: 0x89 0x84

     Note: Results may vary depending on your Windows version.

     Attached files: HexFind.au3, test.cmd
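     The difference between the two conversions of byte 0x80 can be checked directly. The following is only a minimal sketch and not part of BinFind or HexFind; the variable names are illustrative, and the value printed for BinaryToString() depends on the system's ANSI code page (0x20AC would be expected under Windows-1252).

```autoit
; Sketch: compare a code-point-preserving conversion with an ANSI conversion
; for the single byte 0x80. Variable names are illustrative only.

Local $sMirror = ChrW(0x80)                        ; code point equals the byte value
Local $sAnsi = BinaryToString(Binary("0x80"), 1)   ; 1 = $SB_ANSI, decoded with the ANSI code page

; Show the UTF-16 code unit that each conversion produced.
ConsoleWrite("ChrW:           0x" & Hex(AscW($sMirror), 4) & @CRLF)
ConsoleWrite("BinaryToString: 0x" & Hex(AscW($sAnsi), 4) & @CRLF)

; \x80 matches the mirror character. Whether it matches the ANSI-decoded
; character depends on how the code page maps byte 0x80.
ConsoleWrite("Regex on ChrW string: " & StringRegExp($sMirror, "\x80") & @CRLF)
ConsoleWrite("Regex on ANSI string: " & StringRegExp($sAnsi, "\x80") & @CRLF)
```

     With the default flag, StringRegExp() returns 1 for a match and 0 otherwise, so the last two lines show at a glance which string the pattern \x80 can find.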
  4. Nice workaround for the console output: ANSI-encoded UTF-8.

     Local $utf8 = BinaryToString(StringToBinary($s, 4), 1)

     Your explanation of how Unicode is implemented in AutoIt is excellent and clear.

     OK. As a proof of concept, I wrote this small app to demonstrate how to perform a regular expression search over binary files from the command line. Stay safe.

     BinFind.au3

     #Region ;**** Directives created by AutoIt3Wrapper_GUI ****
     #AutoIt3Wrapper_Change2CUI=y
     #AutoIt3Wrapper_Run_Tidy=y
     #EndRegion ;**** Directives created by AutoIt3Wrapper_GUI ****

     AutoItSetOption("MustDeclareVars", 1)

     ;~ A demonstration to show how to perform a regular expression
     ;~ search over binary files from command line.
     ;~ https://www.autoitscript.com/forum/topic/188564-use-regexp-on-binary-data

     ;~ Examples:
     ;~ BinFind "C:\Windows\System32\notepad.exe" "\x4D\x5A.."
     ;~ BinFind "C:\Windows\System32\notepad.exe" "\x89\x84.."

     #include <FileConstants.au3>
     #include <StringConstants.au3>

     If $CmdLine[0] <> 2 Then
         ConsoleWrite("Wrong command line arguments." & @CRLF & @CRLF & "Usage: BinFind <filename> <regexp_pattern>" & @CRLF)
         Exit
     EndIf

     Local Const $sFilePath = $CmdLine[1]
     Local Const $sPattern = $CmdLine[2]

     If Not FileExists($sFilePath) Then
         ConsoleWrite("File not found: " & $sFilePath & @CRLF)
         Exit
     EndIf

     ConsoleWrite("Filename: " & $sFilePath & @CRLF)
     ConsoleWrite("RegExp pattern: " & $sPattern & @CRLF)

     ; Get the binary data
     Local $hFileOpen = FileOpen($sFilePath, $FO_READ + $FO_Binary)
     If $hFileOpen = -1 Then
         ConsoleWrite("An error occurred when reading the file." & @CRLF)
         Exit
     EndIf
     Local $BinaryData = FileRead($hFileOpen)
     FileClose($hFileOpen)

     ; Convert the binary data into a string with identical one-to-one
     ; byte to character representation. This is useful for performing
     ; regular expressions on binary data.
     Local $sBinaryText = ""
     For $i = 1 To BinaryLen($BinaryData)
         Local $iCode = BinaryMid($BinaryData, $i, 1)
         Local $sChrW = ChrW($iCode)
         $sBinaryText &= $sChrW
     Next

     ; Perform a regular expression search on the mirror-image text.
     ; Note: search is not run over the original byte array.
     Local $aMatch = 0, _
             $iOffset = 1, _
             $iMatches = 0
     While 1
         $aMatch = StringRegExp($sBinaryText, _
                 "(?sx)" & $sPattern, _
                 $STR_REGEXPARRAYFULLMATCH, _
                 $iOffset _
                 )
         If @error Then ExitLoop
         $iOffset = @extended
         $iMatches += 1

         Local $sMatch = $aMatch[0] ; get the full match as the first array element
         Local $iPos = $iOffset - StringLen($sMatch) - 1 ; seek to start of match

         ConsoleWrite("Offset: 0x" & Hex($iPos) & " ")
         ConsoleWrite("Length: " & StringLen($sMatch) & " ")
         ConsoleWrite("Bytes: ")
         For $j = 1 To StringLen($sMatch)
             Local $sChrW = StringMid($sMatch, $j, 1)
             Local $iCode = AscW($sChrW)
             ConsoleWrite("0x" & Hex($iCode, 2) & " ")
         Next
         ;~ ConsoleWrite(@TAB & "Char: [" & $sMatch & "]" & @CRLF)
         ConsoleWrite(@TAB & "Char: [" & StringRegExpReplace($sMatch, "[\x0\x09\x0D\x0A]", "?") & "]" & @CRLF)
     WEnd

     If $iMatches = 0 Then
         ConsoleWrite("No matches could be found." & @CRLF)
     EndIf

     Expected output:

     D:\AutoIt>BinFind "C:\Windows\System32\notepad.exe" "\x4D\x5A.."
     Filename: C:\Windows\System32\notepad.exe
     RegExp pattern: \x4D\x5A..
     Offset: 0x00000000 Length: 4 Bytes: 0x4D 0x5A 0x90 0x00 Char: [MZÉ?]
     Offset: 0x00012279 Length: 4 Bytes: 0x4D 0x5A 0x00 0x00 Char: [MZ??]
     Offset: 0x000156D0 Length: 4 Bytes: 0x4D 0x5A 0x00 0x00 Char: [MZ??]
     Offset: 0x00015D27 Length: 4 Bytes: 0x4D 0x5A 0x00 0x00 Char: [MZ??]
     Offset: 0x00019555 Length: 4 Bytes: 0x4D 0x5A 0x00 0x00 Char: [MZ??]
     Offset: 0x00023474 Length: 4 Bytes: 0x4D 0x5A 0x00 0x00 Char: [MZ??]
     Offset: 0x00023C62 Length: 4 Bytes: 0x4D 0x5A 0x00 0x00 Char: [MZ??]

     D:\AutoIt>BinFind "C:\Windows\System32\notepad.exe" "\x89\x84.."
     Filename: C:\Windows\System32\notepad.exe
     RegExp pattern: \x89\x84..
     Offset: 0x000004A9 Length: 4 Bytes: 0x89 0x84 0x24 0x80 Char: [??$?]
     Offset: 0x00000D92 Length: 4 Bytes: 0x89 0x84 0x24 0x40 Char: [??$@]
     Offset: 0x000010AA Length: 4 Bytes: 0x89 0x84 0x24 0x40 Char: [??$@]
     Offset: 0x0000170F Length: 4 Bytes: 0x89 0x84 0x24 0x10 Char: [??$?]
     Offset: 0x00001BA0 Length: 4 Bytes: 0x89 0x84 0x24 0x40 Char: [??$@]
     Offset: 0x00005806 Length: 4 Bytes: 0x89 0x84 0x24 0x70 Char: [??$p]
     Offset: 0x000077E4 Length: 4 Bytes: 0x89 0x84 0x24 0x50 Char: [??$P]
     Offset: 0x0000AED0 Length: 4 Bytes: 0x89 0x84 0x24 0xA0 Char: [??$á]
     Offset: 0x0000B6F0 Length: 4 Bytes: 0x89 0x84 0x24 0xB0 Char: [??$¦]
     Offset: 0x0000B7B4 Length: 4 Bytes: 0x89 0x84 0x24 0x10 Char: [??$?]
     Offset: 0x0000E54E Length: 4 Bytes: 0x89 0x84 0x24 0x48 Char: [??$H]
     Offset: 0x0000E5D2 Length: 4 Bytes: 0x89 0x84 0x24 0xF4 Char: [??$(]
     Offset: 0x0000E5E9 Length: 4 Bytes: 0x89 0x84 0x24 0xF8 Char: [??$°]
     Offset: 0x0000E6C1 Length: 4 Bytes: 0x89 0x84 0x24 0x88 Char: [??$?]
     Offset: 0x0000E6ED Length: 4 Bytes: 0x89 0x84 0x24 0x98 Char: [??$?]
     Offset: 0x0000E7F4 Length: 4 Bytes: 0x89 0x84 0x24 0x90 Char: [??$É]
     Offset: 0x0000E896 Length: 4 Bytes: 0x89 0x84 0x24 0xA0 Char: [??$á]
     Offset: 0x0000EB15 Length: 4 Bytes: 0x89 0x84 0x24 0xB8 Char: [??$+]
     Offset: 0x0000EBDB Length: 4 Bytes: 0x89 0x84 0x24 0xB0 Char: [??$¦]
     Offset: 0x0000F1F4 Length: 4 Bytes: 0x89 0x84 0x24 0x40 Char: [??$@]
     Offset: 0x0001DB88 Length: 4 Bytes: 0x89 0x84 0x24 0x60 Char: [??$`]

     Note: Results may vary depending on your Windows version.

     Attached files: BinFind.au3, test.cmd
  5. So, according to what you mentioned before, how do you explain why AutoIt uses these (not) supported constants? Is it a design error or a misnomer?

     BinaryToString ( expression [, flag = 1] )

     flag: [optional] Changes how the binary data is converted:
         $SB_ANSI (1) = binary data is ANSI (default)
         $SB_UTF16LE (2) = binary data is UTF16 Little Endian
         $SB_UTF16BE (3) = binary data is UTF16 Big Endian
         $SB_UTF8 (4) = binary data is UTF8

     Edit: I am adopting your approach now, anyway.
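     To make the flags concrete, here is a minimal sketch of how the same bytes are interpreted differently depending on the flag. The byte values and variable names are my own illustration; the ANSI result depends on the system code page (two characters would be expected on a single-byte code page such as Windows-1252).

```autoit
#include <StringConstants.au3>

; Sketch: the flag tells BinaryToString() how to interpret the bytes.
; 0xC3A9 is the UTF-8 encoding of U+00E9 (e with acute accent).
Local $dUtf8 = Binary("0xC3A9")

Local $sAsUtf8 = BinaryToString($dUtf8, $SB_UTF8) ; bytes decoded as UTF-8
Local $sAsAnsi = BinaryToString($dUtf8, $SB_ANSI) ; bytes decoded with the ANSI code page

ConsoleWrite("UTF-8 length: " & StringLen($sAsUtf8) & ", first code point: 0x" & Hex(AscW($sAsUtf8), 4) & @CRLF)
ConsoleWrite("ANSI length:  " & StringLen($sAsAnsi) & @CRLF)

; The same two bytes read as UTF-16 give different code units
; depending on the byte order selected by the flag.
Local $dUtf16 = Binary("0x4100")
ConsoleWrite("UTF-16 LE: 0x" & Hex(AscW(BinaryToString($dUtf16, $SB_UTF16LE)), 4) & @CRLF)
ConsoleWrite("UTF-16 BE: 0x" & Hex(AscW(BinaryToString($dUtf16, $SB_UTF16BE)), 4) & @CRLF)
```

     None of the four flags maps every byte value to the code point of the same number, which is why the scripts in this thread build the string with ChrW() instead.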
  6. If we’re going to scan for any arbitrary sequence of bytes with values between 0 and 255, we have to be sure that each character of a string that represents the binary data maps back to its respective byte value. Unfortunately, none of the encoding schemes that are allowed in AutoIt provide a one-to-one mapping of characters back to their respective byte values. There is a magic encoding scheme that does, however: ISO-8859-1 (Codepage: 28591).

     Note that regular expressions over binary data are not limited to searching; they can also validate a specific byte format. See this link for using a regular expression to validate the binary format of UTF-8 text files:
     https://www.w3.org/International/questions/qa-forms-utf-8
     and this link also:
     https://stackoverflow.com/a/63049031/4208440

     ; Transcode.au3 =========================================================================
     ; If we’re going to scan for any arbitrary sequence of bytes with values between 0
     ; and 255, we have to be sure that each character of a string that represents the
     ; binary data maps back to its respective byte value.
     ; Unfortunately, none of the encoding schemes that are allowed in AutoIt provide
     ; a one-to-one mapping of characters back to its respective byte value. There is a
     ; magic encoding scheme that does, however: ISO-8859-1 (Codepage: 28591).
     ; =======================================================================================

     ; = 1a.= Create binary
     $tBuffer = DllStructCreate("byte[256]")
     For $i = 0x00 To 0xFF
         DllStructSetData($tBuffer, 1, $i, $i + 1)
     Next
     $BinaryData = DllStructGetData($tBuffer, 1)
     ConsoleWrite('$BinaryData ' & @TAB & '= ' & $BinaryData & @CRLF & @CRLF )

     ; = 1b.= Convert to string
     ;~ $BinaryData = BinaryToString( $BinaryData )
     $sString = BinaryToLatin1String( $BinaryData )

     ; = 1c.= Transcode back to binary
     ;~ $Transcoded = StringToBinary( $BinaryData )
     $Transcoded = Latin1StringToBinary( $sString )
     ConsoleWrite('$Transcoded ' & @TAB & '= ' & $Transcoded & @CRLF & @CRLF )

     If $BinaryData = $Transcoded Then
         ConsoleWrite( 'Transcoding is OK.' & @CRLF )
     Else
         ConsoleWrite( 'Transcoding Failed.' & @CRLF )
     EndIf

     ;~ Expected OUTPUT:
     ;~ $BinaryData = 0x000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F202122232425262728292A2B...
     ;~ $Transcoded = 0x000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F202122232425262728292A2B...
     ;~ Transcoding is OK.

     @jchd, I also did a test for the Unicode string from your post.

     ; Transcode_v2.au3 ======================================================================
     ; If we’re going to scan for any arbitrary sequence of bytes with values between 0
     ; and 255, we have to be sure that each character of a string that represents the
     ; binary data maps back to its respective byte value.
     ; Unfortunately, none of the encoding schemes that are allowed in AutoIt provide
     ; a one-to-one mapping of characters back to its respective byte value. There is a
     ; magic encoding scheme that does, however: ISO-8859-1 (Codepage: 28591).
     ; =======================================================================================

     #include <StringConstants.au3>

     ; = 1a.= Create binary
     Local $s = "Μεγάλο πρόβλημα Большая проблема 大问题 बड़ी समस्या مشكلة كبيرة"
     $BinaryData = StringToBinary($s, $SB_UTF16LE) ; raw memory image
     ConsoleWrite('$BinaryData ' & @TAB & '= ' & $BinaryData & @CRLF & @CRLF )

     ; = 1b.= Convert to string
     ;~ $BinaryData = BinaryToString( $BinaryData )
     $sString = BinaryToLatin1String( $BinaryData )

     ; = 1c.= Transcode back to binary
     ;~ $Transcoded = StringToBinary( $BinaryData )
     $Transcoded = Latin1StringToBinary( $sString )
     ConsoleWrite('$Transcoded ' & @TAB & '= ' & $Transcoded & @CRLF & @CRLF )

     If $BinaryData = $Transcoded Then
         ConsoleWrite( 'Transcoding is OK.' & @CRLF )
     Else
         ConsoleWrite( 'Transcoding Failed.' & @CRLF )
     EndIf

     ;~ Expected OUTPUT:
     ;~ $BinaryData = 0x9C03B503B303AC03BB03BF032000C003C103CC03B203BB03B703BC03B1032000200011043E043B044C044804...
     ;~ $Transcoded = 0x9C03B503B303AC03BB03BF032000C003C103CC03B203BB03B703BC03B1032000200011043E043B044C044804...
     ;~ Transcoding is OK.

     Thank you for your reply.
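     As an illustration of validating a byte format, the W3C pattern linked above can be run over a one-to-one mirror string. This is only a sketch: _MirrorString() and _IsUtf8() are hypothetical helper names of my own, not AutoIt built-ins, and I have not tried it on large files, where a long repeated group may run into the regex engine's limits.

```autoit
; Sketch: check whether a byte sequence is well-formed UTF-8 by applying the
; W3C pattern to a string whose code points equal the byte values.
; _MirrorString() and _IsUtf8() are hypothetical names used for illustration.

Func _MirrorString($dBinary)
	Local $sText = ""
	For $i = 1 To BinaryLen($dBinary)
		; One character per byte; the code point equals the byte value.
		$sText &= ChrW(BinaryMid($dBinary, $i, 1))
	Next
	Return $sText
EndFunc   ;==>_MirrorString

Func _IsUtf8($dBinary)
	Local $sPattern = "\A(?:"
	$sPattern &= "[\x09\x0A\x0D\x20-\x7E]"             ; ASCII
	$sPattern &= "|[\xC2-\xDF][\x80-\xBF]"             ; non-overlong 2-byte
	$sPattern &= "|\xE0[\xA0-\xBF][\x80-\xBF]"         ; excluding overlongs
	$sPattern &= "|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}"  ; straight 3-byte
	$sPattern &= "|\xED[\x80-\x9F][\x80-\xBF]"         ; excluding surrogates
	$sPattern &= "|\xF0[\x90-\xBF][\x80-\xBF]{2}"      ; planes 1-3
	$sPattern &= "|[\xF1-\xF3][\x80-\xBF]{3}"          ; planes 4-15
	$sPattern &= "|\xF4[\x80-\x8F][\x80-\xBF]{2}"      ; plane 16
	$sPattern &= ")*\z"
	; With the default flag, StringRegExp() returns 1 on a match and 0 otherwise.
	Return StringRegExp(_MirrorString($dBinary), $sPattern) = 1
EndFunc   ;==>_IsUtf8

ConsoleWrite(_IsUtf8(Binary("0xC3A9")) & @CRLF) ; well-formed: UTF-8 for U+00E9
ConsoleWrite(_IsUtf8(Binary("0xC0AF")) & @CRLF) ; overlong encoding, should be rejected
```

     The pattern is anchored at both ends with \A and \z, so a single malformed byte anywhere in the data makes the whole check fail.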
  7. To be able to match a regular expression against binary data (bytes), the binary data is first converted to a Unicode string (all AutoIt strings are Unicode) using the ISO-8859-1 (Latin-1) encoding. It is the only single-byte encoding that has a one-to-one mapping onto the first 256 Unicode code points. Other encodings do not preserve every byte value when the data is converted to text.
  8. A working solution for using regular expressions on binary data:

     #Region ;**** Directives created by AutoIt3Wrapper_GUI ****
     #AutoIt3Wrapper_Change2CUI=y
     #EndRegion ;**** Directives created by AutoIt3Wrapper_GUI ****

     ; #FUNCTION# ====================================================================================================================
     ; Name ..........: BinaryToLatin1String
     ; Description ...: Convert binary data into a string with a one-to-one
     ;                  byte to character representation. This is useful for performing
     ;                  regular expressions on binary data.
     ; Syntax ........: BinaryToLatin1String($dBinary)
     ; Parameters ....: $dBinary - binary data.
     ; Return values .: String
     ; Remarks .......:
     ; Related .......:
     ; Link ..........:
     ; Example .......: No
     ; ===============================================================================================================================
     Func BinaryToLatin1String($dBinary)
         If Not IsBinary($dBinary) Then Return ""

         Local $sText = ""
         For $i = 1 To BinaryLen($dBinary)
             Local $iCode = BinaryMid($dBinary, $i, 1)
             Local $sChr = ChrW($iCode)
             $sText &= $sChr
         Next
         Return $sText
     EndFunc   ;==>BinaryToLatin1String

     ; #FUNCTION# ====================================================================================================================
     ; Name ..........: Latin1StringToBinary
     ; Description ...: Convert a string to binary data with a one-to-one
     ;                  character to byte representation. This is useful for displaying
     ;                  the binary matches of regular expressions.
     ; Syntax ........: Latin1StringToBinary($sText)
     ; Parameters ....: $sText - string.
     ; Return values .: Binary
     ; Remarks .......:
     ; Related .......:
     ; Link ..........:
     ; Example .......: No
     ; ===============================================================================================================================
     Func Latin1StringToBinary($sText)
         If Not IsString($sText) Then Return Null

         Local $tBuffer = DllStructCreate("byte[" & StringLen($sText) & "]")
         Local $dBinary
         For $i = 1 To StringLen($sText)
             Local $sChr = StringMid($sText, $i, 1)
             Local $iCode = AscW($sChr)
             DllStructSetData($tBuffer, 1, $iCode, $i)
         Next
         Return DllStructGetData($tBuffer, 1)
     EndFunc   ;==>Latin1StringToBinary

     ; The following shows an example of using BinaryToLatin1String to perform a regular
     ; expression search on Notepad.exe

     #include <FileConstants.au3>
     #include <StringConstants.au3>

     ; = 1a.= Get Data
     $BinaryData = FileRead( FileOpen( @SystemDir & "\notepad.exe" , $FO_Binary), 0x1000)
     ConsoleWrite('$BinaryData ' & @TAB & '= ' & $BinaryData & @CRLF)

     ; = 1b.= Convert
     ;~ $BinaryData = BinaryToString( $BinaryData )
     $BinaryData = BinaryToLatin1String( $BinaryData )

     ; = 2.= seek
     ;~ $pat = "(?s)\x4D\x5A.."
     $pat = "(?s)\x89\x84.."
     ConsoleWrite('$pat ' & @TAB & @TAB & '= ' & $pat & @CRLF)

     $match = StringRegExp($BinaryData, _
             $pat, _
             $STR_REGEXPARRAYFULLMATCH _
             )

     ; = 3.= Output
     $Pos = @extended
     If IsArray($match) Then
         $Pos -= StringLen( $match [0] ) ; seek to start of match
         $Pos -= 1 ; make it zero-based offset
         ConsoleWrite('$Pos ' & @TAB & @TAB &'= 0x' & hex( $Pos ) & @CRLF )
         ConsoleWrite('$match[0] ' & @TAB & '= ' & Latin1StringToBinary ( $match[0] ) & @CRLF )
     Else
         ConsoleWrite('No matches could be found.' & @CRLF)
     EndIf

     ;~ Expected OUTPUT:
     ;~ $BinaryData = 0x4D5A90000300000004000000FFFF00...
     ;~ $pat = (?s)\x89\x84..
     ;~ $Pos = 0x00000A36
     ;~ $match[0] = 0x89842440