AmrAli Posted July 29, 2020 (edited)

Strange. I don't know what is the problem with your machine. My results:

Microsoft Windows x64 [Version 10.0.18363.959]
AutoIt v3.3.14.5

D:\AutoIt\BinFind>BinFind test.bin "\x80.."
Filename: test.bin
Regex pattern: \x80..
Offset: 0x00000080 Length: 3 Bytes: 0x80 0x81 0x82 Char: [?ü?]

D:\AutoIt\BinFind>BinFind test.bin "\x81.."
Filename: test.bin
Regex pattern: \x81..
Offset: 0x00000081 Length: 3 Bytes: 0x81 0x82 0x83 Char: [ü??]

D:\AutoIt\BinFind>BinFind test.bin "[\x80-\x9F]"
Filename: test.bin
Regex pattern: [\x80-\x9F]
Offset: 0x00000080 Length: 1 Bytes: 0x80 Char: [?]
Offset: 0x00000081 Length: 1 Bytes: 0x81 Char: [ü]
Offset: 0x00000082 Length: 1 Bytes: 0x82 Char: [?]
Offset: 0x00000083 Length: 1 Bytes: 0x83 Char: [?]
Offset: 0x00000084 Length: 1 Bytes: 0x84 Char: [?]
Offset: 0x00000085 Length: 1 Bytes: 0x85 Char: [?]
Offset: 0x00000086 Length: 1 Bytes: 0x86 Char: [?]
Offset: 0x00000087 Length: 1 Bytes: 0x87 Char: [?]
Offset: 0x00000088 Length: 1 Bytes: 0x88 Char: [?]
Offset: 0x00000089 Length: 1 Bytes: 0x89 Char: [?]
Offset: 0x0000008A Length: 1 Bytes: 0x8A Char: [?]
Offset: 0x0000008B Length: 1 Bytes: 0x8B Char: [?]
Offset: 0x0000008C Length: 1 Bytes: 0x8C Char: [?]
Offset: 0x0000008D Length: 1 Bytes: 0x8D Char: [ì]
Offset: 0x0000008E Length: 1 Bytes: 0x8E Char: [?]
Offset: 0x0000008F Length: 1 Bytes: 0x8F Char: [Å]
Offset: 0x00000090 Length: 1 Bytes: 0x90 Char: [É]
Offset: 0x00000091 Length: 1 Bytes: 0x91 Char: [?]
Offset: 0x00000092 Length: 1 Bytes: 0x92 Char: [?]
Offset: 0x00000093 Length: 1 Bytes: 0x93 Char: [?]
Offset: 0x00000094 Length: 1 Bytes: 0x94 Char: [?]
Offset: 0x00000095 Length: 1 Bytes: 0x95 Char: [?]
Offset: 0x00000096 Length: 1 Bytes: 0x96 Char: [?]
Offset: 0x00000097 Length: 1 Bytes: 0x97 Char: [?]
Offset: 0x00000098 Length: 1 Bytes: 0x98 Char: [?]
Offset: 0x00000099 Length: 1 Bytes: 0x99 Char: [?]
Offset: 0x0000009A Length: 1 Bytes: 0x9A Char: [?]
Offset: 0x0000009B Length: 1 Bytes: 0x9B Char: [?]
Offset: 0x0000009C Length: 1 Bytes: 0x9C Char: [?]
Offset: 0x0000009D Length: 1 Bytes: 0x9D Char: [¥]
Offset: 0x0000009E Length: 1 Bytes: 0x9E Char: [?]
Offset: 0x0000009F Length: 1 Bytes: 0x9F Char: [?]
Press any key to continue . . .

I suspect you have the antivirus software blocking access to binary files.

BTW, the latin1 string is in fact a ucs-2 wide character string encoded from iso-latin1 to ucs-2. For example byte 0x80 from the input binary data is encoded as 0x0080 (little-Endian) in the mirror ucs-2 string.

The \x{FFFF} regex token searches 16-bit code units (not bytes) for the hexadecimal code point FFFF. The trick here is that for values up to 0xFF you can use the short two-digit form \xFF instead, thus \x80 matches at the code unit (wchar) 0x0080.

If the ANSI code page (Windows-1252, for most users) was used for the conversion, the 0x80 byte would be encoded as the Euro sign (code point: 0x20AC), and the regex search for \x80 would then fail to find the expected 0x0080. As the AutoIt function BinaryToString() uses the ANSI code page by default, it destroys the C1 control characters (0x80 - 0x9F) during conversion of the input binary data.

You could also double check with the attached script HexFind.au3 that I wrote to validate the results of BinFind. The script uses a linear search algorithm, instead of regular expressions.

HexFind.au3

#Region ;**** Directives created by AutoIt3Wrapper_GUI ****
#AutoIt3Wrapper_Change2CUI=y
#AutoIt3Wrapper_Run_Tidy=y
#EndRegion ;**** Directives created by AutoIt3Wrapper_GUI ****

AutoItSetOption("MustDeclareVars", 1)

;~ A demonstration to show how to perform search over binary files from command line.
;~ https://www.autoitscript.com/forum/topic/188564-use-regexp-on-binary-data
;~ Examples:
;~ HexFind "C:\Windows\System32\notepad.exe" "0x4D5A"
;~ HexFind "C:\Windows\System32\notepad.exe" "0x8984"

#include <FileConstants.au3>
#include <StringConstants.au3>

; Expect exactly two arguments: the file name and the hexadecimal sequence.
If $CmdLine[0] <> 2 Then
    ConsoleWrite("Wrong command line arguments." & @CRLF & @CRLF & "Usage: HexFind <filename> <0xFFFF...>" & @CRLF)
    Exit
EndIf

Local Const $sFilePath = $CmdLine[1]
Local Const $dSequence = Binary($CmdLine[2])

If Not FileExists($sFilePath) Then
    ConsoleWrite("File not found: " & $sFilePath & @CRLF)
    Exit
EndIf

ConsoleWrite("Filename: " & $sFilePath & @CRLF)
ConsoleWrite("Hexadecimal sequence: " & String($dSequence) & @CRLF)

; Get the binary data
Local $hFileOpen = FileOpen($sFilePath, $FO_READ + $FO_BINARY)
If $hFileOpen = -1 Then
    ConsoleWrite("An error occurred when reading the file." & @CRLF)
    Exit
EndIf
Local $BinaryData = FileRead($hFileOpen)
FileClose($hFileOpen)

; Perform a linear search over the binary data.
Local $iOffset = 1, _
        $iMatches = 0
While 1
    $iOffset = _HexFind($BinaryData, $dSequence, $iOffset)
    If @error Then ExitLoop ; no further match, or start position past the end
    $iMatches += 1
    ConsoleWrite("Offset: 0x" & Hex($iOffset - 1) & " ") ; convert to zero-based file offset
    ConsoleWrite("Length: " & BinaryLen($dSequence) & " ")
    ConsoleWrite("Bytes: ")
    For $j = 1 To BinaryLen($dSequence)
        Local $iByte = BinaryMid($dSequence, $j, 1)
        ConsoleWrite("0x" & Hex($iByte, 2) & " ")
    Next
    ConsoleWrite(@CRLF)
    $iOffset += BinaryLen($dSequence) ; seek to end of match
WEnd

If $iMatches = 0 Then
    ConsoleWrite("No matches could be found." & @CRLF)
EndIf

; #FUNCTION# ====================================================================================================================
; Name ..........: _HexFind
; Description ...: Search for a byte sequence in binary data and return its position.
; Syntax ........: _HexFind($dBinaryData, $dSequence[, $iStart = 1])
; Parameters ....: $dBinaryData - The binary data to search.
;                  $dSequence   - The byte sequence to search for.
;                  $iStart      - [optional] The starting position of the search. Default is 1.
; Return values .: Success: The position of the byte sequence.
;                  Failure: 0 and sets the @error flag to non-zero
;                           (1 = sequence not found, 2 = invalid arguments).
; Remarks .......: The first binary position is 1.
; Related .......:
; Link ..........:
; Example .......: No
; ===============================================================================================================================
Func _HexFind($dBinaryData, $dSequence, $iStart = 1)
    Local $iBinaryLength = BinaryLen($dBinaryData), _
            $iSeqLength = BinaryLen($dSequence)

    ; Reject empty input and start positions that leave no room for the sequence.
    If $iBinaryLength = 0 Or _
            $iSeqLength = 0 Or _
            $iStart < 1 Or _
            $iStart > $iBinaryLength - $iSeqLength + 1 Then
        Return SetError(2, @extended, 0)
    EndIf

    For $iPosition = $iStart To ($iBinaryLength - $iSeqLength + 1)
        ; Compare byte by byte; on the first mismatch move on to the next position.
        For $i = 1 To $iSeqLength
            Local $iTemp1 = BinaryMid($dBinaryData, $iPosition + $i - 1, 1)
            Local $iTemp2 = BinaryMid($dSequence, $i, 1)
            If $iTemp1 <> $iTemp2 Then
                ContinueLoop 2
            EndIf
        Next
        Return SetError(0, @extended, $iPosition)
    Next

    Return SetError(1, @extended, 0)
EndFunc   ;==>_HexFind

Expected output:

D:\AutoIt>HexFind "C:\Windows\System32\notepad.exe" "0x4D5A"
Filename: C:\Windows\System32\notepad.exe
Hexadecimal sequence: 0x4D5A
Offset: 0x00000000 Length: 2 Bytes: 0x4D 0x5A
Offset: 0x00012279 Length: 2 Bytes: 0x4D 0x5A
Offset: 0x000156D0 Length: 2 Bytes: 0x4D 0x5A
Offset: 0x00015D27 Length: 2 Bytes: 0x4D 0x5A
Offset: 0x00019555 Length: 2 Bytes: 0x4D 0x5A
Offset: 0x00023474 Length: 2 Bytes: 0x4D 0x5A
Offset: 0x00023C62 Length: 2 Bytes: 0x4D 0x5A

D:\AutoIt>HexFind "C:\Windows\System32\notepad.exe" "0x8984"
Filename: C:\Windows\System32\notepad.exe
Hexadecimal sequence: 0x8984
Offset: 0x000004A9 Length: 2 Bytes: 0x89 0x84
Offset: 0x00000D92 Length: 2 Bytes: 0x89 0x84
Offset: 0x000010AA Length: 2 Bytes: 0x89 0x84
Offset: 0x0000170F Length: 2 Bytes: 0x89 0x84
Offset: 0x00001BA0 Length: 2 Bytes: 0x89 0x84
Offset: 0x00005806 Length: 2 Bytes: 0x89 0x84
Offset: 0x000077E4 Length: 2 Bytes: 0x89 0x84
Offset: 0x0000AED0 Length: 2 Bytes: 0x89 0x84
Offset: 0x0000B6F0 Length: 2 Bytes: 0x89 0x84
Offset: 0x0000B7B4 Length: 2 Bytes: 0x89 0x84
Offset: 0x0000E54E Length: 2 Bytes: 0x89 0x84
Offset: 0x0000E5D2 Length: 2 Bytes: 0x89 0x84
Offset: 0x0000E5E9 Length: 2 Bytes: 0x89 0x84
Offset: 0x0000E6C1 Length: 2 Bytes: 0x89 0x84
Offset: 0x0000E6ED Length: 2 Bytes: 0x89 0x84
Offset: 0x0000E7F4 Length: 2 Bytes: 0x89 0x84
Offset: 0x0000E896 Length: 2 Bytes: 0x89 0x84
Offset: 0x0000EB15 Length: 2 Bytes: 0x89 0x84
Offset: 0x0000EBDB Length: 2 Bytes: 0x89 0x84
Offset: 0x0000F1F4 Length: 2 Bytes: 0x89 0x84
Offset: 0x0001DB88 Length: 2 Bytes: 0x89 0x84

Note: Results may vary depending on your Windows version.

Attached: HexFind.au3, test.cmd

Edited July 31, 2020 by AmrAli (Upload .au3 file)
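The code-page remark in the post above (byte 0x80 ending up as code unit 0x0080 or as 0x20AC) can be checked directly. The following is a minimal AutoIt sketch, not taken from BinFind or HexFind: the helper _BytesToWideVerbatim is a hypothetical name introduced here for illustration, and the value printed for the ANSI case depends on the system's ANSI code page, so no output is quoted.

```autoit
#include <StringConstants.au3>

AutoItSetOption("MustDeclareVars", 1)

Local $dSample = Binary("0x80")

; Conversion through the ANSI code page (the result is system dependent).
Local $sAnsi = BinaryToString($dSample, $SB_ANSI)

; Verbatim widening: each byte becomes one code unit with the same value.
Local $sVerbatim = _BytesToWideVerbatim($dSample)

ConsoleWrite("ANSI code unit:     0x" & Hex(AscW($sAnsi), 4) & @CRLF)
ConsoleWrite("Verbatim code unit: 0x" & Hex(AscW($sVerbatim), 4) & @CRLF)

; StringRegExp() in its default mode returns 1 on a match and 0 otherwise.
ConsoleWrite("\x80 matches ANSI string:     " & StringRegExp($sAnsi, "\x80") & @CRLF)
ConsoleWrite("\x80 matches verbatim string: " & StringRegExp($sVerbatim, "\x80") & @CRLF)

; Hypothetical helper (an assumption for this sketch): widen every byte of
; $dData to one 16-bit code unit without any code-page translation.
Func _BytesToWideVerbatim($dData)
    Local $sOut = ""
    For $i = 1 To BinaryLen($dData)
        ; Hex() renders the single byte as two hex digits, Dec() turns them into 0..255.
        $sOut &= ChrW(Dec(Hex(BinaryMid($dData, $i, 1))))
    Next
    Return $sOut
EndFunc   ;==>_BytesToWideVerbatim
```

On a machine whose ANSI code page is Windows-1252, the post's explanation suggests the first line should report 0x20AC and the second 0x0080, with only the verbatim string matching \x80.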
jchd Posted July 29, 2020

6 hours ago, AmrAli said:
    Strange. I don't know what is the problem with your machine.

The problem was the clock here! It was too late for me to think clearly, sorry.

6 hours ago, AmrAli said:
    I suspect you have the antivirus software blocking access to binary files.

😁 No thank you, forget that.

6 hours ago, AmrAli said:
    BTW, the latin1 string is in fact a ucs-2 wide character string encoded from iso-latin1 to ucs-2. For example byte 0x80 from the input binary data is encoded as 0x0080 (little-Endian) in the mirror ucs-2 string.

If you remove references to Latin1, your statement is correct. It's precisely because we DON'T convert to Latin1 that the conversion is verbatim. I correct my previous (too-late-to-be true) post! Your uses of AscW and ChrW were correct, my mistake.

7 hours ago, AmrAli said:
    The script uses a linear search algorithm, instead of regular expressions.

Yes, but regexes are all linear as well, albeit done in optimized low-level compiled C, faster than interpreted AutoIt.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.
SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)
AmrAli Posted July 29, 2020

23 minutes ago, jchd said:
    The problem was the clock here! It was too late for me to think clearly, sorry.
    😁 No thank you, forget that.
    If you remove references to Latin1, your statement is correct. It's precisely because we DON'T convert to Latin1 that the conversion is verbatim. I correct my previous (too-late-to-be true) post! Your uses of AscW and ChrW were correct, my mistake.
    Yes, but regexes are all linear as well, albeit done in optimized low-level compiled C, faster than interpreted AutoIt.

I appreciate your help. See you.
jchd Posted July 29, 2020

Again sorry for my mistake, it was > 3:30
AmrAli Posted July 29, 2020 (edited)

I was playing with the 'test.bin' file you posted before to check different encodings. I used PowerShell to encode the file to UTF-16, because Notepad++ turned out to be buggy when I tried it.

function EncodeToUtf16($InFile, $Charset, $OutFile)
{
    # Decode the input bytes with the requested charset...
    $Encoding = [System.Text.Encoding]::GetEncoding($Charset)
    $BinaryText = [System.IO.File]::ReadAllText($InFile, $Encoding)
    # ...and write them back as UTF-16 little-endian without a BOM.
    $Utf16LE = New-Object System.Text.UnicodeEncoding -ArgumentList $False, $False
    [System.IO.File]::WriteAllText($OutFile, $BinaryText, $Utf16LE)
}

EncodeToUtf16 -InFile "$pwd\test.bin" -Charset "Windows-1252" -OutFile "$pwd\ansi.bin"
EncodeToUtf16 -InFile "$pwd\test.bin" -Charset "iso-8859-1" -OutFile "$pwd\iso_8859-1.bin"

I then examined the hex differences between the two output files in Beyond Compare. This definitely shows that the first 256 Unicode code points are identical to the character codes of the ISO 8859-1 (Latin1) charset, whereas the Windows-1252 conversion does not preserve the byte values.

Wikipedia links:
https://en.wikipedia.org/wiki/Windows-1252
https://en.wikipedia.org/wiki/ISO/IEC_8859-1

Edited July 29, 2020 by AmrAli
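A similar comparison can be sketched in AutoIt alone, without the intermediate files. This is only an illustration under stated assumptions: it relies on the point made above that an ISO 8859-1 byte maps to the Unicode code point with the same value, and it compares that against whatever the system's ANSI code page yields through BinaryToString(). Which bytes are listed, and how many, depends on that code page, so no output is quoted.

```autoit
#include <StringConstants.au3>

AutoItSetOption("MustDeclareVars", 1)

Local $iDiffs = 0

; Start at 0x01 so that the sketch does not depend on how a NUL character
; behaves inside an AutoIt string.
For $iByte = 0x01 To 0xFF
    ; Build a one-byte binary value, e.g. Binary("0x80").
    Local $dByte = Binary("0x" & Hex($iByte, 2))

    ; Code unit produced when the byte goes through the ANSI code page.
    Local $iAnsi = AscW(BinaryToString($dByte, $SB_ANSI))

    ; Under ISO 8859-1 the code point equals the byte value, so any
    ; difference here comes from the ANSI code page.
    If $iAnsi <> $iByte Then
        $iDiffs += 1
        ConsoleWrite("Byte 0x" & Hex($iByte, 2) & " -> ANSI U+" & Hex($iAnsi, 4) & ", ISO 8859-1 U+" & Hex($iByte, 4) & @CRLF)
    EndIf
Next

ConsoleWrite("Bytes whose ANSI mapping differs from ISO 8859-1: " & $iDiffs & @CRLF)
```

If the ANSI code page is Windows-1252, the differing bytes would be expected to fall inside the 0x80 - 0x9F range discussed earlier in the thread.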