Strange. I don't know what the problem is with your machine.

My results:

  • Microsoft Windows x64 [Version 10.0.18363.959]
  • AutoIt v3.3.14.5
D:\AutoIt\BinFind>BinFind test.bin "\x80.."
Filename: test.bin
Regex pattern: \x80..
Offset: 0x00000080  Length: 3  Bytes: 0x80 0x81 0x82    Char: [?ü?]

D:\AutoIt\BinFind>BinFind test.bin "\x81.."
Filename: test.bin
Regex pattern: \x81..
Offset: 0x00000081  Length: 3  Bytes: 0x81 0x82 0x83    Char: [ü??]

D:\AutoIt\BinFind>BinFind test.bin "[\x80-\x9F]"
Filename: test.bin
Regex pattern: [\x80-\x9F]
Offset: 0x00000080  Length: 1  Bytes: 0x80      Char: [?]
Offset: 0x00000081  Length: 1  Bytes: 0x81      Char: [ü]
Offset: 0x00000082  Length: 1  Bytes: 0x82      Char: [?]
Offset: 0x00000083  Length: 1  Bytes: 0x83      Char: [?]
Offset: 0x00000084  Length: 1  Bytes: 0x84      Char: [?]
Offset: 0x00000085  Length: 1  Bytes: 0x85      Char: [?]
Offset: 0x00000086  Length: 1  Bytes: 0x86      Char: [?]
Offset: 0x00000087  Length: 1  Bytes: 0x87      Char: [?]
Offset: 0x00000088  Length: 1  Bytes: 0x88      Char: [?]
Offset: 0x00000089  Length: 1  Bytes: 0x89      Char: [?]
Offset: 0x0000008A  Length: 1  Bytes: 0x8A      Char: [?]
Offset: 0x0000008B  Length: 1  Bytes: 0x8B      Char: [?]
Offset: 0x0000008C  Length: 1  Bytes: 0x8C      Char: [?]
Offset: 0x0000008D  Length: 1  Bytes: 0x8D      Char: [ì]
Offset: 0x0000008E  Length: 1  Bytes: 0x8E      Char: [?]
Offset: 0x0000008F  Length: 1  Bytes: 0x8F      Char: [Å]
Offset: 0x00000090  Length: 1  Bytes: 0x90      Char: [É]
Offset: 0x00000091  Length: 1  Bytes: 0x91      Char: [?]
Offset: 0x00000092  Length: 1  Bytes: 0x92      Char: [?]
Offset: 0x00000093  Length: 1  Bytes: 0x93      Char: [?]
Offset: 0x00000094  Length: 1  Bytes: 0x94      Char: [?]
Offset: 0x00000095  Length: 1  Bytes: 0x95      Char: [?]
Offset: 0x00000096  Length: 1  Bytes: 0x96      Char: [?]
Offset: 0x00000097  Length: 1  Bytes: 0x97      Char: [?]
Offset: 0x00000098  Length: 1  Bytes: 0x98      Char: [?]
Offset: 0x00000099  Length: 1  Bytes: 0x99      Char: [?]
Offset: 0x0000009A  Length: 1  Bytes: 0x9A      Char: [?]
Offset: 0x0000009B  Length: 1  Bytes: 0x9B      Char: [?]
Offset: 0x0000009C  Length: 1  Bytes: 0x9C      Char: [?]
Offset: 0x0000009D  Length: 1  Bytes: 0x9D      Char: [¥]
Offset: 0x0000009E  Length: 1  Bytes: 0x9E      Char: [?]
Offset: 0x0000009F  Length: 1  Bytes: 0x9F      Char: [?]
Press any key to continue . . .

I suspect your antivirus software is blocking access to binary files.

BTW, the latin1 string is in fact a UCS-2 wide-character string converted from ISO Latin1 to UCS-2. For example, byte 0x80 from the input binary data becomes the code unit 0x0080 (stored little-endian as the bytes 80 00) in the mirror UCS-2 string.

The \x{hhhh} regex token matches a 16-bit code unit (not a byte) whose value is the given hexadecimal number, e.g. \x{FFFF}. The trick here is that leading zeros can be dropped and values up to FF can be written in the short form \xhh, so \x80 matches the code unit (wchar) 0x0080.

If the ANSI code page (Windows-1252, for most users) were used for the conversion, the 0x80 byte would be converted to the Euro sign (code point 0x20AC), and the regex search for \x80 would then fail to find the expected 0x0080.

As the AutoIt function BinaryToString() uses the ANSI code page by default, it does not preserve the C1 control range (0x80 - 0x9F) when converting the input binary data.
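
To make the difference concrete, here is a minimal sketch in Python (not AutoIt; I use it only because its codecs make the two conversions easy to compare side by side). It assumes test.bin simply holds the byte values 0x00 to 0xFF in order, which is consistent with the offsets in the output above. The variable names are illustrative and not taken from BinFind.

```python
import re

# Assumption: stand-in for test.bin, the byte values 0x00..0xFF in order.
data = bytes(range(256))

# Verbatim "mirror" string: every byte becomes the code point of the same
# value, which is what decoding as ISO 8859-1 (Latin1) does.
mirror = data.decode("latin-1")

# In UTF-16LE the code unit 0x0080 is stored as the byte pair 80 00.
assert mirror[0x80].encode("utf-16-le") == b"\x80\x00"

# \x80 matches the code unit 0x0080, so the match offset equals the file offset.
mirror_match = re.search(r"\x80..", mirror, re.DOTALL)
mirror_offset = mirror_match.start()

# ANSI (Windows-1252) conversion: byte 0x80 becomes the Euro sign U+20AC.
# errors="replace" is needed because Python's cp1252 codec treats
# 0x81, 0x8D, 0x8F, 0x90 and 0x9D as undefined.
ansi = data.decode("cp1252", errors="replace")
ansi_char = ansi[0x80]

# The same pattern no longer finds anything in the ANSI-converted string.
ansi_match = re.search(r"\x80..", ansi, re.DOTALL)

print(hex(mirror_offset), hex(ord(ansi_char)), ansi_match)
# prints: 0x80 0x20ac None
```

The point of the sketch is only that a verbatim byte-to-code-unit mapping keeps regex offsets identical to file offsets, while a code page conversion does not.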

You could also double check with the attached script HexFind.au3 that I wrote to validate the results of BinFind.

The script uses a linear search algorithm, instead of regular expressions.

 

HexFind.au3

#Region ;**** Directives created by AutoIt3Wrapper_GUI ****
#AutoIt3Wrapper_Change2CUI=y
#AutoIt3Wrapper_Run_Tidy=y
#EndRegion ;**** Directives created by AutoIt3Wrapper_GUI ****

AutoItSetOption("MustDeclareVars", 1)

;~ A demonstration to show how to perform search over binary files from command line.
;~ https://www.autoitscript.com/forum/topic/188564-use-regexp-on-binary-data

;~ Examples:
;~ HexFind "C:\Windows\System32\notepad.exe" "0x4D5A"
;~ HexFind "C:\Windows\System32\notepad.exe" "0x8984"

#include <FileConstants.au3>
#include <StringConstants.au3>

If $CmdLine[0] <> 2 Then
    ConsoleWrite("Wrong command line arguments." & @CRLF & @CRLF & "Usage: HexFind <filename> <0xFFFF...>" & @CRLF)
    Exit
EndIf

Local Const $sFilePath = $CmdLine[1]
Local Const $dSequence = Binary($CmdLine[2])

If Not FileExists($sFilePath) Then
    ConsoleWrite("File not found: " & $sFilePath & @CRLF)
    Exit
EndIf

ConsoleWrite("Filename: " & $sFilePath & @CRLF)
ConsoleWrite("Hexadecimal sequence: " & String($dSequence) & @CRLF)

; Get the binary data
Local $hFileOpen = FileOpen($sFilePath, $FO_READ + $FO_BINARY)
If $hFileOpen = -1 Then
    ConsoleWrite("An error occurred when reading the file." & @CRLF)
    Exit
EndIf
Local $BinaryData = FileRead($hFileOpen)
FileClose($hFileOpen)

; Perform a linear search over the binary data.
Local $iOffset = 1, _
        $iMatches = 0
While 1
    $iOffset = _HexFind($BinaryData, $dSequence, $iOffset)
    If @error Then ExitLoop

    $iMatches += 1
    ConsoleWrite("Offset: 0x" & Hex($iOffset - 1) & "  ")   ; convert to zero-based file offset
    ConsoleWrite("Length: " & BinaryLen($dSequence) & "  ")
    ConsoleWrite("Bytes: ")
    For $j = 1 To BinaryLen($dSequence)
        Local $iByte = BinaryMid($dSequence, $j, 1)
        ConsoleWrite("0x" & Hex($iByte, 2) & " ")
    Next
    ConsoleWrite(@CRLF)
    $iOffset += BinaryLen($dSequence)   ; seek to end of match
WEnd

If $iMatches = 0 Then
    ConsoleWrite("No matches could be found." & @CRLF)
EndIf

; #FUNCTION# ====================================================================================================================
; Name ..........: _HexFind
; Description ...: Search for a byte sequence in a binary data and return the position.
; Syntax ........: _HexFind($dBinaryData, $dSequence[, $iStart = 1])
; Parameters ....: $dBinaryData         - The binary data to search.
;                  $dSequence           - The byte sequence to search for.
;                  $iStart              - [optional] The starting position of the search. Default is 1.
; Return values .: Success:               The position of the byte sequence.
;                  Failure:               0 and sets the @error flag to non-zero.
; Remarks .......: The first binary position is 1.
; Related .......:
; Link ..........:
; Example .......: No
; ===============================================================================================================================
Func _HexFind($dBinaryData, $dSequence, $iStart = 1)
    Local $iBinaryLength = BinaryLen($dBinaryData), _
            $iSeqLength = BinaryLen($dSequence)

    If $iBinaryLength = 0 Or _
            $iSeqLength = 0 Or _
            $iStart < 1 Or _
            $iStart > $iBinaryLength - $iSeqLength + 1 Then

        Return SetError(2, @extended, 0)
    EndIf

    For $iPosition = $iStart To ($iBinaryLength - $iSeqLength + 1)
        For $i = 1 To $iSeqLength
            Local $iTemp1 = BinaryMid($dBinaryData, $iPosition + $i - 1, 1)
            Local $iTemp2 = BinaryMid($dSequence, $i, 1)
            If $iTemp1 <> $iTemp2 Then
                ContinueLoop 2
            EndIf
        Next
        Return SetError(0, @extended, $iPosition)
    Next

    Return SetError(1, @extended, 0)
EndFunc   ;==>_HexFind

Expected output:

D:\AutoIt>HexFind "C:\Windows\System32\notepad.exe" "0x4D5A"
Filename: C:\Windows\System32\notepad.exe
Hexadecimal sequence: 0x4D5A
Offset: 0x00000000  Length: 2  Bytes: 0x4D 0x5A
Offset: 0x00012279  Length: 2  Bytes: 0x4D 0x5A
Offset: 0x000156D0  Length: 2  Bytes: 0x4D 0x5A
Offset: 0x00015D27  Length: 2  Bytes: 0x4D 0x5A
Offset: 0x00019555  Length: 2  Bytes: 0x4D 0x5A
Offset: 0x00023474  Length: 2  Bytes: 0x4D 0x5A
Offset: 0x00023C62  Length: 2  Bytes: 0x4D 0x5A

D:\AutoIt>HexFind "C:\Windows\System32\notepad.exe" "0x8984"
Filename: C:\Windows\System32\notepad.exe
Hexadecimal sequence: 0x8984
Offset: 0x000004A9  Length: 2  Bytes: 0x89 0x84
Offset: 0x00000D92  Length: 2  Bytes: 0x89 0x84
Offset: 0x000010AA  Length: 2  Bytes: 0x89 0x84
Offset: 0x0000170F  Length: 2  Bytes: 0x89 0x84
Offset: 0x00001BA0  Length: 2  Bytes: 0x89 0x84
Offset: 0x00005806  Length: 2  Bytes: 0x89 0x84
Offset: 0x000077E4  Length: 2  Bytes: 0x89 0x84
Offset: 0x0000AED0  Length: 2  Bytes: 0x89 0x84
Offset: 0x0000B6F0  Length: 2  Bytes: 0x89 0x84
Offset: 0x0000B7B4  Length: 2  Bytes: 0x89 0x84
Offset: 0x0000E54E  Length: 2  Bytes: 0x89 0x84
Offset: 0x0000E5D2  Length: 2  Bytes: 0x89 0x84
Offset: 0x0000E5E9  Length: 2  Bytes: 0x89 0x84
Offset: 0x0000E6C1  Length: 2  Bytes: 0x89 0x84
Offset: 0x0000E6ED  Length: 2  Bytes: 0x89 0x84
Offset: 0x0000E7F4  Length: 2  Bytes: 0x89 0x84
Offset: 0x0000E896  Length: 2  Bytes: 0x89 0x84
Offset: 0x0000EB15  Length: 2  Bytes: 0x89 0x84
Offset: 0x0000EBDB  Length: 2  Bytes: 0x89 0x84
Offset: 0x0000F1F4  Length: 2  Bytes: 0x89 0x84
Offset: 0x0001DB88  Length: 2  Bytes: 0x89 0x84

Note: Results may vary depending on your Windows version.

 

HexFind.au3 test.cmd

Edited by AmrAli
Upload .au3 file
6 hours ago, AmrAli said:

Strange. I don't know what the problem is with your machine.

The problem was the clock here! It was too late for me to think clearly, sorry.

6 hours ago, AmrAli said:

I suspect your antivirus software is blocking access to binary files.

😁 No thank you, forget that.

6 hours ago, AmrAli said:

BTW, the latin1 string is in fact a UCS-2 wide-character string converted from ISO Latin1 to UCS-2. For example, byte 0x80 from the input binary data becomes the code unit 0x0080 (stored little-endian as the bytes 80 00) in the mirror UCS-2 string.

If you remove references to Latin1, your statement is correct. It's precisely because we DON'T convert to Latin1 that the conversion is verbatim. I correct my previous (too-late-to-be true) post! Your uses of AscW and ChrW were correct, my mistake.

7 hours ago, AmrAli said:

The script uses a linear search algorithm, instead of regular expressions.

Yes, but regexes are all linear as well, albeit done in optimized low-level compiled C, faster than interpreted AutoIt.
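
As a rough illustration of the two approaches giving the same answer for a literal byte sequence, here is a small Python sketch (again Python rather than AutoIt, and not the code of HexFind or BinFind). The function names and the sample bytes are made up for the example. Both searches report non-overlapping matches, like the HexFind loop that seeks to the end of each match.

```python
import re

def find_linear(data, seq):
    """Non-overlapping offsets of seq in data, found by a plain forward scan."""
    offsets = []
    if not seq:
        return offsets
    pos = data.find(seq)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(seq, pos + len(seq))  # seek to end of match
    return offsets

def find_regex(data, seq):
    """The same search done by the regex engine on raw bytes."""
    if not seq:
        return []
    # re.escape turns the byte sequence into a literal pattern.
    return [m.start() for m in re.finditer(re.escape(seq), data)]

# Hypothetical sample data: 4D 5A 00 89 84 89 84 FF 4D 5A
sample = bytes.fromhex("4D5A 0089 8489 84FF 4D5A")

print(find_linear(sample, b"\x89\x84"), find_regex(sample, b"\x89\x84"))
# prints: [3, 5] [3, 5]
```

Note that Python can run a regex directly on bytes, so no mirror string is needed there; in AutoIt the regex functions work on strings, which is why the verbatim conversion matters.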

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

23 minutes ago, jchd said:

The problem was the clock here! It was too late for me to think clearly, sorry.

😁 No thank you, forget that.

If you remove references to Latin1, your statement is correct. It's precisely because we DON'T convert to Latin1 that the conversion is verbatim. I correct my previous (too-late-to-be true) post! Your uses of AscW and ChrW were correct, my mistake.

Yes, but regexes are all linear as well, albeit done in optimized low-level compiled C, faster than interpreted AutoIt.

I appreciate your help.

See you,


Again, sorry for my mistake; it was past 3:30.


I was playing with the 'test.bin' file you posted earlier to check different encodings.

I used PowerShell to convert the file to UTF-16, because I first tried Notepad++ and it was buggy for this.

function EncodeToUtf16($InFile, $Charset, $OutFile)  {
    $Encoding = [System.Text.Encoding]::GetEncoding($Charset)
    $BinaryText = [System.IO.File]::ReadAllText($InFile, $Encoding)
    $Utf16LE = New-Object System.Text.UnicodeEncoding -ArgumentList $False, $False
    [System.IO.File]::WriteAllText($OutFile, $BinaryText, $Utf16LE)
}

EncodeToUtf16  -InFile "$pwd\test.bin"  -Charset "Windows-1252"  -OutFile "$pwd\ansi.bin"
EncodeToUtf16  -InFile "$pwd\test.bin"  -Charset "iso-8859-1"    -OutFile "$pwd\iso_8859-1.bin"

I then examined the hex differences between the two output files in Beyond Compare.

This confirms that the first 256 Unicode code points coincide with the codes of the ISO 8859-1 (Latin1) charset, while Windows-1252 differs from it in the 0x80 - 0x9F range.

Capture.PNG
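
For anyone who wants to repeat the comparison without PowerShell and a hex diff tool, here is a rough Python sketch of the same idea. It is an approximation, not a port of the function above: Python's cp1252 codec rejects the five bytes that Windows-1252 leaves unassigned, so they are collected separately here, whereas other converters may handle them differently. The list names are illustrative.

```python
# Compare the UTF-16LE form of every byte value under the two source charsets.
remapped = []    # bytes that Windows-1252 maps to a different code point
unassigned = []  # bytes that Python's cp1252 codec refuses to decode

for value in range(0x100):
    raw = bytes([value])
    latin1 = raw.decode("iso-8859-1")
    try:
        ansi = raw.decode("cp1252")
    except UnicodeDecodeError:
        unassigned.append(value)
        continue
    if ansi.encode("utf-16-le") != latin1.encode("utf-16-le"):
        remapped.append(value)

print("remapped:  ", [hex(v) for v in remapped])
print("unassigned:", [hex(v) for v in unassigned])
```

Every value printed falls inside 0x80 - 0x9F; outside that range the two charsets produce identical UTF-16 output.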

Wikipedia links:

https://en.wikipedia.org/wiki/Windows-1252

https://en.wikipedia.org/wiki/ISO/IEC_8859-1

Edited by AmrAli