regex to prepend '0x' to string chunks



In this post I used a regex pattern found here (https://www.techiedelight.com/split-string-into-chunks-csharp/); it breaks a string into pieces of the desired size.
Can that pattern be modified so that it also adds the prefix "0x" to each of the pieces?
Thanks.

; https://www.techiedelight.com/split-string-into-chunks-csharp/
#include <array.au3>
Local $sString = "123456789"
Local $sPattern ="\w{3}"
Local $aArray = StringRegExp($sString, $sPattern, 3)
_ArrayDisplay($aArray)

 

small minds discuss people average minds discuss events great minds discuss ideas.... and use AutoIt....

#include <array.au3>
$str = '123456789'
$arr = StringRegExp($str, '\w{3}', 3)
for $i = 0 to UBound($arr)-1
    $arr[$i] = '0x' & $arr[$i]
next
_ArrayDisplay($arr)
$tmp = StringRegExpReplace($str, '(\w{3})', '0x$0 ')
ConsoleWrite($tmp & @CRLF)

Oops, @Subz was a bit faster.

Anyway, it may be possible with PCRE backtracking control verbs (such as (*ACCEPT) or (*COMMIT)), but I'm not sure.

Edited by 636C65616E

Well, at first I didn't post this idea because I thought it would be slower than a naive loop, but:

#include <String.au3>

func println($msg = '')
    ConsoleWrite($msg & @CRLF)
endfunc

$nb  = 1000
$str = _StringRepeat('123',500)
global $arr = [ [0,0] , [0,0] ]

for $i = 1 to $nb

    $time = TimerInit()
    $res = StringRegExp(StringRegExpReplace($str, '\w{3}', '0x$0'), '\w{5}', 3)
    $time = TimerDiff($time)
    $arr[0][0] += $time
    $arr[0][1] += $time * $time

    $time = TimerInit()
    $res = StringRegExp($str, '\w{3}', 3)
    for $j = 0 to UBound($res)-1
        $res[$j] = '0x' & $res[$j]
    next
    $time = TimerDiff($time)
    $arr[1][0] += $time
    $arr[1][1] += $time * $time

next

$arr[0][0] /= $nb
$arr[0][1] = Sqrt( $arr[0][1] / $nb - $arr[0][0] * $arr[0][0] )
$arr[1][0] /= $nb
$arr[1][1] = Sqrt( $arr[1][1] / $nb - $arr[1][0] * $arr[1][0] )

println('                 mean     stdev')
println(StringFormat('Double RegExp :  %.4f | %.4f', $arr[0][0], $arr[0][1]))
println(StringFormat('For Loop      :  %.4f | %.4f', $arr[1][0], $arr[1][1]))

The double regexp is slightly faster.

Double RegExp :  0.6755 | 0.1718
For Loop      :  0.8684 | 0.2984
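
For reference, here is what the benchmark computes for each method, restated as formulas. This is only a reading of the code above, with $t_i$ the i-th timing in milliseconds and $n$ the loop count `$nb`; the script accumulates the sum of timings and the sum of squared timings, then derives the mean and the (population) standard deviation:

```latex
\bar{t} = \frac{1}{n}\sum_{i=1}^{n} t_i ,
\qquad
s = \sqrt{\frac{1}{n}\sum_{i=1}^{n} t_i^{2} \;-\; \bar{t}^{\,2}}
```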

It looks quite straightforward, but:

Two-sample asymptotic difference of means test

data:  d$V1 and d$V2
statistic = -39.352, p-value < 2.2e-16
alternative hypothesis: true difference of means is not equal to 0
95 percent confidence interval:
 -0.2066905 -0.1870783
sample estimates:
difference of means 
         -0.1968844
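
A quick sanity check on that output, assuming the test is the usual large-sample z-test for two means (the thread does not say how the statistic was computed, so this is an assumption). The standard error can be recovered from the width of the 95% confidence interval, and dividing the difference of means by it reproduces the reported statistic:

```latex
z = \frac{\bar{t}_1 - \bar{t}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}},
\qquad
\mathrm{SE} \approx \frac{0.2066905 - 0.1870783}{2 \times 1.96} \approx 0.005003,
\qquad
z \approx \frac{-0.1968844}{0.005003} \approx -39.35
```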

 

Edited by 636C65616E
15 minutes ago, 636C65616E said:

Well, at first I didn't post this idea because I thought it would be slower than a naive loop, but:

A couple of thoughts here.

1) For solution evaluation, I prefer to play @NineBall, which means I use an informal series of Nine’s dicta, asides and snarks from past topics as my guide. In this case, since

  a) performance is not mentioned as a requirement

  b) no comprehensive data set is available

  the controlling rule would be “One-liners win”.

2) A native loop might be fast, but AutoIt's interpreted For loop is not such an animal.

3) The PCRE engine, on the other hand, is, AFAIK, written in native compiled code.

4) The data in the example is trivial, though. I would expect an even greater performance disparity with longer strings.

Code hard, but don’t hard code...


Haha, yeah, I agree with you, but a trivial answer that is 10 times slower than a not much more complex solution is not an answer for me.

I completely agree with all your points, except the 'One-Liner' rule (well, we're not playing Pythonesque games here). Buuuuutt because of how AutoIt interprets code and how the code might be 'compiled', short code could be something you're aiming for.

Anyway, I guess we have an answer (I never intended to do any kind of 'competition' ^^)

EDIT:

15 minutes ago, JockoDundee said:

4) The data in the example is trivial, though. I would expect an even greater performance disparity with longer strings.

My code uses a string of 500 * 3 characters, and the asymptotic test is computed with a sample size of 4000.

Edited by 636C65616E

Thank you all.
Sorry, I didn't mention in the original post that the goal was not to post-process the array, but to get the result in one go from the regexp.
However, two nested regexes are also interesting... :)

The purpose of all this is to get an array with one hex byte per element, extracted from a contiguous byte structure (as used in the link I posted in the first post).
In short, the DllStructGetData() command returns a sequence of bytes in this form:

0x03f477ffa50245 ......

Therefore the regex you kindly provided should be applied to the sequence of bytes excluding the first two characters, that is, the leading 0x.
So the final statement could be this:

 $aKeyboardState = StringRegExp(StringRegExpReplace (StringTrimLeft (DllStructGetData (_WinAPI_GetKeyboardState (), 1), 2), "\w{2}", "0x$0"), "\w{4}", 3)

I used StringTrimLeft() to remove the first two characters (0x),
but at this point another question arises:
is there any regex "clause" to apply the regex starting from the third character onwards? That way I could drop the StringTrimLeft() part.

Thanks everyone for the interesting approaches


Chimp, after looking through the GetKeyboardState function documentation, I'm not so sure you're getting the correct data out of it, or maybe it's not the data you think you're getting. It really seems like you should be stepping through the struct byte by byte, and it looks like each value should be either 1, 0 or 128, depending on the type of key and its state. I'm guessing that the index+1 corresponds to the virtual key code, 0-254 being all the keys on an ASCII keyboard.

Edit: Scratch that, I saw what you posted in a different thread. I'm not sure why you are interested in adding "0x" to everything. A byte is just a number between 0 and 255. Whether it looks like 0000 0001, 0x01 or 1 really shouldn't make a bit of difference. That is for an unsigned byte, anyway; a signed byte would be -128 to 127. Not 100% sure on that, though it's highly doubtful that these are signed.

Edited by markyrocks
8 hours ago, Chimp said:

is there any regex "clause" to apply the regex starting from the third character onwards?

Of course there is. Don't forget the 4th parameter of the SRE function :)

#include <array.au3>
$str = "0x03f477ffa50245"
Local $aArray = StringRegExp(StringRegExpReplace ($str, "\w{2}", "0x$0"), "\w{4}", 3, 5)
_ArrayDisplay($aArray)

Edit
It could (should ?) also be done using the correct syntax :huh2:

Local $aArray = StringRegExp(StringRegExpReplace ($str, "[[:xdigit:]]{2}", "0x$0"), ".{4}", 3, 3)
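
To make the two offsets (5 and 3) less magic, here is a small sketch that prints the intermediate string of each variant. It only reuses the calls and the sample string from the snippets above; the variable names are made up for illustration. The 4th parameter of StringRegExp() is the position in the string at which matching starts.

```autoit
#include <Array.au3>

Local $sStr = "0x03f477ffa50245"

; Variant 1: \w also matches the leading "0x", so that pair gets its own prefix
; and the intermediate string starts with "0x0x" followed by "0x03", "0xf4", ...
Local $sStep1 = StringRegExpReplace($sStr, "\w{2}", "0x$0")
ConsoleWrite("Variant 1 intermediate: " & $sStep1 & @CRLF)
; Offset 5 starts matching at the 5th character, i.e. after the spurious "0x0x".
Local $aBytes1 = StringRegExp($sStep1, "\w{4}", 3, 5)

; Variant 2: [[:xdigit:]] does not match "x", so the leading "0x" is left alone
; and only that single "0x" has to be skipped (offset 3).
Local $sStep2 = StringRegExpReplace($sStr, "[[:xdigit:]]{2}", "0x$0")
ConsoleWrite("Variant 2 intermediate: " & $sStep2 & @CRLF)
Local $aBytes2 = StringRegExp($sStep2, ".{4}", 3, 3)

_ArrayDisplay($aBytes1, "Variant 1")
_ArrayDisplay($aBytes2, "Variant 2")
```

Both arrays should end up with the same elements; the second variant simply never touches the original prefix.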

 

Edited by mikell
5 hours ago, markyrocks said:

Chimp, after looking through the GetKeyboardState function documentation, I'm not so sure you're getting the correct data out of it, or maybe it's not the data you think you're getting. It really seems like you should be stepping through the struct byte by byte, and it looks like each value should be either 1, 0 or 128, depending on the type of key and its state. I'm guessing that the index+1 corresponds to the virtual key code, 0-254 being all the keys on an ASCII keyboard.

Yes, you are right. The byte sequence I posted above is just a generic sample to illustrate what kind of data the regex needs to handle, not the real output of DllStructGetData (_WinAPI_GetKeyboardState (), 1), which instead corresponds to something like this: 0x0001010000000000010000000001000000000100........

5 hours ago, markyrocks said:

Edit: Scratch that, I saw what you posted in a different thread. I'm not sure why you are interested in adding "0x" to everything. A byte is just a number between 0 and 255. Whether it looks like 0000 0001, 0x01 or 1 really shouldn't make a bit of difference. That is for an unsigned byte, anyway; a signed byte would be -128 to 127. Not 100% sure on that, though it's highly doubtful that these are signed.

When a key is pressed, the corresponding byte becomes 80 (hex).
When the regex takes the value 80 (hex) from the string and places it in an array element, it is treated as decimal 80 when read back from the array; with the 0x prefix it is instead interpreted correctly as a hex value.

hex 0x80 -> 10000000 the msb bit is 1
Dec   80 -> 01010000 the msb bit is 0
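
As a side note, a possible alternative to the "0x" prefix (not what was asked for, just a sketch) is to keep the plain two-digit chunks and convert them with Dec() when they are read, since Dec() parses a string as hexadecimal. The sample values below are made up for illustration.

```autoit
; Two-digit hex strings, as a plain "\w{2}" split would return them (sample values).
Local $aChunks[3] = ["00", "01", "80"]

For $i = 0 To UBound($aChunks) - 1
    ; Dec() interprets the string as hexadecimal, so "80" becomes 128, not 80.
    Local $iValue = Dec($aChunks[$i])
    ; The key is down when the high-order bit (0x80) of the byte is set.
    Local $bDown = BitAND($iValue, 0x80) <> 0
    ConsoleWrite($aChunks[$i] & " -> " & $iValue & " (down: " & $bDown & ")" & @CRLF)
Next
```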

 

59 minutes ago, mikell said:

Of course there is. Don't forget the 4th parameter of the SRE function :)

:doh: .... I'm speechless....

Thanks a lot @mikell :)

3 minutes ago, Chimp said:

which instead corresponds to something like this: 0x0001010000000000010000000001000000000100........

Wait, now I’m confused - isn’t that base 2, not base 16?

 


Not base 2, it's base 16: 512 hex digits encoding 256 values.

It looks like base 2 because all the keys of the keyboard are up. When a key is pressed, the corresponding value becomes 80.


Maybe I misunderstood something, but as the GetKeyboardState documentation states, you should get a byte array, so I don't get what you're trying to do oO

And the interpretation of each byte is quite clear (the index corresponds to the mapped virtual key code, depending on your keyboard):

When the function returns, each member of the array pointed to by the lpKeyState parameter contains status data for a virtual key. If the high-order bit is 1, the key is down; otherwise, it is up. If the key is a toggle key, for example CAPS LOCK, then the low-order bit is 1 when the key is toggled and is 0 if the key is untoggled. The low-order bit is meaningless for non-toggle keys. A toggle key is said to be toggled when it is turned on. A toggle key's indicator light (if any) on the keyboard will be on when the key is toggled, and off when the key is untoggled.

 

Edited by 636C65616E

Hex 0x80 == 128 decimal, or 1000 0000. I get confused between MSB and LSB, but the high-order bit == 1.

Anything besides zero indicates that a key is on (toggled) or currently pressed, so not false. A raw byte will evaluate to true or false.

Also, 2 hex digits == 1 byte. A base-16 digit is 4 bits and can be one of 16 numbers, 0-15: 0000, 0001, 0010, 0011, 0100, 0101, 0110, and so on up to 1111 == 15.

Edited by markyrocks

To complete @markyrocks:

  • Untoggled & Up: 0x00
  • Untoggled & Down: 0x80
  • Toggled & Up : 0x01
  • Toggled & Down : 0x81

Just use something like this:

#include <EditConstants.au3>
#include <GUIConstantsEx.au3>
#include <WindowsConstants.au3>
#include <WinAPISys.au3>

Opt('GUIOnEventMode',1)
$form = GUICreate('TestProgramm', 236, 286, -1, -1, BitOR($WS_CAPTION,$WS_SYSMENU))
$label = GUICtrlCreateLabel('', 8, 8, 220, 270, default, 0)
GUICtrlSetFont($label, 9, default, default, 'consolas')
GUISetOnEvent($GUI_EVENT_CLOSE,'close')
GUISetState(@SW_SHOW)

func close()
    Exit
endfunc

$VK_LBUTTON = 0x01
$VK_CONTROL = 0x11
$VK_NUMLOCK = 0x90

func GetState($arr, $code)
    local $state = DllStructGetData($arr, 1, 1 + $code)
    return StringFormat('%9s', (BitAND($state, 0x01) ? 'TOGGLED' : 'UNTOGGLED')) & ' ' & (BitAND($state,0x80) ? 'DOWN' : 'UP')
endfunc

while 1
    local $arr = _WinAPI_GetKeyboardState()
    local $data = 'LBUTTON: ' & GetState($arr, $VK_LBUTTON) & @CRLF
    $data &= 'CONTROL: ' & GetState($arr, $VK_CONTROL) & @CRLF
    $data &= 'NUMLOCK: ' & GetState($arr, $VK_NUMLOCK)
    if ($data <> GUICtrlRead($label)) then
        GUICtrlSetData($label, $data)
    endif
    sleep(100)
wend

As you can see, the toggle bit merely acts as a 'switch' when inspecting a 'normal' key, so it has no real use. The only use could be that, if the toggle bit differs between two calls, you can be sure the key was pressed an odd number of times; but if it stays the same, the key could still have been pressed an even number of times. That's why we usually use GetAsyncKeyState.
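
For completeness, a minimal sketch of the GetAsyncKeyState route, called directly through DllCall(). The helper name IsKeyDownNow is hypothetical (not from this thread); the virtual key code 0x11 is the VK_CONTROL value used in the script above.

```autoit
; Hypothetical helper: returns True if the given virtual key is down right now.
Func IsKeyDownNow($iVKey)
    Local $aRet = DllCall("user32.dll", "short", "GetAsyncKeyState", "int", $iVKey)
    If @error Then Return SetError(1, 0, False)
    ; The most significant bit of the returned SHORT is set while the key is down.
    Return BitAND($aRet[0], 0x8000) <> 0
EndFunc

ConsoleWrite("CONTROL down: " & IsKeyDownNow(0x11) & @CRLF)
```

Unlike GetKeyboardState, this asks about one key at a time, so it does not give the "whole keyboard at once" snapshot.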

Edited by 636C65616E

Try this little snippet too.
You can see that when you press a key, or even several keys at the same time, some bytes become 80 or 81, while when no keys are pressed all the bytes are 0 or 1 (see the documentation on _WinAPI_GetKeyboardState).

It's like taking a screenshot of the keyboard at a given moment and seeing the status of all the keys at once.
What I want to do with the required regex is transfer the state of all keys into an array (in a single statement, without a loop), and then analyze that array, where each element should contain the state of the corresponding key (in reality it is not exactly like that, but just to give an idea).

#include <WinAPISys.au3>
#include <GUIConstantsEx.au3>

; _WinAPI_GetKeyboardState fails if there is not a GUI
Local $wd = 450, $hi = 250
Global $hCRT = GUICreate("_WinAPI_GetKeyboardState Output preview", $wd, $hi)
$hShow = GUICtrlCreateEdit("", 0, 0, $wd, $hi, BitOR(0x0004, 0x0800))
GUICtrlSetFont(-1, 12, 400, 0, "courier new")
GUISetState()

Do
    GUICtrlSetData($hShow, DllStructGetData(_WinAPI_GetKeyboardState(), 1))
    Sleep(500)
Until GUIGetMsg() = $GUI_EVENT_CLOSE

 

Edited by Chimp
