Jon

Binary String Rewrite Incoming

15 posts in this topic

Why are they changing?

I wasn't happy with the way that binary strings were implemented in AutoIt because:

- They aren't obviously documented. I'm still not sure under what conditions a string is classed as binary and when it's converted between types. This needs to be clearly defined so that they can be used properly rather than by accident.

- They don't exist as a proper variant type - they "share" the string portion and are converted on-the-fly between a hexadecimal string format and binary format. This is different to all other types of variant where a string representation of the variant is stored alongside for quick and easy access.

- They rely on an ANSI string storage method because under many conditions there is no distinction internally between an ANSI string and binary data. This causes me a large problem with the Unicode rewrite.

- Sometimes the presence of a null character is all that differentiates a string from binary data (i.e. if a string contains a null in the middle it is assumed to be a binary type). This method has serious flaws - not all binary data is "null-less".

- Because of the implementation - internal functions that want to work with binary data have to tiptoe around the implementation and spend too much time and effort working out the best way to store the data.

Because of this and alongside the Unicode work I rewrote the Variant class to cope with binary data stored like all the other variant types (ints, floats). The binary data is stored separately to its string representation and free from any Unicode/ANSI issues.

What is affected?

Pretty much anything that relies on binary data, including:

- FileRead / FileWrite

- DLLStruct

- TCPRecv/TCPSend

- BinaryString / String

Depending how you interpreted the way the binary strings worked, you may have some rewriting to do. Simple uses may not be affected.

How will they work?

In the future, it's likely that more functions to easily work with binary data will be added. But for the moment I'm working on existing features. Here are some notes:

Creating

To create binary data use the BinaryString() function (note: this will be renamed to Binary() to reinforce the fact that binary <> string). This will convert any variant type into binary data. The only conversions that make sense are:

1. BinaryString(string)

The most obvious and easy to work with method. Use the Hex format 0xAABBCCDD to define binary data.

To store the bytes AA, 99, 00, D1, F6 as binary data:

BinaryString("0xAA9900D1F6")

2. BinaryString(number)

The number will be stored as its byte representation in memory (as it currently is). I am not convinced this is the best idea as it's not massively useful. It may be more sensible to truncate the number to 8 bits (i.e. numbers 0-255) and store that data as a single byte - discuss.

Other variants will be converted in the same way as 2) - converted to a byte representation of the memory they occupy. Again, I'm not sold on the usefulness of that. Some of the reading methods also truncate this to a single byte so again it seems odd.

Reading

Here are some examples of reading and converting and comparing:

$bin = BinaryString("0xAABBCCDD")

; Outputs the binary data "0xAABBCCDD" - but note that $bin is a binary datatype, not a string!
MsgBox(0, "String representation", $bin)

; Concat the string "Hello:" with $bin.
; $new will be a STRING datatype
$new = "Hello:" & $bin
; Outputs the string "Hello:0xAABBCCDD"
MsgBox(0, "$new is a STRING", $new)

; Strings without 0x converted to binary are treated as ANSI codes
MsgBox(0, "Hex format 0x present", BinaryString("0x41"))   ; Shows "0x41"
MsgBox(0, "Hex format 0x not present", BinaryString("A"))   ; Shows "0x41"

; Concat the string "0x11223344" with $bin.
; $new will be a STRING datatype
$new = "0x11223344" & $bin
; Outputs the string "0x112233440xAABBCCDD"
MsgBox(0, "$new is a STRING", $new)

; Concat binary data "0x11223344" with $bin.
; $new will be a BINARY datatype - Should we do this or just do a string concat?
$new = BinaryString("0x11223344") & $bin
; Outputs the binary "0x11223344AABBCCDD"
MsgBox(0, "$new is a BINARY", $new)

; Comparison of a binary and a string.  Each variant is converted to its binary
; representation and then compared
If BinaryString("0x0A") = "0x0A" Then MsgBox(0, "", "They match!")
; "0A" has no 0x prefix, so it is treated as ANSI text (bytes 30 41) and does not match
If Not (BinaryString("0x0A") = "0A") Then MsgBox(0, "", "No match!")

This is now in the latest beta.

Please test, discuss, bitch, etc and we'll take it from there.




Updated:

Added BinaryLen() and BinaryMid().

Renamed BinaryString() to Binary().

Renamed IsBinaryString() to IsBinary().

If you do any work with TCP/FileRead/Write or DllStruct with binary data you should play with this version. Things are going to get broken and you need to be ready for it - especially those who write large libraries that are used by others.

Binary data will be treated completely separately from string data so relying on abusing String... functions is not going to go well.
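
For anyone testing the renamed functions, a minimal sketch of how they fit together might look like this. It assumes BinaryMid() takes a 1-based start position and an optional byte count, and that VarGetType() is available to inspect the variant type; check both against the help file for the beta you are running.

```autoit
; Build binary data from a hex string (bytes AA, 99, 00, D1, F6).
Local $bData = Binary("0xAA9900D1F6")

; IsBinary() distinguishes binary variants from strings.
ConsoleWrite("IsBinary: " & IsBinary($bData) & @CRLF)
ConsoleWrite("Type:     " & VarGetType($bData) & @CRLF)

; BinaryLen() counts bytes, including the embedded 00.
ConsoleWrite("Length:   " & BinaryLen($bData) & @CRLF)

; BinaryMid() extracts a byte range (assumed 1-based): here bytes 2 to 4.
Local $bPart = BinaryMid($bData, 2, 3)
ConsoleWrite("Part:     " & $bPart & @CRLF)
```

Note that the last line concatenates a string with binary data, so what gets printed is the hex string form, as in the concatenation examples in the first post.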


A question:

I dunno if this is possible (I'm guessing that the BinaryString is internally an object/struct) but if it was, it would be terrific if BinaryStrings had pointers, I mean something like BinaryPtr( $BinaryString ). Again, I don't know if this is possible but if it was imagine the lack of need to create all those Structs consisting of "char[ somebignumber ]"... I mean, one wouldn't have to create a buffer, fill it and pass its pointer to an outer function (for example, C runtime library functions like *print*(); recv()). Doesn't all that filling take time? Furthermore, scripters would be less confused if they didn't have to create all those Structs for parameters that only need buffers and their sizes. They would simply do:

DllCall( $hDll, "return_type", "func_name", "ptr", BinaryPtr( $BinaryString ), "int", BinaryLen( $BinaryString ) )

In addition, this would solve the problem of plugins not being able to accept BinaryStrings.

All in all, it would be great, but again - I don't know if it is technically possible to accomplish and whether this would help a lot of people... But if it's easily doable, why not? :shocked:

Thank you for your time.
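
Until something like BinaryPtr() exists (it is only a suggestion in this thread), the route available is the buffer copy described above. A rough sketch, assuming DllStructSetData() accepts binary data for a whole byte array element, and using kernel32's lstrlenA purely as a convenient external function that takes a pointer:

```autoit
; "AutoIt" followed by a terminating zero byte.
Local $bData = Binary("0x4175746F497400")

; Allocate a byte buffer of the same size and copy the binary data into it.
Local $tBuf = DllStructCreate("byte[" & BinaryLen($bData) & "]")
DllStructSetData($tBuf, 1, $bData)

; Pass the buffer's address to the external function.
Local $aRet = DllCall("kernel32.dll", "int", "lstrlenA", "ptr", DllStructGetPtr($tBuf))
If @error Then
    ConsoleWrite("DllCall failed: " & @error & @CRLF)
Else
    ConsoleWrite("lstrlenA saw " & $aRet[0] & " characters" & @CRLF)
EndIf
```

The copy into $tBuf is the step a BinaryPtr() would make unnecessary.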


Wow thank you! That is really great :shocked: yay


Is there any chance that something like BinaryPtr() will be implemented?

I don't know how Binary()s reside in memory, but if they are just blobs of data it would be quite nice to have a way to get the address somehow instead of copying the data into a 'dllstruct'.

... just curious :)




Nice idea, the clarification of Binary v. String'll be helpful, once old libraries are updated and the ripples settle. Thanks for this!

One question - how many bytes can a Binary hold? Is it comparable to String() at 2 billion bytes? I'm thinking here of binary data being FileRead(), for example.

(HELP file note: if so there'll be a conversion overflow error to be trapped for long Binary's, since one byte in a Binary is represented by two characters in a String + the header "0x", or something)
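
The arithmetic behind that note can be checked on a small value. This sketch assumes String() on a binary variant returns the "0x..." hex text described earlier in the thread, so n bytes become 2 + 2n characters:

```autoit
Local $bData = Binary("0xAA9900D1F6")
Local $iBytes = BinaryLen($bData)

; Convert to the hex string representation and compare the sizes.
Local $sHex = String($bData)

ConsoleWrite("Bytes:      " & $iBytes & @CRLF)
ConsoleWrite("Characters: " & StringLen($sHex) & @CRLF)
ConsoleWrite("Predicted:  " & (2 + 2 * $iBytes) & @CRLF)
```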

Edited by Fox2


I am testing out the binary functions so I can incorporate them into my scripts, but when I call MsgBox(0,"binary",$bvar), where $bvar is a binary variable, I was half expecting a string like "0b11001111", or something equivalent to the notation used for hex numbers, i.e. "0xAA00BB". Was it intentional not to do it this way?



This is a belated reply since the changes have already been implemented by now, but 'better late than never', as they say. I came across Jon's post as I was trying to find ideas from other people's experiences with binary variants and I thought I should share the result of many hours of scratching my head over them, partly due to incomplete documentation and partly due to the fledgling state of this new type.

As far as I am concerned, the implementation is great so far, particularly the use of the & concatenation. Maybe the best way to describe binary variants for help purposes (I am not suggesting they should be called anything else, though, it is too late for that) is as raw memory blocks or as blobs in the MySQL sense of the word. Then the whole set of functions and operations defined for binaries should be coherent with this way of looking at them. In particular I would suggest the following:

(1) in reply to the question raised by Jon above of how many bytes a number should be stored in, I would suggest that the now renamed Binary() function should be given two optional additional arguments: Binary(value {, size} {, pad}) where

- value evaluates to a string or a numeric

- when present, the absolute value of size is the size of the binary to be created in bytes, even if this means losing data in value as a result (programmers should know what they are doing and too much shielding from mistakes can result in making them very hard to debug)

- if size > 0, padding should be to the right

- if size < 0 padding should be to the left

- if size = 0 behaviour is identical as if it was not present

- if size is not present, the binary has the exact size needed to store value (this is because many people would not know whether the "normal" storage allocated to an integer or a float is 16, 32 or 64 bits - not to mention that this may be different on different machines)

- if pad is present and padding is required, if it is a non empty string, its first character is used as padding byte, if it is a numeric, its value modulo 256 is used as padding byte

- if pad is not present and padding is required, it defaults to 0x00 if value is a numeric and 0x20 if value is a string.

(2) Two other functionalities should be added:

- changing the size of a binary: this would be easily implemented via Binary() by using the case when value is a binary (which is unused so far), with the meaning of size and pad remaining the same

- allowing the definition of an empty binary: this is very useful when you need to concatenate partial results in a loop. At present you can't initialise your result variable $xres to an empty binary. So, at the end of the first loop, when it encounters a statement like $xres = $xres & $partial_binary_result, $xres is first initialised to a string, $partial_binary_result is converted to a string and the resulting $xres remains a string - which is a reasonable behaviour since the first element in the concatenation has to be initialised to something and there is no more reason to initialise it to a binary than to a string (in fact, this provides a convenient short-cut for displaying a series of expressions of different types). To avoid this you need to start with a binary initialised to one single byte and to take this byte out at the end, which is a rather illogical thing to do. So why not have empty binaries, just as we have the concept of an empty block or an empty string, i.e. one of length zero. Binary() could, once again, come in handy to help: if size is nil, binary would be an empty binary.
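
The placeholder workaround described in (2) might be sketched like this. It relies on the behaviour reported in this thread that binary & binary yields binary data, and on BinaryMid() with the count omitted returning everything from the start position onward (an assumption to verify in the help file):

```autoit
; Start from a one-byte placeholder so the first "&" is binary & binary.
Local $xres = Binary("0x00")
Local $partial_binary_result

For $i = 1 To 3
    ; Hex($i, 2) gives a two-digit hex string such as "01".
    $partial_binary_result = Binary("0x" & Hex($i, 2))
    $xres = $xres & $partial_binary_result
Next

; Drop the placeholder byte at position 1.
$xres = BinaryMid($xres, 2)

ConsoleWrite(VarGetType($xres) & ": " & $xres & @CRLF)
```

An empty binary as the starting value would remove both the placeholder and the final BinaryMid() call.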

(3) Block type functionalities: taking our inspiration from Modula (or Delphi) it would then make sense to have functions allowing move and fill block operations:

- BinaryMove(bin_src, ByRef bin_dst[, start_dst]): copies binary bin_src into binary bin_dst from offset start_dst (or offset 1 if not present)

- BinaryFill(bin_dst [, pad] [,start] [, len]): fill bin_dst with byte pad (default 0x00) from start (default 1) over len bytes (default remaining bytes in block)

- Similar BinaryXOR, BinaryAND and BinaryOR would be very useful.

- I am sure other users would find more ideas, but these are the ones I had to implement in au3, thereby losing precious execution time, whereas they would have been a breeze in C/C++ or whatever you use in the source.
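
For reference, the kind of au3-level helper this refers to could look roughly like the following. _BinaryXorSketch is a hypothetical name, not an existing or planned function, and it assumes DllStructSetData()/DllStructGetData() move binary data in and out of a byte array element. The byte-by-byte loop is exactly the overhead a native BinaryXOR would avoid:

```autoit
; XOR two binaries of equal length, byte by byte (hypothetical helper).
Func _BinaryXorSketch($bA, $bB)
    Local $iLen = BinaryLen($bA)
    If $iLen = 0 Or $iLen <> BinaryLen($bB) Then Return SetError(1, 0, 0)

    ; Copy both inputs into byte buffers so single bytes can be addressed.
    Local $tA = DllStructCreate("byte[" & $iLen & "]")
    Local $tB = DllStructCreate("byte[" & $iLen & "]")
    DllStructSetData($tA, 1, $bA)
    DllStructSetData($tB, 1, $bB)

    ; Combine the buffers one byte at a time, writing the result into $tA.
    For $i = 1 To $iLen
        DllStructSetData($tA, 1, BitXOR(DllStructGetData($tA, 1, $i), DllStructGetData($tB, 1, $i)), $i)
    Next

    ; Reading the array element without an index returns the whole buffer.
    Return DllStructGetData($tA, 1)
EndFunc

Local $bResult = _BinaryXorSketch(Binary("0xFF00AA"), Binary("0x0F0F0F"))
ConsoleWrite("XOR result: " & $bResult & @CRLF)
```

BinaryAND and BinaryOR variants would differ only in using BitAND or BitOR inside the loop.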

(4) Functionalities connecting binaries with DllStruct: first I should say that I like very much the interface with DLLs you designed, which I find well thought out. However, I find much less convincing the Plugin approach you seem to have now adopted: if only because this will leave us with the same problem, namely that the vast majority of AutoIt users will still need to depend on C/C++ programmers to provide them with the right DLLs and, above everything else, the right documentation to use these DLLs, whether the usual ones or the AU3 dedicated Plugins. Anyway, this is another story for another thread.

The only issue I have with the present state of affairs regarding interfacing with DLLs is the lack of a consistent (and documented) interface between DLLStruct and binaries. One should be able to transfer from one representation to another seamlessly, regardless of the fact that the internal representation of these objects may be very different: this is what third-generation languages are supposed to be about, after all. In particular, if one considers a DLLStruct as the AutoIt version of a Pascal record (which was what I did in order to work with them), then they can also be considered as structured binaries which should be convertible into binaries as a whole, or component by component. Conversely, it should be possible to assign a binary to a 'ubyte binbuffer[128]' component of a DLLStruct or, in fact, any other type of component except char ones (due to the particular treatment of 0x00 in string operations). If these facilities were available, they would provide a very efficient instrument to interface with DLLs, because they would free us of many byte-level operations which are necessary at present.
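
One way to get a whole-struct binary view is to overlay a byte array on the same memory. This is a sketch, assuming DllStructCreate() accepts a pointer as its second argument and that reading a byte array element without an index returns binary data; the struct layout and field names are made up for illustration:

```autoit
; A small record-like struct (hypothetical layout).
Local $tRec = DllStructCreate("int id; byte flags[4]")
DllStructSetData($tRec, "id", 1)

; Assign binary data to the byte array component as a whole.
DllStructSetData($tRec, "flags", Binary("0xAABBCCDD"))

; Overlay a byte array of the same size on the struct's memory.
Local $tRaw = DllStructCreate("byte[" & DllStructGetSize($tRec) & "]", DllStructGetPtr($tRec))

; Read the whole struct back as one binary value.
Local $bWhole = DllStructGetData($tRaw, 1)
ConsoleWrite(BinaryLen($bWhole) & " bytes: " & $bWhole & @CRLF)
```

The overlay shares memory with $tRec, so it is a view rather than a copy; the binary returned by DllStructGetData() is a copy taken at that moment.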

I hope functionalities along these kind of lines will eventually appear in some future release (the earlier the better, though). Unfortunately, I cannot offer much help in terms of programming, as I have no working knowledge of C/C++, being more into script-driven system administration and web-oriented development. But I will say that having used AutoIt since the first stable version 3 came out, I find it meets all my needs. So many thanks to the development team and if there are things that you think people like me could do to help, let me know and I'll tell you what is possible.

PC.


ok, I know this was almost a year ago and probably the wrong topic ( :P ) but...

BinaryToString is not returning the right result if the binary input contains '0x00' chars. IsString of BinaryToString will return 1 but what comes out of the conversion will mess with... something.

Example:

$e = "0x4175746F4974"
$f = BinaryToString($e)

MsgBox(0, "", $f & "  -OK")

; Let's add a '0x00' character (this is, or should be, Chr(0))
$e = "0x4175746F004974"
$f = BinaryToString($e)

MsgBox(0, "", $f & "  -OK")

; This helps
$f = String($f)

MsgBox(0, "", $f & "  -OK")

My question is why??? Why is there no " -OK" part in the second message box, and why is '0x00' considered the end of the binary data to be converted (if that is intentional, could there be an option to avoid that behaviour)?

...and how do I get that fancy smancy autoit code tags in posts? pressing hard on that "A" but nothing happens lol




MsgBox() and ConsoleWrite() will strip everything after a null character. Use StringReplace() to strip Chr(0) if you need to "see" the result.

Also, AutoIt codebox is created like this [ autoit ] [ /autoit ] (without the spaces).

Edited by weaponx


StringReplace() would not be right because this is about binary data and not strings. For example, StringReplace("0x4175746F004974", "00", "") would return the right result but StringReplace("0x4175746F49747009", "00", "") would not.

A BinaryReplace() function, which does not exist, is what is required.

Another possibility is to check every byte (one by one) for the existence of '0x00', but that requires a time-consuming loop. I have tried this, and even created a kind of accelerator by splitting the binary input into smaller portions and then checking those for '0x00' (that way is much, much faster), but it still takes some time, and... time is money, right? :P

;thanks for this tip
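
The misalignment problem is easy to see in the second example: the hex text ends in 7009, so the characters "00" straddle the bytes 70 and 09. A plain byte-aligned check, slow for large data as noted above, could be sketched as follows (_FindZeroByte is a hypothetical helper name, and BinaryMid() is assumed to use 1-based positions):

```autoit
; Returns the 1-based position of the first 0x00 byte, or 0 if none is found.
Func _FindZeroByte($bData)
    For $i = 1 To BinaryLen($bData)
        ; Compare the hex text of exactly one byte, so matches stay byte-aligned.
        If String(BinaryMid($bData, $i, 1)) = "0x00" Then Return $i
    Next
    Return 0
EndFunc

; A real zero byte is present in the first value.
Local $iFirst = _FindZeroByte(Binary("0x4175746F004974"))
; In the second value "00" only straddles two bytes.
Local $iSecond = _FindZeroByte(Binary("0x4175746F49747009"))

ConsoleWrite("First:  " & $iFirst & @CRLF)
ConsoleWrite("Second: " & $iSecond & @CRLF)
```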

I meant use StringReplace($f, Chr(0), ""), after the string is already converted from binary.
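
Put together with the earlier example, that suggestion looks like this. The sketch assumes, as reported in this thread, that the converted string keeps everything after the null and that only the display functions cut it short:

```autoit
Local $f = BinaryToString(Binary("0x4175746F004974"))

; The length should still count the characters after the embedded null.
ConsoleWrite("Length before: " & StringLen($f) & @CRLF)

; Remove the null characters so the whole string can be displayed.
Local $sVisible = StringReplace($f, Chr(0), "")
ConsoleWrite("Length after:  " & StringLen($sVisible) & @CRLF)

ConsoleWrite($sVisible & "  -OK" & @CRLF)
```

This only changes what is displayed; if the zero bytes matter, keep working with the original binary data rather than the stripped string.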


you are my hero

thanks again

