Xwolf

String: Encountering a new question

14 posts in this topic

#1 ·  Posted (edited)

Question: How do I find out how many bytes a string takes?

For example:

$str = "abc好abc"

NOTE:

"好" is a Chinese character that takes 2 bytes.

How can I get the number of bytes in the variable $str?

Thanks, everyone. :)

Edited by Xwolf

#2 ·  Posted

1 character = 1 byte.

So you just need to count the characters in the string.

Or do something like this:

$Bin = StringToBinary("abc好abc")
$Len = BinaryLen($Bin) ; will be an integer value of 7, as there are 7 characters

#3 ·  Posted

Sorry, but in fact "好" is a character that takes 2 bytes. :)

#4 ·  Posted

Write it to a file and...

$o = FileOpen("test.txt", 16) ; 16 = open the file in binary mode
$f = FileRead($o)
MsgBox(0, "", BinaryLen($f) - 2) ; -2 for the @CR and @LF
FileClose($o)

By the way, that string is Unicode and all characters are 2 bytes, so it's actually 14 bytes.


#5 ·  Posted

No, each character is 1 byte. And if the topic starter uses Chinese chars, that's his case. We can't determine if a char is Chinese, Korean, from outer space or whatever... the standard is: 1 char == 1 byte.


#6 ·  Posted (edited)

I think not. If you know the string is ASCII, then your statement is true. But Unicode characters are not necessarily one byte, and according to the help file, AutoIt supports Unicode.

Bob

I have no idea what code set outer space uses.

Edited by bobchernow

#7 ·  Posted

And your point is? :|

Try it yourself: copy his string and save it as Unicode. Then, without even running my example, open the file's properties in Windows and see what the file size is... just subtract 2 (that's the @CRLF).


#8 ·  Posted

Thank you :)

Thanks to everyone above. :)


#9 ·  Posted (edited)

a b c 好 a b c @CR @LF

1 1 1 2 1 1 1 1 1

1+1+1+2+1+1+1+1+1=10

Total is 10.

Am I right? :)

Edited by Xwolf


#10 ·  Posted (edited)

What encoding did you use?

ANSI won't recognize the char.

Unicode (UTF-16) uses 2 bytes each for a-z and 好,

and UTF-8 actually uses 3 bytes for 好 and 1 byte for a-z.

Edited by TheMadman


#11 ·  Posted (edited)

You could do something like this:

;  flag = 1 (default), binary data will be ANSI
;  flag = 2, binary data will be UTF16 Little Endian
;  flag = 3, binary data will be UTF16 Big Endian
;  flag = 4, binary data will be UTF8
$Encoding = 4
$string = "abc好abc"
$Bytes = BinaryLen(StringToBinary($string,$Encoding))
MsgBox(0,"Number of bytes", $Bytes)

This returns the number of bytes depending on the encoding :)
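
As a rough illustration of how the count changes with the flag, the sketch below loops over all four values listed above and prints the byte length for each. The variable names ($sText, $aNames) are only illustrative, and it assumes the script file is saved in a Unicode encoding so that the "好" literal survives. UTF-16 should give 14 and UTF-8 should give 9; the ANSI result depends on the system code page (for example 8 on a Chinese GBK system, where "好" takes 2 bytes).

```autoit
; Sketch: byte length of the same string under each StringToBinary flag.
; $sText and $aNames are illustrative names, not part of any library.
Local $sText = "abc好abc"
Local $aNames[5] = ["", "ANSI", "UTF-16 LE", "UTF-16 BE", "UTF-8"]

For $iFlag = 1 To 4
    ; Convert with the given encoding flag, then count the resulting bytes
    Local $iBytes = BinaryLen(StringToBinary($sText, $iFlag))
    ConsoleWrite($aNames[$iFlag] & ": " & $iBytes & " bytes" & @CRLF)
Next

; StringLen counts characters, not bytes
ConsoleWrite("Characters: " & StringLen($sText) & @CRLF)
```

StringLen is included for comparison, since the character count (7) stays the same while the byte count varies with the encoding.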

Edited by ProgAndy

#12 ·  Posted

:) Great!

A new way to deal with this problem.

Cheers!

Peter


#13 ·  Posted (edited)

ASCII or UTF-8...?

In fact, I don't know which encoding I used.

I just put "abc好abc" into the file test.txt.

And I found that the size of the file was 8 bytes (without @CRLF).

If I press Enter at the end (adding @CRLF), the size of the file is 10 bytes.
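
An 8-byte file is what one would expect if the editor saved the text in a Chinese ANSI code page such as GBK, where "好" is stored as the two bytes BA C3 and each Latin letter as one byte. That is an assumption about the setup, not something stated here. A minimal sketch for checking it is to read the file in binary mode and print the raw bytes as hex; in UTF-8 the character would instead show up as E5 A5 BD.

```autoit
; Sketch: inspect the raw bytes of test.txt to see which encoding was used.
; Assumes test.txt sits in the script's working directory.
Local $hFile = FileOpen("test.txt", 16) ; 16 = binary mode, no text decoding
If $hFile = -1 Then Exit MsgBox(16, "Error", "Could not open test.txt")

Local $bData = FileRead($hFile)
FileClose($hFile)

; Concatenating binary data with a string yields its hex form (0x...)
ConsoleWrite("Bytes: " & BinaryLen($bData) & @CRLF)
ConsoleWrite("Hex:   " & $bData & @CRLF)
```

A byte order mark at the start of the hex output, if present, also counts toward the file size.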

Edited by Xwolf

#14 ·  Posted

$string = "abc好abc"
$z1 = StringToBinary($string,4)
ConsoleWrite($z1 & @CRLF)
$z2 = BinaryLen($z1)
ConsoleWrite($z2 & @CRLF)

;Result1
;0x616263E5A5BD616263
;9

[Second code listing and its Result2 output: unrecoverable]

Note:

Result2 was somewhat strange.

Look at Result2: "abc好abc8"

The "8" comes right after the "c".

In my opinion, when I use the code "ConsoleWrite($z1 & @CRLF)",

the output should look like this:

abc好abc

9

Help, help...
