FreeBeing

UTF8 conversion problem

11 posts in this topic

Hi,

I have a problem reading my external IP from a simple external PHP script (with InetRead): the result is returned as UTF-8 by my web server.

When I convert it to text, a few characters are added at the beginning, which suggests it's not the right charset.

In hex, there is this UTF-8 header: 0xEFBBBF

So I use BinaryToString($external_ip, 4) to convert from UTF-8 to a string.

With that, I'm close to my goal: I get the full IP address, but a small "?" character remains in front of it.

In hex, I see this strange header: 0x3F

In theory, the UTF-8 header should disappear after the first BinaryToString conversion, shouldn't it?

For now I can't use my script because the conversion seems to be only partial. Or is there another solution?

Full example:

Binary result from the PHP script:

0xEFBBBF38372E3233312E32372E323335

After converting with BinaryToString($external_ip, 4) and then back to binary:

0x3F38372E3233312E32372E323335

String obtained: ?87.231.27.235

Expected result:

0x38372E3233312E32372E323335

String expected: 87.231.27.235

Thank you ;)
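
A minimal diagnostic sketch in AutoIt that rebuilds the sample response from the post locally instead of fetching it. It assumes, as the output above suggests, that BinaryToString(..., 4) keeps the BOM as the invisible character U+FEFF (65279); if your AutoIt version strips it, the first character code will simply be that of "8". The last line shows the most likely source of the 0x3F: re-encoding with StringToBinary's default ANSI flag cannot represent U+FEFF, so it becomes "?".

```autoit
; Sample response from the thread, rebuilt locally: StringToBinary(..., 4) encodes
; as UTF-8, so ChrW(0xFEFF) becomes the BOM bytes EF BB BF.
Local $bData = StringToBinary(ChrW(0xFEFF) & "87.231.27.235", 4)
ConsoleWrite("Raw bytes     : " & Hex($bData) & @CRLF)

; Decode as UTF-8 (flag 4), as in the original script.
Local $sText = BinaryToString($bData, 4)

; 65279 (= 0xFEFF) here means the BOM survived decoding as an invisible character.
ConsoleWrite("First char    : " & AscW(StringLeft($sText, 1)) & @CRLF)

; If U+FEFF is still present, the default flag (1 = ANSI) cannot represent it,
; so it is replaced by "?" (0x3F) in the re-encoded bytes.
ConsoleWrite("ANSI re-encode: " & Hex(StringToBinary($sText, 1)) & @CRLF)
```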

Why not just use StringReplace to replace the '?' with '' so the string will read how you'd like?
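
One caveat, offered as a sketch: the "?" is most likely how the invisible U+FEFF character appears after an ANSI conversion, so StringReplace on "?" may find nothing in the decoded string itself. Replacing ChrW(0xFEFF) targets the actual character. The input below is a hard-coded stand-in for the decoded response.

```autoit
; Stand-in for the string returned by BinaryToString($external_ip, 4).
Local $sText = ChrW(0xFEFF) & "87.231.27.235"

; StringReplace($sText, "?", "") would change nothing here: there is no real "?"
; in the string, only U+FEFF, which turns into "?" when converted to ANSI.
Local $sIP = StringReplace($sText, ChrW(0xFEFF), "")

ConsoleWrite($sIP & " (" & StringLen($sIP) & " chars)" & @CRLF)
```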



Is it possible that the 0x3F is the BOM of the file you're reading from?



#5 ·  Posted (edited)

A post with a similar issue was on this forum a few days ago.

I don't know whether your issue has the same cause, but you can try.
In short, it seems the problem was due to an unrecognized user agent string, which caused the server to return a wrongly encoded string to the InetRead() function.
You could try sending this command just before the InetRead():

HttpSetUserAgent('Mozilla/5.0')

Look here for a better explanation.

edit:

Changed InetGet to InetRead in the text

Edited by Chimp


#6 ·  Posted (edited)

OK, so if I understand correctly, I just have to skip or delete the 3F BOM, and that won't cause any bugs?

Edit: I tried changing the user agent to Firefox 33, but it doesn't change anything.

UA : Mozilla/5.0 (Windows NT 6.2; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0

Edited by FreeBeing
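
Skipping the BOM is safe, but before decoding it is three bytes (EF BB BF), not 0x3F; the 0x3F only appears after the ANSI round trip. A minimal sketch that drops those bytes before calling BinaryToString. _StripUtf8Bom is a hypothetical helper name, not a built-in AutoIt function, and the sample data is rebuilt from the thread rather than read from a server.

```autoit
; Hypothetical helper (not a built-in): returns $bData without a leading
; UTF-8 BOM (EF BB BF), or unchanged if there is none.
Func _StripUtf8Bom($bData)
    If BinaryLen($bData) >= 3 And Hex(BinaryMid($bData, 1, 3)) = "EFBBBF" Then
        Return BinaryMid($bData, 4) ; everything from the 4th byte on
    EndIf
    Return $bData
EndFunc

; Sample response rebuilt from the thread (BOM + "87.231.27.235").
Local $bResponse = StringToBinary(ChrW(0xFEFF) & "87.231.27.235", 4)

; Strip the BOM bytes first, then decode as UTF-8: no U+FEFF, no "?".
Local $sIP = BinaryToString(_StripUtf8Bom($bResponse), 4)
ConsoleWrite($sIP & @CRLF)
```

With a live request this would become something like BinaryToString(_StripUtf8Bom(InetRead($sURL, 1)), 4), where $sURL is your own script's address. Working on the bytes avoids guessing how the decoded BOM will be displayed.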


Perfect.

And for my own knowledge, is it an AutoIt bug or a missing feature?

Thank you


None of the above. These 3 bytes are the UTF-8 "BOM" (byte order mark): an initial sequence of bytes that encodes the Unicode character U+FEFF and indicates which encoding is being used (for UTF-16 and UTF-32 it also gives the byte order). BOMs exist for the other UTF encodings as well.

A BOM is often used in text files, although it is not mandatory. AutoIt has mechanisms and options to handle a BOM when opening a file (see the help for FileOpen).

By contrast, data passed "over the air" most often uses an implicit or fixed Unicode encoding, hence doesn't include a BOM. That's why you just have to skip or remove an initial BOM and otherwise ignore it.
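
To see that a BOM is nothing more than U+FEFF encoded at the start of the text, one can encode that single character with StringToBinary's different flags (2 = UTF-16 LE, 3 = UTF-16 BE, 4 = UTF-8). This is a small illustrative sketch; the UTF-8 line should match the 0xEFBBBF header from the original post.

```autoit
; The BOM is the character U+FEFF encoded at the start of the text.
Local $sBom = ChrW(0xFEFF)

; StringToBinary flags: 2 = UTF-16 LE, 3 = UTF-16 BE, 4 = UTF-8.
ConsoleWrite("UTF-8    : " & Hex(StringToBinary($sBom, 4)) & @CRLF)
ConsoleWrite("UTF-16 LE: " & Hex(StringToBinary($sBom, 2)) & @CRLF)
ConsoleWrite("UTF-16 BE: " & Hex(StringToBinary($sBom, 3)) & @CRLF)
```

The two UTF-16 results differ only in byte order, which is why the mark matters for UTF-16 and UTF-32 but only identifies the encoding for UTF-8.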



Thank you for your explanations, very accurate.


#11 ·  Posted (edited)

... hey, jchd is right! ..... :)

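$hTemp = FileOpen(".\Temp.txt", 18) ; 18 = 2 (overwrite) + 16 (binary mode)
FileWrite($hTemp, "0xEFBBBF38372E3233312E32372E323335") ; raw bytes from the thread: UTF-8 BOM + "87.231.27.235"
FileClose($hTemp)
$hTemp = FileOpen(".\Temp.txt") ; default mode 0 = read; FileOpen checks for a BOM
MsgBox(0, "", FileRead($hTemp)) ; should show 87.231.27.235 without the BOM
FileClose($hTemp)
FileDelete(".\Temp.txt") ; clean up the temporary file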

edit:

Maybe you could use InetGet() instead of InetRead() then, and read the downloaded file with FileRead() so that the BOM is handled for you.

Edited by Chimp
