Posted (edited)

I'm working on an IRC bot script. It works great, but it has over 3000 lines of code, so I won't post the entire script yet; maybe this has a simple answer. I've searched all over and can't seem to find the exact documentation I need.

The Issue:

NOTE: I'm almost positive the cause is NOT other users' settings or fonts; read below.

For some people, when the IRC bot sends a message to the chat, it appears like this:

BadUTF8.png.33df3227c370186f1f9754de429f4d91.png

When it should appear like this:

OKUTF8.png.41082428456e3a65b60ea0621cf1f013.png

The group of people helping me test this bot get mixed results. For some users it looks normal on their screens, and for others it shows squares or <?> characters. On my screen it appears normally, but if one of the users who sees the <?> characters copies and pastes the message into the chat, it also appears with the <?> on my screen (pictured below). Also, if I copy the string "-.,¸¸.-·°'`'°·-.,¸¸.-·°'`'°· \_O<   QUACK" directly from the script and paste it in chat, it appears normally to everyone, leading me to believe the issue is somewhere in AutoIt.

AnotherUTF8.png.66acdcd3d25e2a79efaa24669702b28c.png

Here in the below spoiler is the main function where this message originates in the script:

  Reveal hidden contents

Here in the below spoiler, also is the main function that sends the message to the IRC server.

  Reveal hidden contents

From what I have found in my research, you can change the UTF-8 settings with wrapper directives, but I cannot find the exact way to change them. The help file says the default is UTF-8 without BOM, but that it can be changed to UTF-8 with BOM and other options.

My main question is, what is the wrapper directive, or how do I change this from UTF-8 without BOM to UTF-8 with BOM? Or is my issue something completely different? I'm stumped...

Edited by coderusa
  • coderusa changed the title to UTF-8 issue (Or something else?)
Posted

I had a similar issue when text in a MsgBox containing German umlauts (öäü etc.) was displayed incorrectly.
I solved the problem by setting the encoding in SciTE to UTF-8 with BOM.
In SciTE use: File > Encoding > UTF-8 with BOM


Posted (edited)
  On 11/28/2023 at 10:53 PM, water said:

I had a similar issue when text in a MsgBox containing German umlauts (öäü etc.) was displayed incorrectly.
I solved the problem by setting the encoding in SciTE to UTF-8 with BOM.
In SciTE use: File > Encoding > UTF-8 with BOM

Thanks. Just tried that, and testers report no change 😕

Edited by coderusa
Posted

Your screenshot looks like an output to the DOS console.
I needed to use the following function to write the Umlauts to the Console:

; Taken from: https://www.autoitscript.com/forum/topic/201398-german-umlauts-and-console-programs/
Func Ansi2Oem($text)
    $text = DllCall('user32.dll', 'Int', 'CharToOem', 'str', $text, 'str', '')
    Return $text[2]
EndFunc   ;==>Ansi2Oem

 


Posted
  On 11/28/2023 at 11:29 PM, water said:

Your screenshot looks like an output to the DOS console.

I will try this. It's an IRC client: I am using mIRC myself, and the testers are using mIRC, AdiIRC, HexChat and, I think, irssi.

Posted

Indeed I have; I couldn't send emoji with my scripts. I suspect that TCPSend is screwing with the encoding. You will have to find a way to manually convert your message string to properly formatted UTF-8 binary and then send that directly with TCPSend.
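
A minimal sketch of that conversion, assuming the socket came from TCPConnect() and that the IRC server and clients expect UTF-8. The helper name _IrcSendUtf8 and the channel in the usage line are hypothetical and are not taken from the original script:

```autoit
#include <StringConstants.au3> ; defines $SB_UTF8 (= 4)

; Hypothetical helper: encode one IRC line as UTF-8 bytes and send it.
; $iSocket is assumed to be a socket returned by TCPConnect().
Func _IrcSendUtf8($iSocket, $sLine)
    ; IRC lines end with CR LF; encode the terminator together with the text.
    Local $dBytes = StringToBinary($sLine & @CRLF, $SB_UTF8)
    ; Passing binary data means TCPSend transmits exactly these bytes.
    Local $iSent = TCPSend($iSocket, $dBytes)
    If @error Then Return SetError(@error, 0, 0)
    Return $iSent ; number of bytes sent
EndFunc   ;==>_IrcSendUtf8
```

It would be called as, for example, _IrcSendUtf8($iSocket, "PRIVMSG #test :" & $sMessage), where #test and $sMessage stand in for whatever the bot actually uses.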

By the way, don't bother with a BOM. It is fairly useless and even undesired: it won't help you with anything, but it can screw up your text if programs are not properly made to handle it.

AutoIt's own IniRead function will fail if the file has a BOM.
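
To check whether a given file carries a UTF-8 BOM, a sketch along these lines could be used. The helper name _HasUtf8Bom is hypothetical; the check simply looks for the three bytes EF BB BF at the start of the file:

```autoit
#include <FileConstants.au3> ; defines $FO_READ (= 0) and $FO_BINARY (= 16)

; Hypothetical helper: True if the file starts with the UTF-8 BOM (EF BB BF).
Func _HasUtf8Bom($sPath)
    Local $hFile = FileOpen($sPath, $FO_READ + $FO_BINARY)
    If $hFile = -1 Then Return SetError(1, 0, False) ; file could not be opened
    Local $dHead = FileRead($hFile, 3) ; first three bytes, read as binary
    FileClose($hFile)
    ; String() renders binary data as a hex string such as "0xEFBBBF".
    Return String($dHead) = "0xEFBBBF"
EndFunc   ;==>_HasUtf8Bom
```

This only inspects files on disk, such as an .ini file that is about to be read with IniRead.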


Posted

I totally forgot about those functions, very handy indeed!

But why doesn't TCPSend automatically handle that though? Is it just hardcoded to convert everything to ANSI?


Posted (edited)
  On 11/29/2023 at 12:33 PM, Nine said:

As per the help file of TCPSend:

Remarks

If Unicode strings need to be transmitted they must be encoded/decoded with StringToBinary()/BinaryToString().

As @TheDcoder said, I completely forgot about those functions! I'll implement this and see what comes of it. Thanks!
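
A small sketch of what those two functions do at the byte level, using the degree sign from the bot's message. The ANSI result depends on the system code page, so the 0xB0 mentioned in the comment assumes Windows-1252. A lone 0xB0 byte is not valid UTF-8, which would be consistent with some clients showing <?>, although that is an inference and not something confirmed in this thread:

```autoit
#include <StringConstants.au3> ; defines $SB_ANSI (= 1) and $SB_UTF8 (= 4)

; ChrW(0xB0) is the degree sign, built from its code point so the result
; does not depend on how the script file itself is saved.
Local $sChar = ChrW(0xB0)

; One byte in the system ANSI code page (0xB0 on a Windows-1252 system).
ConsoleWrite("ANSI : " & StringToBinary($sChar, $SB_ANSI) & @CRLF)

; Two bytes in UTF-8.
ConsoleWrite("UTF-8: " & StringToBinary($sChar, $SB_UTF8) & @CRLF) ; 0xC2B0

; Decoding the UTF-8 bytes restores the original string.
Local $sBack = BinaryToString(StringToBinary($sChar, $SB_UTF8), $SB_UTF8)
ConsoleWrite("Round trip OK: " & ($sBack == $sChar) & @CRLF)
```

On the receiving side the same idea applies in reverse: read the data in binary mode (TCPRecv with flag 1) and pass the result to BinaryToString with $SB_UTF8.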

  On 11/29/2023 at 2:32 PM, TheDcoder said:

I totally forgot about those functions, very handy indeed!

But why doesn't TCPSend automatically handle that though? Is it just hardcoded to convert everything to ANSI?


I wonder about this too. Python is similar and needs explicit UTF-8 encoding (at least for IRC data sent through sockets). I think (but could be wrong) that it's because some things need to be ANSI to work properly.

Edited by coderusa
Posted (edited)
  On 11/28/2023 at 11:29 PM, water said:

Your screenshot looks like an output to the DOS console.

@water, regarding encoding for CMD batch files: from time to time I have to use TXT files created by an AutoIt script.

With this sample script I think it's quite interesting to compare the results inside a CMD box, especially the different file sizes. The smallest file size with correctly displayed characters is in fact produced by $FO_ANSI = 512 combined with your function:

 

$text="Umlauts, e and szlig: äöüÄÖÜéèß"


$hDisplay=FileOpen("C:\temp\display.cmd",2+8)
FileWriteLine($hDisplay,"@echo off")


Dim $aEnc[9]=[0,32,64,128,256,512,1024,2048,16384] ; encoding, see help file for fileopen()
#cs
    $FO_UNICODE or $FO_UTF16_LE (32) = Use Unicode UTF16 Little Endian reading and writing mode.
    $FO_UTF16_BE (64) = Use Unicode UTF16 Big Endian reading and writing mode.
    $FO_UTF8 (128) = Use Unicode UTF8 (with BOM) reading and writing mode.
    $FO_UTF8_NOBOM (256) = Use Unicode UTF8 (without BOM) reading and writing mode.
    $FO_ANSI (512) = Use ANSI reading and writing mode.
    $FO_UTF16_LE_NOBOM (1024) = Use Unicode UTF16 Little Endian (without BOM) reading and writing mode.
    $FO_UTF16_BE_NOBOM (2048) = Use Unicode UTF16 Big Endian (without BOM) reading and writing mode.
    $FO_FULLFILE_DETECT (16384) = When opening for reading and no BOM is present, use the entire file to determine if it is UTF8 or UTF16. If this is not used then only the initial part of the file (up to 64KB) is checked for performance reasons.
#ce

For $i In $aEnc
    $Out = "C:\temp\test-" & $i & ".txt"
    $hOut = FileOpen($Out, 2 + 8 + $i) ; 2 = overwrite, 8 = create path, $i = encoding flag
    FileWriteLine($hOut, "")
    FileWriteLine($hOut, "Encoding Number = " & $i)
    FileWriteLine($hOut, "Text directly : " & $text)
    FileWriteLine($hOut, "Text converted: " & Ansi2Oem($text))
    FileClose($hOut)
    ; Add commands to the batch file that show this file's content and size
    FileWriteLine($hDisplay, "type " & $Out)
    FileWriteLine($hDisplay, "dir " & $Out)
    FileWriteLine($hDisplay, "echo -----------------")
Next
FileWriteLine($hDisplay, "@echo on")
FileClose($hDisplay)

ConsoleWrite("To display results and file sizes open a CMD box and enter this command:" & @CRLF)
ConsoleWrite("C:\temp\display.cmd" & @CRLF) ; path of the batch file written above




Func Ansi2Oem($text)
    $text = DllCall('user32.dll', 'Int', 'CharToOem', 'str', $text, 'str', '')
    Return $text[2]
EndFunc   ;==>Ansi2Oem

 

image.thumb.png.7285e59b7fe2bee81fbab9f14954297c.png

Edited by rudi
