UTF-8 issue (Or something else?)


coderusa

I'm working on an IRC bot script. It works well, but at over 3000 lines I won't post the entire script yet; perhaps the answer is simple. I've searched all over and can't find the exact documentation I need.

The Issue:

NOTE: I'm almost positive the cause is NOT other users' settings or fonts; read on.

For some people, when the IRC bot sends a message to the chat, it appears like this:

[Image: BadUTF8.png]

When it should appear like this:

[Image: OKUTF8.png]

The people helping me test this bot see mixed results. For some users it looks normal; for others the decorative characters show up as squares or question marks (<?>). On my screen it appears normally, but if one of the users who sees the <?> characters copies and pastes the message into the chat, it also appears with <?> on my screen (pictured below). Also, if I copy the string "-.,¸¸.-·°'`'°·-.,¸¸.-·°'`'°· \_O<   QUACK" directly from the script and paste it into the chat, it appears normally to everyone, which leads me to believe the issue is somewhere in AutoIt.

[Image: AnotherUTF8.png]

The main function where this message originates in the script is in the spoiler below:

Spoiler
; #FUNCTION# ======================================================================================================================
; Name............: SpawnDuck
; Description.....: Spawns a duck if $MaxDucks is below threshold
; Syntax..........: SpawnDuck($socket, $channel)
; Parameters......: $socket - main socket identifier
;                   $channel - channel to spawn duck in
; Return Values...: Returns 1 - Successful spawn
;                   Returns -1 - Maximum number of ducks already spawned
; Author..........: Neo_ (aka coderusa)
; Modified........:
; =================================================================================================================================
Func SpawnDuck($socket, $channel)
    For $d = 1 To $MaxDucks Step 1
        If Eval("Duck" & $d) = "" Then
            If IsDuck() = 0 Then
                Global $ScareFactor = 0
            EndIf
            If $IsGolden = False Then
                $GoldDuck = $GoldenDuck / 100
                If Random(0,1) < $GoldDuck Then
                    Global $IsGolden = "Duck" & $d
                    Assign("Duck" & $d, TimerInit(), 2)
                    IRC_Send_PrivMsg($socket, $channel, "-.,¸¸.-·°'`'°·-.,¸¸.-·°'`'°· \_O<   QUACK   * GOLDEN DUCK *")
                    Return 1
                EndIf
            EndIf
            Assign("Duck" & $d, TimerInit(), 2)
            IRC_Send_PrivMsg($socket, $channel, "-.,¸¸.-·°'`'°·-.,¸¸.-·°'`'°· \_O<   QUACK")
            Return 1
        EndIf
    Next
    Return -1
EndFunc ;===> SpawnDuck

 

The spoiler below contains the main function that sends the message to the IRC server.

Spoiler
; #FUNCTION# ====================================================================================================================
; Name............: IRC_Send_PrivMsg
; Description.....: Send PRIVMSG
; Syntax..........: IRC_Send_PrivMsg($socket, $target, $msg)
; Parameters......: $socket - main socket identifier
;                   $target - #Channel or Username
;                   $msg - PRIVMSG text
; Return Values...: Failure: -1
;                   Success: 1
; Author..........: Neo_ (aka coderusa)
; Modified........:
; ================================================================================================================================
Func IRC_Send_PrivMsg($socket, $target, $msg)
    TCPSend($socket, "PRIVMSG " & $target & " :" & $msg & @CRLF)
    If @error Then
        Cons("Server connection lost.")
        Return -1
    EndIf
    Return 1
EndFunc ;===> IRC_Send_PrivMsg

 

From my research, the encoding can apparently be changed with wrapper directives, but I cannot find exactly how. The help file says the default is UTF-8 without BOM, but that it can be changed to UTF-8 with BOM and other options.

My main question: what is the wrapper directive, or how else do I change this from UTF-8 without BOM to UTF-8 with BOM? Or is my issue something completely different? I'm stumped...


I had a similar issue when the text in a MsgBox containing German umlauts (öäü etc.) was incorrectly displayed.
I solved the problem by setting the encoding in SciTE to UTF-8 with BOM.
In SciTE use: File > Encoding > UTF-8 with BOM


23 minutes ago, water said:

I had a similar issue when the text in a MsgBox containing German umlauts (öäü etc.) was incorrectly displayed.
I solved the problem by setting the encoding in SciTE to UTF-8 with BOM.
In SciTE use: File > Encoding > UTF-8 with BOM

Thanks. Just tried that, and testers report no change 😕


Your screenshot looks like an output to the DOS console.
I needed to use the following function to write the Umlauts to the Console:

; Taken from: https://www.autoitscript.com/forum/topic/201398-german-umlauts-and-console-programs/
Func Ansi2Oem($text)
    $text = DllCall('user32.dll', 'Int', 'CharToOem', 'str', $text, 'str', '')
    Return $text[2]
EndFunc   ;==>Ansi2Oem

 


4 minutes ago, water said:

Your screenshot looks like an output to the DOS console.
I needed to use the following function to write the Umlauts to the Console:

; Taken from: https://www.autoitscript.com/forum/topic/201398-german-umlauts-and-console-programs/
Func Ansi2Oem($text)
    $text = DllCall('user32.dll', 'Int', 'CharToOem', 'str', $text, 'str', '')
    Return $text[2]
EndFunc   ;==>Ansi2Oem

 

I will try this. It's an IRC client: I am using mIRC myself, and the testers are using mIRC, AdiIRC, HexChat and, I think, irssi.


Indeed I have, I couldn't send emoji with my scripts. I suspect that TCPSend is interfering with the encoding. You will have to convert your message string to properly formatted UTF-8 binary yourself and then send that directly with TCPSend.

By the way, don't bother with a BOM. It is fairly useless and often undesired, so it won't help you here, and it can corrupt your text if a program is not written to handle it.

AutoIt's own IniRead function will fail if the file has a BOM.
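
To see why some clients render <?>, it can help to look at the raw bytes. The following is only a diagnostic sketch, not part of the original script: it compares the ANSI and UTF-8 encodings of three of the decorative characters using StringToBinary(). The variable names are made up for illustration, and the ANSI result depends on the system code page.

```autoit
#include <StringConstants.au3>

; Build the test string from code points so the result does not depend on
; how the .au3 file itself is saved: U+00B0 (°), U+00B7 (·), U+00B8 (¸).
Local $sSample = ChrW(0x00B0) & ChrW(0x00B7) & ChrW(0x00B8)

; ANSI: one byte per character in the system code page.
Local $dAnsi = StringToBinary($sSample, $SB_ANSI)
; UTF-8: each of these characters takes two bytes.
Local $dUtf8 = StringToBinary($sSample, $SB_UTF8)

ConsoleWrite("ANSI : " & String($dAnsi) & " (" & BinaryLen($dAnsi) & " bytes)" & @CRLF)
ConsoleWrite("UTF-8: " & String($dUtf8) & " (" & BinaryLen($dUtf8) & " bytes)" & @CRLF)
```

On a Windows-1252 system the ANSI form of ° would be the single byte 0xB0, which on its own is not valid UTF-8. A client that expects UTF-8 has to substitute a replacement character for it, while a client that falls back to a legacy code page shows the intended symbol. That would be consistent with the mixed results the testers report.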


I totally forgot about those functions, very handy indeed!

But why doesn't TCPSend automatically handle that though? Is it just hardcoded to convert everything to ANSI?


On 11/29/2023 at 5:33 AM, Nine said:

As per help file of TCPSend :

 

Remarks

If Unicode strings need to be transmitted they must be encoded/decoded with StringToBinary()/BinaryToString().

As @TheDcoder said, I completely forgot about those functions! I'll implement this and see what comes of it. Thanks!
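
A minimal sketch of what that change could look like, assuming the rest of the bot stays as posted. IRC_Send_PrivMsg_UTF8 and IRC_Recv_UTF8 are hypothetical names, not existing functions; the only built-in calls used are StringToBinary(), BinaryToString(), TCPSend() and TCPRecv().

```autoit
#include <StringConstants.au3>

; Hypothetical UTF-8 variant of IRC_Send_PrivMsg: the whole line is encoded
; to UTF-8 bytes first, and TCPSend() then transmits the binary data as-is.
Func IRC_Send_PrivMsg_UTF8($socket, $target, $msg)
    Local $dLine = StringToBinary("PRIVMSG " & $target & " :" & $msg & @CRLF, $SB_UTF8)
    TCPSend($socket, $dLine)
    If @error Then Return -1
    Return 1
EndFunc   ;==>IRC_Send_PrivMsg_UTF8

; Hypothetical counterpart for incoming data: read raw bytes and decode them
; as UTF-8. Returns "" if nothing was received or the call failed.
Func IRC_Recv_UTF8($socket, $maxlen = 4096)
    Local $dData = TCPRecv($socket, $maxlen, 1) ; 1 = return binary data
    If @error Or BinaryLen($dData) = 0 Then Return ""
    Return BinaryToString($dData, $SB_UTF8)
EndFunc   ;==>IRC_Recv_UTF8
```

In SpawnDuck the call would then be IRC_Send_PrivMsg_UTF8($socket, $channel, "...") with the same arguments as before. One caveat on the receiving side: a multi-byte character can be split across two reads, so a real client would probably buffer the bytes until a complete CRLF-terminated line has arrived before decoding.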

On 11/29/2023 at 7:32 AM, TheDcoder said:

I totally forgot about those functions, very handy indeed!

But why doesn't TCPSend automatically handle that though? Is it just hardcoded to convert everything to ANSI?

I wonder about this too. Python is similar and needs explicit UTF-8 encoding (at least for IRC data sent through sockets). I think (but could be wrong) that it's because some things need to be ANSI to work properly.


On 11/29/2023 at 12:29 AM, water said:

Your screenshot looks like an output to the DOS console.
I needed to use the following function to write the Umlauts to the Console:

; Taken from: https://www.autoitscript.com/forum/topic/201398-german-umlauts-and-console-programs/
Func Ansi2Oem($text)
    $text = DllCall('user32.dll', 'Int', 'CharToOem', 'str', $text, 'str', '')
    Return $text[2]
EndFunc   ;==>Ansi2Oem

 

 

@water, regarding encoding: from time to time I have to use TXT files created by an AutoIt script in CMD batch files.

With this sample script I think it's quite interesting to compare the results inside a CMD box, especially the different file sizes. The smallest size with correctly displayed characters is in fact produced by

$FO_ANSI = 512 with your function:

 

$text="Umlauts, e and szlig: äöüÄÖÜéèß"


$hDisplay=FileOpen("C:\temp\display.cmd",2+8)
FileWriteLine($hDisplay,"@echo off")


Dim $aEnc[9]=[0,32,64,128,256,512,1024,2048,16384] ; encoding, see help file for fileopen()
#cs
    $FO_UNICODE or $FO_UTF16_LE (32) = Use Unicode UTF16 Little Endian reading and writing mode.
    $FO_UTF16_BE (64) = Use Unicode UTF16 Big Endian reading and writing mode.
    $FO_UTF8 (128) = Use Unicode UTF8 (with BOM) reading and writing mode.
    $FO_UTF8_NOBOM (256) = Use Unicode UTF8 (without BOM) reading and writing mode.
    $FO_ANSI (512) = Use ANSI reading and writing mode.
    $FO_UTF16_LE_NOBOM (1024) = Use Unicode UTF16 Little Endian (without BOM) reading and writing mode.
    $FO_UTF16_BE_NOBOM (2048) = Use Unicode UTF16 Big Endian (without BOM) reading and writing mode.
    $FO_FULLFILE_DETECT (16384) = When opening for reading and no BOM is present, use the entire file to determine if it is UTF8 or UTF16. If this is not used then only the initial part of the file (up to 64KB) is checked for performance reasons.
#ce

For $i In $aEnc
    $Out = "C:\temp\test-" & $i & ".txt"
    $hOut = FileOpen($Out, 2 + 8 + $i) ; 2 = overwrite, 8 = create path, $i = encoding flag
    FileWriteLine($hOut, "")
    FileWriteLine($hOut, "Encoding Number = " & $i)
    FileWriteLine($hOut, "Text directly : " & $text)
    FileWriteLine($hOut, "Text converted: " & Ansi2Oem($text))
    FileClose($hOut)
    ; Add commands to the batch file that show the content and size of each test file
    FileWriteLine($hDisplay, "type " & $Out)
    FileWriteLine($hDisplay, "dir " & $Out)
    FileWriteLine($hDisplay, "echo -----------------")
Next
FileWriteLine($hDisplay, "@echo on")
FileClose($hDisplay)

ConsoleWrite("To display results and file sizes open a CMD box and enter this command:" & @CRLF)
ConsoleWrite("C:\temp\display.cmd" & @CRLF) ; the batch file written above




Func Ansi2Oem($text)
    $text = DllCall('user32.dll', 'Int', 'CharToOem', 'str', $text, 'str', '')
    Return $text[2]
EndFunc   ;==>Ansi2Oem

 

image.thumb.png.7285e59b7fe2bee81fbab9f14954297c.png
