Briksins

Characters Unicode Problem

13 posts in this topic

Hello

I'm writing an automation test script for our application and am having difficulty with the accented Irish characters:

 

"á", "Á", "é", "É", "í", "Í", "ó", "Ó", "ú", "Ú"

 

Unfortunately, the editor doesn't let me define them in the code, so I decided to use ASCII numbers to define the characters I need.

Over here I found all the ASCII codes I need, and in this instruction I found how to use them.

So, for example, for the character "á", whose code is 225 according to the table, I'm doing this:

$irishCharA = Chr(225)
$path = "some irish sentence with speci" & $irishCharA & "l character"

So I am expecting the path value to be

 

some irish sentence with speciál character

 

However, I'm not getting it. When I debugged the code, I found that it is not the "á" character but an unrecognised diamond shape with a question mark.

Then I tried doing:

$path = "some irish sentence with speci{Asc 225}l character"

but it produces the literal string, braces included.

Finally, I decided to use a file. I store all the values as key/value pairs and save the file with UTF8 encoding. Here is the content:

 

a=á

A=Á
e=é
E=É
i=í
I=Í
o=ó
O=Ó
u=ú
U=Ú

 

Then I read it as UTF8 with BOM (128 stands for UTF8 according to this manual):

Func readConfigFromPath($path)
    ; 128 = $FO_UTF8: open for reading as UTF8 with BOM
    Local $openedFile = FileOpen($path, 128)
    Local $guiCfg[0][2]
    If $openedFile = -1 Then Return $guiCfg ; FileOpen returns -1 on failure

    Local $line, $props
    While 1
        $line = FileReadLine($openedFile) ; no line number: reads the next line
        If @error Then ExitLoop ; end of file or read error; blank lines no longer stop the loop
        If StringInStr($line, "=") Then
            $props = StringSplit($line, "=")
            ReDim $guiCfg[UBound($guiCfg) + 1][2]
            $guiCfg[UBound($guiCfg) - 1][0] = $props[1]
            $guiCfg[UBound($guiCfg) - 1][1] = $props[2]
        EndIf
    WEnd
    FileClose($openedFile) ; close the handle, not the path

    Return $guiCfg
EndFunc

However, that reads every special character as a plain character, so that Irish "á" becomes plain "a" and Irish "ú" becomes plain "u".

So my key/value pairs look like this: key "a" and value also "a", when the value should be "á".

How do I get AutoIt to accept non-standard characters?

#2 ·  Posted (edited)

Convert your script encoding and console output to UTF8.

For non-ANSI Unicode characters to display properly in the SciTE console, AFAIK you need to switch to UTF8 in SciTEUser.properties by adding these lines:

code.page=65001
output.code.page=65001
 

Then this works:

ConsoleWrite("ÀÐØÞßƵɄʬЖЗДلنرشحჱჶẶỘ⅝⇈≽⋛⍓⍫♬✠⬔" & @LF) ; display ANSI-fied characters, mostly ?

_ConsoleWrite("ÀÐØÞßƵɄʬЖЗДلنرشحჱჶẶỘ⅝⇈≽⋛⍓⍫♬✠⬔" & @LF) ; works fine

Func _ConsoleWrite($s)
    ConsoleWrite(BinaryToString(StringToBinary($s, 4), 1))
EndFunc

You don't need any special setting for Windows controls:

MsgBox(0, "", "ÀÐØÞßƵɄʬЖЗДلنرشحჱჶẶỘ⅝⇈≽⋛⍓⍫♬✠⬔")
Edited by jchd

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


#3 ·  Posted (edited)

Make sure you have the right font. Not all fonts contain all characters, and some fonts are used to display other kinds of symbols. If the font is Wingdings, you will get some kind of up arrow for Chr(225). Use GUICtrlSetFont, for example, to select the font you want first. To help you pick a font, open Notepad, select a font, then hold the Alt key while typing "0225" on the numeric keypad (make sure Num Lock is on; the leading zero selects the ANSI code rather than the OEM code). Depending on the font, you may get some squiggle or "á".

Hmm, it looks like jchd answered at the same time.

Edited by rodent1


#4 ·  Posted (edited)

Thank you guys, that helped. I didn't expect that I'd need to change the encoding of the script itself.

I also had Russian installed in the OS as a second language, so character 225 was changed from "á" to Russian "б", and I had to move my dev environment to a new, clean VM.

Thank you one more time

Edited by Briksins


AutoIt native strings are Unicode, and with a UTF8 script source you don't have to do anything else (except transform strings to UTF8 for SciTE console output, as I said). You can have any language setting in Windows independently: that only affects which ANSI codepage is used, but you're only concerned with Unicode, so this doesn't affect your scripts.
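
As a minimal sketch of that point: if you still want to build the characters from numbers rather than type them, ChrW takes a Unicode code point, so the result should not depend on the Windows ANSI codepage the way Chr(225) did earlier in this thread. The variable names are only illustrative.

```autoit
; Sketch: build an accented character from its Unicode code point.
; ChrW() works on Unicode code points; Chr() returns an ANSI character,
; which is why 225 turned into a different letter on the Russian setup.
Local $sIrishA = ChrW(0xE1) ; U+00E1, "á"
Local $sPath = "some irish sentence with speci" & $sIrishA & "l character"

; AscW() returns the Unicode code point of the first character.
ConsoleWrite(AscW($sIrishA) & @LF) ; 225

; Windows controls display the native Unicode string without conversion.
MsgBox(0, "ChrW demo", $sPath)
```

With the script saved as UTF8 you can of course just type "á" directly; ChrW is only useful when the editor or keyboard makes that awkward.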



#6 ·  Posted (edited)

I am having a problem creating a log file that contains non-ANSI characters. I used FileOpen with the $FO_UTF8_NOBOM parameter and passed the handle to FileWriteLine, as suggested in the manual (https://www.autoitscript.com/autoit3/docs/intro/unicode.htm)

Here's my code:

#include <File.au3>
Local $hFileOpen = FileOpen($CmdLine[1], $FO_UTF8_NOBOM)
Local $sFileContent = FileRead($hFileOpen)
$LogReport = $CmdLine[2]
If _FileCreate($LogReport) = 0 Then
  MsgBox(0, 'Permission Denied', 'Could not create log file')
  Exit
EndIf

Local $hLogFileHandle = FileOpen($LogReport, $FO_UTF8_NOBOM)

; list of valid characters to be used in Regular Expression Pattern
$AccentedChars = "âãäåæçèéêëìíîïàñòóôõöøùúûüýÿāĂ㥹ĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥħĨĩĪīĬĭ"
$PunctuationMarks = "“”‘’'"";&_:–,\.\?\!"
$Braces = "\(\)\[\]<>"
$MathOperators = "/\-=$"
$OtherValidCharacters = "©"

; Look for non-ANSI character that is not on the list above
$asResult = StringRegExp($sFileContent, '([^0-9A-Za-z\s' & $AccentedChars & $PunctuationMarks & $Braces & $MathOperators & $OtherValidCharacters & '])++', 3)

$ListOfInvalidChars = ""
For $i = 0 to UBound($asResult)-1
  If StringInStr($ListOfInvalidChars, $asResult[$i]) = 0 Then $ListOfInvalidChars = $ListOfInvalidChars & $asResult[$i]
Next
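If $ListOfInvalidChars <> "" Then
  ; $ErrorMessage is not defined in the posted snippet; assumed here to be built from the list of characters found
  Local $ErrorMessage = "Invalid characters found: " & $ListOfInvalidChars
  ; write the list of characters found into the log file, hopefully as UTF8 so I can actually see what the character was and not just a ? character
  MsgBox(0, "Found", $ErrorMessage)
  FileWriteLine($hLogFileHandle, $ErrorMessage)
Else
  MsgBox(0, "Result", "No error")
EndIf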

FileClose($hLogFileHandle)

In the test file I am using, I placed a non-ANSI character. I know the script detected it because of the MsgBox shown before writing to the log file.

When the script finishes, the log file contains nothing.

I thought it might be the AutoIt version. I am using 3.3.10.2. Unicode support starts with version 3.2.4.0, so my version should be fine, right?

Any help would be appreciated.

Edited by leojarrabi


I'd advise against no-BOM files, unless you're sure that the application(s) reading them are aware of the encoding. In the general case BOM files are more robust to encoding misinterpretation, but not all will accept BOMs (and Unicode is as old as 1991!).



You need to set either of the write modes too or you will only open the file in read mode.

 

$FO_UTF8_NOBOM (256) = Use Unicode UTF8 (without BOM) reading and writing mode.

I assume it means that the file is in write mode? Or am I wrong? So how do I open the file in UTF8 mode so I could write to it as well in UTF8 mode?


#10 ·  Posted (edited)

I'd advise against no-BOM files, unless you're sure that the application(s) reading them are aware of the encoding. In the general case BOM files are more robust to encoding misinterpretation, but not all will accept BOMs (and Unicode is as old as 1991!).

 

I will also use the FileGetEncoding function to make sure that it returns 256 before the script actually processes the file. But for now, I just need to list the characters that are not in my "valid list".
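
A rough sketch of such a check, assuming FileGetEncoding returns values comparable to the FileOpen encoding flags and -1 on failure, as the help file describes. The input path is hypothetical (the script above takes it from $CmdLine[1]), and accepting both UTF8 variants is a choice made for this example, not a requirement.

```autoit
#include <FileConstants.au3>

; Hypothetical input path, used only for this sketch.
Local $sInput = @ScriptDir & "\input.txt"

; Ask AutoIt what encoding it detects before processing the file.
Local $iEncoding = FileGetEncoding($sInput)

If $iEncoding = -1 Then
    MsgBox(0, "Error", "Could not read " & $sInput)
ElseIf $iEncoding = $FO_UTF8 Or $iEncoding = $FO_UTF8_NOBOM Then
    ; Open for reading with the detected encoding flag.
    Local $hFile = FileOpen($sInput, BitOR($FO_READ, $iEncoding))
    Local $sContent = FileRead($hFile)
    FileClose($hFile)
    MsgBox(0, "OK", "Read " & StringLen($sContent) & " characters as UTF8")
Else
    MsgBox(0, "Skipped", "Unexpected encoding value: " & $iEncoding)
EndIf
```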

Edited by leojarrabi


I still haven't figured out why I can't write to the file in UTF8 mode. When I open the file in append mode, the program can write to the file, but the characters only show up as question marks.

Is this a bug in AutoIt?


Read the help of FileOpen about its flags.



Read the help of FileOpen about its flags.

 

I see, the flags can be combined. I should have declared it as

Local $hLogFileinUTF8 = FileOpen($LogReport, $FO_APPEND + $FO_UTF8_NOBOM)

Thanks!
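
For anyone landing here later, a small self-contained sketch of the same fix: one write-mode flag combined with one encoding flag. The log path and message text are made up for illustration, and $FO_UTF8 (with BOM) is used instead of $FO_UTF8_NOBOM only because of the earlier advice about BOM files.

```autoit
#include <FileConstants.au3>

; Hypothetical log path, used only for this sketch.
Local $sLogPath = @TempDir & "\unicode_log_sketch.txt"

; Combine one write mode with one encoding flag. Without $FO_APPEND or
; $FO_OVERWRITE the file is opened for reading and writes fail.
Local $hLog = FileOpen($sLogPath, BitOR($FO_OVERWRITE, $FO_UTF8))
If $hLog = -1 Then
    MsgBox(0, "Error", "Could not open " & $sLogPath)
    Exit 1
EndIf

; FileWriteLine returns 0 on failure, so a read-only handle is caught here.
If Not FileWriteLine($hLog, "Invalid characters found: " & ChrW(0x2668) & ChrW(0x0416)) Then
    MsgBox(0, "Error", "Write failed; check the FileOpen mode")
EndIf
FileClose($hLog)

; Read the file back as UTF8 to confirm the characters round-trip.
Local $hIn = FileOpen($sLogPath, BitOR($FO_READ, $FO_UTF8))
MsgBox(0, "Log content", FileRead($hIn))
FileClose($hIn)
```

Both write modes create the file if it does not exist, so a separate _FileCreate call should not be needed in this arrangement.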
