dleigh

French Accented Characters Getting Changed - how to prevent?

5 posts in this topic

I've been looking through the help file and the forums, but I can't find a solution to my problem. :D

I'm parsing through files that contain French song lyrics. The problem is that my accented characters like:

é, è, ç, ê, etc.

are getting changed while my script runs.

For instance:

é becomes Ã©

À becomes Ã

I'm in France, but being American, I run a machine with all settings in US English. While I'm somewhat familiar with code pages in web pages and with the "setlocale" command in PHP, I'm really at a loss as to how to keep my accents from getting hammered as soon as the text is stored in an AutoIt variable.
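These substitutions match what happens when UTF-8 text is decoded as ANSI (Windows-1252): é is stored in UTF-8 as the two bytes 0xC3 0xA9, and an ANSI reader turns each byte into a separate character (Ã and ©). The sketch below reproduces the effect. It assumes a Unicode build of AutoIt (a 3.3.x release, which has ChrW() and the StringToBinary()/BinaryToString() encoding flags) and a Western European Windows-1252 system code page.

```autoit
; Reproduce the substitution: encode é as UTF-8, then decode those bytes as ANSI.
; Assumes Unicode AutoIt 3.3.x and a Windows-1252 ANSI code page.
Local $sOriginal = ChrW(233)                  ; é (U+00E9)
Local $bUtf8 = StringToBinary($sOriginal, 4)  ; flag 4 = encode as UTF-8 -> 0xC3A9
Local $sMangled = BinaryToString($bUtf8, 1)   ; flag 1 = decode as ANSI -> 2 characters

ConsoleWrite("UTF-8 bytes:  " & $bUtf8 & @CRLF)
ConsoleWrite("Read as ANSI: " & $sMangled & " (" & StringLen($sMangled) & " characters)" & @CRLF)
```

On such a system the second line should show two characters where the original text had one.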

I've tried the OEM2ANSI function that was posted in the forums, but it doesn't fix the problem; it just changes the errors! :P

So, if any of you French people on the forum have an answer, SVP donnez-le-moi! (please give it to me)... or at least some hints about where to keep looking.

Thanks (and MERCI!) for any and all help.





Hi,

In this case it's better to work with the function Chr(). You'll find all characters in the appendix of the help file; e.g. "é" is Chr(233).

You can also use Asc("é") and you'll get the result 233. I hope this was your problem. Salut, Johannes


Johannes Lorenz, Bensheim, Germany


It isn't exactly what I was looking for, but it gave me more ideas of what to look at. :P Looking at it more closely, I see from the appendix that the character I expect to see:

À

is listed as character code 192. Somewhere between reading the file and using simple string functions to extract substrings from it, Chr(192) gets changed to:

Ã

which is Chr(195) & Chr(128). Here's the code that produces this error. Does anyone know of specific functions (file reading and/or string manipulation, e.g. FileRead, StringMid, etc.) that have inherent problems with extended ASCII characters? That is really what this problem is about: the "mishandling" of the extended ASCII characters used in French.
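To see which case a string is in, it helps to print the code of every character instead of looking at the rendered text. A minimal diagnostic sketch; the helper name _DumpCodes is hypothetical, and it assumes Unicode AutoIt 3.3.x on a Windows-1252 system, where Asc() returns a character's ANSI code.

```autoit
; Hypothetical diagnostic helper: list the ANSI code of every character in a
; string, so you can see whether À arrived as one character or as two.
; Assumes Unicode AutoIt 3.3.x on a Windows-1252 system.
Func _DumpCodes($sText)
    Local $sOut = ""
    For $i = 1 To StringLen($sText)
        $sOut &= Asc(StringMid($sText, $i, 1)) & " "
    Next
    Return StringStripWS($sOut, 2) ; 2 = strip trailing whitespace
EndFunc

; A correctly decoded À versus the same letter whose UTF-8 bytes were read as ANSI
Local $sGood = ChrW(192)
Local $sBad = BinaryToString(StringToBinary($sGood, 4), 1)
ConsoleWrite("Good: " & _DumpCodes($sGood) & @CRLF) ; 192
ConsoleWrite("Bad:  " & _DumpCodes($sBad) & @CRLF)  ; 195 128 on Windows-1252
```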

```autoit
; Includes & declarations
#include <file.au3>
#include <array.au3>
#include <GUIConstants.au3>

Global $file, $FileSize, $FileTime, $FirstLine, $search, $SongFile, $SongFileHandle, $SongLyrics, $SongTitle, $VerseMark

; Set the OpenSong song directory
FileChangeDir("C:\Documents and Settings\David\My Documents\OpenSong\Songs")

; Prime the file read pump
$search = FileFindFirstFile("*.*")

; Check if the search was successful
If $search = -1 Then
    MsgBox(0, "Error", "No files/directories matched the search pattern")
    Exit
EndIf

While 1
    $file = FileFindNextFile($search)
    If @error Then ExitLoop

    If $file <> "_cache" Then
        $FileSize = FileGetSize($file)
        $FileTime = FileGetTime($file, 0, 1)
        ParseSongFile()
        MsgBox(0, "", "Song File: " & $file & @CRLF & _
                " Size: " & $FileSize & @CRLF & _
                " Date: " & $FileTime & @CRLF & _
                " Title: " & $SongTitle & @CRLF & _
                " 1st line: " & $FirstLine & @CRLF & _
                " lyrics: " & $SongLyrics)
    EndIf
WEnd

; Close the search handle
FileClose($search)
Exit

; Read one song file and extract its title, lyrics and first lyric line
Func ParseSongFile()
    ; Clear values left over from the previous file
    $SongTitle = ""
    $SongLyrics = ""
    $FirstLine = ""

    ; Mode 0 = read; the bytes are interpreted with the ANSI code page
    $SongFileHandle = FileOpen(@WorkingDir & "\" & $file, 0)
    If $SongFileHandle = -1 Then Return ; file could not be opened
    $SongFile = FileRead($SongFileHandle)
    FileClose($SongFileHandle) ; release the file handle

    ; Text between <title> and </title> (7 = length of "<title>")
    $SongTitle = StringMid($SongFile, StringInStr($SongFile, "<title>") + 7, _
            StringInStr($SongFile, "</title>") - StringInStr($SongFile, "<title>") - 7)

    ; Text between <lyrics> and </lyrics> (8 = length of "<lyrics>")
    $SongLyrics = StringMid($SongFile, StringInStr($SongFile, "<lyrics>") + 8, _
            StringInStr($SongFile, "</lyrics>") - StringInStr($SongFile, "<lyrics>") - 8)

    ; Remove verse markers [V1] .. [V15] that sit on their own line
    For $i = 1 To 15
        $VerseMark = "[V" & $i & "]"
        $SongLyrics = StringReplace($SongLyrics, $VerseMark & @LF, "")
    Next

    ; Collapse double line feeds, then take the first lyric line
    $SongLyrics = StringReplace($SongLyrics, @LF & @LF, @LF)
    $FirstLine = StringLeft($SongLyrics, StringInStr($SongLyrics, @LF) - 1)
EndFunc
```
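One likely fix, assuming a Unicode AutoIt 3.3.x release and OpenSong files saved as UTF-8 without a byte-order mark: mode 0 in FileOpen() reads the file as ANSI, whereas mode 256 ($FO_UTF8_NOBOM in recent include files) tells AutoIt to decode it as UTF-8. A sketch of a hypothetical replacement for the FileOpen()/FileRead() pair in ParseSongFile(); the helper name _ReadUtf8File is made up for illustration.

```autoit
; Hypothetical helper: read a whole file as UTF-8 text.
; Assumes Unicode AutoIt 3.3.x, where FileOpen mode 256 reads the file as
; UTF-8 without a BOM; mode 0 (as in ParseSongFile) decodes the bytes as ANSI.
Func _ReadUtf8File($sPath)
    Local $hFile = FileOpen($sPath, 256)
    If $hFile = -1 Then Return SetError(1, 0, "") ; file could not be opened
    Local $sText = FileRead($hFile)
    FileClose($hFile)
    Return $sText
EndFunc
```

With it, the read step would become `$SongFile = _ReadUtf8File(@WorkingDir & "\" & $file)`, and the StringMid()/StringReplace() calls can stay as they are, since they then work on correctly decoded characters.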



Just some additional stuff I've discovered since my last post (still seeking the final solution though! ;c)):

I saw that my character encoding issues are already there right after FileRead. I searched the AutoIt help some more and didn't find any way to read or open a file so as to handle different encodings.
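In Unicode AutoIt (3.3.x) the encoding is chosen when the file is opened rather than by FileRead() itself. An alternative that makes the decoding explicit is to open the file in binary mode (mode 16), so FileRead() returns the raw bytes, and then decode them with BinaryToString() flag 4 (UTF-8). A sketch; the helper name _ReadFileAsUtf8 is hypothetical.

```autoit
; Hypothetical helper: read the raw bytes of a file, then decode them as UTF-8.
; Assumes Unicode AutoIt 3.3.x (FileOpen mode 16 = binary, BinaryToString flag 4 = UTF-8).
Func _ReadFileAsUtf8($sPath)
    Local $hFile = FileOpen($sPath, 16)       ; 16 = binary mode: FileRead returns bytes
    If $hFile = -1 Then Return SetError(1, 0, "")
    Local $bData = FileRead($hFile)
    FileClose($hFile)
    Local $sText = BinaryToString($bData, 4)  ; 4 = interpret the bytes as UTF-8
    ; Drop a UTF-8 byte-order mark if it came through as U+FEFF
    If StringLeft($sText, 1) == ChrW(0xFEFF) Then $sText = StringTrimLeft($sText, 1)
    Return $sText
EndFunc
```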

I had been seeing the "correct" characters when viewing the file in Notepad++. Knowing that Notepad++ correctly handles some files that other text editors cannot, I opened the file in WordPad. Sure enough, the "normal" accents I saw in Notepad++ were replaced with bizarre two-character sequences. The file is an XML file, and its declaration says it uses UTF-8 encoding. Thinking that Notepad++ might be reading that declaration and adjusting how it displays the file, I copied just the "offending" lines into a file with a new name to see how Notepad++ would react without the XML encoding declaration. I did the same thing using WordPad, to make sure Notepad++ wasn't saving the file in some special way. In all cases, WordPad showed the "incorrect" characters and Notepad++ showed the correct ones.
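Because these files declare their encoding in the XML header, a script does not have to guess: it can read the declaration and choose the decoding to match. A sketch of the first half, assuming Unicode AutoIt 3.3.x and the usual `<?xml ... encoding="..."?>` form; _XmlDeclaredEncoding is a hypothetical name. Only the first 200 bytes are read and decoded as ANSI, which is enough because the declaration itself is plain ASCII.

```autoit
; Hypothetical helper: return the encoding named in an XML declaration,
; e.g. <?xml version="1.0" encoding="UTF-8"?>, or "" if none is found.
; Assumes Unicode AutoIt 3.3.x.
Func _XmlDeclaredEncoding($sPath)
    Local $hFile = FileOpen($sPath, 16)                      ; 16 = binary read
    If $hFile = -1 Then Return SetError(1, 0, "")
    Local $sHead = BinaryToString(FileRead($hFile, 200), 1)  ; first bytes as ANSI text
    FileClose($hFile)
    ; Capture the value of encoding="..." (or encoding='...') inside <?xml ... ?>
    Local $aMatch = StringRegExp($sHead, '(?i)<\?xml[^>]*encoding\s*=\s*["'']([^"'']+)["'']', 1)
    If @error Then Return ""
    Return $aMatch[0]
EndFunc
```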

If I look at the file's hex dump in Filealyser, I see the hex codes of the two bytes that produce the "incorrect" result, and the text column on the right shows the "incorrect" characters.
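The same bytes can be inspected from inside AutoIt by opening the file in binary mode, which performs no decoding. In UTF-8 an À is stored as C3 80 and an é as C3 A9, which is what such a dump of these files would be expected to show. A sketch assuming Unicode AutoIt 3.3.x; _HexHead is a hypothetical helper name.

```autoit
; Hypothetical helper: return the first $iCount bytes of a file as hex text,
; similar to the left-hand column of a hex viewer.
; Assumes Unicode AutoIt 3.3.x (FileOpen mode 16 = binary read).
Func _HexHead($sPath, $iCount = 64)
    Local $hFile = FileOpen($sPath, 16)
    If $hFile = -1 Then Return SetError(1, 0, "")
    Local $bHead = FileRead($hFile, $iCount)  ; Binary data, no decoding
    FileClose($hFile)
    Return String($bHead)                     ; a file starting with <?xml gives 0x3C3F786D6C...
EndFunc
```

For example, `ConsoleWrite(_HexHead("song.xml") & @CRLF)`, where song.xml stands in for one of the song files.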

So these files apparently use some special encoding that needs special treatment, which some applications like Notepad++ know how to apply. Since Notepad++, like SciTE (the AutoIt editor), is built on the Scintilla editing component, I checked in SciTE, and it displays the characters "correctly" as well. Does anyone know how to make AutoIt do the same processing on these special characters, so that I get, say, Chr(192) instead of Chr(195) & Chr(128)? Or must I simply do a global replace for all the accented characters in French?
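A replace table for every accent should not be necessary. Text that was decoded once as Windows-1252 can usually be repaired in one step: encode it back to ANSI bytes, which recovers the original UTF-8 bytes, and decode those as UTF-8. A sketch assuming Unicode AutoIt 3.3.x and a Windows-1252 code page; _FixUtf8ReadAsAnsi is a hypothetical name. It may not round-trip the few byte values Windows-1252 leaves undefined, so reading the file with the right encoding in the first place is the more robust option.

```autoit
; Hypothetical helper: undo "UTF-8 bytes decoded as ANSI", e.g. "Ã©" -> "é".
; Assumes the text was decoded exactly once with the system ANSI code page.
Func _FixUtf8ReadAsAnsi($sMangled)
    Local $bBytes = StringToBinary($sMangled, 1) ; 1 = ANSI: recovers the original bytes
    Return BinaryToString($bBytes, 4)            ; 4 = decode those bytes as UTF-8
EndFunc

; Example: simulate the damage to "Àé", then repair it
Local $sBroken = BinaryToString(StringToBinary(ChrW(192) & ChrW(233), 4), 1)
Local $sFixed = _FixUtf8ReadAsAnsi($sBroken)
ConsoleWrite(StringLen($sBroken) & " characters before, " & StringLen($sFixed) & " after" & @CRLF)
```

On a Windows-1252 system this should report 4 characters before the repair and 2 after.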

Thanks for any help, and especially for following this convoluted explanation! ;c)



I know it's an old topic, but I have the same problem. When working with XML or any web page, French characters do not appear correctly...

Any ideas?
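Most likely the same cause applies to web pages and downloaded XML: the bytes are UTF-8 (or whatever charset the page declares) and must be decoded as such rather than as ANSI. A sketch assuming Unicode AutoIt 3.3.x, where InetRead() returns the downloaded data as Binary, and a page served as UTF-8; the URL is a placeholder.

```autoit
; Download a page as raw bytes and decode them as UTF-8 instead of ANSI.
; Assumes Unicode AutoIt 3.3.x; the URL below is a placeholder.
Local $bPage = InetRead("http://www.example.com/", 1) ; 1 = force a reload, bypass the cache
If @error Then
    ConsoleWrite("Download failed" & @CRLF)
Else
    Local $sPage = BinaryToString($bPage, 4)            ; 4 = UTF-8
    ConsoleWrite(StringLeft($sPage, 200) & @CRLF)
EndIf
```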
