dleigh Posted January 8, 2007
I've been looking through the help and the forums, but I can't find a solution to my problem. I'm parsing files that contain French song lyrics. The problem is that accented characters such as é, è, ç and ê are getting changed in the course of my script's execution. For instance, é becomes Ã© and À becomes Ã€.
I'm in France but, being American, I run a box with all the settings in American English. While I have some familiarity with code pages in web pages and with the "setlocale" command in PHP, I'm at a loss to figure out how to keep my accents from getting hammered as soon as the text lives in an AutoIt variable. I've tried the OEM2ANSI function that was posted in the forums, but that doesn't fix the problem; it just changes the errors!
So, if any of you French people on the forum have an answer, SVP donnez le moi! (please give it to me)... or at least some hints as to where to keep looking. Thanks (and MERCI!) for any and all help.
jlorenz1 Posted January 8, 2007
Hi, in this case it's better to work with the function Chr(). You'll find all the characters in the appendix of the help file; e.g. "é" is Chr(233). You can also use Asc("é") and you'll get the result 233. I hope this was your problem.
Salut, Johannes
dleigh Posted January 9, 2007
It isn't exactly what I was looking for, but it gave me additional ideas of what to look at. Looking at it more fully, I see from the appendix that the character I'm expecting, À, is listed as ASCII 192. Somewhere between reading the file and using simple string functions to extract substrings from it, Chr(192) gets changed to a two-character sequence starting with Ã, which is Chr(195) followed by Chr(128).
Here's the code that's generating this error. Does anyone know of specific functions (file reading and/or string manipulation, e.g. FileRead, StringMid, etc.) that inherently struggle with extended ASCII characters? That is what this problem is really about: the "mishandling" of extended ASCII characters as found in the French language.
CODE
    ; Includes & declares
    #include <file.au3>
    #include <array.au3>
    #include <GUIConstants.au3>
    Dim $file, $FileSize, $FileTime, $FirstLine, $search, $SongFile, $SongFileHandle, $SongLyrics, $SongTitle, $VerseMark

    ; Set the OpenSong song directory
    FileChangeDir("C:\Documents and Settings\David\My Documents\OpenSong\Songs")

    ; Prime the file read pump
    $search = FileFindFirstFile("*.*")

    ; Check if the search was successful
    If $search = -1 Then
        MsgBox(0, "Error", "No files/directories matched the search pattern")
        Exit
    EndIf

    While 1
        $file = FileFindNextFile($search)
        If @error Then ExitLoop
        If $file <> "_cache" Then
            $FileSize = FileGetSize($file)
            $FileTime = FileGetTime($file, 0, 1)
            ParseSongFile()
            MsgBox(0, "", "Song File: " & $file & @CRLF & _
                    " Size: " & $FileSize & @CRLF & _
                    " Date: " & $FileTime & @CRLF & _
                    " Title: " & $SongTitle & @CRLF & _
                    " 1st line: " & $FirstLine & @CRLF & _
                    " lyrics: " & $SongLyrics)
        EndIf
    WEnd

    ; Close the search handle
    FileClose($search)
    Exit

    ; Read the whole song file into a string and pull out title, lyrics and first line
    Func ParseSongFile()
        ; Mode 0 = open for reading
        $SongFileHandle = FileOpen(@WorkingDir & "\" & $file, 0)
        $SongFile = FileRead($SongFileHandle)
        ; Release the file handle once the content is in memory
        FileClose($SongFileHandle)
        ; Text between <title> and </title>
        $SongTitle = StringMid($SongFile, StringInStr($SongFile, "<title>") + 7, StringInStr($SongFile, "</title>") - StringInStr($SongFile, "<title>") - 7)
        ; Text between <lyrics> and </lyrics>
        $SongLyrics = StringMid($SongFile, StringInStr($SongFile, "<lyrics>") + 8, StringInStr($SongFile, "</lyrics>") - StringInStr($SongFile, "<lyrics>") - 8)
        ; Strip the verse markers [V1] .. [V15]
        For $i = 1 To 15
            $VerseMark = "[V" & $i & "]"
            $SongLyrics = StringReplace($SongLyrics, $VerseMark & @LF, "")
        Next
        ; Collapse blank lines, then take the first line of the lyrics
        $SongLyrics = StringReplace($SongLyrics, @LF & @LF, @LF)
        $FirstLine = StringLeft($SongLyrics, StringInStr($SongLyrics, @LF) - 1)
    EndFunc
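
For anyone following along: the change from 192 to 195, 128 is what you would expect if the song files are stored as UTF-8 and read back one byte per character. That is an assumption about these files, not something this post establishes. A minimal sketch of the arithmetic, assuming an AutoIt release whose StringToBinary and BinaryToString accept the Unicode flags used below:

```autoit
; Sketch only: assumes UTF-8 song files and an AutoIt version that
; supports the encoding flags of StringToBinary/BinaryToString.

; Build the character from its code point so the result does not
; depend on how this script file itself is saved.
Local $sChar = ChrW(192) ; À

; Flag 4 = UTF-8: encode the character the way a UTF-8 file stores it.
Local $dUtf8 = StringToBinary($sChar, 4)
ConsoleWrite("UTF-8 bytes: " & $dUtf8 & @CRLF)

; Flag 1 = ANSI: decode the same bytes one byte per character,
; which is what an ANSI read of the file amounts to.
Local $sMisread = BinaryToString($dUtf8, 1)
ConsoleWrite("Characters after misread: " & StringLen($sMisread) & @CRLF)
ConsoleWrite("First character code: " & AscW(StringLeft($sMisread, 1)) & @CRLF)
```

In UTF-8, À is stored as the two bytes C3 80 (decimal 195 and 128), so a single accented letter turns into two characters when the bytes are interpreted as ANSI.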
dleigh Posted January 9, 2007
Just some additional things I've discovered since my last post (still seeking the final solution, though! ;c)):
I saw that my character encoding issues were there from the moment of FileRead. I searched some more in the AutoIt help and didn't find any way to read or open a file so as to deal with different encodings.
I'd been seeing the "correct" characters when viewing the file with Notepad++. Knowing that I'd seen Notepad++ correctly handle files that other text editors could not, I thought I'd open the file in WordPad. Sure enough, the "normal" accents that I saw in Notepad++ were replaced with bizarre two-character sequences.
The file is an XML file, and its declaration says that it uses UTF-8 encoding. Thinking that perhaps Notepad++ was reading that declaration and changing how it displays the file, I copied just the "offending" lines into a new file and saved it under a new name, to see how Notepad++ would react without the help of the XML encoding declaration. I did the same thing using WordPad, to make sure that Notepad++ wasn't doing some special sort of saving. In all cases, WordPad showed the "incorrect" characters and Notepad++ showed the correct ones.
If I look at the file in FileAlyzer's hex dump, I see the hex codes of the two characters that produce the "incorrect" results, and the text column at the right shows the "incorrect" characters. So these files have some sort of special encoding that needs special treatment, which some applications like Notepad++ apparently know how to apply. Since Notepad++ is built on the same Scintilla editing component as SciTE, the AutoIt editor, I checked in SciTE and it works "correctly" as well.
Does anyone know how to let AutoIt do the same processing on these special characters, to arrive at, say, Chr(192) instead of Chr(195) followed by Chr(128)? Or must I simply do a global replace for all the accent possibilities in the French language?
Thanks for any help, and especially for following this convoluted explanation! ;c)
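
One way this could be approached without a global replace, sketched under assumptions: current AutoIt releases let FileOpen decode UTF-8 while reading, and BinaryToString can decode raw bytes explicitly. The constants below come from FileConstants.au3 in recent AutoIt versions and may not exist in the release this thread was written against. `song.xml` is a hypothetical file name used only for illustration.

```autoit
#include <FileConstants.au3>

; Sketch only: $sPath is a hypothetical file name, and the file is
; assumed to be UTF-8, as its XML declaration states.
Local $sPath = "song.xml"

; Option 1: ask FileOpen to decode UTF-8 while reading.
; $FO_UTF8_NOBOM is chosen on the assumption that the file has no byte order mark.
Local $hFile = FileOpen($sPath, $FO_READ + $FO_UTF8_NOBOM)
If $hFile = -1 Then
    MsgBox(0, "Error", "Could not open " & $sPath)
    Exit
EndIf
Local $sText = FileRead($hFile)
FileClose($hFile)

; Option 2: read the raw bytes and decode them explicitly.
$hFile = FileOpen($sPath, $FO_READ + $FO_BINARY)
Local $dRaw = FileRead($hFile)
FileClose($hFile)
Local $sDecoded = BinaryToString($dRaw, 4) ; 4 = UTF-8

; Show the start of each result for comparison.
MsgBox(0, "Decoded", StringLeft($sText, 200) & @CRLF & @CRLF & StringLeft($sDecoded, 200))
```

With either option the string functions in ParseSongFile (StringMid, StringInStr, StringReplace) should then operate on whole characters rather than on individual UTF-8 bytes, so the existing parsing logic would not need to change.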
dgz Posted May 10, 2009
I know it's an old topic, but I have the same problem. When working with XML or any web page, French characters do not appear correctly. Any ideas?