H5N1 Posted October 20, 2010

Hi all! I'm new here and I'm a total amateur. Here's my problem: I've downloaded a huge number of Czech text files, but on my system some letters aren't displayed correctly. For example, the text in the file reads "Dobøe" when it should be "Dobře". So I want to write a script that corrects the wrong letters automatically. My beginner script:

$szFile = "D:\Downloads\Czechtext.txt"
$szText = FileRead($szFile, FileGetSize($szFile))
$szText = StringReplace($szText, "ø", "ř")
$szText = StringReplace($szText, "Ø", "Ř")
$szText = StringReplace($szText, "Ù", "Ů")
$szText = StringReplace($szText, "ù", "ů")
$szText = StringReplace($szText, "ì", "ě")
$szText = StringReplace($szText, "ò", "ň")
$szText = StringReplace($szText, "È", "Č")
$szText = StringReplace($szText, "è", "č")
$szText = StringReplace($szText, "ï", "ď")
FileDelete($szFile)
FileWrite($szFile, $szText)

I get no errors, but the letters are still incorrect. After running the script, the letter is just a plain "r" instead of "ř" (or "u", "e", "c", etc.). Where is my mistake?
jchd Posted October 21, 2010

You definitely need to save your source files in UTF-8 + BOM format, to prevent any partial or erroneous "translation" from Unicode to some ANSI codepage (= destruction, which is what you've experienced). You can do that in SciTE: File >> Encoding >> UTF8 + BOM, then modify the file (even a dummy action, like inserting a space and then backspacing) and save it, or else the change won't be made!

When your source file is in Unicode (UTF-8 + BOM), you can check that replacing arbitrary Unicode characters, e.g.

$szText = StringReplace($szText, "Skrýchov u Opařan", "БОЛЬШОЕ ГРИДИНО")
$szText = StringReplace($szText, " فرنسيّ عربيّ", "เขาจะได้ไปเที่ยวเมืองลาว")

will work flawlessly, even if the font you use in your editor doesn't display all these characters correctly.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.
SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)
H5N1 Posted October 21, 2010 Author

First, thanks a lot for your answer, but the problem has not been solved. First I just changed the encoding option to UTF-8 + BOM, did a dummy action and saved -> incorrect letters. Then I created a new au3 file, changed the encoding option and pasted in my code from the original file -> incorrect letters. After that I opened the (original ANSI) text file, saved it as Unicode and ran the new (UTF-8 + BOM) script, but the letters are still incorrect. By the way: I'm using AutoIt 3.3.6.1 and Win7.
jchd Posted October 21, 2010

This raises a question: which ANSI codepage does your system use? Easy guess: Latin-1 ANSI.

Explanation: open the charmap.exe applet to see that the following characters have the same encoding:

ANSI Latin1 codepage = ANSI Central Europe codepage
øØùìòÈèï = řŘůěňČčď

Here I mean that 'ø' has encoding 0xF8 in Windows Latin1 ANSI, and 'ř' has the same 0xF8 encoding in Windows Central Europe ANSI. Same for 'Ø' and 'Ř', ...

Your input Czech text uses the Central Europe ANSI codepage (not surprisingly), but you're using another codepage to display it. Temporarily change your system codepage to see that.

Possible solution: read the input in ANSI mode (with the Central Europe codepage active) and rewrite it in Unicode (UTF-8 + BOM is recommended) so that the characters get encoded correctly independently of the system codepage. Use any Unicode-aware tool to display/process the Unicode output.
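The byte-level point and the suggested conversion can be sketched outside AutoIt. The following is a minimal illustration in Python, not the poster's method: it assumes Python's codec names cp1252 (Windows Latin1 ANSI) and cp1250 (Windows Central Europe ANSI), and the helper names `reinterpret` and `convert_file` are hypothetical. Unlike the "change the system codepage" route, it names the input codepage explicitly, so it does not depend on the system setting.

```python
# Characters as displayed under the Latin1 codepage, and the Czech letters
# the same bytes stand for under the Central Europe codepage (from the thread).
SEEN = "øØùìòÈèï"
WANTED = "řŘůěňČčď"


def reinterpret(text: str, wrong: str = "cp1252", right: str = "cp1250") -> str:
    """Undo a wrong-codepage decode: recover the raw bytes, then decode them correctly."""
    return text.encode(wrong).decode(right)


def convert_file(src: str, dst: str, codepage: str = "cp1250") -> None:
    """Read an ANSI text file with an explicit codepage and rewrite it as UTF-8 + BOM."""
    # newline="" keeps the original line endings untouched in both directions.
    with open(src, "r", encoding=codepage, newline="") as fin:
        text = fin.read()
    # "utf-8-sig" writes the UTF-8 byte order mark at the start of the file.
    with open(dst, "w", encoding="utf-8-sig", newline="") as fout:
        fout.write(text)
```

For example, `reinterpret("Dobøe")` gives back "Dobře", and `convert_file("Czechtext.txt", "Czechtext-utf8.txt")` (hypothetical file names) writes a copy that any Unicode-aware tool should display correctly. Writing to a new file rather than overwriting the input keeps the original bytes available if the guessed codepage turns out to be wrong.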
H5N1 Posted October 21, 2010 Author

I feel so silly now. Changing the codepage was the right thing to do. Everything is fine now. So it wasn't AutoIt's fault. Thank you very much! (or "Děkuji mnohokrát")

Edited October 21, 2010 by H5N1