Watashi Posted August 6, 2014 (edited)

Here is the code I'm using to run replacements over 1.2 million lines of code. The process takes about 5 minutes. However, once everything is converted the way I want, the Japanese, Korean, and Chinese text in the file is not written correctly. Instead, the Asian characters come out as something like: ´óÐÍ¥¿¥ó¥¹, ¥¤¥ó¥Ù¥ó¥È¥ê’ˆ³ä, and ÀÛ·e¤´¤È¤Ë¥¯¥ê¥Æ¥£¥«¥ëÂÊ2%ÏòÉÏ;Î価10»ØÀÛ·e•r¡¢ÎäÆ÷¤ÎÕæ¤ÎÁ¦¤òÒý¤³ö¤¹¤³¤È¤¬¤Ç¤¤Þ¤¹¡£. English characters, on the other hand, are written correctly. My system is even set to the Japanese locale. I've been searching nonstop for the past 5 hours trying to figure out how to fix this major problem, but with my very basic AutoIt knowledge I haven't been able to find anything. Is AutoIt even capable of writing Japanese, Korean, and Chinese characters correctly (I have all 3 languages in my file)?

AutoIt v3.3.12.0. Here is the code:

$szFile = "def.xml"
$szText = FileRead ( $szFile, FileGetSize ( $szFile ) )
$szText = StringReplace ( $szText, "<arg", "<arg" )
$szText = StringReplace ( $szText, "/>", "/>" )
$szText = StringReplace ( $szText, "<p", "<p" )
$szText = StringReplace ( $szText, "</p>", "</p>" )
$szText = StringReplace ( $szText, "<br/>", "<br/>" )
$szText = StringReplace ( $szText, "<image", "<image" )
$szText = StringReplace ( $szText, "<font", "</font" )
$szText = StringReplace ( $szText, "</font>", "</font>" )
$szText = StringReplace ( $szText, ">", ">" )
$szText = StringReplace ( $szText, "<Image", "<Image" )
$szText = StringReplace ( $szText, "<BR/>", "<BR/>" )
$szText = StringReplace ( $szText, "<link", "<link" )
$szText = StringReplace ( $szText, "</link>", "</link>" )
$szText = StringReplace ( $szText, "<timer", "<timer" )
$szText = StringReplace ( $szText, "<sel-font", "<sel-font" )
$szText = StringReplace ( $szText, "</sel-font>", "</sel-font>" )
$szText = StringReplace ( $szText, "</sel-font>", "</sel-font>" )
$szText = StringReplace ( $szText, "<p>", "<p>" )
$szText = StringReplace ( $szText, "<p/>", "<p/>" )
$szText = StringReplace ( $szText, "</sel-font>", "</sel-font>" )
$szText = StringReplace ( $szText, "<alias>", "<alias>" )
$szText = StringReplace ( $szText, "</alias>", "</alias>" )
$szText = StringReplace ( $szText, "<text>", "<text>" )
$szText = StringReplace ( $szText, "</text>", "</text>" )
FileDelete ( $szFile )
FileWrite ( $szFile, $szText )

jchd Posted August 6, 2014

Read the help file under FileOpen(), especially its options. Unicode read mode is advised, provided your input is indeed Unicode.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.
SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)
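
As a rough illustration of the advice to read in a Unicode mode, the sketch below first asks AutoIt which encoding it detects and then reads through a file handle opened with an explicit encoding flag. The choice of $FO_UTF8 is an assumption made for the example; the flag has to match what def.xml really contains, which the thread does not establish.

```autoit
#include <FileConstants.au3>

Local Const $sFile = "def.xml" ; file name taken from the first post

; Ask AutoIt which encoding it detects. The result is one of the $FO_* encoding values.
Local $iEncoding = FileGetEncoding($sFile)
ConsoleWrite("Detected encoding flag: " & $iEncoding & @CRLF)

; Open for reading with an explicit encoding.
; ASSUMPTION: $FO_UTF8 is used here only as an example; pick the flag that matches the file.
Local $hFile = FileOpen($sFile, $FO_READ + $FO_UTF8)
If $hFile = -1 Then
    ConsoleWrite("Could not open " & $sFile & @CRLF)
    Exit 1
EndIf

; Read through the handle so that the mode chosen in FileOpen() applies to the read.
Local $sText = FileRead($hFile)
FileClose($hFile)

ConsoleWrite("Read " & StringLen($sText) & " characters" & @CRLF)
```

If the detected value and the flag passed to FileOpen() disagree, that mismatch is a reasonable first thing to investigate before changing anything else.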

Watashi Posted August 7, 2014 (edited)

I'll give it a read and look up some examples to see how it works. I'll report back whether or not I have any success.

jchd Posted August 7, 2014 (edited)

FYI there are no "non-unicode" characters, since Unicode maps every character, glyph, symbol, or control/spacing code used by humanity from ancient times to today (and beyond). I guess "unicode-8" means UTF-8. This is an encoding convention for representing all Unicode characters in an unambiguous byte stream.

Technically speaking, current AutoIt is not fully Unicode-aware, since AutoIt native strings use the UCS-2 representation, which (roughly speaking) covers the subset of Unicode characters in the range 0x0000-0xFFFF (known as the Unicode BMP, or Basic Multilingual Plane, or plane 0) and doesn't use surrogates. Within this subset, every character is represented by a single 16-bit code unit. See this link for a short global presentation on Unicode.

Hence any Unicode character in the higher planes (that is, in the range 0x10000-0x10FFFF) cannot be represented or manipulated directly with built-in functions. This limitation is a problem because the CJK Unified Ideographs in plane 2 are being (slowly) adopted by a larger user base in the Han unification process.
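
To make the BMP limitation concrete, here is a small sketch that splits a plane 2 code point into its two UTF-16 surrogate code units using the standard surrogate arithmetic. U+20BB7 is an arbitrary example code point chosen for this illustration, not one taken from the thread. If strings count 16-bit units as described above, StringLen() would be expected to report two units for this single character; the script prints whatever AutoIt actually returns.

```autoit
; U+20BB7 lies in plane 2, outside the BMP (example code point, chosen for illustration).
Local $iCodePoint = 0x20BB7

; Standard UTF-16 surrogate arithmetic: subtract 0x10000, then split into two 10-bit halves.
Local $iOffset = $iCodePoint - 0x10000
Local $iHigh = 0xD800 + BitShift($iOffset, 10) ; positive shift = shift right: top 10 bits
Local $iLow = 0xDC00 + BitAND($iOffset, 0x3FF) ; bottom 10 bits

; Build the string from the two 16-bit code units.
Local $sChar = ChrW($iHigh) & ChrW($iLow)

ConsoleWrite("High surrogate: 0x" & Hex($iHigh, 4) & @CRLF)
ConsoleWrite("Low surrogate:  0x" & Hex($iLow, 4) & @CRLF)
ConsoleWrite("StringLen reports: " & StringLen($sChar) & @CRLF)
```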

Watashi Posted August 7, 2014 (edited)

Hmm... then what would you call this? Whatever this is, that is what I need my script to output.

https://www.coscom.co.jp/learnjapanese801/japanesefont/nonunicode_win7.html
https://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/int_pr_select_language_version.mspx?mfr=true
http://windows.microsoft.com/en-us/windows/change-system-locale#1TC=windows-7

Anyway, I'm trying to keep what is in Japanese in Japanese, what is in Korean in Korean, and what is in Chinese in Chinese, and this way isn't doing it for me either. Here is what I've converted everything to.

#include <FileConstants.au3>
#include <MsgBoxConstants.au3>

Conversion()

Func Conversion()
    ; Create a variable which holds the location of the file.
    $szFile = "def.xml"
    ; Open the file in UTF8 mode and check whether the file exists: if not, report an error; if so, continue.
    $szFileOpen = FileOpen($szFile, $FO_UTF8_FULL)
    If $szFileOpen = -1 Then
        MsgBox($MB_SYSTEMMODAL, "", "An error occurred when reading the file.")
        Return False
    EndIf
    ; Read the open file after verifying the file size and begin checking for text to replace.
    $szText = FileRead($szFile, FileGetSize ($szFile))
    $szText = StringReplace($szText, "<arg", "<arg")
    $szText = StringReplace($szText, "/>", "/>")
    $szText = StringReplace($szText, "<p", "<p")
    $szText = StringReplace($szText, "</p>", "</p>")
    $szText = StringReplace($szText, "<br/>", "<br/>")
    $szText = StringReplace($szText, "<image", "<image")
    $szText = StringReplace($szText, "<font", "</font")
    $szText = StringReplace($szText, "</font>", "</font>")
    $szText = StringReplace($szText, ">", ">")
    $szText = StringReplace($szText, "<Image", "<Image")
    $szText = StringReplace($szText, "<BR/>", "<BR/>")
    $szText = StringReplace($szText, "<link", "<link")
    $szText = StringReplace($szText, "</link>", "</link>")
    $szText = StringReplace($szText, "<timer", "<timer")
    $szText = StringReplace($szText, "<sel-font", "<sel-font")
    $szText = StringReplace($szText, "</sel-font>", "</sel-font>")
    $szText = StringReplace($szText, "</sel-font>", "</sel-font>")
    $szText = StringReplace($szText, "<p>", "<p>")
    $szText = StringReplace($szText, "<p/>", "<p/>")
    $szText = StringReplace($szText, "</sel-font>", "</sel-font>")
    $szText = StringReplace($szText, "<alias>", "<alias>")
    $szText = StringReplace($szText, "</alias>", "</alias>")
    $szText = StringReplace($szText, "<text>", "<text>")
    $szText = StringReplace($szText, "</text>", "</text>")
    ; Close the open file so that it can be deleted and written anew.
    FileClose($szFileOpen)
    FileDelete($szFile)
    FileWrite($szFile, $szText)
EndFunc

Though this displays FO_UTF8_FULL, I've tried: Read, Append, UTF8, UTF8_Full, and Unicode. Still, it writes all the Japanese, Chinese, and Korean text in similar fashion to "óÐÍ¥¿¥ó¥¹". What could I be doing wrong?

jchd Posted August 7, 2014

Read the help file under FileWrite, 4th remark: "When writing text AutoIt will write using ANSI by default. To write in Unicode mode the file must be opened with FileOpen() and the relevant flags."
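
The remark quoted above can be sketched as follows: the text is written through a handle that was opened with a write mode and a Unicode encoding flag, instead of passing the file name to FileWrite(). The output path and the sample string are placeholders invented for this example, and $FO_UTF8 (UTF-8 with BOM) is just one of the possible encoding flags.

```autoit
#include <FileConstants.au3>

; Hypothetical output file, used only for this illustration.
Local Const $sOut = @TempDir & "\utf8_write_demo.txt"

; Build a short CJK sample from code points so the script's own encoding does not matter.
Local $sText = ChrW(0x65E5) & ChrW(0x672C) & ChrW(0x8A9E)

; Open for writing (erasing any previous contents) in UTF-8 mode with BOM.
Local $hOut = FileOpen($sOut, $FO_OVERWRITE + $FO_UTF8)
If $hOut = -1 Then
    ConsoleWrite("Could not open " & $sOut & @CRLF)
    Exit 1
EndIf

; Pass the handle, not the file name, so the encoding chosen in FileOpen() is used.
FileWrite($hOut, $sText)
FileClose($hOut)
```

If a byte order mark is not wanted in the output, $FO_UTF8_NOBOM can be used in place of $FO_UTF8.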

Watashi Posted August 7, 2014 (edited)

I did, which is why I tried the Unicode flag first.

"Though this displays FO_UTF8_FULL, I've tried: Read, Append, UTF8, UTF8_Full, and Unicode. Still, it writes all the Japanese, Chinese, and Korean text in similar fashion to "óÐÍ¥¿¥ó¥¹"."

I even used the numeric alternatives... The program I use to check the file is Sublime Text 3.

AdmiralAlkex Posted August 7, 2014

Then read it again, because you're not getting it. You are not even using the file handle for anything; you just open and close it.

FileWrite($szFileOpen, $szText)

Some of my scripts: ShiftER, Codec-Control, Resolution switcher for HTC Shift
Some of my UDFs: SDL UDF, SetDefaultDllDirectories, Converting GDI+ Bitmap/Image to SDL Surface

Solution
Iczer Posted August 7, 2014 (edited)

FileClose($szFileOpen)
FileDelete($szFile)
$szFileOpen = FileOpen($szFile, $FO_UTF8 + $FO_OVERWRITE) ; <---- reopen for writing in UTF-8 mode
FileWrite($szFileOpen, $szText)

Also use the handle for FileRead, as AdmiralClaws says.
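
Putting the replies together, the whole read, replace, and write cycle could look roughly like the sketch below, where both the read and the write go through handles. _ReplaceInUtf8File is a hypothetical helper name invented for this example, the search and replacement strings are placeholders, and UTF-8 is assumed as the file's encoding.

```autoit
#include <FileConstants.au3>

; Replaces $sSearch with $sReplace in a UTF-8 file and rewrites the file in place.
; _ReplaceInUtf8File is a hypothetical helper name, not part of AutoIt.
Func _ReplaceInUtf8File($sFile, $sSearch, $sReplace)
    ; Read through a handle so that the UTF-8 mode applies to the read.
    Local $hIn = FileOpen($sFile, $FO_READ + $FO_UTF8)
    If $hIn = -1 Then Return SetError(1, 0, False)
    Local $sText = FileRead($hIn)
    FileClose($hIn)

    $sText = StringReplace($sText, $sSearch, $sReplace)

    ; $FO_OVERWRITE erases the previous contents, so a separate FileDelete() is not needed.
    Local $hOut = FileOpen($sFile, $FO_OVERWRITE + $FO_UTF8)
    If $hOut = -1 Then Return SetError(2, 0, False)
    FileWrite($hOut, $sText)
    FileClose($hOut)
    Return True
EndFunc

; Example call with placeholder search and replacement strings.
If Not _ReplaceInUtf8File("def.xml", "old text", "new text") Then
    ConsoleWrite("Failed, @error = " & @error & @CRLF)
EndIf
```

Opening the file twice keeps the read mode and the write mode separate, which makes it harder to end up writing by file name and falling back to ANSI output.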

Watashi Posted August 7, 2014 (edited)

I see now what I was doing wrong. I thought FileOpen only had to be called first, and that a FileRead trailing behind it would then read, replace, and write directly from and to the file that FileOpen had opened. Furthermore, since I had already deleted the file after closing it, I figured the file no longer existed. That told me you couldn't call FileOpen on a file that is no longer there and whose content only exists in memory in a variable (which I call a container, since it holds stuff).

No need to get so upset, AdmiralClaws; it was a simple misunderstanding of how FileOpen can be called and used, for the reasons given above.

Thank you kindly, everybody.

AdmiralAlkex Posted August 7, 2014

Cats don't get upset. They draw blood, then go to sleep.