Issue: FileRead or FileWrite not writing Asian text correctly...


Go to solution Solved by Iczer,

Here is the code I'm using to run replacements across a file of 1.2 million lines. The process takes about 5 minutes. After everything is converted the way I want, however, the Japanese, Korean, and Chinese text in the file is not written correctly. Instead, the Asian characters come out as sequences like: ´óÐÍ¥¿¥ó¥¹, ¥¤¥ó¥Ù¥ó¥È¥ê’ˆ³ä, and ÀÛ·e¤´¤È¤Ë¥¯¥ê¥Æ¥£¥«¥ëÂÊ2%ÏòÉÏ;Î価10»ØÀÛ·e•r¡¢ÎäÆ÷¤ÎÕæ¤ÎÁ¦¤òÒý¤­³ö¤¹¤³¤È¤¬¤Ç¤­¤Þ¤¹¡£. English characters are written correctly. My system is even set to the Japanese locale. I've been searching nonstop for the past 5 hours trying to fix this, but with my very basic AutoIt knowledge I haven't been able to find anything.

Is AutoIt even capable of writing Japanese, Korean, and Chinese characters correctly? (I have all 3 languages in my file.)

AutoIt v3.3.12.0. Here is the code:

$szFile = "def.xml"
$szText = FileRead ( $szFile, FileGetSize ( $szFile ) )
$szText = StringReplace ( $szText, "<arg", "&lt;arg" )
$szText = StringReplace ( $szText, "/>", "/&gt;" )
$szText = StringReplace ( $szText, "<p", "&lt;p" )
$szText = StringReplace ( $szText, "</p>", "&lt;/p&gt;" )
$szText = StringReplace ( $szText, "<br/>", "&lt;br/&gt;" )
$szText = StringReplace ( $szText, "<image", "&lt;image" )
$szText = StringReplace ( $szText, "<font", "&lt;/font" )
$szText = StringReplace ( $szText, "</font>", "&lt;/font&gt;" )
$szText = StringReplace ( $szText, ">", "&gt;" )
$szText = StringReplace ( $szText, "<Image", "&lt;Image" )
$szText = StringReplace ( $szText, "<BR/>", "&lt;BR/&gt;" )
$szText = StringReplace ( $szText, "<link", "&lt;link" )
$szText = StringReplace ( $szText, "</link>", "&lt;/link&gt;" )
$szText = StringReplace ( $szText, "<timer", "&lt;timer" )
$szText = StringReplace ( $szText, "<sel-font", "&lt;sel-font" )
$szText = StringReplace ( $szText, "</sel-font>", "&lt;/sel-font&gt;" )
$szText = StringReplace ( $szText, "</sel-font>", "&lt;/sel-font&gt;" )
$szText = StringReplace ( $szText, "&lt;p>", "&lt;p&gt;" )
$szText = StringReplace ( $szText, "&lt;p/>", "&lt;p/&gt;" )
$szText = StringReplace ( $szText, "</sel-font>", "&lt;/sel-font&gt;" )
$szText = StringReplace ( $szText, "<alias&gt;", "<alias>" )
$szText = StringReplace ( $szText, "</alias&gt;", "</alias>" )
$szText = StringReplace ( $szText, "<text&gt;", "<text>" )
$szText = StringReplace ( $szText, "</text&gt;", "</text>" )
FileDelete ( $szFile )
FileWrite ( $szFile, $szText )
Edited by Watashi

Read the help file entry for FileOpen(), especially its mode options. A Unicode read mode is advised, provided your input is indeed Unicode.
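
One way to check whether the input "is indeed Unicode" before choosing a FileOpen() mode is to ask AutoIt what it detects. The following is only a sketch: it assumes the def.xml file name from the question and the constants from FileConstants.au3, and the messages are illustrative.

```autoit
#include <FileConstants.au3>

; "def.xml" is the file name used in the question; adjust as needed.
Local $sFile = "def.xml"

; FileGetEncoding() inspects the file and returns a FileOpen()-style
; encoding value, or -1 on failure.
Local $iEncoding = FileGetEncoding($sFile)

Switch $iEncoding
    Case -1
        ConsoleWrite("Could not determine the encoding of " & $sFile & @CRLF)
    Case $FO_UTF8
        ConsoleWrite("UTF-8 with BOM" & @CRLF)
    Case $FO_UTF8_NOBOM
        ConsoleWrite("UTF-8 without BOM" & @CRLF)
    Case $FO_UNICODE
        ConsoleWrite("UTF-16 little endian" & @CRLF)
    Case $FO_UTF16_BE
        ConsoleWrite("UTF-16 big endian" & @CRLF)
    Case Else
        ; Any other value (ANSI, for instance): see the FileGetEncoding() help entry.
        ConsoleWrite("Other encoding value: " & $iEncoding & @CRLF)
EndSwitch
```

If the result is not one of the Unicode values, opening the file in a Unicode mode is unlikely to help on its own, and the input probably uses a legacy code page.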

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


FYI there are no "non-unicode" characters since Unicode maps every character, glyph, symbol or control / spacing used by humanity from ancient times to today (and beyond).

I guess "unicode-8" means UTF-8. This is an encoding convention for representing all Unicode characters in an unambiguous byte stream.

Technically speaking, current AutoIt is not fully Unicode-aware, since native AutoIt strings use the UCS-2 representation, which (roughly speaking) covers the subset of Unicode characters in the range 0x0000-0xFFFF (known as the Unicode BMP, or Basic Multilingual Plane, or plane 0) and doesn't use surrogates. Within this subset, every character is represented by a single 16-bit code unit. See this link for a short general presentation on Unicode.

Hence any Unicode character in the higher planes (that is, in the range 0x10000-0x10FFFF) cannot be represented or manipulated directly with built-in functions. This limitation is a problem because the CJK Unified Ideographs in plane 2 are (slowly) being adopted by a wider user base as part of the Han unification process.
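
A small illustration of the 16-bit limitation described above. This is a sketch that assumes ChrW() accepts surrogate code unit values and stores the two units side by side; U+20BB7 is just an example plane 2 character chosen here. Under that assumption, the built-in functions should treat the pair as two separate units rather than as one character.

```autoit
; U+20BB7 lies in plane 2, so UTF-16 needs a surrogate pair for it:
; high surrogate 0xD842 followed by low surrogate 0xDFB7.
Local $sChar = ChrW(0xD842) & ChrW(0xDFB7)

; StringLen() counts 16-bit code units, not Unicode characters.
ConsoleWrite("Code units: " & StringLen($sChar) & @CRLF)

; AscW() only looks at the first code unit, i.e. the high surrogate.
ConsoleWrite("First unit: 0x" & Hex(AscW($sChar), 4) & @CRLF)
```

Characters inside the BMP, which covers the bulk of everyday Japanese, Korean, and Chinese text, do not need surrogate pairs and are not affected by this.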

Edited by jchd


Hmm... then what would you call this? Whatever it is, that is the format I need my script to output.

https://www.coscom.co.jp/learnjapanese801/japanesefont/nonunicode_win7.html

https://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/int_pr_select_language_version.mspx?mfr=true

http://windows.microsoft.com/en-us/windows/change-system-locale#1TC=windows-7

Anyway, I'm trying to keep what is in Japanese in Japanese, what is in Korean in Korean, and what is in Chinese in Chinese, and this approach isn't doing it for me either.

Here is what I've converted everything to.

#include <FileConstants.au3>
#include <MsgBoxConstants.au3>

Conversion()

Func Conversion()
    ; Create a variable which holds the location of the file.
    $szFile = "def.xml"
    ; Open the file in UTF8 mode and check and see if the file exists, if no then error, if yes then continue.
    $szFileOpen = FileOpen($szFile, $FO_UTF8_FULL)
    If $szFileOpen = -1 Then
        MsgBox($MB_SYSTEMMODAL, "", "An error occurred when reading the file.")
        Return False
    EndIf
    ; Read the open file after verifying filesize of the file and begin checking for text to replace.
     $szText = FileRead($szFile, FileGetSize ($szFile))
     $szText = StringReplace($szText, "<arg", "&lt;arg")
     $szText = StringReplace($szText, "/>", "/&gt;")
     $szText = StringReplace($szText, "<p", "&lt;p")
     $szText = StringReplace($szText, "</p>", "&lt;/p&gt;")
     $szText = StringReplace($szText, "<br/>", "&lt;br/&gt;")
     $szText = StringReplace($szText, "<image", "&lt;image")
     $szText = StringReplace($szText, "<font", "&lt;/font")
     $szText = StringReplace($szText, "</font>", "&lt;/font&gt;")
     $szText = StringReplace($szText, ">", "&gt;")
     $szText = StringReplace($szText, "<Image", "&lt;Image")
     $szText = StringReplace($szText, "<BR/>", "&lt;BR/&gt;")
     $szText = StringReplace($szText, "<link", "&lt;link")
     $szText = StringReplace($szText, "</link>", "&lt;/link&gt;")
     $szText = StringReplace($szText, "<timer", "&lt;timer")
     $szText = StringReplace($szText, "<sel-font", "&lt;sel-font")
     $szText = StringReplace($szText, "</sel-font>", "&lt;/sel-font&gt;")
     $szText = StringReplace($szText, "</sel-font>", "&lt;/sel-font&gt;")
     $szText = StringReplace($szText, "&lt;p>", "&lt;p&gt;")
     $szText = StringReplace($szText, "&lt;p/>", "&lt;p/&gt;")
     $szText = StringReplace($szText, "</sel-font>", "&lt;/sel-font&gt;")
     $szText = StringReplace($szText, "<alias&gt;", "<alias>")
     $szText = StringReplace($szText, "</alias&gt;", "</alias>")
     $szText = StringReplace($szText, "<text&gt;", "<text>")
     $szText = StringReplace($szText, "</text&gt;", "</text>")
    ; Close the open file so that it can be deleted and wrote anew.
    FileClose($szFileOpen)
    FileDelete($szFile)
    FileWrite($szFile, $szText)
EndFunc

Although the code shown uses FO_UTF8_FULL, I've also tried the Read, Append, UTF8, UTF8_FULL, and Unicode flags. It still writes all the Japanese, Chinese, and Korean text in a form similar to "óÐÍ¥¿¥ó¥¹".

What could I be doing wrong? 

Edited by Watashi

Read the help file under FileWrite, 4th remark:

When writing text AutoIt will write using ANSI by default. To write in Unicode mode the file must be opened with FileOpen() and the relevant flags.


I did, which is why I first tried the Unicode flag.

"Though this Displays FO_UTF8_FULL, I've tried: Read, Append,UTF8, UTF8_Full, and Unicode. Still, it writes all the Japanese, Chinese, and Korean text in similar fashion to "óÐÍ¥¿¥ó¥¹"."

I even tried the numeric equivalents of the flags...

I'm checking the output file in Sublime Text 3.

Edited by Watashi

Then read it again because you're not getting it. You are not even using the file handle for anything; you just open it and close it.

FileWrite($szFileOpen, $szText)
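
Putting the advice above together, the read and write steps might look like the sketch below. It assumes the file is UTF-8 (swap in whichever FileOpen() flag matches your input), the handle names $hIn and $hOut are illustrative, and only one of the StringReplace() calls from the question is shown.

```autoit
#include <FileConstants.au3>
#include <MsgBoxConstants.au3>

Conversion()

Func Conversion()
    Local $sFile = "def.xml"

    ; Open for reading and keep the handle. $FO_UTF8 is an assumption
    ; about the input encoding; use the flag that matches your file.
    Local $hIn = FileOpen($sFile, $FO_READ + $FO_UTF8)
    If $hIn = -1 Then
        MsgBox($MB_SYSTEMMODAL, "", "An error occurred when opening the file for reading.")
        Return False
    EndIf

    ; Read through the handle so the mode chosen above is applied.
    Local $sText = FileRead($hIn)
    FileClose($hIn)

    ; One replacement shown; the rest of the list goes here unchanged.
    $sText = StringReplace($sText, "<arg", "&lt;arg")

    ; Reopen the same path for writing. $FO_OVERWRITE erases the old
    ; contents, so no FileDelete() call is needed.
    Local $hOut = FileOpen($sFile, $FO_OVERWRITE + $FO_UTF8)
    If $hOut = -1 Then
        MsgBox($MB_SYSTEMMODAL, "", "An error occurred when opening the file for writing.")
        Return False
    EndIf

    ; Write through the handle. Per the help file remark quoted earlier,
    ; passing the file name instead would write ANSI by default.
    FileWrite($hOut, $sText)
    FileClose($hOut)
    Return True
EndFunc
```

The key difference from the script in the question is that both FileRead() and FileWrite() receive a handle returned by FileOpen(), so the encoding flags actually take effect.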


I see now what I was doing wrong. I thought FileOpen only had to be called before FileRead, and that the reads, replacements, and writes that followed would then go through the opened file automatically. Furthermore, since I deleted the file after closing it, I figured the file no longer existed, so I assumed you couldn't call FileOpen on a file that is no longer there and whose content exists only in memory in a variable (which I call a container since it holds stuff :P).

No need to get so upset, AdmiralClaws. It was a simple misunderstanding of how FileOpen is meant to be called and used, for the reason given above.

Thank you kindly everybody.

 

Edited by Watashi