DeltaRocked Posted May 9, 2013 (edited)

Hello,

I want to convert text from the GB2312 (Chinese) character encoding to UTF-8. I describe the problem below and include my source code; please let me know where my understanding of character encodings is going wrong.

The Subject headers of an email contain the following:

Eg1: =?GB2312?B?MzE2Njg3OTU4o6zA67K7v6q1xM/6Y8rb?=
Eg2: =?utf-8?B?56ym54+C6bmm?=

where the format is: =?CharSet?B/Q?Base64_encoded_string?=

Displaying UTF-8 or GB2312 *individually* in different emails is not a problem. However, when I want to display both character encodings in the same email, only one of them is displayed correctly. Displaying a single encoding works by declaring the charset in the email message body:

Content-Type: text/html; charset="UTF-8"
OR
Content-Type: text/html; charset="GB2312"

If all goes well, you can view the Chinese characters.

Base64_Decode("MzE2Njg3OTU4o6zA67K7v6q1xM/6Y8rb") output:
MzE2Njg3OTU4o6zA67K7v6q1xM/6Y8rb = 316687958£¬Àë²»¿ªµÄÏúcÊÛ

Save the mentioned .7z, extract the .eml file, and open it in your favourite email client.

The code I am using is below. When its output is placed in the .eml file, I do not get any Chinese characters.

    ; The file name was lost from the post; WinAPI.au3 is assumed here
    ; because it provides the _WinAPI_* functions used below.
    #include <WinAPI.au3>

    $hFile = FileOpen('utf_t.txt', 256 + 2) ; 2 = write (erase existing content), 256 = UTF-8
    $sText = '316687958£¬Àë²»¿ªµÄÏúcÊÛ'
    FileWrite($hFile, _ConvertAnsiToUtf8($sText))
    FileClose($hFile)

    Func _ConvertAnsiToUtf8($sText)
        ; No code page is passed, so the default (system ANSI) code page is used
        Local $tUnicode = _WinAPI_MultiByteToWideChar($sText)
        If @error Then Return SetError(@error, 0, "")
        ; 65001 = UTF-8
        Local $sUtf8 = _WinAPI_WideCharToMultiByte(DllStructGetPtr($tUnicode), 65001)
        If @error Then Return SetError(@error, 0, "")
        Return SetError(0, 0, $sUtf8)
    EndFunc   ;==>_ConvertAnsiToUtf8

Thanks in advance.

Regards
Del.
[UPDATE] After searching, I found this: http://hi.baidu.com/qianyiyidu/item/579ee4a1f6ca1b3e030a4df0

    // GB2312 to UTF-8 conversion
    // Note: CP_ACP only means GB2312 when the system ANSI code page is 936.
    // The output pointer is taken by reference so the caller receives the
    // allocated buffer (and must delete[] it).
    static int GB2312ToUtf8(const char* gb2312, char*& utf8)
    {
        int len = MultiByteToWideChar(CP_ACP, 0, gb2312, -1, NULL, 0);
        wchar_t* wstr = new wchar_t[len + 1];
        memset(wstr, 0, (len + 1) * sizeof(wchar_t)); // memset counts bytes, not wchar_t
        MultiByteToWideChar(CP_ACP, 0, gb2312, -1, wstr, len);
        len = WideCharToMultiByte(CP_UTF8, 0, wstr, -1, NULL, 0, NULL, NULL);
        utf8 = new char[len + 1];
        memset(utf8, 0, len + 1);
        WideCharToMultiByte(CP_UTF8, 0, wstr, -1, utf8, len, NULL, NULL);
        delete[] wstr;
        return len;
    }

Edited May 15, 2013 by DeltaRocked
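The strange-looking output in the first post (316687958£¬Àë...) is what the GB2312 bytes look like when each byte is read as a single Latin-1 character. The following is a minimal sketch of that idea, written in Python rather than AutoIt purely for illustration; the function names `show_as_latin1` and `gb2312_b64_to_utf8` are hypothetical and not part of any script in this thread. It assumes the payload really is GB2312, as the header claims.

```python
import base64


def show_as_latin1(b64_text: str) -> str:
    """Decode Base64, then (mis)read every byte as one Latin-1 character."""
    return base64.b64decode(b64_text).decode("latin-1")


def gb2312_b64_to_utf8(b64_text: str) -> bytes:
    """Decode Base64 to raw bytes, read them as GB2312, re-encode as UTF-8."""
    raw = base64.b64decode(b64_text)
    text = raw.decode("gb2312")  # assumption: the charset label is accurate
    return text.encode("utf-8")


if __name__ == "__main__":
    payload = "MzE2Njg3OTU4o6zA67K7v6q1xM/6Y8rb"  # Eg1 from the first post
    print(show_as_latin1(payload))
    print(gb2312_b64_to_utf8(payload).decode("utf-8"))
```

The point of the sketch is that the conversion must start from the raw bytes and must name the source code page explicitly; converting through whatever the system default happens to be only works on a machine whose default is GB2312.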
DeltaRocked Author Posted May 10, 2013 (edited)

Hello,

I have partially solved this issue based on the various resources available in the AutoIt Forums. A big thanks to AZJIO for making encoding.au3 available in one of the posts.

_EncodingToUnicode_API() was picked up from encoding.au3, which is available in the post over here. Another version by Arilvv can also be found over here.

Note to self: to get the conversion right, one needs to know the *correct* code page identifier, which is listed here:
http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756%28v=vs.85%29.aspx

Image of the conversion: http://imm.io/15phW

    #include

    $hFile = FileOpen("GB2312ToUnicode.txt", 256 + 2) ; 2 = write (erase existing content), 256 = UTF-8
    $sCodePage_Identifier = 936 ; GB2312
    ; refer to http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756%28v=vs.85%29.aspx
    ; for more information on the codepage identifier.
    ; Base64 of the below mentioned string : MzE2Njg3OTU4o6zA67K7v6q1xM/6Y8rb
    $sString = '316687958£¬Àë²»¿ªµÄÏúcÊÛ'
    FileWrite($hFile, _EncodingToUnicode_API($sString, $sCodePage_Identifier))
    FileClose($hFile)

    Func _EncodingToUnicode_API($sString, $sCodePage_Identifier)
        ; Worst case: one UTF-16 code unit (2 bytes) per input byte
        Local $BufferSize = StringLen($sString) * 2
        Local $Buffer = DllStructCreate("byte[" & $BufferSize & "]")
        Local $Return = DllCall("Kernel32.dll", "int", "MultiByteToWideChar", _
                "int", $sCodePage_Identifier, _
                "int", 0, _
                "str", $sString, _
                "int", StringLen($sString), _
                "ptr", DllStructGetPtr($Buffer), _
                "int", StringLen($sString)) ; output size is counted in wide characters, not bytes
        If @error Or $Return[0] = 0 Then Return SetError(1, 0, "")
        Local $UnicodeBinary = DllStructGetData($Buffer, 1)
        Local $UnicodeHex1 = StringReplace($UnicodeBinary, "0x", "")
        ; Only convert what was actually written: 4 hex digits per UTF-16 code unit
        Local $StrLen = $Return[0] * 4
        Local $UnicodeString, $UnicodeHex2, $UnicodeHex3
        For $i = 1 To $StrLen Step 4
            $UnicodeHex2 = StringMid($UnicodeHex1, $i, 4)
            ; The buffer is little-endian, so swap the two bytes before Dec()
            $UnicodeHex3 = StringMid($UnicodeHex2, 3, 2) & StringMid($UnicodeHex2, 1, 2)
            $UnicodeString &= ChrW(Dec($UnicodeHex3))
        Next
        $Buffer = 0
        Return $UnicodeString
    EndFunc

TODO: MIME-decode the Subject header, correctly identify the character encoding, and complete the conversion.

Edited May 10, 2013 by DeltaRocked
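The TODO above (MIME-decode the Subject, identify the charset, convert) can be sketched as follows. This is an illustrative Python sketch, not the AutoIt solution the thread is working towards; `decode_subject` is a hypothetical helper name. It relies on `email.header.decode_header` from Python's standard library, which splits a header into (value, charset) pairs, and it assumes each charset label in the header is accurate.

```python
from email.header import decode_header


def decode_subject(subject: str) -> str:
    """Turn a MIME-encoded Subject header into one Unicode string.

    Each encoded word (=?charset?B/Q?data?=) is decoded with its own
    charset, so GB2312 and UTF-8 parts can be mixed in the same header.
    """
    parts = []
    for value, charset in decode_header(subject):
        if isinstance(value, bytes):
            # Unlabelled byte runs are treated as ASCII (an assumption)
            parts.append(value.decode(charset or "ascii", errors="replace"))
        else:
            parts.append(value)  # header contained no encoded words
    return "".join(parts)


if __name__ == "__main__":
    eg1 = "=?GB2312?B?MzE2Njg3OTU4o6zA67K7v6q1xM/6Y8rb?="
    eg2 = "=?utf-8?B?56ym54+C6bmm?="
    combined = decode_subject(eg1 + " " + eg2)
    # One Unicode string can now be written out once, as UTF-8
    print(combined.encode("utf-8"))
```

Once every part is a Unicode string, the whole message can be re-encoded as UTF-8 and sent with a single charset="UTF-8" declaration, which sidesteps the "only one encoding displays" problem from the first post.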
leuce Posted March 20, 2015

Hello

I also want to convert GB2312 to UTF-8, and I would like to try the script from the second post. However, when I try to run it, AutoIt says: "Cannot parse #include". Any idea what might be wrong?

Thanks
Samuel