
Convert Unicode string to ANSI (WIN-12xx)


Myicq

I need to develop an app that talks to a non-Unicode application. This (graphical) application can set the font encoding on different objects, so my task is basically to:

  • Read a string in Excel 2003 or 2007. This string will be Unicode
  • Convert each character in the string to ANSI (Win12xx)
  • Either save the converted string to a text file or send it over an Ethernet socket
To give an example of point 2:

Take the string [Москва]

In Unicode this would be 041C 043E 0441 043A 0432 0430 (PROBABLY stored as 1C 04 3E 04 etc., since Excel uses little-endian UTF-16)
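Here is a small Python cross-check that makes the byte layout concrete. Python is used only because it runs anywhere and is not part of the original thread. The snippet prints the UTF-16 LE encoding of the example string, with the low byte of each code unit first:

```python
# Show how "Москва" is laid out as UTF-16 little-endian:
# each code unit takes two bytes, low byte first (U+041C -> 1C 04).
text = "Москва"
utf16le = text.encode("utf-16-le")

# bytes.hex() with a separator argument needs Python 3.8+
print(utf16le.hex(" ").upper())  # 1C 04 3E 04 41 04 3A 04 32 04 30 04
```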

This can be converted to Win1251 according to http://en.wikipedia.org/wiki/Windows-1251

End result would be CC EE F1 EA E2 E0
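The Win1251 result can be checked the same way. This Python sketch asks the standard cp1251 codec for the bytes, on the assumption that the codec's table matches the Wikipedia table linked above:

```python
# Encode the same string with the Windows-1251 code page.
# Each Cyrillic letter maps to exactly one byte.
text = "Москва"
win1251 = text.encode("cp1251")
print(win1251.hex(" ").upper())  # CC EE F1 EA E2 E0

# Decoding with the same code page gives the original text back.
print(win1251.decode("cp1251"))  # Москва
```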

My question is: has anyone written, or does anyone know of, a UDF that does this job (feed in a string of LE bytes, get out a string of ANSI bytes)? Or is there a native Windows function for it?

If someone could post a small example going from

$instring = "Москва"

to those Win1251 bytes, I would be really grateful for any help :D

Attached is a ZIP with two text files: the one with the Unicode BOM is the source format, and the Win1251 one is the destination format.

moskva.zip


I am just a hobby programmer, and nothing great to publish right now.


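Func _UnicodeToANSI($sString)
    #cs
        Local Const $SF_ANSI = 1
        Local Const $SF_UTF16_LE = 2
        Local Const $SF_UTF16_BE = 3
        Local Const $SF_UTF8 = 4
    #ce
    Local Const $SF_ANSI = 1, $SF_UTF8 = 4
    ; Encode the string as UTF-8 bytes, then read those bytes back as ANSI text
    ; (the system's default ANSI code page). Note: this reinterprets the UTF-8 bytes;
    ; it does not map each character to its Win-125x equivalent.
    Return BinaryToString(StringToBinary($sString, $SF_UTF8), $SF_ANSI)
EndFunc   ;==>_UnicodeToANSI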
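As far as I can tell, the function above encodes the string as UTF-8 and then reads those bytes back as ANSI text. That would explain the problem reported further down: it reinterprets bytes instead of mapping characters. Here is a Python sketch of the same round trip, assuming the system ANSI code page is 1251 (on other locales the garbage characters would differ):

```python
# Mimic the UTF-8 -> "read as ANSI" round trip.
# ASSUMPTION: the system ANSI code page is 1251.
text = "Москва"
utf8_bytes = text.encode("utf-8")             # 2 bytes per Cyrillic letter
reinterpreted = utf8_bytes.decode("cp1251")   # each byte read as one 1251 char

print(reinterpreted)          # РњРѕСЃРєРІР° (mojibake, 12 characters)
print(reinterpreted == text)  # False
```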

Edited by guinness


Presumably, the Excel string was obtained via COM, so it's a native (~UTF-16 LE) string, not UTF-8.

Even if it comes from a file, it's probably UTF-8 in the file, but once AutoIt has read it, it becomes a native AutoIt (~UTF-16 LE) string.
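The difference between file encoding and in-memory string can be illustrated in Python. This is only an analogy, because Python does not store strings internally as UTF-16. The bytes on disk are UTF-8, but once they are decoded with the right encoding the text is an ordinary string, and the file's encoding no longer matters:

```python
import os
import tempfile

# Write a UTF-8 file, as another program might produce it.
text = "Москва"
path = os.path.join(tempfile.mkdtemp(), "moskva_utf8.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# On disk: UTF-8 bytes (2 per Cyrillic letter here).
with open(path, "rb") as f:
    on_disk = f.read()

# Read back with the right encoding: just an ordinary string again.
with open(path, encoding="utf-8") as f:
    in_memory = f.read()

print(on_disk.hex(" ").upper())
print(in_memory == text)  # True
```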

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must-have in your bookmarks.
Another excellent RegExp tutorial. Don't forget to download your up-to-date copy of pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation (the latest available release, and the one currently implemented in AutoIt beta).

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


An update:

I tried the code, and it does not give the expected result.

Can any of you provide a correction to this code:

$o = FileOpen(@ScriptDir & "\out.txt", 2) ; ANSI, Write and replace
$s = "русский язык" ; read from Excel normally...
FileWriteLine($o, _UnicodeToANSI($s))
FileClose($o)

The resulting file contains the bytes of the original source string; in this case it starts with

0x40 0x04 0x43 0x04 0x41 0x04 ..

which are exactly the bytes of the Unicode characters encoded in UTF-16 LE.

It seems the string was not converted at all.

Instead, I would have expected to see

0xF0 0xF3 ...

So could you provide a more complete example that shows how to get from a UTF-16 string to a file in ANSI encoding?
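For comparison, this Python sketch shows the result being asked for. It is not an AutoIt fix. It writes the same string to a file with code page 1251 chosen explicitly and also computes the UTF-16 LE bytes that were actually observed. The file name out.txt mirrors the script above, and the temporary directory is an assumption made for the example:

```python
import os
import tempfile

text = "русский язык"

# The UTF-16 LE bytes that ended up in the file in the report above.
utf16_prefix = text.encode("utf-16-le")[:6]
print(utf16_prefix.hex(" ").upper())  # 40 04 43 04 41 04

# Write the string using Windows-1251: one byte per character.
path = os.path.join(tempfile.mkdtemp(), "out.txt")
with open(path, "w", encoding="cp1251", newline="") as f:
    f.write(text)

with open(path, "rb") as f:
    data = f.read()

print(data[:3].hex(" ").upper())  # F0 F3 F1
```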

Thanks


You'll find that the code below does what you're after. Instead of relying on your machine/user probably having the expected locale setting (which determines which ANSI code page is used by default), I forced ANSI code page 1251, as you can see. Change that according to your needs.

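$o = FileOpen(@ScriptDir & "\out.txt", 2) ; ANSI, Write and replace
$s = "русский язык" ; read from Excel normally...
FileWriteLine($o, _UTF16toANSI1251($s))
FileClose($o)

; Convert a native AutoIt (UTF-16) string to Windows-1251 via WideCharToMultiByte.
Func _UTF16toANSI1251($sString)
    ; 1st call: no output buffer (ptr 0, size 0), so the API returns the required size in bytes,
    ; including the terminating null (input length -1 = null-terminated string)
    Local $aResult = DllCall("kernel32.dll", "int", "WideCharToMultiByte", "uint", 1251, "dword", 0, "wstr", $sString, "int", -1, _
            "ptr", 0, "int", 0, "ptr", 0, "ptr", 0)
    If @error Or $aResult[0] = 0 Then Return SetError(1, 0, "")
    Local $tText = DllStructCreate("char[" & $aResult[0] & "]")
    ; 2nd call: perform the actual conversion into the buffer
    $aResult = DllCall("kernel32.dll", "int", "WideCharToMultiByte", "uint", 1251, "dword", 0, "wstr", $sString, "int", -1, _
            "ptr", DllStructGetPtr($tText), "int", $aResult[0], "ptr", 0, "ptr", 0)
    If @error Or $aResult[0] = 0 Then Return SetError(2, 0, "")
    Return DllStructGetData($tText, 1)
EndFunc   ;==>_UTF16toANSI1251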
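A few lines of Python show why it helps to force 1251 instead of relying on the locale default: the same string gives entirely different results under different ANSI code pages. With errors="replace", Python substitutes "?" for characters the code page lacks. That is roughly what WideCharToMultiByte does with its default character, leaving its best-fit mapping aside:

```python
# The same Unicode string produces different bytes (or none that are valid)
# depending on which ANSI code page is used.
text = "Москва"

as_1251 = text.encode("cp1251")
# Cyrillic letters do not exist in 1252 (Western European), so each one
# is replaced with "?".
as_1252 = text.encode("cp1252", errors="replace")

print(as_1251.hex(" ").upper())  # CC EE F1 EA E2 E0
print(as_1252)                   # b'??????'
```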


Thanks for the response, jchd. I can't quite see why your solution works (the DLL calls are quite complex), but since I need to convert between Unicode and several different code pages, I chose a different approach.

This code gives the correct result: a simple file containing just the ANSI characters. Depending on the code page used to view them, these characters may look like gibberish, but the byte values are correct.
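The "gibberish" effect is easy to reproduce, because the same bytes simply mean different characters under different code pages. Here is a Python illustration using the Москва bytes from the first post:

```python
# The same six bytes, interpreted under two different ANSI code pages.
data = bytes.fromhex("CC EE F1 EA E2 E0")

print(data.decode("cp1251"))  # Москва
print(data.decode("cp1252"))  # Ìîñêâà
```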

For the record, the machine that uses the file can set the code page on a per-line basis, so that is not a concern.

This solution may be a lot slower because of the database calls, but speed is not an issue, and it's just a few lines.

; ===============
; Search for the data in XLS file
; ===============
Func _do_data()
    ;_ExcelReadSheetToArray($oExcel [, $iStartRow = 1 [, $iStartColumn = 1 [, $iRowCnt = 0 [, $iColCnt = 0 [, $iColShift = False]]]]])
    Local $o = FileOpen(@ScriptDir & "\out.txt", 2) ; 2 = write mode, erase previous contents (ANSI text; binary would need flag 16)
    Local $rowstart = 9
    Local $colstart = 1
    Local $arr = _ExcelReadSheetToArray($oExcel, $rowstart, $colstart, 1) ; read 1 row only
    FileWriteLine($o, _u2a($arr[1][4]))
    FileWriteLine($o, _u2a($arr[1][5]))
    FileClose($o)
EndFunc   ;==>_do_data

;=============================================
; This function makes the conversion using an SQLite database
; Format of database:
; unicode(decimal), ansi(hexvalue), codepage, name of char, unicode(hex)
;=============================================
Func _u2a($u)
    Local $aRow
    ; $u = Unicode string (UTF-16 LE)
    Local $b ; will contain the code of each character (lookup numbers)
    Local $ansi = "" ; will contain the converted result
    $b = StringToASCIIArray($u)
    If IsArray($b) Then ; check for empty string here..
        For $i = 0 To UBound($b) - 1
            _SQLite_QuerySingleRow(-1, "SELECT ansi FROM [u2a] where unicode='" & $b[$i] & "';", $aRow)
            $ansi = $ansi & Chr(Number($aRow[0]))
        Next
    EndIf
    Return $ansi
EndFunc   ;==>_u2a
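The per-character lookup in _u2a can be sketched in Python with an in-memory dictionary instead of an SQLite table. In this sketch, the table is built from Python's own cp1251 codec as a stand-in for the attached database. The helper names build_table and u2a are hypothetical, and characters missing from the table become "?" instead of causing an error:

```python
def build_table(codepage):
    """Map Unicode code point -> single ANSI byte for a one-byte code page."""
    table = {}
    for b in range(256):
        try:
            ch = bytes([b]).decode(codepage)
        except UnicodeDecodeError:
            continue  # byte is undefined in this code page (e.g. 0x98 in cp1251)
        table[ord(ch)] = b
    return table


def u2a(text, table, fallback=ord("?")):
    """Per-character lookup; characters not in the table become '?'."""
    return bytes(table.get(ord(ch), fallback) for ch in text)


table_1251 = build_table("cp1251")
print(u2a("Москва", table_1251).hex(" ").upper())  # CC EE F1 EA E2 E0
```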

Database (.sqlite format) attached.

Unicode_2_ansi.zip


Ahem ... your query doesn't filter on cp (the code page column) at all, so expect unpredictable results!

If you do filter on both the character code and cp, create a compound index on those two columns.
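Here is a Python sketch of what that could look like with SQLite. The table and index names (u2a, idx_u2a_cp_unicode) and the three-column schema are hypothetical, loosely modelled on the database described above. The rows are generated from Python's codecs for illustration only. The euro sign shows why filtering on cp matters: it exists in both 1251 and 1252, but with different byte values:

```python
import sqlite3

# Hypothetical schema: one row per (code page, Unicode code point).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE u2a (codepage INTEGER, unicode INTEGER, ansi INTEGER)")
# Compound index covering both columns used in the WHERE clause.
con.execute("CREATE UNIQUE INDEX idx_u2a_cp_unicode ON u2a (codepage, unicode)")

# Populate two code pages from Python's codecs (illustration only).
for cp in (1251, 1252):
    rows = []
    for b in range(256):
        try:
            ch = bytes([b]).decode(f"cp{cp}")
        except UnicodeDecodeError:
            continue  # byte undefined in this code page
        rows.append((cp, ord(ch), b))
    con.executemany("INSERT INTO u2a VALUES (?, ?, ?)", rows)


def lookup(ch, cp):
    """Return the ANSI byte for ch in code page cp, or None if unmapped."""
    row = con.execute(
        "SELECT ansi FROM u2a WHERE codepage = ? AND unicode = ?", (cp, ord(ch))
    ).fetchone()
    return row[0] if row else None


print(hex(lookup("€", 1251)))  # 0x88
print(hex(lookup("€", 1252)))  # 0x80
```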

But why not use the much simpler and faster function based on WideCharToMultiByte, and pass it the code page as well?

There is no magic in this function: it is what Windows itself uses to convert to and from any string representation. The first call is needed to get the output length; the second call performs the actual conversion. You can use all the code pages listed here.
