_WinAPI_OemToChar - different approach


Nothing major, but even small improvements can be put up for discussion:
The function _WinAPI_OemToChar() is currently defined as follows:

Func _WinAPI_OemToChar($sStr)
    Local $aCall, $sRetStr = "", $nLen = StringLen($sStr) + 1, $iStart = 1

    While $iStart < $nLen
        $aCall = DllCall('user32.dll', 'bool', 'OemToCharA', 'str', StringMid($sStr, $iStart, 65536), 'str', '')
        If @error Or Not $aCall[0] Then Return SetError(@error + 10, @extended, '')
        $sRetStr &= $aCall[2]
        $iStart += 65536
    WEnd

    Return $sRetStr
EndFunc

Now, why the loop? Shouldn't one call be enough?
The problem is that the output buffer, the 2nd parameter of OemToCharA, is passed as an empty string of type "str".
In that case AutoIt implicitly creates a buffer of 65536 characters, regardless of whether the target string is 1 character long or 1 million.
In the first case, you have only wasted memory.
In the second case, however, we get a buffer overflow, so AutoIt would crash as soon as we tried to convert a string that does not fit into these 65536 characters.

The problem was therefore tackled in the current implementation by splitting the string into chunks of 65536 characters, converting them individually and then reassembling them. 
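
To put a number on the chunking: the loop above issues one DllCall per started block of 65536 characters. A minimal sketch of that arithmetic (the helper name _ChunkCalls is made up for illustration and is not part of any UDF):

```autoit
; Number of DllCalls the chunked implementation needs for a string of $iLen characters.
; This mirrors the While loop above: $iStart runs 1, 65537, 131073, ... while $iStart < $iLen + 1.
Func _ChunkCalls($iLen)
    Return Ceiling($iLen / 65536)
EndFunc   ;==>_ChunkCalls

Local $aLen = [10, 65536, 65537, 1000000]
For $iLen In $aLen
    ConsoleWrite(StringFormat("%8d characters -> %d call(s)\n", $iLen, _ChunkCalls($iLen)))
Next
```

So a string of one million characters costs 16 calls plus 16 string concatenations, where a correctly sized buffer needs a single call.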

My suggestion is as follows: The output buffer can also be defined directly with the correct size.
This way we don't waste memory with small strings and with large strings only one DllCall is necessary.
In addition to saving memory, this is actually also a few percent faster:

#include <WinAPIConv.au3>
#include <String.au3>

Global Const $aN = [1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7]
Global Const $iRUNS = 100
Global $iT, $iT1, $iT2

Global $sMeasureFormat = "%15.3f  ms", $dFormLen = StringLen(StringFormat($sMeasureFormat, 0))

Global $sHeader = StringFormat("\n% 11s\t% " & $dFormLen & "s\t% " & $dFormLen & "s% 13s\n", "String size", "_WinAPI_OemToChar", "_WinAPI_OemToChar2", "Speedup")
ConsoleWrite($sHeader & _StringRepeat("-", 64))

For $iN In $aN
    
    ; put some preparation stuff here
    Global $sString = _StringRepeat("ß", $iN)
    ConsoleWrite(StringFormat("\n% 11d", StringLen($sString)))

    ; the original _WinAPI_OemToChar()
    $iT = TimerInit()
    For $i = 1 To $iRUNS
        _WinAPI_OemToChar($sString)
    Next
    $iT1 = TimerDiff($iT)
    ConsoleWrite(StringFormat("\t" & $sMeasureFormat, $iT1 / $iRUNS))

    ; the modified _WinAPI_OemToChar
    $iT = TimerInit()
    For $i = 1 To $iRUNS
        _WinAPI_OemToChar2($sString)
    Next
    $iT2 = TimerDiff($iT)
    ConsoleWrite(StringFormat("\t" & $sMeasureFormat, $iT2 / $iRUNS))

    ConsoleWrite(StringFormat("\t%10.1f %%", (1 - $iT2 / $iT1) * 100))
Next
ConsoleWrite(@CRLF & @CRLF)



Func _WinAPI_OemToChar2($sStr)
    ; input string
    Local $tIn = DllStructCreate("CHAR [" & StringLen($sStr) + 1 & "]")
    DllStructSetData($tIn, 1, $sStr)

    ; output buffer
    Local $tOut = DllStructCreate("CHAR [" & StringLen($sStr) + 1 & "]")

    Local $aCall = DllCall("user32.dll", 'BOOL', 'OemToCharA', 'PTR', DllStructGetPtr($tIn), 'PTR', DllStructGetPtr($tOut))
    If @error Or Not $aCall[0] Then Return SetError(@error + 10, @extended, '')
        
    Return DllStructGetData($tOut, 1)
EndFunc   ;==>_WinAPI_OemToChar2


This is meant as a small suggestion for discussion and, if no arguments against it turn up, for adoption in a later AutoIt version.


Where does this buffer limitation in the current implementation come from anyway? Is it just because someone insisted on passing the data as an ANSI string instead of passing a pointer to a buffer where the data is located? Your implementation looks the way it should be.

When the words fail... music speaks.


39 minutes ago, Andreik said:

Where does this buffer limitation in the current implementation come from anyway?

It is not so much a limitation as a choice with trade-offs.
Strings in C interfaces are pointers to char arrays of a fixed, pre-allocated size.
If a function that you want to call with DllCall expects such a string as a parameter, you must pass it a matching string buffer.
However, this also means that every time you pass a string, you have to create a buffer of the appropriate size, fill it with the string and then pass a pointer to that buffer to the function via DllCall.

To make this a little easier, DllCall offers a shortcut: the data types "str" and "wstr", which do most of this work for you.
For a string that is passed into the function this is simple: DllCall could just create a buffer that matches the size of the string and it's done. For an output buffer that should be just as convenient, however, this is no longer feasible, because when the function is called it is not yet known how large the buffer must be for the data written into it to fit.
For this reason, a fixed size had to be specified and 65536 characters were chosen as a trade-off. This works for many cases and simplifies the handling with DllCall immensely. But there are also cases like this one, where you have to create the buffer yourself.
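
As an illustration of the two styles, here is a minimal sketch that fetches the Windows directory once through the "wstr" shortcut and once through a self-made buffer. GetWindowsDirectoryW is only chosen as a convenient example of a function that writes into a caller-supplied buffer; the size of 260 characters is an assumption for this demo:

```autoit
; Variant 1: let DllCall allocate the output buffer via "wstr".
; The written string comes back in element [1] of the returned array.
Local $aCall = DllCall("kernel32.dll", "uint", "GetWindowsDirectoryW", "wstr", "", "uint", 260)
If Not @error Then ConsoleWrite("wstr:    " & $aCall[1] & @CRLF)

; Variant 2: allocate the buffer yourself and pass a pointer to it.
; Here the buffer size is under your control.
Local $tBuf = DllStructCreate("wchar[260]")
$aCall = DllCall("kernel32.dll", "uint", "GetWindowsDirectoryW", "struct*", $tBuf, "uint", 260)
If Not @error Then ConsoleWrite("struct*: " & DllStructGetData($tBuf, 1) & @CRLF)
```

Both variants should print the same path. The second one is more code, but it is the only one where the buffer can be sized to fit the data.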

I can't really blame the AutoIt devs for this - that's ok.
 

41 minutes ago, argumentum said:

Yes, that's what I would have done next.
To keep the tracker clean, I wanted to discuss the whole thing here first.
It's not really unlikely that I'm overlooking some aspect of this and that there were specific reasons why it was designed this way.

But since you have responded to this and have not yet added such an aspect, there is a chance that I am not so wrong with my thoughts on this.

Apart from that, it's nothing really big or serious. I just happened to stumble across it again today. But even small improvements should not be ignored.


5 minutes ago, AspirinJunkie said:

I can't really blame the AutoIt devs for this - that's ok.

It's not about blaming people, but looping doesn't really look like the proper way. Have you considered a hybrid solution, such as checking the string size and passing the data as str or ptr depending on the length?


A hybrid variant would of course be possible, but I don't see any advantage that would result from this.

The advantage of the str variant is to save code and make it clearer.

However, we need the manual string buffer anyway, so a hybrid variant would inflate the function's code without any further advantage (at least none that I can see).


Just aesthetics. It looks better than a loop like the one in the current implementation, but it has no further advantage over your proposed implementation.


Btw: I did not manage to get the variant with OemToCharW instead of OemToCharA to work, i.e. with WCHAR as the data type for the output.

Various characters get mangled: line breaks etc.

Maybe someone has a solution or explanation.
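
For reference, this is roughly what such a call looks like. It is an untested sketch, not a confirmed fix, and the function name _OemToCharW_Sketch is made up. The assumptions are that the source stays a single-byte CHAR buffer, that the destination is a separate WCHAR buffer with one element per source byte plus the terminator, and that the two buffers must not be the same one because their element sizes differ:

```autoit
; Untested sketch: OemToCharW reads single-byte OEM text and writes UTF-16.
Func _OemToCharW_Sketch($sStr)
    Local $iLen = StringLen($sStr)

    ; source: OEM bytes plus terminating NUL
    Local $tIn = DllStructCreate("char[" & ($iLen + 1) & "]")
    DllStructSetData($tIn, 1, $sStr)

    ; destination: one wide character per source byte plus terminating NUL
    Local $tOut = DllStructCreate("wchar[" & ($iLen + 1) & "]")

    Local $aCall = DllCall("user32.dll", "bool", "OemToCharW", "struct*", $tIn, "struct*", $tOut)
    If @error Or Not $aCall[0] Then Return SetError(@error + 10, @extended, "")

    Return DllStructGetData($tOut, 1)
EndFunc   ;==>_OemToCharW_Sketch
```

If the output is still mangled with buffers set up like this, the cause probably lies elsewhere, for example in how the OEM bytes got into the AutoIt string in the first place.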


MultiByteToWideChar and WideCharToMultiByte don't have any issue.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


Func _CodepageToWstring($sCP, $iCodepage = Default)
    If $iCodepage = Default Then $iCodepage = 65001 ; UTF-8
    ; first call: query the required size in wide characters
    Local $aResult = DllCall("kernel32.dll", "int", "MultiByteToWideChar", "uint", $iCodepage, "dword", 0, "str", $sCP, "int", StringLen($sCP), _
            "ptr", 0, "int", 0)
    If @error Or $aResult[0] = 0 Then Return SetError(1, 0, "")
    ; second call: convert into a buffer of exactly that size
    Local $tWstr = DllStructCreate("wchar[" & $aResult[0] & "]")
    $aResult = DllCall("kernel32.dll", "int", "MultiByteToWideChar", "uint", $iCodepage, "dword", 0, "str", $sCP, "int", StringLen($sCP), _
            "struct*", $tWstr, "int", $aResult[0])
    If @error Or $aResult[0] = 0 Then Return SetError(2, 0, "")
    Return DllStructGetData($tWstr, 1)
EndFunc   ;==>_CodepageToWstring

Func _WstringToCodepage($sWstr, $iCodepage = Default)
    If $iCodepage = Default Then $iCodepage = 65001 ; UTF-8
    ; first call: query the required size in bytes
    Local $aResult = DllCall("kernel32.dll", "int", "WideCharToMultiByte", "uint", $iCodepage, "dword", 0, "wstr", $sWstr, "int", StringLen($sWstr), _
            "ptr", 0, "int", 0, "ptr", 0, "ptr", 0)
    If @error Or $aResult[0] = 0 Then Return SetError(1, 0, "")
    ; second call: convert into a buffer of exactly that size
    Local $tCP = DllStructCreate("char[" & $aResult[0] & "]")
    $aResult = DllCall("kernel32.dll", "int", "WideCharToMultiByte", "uint", $iCodepage, "dword", 0, "wstr", $sWstr, "int", StringLen($sWstr), _
            "struct*", $tCP, "int", $aResult[0], "ptr", 0, "ptr", 0)
    If @error Or $aResult[0] = 0 Then Return SetError(2, 0, "")
    Return DllStructGetData($tCP, 1)
EndFunc   ;==>_WstringToCodepage

 


I thought about it a bit more and came up with the idea of using one and the same buffer for both strings, i.e. letting OemToChar overwrite the input string directly.
This should halve the memory used for the strings and further increase performance:

Func _WinAPI_OemToChar($sStr)
    Local $tString = DllStructCreate("CHAR[" & StringLen($sStr) + 1 & "]")
    DllStructSetData($tString, 1, $sStr)

    Local $aCall = DllCall("user32.dll", 'BOOL', 'OemToCharA', "struct*", $tString, "struct*", $tString)
    If @error Or Not $aCall[0] Then Return SetError(@error + 10, @extended, '')
        
    Return DllStructGetData($tString, 1)
EndFunc   ;==>_WinAPI_OemToChar

Does anyone see a serious problem with this approach?


OEM and default system/user codepage don't map 1-1, so some characters will be lost/emasculated.


Yes, size-wise. See how OEM and ANSI (both Latin 1) differ:

#include <Array.au3>
#include <WindowsConstants.au3>
#include <GUIConstantsEx.au3>
#include <EditConstants.au3>
#include <GuiEdit.au3>
#include <FontConstants.au3>

Local $a[17][3] = [["      OEM Latin 1 (DOS)", "    ANSI Latin 1 (Windows)", "        Unicode"]]

Local $aTabooCharsANSI = [0x81, 0x8D, 0x8F, 0x90, 0x9D]
Local $s, $t, $u, $v

For $i = 0 To 15
    For $j = 0 To 15
        $v = $i * 16 + $j
        If ($v < 0x20) Or ($v = 0x7F) Then
            $s &=  "⬚ "
            $t &=  "⬚ "
            $u &=  "⬚ "
        Else
            $s &= _CodepageToWstring(Chr($v), 850) & " "    ; OEM Multilingual Latin 1 ; occidental Europe (DOS)
            $t &= (_ArrayBinarySearch($aTabooCharsANSI, $v) > -1 ? "⬚" : _CodepageToWstring(Chr($v), 1252)) & " "     ; ANSI Latin 1 ; occidental Europe (Windows)
            $u &= ((($v > 0x7F) And ($v < 0xA0)) ? "⬚" : ChrW($i * 16 + $j)) & " "                            ; Unicode
        EndIf
    Next
    $a[$i+1][0] = $s
    $a[$i+1][1] = $t
    $a[$i+1][2] = $u
    $s = ""
    $t = ""
    $u = ""
Next


GUICreate("Codepages diff example", 936, 400)
GUISetFont(11, $FW_NORMAL, $GUI_FONTNORMAL, "DejaVu Sans Mono") ; else choose a Unicode fixed font, e.g. Lucida Sans TypeWriter (dirty)
Local $idEdit = GUICtrlCreateEdit(@CRLF, 20, 20, 886, 380)
GUISetState(@SW_SHOW)
For $i = 0 To 16
    GUICtrlSetData($idEdit, StringFormat("  %-32s%4s%-32s%4s%-32s\r\n", $a[$i][0], "", $a[$i][1], "", $a[$i][2]), 1)
Next
GUICtrlSetData($idEdit, @CRLF & "⬚ represents a control or unassigned character", 1)

_GUICtrlEdit_SetReadOnly($idEdit, True)

While 1
    If GUIGetMsg() = $GUI_EVENT_CLOSE Then ExitLoop
WEnd
GUIDelete()

Func _CodepageToWstring($sCP, $iCodepage = Default)
    If $iCodepage = Default Then $iCodepage = 65001
    Local $aResult = DllCall("kernel32.dll", "int", "MultiByteToWideChar", "uint", $iCodepage, "dword", 0, "str", $sCP, "int", StringLen($sCP), _
            "ptr", 0, "int", 0)
    Local $tWstr = DllStructCreate("wchar[" & $aResult[0] & "]")
    $aResult = DllCall("kernel32.dll", "int", "MultiByteToWideChar", "uint", $iCodepage, "dword", 0, "str", $sCP, "int", StringLen($sCP), _
            "struct*", $tWstr, "int", $aResult[0])
    Return DllStructGetData($tWstr, 1)
EndFunc   ;==>_CodepageToWstring

 

Edited by jchd
Fixed and Unicode added


Yes, of course they are different. That's why we need the OemToChar in the first place so that we can convert them.

I still don't understand what this has to do with my question of whether the same memory area can be used for the input and output buffer when calling OemToChar.

In the meantime, however, I have been able to answer this question myself:
>>Microsoft itself<< says that this is not a problem and we can avoid an extra buffer for the output, as I suspected:

Quote

If the OemToChar function is being used as an ANSI function, the string can be translated in place by setting the lpszDst parameter to the same address as the lpszSrc parameter.
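
A quick way to convince yourself is to run both variants on the same input and compare the results. The following is a minimal sketch; the two function names are made up for this test and simply wrap the two-buffer and the in-place call from earlier in the thread:

```autoit
; Build a test string from all characters 0x20..0xFF
Local $sTest = ""
For $i = 0x20 To 0xFF
    $sTest &= Chr($i)
Next

; == compares case-sensitively, so any differing character makes this False
ConsoleWrite("Identical results: " & (_OemToChar_TwoBuffers($sTest) == _OemToChar_InPlace($sTest)) & @CRLF)

; separate input and output buffers
Func _OemToChar_TwoBuffers($sStr)
    Local $tIn = DllStructCreate("char[" & (StringLen($sStr) + 1) & "]")
    DllStructSetData($tIn, 1, $sStr)
    Local $tOut = DllStructCreate("char[" & (StringLen($sStr) + 1) & "]")
    Local $aCall = DllCall("user32.dll", "bool", "OemToCharA", "struct*", $tIn, "struct*", $tOut)
    If @error Or Not $aCall[0] Then Return SetError(@error + 10, @extended, "")
    Return DllStructGetData($tOut, 1)
EndFunc   ;==>_OemToChar_TwoBuffers

; one buffer used as source and destination
Func _OemToChar_InPlace($sStr)
    Local $tString = DllStructCreate("char[" & (StringLen($sStr) + 1) & "]")
    DllStructSetData($tString, 1, $sStr)
    Local $aCall = DllCall("user32.dll", "bool", "OemToCharA", "struct*", $tString, "struct*", $tString)
    If @error Or Not $aCall[0] Then Return SetError(@error + 10, @extended, "")
    Return DllStructGetData($tString, 1)
EndFunc   ;==>_OemToChar_InPlace
```

This only checks that the two call styles agree with each other on one machine; it says nothing about which characters survive the OEM to ANSI mapping.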

 


I know it's OK to convert in place in the case of single-byte codepages, but "convert" isn't the right term, since the codepages don't contain the same characters.


Yes, but that is a completely different question. Anyone using the OemToChar function must be aware of this.

Here in the thread, on the other hand, it was only about how this should best be implemented in AutoIt.

A different form of implementation does not change the fact that the code pages cannot be converted congruently for all characters.

This property is just as present in the previous implementation as it is in my proposal.

So again: This thread is exclusively about how the OemToChar function should best be called in AutoIt. It was simply not about the usefulness of the function, its purposes and its pitfalls.
