AspirinJunkie Posted March 2, 2024

Nothing major, but even small improvements can be put up for discussion.

The function _WinAPI_OemToChar() is currently defined as follows:

    Func _WinAPI_OemToChar($sStr)
        Local $aCall, $sRetStr = "", $nLen = StringLen($sStr) + 1, $iStart = 1
        While $iStart < $nLen
            $aCall = DllCall('user32.dll', 'bool', 'OemToCharA', 'str', StringMid($sStr, $iStart, 65536), 'str', '')
            If @error Or Not $aCall[0] Then Return SetError(@error + 10, @extended, '')
            $sRetStr &= $aCall[2]
            $iStart += 65536
        WEnd
        Return $sRetStr
    EndFunc

Now, why the loop? Shouldn't one call be enough?

The problem is that the output buffer in the 2nd parameter of OemToCharA is defined as an empty string of type "str". Here AutoIt implicitly creates a buffer of 65536 characters, regardless of whether the target string is 1 character long or 1 million. In the first case, you have only wasted memory. In the second case, however, we have a buffer overflow, and AutoIt would therefore crash if we wanted to convert a string with 65536 characters. The current implementation tackles this by splitting the string into chunks of 65536 characters, converting them individually and then reassembling them.

My suggestion is as follows: the output buffer can also be defined directly with the correct size. This way we don't waste memory with small strings, and with large strings only one DllCall is necessary. In addition to saving memory, this is actually also a few percent faster:

    #include <WinAPIConv.au3>
    #include <String.au3>

    Global Const $aN = [1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7]
    Global Const $iRUNS = 100
    Global $iT, $iT1, $iT2
    Global $sMeasureFormat = "%15.3f ms", $dFormLen = StringLen(StringFormat($sMeasureFormat, 0))
    Global $sHeader = StringFormat("\n% 11s\t% " & $dFormLen & "s\t% " & $dFormLen & "s% 13s\n", "String size", "_WinAPI_OemToChar", "_WinAPI_OemToChar2", "Speedup")

    ConsoleWrite($sHeader & _StringRepeat("-", 64))

    For $iN In $aN
        ; put some preparation stuff here
        Global $sString = _StringRepeat("ß", $iN)
        ConsoleWrite(StringFormat("\n% 11d", StringLen($sString)))

        ; the original _WinAPI_OemToChar()
        $iT = TimerInit()
        For $i = 1 To $iRUNS
            _WinAPI_OemToChar($sString)
        Next
        $iT1 = TimerDiff($iT)
        ConsoleWrite(StringFormat("\t" & $sMeasureFormat, $iT1 / $iRUNS))

        ; the modified _WinAPI_OemToChar
        $iT = TimerInit()
        For $i = 1 To $iRUNS
            _WinAPI_OemToChar2($sString)
        Next
        $iT2 = TimerDiff($iT)
        ConsoleWrite(StringFormat("\t" & $sMeasureFormat, $iT2 / $iRUNS))

        ConsoleWrite(StringFormat("\t%10.1f %%", (1 - $iT2 / $iT1) * 100))
    Next
    ConsoleWrite(@CRLF & @CRLF)

    Func _WinAPI_OemToChar2($sStr)
        ; input string
        Local $tIn = DllStructCreate("CHAR [" & StringLen($sStr) + 1 & "]")
        DllStructSetData($tIn, 1, $sStr)

        ; output buffer
        Local $tOut = DllStructCreate("CHAR [" & StringLen($sStr) + 1 & "]")

        Local $aCall = DllCall("user32.dll", 'BOOL', 'OemToCharA', 'PTR', DllStructGetPtr($tIn), 'PTR', DllStructGetPtr($tOut))
        If @error Or Not $aCall[0] Then Return SetError(@error + 10, @extended, '')

        Return DllStructGetData($tOut, 1)
    EndFunc   ;==>_WinAPI_OemToChar2

This is a small suggestion for discussion or, if no arguments against it are found, for adoption in later AutoIt versions.
Andreik Posted March 2, 2024

Where does this buffer limitation in the current implementation come from anyway? Is it just because someone insisted on passing the data as an ANSI string instead of passing a pointer to the buffer where the data is located? Your implementation looks like the way it should be.
argumentum Posted March 2, 2024

2 hours ago, AspirinJunkie said:
.. if no arguments against it ...

Open a trac ticket.

Follow the link to my code contribution ( and other things too ). FAQ - Please Read Before Posting.
AspirinJunkie Posted March 2, 2024

39 minutes ago, Andreik said:
Where does this buffer limitation in the current implementation come from anyway?

It is not so much a limitation as a choice with trade-offs. Strings in C interfaces are pointers to null-terminated char arrays whose size the caller has to fix in advance. If a function that you want to call with DllCall expects such a string as a parameter, you must pass it a suitable string buffer. This means that every time you pass a string, you have to create a buffer of the appropriate size, fill it with the string and then pass a pointer to it to DllCall.

To make this a little easier, DllCall offers a shortcut: the types "str" and "wstr", which do most of this work for you. When a string is passed into the function, this is simple: DllCall can create a buffer matching the size of the string and is done. For an output buffer, however, this no longer works, because at the time of the call it is not yet known how large the buffer must be for the written data to fit. For this reason, a fixed size had to be specified, and 65536 characters were chosen as a trade-off. This works for many cases and simplifies the handling of DllCall immensely. But there are also cases like this one, where you have to create the buffer yourself.

I can't really blame the AutoIt devs for this - that's ok.

41 minutes ago, argumentum said:
Open a trac ticket.

Yes, that's what I would have done next. To keep the tracker clean, I wanted to discuss the whole thing here first. It's not unlikely that I'm overlooking some aspect of this and that there were specific reasons why it was designed this way. But since you have responded and have not raised such an aspect, there is a chance that I am not so wrong with my thoughts on this.

Apart from that, it's nothing really big or serious. I just happened to stumble across it again today. But even small improvements should not be ignored.
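The two ways of supplying the output buffer described above can be sketched side by side. This is only a minimal illustration, assuming a short plain-ASCII test string (the variable $sOem is made up for the example); it uses nothing beyond the OemToCharA call and the struct functions already shown in this thread.

```autoit
; Hypothetical test input; plain ASCII so the sketch does not depend on the code page
Local $sOem = "Hello"

; Variant 1: "str" shortcut. DllCall allocates the output buffer itself.
Local $aCall = DllCall("user32.dll", "bool", "OemToCharA", "str", $sOem, "str", "")
If @error Or Not $aCall[0] Then
    ConsoleWrite("str variant failed" & @CRLF)
Else
    ; the converted string comes back in the array slot of the 2nd parameter
    ConsoleWrite("str shortcut:  " & $aCall[2] & @CRLF)
EndIf

; Variant 2: manual buffers, each sized to the string plus the terminating null
Local $tIn = DllStructCreate("char[" & StringLen($sOem) + 1 & "]")
DllStructSetData($tIn, 1, $sOem)
Local $tOut = DllStructCreate("char[" & StringLen($sOem) + 1 & "]")
$aCall = DllCall("user32.dll", "bool", "OemToCharA", "struct*", $tIn, "struct*", $tOut)
If @error Or Not $aCall[0] Then
    ConsoleWrite("manual variant failed" & @CRLF)
Else
    ; here the result has to be read back from the struct
    ConsoleWrite("manual buffer: " & DllStructGetData($tOut, 1) & @CRLF)
EndIf
```

The first variant is shorter, which is the convenience the shortcut was designed for; the second costs a few more lines but lets the caller choose the buffer size.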
Andreik Posted March 2, 2024

5 minutes ago, AspirinJunkie said:
I can't really blame the AutoIt devs for this - that's ok.

It's not about blaming people, but looping doesn't really look like the proper way. Have you considered a hybrid solution, such as checking the string size and passing the data as str or ptr depending on the length?
AspirinJunkie Posted March 2, 2024

A hybrid variant would of course be possible, but I don't see any advantage that would result from it. The advantage of the str variant is that it saves code and makes it clearer. However, we need the manual string buffer anyway, so a hybrid variant would inflate the function's code without any further advantage (at least none that I can see).
Andreik Posted March 2, 2024

Just aesthetics. It looks better than a loop like the one in the current implementation, but it has no further advantage over your proposed implementation.
AspirinJunkie Posted March 2, 2024

Btw: I did not manage to get the variant with OemToCharW instead of OemToCharA to work, i.e. with WChar as the data type for the output. Various characters are destroyed - line breaks etc. Maybe someone has a solution or an explanation.
jchd Posted March 2, 2024

MultiByteToWideChar and WideCharToMultiByte don't have any issue.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.
SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)
jchd Posted March 3, 2024

    Func _CodepageToWstring($sCP, $iCodepage = Default)
        If $iCodepage = Default Then $iCodepage = 65001
        Local $aResult = DllCall("kernel32.dll", "int", "MultiByteToWideChar", "uint", $iCodepage, "dword", 0, "str", $sCP, "int", StringLen($sCP), _
                "ptr", 0, "int", 0)
        Local $tWstr = DllStructCreate("wchar[" & $aResult[0] & "]")
        $aResult = DllCall("kernel32.dll", "int", "MultiByteToWideChar", "uint", $iCodepage, "dword", 0, "str", $sCP, "int", StringLen($sCP), _
                "struct*", $tWstr, "int", $aResult[0])
        Return DllStructGetData($tWstr, 1)
    EndFunc   ;==>_CodepageToWstring

    Func _WstringToCodepage($sWstr, $iCodepage = Default)
        If $iCodepage = Default Then $iCodepage = 65001
        Local $aResult = DllCall("kernel32.dll", "int", "WideCharToMultiByte", "uint", $iCodepage, "dword", 0, "wstr", $sWstr, "int", StringLen($sWstr), _
                "ptr", 0, "int", 0, "ptr", 0, "ptr", 0)
        Local $tCP = DllStructCreate("char[" & $aResult[0] & "]")
        $aResult = DllCall("Kernel32.dll", "int", "WideCharToMultiByte", "uint", $iCodepage, "dword", 0, "wstr", $sWstr, "int", StringLen($sWstr), _
                "struct*", $tCP, "int", $aResult[0], "ptr", 0, "ptr", 0)
        Return DllStructGetData($tCP, 1)
    EndFunc   ;==>_WstringToCodepage
AspirinJunkie Posted March 3, 2024

I thought about it a bit more and came up with the idea of using one and the same buffer for both strings, i.e. letting OemToChar overwrite the input string directly. This should halve the memory used for the strings and further increase performance:

    Func _WinAPI_OemToChar($sStr)
        Local $tString = DllStructCreate("CHAR[" & StringLen($sStr) + 1 & "]")
        DllStructSetData($tString, 1, $sStr)

        Local $aCall = DllCall("user32.dll", 'BOOL', 'OemToCharA', "struct*", $tString, "struct*", $tString)
        If @error Or Not $aCall[0] Then Return SetError(@error + 10, @extended, '')

        Return DllStructGetData($tString, 1)
    EndFunc   ;==>_WinAPI_OemToChar

Does anyone see a serious problem with this approach?
jchd Posted March 3, 2024

OEM and default system/user codepage don't map 1-1, so some characters will be lost/emasculated.
AspirinJunkie Posted March 3, 2024

Do you have a concrete example of this problem? As I understand it, each character in both the input and the output is exactly 1 byte in size. A direct replacement should therefore work.
jchd Posted March 3, 2024 (edited)

Yes, size-wise. See how OEM and ANSI (both Latin 1) differ:

    #include <Array.au3>
    #include <WindowsConstants.au3>
    #include <GUIConstantsEx.au3>
    #include <EditConstants.au3>
    #include <GuiEdit.au3>
    #include <FontConstants.au3>

    Local $a[17][3] = [[" OEM Latin 1 (DOS)", " ANSI Latin 1 (Windows)", " Unicode"]]
    Local $aTabooCharsANSI = [0x81, 0x8D, 0x8F, 0x90, 0x9D]
    Local $s, $t, $u, $v
    For $i = 0 To 15
        For $j = 0 To 15
            $v = $i * 16 + $j
            If ($v < 0x20) Or ($v = 0x7F) Then
                $s &= "⬚ "
                $t &= "⬚ "
                $u &= "⬚ "
            Else
                $s &= _CodepageToWstring(Chr($v), 850) & " " ; OEM Multilingual Latin 1 ; occidental Europe (DOS)
                $t &= (_ArrayBinarySearch($aTabooCharsANSI, $v) > -1 ? "⬚" : _CodepageToWstring(Chr($v), 1252)) & " " ; ANSI Latin 1 ; occidental Europe (Windows)
                $u &= ((($v > 0x7F) And ($v < 0xA0)) ? "⬚" : ChrW($i * 16 + $j)) & " " ; Unicode
            EndIf
        Next
        $a[$i+1][0] = $s
        $a[$i+1][1] = $t
        $a[$i+1][2] = $u
        $s = ""
        $t = ""
        $u = ""
    Next

    GUICreate("Codepages diff example", 936, 400)
    GUISetFont(11, $FW_NORMAL, $GUI_FONTNORMAL, "DejaVu Sans Mono") ; else choose a Unicode fixed font, e.g. Lucida Sans TypeWriter (dirty)
    Local $idEdit = GUICtrlCreateEdit(@CRLF, 20, 20, 886, 380)
    GUISetState(@SW_SHOW)
    For $i = 0 To 16
        GUICtrlSetData($idEdit, StringFormat(" %-32s%4s%-32s%4s%-32s\r\n", $a[$i][0], "", $a[$i][1], "", $a[$i][2]), 1)
    Next
    GUICtrlSetData($idEdit, @CRLF & "⬚ represents a control or unassigned character", 1)
    _GUICtrlEdit_SetReadOnly($idEdit, True)

    While 1
        If GUIGetMsg() = $GUI_EVENT_CLOSE Then ExitLoop
    WEnd
    GUIDelete()

    Func _CodepageToWstring($sCP, $iCodepage = Default)
        If $iCodepage = Default Then $iCodepage = 65001
        Local $aResult = DllCall("kernel32.dll", "int", "MultiByteToWideChar", "uint", $iCodepage, "dword", 0, "str", $sCP, "int", StringLen($sCP), _
                "ptr", 0, "int", 0)
        Local $tWstr = DllStructCreate("wchar[" & $aResult[0] & "]")
        $aResult = DllCall("kernel32.dll", "int", "MultiByteToWideChar", "uint", $iCodepage, "dword", 0, "str", $sCP, "int", StringLen($sCP), _
                "struct*", $tWstr, "int", $aResult[0])
        Return DllStructGetData($tWstr, 1)
    EndFunc   ;==>_CodepageToWstring

Edited March 3, 2024 by jchd: Fixed and Unicode added
AspirinJunkie Posted March 3, 2024

Yes, of course they are different. That's why we need OemToChar in the first place: so that we can convert them. I still don't understand what this has to do with my question of whether the same memory area can be used for the input and output buffer when calling OemToChar.

In the meantime, however, I have been able to answer this question myself: Microsoft itself says that this is not a problem and that we can avoid an extra buffer for the output, as I suspected:

Quote:
If the OemToChar function is being used as an ANSI function, the string can be translated in place by setting the lpszDst parameter to the same address as the lpszSrc parameter.
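The in-place translation described in the quote can be tried out with a string that is longer than the 65536-character buffer of the "str" shortcut. The following is only a sketch under assumptions: the function name _OemToCharInPlace and the 100000-character test string are made up for illustration, and the call simply mirrors the single-buffer variant proposed earlier in this thread.

```autoit
#include <String.au3>

; Hypothetical name; same single-buffer idea as proposed earlier in the thread
Func _OemToCharInPlace($sStr)
    ; one CHAR buffer, large enough for the string plus the terminating null
    Local $tString = DllStructCreate("CHAR[" & StringLen($sStr) + 1 & "]")
    DllStructSetData($tString, 1, $sStr)

    ; source and destination point to the same buffer
    Local $aCall = DllCall("user32.dll", "BOOL", "OemToCharA", "struct*", $tString, "struct*", $tString)
    If @error Or Not $aCall[0] Then Return SetError(@error + 10, @extended, "")

    Return DllStructGetData($tString, 1)
EndFunc   ;==>_OemToCharInPlace

; 100000 characters: more than the 65536 characters the "str" shortcut provides
Local $sIn = _StringRepeat("a", 100000)
Local $sOut = _OemToCharInPlace($sIn)
If @error Then
    ConsoleWrite("Call failed, @error = " & @error & @CRLF)
Else
    ; compare the lengths to see whether the whole string came back in one call
    ConsoleWrite("Input length:  " & StringLen($sIn) & @CRLF)
    ConsoleWrite("Output length: " & StringLen($sOut) & @CRLF)
EndIf
```

Comparing the two lengths is only a quick sanity check that a single call handled the whole string; it says nothing about whether individual characters were mapped as expected.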
jchd Posted March 3, 2024

I know it's OK to convert in place in the case of single-byte codepages, but "convert" isn't the right term, since the codepages don't contain the same characters.
AspirinJunkie Posted March 3, 2024

Yes, but that is a completely different question. Anyone using the OemToChar function must be aware of this. This thread, on the other hand, is only about how the function should best be implemented in AutoIt. A different form of implementation does not change the fact that the code pages cannot be converted congruently for all characters. This property is just as present in the previous implementation as it is in my proposal.

So again: this thread is exclusively about how the OemToChar function should best be called in AutoIt. It is not about the usefulness of the function, its purposes or its pitfalls.