Ascend4nt

Fix for unreadable '?' text from WinGetText, ControlGetText

7 posts in this topic

#1 ·  Posted (edited)

_WinGetTextFixed & _StringFixANSIInWide

 

Update: Text problems no longer exist as of v3.3.9.11

 

Sick of getting rows of '?' from WinGetText, ControlGetText, and other AutoIt functions? The problem lies in the way the text is encoded - AutoIt 'sees' a Windows wide string (UTF-16), but in reality it grabs a string of 8-bit ANSI characters and shoves them into a 16-bit wide string. Here's a small example:

Original string "x4" with null-term as an ANSI string sequence:

0x78 0x34 0x00

Same string interpreted as Unicode (reversal is due to little-endian encoding):

0x3478 0x00

The latter string produces the Unicode character "㑸" (hopefully that shows up properly as an Asian character).

So, as you can see, from the software side, "x4" compacted into a UTF-16 string still produces an acceptable string, even though it was never intended to be an Asian character.
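
The packing described above can be reproduced in a few lines. This is only a sketch: it assumes a little-endian UTF-16 reading (BinaryToString flag 2) and uses an even-length string so that no stray byte is left over.

```autoit
; The ANSI bytes of "x4" are 0x78 0x34
Local $bAnsi = StringToBinary("x4", 1)
; Reinterpret the same two bytes as a single UTF-16 LE code unit
Local $sWide = BinaryToString($bAnsi, 2)
ConsoleWrite("Characters: " & StringLen($sWide) & ", code unit: 0x" & Hex(AscW($sWide), 4) & @CRLF)
```

This should report one character with code unit 0x3478, matching the byte sequence shown above.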

Now, where the '?'s show up is during conversion of the same string back to ANSI. You can see the result of this by doing a ConsoleWrite(), which itself causes an implicit Wide-ANSI conversion to occur. Also, pasting text into another text editor that's using an ANSI code page will give the same results. (A workaround for the conversion is to set the encoding to Unicode)

Of course, even if you can display the wide characters correctly, it's not the intended result - you want what the window in question produces. Plus, many invalid UTF-16 sequences can be produced this way, and certain values (those in the surrogate range) will be interpreted as half of a two-unit UTF-16 pair, which just gets messy.

So how do we fix this?

The easiest workaround, when the string is known beforehand to be ANSI, is:

$sStr = BinaryToString(StringToBinary($sStr, 2), 1)
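
As a quick check of that one-liner, the following sketch first builds an ANSI-in-Wide string by hand and then converts it back. The variable names are illustrative only, and the sample text has an even number of characters so that every byte fits into a wide character.

```autoit
; Build a deliberately garbled string: 4 ANSI bytes packed into 2 wide characters
Local $sGarbled = BinaryToString(StringToBinary("test", 1), 2)
; Apply the workaround: dump the wide string as UTF-16 LE bytes, then read those bytes as ANSI
Local $sFixed = BinaryToString(StringToBinary($sGarbled, 2), 1)
ConsoleWrite("Garbled length: " & StringLen($sGarbled) & ", fixed: " & $sFixed & @CRLF)
```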

However, determining whether a window or control will give you ANSI or Wide characters is another issue. I wrote the _StringFixANSIInWide function below to try to detect ANSI-in-Wide situations and return the corrected string. Unfortunately it's a bit naive in its implementation (see the update in the next paragraph), but it works okay for most situations. I can see it failing only when a true Wide string contains nothing but characters that need more than 8 bits (usually there are a lot of 7-bit ASCII characters encoded in the lower part of true Wide strings). In practice that is an extremely unlikely situation, unless the string contains no ASCII characters at all.

Update - _WinGetTextFixed() alternative:

Now I've identified that the core issue is with the call to SendMessage (using WM_GETTEXT): the function will return either ANSI or Unicode (wide) characters, and it's up to our code to determine, based on the length, whether it is an ANSI or wide string. We can then return the correct string. See _WinGetTextFixed() below.

Note that AutoIt's WinGetText() function returns text for more than just the window - it returns text for various controls also. While this can be beneficial, it also introduces issues when different controls return Wide or ANSI characters. The result is basically a contaminated string: ANSI characters packed into UTF-16 sequences sit alongside legitimate Wide characters in the same string. This is unacceptable and really hard to work with. That's one reason there's now a 'force conversion' parameter in _StringFixANSIInWide. I would really emphasize that it's best not to use WinGetText() and to opt for _WinGetTextFixed() instead.

I may suggest this behavior be fixed in the AutoIt source code, but it looks like all the Devs are gone :o

Anyway, enjoy!

Example: Notepad 2

$hWnd = WinGetHandle("[CLASS:Notepad2]")
$hControl = ControlGetHandle($hWnd,"","[CLASS:Scintilla]")
; Previous fix:
;~ $sText = _StringFixANSIInWide(ControlGetText($hWnd,"",$hControl))
$sText = _WinGetTextFixed($hControl)
ConsoleWrite("Text:"&$sText&@CRLF)

Example: SciTE

$hWnd = WinGetHandle("[CLASS:SciTEWindow]")
$hControl=ControlGetHandle($hWnd,"","[CLASS:Scintilla; INSTANCE:1]")
; Previous fix:
;~ $sText = _StringFixANSIInWide(ControlGetText($hWnd,"",$hControl))
$sText = _WinGetTextFixed($hControl)
ConsoleWrite("Text:"&$sText&@CRLF)

Example: Programmer's Notepad 2

$hWnd = WinGetHandle("[REGEXPTITLE:Programmer's Notepad]")
$hControl = ControlGetHandle($hWnd, "", "[CLASS:ScintillaWindowImpl; INSTANCE:1]")
; Previous fix:
;~ $sText = _StringFixANSIInWide(ControlGetText($hWnd,"",$hControl))
$sText = _WinGetTextFixed($hControl)

ConsoleWrite("Text:"&$sText&@CRLF)
; ===================================================================================================
; Func _WinGetTextFixed($hWnd)
;
; Function to get Text of a window or control
;  This is an alternative to AutoIt's 'WinGetTitle', 'WinGetText', and 'ControlGetText',
;  which have issues with reading ANSI text from some windows
;
; Author: Ascend4nt
; ===================================================================================================

Func _WinGetTextFixed($hWnd)
    If Not IsHWnd($hWnd) Then Return SetError(1,0,'')

    Local $aRet, $stWideText, $stANSIText, $sText
    Local $nGetTextLen, $nHalfLen

    ; WM_GETTEXTLENGTH 0x0E ("lresult" keeps the return value pointer-sized on x64)
    $aRet = DllCall("user32.dll", "lresult", "SendMessageW", "hwnd", $hWnd, "uint", 0x0E, "wparam", 0, "lparam", 0)
    If @error Then Return SetError(2, @error, '')
    If Not $aRet[0] Then Return SetError(3, 0, '')

    $nGetTextLen = $aRet[0]

;~     ConsoleWrite("WM_GETTEXTLENGTH return:"&$nGetTextLen&@CRLF)

    ; Create a union structure, add 2 characters - 1 for null-term, 1 to handle odd-count cases
    $stWideText = DllStructCreate("wchar["&$nGetTextLen + 2&"]")
    If @error Then Return SetError(4, 0, '')
    $stANSIText = DllStructCreate("char["&($nGetTextLen+2)*2&"]", DllStructGetPtr($stWideText))

    ; WM_GETTEXT 0x0D ("lresult" keeps the return value pointer-sized on x64)
    $aRet = DllCall("user32.dll", "lresult", "SendMessageW", "hwnd", $hWnd, "uint", 0x0D, "wparam", $nGetTextLen + 1, "ptr", DllStructGetPtr($stWideText))
    If @error Then Return SetError(2, @error, '')
    If Not $aRet[0] Then Return SetError(3, 0, '')

    $nGetTextLen = $aRet[0]

    ; Get text as WIDE characters 1st
    $sText = DllStructGetData($stWideText, 1)

;~     ConsoleWrite("$nGetTextLen = "&$nGetTextLen&", $nHalfLen = "&$nHalfLen&", StringLen() = "&StringLen($sText)&@CRLF)

    ; Determine if the wide string length is half the supposed returned text length
    ; - If so, it's an ANSI string
    $nHalfLen = ($nGetTextLen + BitAND($nGetTextLen, 1) ) / 2
    If (StringLen($sText) - $nHalfLen < 2) Then
        ; Retrieve text correctly as ANSI
        $sText = DllStructGetData($stANSIText, 1)
    EndIf

    Return $sText
EndFunc
; ======================================================================================================
; Func _StringFixANSIInWide($sStr, $bForceCnvt = False)
;
; Function to fix a common issue where ANSI text is embedded in UTF-16 strings
; Problem occurs in 'WinGetText', 'ControlGetText', 'WinGetTitle'
; and some COM functions using 'bstr' types
;
; Easiest method, when you *know* the text is ANSI:
; BinaryToString(StringToBinary($sStr, 2), 1)
;
; *However*, if it is unknown what the string holds, we need to look
; for null characters (0's) in the string
;
; Alternatives:'WideCharToMultiByte' API call, which does the same replacements as below
; However, on Vista+, WC_ERR_INVALID_CHARS can be used to error-out on illegal characters
;
; Author: Ascend4nt
; ======================================================================================================

Func _StringFixANSIInWide($sStr, $bForceCnvt = False)
    If $sStr = '' Then Return ''
    Local $nLen, $stStrVer, $stBinVer, $sTmp, $nReplacements

    ; This fails to work in many mixed-ANSI/UTF-16 scenarios (as seen in WinGetText):
;~ If $bForceCnvt Then
;~ Return BinaryToString(StringToBinary($sStr, 2), 1)
;~ EndIf

    $nLen = StringLen($sStr)

    ; Struct for string (+1 for null-term)
    $stStrVer = DllStructCreate("wchar [" & $nLen + 1 & "]")
    ; Create a union, granting us binary 1-byte access to the wide chars
    $stBinVer = DllStructCreate("byte [" & $nLen * 2 & "]", DllStructGetPtr($stStrVer))

    ; Set String in structure
    DllStructSetData($stStrVer, 1, $sStr)

    ; Load string as binary data, convert to ANSI string
    ; AND Replace 0's with 0xFFFD (the Unicode 'REPLACEMENT CHARACTER')
    $sTmp = StringReplace(BinaryToString(DllStructGetData($stBinVer, 1)), ChrW(0), ChrW(0xFFFD), 0, 2)
    $nReplacements = @extended

    ; Trim off null-terminator and any other trailing 0's at the end (all converted to 0xFFFD's)
    While (StringRight($sTmp, 1) = ChrW(0xFFFD))
        $sTmp = StringTrimRight($sTmp, 1)
        $nReplacements -= 1
    WEnd

    ; If no replacements remaining, then every byte contains data, so its a safe bet the string is ANSI
    ; Also, in mixed-ANSI/UTF-16 situations (sometimes seen in WinGetText), allow a force
    If ($nReplacements = 0 Or $bForceCnvt) Then
        Return $sTmp
        ; Same result as:
        ;Return BinaryToString(StringToBinary($sStr, 2), 1)
    Else
        Return $sStr
    EndIf
EndFunc   ;==>_StringFixANSIInWide

*updates:

- Added a 'force' parameter for scenarios where WinGetText() will return a mix of ANSI and Unicode text in the same string. The result will contain some '?'s in these scenarios, but there's really nothing you can do without modifying the AutoIt source code.

update: problem no longer exists as of v3.3.9.11 (see BugTracker ticket # 2362)

Edited by Ascend4nt

#2 ·  Posted (edited)

One of those things I've always noticed and wondered about, but never enough to really ask why :)! Great function, definitely a keeper for the snippet box, much appreciated m8 :)... Despite everything else, this imho might still be worth a bug ticket.

Edited by KaFu

Actually, as it turns out, I was a bit wrong in assuming the '?'s were invalid characters. I was able to see what the output really looked like by changing the encoding in Notepad2 and doing a paste.

So basically, a lot of the 2-byte ANSI characters are perfectly valid UTF-16 combinations, and they generally show up as Eastern language characters in a Unicode viewer. The '?'s are what results when converting from those eastern language codes to an ANSI version, which is what you see in most text editors in their default locale settings (and in SciTE's output).

Soo now I'm thinking MultiByteToWideChar may be a better option, and just 'fool' AutoIt into thinking it is accepting a "wstr" as the 3rd parameter in the DllCall. The issue I'm a bit uncertain about is deciding what to put as the Locale info. I'd need to find out the encoding the given window uses - which in most cases would be the system (or user?) code page. Hmm.. I'll have to see if there's a Windows API function for detecting the encoding of a window. If anyone knows it, let me know (otherwise I'll be poking around the API for a bit).

Also, another problem is that MultiByteToWideChar will either drop illegal code points or fail completely (depending on the MB_ERR_INVALID_CHARS flag). Oh, and of course any mixed ANSI/Wide strings would give some funky results for the Wide string portion, which is why I'm leaning towards telling people just to avoid WinGetText on problem windows, since you're basically getting a contaminated string by doing that.

Hmm.. for now I suppose I'll leave the function intact, as it works relatively well for Controls, but there's still a minimal chance that embedded ANSI can produce a completely valid UTF-16 string..

Maybe it's related to the code page and language defined in the resources?

FileGetVersion($sFile, "DefaultLangCodepage")

From Help-File for FileGetVersion():

"Another special stringname is "DefaultLangCodepage" can be used to retrieve the default language and codepage.

The language and codepage can be used if needed to differentiate the "stringname" i.e. "080904b0Comments" (see MSDN StringFileInfo in VerQueryValue function)."

Thanks for the idea KaFu, but I don't know that that would work well with multilanguage apps or text editors that let you change the encoding.

However, I've identified the problem - it's the call to SendMessage (with WM_GETTEXT). It returns either an ANSI or Wide string, and the length of the returned string. I've found it's pretty easy once we know the string length - we can just compare the wide string length to this, and read the text as ANSI if the wide length is approximately half that returned length.

I've updated the first post with the new _WinGetTextFixed() function.

#6 ·  Posted (edited)

Edit: I'll try some debugging with those examples.  I see that I made some notes in the source about SciTE being bugged and returning weird stuff.

Edited by Jon

Hopefully this is all fixed in 3.3.9.11.

I've fixed my window handling class so that all calls for WM_GETTEXT use it (there were a few places I'd missed).  I've checked WinGetText() and ControlGetText() with it and SciTE.

Same fixes for Au3Info too.
