
Capturing text on "ill-behaved" screens or "controls"


joseLB


Hi

By "ill-behaved" controls I mean controls that look like controls but are not really standard controls. On these, Au3Info at best gets the title, class, ID, etc., but cannot get the text in the control.

This happens more often than is desirable in real life. Two examples: Windows Explorer, or any browser such as Chrome. In my specific case it is a VB program for which I have no source code, although Au3Info does report its ID, class, etc.

Note that in a browser like Chrome, if I select an area of the screen with the mouse and copy it, I can paste it into Notepad. So it seems to me that standard Windows functions can see/read the text that we see on the screen.

Question: after many unsuccessful searches of the forum I could not find a way to get that kind of text, although I'm quite sure that long ago I read a post about reading any text that is "visible" on the screen. Is there a way, even if some garbage comes along with the text?

I devised an inelegant and unreliable way to do it (e.g., in Chrome):
1. Move the mouse to roughly the beginning of the area you want.
2. Press and hold the left button.
3. Drag the mouse to roughly the end of the area you want and release the button.
4. Send Ctrl+C.
5. The text is now on the clipboard.
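For reference, the drag-and-copy workaround above can be scripted roughly as follows. This is a minimal sketch, not a robust tool: the helper name _CopyDraggedText, the window title "Google Chrome" and the coordinates in the usage line are hypothetical placeholders, and the 200 ms pause is a guess at how long the application needs to fill the clipboard.

```autoit
; Sketch of the drag-select + Ctrl+C workaround.
; Hypothetical helper: title and coordinates passed in are placeholders.
Opt("WinTitleMatchMode", 2)   ; match any substring of the window title
Opt("MouseCoordMode", 0)      ; mouse coordinates relative to the active window

Func _CopyDraggedText($sTitle, $iX1, $iY1, $iX2, $iY2)
    WinActivate($sTitle)
    If Not WinWaitActive($sTitle, "", 2) Then Return SetError(1, 0, "")   ; window not found/activated

    ClipPut("")                                      ; clear the clipboard so stale text is not returned
    MouseClickDrag("left", $iX1, $iY1, $iX2, $iY2)   ; press at start, move to end, release
    Send("^c")                                       ; copy the selection
    Sleep(200)                                       ; assumed delay for the app to fill the clipboard

    Local $sText = ClipGet()
    If @error Then Return SetError(2, 0, "")         ; nothing (or non-text) was copied
    Return $sText
EndFunc

; Hypothetical usage: select and copy a region of a Chrome window.
ConsoleWrite(_CopyDraggedText("Google Chrome", 100, 200, 600, 400) & @CRLF)
```

Because the coordinates are fixed, the selection breaks as soon as the page scrolls or the window is resized, which is exactly the unreliability described in the post.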

Is there a better way?

Is there also a way to read more complex controls, such as Windows Explorer? (Ctrl+C would not work in this case and many others.)

Thanks

Jose


How about Ctrl+A? It highlights all available text.

Try bringing the window into focus, then send Ctrl+A, and then Ctrl+C.

Edit: I was thinking too narrowly. Try using the ControlGetText function; look it up in the help file.
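Both suggestions can be combined in one helper: ask the control for its text with ControlGetText first, and only fall back to Ctrl+A / Ctrl+C through the clipboard when that returns nothing. A minimal sketch; the name _ReadControlText, the 2-second wait, the 200 ms pause and the Notepad/"Edit1" usage line are assumptions for illustration, not part of the original suggestion.

```autoit
; Sketch: try ControlGetText, then fall back to Ctrl+A, Ctrl+C via the clipboard.
; Hypothetical helper; the usage line below uses placeholder identifiers.
Func _ReadControlText($sTitle, $sWinText, $vControl)
    If Not WinExists($sTitle, $sWinText) Then Return SetError(1, 0, "")

    ; 1) Ask the control for its text directly (works for standard Win32 controls).
    Local $sText = ControlGetText($sTitle, $sWinText, $vControl)
    If Not @error And $sText <> "" Then Return $sText

    ; 2) Fallback: select everything in the control and copy it.
    WinActivate($sTitle, $sWinText)
    If Not WinWaitActive($sTitle, $sWinText, 2) Then Return SetError(2, 0, "")
    ControlFocus($sTitle, $sWinText, $vControl)
    ClipPut("")            ; clear so stale clipboard contents are not returned
    Send("^a")             ; select all
    Send("^c")             ; copy
    Sleep(200)             ; assumed delay for the clipboard to be updated
    $sText = ClipGet()
    If @error Then Return SetError(3, 0, "")
    Return $sText
EndFunc

; Hypothetical usage: read the edit area of a classic Notepad window.
ConsoleWrite(_ReadControlText("[CLASS:Notepad]", "", "Edit1") & @CRLF)
```

As the rest of the thread shows, neither branch helps when the "control" has no window of its own or draws its text directly; both only work for controls that either answer a text request or support select-all and copy.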

Edited by mischieftoo

jose,

Is your question ONLY related to text in browsers?

There are Java (NOT JavaScript!) programs that prevent reading of the screen, and the only way to copy their text for further processing is OCR.

But normally you can't even copy such text by selecting it with the mouse...

Edit: OK, not the ONLY way.

Here is a thread about getting to the guts of Java content with AutoIt.

Edited by guwguw

@mischieftoo

@guwguw

Thanks for the answers.

In fact, this is a general question about how to read as much text as possible from any "ill-behaved" screen, including browsers, programs, etc.

So if the page (in the case of browsers) uses JavaScript, that is just another detail...

But my goal is general programs (browsers included), since most real-life programs are hard to automate because it is difficult to read what we see on the screen. So I imagined that a general thread about how to get text from these screens would be of general interest. I'm quite sure this is a must for many of us who use AutoIt.

Specifically, right now I have a program developed in VB in this situation: the window that follows has a list of items that I need to read for automation (in a speech recognition/synthesis system). In this case, neither dragging, Ctrl+A, Ctrl+C nor anything else works... Here is what Au3Info reports for it:


>>>> Window <<<<
Title: Sugestão de pesquisas
Class: ThunderRT6FormDC
Position: 56, 252
Size: 417, 259
Style: 0x16C80000
ExStyle: 0x00000100
Handle: 0x00160A7C

>>>> Control <<<<
Class: ThunderRT6PictureBoxDC
Instance: 2
ClassnameNN: ThunderRT6PictureBoxDC2
Name:
Advanced (Class): [CLASS:ThunderRT6PictureBoxDC; INSTANCE:2]
ID: 4
Text:
Position: 8, 41
Size: 377, 228
ControlClick Coords: 282, 32
Style: 0x56010000
ExStyle: 0x00000004
Handle: 0x0008056E

>>>> Mouse <<<<
Position: 349, 350
Cursor ID: 0
Color: 0xFFFFFF

>>>> StatusBar <<<<

>>>> ToolsBar <<<<

>>>> Visible Text <<<<
Fechar

>>>> Hidden Text <<<<


Just a guess/suggestion: if we can see the text on the monitor, the text probably resides somewhere... So would some kind of capture of a screen area in text mode (not as an image, like the _screencap... UDF), getting whatever text is available, be viable?

Jose


Unfortunately, what we see on the screen is not just 'text', even when it appears to be text. For instance, in VB applications of yore you might come across the old MSFlexGrid control displaying data in a spreadsheet-like layout. That data can only be accessed programmatically either from inside the compiled application itself or by subclassing the control with other interesting programming jiggery. Without getting a hand inside the active thread of the application that hosts the FlexGrid, your only other option is OCR.

Documents displayed in browsers are also a lovely can of worms, because they exist as a rendered document object model (DOM). The text we get from Ctrl+C is the result of inner hoodoo the browsers perform, and it can vary from one browser to the next depending on how each chooses to turn the DOM into plain text (for instance, Internet Explorer and Firefox paste text from some tables differently).

I've not worked with them before, but assuming the ThunderRT6PictureBox is a VB6 PictureBox, then (from MSDN) "The Visual Basic 6.0 PictureBox control is a container control; in addition to displaying pictures it can be used to group and display other controls..." So, assuming also that the text visible in the box is not rendered as an image but belongs to a contained control that stores its text inside itself, it may be as frustratingly difficult to interact with as the FlexGrid.
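One cheap way to test the "contained control" hypothesis is to enumerate every child control of the window and call ControlGetText on each one; anything that answers shows up in the output. A minimal sketch, assuming WinGetClassList returns the controls in the same order AutoIt uses to number instances (so counting repeated class names reconstructs each INSTANCE). The helper name _DumpControlTexts is hypothetical; the usage line uses the ThunderRT6FormDC class from the Au3Info output above.

```autoit
; Sketch: list every child control of a window and print any text it returns.
; Assumes WinGetClassList order matches AutoIt's INSTANCE numbering.
Func _DumpControlTexts($sTitle)
    Local $sClasses = WinGetClassList($sTitle)   ; one class name per control, @LF-separated
    If @error Or $sClasses = "" Then Return SetError(1, 0, "")

    Local $aClass = StringSplit(StringStripWS($sClasses, 2), @LF)   ; strip trailing @LF, then split
    Local $sOut = "", $iInst, $sText
    For $i = 1 To $aClass[0]
        ; instance number = how many times this class has appeared so far (1-based)
        $iInst = 1
        For $j = 1 To $i - 1
            If $aClass[$j] = $aClass[$i] Then $iInst += 1
        Next
        $sText = ControlGetText($sTitle, "", "[CLASS:" & $aClass[$i] & "; INSTANCE:" & $iInst & "]")
        If Not @error And $sText <> "" Then $sOut &= $aClass[$i] & $iInst & ": " & $sText & @CRLF
    Next
    Return $sOut
EndFunc

; Usage with the VB6 form reported by Au3Info above.
ConsoleWrite(_DumpControlTexts("[CLASS:ThunderRT6FormDC]"))
```

If the list items never appear in this dump, that is consistent with the explanation above: the text is either drawn directly onto the PictureBox or lives in something that does not answer a plain text request, and only in-process techniques or OCR remain.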


For cases where we are testing an application we own, we end up putting testability code into the app, e.g. making it able to pop up a dialog containing the raw text in an edit box. If you don't 'own' the application, then it's OCR time, and my experience is that OCR is poor, especially for technical apps with weird, non-dictionary technical mumbo-jumbo names.


Thank you both very much.

So, in summary, it's impossible to get text from the screen generically... :). And OCR is not a good option: first, its quality is poor at screen resolution (typically 200-300 dpi is needed); second, it is complex to install; and it is normally slow.

Just asking a little more before giving up... I suppose that most of these controls ultimately call a Win API function or something similar to draw their text on screen. I doubt that each control renders its own image, so the rendering is probably done by some internal Windows routines. If so, could these routines be intercepted?

Jose


Hey, I'm having a problem accessing the text of a selected area...

I've used ControlGetText, but it returns only the first part, e.g. for my screenshot it returns (Selected 1 item (916.00 :)).

I want to access all of the text. Is there any method to do it?
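If that text sits in a standard Win32 status bar (msctls_statusbar32), one option is AutoIt's StatusbarGetText, which takes a 1-based part number, so looping over the parts collects all of them instead of only the first part that ControlGetText returned in the post above. A minimal sketch, assuming a standard status bar; the helper name _StatusbarGetAllText, the " | " separator and the cap of 10 parts are hypothetical choices, and text that the application draws itself will not be found this way.

```autoit
; Sketch: collect every part of a window's standard status bar.
; Assumes a msctls_statusbar32 control; $iMaxParts is an arbitrary upper bound.
Func _StatusbarGetAllText($sTitle, $sWinText = "", $iMaxParts = 10)
    Local $sAll = "", $sPart
    For $i = 1 To $iMaxParts                         ; status bar parts are numbered from 1
        $sPart = StatusbarGetText($sTitle, $sWinText, $i)
        If @error Or $sPart = "" Then ContinueLoop   ; skip empty or unreadable parts
        If $sAll <> "" Then $sAll &= " | "
        $sAll &= $sPart
    Next
    Return $sAll
EndFunc

; Hypothetical usage: an Explorer window (class CabinetWClass) with its status bar visible.
ConsoleWrite(_StatusbarGetAllText("[CLASS:CabinetWClass]") & @CRLF)
```

Skipping empty parts rather than stopping at the first one keeps the loop from ending early when a middle part happens to be blank.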
