Screen Scrape or PixelCheckSum?


  • Moderators

(Forgive the NOoBnEsS please)

OK, it's been a while since I've posted. I'm running into a problem with one of my scripts reading an application. It's a desktop application that I use every day. It has 2 boxes that seem to be like a ToolTip, but inside the GUI. A Control ID seems to be present, but it's a tabbed GUI and the window info pulls up the information for all the tabs. It shows as static and/or hidden control, and on a mouseover it disappears.

The information changes often. I have the script 95% done, and for 3 days now I can't figure out how to get the value of the numbers (rates).

I've tried get focus/gettext, and I don't know how the .dll works to pull from it. I even tried building a variable with all the pixels that make up each number (that was quite tedious). I just can't read the text.

Here is my thought and my question, since I know nothing about screen scrapes at all.

I've seen programs that check specific x,y coords, pulling text that was static. These programs were written in other languages of course, but still, they do that.

So would it be possible to do something like PixelChecksum to a specific variable (lock it, so to speak) for each value? I was thinking that if it can tell when it's changed, then it must be locking the information somehow. Then maybe do a PixelSearch to check all the coords for those specific pixels. I don't understand the string functions, but something tells me if it is possible, you will have me learning them. :)

If not, is it possible to do a screen scrape at those coords, to see the text, (if so could you lead me in the right direction?)

Sorry, I don't have a script for an example. But nothing I've tried works.

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.


  • Moderators

Would it be possible to post a screenshot?  Blot out any numbers you don't want us to see, but leave enough so we have some idea what size box we are talking about...

Yes, I have a meeting to go to that I'm already 5 minutes late for. But as soon as that's done, most definitely!!

Thanks

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.


  • Moderators

Would it be possible to post a screenshot?  Blot out any numbers you don't want us to see, but leave enough so we have some idea what size box we are talking about...

Ok, I marked the white boxes w/ a red rectangle; that would be the absolute max it would have to read. As a matter of fact, I would be happy with half that.

Thanks

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.


Fixed-width rectangle is good, but I imagine the font is proportional instead of fixed-width :( Numbers might be the same width though :(

I used PixelChecksum in a similar situation, but I was lucky since the text could only be one of six possible values. (Once I obtained all the possible checksums, I just had to do a simple comparison.)
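A minimal sketch of that "handful of known values" approach, for anyone following along. The coordinates, checksums, labels and the helper name _LookupChecksum are all made up for illustration; you would capture the real checksums from your own screen first.

```autoit
; Sketch: recognise a box that can only ever show a few known values.
; Every number and label below is a placeholder, not a real measurement.
Opt("PixelCoordMode", 1) ; absolute screen coordinates (AutoIt's default)

; Column 0 = checksum captured earlier, column 1 = the text it stands for
Global $aKnown[3][2] = [[1193931446, "5.25"], [517404083, "5.50"], [3836273605, "5.75"]]

; Hypothetical helper: return the text for a checksum, or "" if it is unknown
Func _LookupChecksum($iChecksum, ByRef $aTable)
    For $i = 0 To UBound($aTable) - 1
        If $iChecksum = $aTable[$i][0] Then Return $aTable[$i][1]
    Next
    Return ""
EndFunc

; Hypothetical box occupying (200,300)-(260,312) on screen
Global $sValue = _LookupChecksum(PixelChecksum(200, 300, 260, 312), $aKnown)
If $sValue = "" Then
    MsgBox(4096, "Lookup", "Unknown checksum - capture it and add it to the table")
Else
    MsgBox(4096, "Lookup", "Box shows: " & $sValue)
EndIf
```

This only works while the set of possible values stays small and the box is drawn identically every time; anything with free-form numbers needs the per-digit approach below.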

Ideas:

Compute the PixelChecksum for each digit (which *IMPORTANT* requires determining the width of each digit) .... for example:

(I'd use a program like mspaint.exe with 400% zoom factor to determine the pixel widths and heights of the characters.)

;Checksums for each digit; probably actually want to use an array
Global $d0 = 1193931446 ;replace with whatever the real value is
Global $d1 = 517404083
....
Global $d9 = 3836273605

Include decimal point and dollar sign, if applicable.

I really hope the text is right-or-left justified instead of centered.

For justified text, just start at the inside edge of the rectangle and compare against all your checksums. When you get a match, advance your checksum rectangle by the width of the matching digit.

For centered text, you might just have to advance your checksum region one pixel at a time until you get a match.

Hope this helps

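Example assuming each digit has a pixel width of 10. Height of rectangle is 21.

;Coordinates relative to top-left corner of rectangle
;Assumes $digit[0..9] already holds the checksum of each digit

Dim $WIDTH = 10, $HEIGHT = 21
Dim $x = 0, $y = 0 ;top-left corner of the rectangle
$Captured_Text = ""
For $column = 0 to 14 ;up to 15 digit positions
   $left = $x + $column * $WIDTH ;left edge of this digit's cell
   For $d = 0 to 9
      ;PixelChecksum takes left, top, right, bottom (not width and height)
      If PixelChecksum($left, $y, $left + $WIDTH, $y + $HEIGHT) = $digit[$d] Then
          $Captured_Text = $Captured_Text & String($d)
          ExitLoop
      EndIf
   Next
Next

$result = $Captured_Text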
Edited by CyberSlug
Use Mozilla | Take a look at My Disorganized AutoIt stuff | Very very old: AutoBuilder 11 Jan 2005 prototype I need to update my sig!

  • Moderators

Fixed-width rectangle is good,

Sorry, I understand now what you were asking. Yes the numbers are not the width and or height of the rectangle :(

I used PixelChecksum in a similar situation, but I was lucky since the text could only be one of six possible values.  (Once I obtained all the possible checksums, I just had to do a simple comparison.)

Box 1 is a simple one like yours above: only 9 possible numbers, sitting at the right edge of the box.

But, ouch... the last box has %'s in it. It can be 100%, 99.99%, etc., all the way down to 0.00%.

So for the last box, would I have to find the width and height of every possible combination, or could I just find the widest width and tallest height and ignore anything within that? (I don't really need all of it, just the first 2 whole numbers.)

Compute the PixelChecksum for each digit (which *IMPORTANT* requires determining the width of each digit) .... for example:

(I'd use a program like mspaint.exe with 400% zoom factor to determine the pixel widths and heights of the characters.)

  ;Checksums for each digit; probably actually want to use an array

  Global $d0 = 1193931446 ;replace with whatever the real value is

  Global $d1 = 517404083

  Global $d9 = 3836273605

Include decimal point and dollar sign, if applicable.

Ugh, I was afraid of that: Can you give me an example of an array to get the width and height? "Knowledge of some, master of none" :)

Hope this helps

Of course it does!!, I'm glad I wasn't dreaming. And haven't known you to give any bad advice :( .

For $column = 0 to 14

I don't understand what that is? the 0 to 14 I think is throwing me off.

But you're awesome, A REAL AUTOIT JEDI!!

I imagine I will be here all day at this.

Edited by ronsrules

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.


I have a use for this code myself, and I *think* it will work for you, too. You just have to figure out the right starting point.

Black MS Sans Serif 8 pt text on white background. Assumes NO font smoothing. If both conditions are met, the following code should successfully report the text in the example dialog window...

Parts you need to adapt:

For $i = 0 to 100 ;$i is the width of the rectangular text box

$x = 100-$i ;$x is the x coordinate of the box's right edge

$y = 12 ;the top coord of the box

Opt("PixelCoordMode", 2);relative to client
Opt("TrayIconDebug", 1)

#include <GuiConstants.au3>
GuiCreate("Example", 200, 100)
GuiSetBkColor(0xFFFFFF)
GuiCtrlCreateLabel("1234567.890%", 10, 10)
GuiSetState()


Dim $DIGIT[10]
$DIGIT[0] = 1039045995
$DIGIT[1] = 964669520
$DIGIT[2] = 493068639
$DIGIT[3] = 1681432927
$DIGIT[4] = 1059429221
$DIGIT[5] = 1121686891
$DIGIT[6] = 1053922667
$DIGIT[7] = 3335891542
$DIGIT[8] = 4262564462
$DIGIT[9] = 1024169323
$space = 1103132553
$decimalPoint = 2138338444
$percentSign = 4280800300

$result = ""

WinActivate("Example")
WinWaitActive("Example")
sleep(100)

$HEIGHT = 9
$WIDTH = 6;digit width is six, percent is 8, period is 3

For $i = 0 to 100
    $x = 100-$i
    $y = 12
    $checksum = PixelChecksum($x,$y, $x+$WIDTH,$y+$HEIGHT) ; 6x9 rectangle
    For $d = 0 to 9
        If $checksum = $DIGIT[$d] Then
            $result = $d & $result
            ContinueLoop(2)
        EndIf
    Next
        
    $checksum = PixelChecksum($x,$y, $x+3,$y+$HEIGHT) ;decimal point
    If $checksum = $decimalPoint Then
        $result = '.' & $result
        ContinueLoop(1)
    EndIf

    $checksum = PixelChecksum($x,$y, $x+8,$y+$HEIGHT) ;check for % symbol
    If $checksum = $percentSign Then $result = '%' & $result
    
Next

MsgBox(4096,"OCR !!!", $result)
Use Mozilla | Take a look at My Disorganized AutoIt stuff | Very very old: AutoBuilder 11 Jan 2005 prototype I need to update my sig!

  • Moderators

Man, I feel so damn stupid... I should have gotten that before I ever posted:

Um................HELL NO!!

This example is 100% no doubt absolutely.........INCREDIBLE!!!

Although I've not been able to reproduce this yet in my own program... I must have looked at your example 100 times. And all I can say is WOW.

Those of you that want an internal OCR or a Screen Scrape... HOLY COW!, I'm sure you're much faster than myself. But...wonderful I have to say.

I just have to learn now, how to transfer my numbers into the Dim#'s you've produced.

But: I feel a "PixelGetText" is in order after this!!

Can I kiss A$S anymore?

Yes: Absolutely 100% wonderful CyberSlug!

If you were a woman, and I was not married, whoooo hooooo! :(

LMAO... J/K

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.


To answer the question how I came up with the values for the DIGITS array, attached is an initial attempt to partially automate the process....

Code is poorly commented so here are the instructions:

1) You first need to obtain the widths of each character in the font. (I imagine there is a better way to get the info with GETTEXTMETRIC structure or something...)

Open first script, modify variables at beginning if necessary.

Run first script, resize window, click its close button. Results copied to clipboard.

2) Open second script, modify variables at beginning if necessary, paste results where indicated. Run script..... New results copied to clipboard.

3) Open final script, modify variables at beginning if necessary, paste results where indicated.

IMPORTANT:

You need to find the following line and change it to include all the possible widths:

$generalWidthsOfChars = StringSplit("6,8,2,3,5","") ;in order from most common to least (guesstimating here...)

See if OCR semi-works....

OCR_attempt.zip
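If you only need a handful of characters, a manual version of the capture step might look like the sketch below. This is not one of the attached scripts; the window title "Example" is borrowed from the earlier post, and the position and size are placeholders you would measure yourself (e.g. in mspaint).

```autoit
; Sketch: print the checksum of one glyph-sized rectangle so it can be pasted
; into a table of reference values. Position and size are placeholders.
Opt("PixelCoordMode", 2) ; relative to the client area of the active window

Global $iLeft = 10, $iTop = 12    ; hypothetical top-left corner of the glyph
Global $iWidth = 6, $iHeight = 9  ; hypothetical glyph size in pixels

WinActivate("Example")
WinWaitActive("Example")
Sleep(100) ; give the window a moment to repaint

Global $iSum = PixelChecksum($iLeft, $iTop, $iLeft + $iWidth, $iTop + $iHeight)
ClipPut($iSum) ; copy to the clipboard for pasting into your script
MsgBox(4096, "Checksum", "Copied to clipboard: " & $iSum)
```

Run it once per character, moving $iLeft each time. The values are only valid for the same font, colors and smoothing settings they were captured with.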

Edited by CyberSlug
Use Mozilla | Take a look at My Disorganized AutoIt stuff | Very very old: AutoBuilder 11 Jan 2005 prototype I need to update my sig!

  • Moderators

See if OCR semi-works....

I just got home and couldn't wait to play around. Been looking forward to it all day, that I don't think I got my other work done thinking of it.

After 5 Mins:

Amazingly I understand this. Re-did my backgrounds and foreground (text color), and it worked well.

$label = GuiCtrlCreateLabel( "the quick brown fox jumps over a lazy dog" , 0, 0, 300, 50); Returns this as result:theguickbrownfoxiumdpsonnveralaxydog

$space = "??????" (looking into this, bunches all text together)

Very nice. Now I'm going to compare your "Test_Text" to yesterday's post on the "Numbers". The text ran a lot slower for the result, I'm sure because the area was much larger.

:( Way to go!

Edited by ronsrules

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.


  • Moderators

Ok, I've got the text to read perfectly. Unfortunately, I only need numbers LOL, and I can't seem to make it work on the values of the numbers w/ the "foreground" Hex: 0x191970 for the text color. The background hex of 1 box is like you have Hex:

The foreground of the text of box 2 is also Hex: 0x191970 but the background color is Hex: 0xF8F8FF

This doesn't matter, because I toggled back and forth between all 3 scripts to get values for:

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, $space, $decimal, $percentage

Man, I know just by the text version you gave that this is awesome, but........... :(

I know that my value for each must be off, but for the love of God I can't figure out where.

Thanks a bunch for your effort!!

Edited by ronsrules

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.


From a PM:

The tutorial was wonderful, the scripts (1/2/3) simple but effective. 1 or 2 errors w/ a 9 here and there: if there is 99 or 94 (I get 994 or 944) (repeats second number for some reason) when those are together.

This is the part of the tutorial I never got to :( My code is inefficient.

Inefficient code from my tutorial:

For $x = 453 to 413 step -1;move right to left
    
    $p = PixelChecksum($x, $y, $x+6, $y+$HEIGHT-1)
    If $p = $ONE Then $output = "1" & $output
    If $p = $ZERO Then $output = "0" & $output
    
    $p = PixelChecksum($x, $y, $x+2, $y+$HEIGHT-1)
    If $p = $DECIMAL Then $output = "." & $output
    If $p = $COLON Then $output = ":" & $output
        
Next

This code performs a checksum on a rectangular region moving from right to left.

I compute a 6-pixel wide checksum. If that checksum matches $ONE or $ZERO then I update my output accordingly.

I also compute a 2-pixel wide checksum. If that matches $DECIMAL or $COLON then I update the output.

This code is inefficient for three reasons:

1) When $p = $ONE matches, I still compute the 2-pixel wide checksum.

2) When $p = $ONE matches, I still compare $p = $ZERO.

3) When $p = $ONE matches, I still advance by only one pixel when I could advance by 6 pixels (the width of the matched digit).

Normally this inefficient code just wastes time, but it could also cause problems if some random region just happens to have the same checksum as a certain digit... The checksums might be colliding.....

See attached screenshot that shows a flow chart of computing a PixelCheckSum on "94" The green highlighted region represents the area of the Checksum.

It could just so happen that the middle green region (representing half of digit 9 and half of digit 4) has the same checksum as the digit 9 !!!!

The best way to avoid this problem is to increment $x by the width of any matched digit instead of incrementing just one pixel at a time.

It could also be the case that a 2-pixel wide checksum region covering the left side of the digit 9 has the same checksum as the digit 9. *See Note 1

THE BEST SOLUTION IN AVOIDING THESE PROBLEMS DUE TO THE INEFFICIENT ALGORITHM:

- Increment $x by the width of the last match (as I already mentioned).

- ContinueLoop as soon as the first match occurs.

Improved code is attached.
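For readers without the attachment, here is a rough sketch of those two changes. It is not the attached ImprovedCode.au3. All checksums and coordinates are placeholders, and tracking the right edge in $r (instead of the left edge in $x) is a choice made for this sketch, not something taken from the original code.

```autoit
; Sketch only: stop at the first match and step by the matched glyph's width.
; Every number below is a placeholder; use your own checksums and coordinates.
Global $ONE = 111, $ZERO = 222       ; placeholders: checksums of 6-pixel wide glyphs
Global $DECIMAL = 333, $COLON = 444  ; placeholders: checksums of 2-pixel wide glyphs
Global $y = 12, $HEIGHT = 9          ; placeholders: top edge and glyph height
Global $LEFT = 413, $RIGHT = 459     ; placeholders: horizontal extent of the text

Global $output = "", $p, $w
Global $r = $RIGHT ; right edge of the rectangle currently being tested

While $r - 2 >= $LEFT
    $w = 0 ; width of the glyph matched at this position (0 = nothing matched)

    ; Try the 6-pixel wide glyphs first
    $p = PixelChecksum($r - 6, $y, $r, $y + $HEIGHT - 1)
    If $p = $ONE Then
        $output = "1" & $output
        $w = 6
    ElseIf $p = $ZERO Then
        $output = "0" & $output
        $w = 6
    EndIf

    ; Only try the 2-pixel wide glyphs when no wide glyph matched
    If $w = 0 Then
        $p = PixelChecksum($r - 2, $y, $r, $y + $HEIGHT - 1)
        If $p = $DECIMAL Then
            $output = "." & $output
            $w = 2
        ElseIf $p = $COLON Then
            $output = ":" & $output
            $w = 2
        EndIf
    EndIf

    If $w = 0 Then $w = 1 ; no match: slide one pixel to the left
    $r -= $w              ; match: jump past the whole glyph
WEnd

MsgBox(4096, "Result", $output)
```

Because the scan runs right to left, stepping the right edge by the matched width lands on the right edge of the next glyph whatever its width is; stepping a left edge by the same amount could jump over a narrow glyph such as the decimal point.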

*Note 1: It might be necessary to change the order of your checksums to avoid collisions: You might need to perform 2-pixel wide checksums before 6-pixel wide checksums or vice versa.....

Even worse is when two different characters have the same checksum. I believe I ran into a case with MS Sans Serif 8 where the letter "x" and letter "n"--even though they are visually different--have the same checksum! This is quite annoying. You would have to check the color of the center pixel, or something, to tell those letters apart.
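One way that center-pixel check might look, as a sketch. The helper name _ResolveXorN, the probe offset and the ink color are all assumptions; zoom in on both glyphs and pick a pixel that is text-colored in one and background in the other.

```autoit
; Sketch: tell apart two glyphs that share a checksum by probing a single pixel.
; $INK and the probe offset are assumptions, not measured values.
Global $INK = 0x000000 ; text color (black text assumed)

; Hypothetical helper: $iLeft/$iTop is the top-left corner of the ambiguous glyph
Func _ResolveXorN($iLeft, $iTop)
    ; Assumed probe point: 3 px right and 4 px down, where the strokes of "x"
    ; cross but "n" is expected to be empty
    If PixelGetColor($iLeft + 3, $iTop + 4) = $INK Then Return "x"
    Return "n"
EndFunc

; Usage idea: when a checksum matches the shared x/n value, call
; _ResolveXorN() with the same $x and $y you passed to PixelChecksum.
```

PixelGetColor follows the same PixelCoordMode setting as PixelChecksum, so the two calls can share coordinates.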

I hope this makes sense....

ImprovedCode.au3

Edited by CyberSlug
Use Mozilla | Take a look at My Disorganized AutoIt stuff | Very very old: AutoBuilder 11 Jan 2005 prototype I need to update my sig!

  • Moderators

*Note 1: It might be necessary to change the order of your checksums to avoid collisions:  You might need to perform 2-pixel wide checksums before 6-pixel wide checksums or vice versa.....

Even worse is when two different characters have the same checksum.  I believe I ran into a case with MS Sans Serif 8 where the letter "x" and letter "n"--even though they are visually different--have the same checksum!  This is quite annoying.  You would have to check the color of the center pixel, or something, to tell those letters apart.

I hope this makes sense....

Makes perfect sense! I'll test the 2 pixel (makes sense to start with that) to the 6 and then the 7 then let you know the results tomorrow. I have several pixel widths going on, no problems with any so far except the 99 and 94 that I stated before.

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.


  • 2 weeks later...
  • Moderators

Just wanted to give a heads up to everyone...

The OCR is working well. The only issue I've found is no spacing when recognizing letters.

Cyber, it's weird, but... I tried the 2 to 6 and 6 to 2 ... one program runs faster and more accurately one way, and the other the other way.

Also, (I haven't tried yet) but, wouldn't it be possible to do this w/ absolute screen coords, or even active window coords?

Any way... Cyber needs an applause!!

Edit: Don't ask me why, too anal I guess

Edited by ronsrules

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.


The OCR is working well.  The only issue I've found is no spacing when recognizing letters.

- Did you compute the checksum for the space character? (I'd use mspaint to determine the width of the space--usually about 3 pixels.)

- You can use any coordinate system you want. Absolute screen coordinates are probably better in some cases.
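A short sketch covering both points. The PixelCoordMode values are the ones listed in the AutoIt documentation; the checksum, position and starting text are placeholders.

```autoit
; PixelCoordMode controls how the pixel functions interpret coordinates:
;   0 = relative to the active window
;   1 = absolute screen coordinates (default)
;   2 = relative to the client area of the active window
Opt("PixelCoordMode", 1)

; Placeholder: checksum of a 3-pixel wide strip of pure background
Global $SPACE = 1103132553
Global $x = 500, $y = 300, $HEIGHT = 9 ; hypothetical position and glyph height
Global $result = "fox"                 ; text recognised so far (scanning right to left)

; Treat a blank 3-pixel strip as a space character
If PixelChecksum($x, $y, $x + 3, $y + $HEIGHT) = $SPACE Then $result = " " & $result

MsgBox(4096, "Result", "[" & $result & "]")
```

A blank strip also shows up in the margins and possibly between letters, so a space test like this probably only behaves if the scan steps by the width of each matched character rather than one pixel at a time.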

Use Mozilla | Take a look at My Disorganized AutoIt stuff | Very very old: AutoBuilder 11 Jan 2005 prototype I need to update my sig!

  • Moderators

- Did you compute the checksum for the space character?  (I'd use mspaint to determine the width of the space--usually about 3 pixels.)

Blah --- I thought I did -- But I remember I kept getting a -200 in the 2nd part script you had.

- You can use any coordinate system you want.  Absolute screen coordinates are probably better in some cases.

I thought so, better safe than sorry --- would have tried anyway, but if you say it's so, then it's so.

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.
