UWPOCR - Windows Platform Optical Character Recognition API Implementation




On 4/17/2022 at 11:55 AM, KaFu said:

Try “fa-IR”? Also check if it is supported by your OS and install additional language pack if required (and available).



Yes, I tried fa-IR, farsi-IR, Per-IR, persian-IR, fa, persian, and per.

None of them worked. I've also installed the additional Persian language pack on my Windows 11.


I've installed Persian and tested it.

The GetText throws an error 7 here: _UWPOCR_Log("FAIL __UWPOCR_GetText -> WaitForAsync IOcrResult")

So the OCR engine does not seem to respond. That's where it loses me :), sorry, I have no further clue.

Here's a test sentence:



  • 1 month later...

I used this UDF to OCR the content of a Command Prompt window. After I adjusted the ClearType settings in Control Panel, the recognition rate became very poor. Now I can't get back to the initial result, even though I disabled it again. Is there any requirement stated in the API reference?

I also noticed the OCR on number strings is not very accurate.

Cleartype off.jpg


I don't know about the ClearType setting, but maybe using a different font type and size for the Command Prompt will increase accuracy? Create a shortcut to cmd.exe; in its right-click properties you can adjust the font and layout settings.


  • 4 weeks later...

Is there a minimum size the picture has to be? For small pictures (139x26 in my case) it doesn't work.
However, if I make the picture bigger without changing the size of the text, it detects the text properly.

I have attached both files.
The first one (Test.jpg) doesn't work.
The second, bigger one (Test2.jpg) does work.

Any clue what's going on?




  • 2 months later...



I am interested in trying this out in my own program; however, I have a quick question.

I will be trying to use this on an application (I would prefer not to take screenshots), so I would use the second example from @mLipok.

1) If I am trying to find the location of a given text, e.g. "hello", and then get those location details so I can eventually left-click the center of that word, how would I add that?


  • 2 weeks later...

Superb toolset, many thanks.

In my process, I'm trying to read small black-text/white-background boxes placed on a page-wide graphic. The decode is about 75% reliable and I'm looking for tips to improve that. The code is still way too messy to post here. The process is, in summary:

1. Use a WebCapture routine to capture the full page to a 1280x768 bitmap on a hidden window.
2. Convert bmp, using handle, to image using _GDIPlus_BitmapCreateFromHBITMAP($hBmp)
3. Crop image to extract the required box, using _GDIPlus_BitmapCloneArea()
4. Create a 100x200 blank white canvas and merge the cropped image into the middle of it, using _GDIPlus_ImageGetGraphicsContext() and _GDIPlus_GraphicsDrawImage(). I do this because the OCR is unhappy with small images (but will detect small text on a large enough file!)
5. Finally using _UWPOCR_GetText() to extract the text
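For anyone following along, steps 3 to 5 might be sketched like this. The file names, crop rectangle, and canvas dimensions below are placeholder assumptions for illustration, not the poster's actual values:

```autoit
#include <GDIPlus.au3>
#include <UWPOCR.au3>

; Sketch of steps 3-5: crop the box out of the captured bitmap,
; centre the crop on a blank white canvas, then OCR the canvas.
; "page.bmp" and the crop rectangle are hypothetical placeholders.
_GDIPlus_Startup()
Local $hSource = _GDIPlus_ImageLoadFromFile(@ScriptDir & "\page.bmp")
Local Const $iBoxX = 10, $iBoxY = 10, $iBoxW = 80, $iBoxH = 20 ;box location in the capture
Local $hCrop = _GDIPlus_BitmapCloneArea($hSource, $iBoxX, $iBoxY, $iBoxW, $iBoxH)
Local Const $iCanvasW = 200, $iCanvasH = 100
Local $hCanvas = _GDIPlus_BitmapCreateFromScan0($iCanvasW, $iCanvasH)
Local $hCtxt = _GDIPlus_ImageGetGraphicsContext($hCanvas)
_GDIPlus_GraphicsClear($hCtxt, 0xFFFFFFFF) ;white background
_GDIPlus_GraphicsDrawImage($hCtxt, $hCrop, ($iCanvasW - $iBoxW) / 2, ($iCanvasH - $iBoxH) / 2) ;centre the crop
_GDIPlus_ImageSaveToFile($hCanvas, @ScriptDir & "\canvas.png")
Local $sText = _UWPOCR_GetText(@ScriptDir & "\canvas.png")
_GDIPlus_GraphicsDispose($hCtxt)
_GDIPlus_BitmapDispose($hCanvas)
_GDIPlus_BitmapDispose($hCrop)
_GDIPlus_ImageDispose($hSource)
_GDIPlus_Shutdown()
MsgBox(0, "OCR", $sText)
```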

I tried enlarging the cropped image with _GDIPlus_ImageResize() instead of step 4, but this introduced extraneous noise (randomly coloured pixels) around the character edges, which affected decode reliability.

Any suggestions on process techniques to maximise the OCR reliability, whilst retaining the inherent simplicity of using the built in Win10 OCR capabilities?

I'm not, on this occasion, looking for coding; it's more about suggestions on whether, for example, I should try capturing a bigger web page first, or if there's a way of specifying the font/size/content type which would give the OCR module a tighter focus, etc. 



Win10 x64
Autoit (compiling to x86)


It would be great to see the input image to be sure what suggestions to give you.





Thanks Dany,

Typical source snip attached as file TIM2.bmp, with GDIPlus_ImageT.jpg showing how it appears just before submitting to UWPOCR for processing, as follows:

#include <UWPOCR.au3>

Local $sTIMTextResult = _UWPOCR_GetText(@ScriptDir & "\GDIPlus_ImageT.jpg", "en-GB", False) ;True)
MsgBox(0, "Capture Time", $sTIMTextResult)

$sTIMTextResult = StringReplace($sTIMTextResult, " ", "")
$sTIMTextResult = StringReplace($sTIMTextResult, ":", "")
$sTIMTextResult = StringReplace($sTIMTextResult, ".", "")
$sTIMTextResult = StringReplace($sTIMTextResult, "O", "0")
$sTIMTextResult = StringReplace($sTIMTextResult, "C", "0")
$sTIMTextResult = StringReplace($sTIMTextResult, "I", "1")

$sTIMTextResult = StringLeft($sTIMTextResult, 2) & ":" & StringMid($sTIMTextResult, 3, 2) & ":" & StringMid($sTIMTextResult, 5, 2)

MsgBox(0, "Modified Capture Time", $sTIMTextResult)

I've used the cleanup code with some success to allow for misreads of the colons and 0/1 as O,C or I.

Since posting yesterday, I further experimented, a bit more systematically and found that:

- So long as the overall image was big enough, UWPOCR would at least try to decode the image
- The size of the text in the image, or the image itself, made little difference to the success
- Language setting and "UseOCRLine" parameters made no observable difference

What seems most promising at the moment is a filter I have just introduced that forces each pixel in the source image to either black or white: white when R, G and B are all greater than 240, otherwise black. The decode reliability has increased dramatically. I can get away with the extra time used because the function is called infrequently and has a long window of opportunity to complete; also, the images involved are comparatively small. So I think I have a solution to the immediate problem.
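The black/white filter described above could be sketched like this. This is a minimal per-pixel version for illustration (the helper name and file names are hypothetical); a production version would use _GDIPlus_BitmapLockBits for speed on larger images:

```autoit
#include <GDIPlus.au3>

; Hypothetical helper: force every pixel to pure black or white.
; A pixel becomes white only when R, G and B are all above $iThreshold.
Func _ForceBlackWhite($sInFile, $sOutFile, $iThreshold = 240)
    _GDIPlus_Startup()
    Local $hBitmap = _GDIPlus_ImageLoadFromFile($sInFile)
    Local $iW = _GDIPlus_ImageGetWidth($hBitmap), $iH = _GDIPlus_ImageGetHeight($hBitmap)
    For $iY = 0 To $iH - 1
        For $iX = 0 To $iW - 1
            Local $iARGB = _GDIPlus_BitmapGetPixel($hBitmap, $iX, $iY)
            Local $iR = BitAND(BitShift($iARGB, 16), 0xFF)
            Local $iG = BitAND(BitShift($iARGB, 8), 0xFF)
            Local $iB = BitAND($iARGB, 0xFF)
            If $iR > $iThreshold And $iG > $iThreshold And $iB > $iThreshold Then
                _GDIPlus_BitmapSetPixel($hBitmap, $iX, $iY, 0xFFFFFFFF) ;white
            Else
                _GDIPlus_BitmapSetPixel($hBitmap, $iX, $iY, 0xFF000000) ;black
            EndIf
        Next
    Next
    _GDIPlus_ImageSaveToFile($hBitmap, $sOutFile)
    _GDIPlus_ImageDispose($hBitmap)
    _GDIPlus_Shutdown()
EndFunc   ;==>_ForceBlackWhite
```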

I believe my source image may not be pure black and white, and that's what is at the root of the poor decoding. To me, that points to the Windows 10 native OCR being weak; the UDF is certainly working well and is remarkably easy to understand, use and integrate. I'd certainly be interested in your thoughts.




//Edit: Well, that theory has just been blown out of the water. I realised that the attached images were captured after I had applied the filter. So I turned it off to run new images... and the OCR is working fine!?! New images attached without the filter described above.







Hello @g0gcd. What I would do is append your image to an image with a similar text pattern, so that the OCR engine can get a better result.

So you will end up with a joined image like this one:


then process it with the OCR.

Test Code:

#include <ScreenCapture.au3>
#include <GDIPlus.au3>
#include <StringConstants.au3>
#include "..\UWPOCR.au3"

_Example()

Func _Example()
    Local $hTimer = TimerInit()
    Local $sImageFilePath = @ScriptDir & "\JoinedImage.jpg"
    Local $sImageTIM2FilePath = @ScriptDir & "\TIM2.bmp"
    Local $sText = "0123456789"
    Local Const $iW = 270, $iH = 40
    _GDIPlus_Startup()
    Local $hImageToProcess = _GDIPlus_ImageLoadFromFile($sImageTIM2FilePath)
    Local $hBitmap = _GDIPlus_BitmapCreateFromScan0($iW, $iH) ;create an empty bitmap
    Local $hBmpCtxt = _GDIPlus_ImageGetGraphicsContext($hBitmap) ;get the graphics context of the bitmap
    _GDIPlus_GraphicsSetSmoothingMode($hBmpCtxt, $GDIP_SMOOTHINGMODE_HIGHQUALITY)
    _GDIPlus_GraphicsClear($hBmpCtxt, 0xFFFFFFFF) ;clear bitmap with white
    _GDIPlus_GraphicsDrawString($hBmpCtxt, $sText, 0, 0, "Arial", 18) ;draw the helper text onto the bitmap
    _GDIPlus_GraphicsDrawImage($hBmpCtxt, $hImageToProcess, 140, -16) ;merge the source image next to it
    _GDIPlus_ImageSaveToFile($hBitmap, $sImageFilePath) ;save bitmap to disk
    _GDIPlus_GraphicsDispose($hBmpCtxt)
    _GDIPlus_BitmapDispose($hBitmap)
    _GDIPlus_ImageDispose($hImageToProcess)
    _GDIPlus_Shutdown()
    Local $sOCRTextResult = _UWPOCR_GetText($sImageFilePath)
    MsgBox(0, "Time Elapsed: " & TimerDiff($hTimer), StringStripWS(StringReplace($sOCRTextResult, $sText, ""), $STR_STRIPALL))
EndFunc   ;==>_Example





That's inspired! ✔️

I'll try that approach and let you know how it goes.

Brilliant, thanks



Thank you so much DanyFirex!

By experiment I have found:

1. The canvas (combined image) itself needs to be of a significant size. I found that 200x200 pixels was the minimum; any less than this caused intermittent decoding irrespective of the image quality. My solution uses 900 wide by 300 tall.

2. The helper text provided assistance wherever it was placed on the canvas but best improvement came with placing it immediately to the left of the source image text, with about 1 or 2 "spaces" gap between the helper text and the source image text. No further improvement came from adding alphabetic characters or punctuation to the helper text. (Note: en-gb used)

3. The helper text helped immensely whatever font and size was used for it, but in my case, the best improvement came from using the same font and size as the source (Arial, 27pt).

4. I found that the source image text font size should be between 12 and 30. Too big and the OCR misreads and too small, the OCR doesn't see small characteristic differences. My source was 9pt from the capture process, which I enlarged to 27pt.

5. With poor source images, it does help to force pixels into pure black/white before merging into the canvas. There may be a UDF out there that does that but I did it by hand as I needed to find the edge of my source "white" box within a coloured image anyway.

6. The OCR seems to "like" a substantial white border around the two elements. If the helper text was too close to any edge, I had problems decoding. Between 50 and 100 pixels seemed to be a minimum acceptable border. (Note, this may also explain 1.)

7. I inserted __UWPOCR_Initialize() before each _UWPOCR_GetText() function call (I process several boxes on each run). I can't see why this should be necessary, but whilst running repeated, frequent testing I found several results were corrupted with previous or nonsensical values. I will continue to hunt my code for an error on my part, where I haven't cleared a variable or have inadvertently re-used it!

I hope these notes are helpful to anyone else who is struggling with decoding a less than perfect source image. My infinite thanks to DanyFirex for the pointer towards using "helper" text, as that unlocked a massive improvement in reliability.

Best Regards / Saludos





  • 4 months later...
  • 1 month later...

Hello, I'm wondering if it's possible to obtain the coordinates of text detected by this OCR. I have written a piece of code that captures a screenshot from an emulator and then uses this OCR to extract the text from that image. However, I am unsure how to retrieve the exact location or coordinates of the identified text. Could you please advise me on how to accomplish this? Also, if what I'm asking is possible, what about multiple pieces of text: how can I get the coordinates of one specific text? The coordinates should be relative to the image. If you have a better idea of how to do this, please let me know.



#include <UWPOCR.au3>

; RunWait (rather than Run) ensures the screencap has finished before the pull starts,
; and the second parameter is the working directory, not the show flag
RunWait("adb -s emulator-5554 shell screencap -p /storage/emulated/0/picz/screenshot.png", "", @SW_HIDE)
RunWait("adb -s emulator-5554 pull /storage/emulated/0/picz/screenshot.png " & @ScriptDir & "\try.png", "", @SW_HIDE)

Local $sOCRTextResult = _UWPOCR_GetText(@ScriptDir & "\try.png", Default, True)
MsgBox(0, "OCR", $sOCRTextResult)



If you want to move the mouse to the location of the text and found example 4 complicated, here is a simpler code example that moves the mouse to the location of a text; change $sTargetWord to the text you want. @Werty @CommZ3 Maybe it's not efficient, but you don't have to use GDI+ if you don't know about it.

A shorter code:

#include <ScreenCapture.au3>
#include <UWPOCR.au3>

_ScreenCapture_Capture(@ScriptDir & "\screenshot1.png", 0, 0, @DesktopWidth, @DesktopHeight)
Local $aWords = _UWPOCR_GetWordsRectTo2DArray(@ScriptDir & "\screenshot1.png")

Local $sTargetWord = "New"
For $i = 0 To UBound($aWords) - 1
    If $aWords[$i][0] = $sTargetWord Then MouseMove($aWords[$i][1] + $aWords[$i][3] / 2, $aWords[$i][2] + $aWords[$i][4] / 2)
Next

A longer code with explanation:

#include <ScreenCapture.au3>
#include <UWPOCR.au3>

; Set the path and file name for the screenshot
Local $sScreenshotPath = @ScriptDir & "\screenshot1.png"

; Capture a screenshot of the desktop and save it to a file
_ScreenCapture_Capture($sScreenshotPath, 0, 0, @DesktopWidth, @DesktopHeight)

; Extract the text from the screenshot using UWPOCR
Local $sText = _UWPOCR_GetText($sScreenshotPath)

; Set the target word to be searched
Local $sTargetWord = "change_this_to_word_to_find"

; Look for the target word
Local $aWords = _UWPOCR_GetWordsRectTo2DArray($sScreenshotPath)

; Loop through each word in the array
For $i = 0 To UBound($aWords) - 1
    ; If the word matches the target word
    If $aWords[$i][0] = $sTargetWord Then
        ; Calculate the center of the rectangle enclosing the word
        Local $iX = $aWords[$i][1] + $aWords[$i][3] / 2
        Local $iY = $aWords[$i][2] + $aWords[$i][4] / 2
        ; Move the mouse to the center of the rectangle
        MouseMove($iX, $iY)
        ; Exit the loop
        ExitLoop
    EndIf
Next

