awayne

Tesseract OCR on Windows 7

6 posts in this topic

#1 ·  Posted (edited)

All,

I am revisiting a problem I was still having last week. If anyone has Tesseract OCR installed on Windows 7 along with the Tesseract.au3 UDF and can test this for me, I would be greatly appreciative; it has been bugging me for about a week now. My goal is to use the Tesseract UDF screen-capture function to read in text.

#include <Tesseract.au3>

$OCR_Result = _TesseractScreenCapture(0,"",1,1,294,121,377,166,0)

MsgBox(0, "Result: ", $OCR_Result)

The version of Tesseract I am using can be found here if anyone wants to install and test with the attached UDF.

https://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-setup-3.02.02.exe

 

 

OK, I have it partially working on Windows 7. Here is what I had to do.

In the UDF I had to change the following lines.

 

CHANGE:

Global $tesseract_temp_path = "C:\"

TO:

Global $tesseract_temp_path = "C:\Temp\"

AND created the folder C:\Temp


CHANGE:

ShellExecuteWait("C:\Program Files (x86)\Tesseract-OCR\tesseract.exe", $capture_filename & " " & $ocr_filename)

TO:

ShellExecuteWait("C:\Program Files (x86)\Tesseract-OCR\tesseract.exe", $capture_filename & " " & $ocr_filename, "", "", @SW_HIDE)
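
Since the fix hard-codes the install path, a small check before calling Tesseract may help when the UDF is used on another machine. The following is only a sketch: `_FindTesseractExe` is a hypothetical helper that is not part of Tesseract.au3, and the second candidate path is an assumption about where a different installer might put the program.

```autoit
; Hypothetical helper (not part of Tesseract.au3): return the first existing
; tesseract.exe from a list of assumed install locations, or "" if none exists.
Func _FindTesseractExe()
    ; The first path is the one used in this thread; the second is an assumption.
    Local $aCandidates[2] = [ _
            "C:\Program Files (x86)\Tesseract-OCR\tesseract.exe", _
            "C:\Program Files\Tesseract-OCR\tesseract.exe"]
    For $i = 0 To UBound($aCandidates) - 1
        If FileExists($aCandidates[$i]) Then Return $aCandidates[$i]
    Next
    Return ""
EndFunc   ;==>_FindTesseractExe

Local $sTesseractExe = _FindTesseractExe()
If $sTesseractExe = "" Then
    ; 16 = stop-sign icon
    MsgBox(16, "Tesseract", "tesseract.exe was not found in the assumed locations.")
Else
    MsgBox(0, "Tesseract", "Found: " & $sTesseractExe)
EndIf
```

The returned path could then be passed to ShellExecuteWait in place of the hard-coded string.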

 

The new downside is that the result contains extra characters. If they are about the same every time, I can just strip them with a regex, so I will consider my problem solved.
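
A minimal sketch of that kind of regex cleanup, assuming the wanted text consists only of letters, digits and whitespace. The sample string is invented for illustration and is not real Tesseract output.

```autoit
; Sample OCR output with stray characters; invented for illustration only.
Local $sRaw = "~| text _'" & @CRLF

; Drop every character that is not a letter, a digit or whitespace.
Local $sClean = StringRegExpReplace($sRaw, "[^A-Za-z0-9\s]", "")

; Flag 3 = strip leading (1) + trailing (2) whitespace.
$sClean = StringStripWS($sClean, 3)

; Shows "text"
MsgBox(0, "Cleaned result", $sClean)
```

If the expected text can contain punctuation, the character class would need to be widened accordingly.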

Tesseract.au3

Edited by awayne




OK, I am posting my solution for Tesseract 3.0 and Windows 7, along with the updated UDF again.

The attached, edited Tesseract.au3 is working on Windows 7. This should save someone a huge headache, so I am hoping people find this thread.

#include <Tesseract.au3>

$OCR_Result = _TesseractScreenCapture(0,"",1,3,295, 73, 373, 112,0)

$file = "outfile.txt"

; Open in append mode (1) and keep the handle for the write and close calls
$hFile = FileOpen($file, 1)

FileWrite($hFile, $OCR_Result)

FileClose($hFile)

I typed the word "text" in a Notepad document, grabbed its on-screen coordinates as a rectangle, and wrote the recognized word to an output file.  To see the edits to the original UDF, please see the end of the post above.

 

Again, the edited file is attached below. I also forgot to mention that, to get Tesseract v3 working on Windows 7, I had to change three Tesseract path settings in the file.

 

Tesseract.au3


awayne,

I was prepared to let you run another thread on this subject, but I rather hoped that you would follow the advice I gave you in your first thread. Please do so from now on. :naughty:

M23


 


My apologies. I had just solved a puzzling problem and wanted to share it with the community, so that anyone who uses the Tesseract.au3 UDF on Windows 7 can use the corrected version above, which should work out of the box.

:)

Thanks, though, for all your contributions to the community and to AutoIt, Melba23.  You are very active on this forum, and your dedication in helping us new people learn is appreciated.


A correction for the preview (the TIFF viewer is not available in Windows 7):

Somewhere around line 176 of the UDF, this should be added or used as a replacement:

    ; If the captures are to be displayed
    If $show_capture = 1 Then

        ; Old preview via the Preview.Preview.1 COM object, disabled:
        ; $Obj1 = ObjCreate("Preview.Preview.1")
        ; $Obj1_ctrl = GUICtrlCreateObj($Obj1, 0, 0, 640, 480)
        ; $Obj1.ShowFile ($capture_filename, 1)

        _GDIPlus_Startup()

        ; Load the captured image and scale it to the 640x480 preview window
        Local $hBitmap = _GDIPlus_BitmapCreateFromFile($capture_filename)
        Local $hBitmap_Scaled = _GDIPlus_ImageResize($hBitmap, 640, 480)

        ; Window is placed at the top-left corner of the screen (left = 0, top = 0)
        Local $hGUI = GUICreate("Tesseract Screen Capture.  Note: image displayed is not to scale", 640, 480, 0, 0, $WS_SIZEBOX + $WS_SYSMENU)
        GUISetState(@SW_SHOW)

        Local $hGraphics = _GDIPlus_GraphicsCreateFromHWND($hGUI) ; create a graphics object from a window handle
        _GDIPlus_GraphicsDrawImage($hGraphics, $hBitmap_Scaled, 0, 0) ; display scaled image

        ; Wait until the preview window is closed
        While 1
            Switch GUIGetMsg()
                Case $GUI_EVENT_CLOSE
                    ExitLoop
            EndSwitch
        WEnd

        ; Clean up resources and remove the preview window
        _GDIPlus_GraphicsDispose($hGraphics)
        _GDIPlus_BitmapDispose($hBitmap)
        _GDIPlus_BitmapDispose($hBitmap_Scaled)
        GUIDelete($hGUI)
        _GDIPlus_Shutdown()

        _GDIPlus_Startup() ; kept from the original post: starts GDI+ again after the shutdown above

Also add this include:

#include <GUIConstantsEx.au3>

and test with this call:

$OCR_Result = _TesseractScreenCapture(0,"",1,1,294,121,377,166,1)


Hi Junkew, 
Can you please post the updated tesseract.au3 file? I attempted to modify the one I got from awayne's post above, but had no luck with it....

---

Otherwise: can someone tell me how the coordinates in _TesseractScreenCapture have to be specified?
I tried to capture a specific portion of an HD screen (1920*1080), but couldn't figure out how the 4 indent parameters have to be specified. E.g. on the attached AutoitScreen.png screenshot (I opened it in Paint and made it full screen) I tried to restrict the recognition process to the word Autoit (marked with a red rectangle), but had no luck with it.

Theoretically the coordinates should be: 

left_indent: 969
top_indent: 350
right_indent: 1920 - 1116  (width of the image - indent)
bottom_indent: 1042 - 394 (height of the image - indent)

 

So the call looks like: $OCR_Result = _TesseractScreenCapture(0,"",2,1,969, 350, 1920 - 1116, 1042 - 394,0)

But the result, OCR-area.tif, is totally off and nowhere near the area I wanted. (I disabled the deletion of the temporary file in the code to see which screenshot the OCR took.)
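
The arithmetic above can be written out as a small helper. This is only a sketch: `_RectToIndents` is a hypothetical function that is not part of Tesseract.au3, and it assumes the four values are distances measured inward from each screen edge, as the parameter names suggest. Whether the UDF actually interprets them that way is not confirmed in this thread. The example uses the full 1080-pixel screen height, whereas the post above used 1042.

```autoit
; Hypothetical helper (not part of Tesseract.au3): convert a rectangle given in
; absolute screen coordinates into four indents measured from the screen edges.
Func _RectToIndents($iLeft, $iTop, $iRight, $iBottom, $iScreenW, $iScreenH)
    Local $aIndents[4] = [$iLeft, $iTop, $iScreenW - $iRight, $iScreenH - $iBottom]
    Return $aIndents
EndFunc   ;==>_RectToIndents

; Rectangle from the post, on an assumed 1920x1080 screen.
Local $aIndents = _RectToIndents(969, 350, 1116, 394, 1920, 1080)

; Prints: left=969 top=350 right=804 bottom=686
ConsoleWrite(StringFormat("left=%d top=%d right=%d bottom=%d", _
        $aIndents[0], $aIndents[1], $aIndents[2], $aIndents[3]) & @CRLF)
```

Printing the values before passing them to _TesseractScreenCapture makes it easier to compare them against the area that ends up in the temporary .tif file.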

 

 

AutoitScreen.png

OCR-area.tif

