seangriffin

Tesseract (Screen OCR) UDF

Idk if this post is still alive, but I can't get it to work.

$Text = _TesseractWinCapture($win_title,"",0,"",1,2,0,0,0,0,1)

"D:\OneDrive\Documents\AutoIT\OCR\Tesseract.au3" (349) : ==> Variable must be of type "Object".:
$Obj1.ShowFile ($capture_filename, 1)
$Obj1^ ERROR

 

any ideas?


This is the latest place to get Tesseract: https://github.com/tesseract-ocr/tesseract

I believe as of me posting this the latest "stable" version is 3.05 but they also have a 4.00 alpha.

I'm not sure if anyone is still having the issue with Preview.Preview.1 but I managed to fix that a while ago when we had updated all of the PCs in the office.

On 5/17/2017 at 2:29 PM, KemyKo said:

Idk if this post is still alive, but I can't get it to work.

$Text = _TesseractWinCapture($win_title,"",0,"",1,2,0,0,0,0,1)

"D:\OneDrive\Documents\AutoIT\OCR\Tesseract.au3" (349) : ==> Variable must be of type "Object".:
$Obj1.ShowFile ($capture_filename, 1)
$Obj1^ ERROR

 

any ideas?

You most likely need to change the tesseract path in the UDF. Someone explained how to do it on page 6 of this thread.
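
One way to avoid hard-coding the path is to probe the usual install folders before calling Tesseract. The following is a minimal sketch, not code from Tesseract.au3: `_FindTesseractExe` is a hypothetical helper name, and the `Tesseract-OCR` folder name is assumed from the paths quoted elsewhere in this thread.

```autoit
; Hypothetical helper, not part of Tesseract.au3: returns the first existing
; tesseract.exe among a few common install locations, or "" with @error = 1.
Func _FindTesseractExe()
    Local $aDirs[3] = [@ProgramFilesDir, EnvGet("ProgramFiles(x86)"), EnvGet("ProgramW6432")]
    For $i = 0 To UBound($aDirs) - 1
        If $aDirs[$i] = "" Then ContinueLoop ; environment variable not set on this system
        Local $sExe = $aDirs[$i] & "\Tesseract-OCR\tesseract.exe" ; assumed folder name
        If FileExists($sExe) Then Return $sExe
    Next
    Return SetError(1, 0, "")
EndFunc   ;==>_FindTesseractExe

Local $sTesseract = _FindTesseractExe()
If @error Then
    ConsoleWrite("tesseract.exe not found; adjust the path in the UDF by hand." & @CRLF)
Else
    ConsoleWrite("Using " & $sTesseract & @CRLF)
EndIf
```

The returned path could then replace the hard-coded one passed to ShellExecuteWait in the UDF.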


UHJvZmVzc2lvbmFsIENvbXB1dGVyZXI=

On 22.5.2017 at 3:57 PM, anthonyjr2 said:

This is the latest place to get Tesseract: https://github.com/tesseract-ocr/tesseract

I believe as of me posting this the latest "stable" version is 3.05 but they also have a 4.00 alpha.

I'm not sure if anyone is still having the issue with Preview.Preview.1 but I managed to fix that a while ago when we had updated all of the PCs in the office.

You most likely need to change the tesseract path in the UDF. Someone explained how to do it on page 6 of this thread.

Hi, 

I have the same issue. This is what I did:

I changed

;ShellExecuteWait(@ProgramFilesDir & "\Tesseract-OCR\tesseract.exe", $capture_filename & " " & $ocr_filename)

To
 

    ShellExecuteWait("C:\Program Files (x86)\Tesseract-OCR\tesseract.exe", $capture_filename & " " & $ocr_filename, "", "", @SW_HIDE)

Example Code:

#include <Tesseract.au3>

$omg = _TesseractScreenCapture(0,"",1,1,377,630,493,670,1)
MsgBox(0, "Test:", $omg)


Global $Chrome = WinWait("[TITLE:Google - Google Chrome]")
If $Chrome = 0 Then Exit 2

WinActivate($Chrome)

ConsoleWrite('"' & _TesseractScreenCapture(0, "", 1, 5, 781, 113, 832, 136, 1) & '"' & @LF)

 

Error:

>"C:\Program Files (x86)\AutoIt3\SciTE\..\autoit3.exe" /ErrorStdOut "C:\Users\admin\Desktop\TesseractExample\Example1.au3"    
"C:\Program Files (x86)\AutoIt3\Include\Tesseract.au3" (187) : ==> Variable must be of type "Object".:
$Obj1.ShowFile ($capture_filename, 1)
$Obj1^ ERROR
>Exit code: 1    Time: 5.532

 

Any idea how to fix it?

 

=>> Tesseract download: tesseract download for windows

=>> Tesseract.au3 file (located in C:\Program Files (x86)\AutoIt3\Include)

=>> I'm using Windows 10 64-bit
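
The message says that `$Obj1` is not an object at the moment `ShowFile` is called, which is what happens when the object could not be created. A minimal sketch of a guard, assuming `$Obj1` comes from `ObjCreate("Preview.Preview.1")` (the name mentioned earlier in the thread; this is not verified against the UDF source), with a placeholder value for `$capture_filename`:

```autoit
; Sketch only: assumes the viewer object is created from the ProgID
; "Preview.Preview.1" mentioned earlier in this thread.
Local $capture_filename = @TempDir & "\capture.tif" ; placeholder path
Local $Obj1 = ObjCreate("Preview.Preview.1")
If IsObj($Obj1) Then
    $Obj1.ShowFile($capture_filename, 1)
Else
    ; ObjCreate failed, so skip the preview instead of stopping the script
    ConsoleWrite("Preview object not available, skipping image preview." & @CRLF)
EndIf
```

If the trailing 1 in the calls above turns on the capture preview (an assumption; the thread does not say), passing 0 instead may skip this code path altogether.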

On 2/13/2009 at 9:03 AM, seangriffin said:

This UDF provides text capturing support for applications and controls using Tesseract - an OCR engine currently developed by Google.

 

Tesseract was originally developed as proprietary software at Hewlett-Packard between 1985 and 1995. After ten years without any development taking place, Hewlett-Packard and UNLV released it as open source in 2005. Tesseract is currently developed by Google and released under the Apache License, Version 2.0.

 

Tesseract is considered one of the most accurate free software OCR engines currently available. It was one of the top 3 engines in the 1995 UNLV Accuracy test.

 

My main goal in developing this UDF is to provide AutoIt users with a free Screen OCR solution that competes with commercial (paid) technologies like Microsoft Office Document Imaging (MODI) and Textract.

 

REQUIREMENTS:

 

  • AutoIt3 3.2 or higher
  • Tesseract 2.01 or above

INSTALLATION:

To install Tesseract:

 

LIST OF FUNCTIONS:

 

DEMONSTRATION:

<Under Construction>

 

EXAMPLES:

_TesseractControlCapture.au3
_TesseractControlFind.au3

 

DOWNLOAD:

 

Latest Version - v0.6 (17/03/09)

Tesseract.au3

Hi Seangriffin,

You have done awesome work, but while using the above sample scripts I am getting an error.

Could you please help me out? Thanks in advance.

I am highlighting the error message. 

 

Error message : "D:\AutoIt3\OCR_AutoIt\Tesseract\Tesseract.au3" (968) : ==> Subscript used on non-accessible variable.:
$hBMP = _WinAPI_CreateCompatibleBitmap($hDC, ($pos[2] * $scale) - ($right_indent * $scale), ($pos[3] * $scale) - ($bottom_indent * $scale))
$hBMP = _WinAPI_CreateCompatibleBitmap($hDC, ($pos^ ERROR
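
This error means `$pos` is not an array when it is indexed, which typically happens when the call that should fill it has failed. A minimal sketch of a check, assuming `$pos` comes from `WinGetPos` or a similar function (the thread does not show that line), with a placeholder window title:

```autoit
; Sketch only: assumes $pos is filled by WinGetPos in the UDF.
; "[CLASS:Notepad]" is a placeholder window title.
Local $pos = WinGetPos("[CLASS:Notepad]")
If @error Or Not IsArray($pos) Then
    ; The window was not found, so $pos[2] and $pos[3] must not be used
    ConsoleWrite("Window not found, nothing to capture." & @CRLF)
Else
    ConsoleWrite("Window size: " & $pos[2] & " x " & $pos[3] & @CRLF)
EndIf
```

If the check fails, the window title or control ID passed to the capture function is the first thing to verify.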

On 8/11/2016 at 7:28 AM, MaximusCZ said:

Hello,

I am trying to run Tesseract, but I can't find an exe installer anywhere.

All the links seem to be broken or moved, and on the official repo I can only download source or binaries that don't install.

 

Can anyone upload their copy of a working Tesseract installer for all the people who come looking for it?

 

Thanks!

Ver 3.02 is at: https://sourceforge.net/projects/tesseract-ocr-alt/files/latest/download?source=files

More downloads are at: https://sourceforge.net/projects/tesseract-ocr-alt/files/

 


Some example code to test whether the concept still works. I used version 3.05.1, and output.txt will contain the result. On my system it takes about 30 seconds to analyze the full screen, which you probably should not normally do.

Scaling somewhere above 4.0 gives some nice results, certainly on black/white areas.

However, 8.0 is much too high.

#include <GDIPlus.au3>
#include <GUIConstantsEx.au3>
#include <ScreenCapture.au3>

Global $TIF_FILENAME = "c:\temp\test.tif"
Global $TESS_PARAMS = $TIF_FILENAME & " output"
Global $TESS_EXE = @ProgramFilesDir & "\Tesseract-OCR\tesseract.exe"
Global $TWorkDir = "C:\temp\"
Global $iScale = 8.0 ;1.0 is without any scaling

SaveTiffImage()
Sleep(1000) ; Little time for saving the file
AnalyzeImage()

Func SaveTiffImage()
    _GDIPlus_Startup()
    Local Const $iW = @DesktopWidth, $iH = @DesktopHeight

    Local $hHBmp = _ScreenCapture_Capture("", 0, 0, $iW, $iH) ;create a GDI bitmap by capturing the full desktop
    Local $hBitmap = _GDIPlus_BitmapCreateFromHBITMAP($hHBmp) ;convert GDI bitmap to GDI+ bitmap
    _WinAPI_DeleteObject($hHBmp) ;release GDI bitmap resource because not needed anymore

    Local $hBitmap_Scaled = _GDIPlus_ImageScale($hBitmap, $iScale, $iScale, $GDIP_INTERPOLATIONMODE_NEARESTNEIGHBOR) ;scale image by a factor of $iScale (magnify)

    ; Save resultant image
    _GDIPlus_ImageSaveToFile($hBitmap_Scaled, $TIF_FILENAME)

    ;cleanup resources
    _GDIPlus_BitmapDispose($hBitmap)
    _GDIPlus_BitmapDispose($hBitmap_Scaled)
    _GDIPlus_Shutdown()
EndFunc   ;==>SaveTiffImage

Func AnalyzeImage()
    Local $hTimer = TimerInit() ; Begin the timer and store the handle in a variable.
    ShellExecuteWait($TESS_EXE, $TESS_PARAMS, $TWorkDir) ; writes output.txt into $TWorkDir
    Local $fDiff = TimerDiff($hTimer) ; Elapsed time in milliseconds since the TimerInit call above.
    ConsoleWrite(($fDiff / 1000) & " seconds passed" & @CRLF)
EndFunc   ;==>AnalyzeImage

 


Hey guys! Kudos on all the good work! I am in need of a little help. I am currently using ImageSearch to find "pictures" of words, and if the search succeeds I show a popup followed by a screen capture. My issue is that the PDF reports I run this on change in size depending on how large the report is, so I figured OCR would be my best bet. I know this thread is a wee bit old, but would it be possible to get assistance with this? I just want to find specific words or numbers on the screen. Thanks in advance.

On 5/2/2018 at 1:48 AM, junkew said:

Some example code to test whether the concept still works. I used version 3.05.1, and output.txt will contain the result. On my system it takes about 30 seconds to analyze the full screen, which you probably should not normally do.

Scaling somewhere above 4.0 gives some nice results, certainly on black/white areas.

However, 8.0 is much too high.


Hi!

Thank you, it works great.

Do you know how I can run OCR with another language, not English?
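
For reference, the Tesseract command line takes a `-l` option that selects the language, provided the matching `.traineddata` file is installed in its `tessdata` folder. A minimal sketch, assuming German (`deu`) and reusing the paths from the example quoted above as placeholders:

```autoit
; Sketch only: the paths are placeholders, and "deu" (German) stands in for any
; language whose .traineddata file is present in Tesseract's tessdata folder.
Local $sExe = @ProgramFilesDir & "\Tesseract-OCR\tesseract.exe"
Local $sImage = "c:\temp\test.tif"
Local $sLang = "deu"

; Resulting command line: tesseract.exe "c:\temp\test.tif" output -l deu
ShellExecuteWait($sExe, '"' & $sImage & '" output -l ' & $sLang, "C:\temp\", "", @SW_HIDE)
```

As in the example above, the recognized text ends up in output.txt in the working directory.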


Hi all,

I'm currently working with v0.6 of this library and the padding logic isn't behaving as I'd expect.
(The captured area doesn't match the coordinates I entered and the image is stretched.)

Therefore I looked at the code and found what I think is a mistake in the padding logic.
The diff for my fix is attached.

Cheers,
siiikooo0743

 

paddingLogicFix.patch


I rewrote this code because it didn't work for me: the functions didn't capture the correct area.

I added some changes from airday and don134.

The changes are marked with ";Start code" and ";End code"; the original code is commented out.

I rewrote the code to use coordinates like PixelSearch, an idea from hendrikhe.

I added _GDIPlus_ImageScale to the CaptureToTiff function, because I couldn't find or understand the original scale function.

Tesseract.au3

