# API Call for Tesseract

Several years back someone wrote an API for Tesseract.

As great as that UDF is, it is just a screen-scraping UDF (and I'm pretty sure it has memory leaks). I have very high-resolution images, which I hope will improve accuracy, and I would like to read them directly from a file. Effectively, my program will save a .tiff (or some other format; I will experiment to see which gives higher accuracy), run Tesseract on it, and read the .txt file that Tesseract creates.

I believe you can call Tesseract with the image path as a parameter. Tesseract then creates a .txt file in the same directory (with the same name). I looked over the UDF and this seems to be what it does, but I am unable to get it to run.

I have never really used executables with parameters like this and was hoping someone could lend me a hand figuring it out. I was hoping it would be as easy as:

    ShellExecuteWait(@ProgramFilesDir & "\tesseract\tesseract.exe", "C:\OCRTEST.TIF C:\OCRTEST")

But alas it is not.

Thanks!

##### Share on other sites

So after more research I feel more confident that my initial assumption about the shell call was right; however, when I run this script no text file is created. The UDF does create one, which implies that Tesseract is up and running, but nothing is produced when I use my own shell command on my own file.

This was posted by JohnOne last year; I adjusted it slightly to fit my needs. I can tell that the Tesseract executable is being called, but the text file is never created.

Any suggestions?

    $s_Image_InputFile = @ScriptDir & "\OCRTEST.tif"
    $s_OCR_OutputFile = @ScriptDir & "\in"
    $result = _TessOcr($s_Image_InputFile, $s_OCR_OutputFile)
    MsgBox(0, "Result", $result)

    Func _TessOcr($in_image, $out_file)
        Local $Read
        ShellExecuteWait(@ProgramFilesDir & "\tesseract\tesseract.exe", '"' & $in_image & '" "' & $out_file & '"', Default, Default, @SW_HIDE)
        If @error Then
            MsgBox(0, "Error", "ShellExecuteWait Error")
            Exit
        EndIf
        If FileExists($out_file & ".txt") Then
            $Read = FileRead($out_file & ".txt")
            FileDelete($out_file & ".txt")
        Else
            $Read = "No file created"
        EndIf
        Return $Read
    EndFunc   ;==>_TessOcr

##### Share on other sites

You might need #RequireAdmin to run in that location.

Monkey's are, like, natures humans.

##### Share on other sites

Thanks for the input, but in the time between posts I basically scrapped that old UDF file. I downloaded the newest version of Tesseract, was playing around with it, and found some cool stuff with page formatting. Here is a snippet I used as a proof of concept.

    ;http://code.google.com/p/tesseract-ocr/wiki/FAQ
    #include <Array.au3>

    $s_Image_InputFile = "C:\temp\test.tif"
    ; Tesseract appends ".txt" to this name, so the file on disk is in.txt.txt
    $s_OCR_OutputFile = "C:\temp\in.txt"
    $result = _TessOcr($s_Image_InputFile, $s_OCR_OutputFile)
    $array = StringSplit($result, @CRLF)
    _ArrayDisplay($array)
    ;MsgBox(0,"Result",$result)

    Func _TessOcr($in_image, $out_file)
        Local $Read
        ; Quote only the two paths; -l and -psm are passed as separate, unquoted switches
        ShellExecuteWait(@ProgramFilesDir & "\Tesseract-OCR\tesseract.exe", '"' & $in_image & '" "' & $out_file & '" -l eng -psm 6')
        If @error Then
            MsgBox(0, "Error", "ShellExecuteWait Error")
            Exit
        EndIf
        If FileExists($out_file & ".txt") Then
            $Read = FileRead($out_file & ".txt")
            ;FileDelete($out_file & ".txt")
        Else
            $Read = "No file created"
        EndIf
        Return $Read ; hand the recognized text back to the caller
    EndFunc   ;==>_TessOcr
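
For anyone trying to work out why no text file appears, a variant that checks the process exit code can narrow things down. This is only a rough sketch: `_TessOcrChecked` is a made-up name, the install path and the `-l` / `-psm` switches are simply the ones used in this thread, and it assumes Tesseract exits with a non-zero code on failure and writes its output as UTF-8.

```autoit
#include <FileConstants.au3>

; Hypothetical helper for illustration, not part of the original UDF.
; Assumes Tesseract is installed under @ProgramFilesDir & "\Tesseract-OCR".
Func _TessOcrChecked($sImage, $sOutBase, $sLang = "eng")
    Local $sExe = @ProgramFilesDir & "\Tesseract-OCR\tesseract.exe"
    ; Quote the paths only; the switches stay unquoted so they arrive as separate arguments.
    Local $sCmd = '"' & $sExe & '" "' & $sImage & '" "' & $sOutBase & '" -l ' & $sLang & ' -psm 6'

    ; RunWait returns the exit code of the process and sets @error if it could not be started.
    Local $iExit = RunWait($sCmd, "", @SW_HIDE)
    If @error Then Return SetError(1, 0, "")
    If $iExit <> 0 Then Return SetError(2, $iExit, "")

    ; Tesseract appends ".txt" to the output base name.
    Local $sTxt = $sOutBase & ".txt"
    If Not FileExists($sTxt) Then Return SetError(3, 0, "")

    ; Open the file explicitly as UTF-8 without BOM (assumed output encoding).
    Local $hFile = FileOpen($sTxt, $FO_READ + $FO_UTF8_NOBOM)
    If $hFile = -1 Then Return SetError(4, 0, "")
    Local $sText = FileRead($hFile)
    FileClose($hFile)
    Return $sText
EndFunc   ;==>_TessOcrChecked

; Example call: @error tells which step failed, @extended holds the exit code for step 2.
Local $sText = _TessOcrChecked("C:\temp\test.tif", "C:\temp\out")
If @error Then
    MsgBox(0, "OCR failed", "Step " & @error & ", exit code " & @extended)
Else
    MsgBox(0, "Result", $sText)
EndIf
```

Returning distinct `@error` values for "could not start", "non-zero exit", "no output file" and "could not open" makes it easier to tell a path or permission problem from a bad argument string.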

##### Share on other sites

Hi, Boogieoompa

Great idea to simplify calling Tesseract from AutoIt! It works perfectly with my sample texts in English, but I've run into a strange problem when trying it with my Russian-language texts.

Sample input in Russian produces garbage with a lot of incorrectly recognized and/or capitalized symbols. The Russian cubes for Tesseract are installed, and the test image is a black-and-white TIFF with no noise and a simple layout, just a couple of paragraphs. The strange part is that if I recognize exactly the same image the usual way, i.e. by running Tesseract from the command prompt (with the same parameters, -l rus and -psm 6, as in the AutoIt script), the result is nearly perfect. Any ideas how this could be explained?

##### Share on other sites

Free bump.

Is there an official standalone Tesseract.exe that can be used in the fashion above or only an installer?

