# Using Tesseract as DLL

## 13 posts in this topic

Just a quick question, since I have read all the threads about this but something is still missing (on my side).

Is it possible to use Tesseract as a standalone library (i.e. a DLL) without installing it?

I mean, if I build my script and give it to a friend (together with any DLLs it needs), will he be able to use it without installing Tesseract?

Thanks a lot,

Marco

##### Share on other sites

Uh, still no one? ^^

##### Share on other sites

Can you post a link to this tesseract dll?

Monkey's are, like, natures humans.

##### Share on other sites

Well, I guess there is NO Tesseract DLL.

What I'm saying is that (and I guess it's confirmed) Tesseract can't be used unless it's installed.

And I'm looking for something that needs no installation.

I found, for example, GOCR (http://jocr.sourceforge.net/). It's a good product, but it doesn't work with JPG (which is what I get from _screencapture), so I need another tool to convert the image first, and I got ToPNG (http://www.softpedia.com/get/Multimedia/Audio/Audio-Convertors/ToPNG.shtml).

But this three-step chain (_screencapture -> ToPNG -> GOCR) is time-consuming and the end results are not good.

I also found a DLL (ocrdll.dll) in another program, but I don't know anything about it and don't know how to call it.

So, what I'm looking for is a good tool to OCR .jpg files, available as an API or as a stand-alone solution.

Hope someone can step in here to clarify the situation a bit more.

Regards,

Marco

##### Share on other sites

As far as I recall you don't need to "install" Tesseract at all. I've seen it on the forum, I just can't remember where.

You can use its command line.

It was something like

ShellExecute("tesseract.exe", "imagefilename.tiff txtfilename.txt")

##### Share on other sites

I think not. I tried on another computer where Tesseract is not installed, and if I run it (a standard run from the DOS prompt) I get an error: impossible to run... because leptonlib.dll is not installed...

That confirms that you can't use it unless you install it first.

Meanwhile I found that ocrdll.dll has several interesting commands (I used dllviewer to list them).

The commands are:

getletter
getlettertemplate
lettersize
readnextchar

But what I'm missing is how to call them. I tried

    #include <array.au3>
    $image = @ScriptDir & "\test1.jpg"
    Dim $result
    $dll = DllOpen("ocrDll.dll")
    $result = DllCall($dll, "int", "readNextChar", "hwnd", 0, "img", $image)
    DllClose($dll)
    MsgBox(0, "...", $result)
    ;~  getletter
    ;~  getlettertemplate
    ;~  lettersize
    ;~  readnextchar

Maybe you (or someone else) can help me with that.

ocrDll.dll

##### Share on other sites

There are all sorts of different versions of Tesseract, and I am certain one of them needs only the executable and a few data files in the same dir.

If I can find the version I'll give you a link.

Meanwhile, that call to the DLL certainly will not work. I'm no DLL expert, but AutoIt's DllCall certainly does not have a type 'img'.

It might be a handle to an HBITMAP or a hundred other things, and without the documentation you are flogging a dead horse.

##### Share on other sites

Here is a link to the files along with a very basic example.

Tesseract command line example about 1.4MB

    $s_Image_InputFile = @ScriptDir & "\test.bmp"
    $s_OCR_OutputFile = @ScriptDir & "\in"

    $result = _TessOcr($s_Image_InputFile, $s_OCR_OutputFile)
    MsgBox(0, "Result", $result)

    ; Runs tesseract.exe from the script directory and returns the recognised text.
    ; Tesseract adds the ".txt" extension to the output name by itself.
    Func _TessOcr($in_image, $out_file)
        Local $Read
        ShellExecuteWait(@ScriptDir & "\tesseract.exe", '"' & $in_image & '" "' & $out_file & '"', Default, Default, @SW_HIDE)
        If @error Then
            MsgBox(0, "Error", "ShellExecuteWait Error")
            Exit
        EndIf
        If FileExists($out_file & ".txt") Then
            $Read = FileRead($out_file & ".txt")
            FileDelete($out_file & ".txt")
        Else
            $Read = "No file created"
        EndIf
        Return $Read ; without this the caller's MsgBox never receives the text
    EndFunc   ;==>_TessOcr

All the files needed to run the code are in the rar. You should be able to run it as is; nothing needs installing.

famous last words

##### Share on other sites

Is there a way to train it? I tried to let it OCR some BMPs, with terrible results.

I saw in wiki something about training, and I tried, but got the following:

G:\AutoIT\tesse>tesseract /eng/fontfile.tif junk nobatch box.train
read_variables_file:Can't open ./tessdata/configs/box.train
Could not open file, nobatch

So surely 2.04 is just an .exe and needs nothing else, but for training I suppose I need some more files, don't I?

And I had to test it on another PC. Mine (where Tesseract 3.0 was previously installed and then uninstalled) gives errors, since tesseract.exe looks for the installation dir (due to registry keys, I think).

##### Share on other sites

Hmm, I tried training in another way:

C:\Users\___\Desktop\tess>tesseract \tessdata\configs\eng.arial.tif junk nobatch eng.arial.box

I get the error:

...
...
Could not open file, nobatch

I tried to give it a file to OCR but the results are bad, so I definitely need to train it in some way.

M.

##### Share on other sites

Well, I don't know about all that. It's a Tesseract issue you are dealing with, not AutoIt.

I suggest you look there.

##### Share on other sites

Do you know if there are ready-made dictionaries for Tesseract?

M.

##### Share on other sites

Is it trainable?