marko001

Using Tesseract as DLL

13 posts in this topic

Just a quick question: I have read all the threads about this, but I am still missing something.

Is it possible to use Tesseract as a standalone library (i.e. a DLL) without installing it?

I mean, if I build my script and give it to a friend (together with the DLLs, if needed), will he be able to use it without installing Tesseract?

Thanks a lot,

Marco




Uh, still no one? ^^


Well, I guess there is NO Tesseract DLL.

What I'm saying (and I guess it's confirmed) is that Tesseract can't be used unless it's installed.

And I'm looking for something that needs no installation.

I found, for example, GOCR (http://jocr.sourceforge.net/). It is a good product, but it doesn't work with the JPG files produced by _ScreenCapture, so I need another tool to convert them, and I got ToPNG (http://www.softpedia.com/get/Multimedia/Audio/Audio-Convertors/ToPNG.shtml).

But this three-step chain (_ScreenCapture -> ToPNG -> GOCR) is time-consuming and the end results are not good.
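
One way the conversion step might be dropped, as a rough sketch: _ScreenCapture_Capture picks the image format from the file extension, so the capture can be written as BMP (or PNG or TIF) directly instead of JPG. The file name capture.bmp and the captured region are made up for illustration, and whether a given OCR tool accepts the chosen format still has to be checked.

```autoit
#include <ScreenCapture.au3>

; Hypothetical output file: the extension decides the image format.
Local $sImage = @ScriptDir & "\capture.bmp"

; Capture a small region near the top-left corner, without the mouse cursor.
_ScreenCapture_Capture($sImage, 0, 0, 399, 199, False)
Local $iErr = @error

If $iErr Then
    MsgBox(16, "Capture", "Screen capture failed, @error = " & $iErr)
Else
    MsgBox(0, "Capture", "Saved " & $sImage)
EndIf
```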

I also found a DLL (ocrdll.dll) in another program, but I know nothing about it and don't know how to call it.

So, what I'm looking for is a good tool to OCR .jpg files, available as an API or as a standalone solution.

I hope someone can step in here and clarify the situation a bit.

Regards,

Marco


As far as I recall you don't need to "install" Tesseract at all. I've seen it on the forum, I just can't remember where.

You can use its command line.

It was something like

ShellExecute("tesseract.exe", "imagefilename.tif txtfilename")

(Tesseract adds the .txt extension to the output name itself.)


AutoIt Absolute Beginners    Require a serial    Pause Script    Video Tutorials by Morthawt   ipify 

Monkey's are, like, natures humans.


I think not. I tried on another computer where Tesseract is not installed, and if I run it (a standard run from the DOS prompt) I get an error: impossible to run... because leptonlib.dll is not installed...

That confirms that if you don't install it first you can't use it.

Meanwhile I found that ocrdll.dll exports several interesting functions (I used dllviewer to list them):

getletter
getlettertemplate
lettersize
readnextchar

But what I'm missing is how to call them. I tried

#include <array.au3>
$image = @ScriptDir & "\test1.jpg"
dim $result
$dll = DllOpen("ocrDll.dll")
$result = DllCall($dll,"int","readNextChar","hwnd",0, "img", $image)
DllClose($dll)
msgbox(0,"...",$result)
;~  getletter
;~  getlettertemplate
;~  lettersize
;~  readnextchar

but got no results. I'll attach the DLL as well.

Maybe you (or someone else) can help me with that.

ocrDll.dll


There are all sorts of different versions of Tesseract, and I am certain one of them needs only the executable and a few data files in the same directory.

If I can find that version I'll give you a link.

Meanwhile, that call to the DLL certainly will not work. I'm no DLL expert, but AutoIt's DllCall certainly does not have a type 'img'.

The parameter might be a handle to an HBITMAP or a hundred other things, and without the documentation you are flogging a dead horse.
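
To illustrate the point about types, here is a minimal sketch of a well-formed DllCall. It deliberately uses GetTickCount from kernel32.dll, a documented Windows API, because the real signatures of the ocrDll.dll exports are unknown; nothing below should be read as how ocrDll.dll is called. It also shows that DllCall returns an array, so the value has to be read from element 0 rather than passed to MsgBox as a whole.

```autoit
; Illustration only: GetTickCount is a documented kernel32.dll function.
; It says nothing about the (undocumented) functions in ocrDll.dll.
Local $aRet = DllCall("kernel32.dll", "dword", "GetTickCount")
Local $iErr = @error

If $iErr Then
    MsgBox(16, "DllCall", "Call failed, @error = " & $iErr)
Else
    ; Element 0 of the returned array holds the function's return value.
    MsgBox(0, "DllCall", "Milliseconds since boot: " & $aRet[0])
EndIf
```

Every parameter is passed as a type/value pair, and the type must be one of the names DllCall knows (for example "int", "str", "wstr", "ptr", "handle", "hwnd").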



Here is a link to the files along with a very basic example.

Tesseract command line example about 1.4MB

$s_Image_InputFile = @ScriptDir & "\test.bmp"
$s_OCR_OutputFile = @ScriptDir & "\in"

$result = _TessOcr($s_Image_InputFile, $s_OCR_OutputFile)

MsgBox(0,"Result",$result)

Func _TessOcr($in_image, $out_file)
    Local $Read
    ShellExecuteWait(@ScriptDir & "\tesseract.exe", '"' & $in_image & '" "' & $out_file & '"', Default, Default, @SW_HIDE)
    If @error Then
        MsgBox(0,"Error","ShellExecuteWait Error")
        Exit
    EndIf
    If FileExists($out_file & ".txt") Then
        $Read = FileRead($out_file & ".txt")
        FileDelete($out_file & ".txt")
    Else
        $Read = "No file created"
    EndIf
    Return $Read
EndFunc   ;==>_TessOcr

All the files needed to run the code are in the RAR. You should be able to run it as is; nothing needs installing.

famous last words :huh2:
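
If the folder is copied to another PC, a small pre-flight check along these lines might help spot what is missing before tesseract.exe is launched. This is only a sketch: the file list is an assumption pieced together from the names mentioned in this thread (tesseract.exe, leptonlib.dll, a tessdata folder), and the actual contents depend on the Tesseract version in use.

```autoit
; Hypothetical list of files/folders expected next to the script.
; Adjust it to match what the archive you are using really contains.
Local $aNeeded[3] = ["tesseract.exe", "leptonlib.dll", "tessdata"]
Local $sMissing = ""

For $i = 0 To UBound($aNeeded) - 1
    ; FileExists works for both files and directories.
    If Not FileExists(@ScriptDir & "\" & $aNeeded[$i]) Then
        $sMissing &= $aNeeded[$i] & @CRLF
    EndIf
Next

If $sMissing = "" Then
    MsgBox(0, "Tesseract check", "All expected files were found next to the script.")
Else
    MsgBox(16, "Tesseract check", "Missing from " & @ScriptDir & ":" & @CRLF & $sMissing)
EndIf
```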



Is there a way to train it? I tried to OCR some BMP files with it and got terrible results.

I saw something about training in the wiki and I tried it, but got the following:

G:\AutoIT\tesse>tesseract /eng/fontfile.tif junk nobatch box.train
read_variables_file:Can't open ./tessdata/configs/box.trainCould not open file,
nobatch

So 2.04 really is just an .exe that needs nothing else to run, but for training I suppose I need some more files, don't I?

I also had to test it on another PC. Mine (where Tesseract 3.0 was previously installed and then uninstalled) gives errors, since tesseract.exe looks for the installation directory (because of leftover registry keys, I think).
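
The first error names the file Tesseract could not open, ./tessdata/configs/box.train, so a quick check for that file might narrow things down. This is a sketch under assumptions: $sTessDir is a hypothetical variable for the folder tesseract.exe is run from, and the relative path is taken from the error message only.

```autoit
; Hypothetical: folder from which tesseract.exe is started.
Local $sTessDir = @ScriptDir

; Path taken from the error message "Can't open ./tessdata/configs/box.train".
Local $sConfig = $sTessDir & "\tessdata\configs\box.train"

If FileExists($sConfig) Then
    MsgBox(0, "Training config", "Found: " & $sConfig)
Else
    MsgBox(16, "Training config", "Not found: " & $sConfig)
EndIf
```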


Hmm, I tried training another way:

C:\Users\___\Desktop\tess>tesseract \tessdata\configs\eng.arial.tif junk nobatch eng.arial.box

I get the error:

...
...
read_variables_file:variable not found: v
read_variables_file:variable not found: e
read_variables_file:variable not found: l
read_variables_file:variable not found: o
read_variables_file:variable not found: c
read_variables_file:variable not found: a
read_variables_file:variable not found: l
read_variables_file:variable not found: I
read_variables_file:variable not found: '
read_variables_file:variable not found: m
read_variables_file:variable not found: b
read_variables_file:variable not found: e
read_variables_file:variable not found: i
read_variables_file:variable not found: n
read_variables_file:variable not found: g
read_variables_file:variable not found: T
read_variables_file:variable not found: O
read_variables_file:variable not found: w
read_variables_file:variable not found: e
read_variables_file:variable not found: b
Could not open file, nobatch

I tried giving it a file to OCR, but the results are bad, so I definitely need to train it somehow.

M.


Do you know if there are ready-made dictionaries for Tesseract?

M.


Is it trainable?

