Posted

Just a quick question: I've read all the related threads, but I'm still missing something.

Is it possible to use Tesseract as a standalone library (i.e. a DLL) without installing it?

I mean, if I build my script and give it to a friend (along with any required DLLs), will he be able to use it without installing Tesseract?

Thanks a lot,

Marco

Posted

Well, I guess there is NO Tesseract DLL.

What I'm saying (and I guess it's confirmed) is that Tesseract can't be used unless it's installed.

And I'm looking for something that needs no installation.

I found, for example, GOCR (http://jocr.sourceforge.net/). It's a good product, but it doesn't work with JPG files (from _ScreenCapture), so I need another tool to convert them, and I found ToPNG (http://www.softpedia.com/get/Multimedia/Audio/Audio-Convertors/ToPNG.shtml).

But this three-step chain (_ScreenCapture -> ToPNG -> GOCR) is time-consuming and the end results are not good.

I also found a DLL (ocrdll.dll) in another program, but I know nothing about it and don't know how to call it.

So what I'm looking for is a good tool to OCR .jpg files, available as an API or as a standalone solution.

I hope someone can step in and clarify the situation.

Regards,

Marco

Posted

I think not. I tried on another computer where Tesseract is not installed, and when I run it (a standard run from the DOS prompt) I get an error saying it cannot run because leptonlib.dll is not installed.

That confirms that you can't use it unless you install it first.

Meanwhile I found that ocrdll.dll exports several interesting functions (I used dllviewer to list them).

The functions are:

getletter
 getlettertemplate
 lettersize
 readnextchar

But what I'm missing is how to call them. I tried:

#include <array.au3>
$image = @ScriptDir & "\test1.jpg"
dim $result
$dll = DllOpen("ocrDll.dll")
$result = DllCall($dll,"int","readNextChar","hwnd",0, "img", $image)
DllClose($dll)
msgbox(0,"...",$result)
;~  getletter
;~  getlettertemplate
;~  lettersize
;~  readnextchar

but got no results. I'll attach the DLL as well.

Maybe you (or someone else) can help me with that.

ocrDll.dll

Posted

There are all sorts of different versions of Tesseract, and I am certain one of them needs only the executable and a few data files in the same directory.

If I can find that version I'll give you a link.

Meanwhile, that call to the DLL certainly will not work. I'm no DLL expert, but AutoIt's DllCall certainly does not have a type 'img'.

It might be a handle to an HBITMAP or a hundred other things, and without the documentation you are flogging a dead horse.
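
To illustrate the point about DllCall types, here is a minimal sketch that calls GetTickCount from kernel32.dll, a documented Windows API chosen only because its signature is known. It has nothing to do with ocrDll.dll, whose parameters remain unknown; _DemoDllCall is a hypothetical name used for this example.

```autoit
; Illustration only: GetTickCount is a documented kernel32 function.
; It is NOT part of ocrDll.dll, whose real signatures are unknown.
Func _DemoDllCall()
    Local $hDll = DllOpen("kernel32.dll")
    If $hDll = -1 Then Return SetError(1, 0, 0) ; DllOpen returns -1 on failure

    ; Return and parameter types must be names DllCall recognises
    ; ("int", "dword", "str", "hwnd", "ptr", ...). "img" is not one of them.
    Local $aRet = DllCall($hDll, "dword", "GetTickCount")
    Local $iErr = @error ; save @error before the next function call resets it
    DllClose($hDll)
    If $iErr Then Return SetError(2, $iErr, 0)

    ; DllCall returns an array: [0] is the function's return value,
    ; [1..n] are copies of the parameters that were passed in.
    Return $aRet[0]
EndFunc   ;==>_DemoDllCall

ConsoleWrite("Milliseconds since boot: " & _DemoDllCall() & @CRLF)
```

Note that passing the array returned by DllCall straight to MsgBox shows nothing useful; read element [0], or inspect the whole array with _ArrayDisplay.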

AutoIt Absolute Beginners    Require a serial    Pause Script    Video Tutorials by Morthawt   ipify 

Monkey's are, like, natures humans.

Posted

Here is a link to the files along with a very basic example.

Tesseract command line example about 1.4MB

$s_Image_InputFile = @ScriptDir & "\test.bmp"
$s_OCR_OutputFile = @ScriptDir & "\in"

$result = _TessOcr($s_Image_InputFile, $s_OCR_OutputFile)

MsgBox(0,"Result",$result)

Func _TessOcr($in_image, $out_file)
    Local $Read
    ShellExecuteWait(@ScriptDir & "\tesseract.exe", '"' & $in_image & '" "' & $out_file & '"', Default, Default, @SW_HIDE)
    If @error Then
        MsgBox(0,"Error","ShellExecuteWait Error")
        Exit
    EndIf
    If FileExists($out_file & ".txt") Then
        $Read = FileRead($out_file & ".txt")
        FileDelete($out_file & ".txt")
    Else
        $Read = "No file created"
    EndIf
    Return $Read
EndFunc   ;==>_TessOcr

All the files needed to run the code are in the rar; you should be able to run it as is, with nothing to install.

famous last words :huh2:
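
Since the earlier failure on a clean PC came from a missing file, a quick check before calling _TessOcr may save some guessing. This is only a sketch: _TessMissingFiles is a hypothetical helper, and the two names it looks for (tesseract.exe and a tessdata folder) are assumptions that should be adjusted to whatever the archive actually contains.

```autoit
; Hypothetical helper: returns the names of missing items, one per line,
; or an empty string if everything was found in $sDir.
; The list of required items is an assumption, not taken from the archive.
Func _TessMissingFiles($sDir)
    Local $aNeeded[2] = ["tesseract.exe", "tessdata"]
    Local $sMissing = ""
    For $i = 0 To UBound($aNeeded) - 1
        ; FileExists works for both files and folders
        If Not FileExists($sDir & "\" & $aNeeded[$i]) Then
            $sMissing &= $aNeeded[$i] & @CRLF
        EndIf
    Next
    Return $sMissing
EndFunc   ;==>_TessMissingFiles

Local $sMissing = _TessMissingFiles(@ScriptDir)
If $sMissing <> "" Then ConsoleWrite("Missing next to the script:" & @CRLF & $sMissing)
```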


Posted

Is there a way to train it? I tried to have it OCR some BMPs, with terrible results.

I saw something about training in the wiki and tried it, but got the following:

G:\AutoIT\tesse>tesseract /eng/fontfile.tif junk nobatch box.train
read_variables_file:Can't open ./tessdata/configs/box.trainCould not open file,
nobatch

So 2.04 really is just an .exe that needs nothing else, but for training I suppose I need some more files, don't I?

And I had to test it on another PC. Mine (where Tesseract 3.0 was previously installed and then uninstalled) gives errors because tesseract.exe looks for the installation directory (due to registry keys, I think).
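
One thing worth ruling out on the PC that had 3.0 installed: Tesseract reads the TESSDATA_PREFIX environment variable to locate its tessdata folder, so a leftover value could explain the lookup in the old installation directory. That this is the cause here is an assumption, not something confirmed in this thread. A minimal sketch to inspect it (_TessDataPrefixInfo is a hypothetical name):

```autoit
; Reports whether TESSDATA_PREFIX is set for the current process.
; Whether a stale value is the actual cause of the errors is an assumption.
Func _TessDataPrefixInfo()
    Local $sPrefix = EnvGet("TESSDATA_PREFIX")
    If $sPrefix = "" Then Return "TESSDATA_PREFIX is not set"
    Return "TESSDATA_PREFIX = " & $sPrefix
EndFunc   ;==>_TessDataPrefixInfo

ConsoleWrite(_TessDataPrefixInfo() & @CRLF)
```

If it does point at the old directory, calling EnvSet("TESSDATA_PREFIX", @ScriptDir & "\") before ShellExecuteWait changes it for the script and the processes it launches, without touching the system settings.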

Posted

Hmm, I tried training another way:

C:\Users\___\Desktop\tess>tesseract \tessdata\configs\eng.arial.tif junk nobatch eng.arial.box

I get the error:

...
...
read_variables_file:variable not found: v
read_variables_file:variable not found: e
read_variables_file:variable not found: l
read_variables_file:variable not found: o
read_variables_file:variable not found: c
read_variables_file:variable not found: a
read_variables_file:variable not found: l
read_variables_file:variable not found: I
read_variables_file:variable not found: '
read_variables_file:variable not found: m
read_variables_file:variable not found: b
read_variables_file:variable not found: e
read_variables_file:variable not found: i
read_variables_file:variable not found: n
read_variables_file:variable not found: g
read_variables_file:variable not found: T
read_variables_file:variable not found: O
read_variables_file:variable not found: w
read_variables_file:variable not found: e
read_variables_file:variable not found: b
Could not open file, nobatch

I gave it a file to OCR but the results are bad, so I definitely need to train it somehow.

M.
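
A note on the two errors above: the last argument of the training command is the name of a config file that Tesseract looks up under tessdata\configs, and the first error shows that box.train was not found there. In the second attempt eng.arial.box was passed in that position, so it was most likely parsed as a config file, which would explain the "variable not found" lines. A minimal sketch that checks for the config before building the command; _TessTrainCmd is a hypothetical helper, and the folder layout is assumed from the first error message.

```autoit
; Hypothetical helper: builds the Tesseract 2.x box-training command line.
; Assumes tesseract.exe and tessdata\configs\box.train live under $sDir.
; Returns "" and sets @error = 1 if the config file is missing.
Func _TessTrainCmd($sDir, $sTif)
    Local $sConfig = $sDir & "\tessdata\configs\box.train"
    If Not FileExists($sConfig) Then Return SetError(1, 0, "")
    ; "junk" is a throwaway output name; "box.train" is the config name
    Return '"' & $sDir & '\tesseract.exe" "' & $sTif & '" junk nobatch box.train'
EndFunc   ;==>_TessTrainCmd

Local $sCmd = _TessTrainCmd(@ScriptDir, @ScriptDir & "\eng.arial.tif")
If @error Then
    ConsoleWrite("tessdata\configs\box.train is missing; training cannot start." & @CRLF)
Else
    ConsoleWrite("Run from " & @ScriptDir & ": " & $sCmd & @CRLF)
EndIf
```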
