marko001 Posted June 1, 2011
Just a quick question: I have read all the threads about this, but I am still missing something. Is it possible to use Tesseract as a standalone library (i.e. a DLL) without installing it? I mean, if I build my script and give it to a friend (together with any DLLs it needs), will he be able to use it without installing Tesseract?
Thanks a lot,
Marco
JohnOne Posted June 2, 2011
Can you post a link to this Tesseract DLL?
marko001 Posted June 2, 2011
Well, I guess there is NO Tesseract DLL. What I'm saying is that (and I guess it's confirmed) Tesseract can't be used unless it's installed, and I'm looking for something that needs no installation.
I found, for example, GOCR (http://jocr.sourceforge.net/). It is a good product, but it doesn't work with JPG (_ScreenCapture output), so I need another tool alongside it, and I got ToPNG (http://www.softpedia.com/get/Multimedia/Audio/Audio-Convertors/ToPNG.shtml). But this three-step chain (_ScreenCapture -> ToPNG -> GOCR) is time-consuming and the end results are not good.
I also found a DLL (ocrdll.dll) in another program, but I know nothing about it and don't know how to call it.
So what I'm looking for is a good tool to OCR .jpg files, either as an API or as a standalone solution.
I hope someone can step in here and clarify the situation a bit more.
Regards,
Marco
JohnOne Posted June 2, 2011
As far as I recall, you don't need to "install" Tesseract at all. I've seen it on the forum, I just can't remember where. You can use its command line; it was something like
    ShellExecute("tesseract.exe", "imagefilename.tiff txtfilename.txt")
marko001 Posted June 2, 2011
I think not. I tried on another computer where Tesseract is not installed, and if I run it (a standard run from the DOS prompt) I get an error: impossible to run... because leptonlib.dll is not installed...
That confirms that you can't use it unless you install it first.
Meanwhile I found that ocrdll.dll has several interesting commands (I used dllviewer to get them). The commands are:
    getletter
    getlettertemplate
    lettersize
    readnextchar
But what I'm missing is how to call them. I tried
    #include <array.au3>
    $image = @ScriptDir & "\test1.jpg"
    dim $result
    $dll = DllOpen("ocrDll.dll")
    $result = DllCall($dll,"int","readNextChar","hwnd",0, "img", $image)
    DllClose($dll)
    msgbox(0,"...",$result)
    ;~ getletter
    ;~ getlettertemplate
    ;~ lettersize
    ;~ readnextchar
but got no results. I'll include the DLL as well. Maybe you (or someone else) can help me with that.
ocrDll.dll
JohnOne Posted June 2, 2011
There are all sorts of different versions of Tesseract, and I am certain one of them needs only the executable and a few data files in the same directory. If I can find that version I'll give you a link.
Meanwhile, that call to the DLL certainly will not work. I'm no DLL expert, but AutoIt's DllCall certainly does not have a type 'img'. The parameter might be a handle to an HBITMAP or a hundred other things, and without the documentation you are flogging a dead horse.
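Two problems with the DllCall attempt above can be shown without knowing anything about ocrDll.dll: 'img' is not a DllCall type, and DllCall returns an array, so passing the result straight to MsgBox shows nothing useful. The following is only a minimal sketch of a well-formed call. The signature used here (one ANSI string holding the image path, returning an int) is purely an assumption, since the DLL's documentation is not available, and calling a function with the wrong signature may crash the script.

```autoit
; Sketch only: the real signature of readNextChar is unknown.
; ASSUMPTION: it takes one ANSI string (an image path) and returns an int.
Local $sImage = @ScriptDir & "\test1.jpg"

Local $hDll = DllOpen("ocrDll.dll")
If $hDll = -1 Then
    MsgBox(0, "Error", "Could not open ocrDll.dll")
    Exit
EndIf

; "str" is a valid DllCall type; "img" is not.
Local $aResult = DllCall($hDll, "int", "readNextChar", "str", $sImage)
Local $iErr = @error ; save @error before any other call can change it
DllClose($hDll)

If $iErr Then
    ; e.g. 3 = function not found in the DLL, 5 = bad parameter
    MsgBox(0, "Error", "DllCall failed, @error = " & $iErr)
Else
    ; DllCall returns an array: [0] is the return value,
    ; [1] onward are copies of the parameters that were passed.
    MsgBox(0, "Result", $aResult[0])
EndIf
```

If the function really expects a bitmap handle or a pointer to pixel data, the "str" parameter would have to be replaced accordingly, which is why the missing documentation matters.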
JohnOne Posted June 2, 2011
Here is a link to the files, along with a very basic example.
Tesseract command line example (about 1.4 MB)
    $s_Image_InputFile = @ScriptDir & "\test.bmp"
    $s_OCR_OutputFile = @ScriptDir & "\in" ; Tesseract appends ".txt" to this base name
    $result = _TessOcr($s_Image_InputFile, $s_OCR_OutputFile)
    MsgBox(0, "Result", $result)

    ; Runs tesseract.exe on $in_image and returns the recognized text
    Func _TessOcr($in_image, $out_file)
        Local $Read
        ShellExecuteWait(@ScriptDir & "\tesseract.exe", '"' & $in_image & '" "' & $out_file & '"', Default, Default, @SW_HIDE)
        If @error Then
            MsgBox(0, "Error", "ShellExecuteWait Error")
            Exit
        EndIf
        If FileExists($out_file & ".txt") Then
            $Read = FileRead($out_file & ".txt")
            FileDelete($out_file & ".txt") ; clean up the temporary output file
        Else
            $Read = "No file created"
        EndIf
        Return $Read
    EndFunc   ;==>_TessOcr
All the files needed to run the code are in the RAR; you should be able to run it as is, with nothing to install.
Famous last words.
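Since the example above reads test.bmp, the JPG conversion step mentioned earlier in the thread may be avoidable by capturing the screen directly to BMP. The sketch below is one way that could look. It assumes tesseract.exe and its data files sit next to the script and that this Tesseract build accepts BMP input; the file names and the 400x100 capture region are made up for illustration.

```autoit
#include <ScreenCapture.au3>

; ASSUMPTIONS: tesseract.exe and its tessdata folder are in @ScriptDir,
; and this build reads BMP (the thread's example uses test.bmp).
Local $sImage = @ScriptDir & "\capture.bmp"
Local $sOutBase = @ScriptDir & "\capture_out" ; Tesseract appends ".txt"

; Capture an arbitrary region of the screen, without the mouse cursor.
; The file extension (.bmp) selects the image format.
FileDelete($sImage)
_ScreenCapture_Capture($sImage, 0, 0, 400, 100, False)
If Not FileExists($sImage) Then
    MsgBox(0, "Error", "Screen capture failed")
    Exit
EndIf

; Run from @ScriptDir in case this build resolves tessdata
; relative to the current directory.
Local $sCmd = '"' & @ScriptDir & '\tesseract.exe" "' & $sImage & '" "' & $sOutBase & '"'
RunWait($sCmd, @ScriptDir, @SW_HIDE)
If @error Then
    MsgBox(0, "Error", "Could not start tesseract.exe")
    Exit
EndIf

If FileExists($sOutBase & ".txt") Then
    MsgBox(0, "OCR text", FileRead($sOutBase & ".txt"))
    FileDelete($sOutBase & ".txt")
Else
    MsgBox(0, "Error", "Tesseract produced no output file")
EndIf
```

Capturing only the region that contains the text, rather than the whole screen, keeps the image small and may help recognition quality.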
marko001 Posted June 2, 2011
Is there a way to train it? I tried running OCR on some BMP files, with terrible results.
I saw something about training in the wiki and tried it, but got the following:
    G:\AutoIT\tesse>tesseract /eng/fontfile.tif junk nobatch box.train
    read_variables_file:Can't open ./tessdata/configs/box.train
    Could not open file, nobatch
So 2.04 is indeed just an .exe that needs nothing else, but for training I suppose I need some more files, don't I?
I also had to test it on another PC. Mine (where Tesseract 3.0 was previously installed and then uninstalled) gives errors, since tesseract.exe looks for the installation directory (due to registry keys, I think).
marko001 Posted June 3, 2011
Hmm, I tried training in another way:
    C:\Users\___\Desktop\tess>tesseract \tessdata\configs\eng.arial.tif junk nobatch eng.arial.box
I get the error:
    ...
    ...
    read_variables_file:variable not found: v
    read_variables_file:variable not found: e
    read_variables_file:variable not found: l
    read_variables_file:variable not found: o
    read_variables_file:variable not found: c
    read_variables_file:variable not found: a
    read_variables_file:variable not found: l
    read_variables_file:variable not found: I
    read_variables_file:variable not found: '
    read_variables_file:variable not found: m
    read_variables_file:variable not found: b
    read_variables_file:variable not found: e
    read_variables_file:variable not found: i
    read_variables_file:variable not found: n
    read_variables_file:variable not found: g
    read_variables_file:variable not found: T
    read_variables_file:variable not found: O
    read_variables_file:variable not found: w
    read_variables_file:variable not found: e
    read_variables_file:variable not found: b
    Could not open file, nobatch
I tried giving it a file to OCR but the results are bad, so I definitely need to train it in some way.
M.
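In the `tesseract <image> <outputbase> nobatch box.train` form, the trailing words appear to be config file names that this build looks up under tessdata\configs; the earlier "Can't open ./tessdata/configs/box.train" message points that way. Passing a .box file in that position would then make Tesseract parse the box file as a config, which is consistent with the "variable not found" lines above. The sketch below only checks that the expected files exist before launching the command quoted in the thread. The file names are hypothetical, and placing the box file beside the image with the same base name is an assumption about the 2.x training procedure, not something confirmed here.

```autoit
; Pre-flight check for a Tesseract 2.x training run (sketch, not a tested recipe).
; ASSUMPTIONS: tesseract.exe lives in @ScriptDir, the training image is
; eng.arial.tif, and its box file sits beside it as eng.arial.box.
Local $sTessDir = @ScriptDir
Local $sImage = $sTessDir & "\eng.arial.tif"
Local $sBox = $sTessDir & "\eng.arial.box"
Local $sConfig = $sTessDir & "\tessdata\configs\box.train"

; Stop early and name the first file that is missing.
Local $aNeeded[3] = [$sImage, $sBox, $sConfig]
For $i = 0 To UBound($aNeeded) - 1
    If Not FileExists($aNeeded[$i]) Then
        MsgBox(0, "Missing file", $aNeeded[$i])
        Exit
    EndIf
Next

; "junk" is the throwaway output base name and "nobatch box.train" are the
; arguments quoted in the thread; box.train is treated as a config name.
Local $sCmd = '"' & $sTessDir & '\tesseract.exe" "' & $sImage & '" junk nobatch box.train'
Local $iExit = RunWait($sCmd, $sTessDir, @SW_HIDE)
If @error Then
    MsgBox(0, "Error", "Could not start tesseract.exe")
Else
    MsgBox(0, "Done", "tesseract exit code: " & $iExit)
EndIf
```

If the box.train config is reported missing, the standalone package probably does not ship the training configs, and they would have to come from a full Tesseract distribution of the same version.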
JohnOne Posted June 3, 2011
Well, I don't know about all that. It's a Tesseract issue you are dealing with, not an AutoIt one. I suggest you look there.
marko001 Posted June 3, 2011
Do you know if there are ready-made dictionaries for Tesseract?
M.