boxme

Getting Tesseract to Work

5 posts in this topic

I'm trying to get Tesseract to work using the example script here: https://www.autoitscript.com/forum/topic/174483-tesseract-simple-example/ Downloading the script and running it with the example image just gives me a blank readout. Someone else had the same problem here: https://www.autoitscript.com/forum/topic/174476-single-dll-file-for-ocr/#comment-1263034 but they didn't explain how they fixed it. Has anyone else experienced this problem and found a fix?


I think you have to copy the files in the tessdata folder to "C:\Program Files\Tesseract\tessdata".


Tried creating a folder there and copying the data but it's still showing a blank readout when I run it. 


It does not have to be "C:\Program Files". It depends on your Windows version. I tested on my old XP.

Open a "Command Prompt" in the folder where you unpacked TesseractExample.zip (the folder where tesseract.exe is located).

Run this command (with your own path):

C:\WINDOWS\Temp\AutoIt\Tesseract\TesseractExample>tesseract.exe image.bmp output

Then you'll see an error like this:

Unable to load unicharset file C:\Program Files\Tesseract\tessdata/eng.unicharset

The error message includes the path where tesseract.exe expects eng.unicharset to be installed. Copy the files from the tessdata folder to that location.
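
The same check can also be run from a script. This is only a minimal sketch, assuming tesseract.exe and image.bmp sit next to the script (as in TesseractExample.zip); the variable names are illustrative. It reads back whatever tesseract.exe prints, so the path it expects for eng.unicharset appears in a message box instead of a blank result.

```autoit
#include <AutoItConstants.au3>

; Assumption: tesseract.exe and image.bmp are in the same folder as this script.
Local $sDir = @ScriptDir
Local $sCmd = '"' & $sDir & '\tesseract.exe" image.bmp output'

; Run hidden, redirecting both output streams so they can be read back.
Local $iPID = Run($sCmd, $sDir, @SW_HIDE, BitOR($STDOUT_CHILD, $STDERR_CHILD))
If @error Then Exit MsgBox(0, "Error", "Could not start tesseract.exe")

; Wait for tesseract.exe to finish, then collect everything it printed.
ProcessWaitClose($iPID)
Local $sMessages = StdoutRead($iPID) & StderrRead($iPID)

; If the language data is missing, the expected tessdata path shows up here.
MsgBox(0, "Tesseract messages", $sMessages)
```

If the message box reports a missing unicharset file, the path in that message is where the tessdata files need to go.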


Thank you, this worked perfectly!


  • Similar Content

    • newITman
      By newITman
      Hi all!
      I'm new here and interested in Tesseract OCR.
      There are many examples in the forum, but they are too difficult for me.
      I just want to see how it works in a few lines of code.
      I have installed Tesseract and Microsoft Office 2003.
      My code:
      $ImageToReadPath = @MyDocumentsDir & "\GDIPlus_Image10.jpg"
      $ResultTextPath = @MyDocumentsDir & "\Result"
      $OutPutPath = $ResultTextPath & "auto.txt"
      ;$TesseractExePath = @ProgramsDir & "\Tesseract.exe"
      $TesseractExePath =@ProgramFilesDir & "\Tesseract-OCR\tesseract.exe"
      ShellExecuteWait($TesseractExePath, '"' & $ImageToReadPath & '" "' & $ResultTextPath & '"', "", "", @SW_HIDE)
      If @error Then
          Exit MsgBox(0, "Error", @error)
      EndIf
      MsgBox(0, "Result", FileRead($OutPutPath))
      FileDelete($OutPutPath)
       
      Please help me.
      my picture:

    • nbg15
      By nbg15
      Hello everybody..
       
      i have this picture here *attached* and this script here: 
       
      $ImageToReadPath = @MyDocumentsDir & "\GDIPlus_Image2.jpg"
      $ResultTextPath = @MyDocumentsDir & "\Result"
      $OutPutPath = $ResultTextPath & ".txt"
      $TesseractExePath = @MyDocumentsDir & "\Tesseract.exe"
      ShellExecuteWait($TesseractExePath, '"' & $ImageToReadPath & '" "' & $ResultTextPath & '"', "", "", @SW_HIDE)
      If @error Then
          Exit MsgBox(0, "Error", @error)
      EndIf
      MsgBox(0, "Result", FileRead($OutPutPath))
      FileDelete($OutPutPath)
      but Tesseract doesn't recognize the correct word and gives me garbage back...

      this is the image >> 
      and the result was >> "samm" 

      the image was a normal jpg, generated with this code:
       
      _ScreenCapture_Capture(@MyDocumentsDir & "\GDIPlus_Image2.jpg", 712,268,853,284)
      Could anybody give me a hint about what I can do better to turn this simple image into text?
       
      thank u very much!!!
       
       
      Edit: i also tried to capture the screen as bmp with a higher resolution... nothing changed... 
       
       
      _ScreenCapture_SetBMPFormat(4)
      _ScreenCapture_Capture(@MyDocumentsDir & "\GDIPlus_Image.bmp", 712,279,853,295)
    • JohnOne
      By JohnOne
      There have been many questions about using Tesseract of late.
      Here is a very basic example which works for me, along with the exact version of the standalone Tesseract executable and English language data used.
      I found it some time ago, at a time I thought I needed it; I do not recall where.
      $ImageToReadPath = @ScriptDir & "\image.bmp"
      $ResultTextPath = @ScriptDir & "\Result"
      $OutPutPath = $ResultTextPath & ".txt"
      $TesseractExePath = @ScriptDir & "\Tesseract.exe"
      ShellExecuteWait($TesseractExePath, '"' & $ImageToReadPath & '" "' & $ResultTextPath & '"', "", "", @SW_HIDE)
      If @error Then
          Exit MsgBox(0, "Error", @error)
      EndIf
      MsgBox(0, "Result", FileRead($OutPutPath))
      FileDelete($OutPutPath)
      Some Answers:
      The files contained in the download, only support English language.
      From the only documentation I got with this version...
      Original Binaries and Source can be found here: http://code.google.com/p/tesseract-ocr/
      I do not know where to get support for other languages.
      I do not know if there is a later standalone version.
      I do not know why it does not read your image accurately.
      It does not have a virus in it.
      You can search the forums or internet to learn how to create / cut / copy / paste, or otherwise manipulate your own images.
      TesseractExample.zip
    • bazinguuh
      By bazinguuh
      Hi,
      I've been trying this UDF from
      I'm trying to get the text from a combobox in the Notepad font properties, but it only saves a TIFF file and does not return the array of text.
      Thanks ahead.
      ShellExecute("notepad.exe")
      WinWaitActive("Untitled - Notepad")
      $hWnd = "[CLASS:Edit; INSTANCE:1]"
      Send("!O")
      Send("F")
      WinWaitActive("Font")
      _TesseractControlFind("Font", "", "[CLASS:ComboBox; INSTANCE:5]", "Western", 1, 0, "", 1, 1, 1, 5, 2, 0, 0, 0, 0, 1)