Getting Tesseract to Work

boxme

I'm trying to get Tesseract to work using the example script here: https://www.autoitscript.com/forum/topic/174483-tesseract-simple-example/ When I download the script and run it with the example image, I just get a blank readout. Someone else had the same problem here: https://www.autoitscript.com/forum/topic/174476-single-dll-file-for-ocr/#comment-1263034 but didn't explain how they fixed it. Has anyone else run into this problem and found a fix?

LarsJ

I think you have to copy the files in the tessdata folder to "C:\Program Files\Tesseract\tessdata".

boxme

LarsJ wrote: I think you have to copy the files in the tessdata folder to "C:\Program Files\Tesseract\tessdata".

I tried creating that folder and copying the data into it, but I still get a blank readout when I run the script.

LarsJ

It does not have to be "C:\Program Files". It depends on your Windows version. I tested on my old XP.

Open a Command Prompt in the folder where you unpacked TesseractExample.zip, that is, the folder where tesseract.exe is located.

Run this command (with your own path):

C:\WINDOWS\Temp\AutoIt\Tesseract\TesseractExample>tesseract.exe image.bmp output

Then you'll see an error like this:

Unable to load unicharset file C:\Program Files\Tesseract\tessdata/eng.unicharset

The error message includes the path where tesseract.exe expects eng.unicharset to be installed. Copy the files from the tessdata folder to that location.
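
If you would rather not read the path out of the console by eye, the check above can be sketched in a few lines of Python. This is only an illustration, not part of the example download: the function names expected_tessdata_dir and run_tesseract are made up here, and the pattern assumes the error text is worded exactly like the line quoted above. Other Tesseract builds may phrase it differently.

```python
import re
import subprocess
from pathlib import PureWindowsPath

# Assumption: the error line is worded exactly as in the post above.
_UNICHARSET_ERROR = re.compile(
    r"Unable to load unicharset file (.+?)\s*$", re.MULTILINE
)


def expected_tessdata_dir(output):
    """Return the folder tesseract.exe tried to read language data from,
    or None if the output contains no such error line."""
    match = _UNICHARSET_ERROR.search(output)
    if match is None:
        return None
    # PureWindowsPath copes with the mixed "\" and "/" separators in the message.
    return str(PureWindowsPath(match.group(1)).parent)


def run_tesseract(exe="tesseract.exe", image="image.bmp", out_base="output"):
    """Run the same command as above and return everything it printed.
    The default file names are the ones used in the post; adjust as needed."""
    result = subprocess.run(
        [exe, image, out_base], capture_output=True, text=True
    )
    return result.stdout + result.stderr
```

Calling expected_tessdata_dir(run_tesseract()) from the unpacked folder should then give the folder to copy the tessdata files into, or None if tesseract.exe did not print that error.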

boxme

Thank you, this worked perfectly!
