
Tesseract edits strings?



Hey there, I'm currently working with tesseract.exe in AutoIt. The tool has to read a word on the screen and then count how often this word appears in the source code of other websites. When I type the word into the variable by hand, the regex function finds 26 matches. But when tesseract writes it to the variable, the regex cannot find it in the source code. That means two strings that look identical aren't. Has anybody had the same problem? What can I do?

Surprisingly, the hand-typed string is larger than the other one.
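To see what actually differs between two strings that look identical, it can help to print the code of every character. This is a minimal sketch; `_DumpCharCodes` is a hypothetical helper (not an AutoIt UDF) that you would call once on the hand-typed word and once on the variable tesseract filled.

```autoit
; Hypothetical helper: returns the length of a string followed by the character code
; of every character, so invisible characters (CR = 13, LF = 10, form feed = 12, space = 32) show up.
Func _DumpCharCodes($sText)
    Local $sOut = "Len=" & StringLen($sText) & ":"
    For $i = 1 To StringLen($sText)
        $sOut &= " " & AscW(StringMid($sText, $i, 1))
    Next
    Return $sOut
EndFunc

ConsoleWrite(_DumpCharCodes("hello") & @CRLF)          ; Len=5: 104 101 108 108 111
ConsoleWrite(_DumpCharCodes("hello" & @CRLF) & @CRLF)  ; Len=7: 104 101 108 108 111 13 10
```

Any extra codes at the start or end of the OCR version point to whitespace; a different code in the middle points to a misread character.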

 

Edited by lordsocke

  • Moderators
7 minutes ago, lordsocke said:

Hello, I am currently working with tesseract.exe together with AutoIt. In my program, a word that was read out should be searched for in a page's source code. Unfortunately, whenever I pass the variable to the regex, I get back "no matches". If I instead write the variable by hand, e.g. $substring = "hello",

I get 26 hits. But why not if I use the variable directly?

For a test, I compared both strings with each other. Surprisingly, the hand-typed string is larger than the other.

What can I do to make both strings equal?

@lordsocke as this is an English-speaking forum, please use Google to translate your questions before posting. Alternatively, we do have the German forum: http://www.autoit.de
 

Edited by JLogan3o13

Post your script, we're not mind readers.


OCR (tesseract is an OCR application) is not perfect. I've had to automate annotating data in images, and at best I'd just verify that the text is at least 95% accurate. There is a 'typo' function somewhere on this forum that will tell you how similar two strings are.
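The forum function mentioned above isn't reproduced here. As a stand-in, a common way to measure how similar two strings are is the Levenshtein edit distance: the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. This is a minimal sketch; `_LevenshteinDistance`, `_Min3` and `_SimilarityPercent` are hypothetical names, and the percentage formula (1 - distance / longer length) is just one reasonable choice.

```autoit
; Returns the smallest of three numbers.
Func _Min3($a, $b, $c)
    Local $m = $a
    If $b < $m Then $m = $b
    If $c < $m Then $m = $c
    Return $m
EndFunc

; Levenshtein edit distance between $s1 and $s2 (case-sensitive).
; Only two rows of the dynamic-programming table are kept in memory.
Func _LevenshteinDistance($s1, $s2)
    Local $iLen1 = StringLen($s1), $iLen2 = StringLen($s2)
    If $iLen1 = 0 Then Return $iLen2
    If $iLen2 = 0 Then Return $iLen1
    Local $aPrev[$iLen2 + 1], $aCurr[$iLen2 + 1], $iCost
    ; Row 0: turning "" into the first $j characters of $s2 costs $j insertions.
    For $j = 0 To $iLen2
        $aPrev[$j] = $j
    Next
    For $i = 1 To $iLen1
        $aCurr[0] = $i
        For $j = 1 To $iLen2
            $iCost = 1
            If StringMid($s1, $i, 1) == StringMid($s2, $j, 1) Then $iCost = 0
            ; deletion, insertion, substitution (or match)
            $aCurr[$j] = _Min3($aPrev[$j] + 1, $aCurr[$j - 1] + 1, $aPrev[$j - 1] + $iCost)
        Next
        $aPrev = $aCurr ; arrays are copied on assignment in AutoIt
    Next
    Return $aPrev[$iLen2]
EndFunc

; Similarity in percent: 100 means identical.
Func _SimilarityPercent($s1, $s2)
    Local $iMax = StringLen($s1)
    If StringLen($s2) > $iMax Then $iMax = StringLen($s2)
    If $iMax = 0 Then Return 100
    Return Round(100 * (1 - _LevenshteinDistance($s1, $s2) / $iMax), 1)
EndFunc

ConsoleWrite(_LevenshteinDistance("kitten", "sitting") & @CRLF) ; 3
ConsoleWrite(_SimilarityPercent("hello", "hel1o") & @CRLF)
```

A threshold such as the 95% mentioned above would then be a check like `_SimilarityPercent($sOcr, $sExpected) >= 95`.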

Edited by jdelaney

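#include <AutoItConstants.au3>
#include <ScreenCapture.au3>

; $a1 must be declared globally, otherwise the assignment inside tesseract() creates a Local variable
Global $a1 = ""

Func tesseract()
    ; Capture the screen region that contains the word and save it as a JPEG
    _ScreenCapture_Capture(@DesktopDir & "\Image1.jpg", 1223, 452, 1647, 509)
    Local $img_filename = "C:\Users\bla\Desktop\Image1.jpg"
    Local $ocr_filename = "C:\Users\bla\Desktop\Image1" ; tesseract adds the .txt extension itself
    Local $ocr_filename_and_ext = "C:\Users\bla\Desktop\Image1.txt"
    ; Run tesseract hidden and wait until it has written the text file
    Local $iPID = Run(@ComSpec & " /C " & "tesseract.exe """ & $img_filename & """ """ & $ocr_filename & """", @ProgramFilesDir & "\Tesseract-OCR", @SW_HIDE, $STDERR_CHILD + $STDOUT_CHILD)
    ProcessWaitClose($iPID)
    FileOpen($ocr_filename_and_ext)
    $a1 = FileRead($ocr_filename_and_ext)
    FileClose($ocr_filename_and_ext)
EndFunc

tesseract()

FileWriteLine("bla.txt", $a1)
FileClose("bla.txt")

FileOpen("bla.txt")
$substring = FileRead("bla.txt")
FileClose("bla.txt")

; StringCompare returns 0 when the strings are equal, -1 or 1 otherwise (alphabetical order, not length)
$compare = StringCompare($substring, "string")
ConsoleWrite($compare)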
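If the difference turns out to be invisible whitespace, trimming the OCR result before searching should help. That is an assumption, not something confirmed in this thread: tesseract's .txt output often ends with line breaks, and many versions also append a form-feed page separator. This is a minimal sketch; `_CleanOcrText` and `_CountOccurrences` are hypothetical helper names, and counting uses StringReplace() with its @extended value instead of a regex so the word is matched literally.

```autoit
#include <StringConstants.au3>

; Hypothetical helper: trims leading/trailing whitespace from raw OCR text.
; StringStripWS treats Chr(0), Chr(9) to Chr(13) (tab, LF, VT, form feed, CR) and Chr(32) as whitespace.
Func _CleanOcrText($sRaw)
    Return StringStripWS($sRaw, $STR_STRIPLEADING + $STR_STRIPTRAILING)
EndFunc

; Hypothetical helper: counts case-sensitive, non-overlapping occurrences of $sWord in $sSource.
; StringReplace sets @extended to the number of replacements it made; the word is matched
; literally, so characters such as "." or "?" need no regex escaping.
Func _CountOccurrences($sSource, $sWord)
    If $sWord = "" Then Return 0
    StringReplace($sSource, $sWord, "", 0, $STR_CASESENSE)
    Return @extended
EndFunc

; Simulated raw OCR output: the word followed by line breaks and a form feed (assumed format).
Local $sOcr = "hello" & @CRLF & @CRLF & Chr(12)
Local $sWord = _CleanOcrText($sOcr)
ConsoleWrite(StringLen($sOcr) & " -> " & StringLen($sWord) & @CRLF)        ; 10 -> 5
ConsoleWrite(_CountOccurrences("hello world, hello again", $sWord) & @CRLF) ; 2
```

In the script above, the cleaned value would be `_CleanOcrText($a1)` and the source text would be the downloaded page source.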

 


  • Developers

Why are you using those FileOpen() and FileClose() statements in your script as they don't do anything for you?

Jos 
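For reference, FileOpen() is normally used like this: it returns a handle, and that handle (not the filename) is what FileRead() and FileClose() expect. This is a minimal sketch with a hypothetical helper name. When a filename is passed straight to FileRead(), AutoIt opens and closes the file during that call, so the handle version is only needed when a file is read or written in several steps.

```autoit
#include <FileConstants.au3>

; Hypothetical helper: reads a whole text file through an explicit handle.
Func _ReadAllText($sPath)
    Local $hFile = FileOpen($sPath, $FO_READ)
    If $hFile = -1 Then Return SetError(1, 0, "") ; file missing or not readable
    Local $sText = FileRead($hFile)
    FileClose($hFile) ; close the handle, not the filename
    Return $sText
EndFunc

; Equivalent one-liner when the file is read in one go:
; Local $sText = FileRead($sPath)
```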


On 3.3.2018 at 5:02 PM, Jos said:

Why are you using those FileOpen() and FileClose() statements in your script as they don't do anything for you?

Jos 

No, they don't. It was just an attempt at a workaround in which I wrote the string into a txt file and read it back afterwards. I hoped that would fix the larger size of the string.
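A side effect worth knowing about this workaround: FileWriteLine() appends to the file and adds @CRLF when the text does not already end in @CR or @LF, so a write-then-read round trip makes the string longer rather than cleaner. This is a minimal sketch; `_RoundTrip` and the temp file name are hypothetical.

```autoit
#include <StringConstants.au3>

; Hypothetical demo: write a string with FileWriteLine and read it back.
Func _RoundTrip($sText, $sPath)
    FileDelete($sPath)            ; FileWriteLine appends, so start from an empty file
    FileWriteLine($sPath, $sText) ; adds @CRLF because $sText does not end in @CR or @LF
    Return FileRead($sPath)
EndFunc

Local $sBack = _RoundTrip("hello", @TempDir & "\roundtrip_demo.txt")
ConsoleWrite(StringLen($sBack) & @CRLF)                                   ; 7
ConsoleWrite(StringLen(StringStripWS($sBack, $STR_STRIPTRAILING)) & @CRLF) ; 5
```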
