Tesseract edits strings?

lordsocke
Posted (edited)

 

Hey there, I'm currently working with tesseract.exe in AutoIt. The tool has to read a word on the screen and then count how often that word occurs in the source code of other websites. When I type the word into the variable by hand, the regex function tells me that it found 26 matches. But when Tesseract writes the word into the variable, the regex cannot find it in the source code. That means that two strings which look identical aren't. Has anybody had the same problem? What can I do?

Surprisingly, the hand-typed string is larger than the other one.

 

JLogan3o13
Posted (edited)
7 minutes ago, lordsocke said:

Hello, I am currently working with tesseract.exe together with AutoIt. My program should search a page's source code for a word that was read from the screen. Unfortunately, whenever I pass the variable to the regex, I get back "no matches". However, if I assign the value to the variable by hand, e.g. $substring = "hello",

I get 26 hits. Why not when I use the variable directly?

As a test I compared both strings with each other. Surprisingly, the hand-typed string is larger than the other.

What can I do to make both strings equal?

@lordsocke as this is an English-speaking forum, please use Google to translate your questions before posting. Alternatively, we do have the German forum: http://www.autoit.de
 

BrewManNH

Post your script, we're not mind readers.


jdelaney
Posted (edited)

OCR (Tesseract is an OCR application) is not perfect. I've had to automate annotation data in images, and at best I'd just verify that the text is at least 95% accurate. There is a 'typo' function somewhere in this forum that will tell you how similar two strings are.
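
A minimal sketch of such a similarity check, in case it helps. This is not the forum function mentioned above; `_EditDistance` and `_Similarity` are hypothetical names, and the approach (Levenshtein edit distance scaled by the longer string's length) is just one common way to score how close an OCR result is to the expected text.

```autoit
; Hypothetical helper: Levenshtein edit distance between two strings
; (number of single-character insertions, deletions, or substitutions).
Func _EditDistance($sA, $sB)
    Local $iLenA = StringLen($sA), $iLenB = StringLen($sB)
    If $iLenA = 0 Then Return $iLenB
    If $iLenB = 0 Then Return $iLenA

    ; $aPrev is the previous row of the distance table, $aCurr the current row.
    Local $aPrev[$iLenB + 1], $aCurr[$iLenB + 1]
    For $j = 0 To $iLenB
        $aPrev[$j] = $j
    Next

    For $i = 1 To $iLenA
        $aCurr[0] = $i
        For $j = 1 To $iLenB
            ; == compares case-sensitively
            Local $iCost = (StringMid($sA, $i, 1) == StringMid($sB, $j, 1)) ? 0 : 1
            Local $iBest = $aPrev[$j] + 1 ; deletion
            If $aCurr[$j - 1] + 1 < $iBest Then $iBest = $aCurr[$j - 1] + 1 ; insertion
            If $aPrev[$j - 1] + $iCost < $iBest Then $iBest = $aPrev[$j - 1] + $iCost ; substitution
            $aCurr[$j] = $iBest
        Next
        $aPrev = $aCurr ; arrays are copied on assignment
    Next
    Return $aPrev[$iLenB]
EndFunc

; Hypothetical helper: 1 means identical, 0 means nothing in common.
Func _Similarity($sA, $sB)
    Local $iMax = StringLen($sA)
    If StringLen($sB) > $iMax Then $iMax = StringLen($sB)
    If $iMax = 0 Then Return 1
    Return 1 - _EditDistance($sA, $sB) / $iMax
EndFunc

; "hel1o" differs from "hello" by one substituted character.
ConsoleWrite(_Similarity("hello", "hel1o") & @CRLF)
```

A script could then accept the OCR result when the score is above some threshold, for example the 95% mentioned above, instead of requiring an exact match.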

lordsocke
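#include <AutoItConstants.au3>
#include <ScreenCapture.au3>

; Capture a screen region, run Tesseract on it, and return the recognized text.
Func tesseract()
    _ScreenCapture_Capture(@DesktopDir & "\Image1.jpg", 1223, 452, 1647, 509)
    Local $img_filename = "C:\Users\bla\Desktop\Image1.jpg"
    Local $ocr_filename = "C:\Users\bla\Desktop\Image1" ; Tesseract appends ".txt" itself
    Local $ocr_filename_and_ext = "C:\Users\bla\Desktop\Image1.txt"
    Local $iPID = Run(@ComSpec & " /C " & "tesseract.exe """ & $img_filename & """ """ & $ocr_filename & """", @ProgramFilesDir & "\Tesseract-OCR", @SW_HIDE, $STDERR_CHILD + $STDOUT_CHILD)
    ProcessWaitClose($iPID)
    FileOpen($ocr_filename_and_ext) ; returned handle is discarded
    Local $a1 = FileRead($ocr_filename_and_ext)
    FileClose($ocr_filename_and_ext) ; FileClose expects a handle, not a file name
    Return $a1 ; a variable assigned inside a Func is local, so return it
EndFunc

$a1 = tesseract()

; Workaround attempt: write the string to a text file and read it back.
FileWriteLine("bla.txt", $a1)
FileClose("bla.txt")

FileOpen("bla.txt")
$substring = FileRead("bla.txt")
FileClose("bla.txt")

$compare = StringCompare($substring, "string")
ConsoleWrite($compare)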

 

lordsocke

Okay, I found the problem. Tesseract put two invisible characters at the end of the string. Trimming them off works fine, though I don't know why they are there.

$substring = StringTrimRight ($substring, 2)
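
A likely explanation, offered as an assumption rather than something confirmed in this thread: the two invisible characters are line-break characters. Tesseract's text output usually ends with a line break, and as far as the AutoIt help describes it, FileWriteLine also appends @CRLF when the text does not already end in one. The sketch below uses a made-up stand-in string (`$sOcr`) to show how to make such characters visible and how to strip them without hard-coding a count of 2.

```autoit
#include <StringConstants.au3>

; Assumption: this literal stands in for the OCR result; the trailing
; line break is simulated with @CRLF.
Local $sOcr = "hello" & @CRLF

; Print the length and the raw bytes so trailing characters become visible
; (0D and 0A are carriage return and line feed).
ConsoleWrite(StringLen($sOcr) & " chars: " & StringToBinary($sOcr) & @CRLF)

; Strip leading and trailing whitespace, however many characters there are.
Local $sClean = StringStripWS($sOcr, $STR_STRIPLEADING + $STR_STRIPTRAILING)
ConsoleWrite(StringLen($sClean) & " chars: " & StringToBinary($sClean) & @CRLF)
```

Compared with StringTrimRight($substring, 2), this keeps working if the output ends with a different number of whitespace characters, and it does not cut off real letters when there is no trailing whitespace at all.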

lordsocke
On 3.3.2018 at 5:02 PM, Jos said:

Why are you using those FileOpen() and FileClose() statements in your script as they don't do anything for you?

Jos 

No, they don't. It was just an attempt at a workaround in which I wrote the string into a txt file and read it back afterwards. I hoped that would fix the larger size of the string.
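
For reference, a short sketch of the two usual ways to read a file in AutoIt, which may clarify Jos's point. The path `$sPath` is hypothetical and only used for illustration. FileOpen returns a handle, and it only has an effect if that handle is the value passed to FileRead and FileClose; when a file name is passed instead, FileRead opens and closes the file on its own.

```autoit
#include <FileConstants.au3>

; Hypothetical path used only for this illustration.
Local $sPath = @TempDir & "\ocr_demo.txt"
FileWrite($sPath, "hello")

; Variant 1: pass the file name; no FileOpen or FileClose needed.
Local $sText = FileRead($sPath)
ConsoleWrite("By name:   " & $sText & @CRLF)

; Variant 2: open a handle, read through it, then close that same handle.
Local $hFile = FileOpen($sPath, $FO_READ)
If $hFile <> -1 Then
    $sText = FileRead($hFile)
    FileClose($hFile)
    ConsoleWrite("By handle: " & $sText & @CRLF)
EndIf

FileDelete($sPath)
```

Neither variant changes the content that is read, so routing the string through a text file would not be expected to remove trailing characters.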
