YellowLab

Best Text Match to Known List


I searched the forums but did not find an answer, most likely because I was searching for the wrong terms. If I overlooked something, please point me in the right direction.

I am using Tesseract to convert a scanned PDF image to text. The results are very good, but not 100% perfect. I do have some extra knowledge: the scanned text always comes from a finite list. Is there a way to check the OCR result against the known list and return the closest match?

Again, if this is addressed somewhere else please point me in the proper direction.

YL


You can't see a rainbow without first experiencing the rain.


#2 ·  Posted (edited)

I've had the same problem with MODI (Microsoft's image-to-text converter). Luckily, there are some functions in the Examples forum that take an expected string and the string you get from OCR, and report how many characters differ. You can then loop through the possibilities until you find the one with the fewest character differences. Unfortunately I don't recall the name of the UDF, but search for "typos" and you might find it.

Edit: this is the best I could find:

https://www.autoitscript.com/forum/topic/149624-how-to-compare-2-strings-to-get-a-similarity-percent-in-result/?do=findComment&comment=1066750
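
For anyone who wants to see the idea spelled out, here is a rough sketch of the "count the differing characters, then keep the candidate with the fewest" approach. It is written in Python rather than AutoIt purely for illustration, and it uses Levenshtein edit distance as the difference measure, which is an assumption on my part: the UDF linked above may score similarity differently. The names `edit_distance`, `closest_match`, and the sample list are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def closest_match(ocr_text: str, known_values: list[str]) -> str:
    """Return the entry of known_values with the fewest character
    differences from ocr_text (case and outer whitespace ignored)."""
    if not known_values:
        raise ValueError("known_values must not be empty")
    cleaned = ocr_text.strip().lower()
    return min(
        known_values,
        key=lambda candidate: edit_distance(cleaned, candidate.strip().lower()),
    )


# Hypothetical known list; the OCR read a lowercase "l" instead of "I".
known_values = ["Invoice", "Receipt", "Purchase Order"]
print(closest_match("lnvoice", known_values))  # prints "Invoice"
```

If two entries in the list are very similar to each other, it is probably worth also checking the winning distance against a threshold and flagging the result for manual review when it is too high, rather than trusting the closest match blindly.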

Edited by jdelaney

IEbyXPATH-Grab IE DOM objects by XPATH IEscriptRecord-Makings of an IE script recorder ExcelFromXML-Create Excel docs without excel installed GetAllWindowControls-Output all control data on a given window.
