YellowLab Posted March 11, 2016
I searched the forums and did not find an answer to what I was looking for, likely because I was searching for the wrong terms. If I overlooked something, please point me in the right direction.
I am using Tesseract to convert a scanned PDF image to text. The results are very good, but not 100% perfect. I do have some extra knowledge: the scanned text comes from a finite list. Is there a way to "check" the OCR result against the known list and return the closest match?
Again, if this is addressed somewhere else, please point me in the proper direction.
YL
jdelaney Posted March 12, 2016 (edited)
I've had the same problem with MODI (Microsoft's image-to-text converter). Luckily, there are some functions in the Examples forum that take an expected string and the string you get from OCR and report how many characters differ. You can then loop through the possibilities until you find the one with the fewest character differences. Unfortunately I don't recall the UDF, but search for "typos" and you might find it.
Edit, best I could find: https://www.autoitscript.com/forum/topic/149624-how-to-compare-2-strings-to-get-a-similarity-percent-in-result/?do=findComment&comment=1066750
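For anyone landing here later, the "fewest character differences" idea can be sketched roughly as below. This is not the UDF from the linked topic, and it is written in Python rather than AutoIt; it assumes Levenshtein edit distance as the difference measure. The names edit_distance, closest_match, and KNOWN_ITEMS, as well as the sample list, are made up for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions needed to turn a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def closest_match(ocr_text: str, candidates: list[str]) -> str:
    """Return the candidate with the fewest character differences
    from the OCR output (case-insensitive, surrounding whitespace ignored)."""
    cleaned = ocr_text.strip().lower()
    return min(candidates, key=lambda c: edit_distance(cleaned, c.lower()))


# Hypothetical known list; replace with the real finite list.
KNOWN_ITEMS = ["Invoice Number", "Purchase Order", "Shipping Address"]

print(closest_match("lnvoice Nurnber", KNOWN_ITEMS))  # prints "Invoice Number"
```

If two entries are equally close, the one that appears first in the list is returned. It may also be worth rejecting a match whose distance is large relative to the string length, so that OCR output that matches nothing in the list is not forced onto the nearest entry.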