neogia

Trying out OCR

8 posts in this topic

Well, I do a lot of PC setups, and in order to install Norton Antivirus I first have to uninstall the trial version. This I wish to automate. Only trouble is, there's a security image where you have to decipher the letters. I know there are infinitely easier ways to go about this whole thing, but I've always wanted to dip into developing my own Optical Character Recognition algorithm. So I've started, and done... well, okay, I suppose. But I'm at a standstill. I can clean up the image, taking out most of the noise, but then I can't figure out how to go about actually recognizing the characters. Maybe it's a lost cause, maybe not. If you want to look at the script, I've uploaded the files. Bear in mind, I wasn't writing this to be release-worthy, so it's a bit messy and not well commented. However, if you just want to see a cool algorithm do its stuff, just put all the files in the same directory and run OCR.au3, and enjoy!

Please let me know if any of you decide to improve on the algorithm.

OCR.au3

TestOCR.au3

security.bmp


My UDFs: Coroutine Multithreading UDF Library | StringRegExp Guide | Random Encryptor | ArrayToDisplayString
"The Brain, expecting disaster, fails to find the obvious solution." -- neogia

Not gonna happen with AutoIt... and to my knowledge, no OCR anywhere has cracked these yet.


Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.

I'm pretty sure I've already come to that conclusion, if you couldn't tell by my exasperation. But did you try the algorithm? I'm pretty pleased with where it's gotten to...


I don't know if you have done a search, but many people have created OCRs in AutoIt. I am not sure what exactly you are trying to accomplish further, as I haven't actually worked with any OCRs, but I figured I would link you to a few others' work.

Link 1

Link 2

Those are the only two I could find in the Scripts and Scraps forum.

JS


Looks good, neogia. There are a few on here, as JS said... This is really right along the same lines as them; you might get some good ideas from the links JS provided on how to improve it.

I've written a few... I seemed to take a different path than most. If I ever decide to make them UDF-worthy, I'll post them on here (maybe in this lifetime lol). I think it was 'pingpong24' (if it wasn't pingpong24, I'm sorry for getting that wrong) who wrote one that he insists is 'the best'. I haven't looked at it, but if he's that proud of it... take a peek, it couldn't hurt.



Read about Captchas:

http://en.wikipedia.org/wiki/Captcha

http://www.brains-n-brawn.com/default.aspx?vDir=aicaptcha

http://www.ocr-research.org.ua/list.html

The first thing you need to do is think about the problem.

What constraints are on the possible inputs? Perhaps each image will always be four uppercase letters?

How do you want to go about "cleaning up noise"?

If you look at the image in mspaint and go to View > Zoom > Large Size, you might notice a pattern to the light-colored noise (e.g., the light-colored noise is a 2x2 pixel square when inside a letter but generally only a 1x1 pixel when outside a letter).

One of the links states observations and assumptions about certain captchas:

characters are aligned horizontally

characters don't overlap

etc.
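
If those two assumptions hold (characters sit side by side and never overlap), then once the noise is gone the letters can be separated by looking for columns that contain no dark pixels at all. Here is a rough sketch of that idea, written in Python rather than AutoIt just to show the logic. The name `split_columns` and the 0/1 grid layout are made up for illustration and are not taken from any of the linked projects.

```python
def split_columns(grid):
    """Return (start, end) column spans, end exclusive, that contain dark pixels.

    grid is a list of rows; each row is a list of 0 (white) or 1 (dark).
    Assumes characters are laid out left to right and never overlap,
    so at least one all-white column separates neighbouring characters.
    """
    width = len(grid[0]) if grid else 0
    # True for every column that has at least one dark pixel in it.
    has_ink = [any(row[x] for row in grid) for x in range(width)]

    spans = []
    start = None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                  # a character begins here
        elif not ink and start is not None:
            spans.append((start, x))   # the character ended one column back
            start = None
    if start is not None:
        spans.append((start, width))   # character touches the right edge
    return spans
```

Each span can then be cropped out and compared against known letter shapes. If letters touch or overlap, this simple split will merge them into one span.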


That's actually exactly what I've done. I turn the 2x2 squares into the dark color, then the 1x1 into white, then I use a recursive algorithm to calculate the perimeter of any dark piece in the picture, and if it's less than the "$pix" (tolerance) value, around 50 pixels, then I delete that piece. The only thing left is to define the edges of the letters, and do some matching.
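
For anyone who would rather read the idea than run the script, here is a rough Python sketch of that last cleanup step: erasing small dark pieces. It is not a port of OCR.au3. The function name, the 0/1 grid, and the way the perimeter is measured (counting pixel edges that touch white or the image border) are assumptions made for illustration; `pix` just mirrors the "$pix" tolerance. It uses a queue instead of recursion so that large pieces cannot overflow the call stack.

```python
from collections import deque


def remove_small_pieces(grid, pix=50):
    """Return a copy of grid with every dark piece whose perimeter is < pix erased.

    grid is a list of rows; each row is a list of 0 (white) or 1 (dark).
    Perimeter here means the number of pixel edges facing white or the border
    (an assumption; the original script may measure it differently).
    """
    height = len(grid)
    width = len(grid[0]) if height else 0
    cleaned = [row[:] for row in grid]
    seen = [[False] * width for _ in range(height)]
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))

    for y0 in range(height):
        for x0 in range(width):
            if grid[y0][x0] != 1 or seen[y0][x0]:
                continue
            # Flood-fill one 4-connected dark piece, counting exposed edges.
            piece = []
            perimeter = 0
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            while queue:
                y, x = queue.popleft()
                piece.append((y, x))
                for dy, dx in steps:
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < height and 0 <= nx < width
                    if not inside or grid[ny][nx] != 1:
                        perimeter += 1          # edge faces white or the border
                    elif not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Pieces with a short outline are treated as noise and erased.
            if perimeter < pix:
                for y, x in piece:
                    cleaned[y][x] = 0
    return cleaned
```

The right tolerance depends on how big the letters are compared to the leftover specks, so it is worth trying a few values against the sample image.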

And I just realized why you guys probably haven't run the algorithm for yourself. I just looked through my OCR.au3, and it calls Run(@ComSpec & " /c Start C:\TestOCR.exe") so you'll have to either compile TestOCR.au3 and throw it in the C:\ directory or change the run line to get it to work. Sorry, I should've checked that.

@CyberSlug: You should take a look at it, it does exactly what you just described.

