Oldschool

need a GOCR sample please

16 posts in this topic

#1 ·  Posted (edited)

Does anyone have a sample of GOCR image processing from AutoIt, please?

Basically, grabbing a greyscale rectangular screenshot of a word in an application and processing it with GOCR.

I'm trying to make an OCR module.

I've read this topic:

http://www.autoitscript.com/forum/index.php?showtopic=10891

Very handy OCR tutorial in there, however this does not take into account the various backgrounds that one can encounter when reading images.

I would have to somehow drop the image background before being able to PixelCheckSum characters.

And the letters I'm trying to read are white, which also presents a problem.
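
For reference, the capture step described above might look roughly like the sketch below. It only saves a screen rectangle to a .bmp and computes a PixelChecksum of the same region; it does not convert to greyscale or remove the background. The coordinates and the output file name are made up for illustration.

```autoit
#include <ScreenCapture.au3>

; Hypothetical screen rectangle around the word; adjust to your application.
Local $iLeft = 100, $iTop = 200, $iRight = 300, $iBottom = 230

; Save the rectangle to a .bmp (the file extension selects the format).
; The last argument (False) leaves the mouse cursor out of the capture.
Local $sFile = @TempDir & "\word_capture.bmp"
_ScreenCapture_Capture($sFile, $iLeft, $iTop, $iRight, $iBottom, False)
If @error Or Not FileExists($sFile) Then
    ConsoleWrite("Capture failed" & @CRLF)
Else
    ConsoleWrite("Saved " & $sFile & @CRLF)
EndIf

; Checksum of the same region. It changes whenever any pixel in the region
; changes, which is why a varying background defeats checksum matching.
Local $iSum = PixelChecksum($iLeft, $iTop, $iRight, $iBottom)
ConsoleWrite("Checksum: " & $iSum & @CRLF)
```

The saved file is what would then be handed to GOCR or another OCR engine.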

Edited by Oldschool


Thanks gseller, that's what it should have been right...

Your Drag_and_drop OCR is great, I've been using it.

I rewrote your Func _GetTextOCR() to use textract sdk

$oOCR = ObjCreate("TxtrCtl.TxtrCtl.1")
If Not IsObj($oOCR) Then MsgBox(0, "", "TxtrCtl.TxtrCtl.1 is not an onject")
    
$img = "C:\Pics\NameCapture12_NEW.bmp"
$oOCR.Init
$oUtput = $oOCR.ReadFile($img)
$Text =  $oOCR.Text 
;StringSplit($oUtput, Chr(13))
MsgBox(0, "", $Text)
$oOCR.Term


Where can I get "TextOCX.dll"?

Thanks in advance


#5 ·  Posted (edited)

It's part of the textract SDK

http://www.structurise.com/textract/index.shtml

It's not free, but it has a 40-day free trial. From what I understand, its engine is based on Tesseract OCR, which is open source:

http://code.google.com/p/tesseract-ocr/

If I was good at VS I would just compile me a .dll from tesseract source, but since I'm not good at VS, well I'm stuck with textract.

Edited by Oldschool


Nice!! I will have to check out the textract SDK. It sounds like it may be able to handle light text on a darker background. Thanks.. And yes, gseller is correct.. LOL :)


#7 ·  Posted (edited)

You bet!

Check out this example...

Starting image:

Posted Image

Processed by:

$oMyError = ObjEvent("AutoIt.Error","MyErrFunc")


Dim $img
Dim $ret


$oMagic = ObjCreate("ImageMagickObject.MagickImage.1")  
    
$ret = $oMagic.Convert("C:\Pics\example.bmp", _
    "-black-threshold", "100%", "C:\Pics\example_NEW.bmp")

$oOCR = ObjCreate("TxtrCtl.TxtrCtl.1")
If Not IsObj($oOCR) Then MsgBox(0, "", "TxtrCtl.TxtrCtl.1 is not an onject")
    
$img = "C:\Pics\example_NEW.bmp"
$oOCR.Init
$oUtput = $oOCR.ReadFile($img)
$Text =  $oOCR.Text 
MsgBox(0, "", $Text)
$oOCR.Term


Func MyErrFunc()
  $HexNumber=hex($oMyError.number,8)
  Msgbox(0,"COM Error Test","We intercepted a COM Error !"     & @CRLF  & @CRLF & _
             "err.description is: " & @TAB & $oMyError.description   & @CRLF & _
             "err.windescription:"   & @TAB & $oMyError.windescription & @CRLF & _
             "err.number is: "     & @TAB & $HexNumber        & @CRLF & _
             "err.lastdllerror is: "   & @TAB & $oMyError.lastdllerror   & @CRLF & _
             "err.scriptline is: "   & @TAB & $oMyError.scriptline     & @CRLF & _
             "err.source is: "     & @TAB & $oMyError.source         & @CRLF & _
             "err.helpfile is: "       & @TAB & $oMyError.helpfile     & @CRLF & _
             "err.helpcontext is: " & @TAB & $oMyError.helpcontext _
            )
  SetError(1) ; to check for after this function returns
Endfunc

Exit

Produces:

Posted Image

P.S. You need the COM version of ImageMagick for this to work:

http://www.imagemagick.org/download/binari...windows-dll.exe

Edited by Oldschool


I installed ImageMagick from your link and cannot get the example to work. Is this link the version I need?


#9 ·  Posted (edited)

Did you regsvr32 the dll? What's the error you are getting?

Should work...Try this instead:

ImageMagick-6.3.7-7-Q8-windows-dll

http://www.imagemagick.org/download/binari...windows-dll.exe

There are only 2 dll binaries so one of them should work:

http://www.imagemagick.org/download/binaries/

I just checked and I have the Q8 one...

Another issue you may run into is fonts. TextOCX.dll uses a font database that it builds during installation from the fonts installed on your PC. I have about 142 default fonts in my XP install. If you install new fonts after installing textract, you have to update the textract font database by rebuilding it through the textract GUI.

Hope this helps,

Cheers

Edited by Oldschool


#10 ·  Posted (edited)

Here are my errors:

COM Error Test: We intercepted a COM Error!

err.description is:
err.windescription:
err.number is:
err.lastdllerror is:
err.scriptline is:
err.source is:
err.helpfile is:
err.helpcontext is:

Invalid class string
800401F3 14000 13

Then:

TxtrCtl.TxtrCtl.1 is not an onject

Then:

COM Error Test: We intercepted a COM Error!

err.description is: Variable must be of type Object.
err.windescription:
err.number is:
err.lastdllerror is:
err.scriptline is:
err.source is:
err.helpfile is:
err.helpcontext is:

000000A9 0 17

Then:

AutoIt Error
Line 18 (File C:\Sync Files\Autoit Rogue Files\OCR - Working\Another Aproach\New AutoIt v3 Script.au3):
$oUtput = $oOCR.ReadFile($img)
$output = ERROR
Error: Error in expression.

The install that the link goes to said it registered the dll..

edit:

Ahh, I see what it is doing. It is taking the image and doing this to it, LOL

Edited by gesller


Still no luck. I installed the textract and see that the dll file is now named txtrocx.dll and is installed on my pc as well. Here is the code I am using. I am excited to see this work like you describe. I have only seen OCR work with light backgrounds and dark lettering.

$oMyError = ObjEvent("AutoIt.Error","MyErrFunc")


Dim $img
Dim $ret


$oMagic = ObjCreate("ImageMagickObject.MagickImage.1")  
    
$ret = $oMagic.Convert("C:\pics\878d935.bmp", _
    "-black-threshold", "100%", "C:\pics\878d935.bmp")

$oOCR = ObjCreate("TxtrCtl.TxtrCtl.1")
If Not IsObj($oOCR) Then MsgBox(0, "", "TxtrCtl.TxtrCtl.1 is not an onject")
    
$img = "C:\pics\878d935.bmp"
$oOCR.Init
$oUtput = $oOCR.ReadFile($img)
$Text =  $oOCR.Text 
MsgBox(0, "", $Text)
$oOCR.Term


Func MyErrFunc()
  $HexNumber=hex($oMyError.number,8)
  Msgbox(0,"COM Error Test","We intercepted a COM Error !"     & @CRLF  & @CRLF & _
             "err.description is: " & @TAB & $oMyError.description   & @CRLF & _
             "err.windescription:"   & @TAB & $oMyError.windescription & @CRLF & _
             "err.number is: "     & @TAB & $HexNumber        & @CRLF & _
             "err.lastdllerror is: "   & @TAB & $oMyError.lastdllerror   & @CRLF & _
             "err.scriptline is: "   & @TAB & $oMyError.scriptline     & @CRLF & _
             "err.source is: "     & @TAB & $oMyError.source         & @CRLF & _
             "err.helpfile is: "       & @TAB & $oMyError.helpfile     & @CRLF & _
             "err.helpcontext is: " & @TAB & $oMyError.helpcontext _
            )
  SetError(1) ; to check for after this function returns
Endfunc

Exit


#12 ·  Posted (edited)

Make sure you regsvr32 the txtrocx.dll.

If you are getting the "not an object" error, it's not registered properly.
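
A quick way to tell whether both COM components are registered before running the full script is something like the sketch below. The ProgIDs are the ones used in the scripts in this thread; the helper name _IsComRegistered is made up for this example.

```autoit
; Returns True if a COM object can be created from the given ProgID.
Func _IsComRegistered($sProgId)
    Local $oObj = ObjCreate($sProgId)
    If @error Or Not IsObj($oObj) Then Return False
    $oObj = 0 ; release the object again
    Return True
EndFunc

; ProgIDs taken from the scripts posted in this thread.
Local $aProgIds[2] = ["ImageMagickObject.MagickImage.1", "TxtrCtl.TxtrCtl.1"]
For $sProgId In $aProgIds
    If _IsComRegistered($sProgId) Then
        ConsoleWrite($sProgId & ": OK" & @CRLF)
    Else
        ConsoleWrite($sProgId & ": not registered, run regsvr32 on its DLL" & @CRLF)
    EndIf
Next
```

If a ProgID shows up as not registered, the "Invalid class string" (800401F3) error from ObjCreate is the expected result.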

I think the other problem is that I used tinypic.com to upload the original image. By default, tinypic.com auto-converts images to another format, which may have altered the font. I notice that the .bmp you posted after conversion is completely unreadable. See mine...

Try using the .bmp that I attached. It definitely works.

P.S.

$ret = $oMagic.Convert("C:\pics\878d935.bmp", _
    "-black-threshold", "100%", "C:\pics\878d935.bmp")

Will overwrite the original...I'm not sure you want to do that.
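
One way to avoid that is to derive a separate output name from the input, something like the sketch below. The helper name _NewPath is made up for this example; the Convert call and its arguments are the same as in the script above.

```autoit
; Insert "_NEW" before the file extension, e.g. example.bmp -> example_NEW.bmp.
Func _NewPath($sPath)
    Return StringRegExpReplace($sPath, "(\.[^.\\]+)$", "_NEW$1")
EndFunc

Local $sIn = "C:\pics\878d935.bmp"
Local $sOut = _NewPath($sIn) ; C:\pics\878d935_NEW.bmp

Local $oMagic = ObjCreate("ImageMagickObject.MagickImage.1")
If Not IsObj($oMagic) Then
    ConsoleWrite("ImageMagickObject is not registered" & @CRLF)
ElseIf Not FileExists($sIn) Then
    ConsoleWrite("Input image not found: " & $sIn & @CRLF)
Else
    ; Same conversion as before, but written to a separate file.
    $oMagic.Convert($sIn, "-black-threshold", "100%", $sOut)
    ConsoleWrite("Wrote " & $sOut & @CRLF)
EndIf
```

That way the original stays intact and you can compare it with the converted image.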

example.bmp

example_NEW.bmp

Edited by Oldschool


I did regsvr32 /s txtrocx.dll. Now I get a "c" in the MsgBox, and then the image is burned. Posted Image I am wondering if ImageMagick is destroying it..


I don't know why ImageMagick is messing up that conversion like that. I just started using ImageMagick myself, so I have no clue why mine works and yours does not...

Maybe some setting on your PC... Are you doing this on XP SP2? 32-bit color? Are you sure it's a real .bmp you are starting with? I'm using the exact example I posted on the exact image I uploaded.
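
To rule out the "real .bmp" question, you could check the first two bytes of the file, which are "BM" for a Windows bitmap. A rough sketch (the helper name _IsRealBmp is made up for this example, and it only checks the signature, not the rest of the header):

```autoit
; Returns True if the file starts with the two-byte "BM" bitmap signature.
Func _IsRealBmp($sPath)
    Local $hFile = FileOpen($sPath, 16) ; 16 = open for reading in binary mode
    If $hFile = -1 Then Return False
    Local $dHeader = FileRead($hFile, 2) ; read only the first two bytes
    FileClose($hFile)
    Return BinaryToString($dHeader) == "BM"
EndFunc

Local $sImg = "C:\Pics\example.bmp" ; path from the example script
ConsoleWrite($sImg & " is a BMP: " & _IsRealBmp($sImg) & @CRLF)
```

A file that was renamed to .bmp after a site converted it to another format would fail this check.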


I don't know. I will keep messing with it as time allows and see if I can get it straightened out. I would much rather work with just a couple of DLLs and extract text with images around it, like you are doing, than use Word. Good work so far!


It's part of the textract SDK

http://www.structurise.com/textract/index.shtml

It's not free, but it has a 40-day free trial. From what I understand, its engine is based on Tesseract OCR, which is open source:

http://code.google.com/p/tesseract-ocr/

If I was good at VS I would just compile me a .dll from tesseract source, but since I'm not good at VS, well I'm stuck with textract.

Try my new Tesseract UDF if you like -> Tesseract UDF.

It uses the Tesseract executable at the moment, though in the future I hope to create a DLL for AutoIT.
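
For anyone who just wants to see the general idea of driving the executable, a bare-bones sketch (not the UDF's actual code) could look like this. It relies on the standard command line form "tesseract image outputbase", which writes outputbase.txt. The install path and image name are assumptions, and depending on the Tesseract version the input may need to be a TIFF.

```autoit
; Hypothetical install location and input image; adjust both to your setup.
Local $sTesseract = "C:\Program Files\Tesseract-OCR\tesseract.exe"
Local $sImage = "C:\Pics\example_NEW.tif"
Local $sOutBase = @TempDir & "\ocr_result" ; Tesseract appends ".txt"

; Build the command line: "tesseract.exe" "image" "outputbase"
Func _TesseractCmd($sExe, $sImg, $sBase)
    Return '"' & $sExe & '" "' & $sImg & '" "' & $sBase & '"'
EndFunc

If FileExists($sTesseract) And FileExists($sImage) Then
    ; Run hidden and wait until Tesseract has finished writing the text file.
    RunWait(_TesseractCmd($sTesseract, $sImage, $sOutBase), "", @SW_HIDE)
    ConsoleWrite(FileRead($sOutBase & ".txt") & @CRLF)
Else
    ConsoleWrite("Tesseract or the input image was not found" & @CRLF)
EndIf
```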


Cheers, Sean.

