BeBoP

Removing every color except a subset of specified colors from a bitmap.

BeBoP

I'm trying to OCR some text, but the images have a lot of background noise and the software doesn't handle that well.

Fortunately the text has a unique color, so to improve the results I filter out the noise by setting the unwanted pixels to pure white and the text-colored pixels to pure black.

This results in a black-and-white image containing only the text.

I've figured out a couple of ways of doing this, but I thought it would be a good idea to ask here for advice on doing it more efficiently.

Things I've tried:

Using GDI GetPixel and SetPixel in a for loop that goes through all the x and y values.

Same as above, but using BitmapLockbits and reading/writing the values inside a DLL structure.

Using BitmapLockbits, but instead of going through the values pixel by pixel I used StringRegExpReplace(), which seems to be the fastest so far. However, I'm struggling to wrap my head around RegExp, and I imagine there has to be a better way to write this.
 

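; Mark the three text colors with a placeholder so the next pass leaves them alone
$sResult = StringRegExpReplace($sResult, "F01BF7FF|F21BF9FF|EE1BF5FF", "--------")
; Turn every remaining 8-digit pixel value white
$sResult = StringRegExpReplace($sResult, "[0-9A-F]{8}", "FFFFFFFF")
; Turn the marked text pixels black
$sResult = StringRegExpReplace($sResult, "--------", "000000FF")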
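
One thing to watch with the three replacements above: the first pattern is not tied to pixel boundaries, so it can also match a text-color sequence that happens to straddle two neighboring pixels. A possible variant that stays aligned is sketched below. It is only a sketch under assumptions: the string is assumed to hold nothing but uppercase hex digits, 8 per pixel, and the function name _FilterTextPixels is made up for illustration.

```autoit
; Hypothetical helper, not part of the original script.
; Assumes $sHex contains only uppercase hex digits, 8 per pixel.
Func _FilterTextPixels($sHex)
    Local Const $sTextColors = "F01BF7FF|F21BF9FF|EE1BF5FF"

    ; Pass 1: append ";" to every pixel. Each 8-digit group matches in turn,
    ; so the separators always land on real pixel boundaries.
    $sHex = StringRegExpReplace($sHex, "[0-9A-F]{8}", "$0;")

    ; Pass 2: a text color directly followed by ";" must be a whole pixel.
    ; Replace it with black and drop its separator.
    $sHex = StringRegExpReplace($sHex, "(?:" & $sTextColors & ");", "000000FF")

    ; Pass 3: every pixel that still carries a separator is background.
    ; Replace it with white and drop its separator.
    $sHex = StringRegExpReplace($sHex, "[0-9A-F]{8};", "FFFFFFFF")

    Return $sHex
EndFunc

; Sample input: pixels 1 and 3 are text colors. Pixels 4 and 5 are background,
; but read together they contain "F01BF7FF" across their shared boundary.
Local $sSample = "F01BF7FF" & "12345678" & "EE1BF5FF" & "FFF01BF7" & "FF000000"
ConsoleWrite(_FilterTextPixels($sSample) & @CRLF)
```

This is still three passes and the string grows temporarily by one character per pixel, so it may not be faster than the original; the point is only that pixels 4 and 5 in the sample stay white instead of being partly turned black.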

 

Thank you for any help.

AndyG

Hi,

It would be good if you could show a picture. There are several ways to optimize a picture for use as OCR input.
