BeBoP
Posted May 5, 2018

I'm trying to OCR some text, but the images have a lot of background noise and the software doesn't handle that well. Fortunately the text has a unique color, so to improve results I filter out the noise by setting the unwanted pixels to pure white and the text-colored pixels to pure black. This results in a black-and-white image containing only the text.

I've figured out a couple of ways of doing this, but I thought it would be a good idea to ask here for advice on doing it more efficiently.

Things I've tried:

- Using GDI GetPixel and SetPixel in a For loop going through all the x and y values.
- Same as above, but using BitmapLockBits and reading/writing the values inside a DLL structure.
- Using BitmapLockBits, but instead of going through every value pixel by pixel I used StringRegExpReplace(), which seems to be the fastest so far.

However, I'm struggling to wrap my head around RegExp, and I imagine there has to be a better way to write this:

$sResult = StringRegExpReplace($sResult, "F01BF7FF|F21BF9FF|EE1BF5FF", "--------")
$sResult = StringRegExpReplace($sResult, "[0-9 A-F]{8}", "FFFFFFFF")
$sResult = StringRegExpReplace($sResult, "--------", "000000FF")

Thank you for any help.

AndyG
Posted May 5, 2018

Hi,

It would help if you could show a picture. There are several ways to optimize an image as input for OCR.
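
One possible tightening of the three-pass replacement from the first post, offered as a sketch rather than a tested drop-in. It assumes $sResult holds the pixel data as uppercase hex with exactly 8 digits per pixel and no "0x" prefix or separators; the sample string is made up for illustration. The first pattern matches every 8-digit group, either one of the three text colors or any other pixel, so matching always advances one whole pixel at a time. That keeps a color value from being matched where it happens to straddle two neighboring pixels, which the original first pass allows. It also drops the stray space from the character class. The sketch relies on an unmatched group 1 expanding to an empty string in the replacement, which is worth verifying on a small string first.

```autoit
; Hypothetical sample: three pixels, 8 hex digits each (text, noise, text).
Local $sResult = "F01BF7FF12345678EE1BF5FF"

; Pass 1: consume the string one pixel at a time. A text-colored pixel is
; captured in group 1 and kept, followed by a "-" marker. Any other pixel
; leaves group 1 unset (assumed to expand to ""), so only "-" remains.
$sResult = StringRegExpReplace($sResult, "(F01BF7FF|F21BF9FF|EE1BF5FF)|[0-9A-F]{8}", "$1-")

; Pass 2: a kept pixel together with its marker becomes opaque black.
$sResult = StringRegExpReplace($sResult, "[0-9A-F]{8}-", "000000FF")

; Pass 3: every remaining marker stands for a noise pixel and becomes white.
; A plain StringReplace is enough here because "-" is a literal.
$sResult = StringReplace($sResult, "-", "FFFFFFFF")

ConsoleWrite($sResult & @CRLF)
```

This is still three passes over the string, so any speed difference against the original would need to be measured on real image data.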