Posted (edited)

Hi. A friend has asked me to make his life a bit easier at work by doing a bit of programming, and although I don't expect to be able to do it entirely, I thought I'd give it a try.

It's all about his company's database. I need to create a system that will gather text from certain pieces of paper (it's always the same font and the same size, and the papers are new, so I guess OCR would do the job quite well) and print parts of it on a form, while also recording all of the data, plus some typed on a keyboard, to some sort of database (it will get quite large, so I need something fast and reliable).

If you guys have any suggestions on what I should use, I'm looking for: a database (maybe SQL, but I'm open to suggestions), OCR software that can be taught to work with just one specific font for maximum accuracy, and a programming language that can work with that database and the OCR and generate a printable page from the data in the database (and please-oh-please say AutoIt here :) ).

Also, basic tutorials for everything you suggest would be greatly appreciated.

Thanks very much in advance. Cheers!

PS: I always take on big projects that I can't finish... ah, well... I'm young, maybe I'll grow out of it.

Edited by ovideo
Posted (edited)

That looks great, thanks. I'll probably be back with questions soon.

Edit: wow, this forum is great, and AutoIt too. I've just found a script example of SQLite semi Embedded database functionality, also by ptrex (many thanks, by the way), so a big part of the project might just be about fiddling with his scripts (hope he doesn't mind).

Edit 2: quick question. All the data has been stored in an Excel spreadsheet so far, but I'm expecting about 100,000 entries (12 columns, so that's 1.2 million cells) within a couple of years. Would Excel choke on that volume of data, or can I just use it instead of SQL(ite)? There will be a lot of queries, and waiting a minute or two every time would not be good.

Posted

Excel is limited to 65,536 rows per worksheet, so a database is the way to go.
Posted

That depends on what version of Excel you are using. Excel 2003 and earlier stop at 65,536 rows per worksheet; Excel 2007 and later allow 1,048,576.
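For a rough feel of how a database copes with the volume mentioned above, here is a minimal sketch using Python's built-in `sqlite3` module (Python rather than AutoIt only so it is easy to run anywhere). The table name `entries`, the column names `col1` to `col12`, and the generated values are all made up for illustration; the real columns are not known from this thread. It loads 100,000 rows of 12 columns, indexes one column, and times a single lookup.

```python
import sqlite3
import time

# Hypothetical schema: 12 text columns standing in for the 12 spreadsheet columns.
COLUMNS = [f"col{i}" for i in range(1, 13)]


def build_demo_db(rows=100_000):
    """Create an in-memory SQLite database filled with made-up rows."""
    conn = sqlite3.connect(":memory:")  # pass a file path instead to keep the data
    col_defs = ", ".join(f"{c} TEXT" for c in COLUMNS)
    conn.execute(f"CREATE TABLE entries (id INTEGER PRIMARY KEY, {col_defs})")

    placeholders = ", ".join("?" for _ in COLUMNS)
    insert_sql = f"INSERT INTO entries ({', '.join(COLUMNS)}) VALUES ({placeholders})"
    # Fake data: col1 is a unique code, the other columns are filler text.
    data = (
        [f"R{n:06d}"] + [f"value {n}-{i}" for i in range(2, 13)]
        for n in range(rows)
    )
    with conn:  # one transaction for all inserts, committed on exit
        conn.executemany(insert_sql, data)
        # Index the column that queries will search on.
        conn.execute("CREATE INDEX idx_entries_col1 ON entries (col1)")
    return conn


conn = build_demo_db()
start = time.perf_counter()
row = conn.execute(
    "SELECT id, col1, col2 FROM entries WHERE col1 = ?", ("R054321",)
).fetchone()
elapsed = time.perf_counter() - start
print(row, f"lookup took {elapsed:.6f} s")
```

The point of the index is that a lookup on `col1` does not have to scan every row. Timings will vary by machine, so treat the printed figure as a sanity check rather than a benchmark.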

Posted

OK, so DB all the way, but from what I can figure that should be the (relatively) easy part.

I've run into a problem: MODI can't read the papers properly. I think it has to do with the fact that the text is not printed but rather typewritten, and sometimes one letter touches another, which makes the OCR's accuracy quite poor (also, I and 1 look exactly alike).

On the other hand, the text always comes in a fixed form: all letters are capitals, and the first row is always letter-letter-dash-number-number-dash-letter-letter-letter, and so on for the rest of it. So I may be able to work around it if I can "teach" the OCR these things, right?
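One way to use that fixed form without retraining the OCR at all is to clean up its output afterwards. The following is only a sketch in Python, not something MODI offers: the function name `correct_by_template`, the template notation (`A` for a capital letter, `9` for a digit, anything else taken literally), and the look-alike tables are all assumptions made for illustration. It swaps characters that are commonly confused (I/1, O/0 and so on) according to what each position is supposed to hold, and rejects anything that still does not fit.

```python
# Look-alike substitutions (assumed; extend after checking real OCR output).
TO_LETTER = {"1": "I", "0": "O", "5": "S", "8": "B", "2": "Z"}
TO_DIGIT = {"I": "1", "L": "1", "O": "0", "S": "5", "B": "8", "Z": "2"}


def correct_by_template(text, template):
    """Fix look-alike characters using a positional template.

    Template characters: 'A' = capital letter, '9' = digit,
    anything else = that literal character (for example '-').
    Returns the corrected string, or None if the text cannot fit.
    """
    text = "".join(text.split()).upper()  # drop stray whitespace and line breaks
    if len(text) != len(template):
        return None
    out = []
    for ch, kind in zip(text, template):
        if kind == "A":
            ch = TO_LETTER.get(ch, ch)
            if not ("A" <= ch <= "Z"):
                return None
        elif kind == "9":
            ch = TO_DIGIT.get(ch, ch)
            if ch not in "0123456789":
                return None
        elif ch != kind:
            return None
        out.append(ch)
    return "".join(out)


# Template for the first row as described: letter-letter-dash-number-number-dash-letter-letter-letter
FIRST_ROW = "AA-99-AAA"
print(correct_by_template("F1-I2-A8C", FIRST_ROW))
```

Lines that come back as `None` can be flagged for manual entry instead of being stored silently wrong. This helps with the I/1 problem, but not with touching letters that the OCR splits or merges.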

Posted Image

This is a sample of what I'm dealing with. It should read FIDA1105244, but it reads...

F

I

)A

11052

14

and the result is far worse on other parts.

Any ideas, anyone?
