ovideo, posted September 11, 2008 (edited September 12, 2008 by ovideo)

Hi. A friend has asked me to try to make his life a bit easier at work by doing a bit of programming, and although I don't expect to be able to do it entirely, I thought I'd give it a try. It's all about his company's database: I need to create a system that will gather text from certain pieces of paper (it's always the same font and the same size, and the papers are new, so I guess an OCR would do the job quite well) and print it (well, parts of it) on a form, while also recording all of the data, plus some typed on a keyboard, to some sort of database (it would get quite large, so I need something fast and, well, reliable).

If you guys have any suggestions on what I should use: a database (maybe SQL, but I'm open to suggestions), an OCR software that can be taught to work with just one specific font for maximum accuracy, and a programming language that can work with that database and the OCR and generate a printable page from the data in the database (and please-oh-please say AutoIt here). Also, basic tutorials for everything you suggest would be greatly appreciated.

Thanks very much in advance. Cheers!

PS: I always take on big projects that I can't finish... ah, well... I'm young, maybe I'll grow out of it.
SmOke_N (Moderator), posted September 11, 2008

http://www.autoitscript.com/forum/index.ph...c=50608&hl=

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.
ovideo (author), posted September 11, 2008 (edited September 12, 2008 by ovideo)

That looks great, thanks. I'll probably be back with questions soon.

Edit: wow, this forum is great, and AutoIt too. I've just found a script example of "SQLite semi Embedded database" functionality, also by ptrex (many thanks, by the way), so a big part of the project might just be about fiddling with his scripts (hope he doesn't mind).

Edit 2: quick question. All the data has been stored in an Excel spreadsheet so far, but I'm expecting about 100,000 entries (12 columns, so that's 1.2 million cells) within a couple of years. Would Excel choke on that volume of data, or can I just use that instead of SQL(ite)? There will be a lot of queries, and waiting a minute or two every time would not be good.
picaxe, posted September 12, 2008

Quoting ovideo: "Edit 2: quick question. All the data has been stored in an Excel spreadsheet so far, but I'm expecting about 100,000 entries (12 columns, so that's 1.2 million cells) within a couple of years. Would Excel choke on that volume of data, or can I just use that instead of SQL(ite)? There will be a lot of queries, and waiting a minute or two every time would not be good."

Excel is limited to 65,536 rows/entries, so a database is the way to go.
SmOke_N (Moderator), posted September 12, 2008

Quoting picaxe: "Excel is limited to 65,536 rows/entries, so a database is the way to go."

That depends on what version of Excel you are using.
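For reference on the two posts above: the .xls format of Excel 97-2003 stops at 65,536 rows per sheet, while the .xlsx format of Excel 2007 and later allows 1,048,576, so 100,000 entries fit only in the newer versions. For the "lot of queries" part, here is a minimal sketch of the SQLite route. It is written in Python with the standard `sqlite3` module rather than AutoIt, and the table name `entries` and the column names `col1` to `col12` are placeholders, since the thread never lists the real columns.

```python
import sqlite3

# Hypothetical schema: the thread never names the 12 columns,
# so col1..col12 stand in for them.
COLUMNS = [f"col{i}" for i in range(1, 13)]


def build_database(path=":memory:", rows=100_000):
    """Create a table holding `rows` dummy entries, indexed on col1."""
    conn = sqlite3.connect(path)
    column_defs = ", ".join(f"{name} TEXT" for name in COLUMNS)
    conn.execute(f"CREATE TABLE entries (id INTEGER PRIMARY KEY, {column_defs})")

    placeholders = ", ".join("?" for _ in COLUMNS)
    insert_sql = (
        f"INSERT INTO entries ({', '.join(COLUMNS)}) VALUES ({placeholders})"
    )
    # One transaction for all inserts; committing per row would be far slower.
    with conn:
        conn.executemany(
            insert_sql,
            ([f"r{r}c{c}" for c in range(1, 13)] for r in range(rows)),
        )
        # Index the column that lookups will filter on.
        conn.execute("CREATE INDEX idx_entries_col1 ON entries (col1)")
    return conn


def find_by_col1(conn, value):
    """Return every row whose col1 equals `value`."""
    return conn.execute(
        "SELECT * FROM entries WHERE col1 = ?", (value,)
    ).fetchall()


if __name__ == "__main__":
    conn = build_database()
    print(conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0])
    print(find_by_col1(conn, "r4242c1"))
```

The index is the part that matters for query time: an exact-match lookup on an indexed column does not have to scan the whole table. Passing a file name instead of `":memory:"` keeps the data on disk between runs.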
ovideo (author), posted September 12, 2008

OK, so DB all the way, but from what I figure that should be the (relatively) easy part.

I've run into a problem: MODI can't read the papers properly. I think it has to do with the fact that the text is not printed but rather typewritten, I think, and sometimes a letter may touch another, making the OCR's accuracy quite poor (also, I and 1 look exactly alike).

On the other hand, there is a fixed form in which the text comes: all letters are capitals, and on the first row there's always letter-letter-dash-number-number-dash-letter-letter-letter, and so on for the rest of it. So I may be able to work around it if I can "teach" the OCR these things, right?

This is a sample of what I'm dealing with. It should read FIDA1105244, but it reads FI)A1105214, and the result is far worse on other parts. Any ideas, anyone?
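One way to use the fixed layout described above without retraining the OCR is to correct its output afterwards: if a position must hold a letter and the OCR returned a look-alike digit or symbol, swap it, and the reverse for digit positions. The sketch below is in Python, not AutoIt, and rests on assumptions: the confusion pairs in `TO_LETTER` and `TO_DIGIT` are illustrative guesses rather than measurements from the real scans, and the template `LLLLDDDDDDD` for the sample field is inferred from the single example in the post.

```python
import string

LETTERS = set(string.ascii_uppercase)
DIGITS = set(string.digits)

# Look-alike substitutions, keyed by what the position is supposed to hold.
# Illustrative assumptions only; tune them against real misreads.
TO_LETTER = {"1": "I", "0": "O", "5": "S", "8": "B", ")": "D"}
TO_DIGIT = {"I": "1", "L": "1", "O": "0", "S": "5", "B": "8"}


def correct(raw, template):
    """Correct OCR text `raw` against a positional `template`.

    In the template, 'L' means "any capital letter", 'D' means "any digit",
    and every other character is a fixed separator such as '-'.
    Returns the corrected string, or None when the length is wrong or a
    character cannot be mapped into the class its position requires.
    """
    if len(raw) != len(template):
        return None
    out = []
    for ch, kind in zip(raw, template):
        ch = ch.upper()
        if kind == "L":
            fixed = ch if ch in LETTERS else TO_LETTER.get(ch)
        elif kind == "D":
            fixed = ch if ch in DIGITS else TO_DIGIT.get(ch)
        else:
            fixed = kind  # separator position: trust the template
        if fixed is None:
            return None
        out.append(fixed)
    return "".join(out)


if __name__ == "__main__":
    print(correct("FI)A1105214", "LLLLDDDDDDD"))
    print(correct("F1-0I-ABC", "LL-DD-LLL"))
```

On the sample from the post this repairs the `)` that should be `D`, but the final `4` that was read as `1` stays wrong: both are valid digits, so the layout alone cannot tell them apart. Catching that kind of error needs something extra, such as a check against known values in the database or a manual review of rows that fail validation.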