KhalidAnsari Posted January 8, 2019
Hi,
I need to read a specific location in a .pdf that has a digital signature on it. I have gone through the forum threads related to this and have seen the screen capture and Tesseract examples, but I don't know how to get the exact location. Please help me out with this.
Thanks,
Khalid Ansari
FrancescoDiMuro Posted January 8, 2019
@KhalidAnsari
Digging a little on the Internet, you can read data from a PDF file without using Tesseract OCR. You can use VBA directly (translating it to AutoIt).
KhalidAnsari Posted January 8, 2019
Hi @FrancescoDiMuro
Thanks for the reply. I have converted this to .NET, but I need to read a digital signature that is in image format, so I thought I should use Tesseract and screen capture. Any other help will be appreciated.
Thanks,
Khalid
FrancescoDiMuro Posted January 8, 2019
@KhalidAnsari
Then you can take a look at this UDF (or something similar available on the Forum).
KhalidAnsari Posted January 9, 2019
Hi @FrancescoDiMuro
Thanks for the reply. I am able to find the position through the screen capture UDF, but the output I am getting contains junk characters.
Actual text = "Login For Chat"
Output = "Ln-gin fur Chat |"
Am I missing any character recognition data for English? Following are the tessdata file details.
Thanks,
Khalid
Edited January 9, 2019 by KhalidAnsari
FrancescoDiMuro Posted January 9, 2019
@KhalidAnsari
How do you call Tesseract OCR from your script?
KhalidAnsari Posted January 9, 2019
Hi,
Thanks for the quick reply. I am calling it this way:
Local $TESS_EXE = "C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"
FrancescoDiMuro Posted January 9, 2019
@KhalidAnsari
Could you please post the entire code you are using?
KhalidAnsari Posted January 9, 2019
@FrancescoDiMuro
Attached is the sample file I am using: OcrScreenCapture.au3
Thanks,
Khalid
FrancescoDiMuro Posted January 9, 2019
@KhalidAnsari
As mentioned in the Tesseract OCR Wiki, try using the -l parameter:
$TESS_PARAMS = $TIF_FILENAME & " output -l eng"
Edited January 9, 2019 by FrancescoDiMuro
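For reference, here is a minimal sketch of how the suggested call might look end to end in AutoIt. It is not taken from the attached script: the image and output paths are assumptions, and only $TESS_EXE, $TESS_PARAMS, $TIF_FILENAME and the "-l eng" argument come from the posts above. It runs Tesseract, waits for it to finish, and reads back the text file that Tesseract writes.

```autoit
; Path as given earlier in the thread; adjust it to your installation.
Local $TESS_EXE = "C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"

; Assumed locations for the captured image and the output base name.
Local $TIF_FILENAME = @ScriptDir & "\test.tif"
Local $OUT_BASE = @ScriptDir & "\output"

; Quote both paths so spaces do not split the arguments.
; Tesseract appends ".txt" to the output base name itself.
Local $TESS_PARAMS = '"' & $TIF_FILENAME & '" "' & $OUT_BASE & '" -l eng'

; Run Tesseract hidden and wait until it exits.
Local $iExitCode = RunWait('"' & $TESS_EXE & '" ' & $TESS_PARAMS, @ScriptDir, @SW_HIDE)
If @error Or $iExitCode <> 0 Then
    ConsoleWrite("Tesseract failed, exit code: " & $iExitCode & @CRLF)
    Exit 1
EndIf

; Read the recognized text back from output.txt.
Local $sText = FileRead($OUT_BASE & ".txt")
If @error Then
    ConsoleWrite("Could not read the OCR output file." & @CRLF)
    Exit 1
EndIf

; Flag 3 strips leading and trailing whitespace.
ConsoleWrite("OCR result: " & StringStripWS($sText, 3) & @CRLF)
```

Note that "-l eng" only works if eng.traineddata is present in the tessdata folder of the Tesseract installation.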
KhalidAnsari Posted January 9, 2019
@FrancescoDiMuro
Thanks for the quick reply. I am making the change and will then post my output.
Thanks,
Khalid
KhalidAnsari Posted January 9, 2019
Hi @FrancescoDiMuro
I get the same output after changing that line:
$TESS_PARAMS = $TIF_FILENAME & " output -l eng"
Below is my TIFF image.
Thanks,
Khalid
test.tif
Edited January 9, 2019 by KhalidAnsari
FrancescoDiMuro Posted January 9, 2019
@KhalidAnsari
Sorry to ask, but could you provide the .pdf or a download link?
KhalidAnsari Posted January 9, 2019
Hi @FrancescoDiMuro
Thanks for the reply. I really appreciate your effort. This is what I am doing:
1. Read the application title name. Based on that, I will use WinWait until the application window loads.
2. Then I need to pass data from the application.
3. Then I need to open the PDF file and again read the OCR detail for a specific location in the PDF.
Points 1 and 3 are for OCR. I am attaching a sample PDF, which doesn't have a digital signature.
Thanks,
Khalid
invoice1.pdf
Edited January 9, 2019 by KhalidAnsari
KhalidAnsari Posted January 12, 2019
Hi,
Any suggestions or help?
Thanks
FrancescoDiMuro Posted January 12, 2019
@KhalidAnsari
A little research on the Forum
KhalidAnsari Posted January 22, 2019
Hi @FrancescoDiMuro
I have searched the forum and am able to read the PDF and TIFF files. I am having difficulty reading this type of handwritten scanned PDF data. The output result is in a .txt file. Please review it.
Result.txt
Edited January 22, 2019 by KhalidAnsari
KhalidAnsari Posted February 7, 2019
Hi @FrancescoDiMuro
Any suggestions will be appreciated.
Thanks,
FrancescoDiMuro Posted February 7, 2019
@KhalidAnsari
Tesseract OCR is free OCR software, so it has its limits. If you want more from your OCR software, you may need to spend some money on a more reliable (non-free) product, which could help you with this kind of recognition.