Searching a PDF document

syno

Hi guys

I am trying to put together a script that searches a document for a given string and when it finds it returns some result. The code I have to do this so far is:

$File=FileRead("C:\Documents and Settings\My Documents\Software Testing\Autoit Scripts\Project\Calculation.pdf")
$res=StringInStr($File,"income")
if $res=0 then
    MsgBox(0,"No String Found","Please try again!")
    Exit
EndIf
MsgBox(0,"Yahoo","We have found one, It can be found at "&$res)

However, even if the string 'income' exists in the PDF document, nothing is found. Is it possible to use StringInStr to search for strings in a PDF document?

Thanks

FireFox

@syno

-Try writing the PDF's contents to a text file and searching that; once the search is done you can rewrite the PDF if anything changed.

-Try StringReplace and check @extended, which holds the number of replacements made; if it is greater than zero, the string you are searching for exists:

$read = FileRead("yourfile.ext")
StringReplace($read, "income", "income")
If @extended > 0 Then
    MsgBox(64, "income", "found !")
EndIf

Cheers, FireFox.


 

SpookMeister

I don't think so. PDF documents have a "proprietary" format and cannot be read directly like the strings of a text file. It might be possible to open the file in Adobe Reader (or another PDF viewer) and perform a search there... I'm not sure how much help you will be able to find here on it though.

trancexx

The text inside a PDF file is compressed (with the Flate or LZW algorithm, depending on the application that created the file); that's why you cannot find that string.

The text portions are easy to locate in the file but, as I said, they are compressed.
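To see this for yourself, here is a minimal sketch that counts how often a PDF declares each compression filter. It assumes the file sits next to the script (the path is a placeholder, not the one from the original post) and relies on the filter names /FlateDecode and /LZWDecode being stored uncompressed in the stream dictionaries, which is the usual layout.

```autoit
; Sketch: report which stream filters a PDF declares.
; $sPdf is a placeholder path; point it at your own file.
Local $sPdf = @ScriptDir & "\Calculation.pdf"

; Mode 16 opens the file in binary mode so the bytes are read unchanged.
Local $hFile = FileOpen($sPdf, 16)
If $hFile = -1 Then
    MsgBox(16, "Error", "Could not open " & $sPdf)
    Exit
EndIf
Local $sRaw = BinaryToString(FileRead($hFile))
FileClose($hFile)

; The filter names are plain text even though the page text is not.
; StringReplace stores the number of replacements in @extended.
StringReplace($sRaw, "/FlateDecode", "")
Local $iFlate = @extended
StringReplace($sRaw, "/LZWDecode", "")
Local $iLzw = @extended

MsgBox(64, "Stream filters", "FlateDecode: " & $iFlate & @CRLF & "LZWDecode: " & $iLzw)
```

A non-zero count for either filter suggests that the page text is stored in compressed streams, so a plain StringInStr over the raw file will not see it.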


syno

Ok, that makes sense.

In that case, is there some other way of converting the PDF document into a text file using AutoIt for this purpose? If not, it looks like I will need to run it through a PDF to DOC converter first...

Thanks for your help...

ptrex

@all

Of course everything is searchable on a PC, if you use the right tools and approach.

For your needs you can fall back on the native MS Indexing Service and extend it with the PDF filter (IFilter).

Read more over here: MS Indexing Service

I hope this gets you started.

regards

ptrex

trancexx

I requested an LZW implementation in Ward's machine code thread, mainly for this purpose. I got no response, but you never know. I'm still waiting.

ReFran

With the free command-line tool PDFTK.exe you can uncompress the PDF and then search for text.

The only things that doesn't work with are images, scanned documents, and the like; normal text is no problem.
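A rough sketch of that approach in AutoIt, assuming pdftk.exe is on the PATH and using placeholder file paths (neither comes from the original post):

```autoit
; Sketch: uncompress a PDF with pdftk, then search the result.
; Assumes pdftk.exe is on the PATH; both file paths are placeholders.
Local $sIn = @ScriptDir & "\Calculation.pdf"
Local $sOut = @TempDir & "\Calculation_uncompressed.pdf"

; Command form: pdftk <input> output <output> uncompress
Local $iExit = RunWait('pdftk.exe "' & $sIn & '" output "' & $sOut & '" uncompress', "", @SW_HIDE)
If @error Or $iExit <> 0 Then
    MsgBox(16, "pdftk", "pdftk could not be run or reported an error.")
    Exit
EndIf

; The page streams are now stored uncompressed, so a plain search can see them.
Local $sText = FileRead($sOut)
Local $iPos = StringInStr($sText, "income")
If $iPos = 0 Then
    MsgBox(48, "Not found", "'income' was not found in the uncompressed file.")
Else
    MsgBox(64, "Found", "'income' found at character position " & $iPos)
EndIf
FileDelete($sOut)
```

A miss here does not prove the word is absent: a PDF may split a word into several fragments or use a font-specific encoding, so treat this as a quick check rather than a reliable text extractor.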

You can also search for text using AutoIt and Adobe Reader, either through the Find menu item or with some Adobe JS code that returns the page(s) the text is on.

Which way you go also depends on the results you want: page numbers, only whether the text occurs, or something else.

So you may want to tell us a little more.

Best regards,

Reinhard
