
Searching a PDF document


syno

Hi guys

I am trying to put together a script that searches a document for a given string and when it finds it returns some result. The code I have to do this so far is:

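; Read the whole PDF into a string (the raw file contents, not the rendered text)
$File = FileRead("C:\Documents and Settings\My Documents\Software Testing\Autoit Scripts\Project\Calculation.pdf")
; StringInStr returns the 1-based position of the match, or 0 if nothing is found
$res = StringInStr($File, "income")
If $res = 0 Then
    MsgBox(0, "No String Found", "Please try again!")
    Exit
EndIf
MsgBox(0, "Yahoo", "We have found one. It can be found at " & $res)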

However, even if the string 'income' exists in the PDF document, nothing is found. Is it possible to use StringInStr to search for strings in a PDF document?

Thanks


@syno

- Try writing the PDF's contents out to a text file and searching that; once the search is done, you can rewrite the PDF if anything changed.

- Try StringReplace: it stores the number of replacements in @extended, so a count above zero means the string you are searching for exists:

$read = FileRead("yourfile.ext")
; Replace the string with itself; @extended receives the number of replacements made
StringReplace($read, "income", "income")
If @extended > 0 Then
    MsgBox(64, "income", "found !")
EndIf

Cheers, FireFox.


I don't think so. PDF documents have a "proprietary" format and cannot be read directly like the strings of a text file. It might be possible to open the file in Adobe Reader (or another PDF viewer) and perform a search there... I'm not sure how much help you will be able to find here on it, though.

Edited by SpookMeister



The text inside a PDF file is compressed (with the Flate or LZW algorithm, depending on the application that created the file); that's why you cannot find the string.

The text portions are easy to locate but, as I said, they are compressed.
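To check whether this applies to a particular file, the sketch below looks for the filter names that a PDF writes in front of each compressed stream. It is only an illustration: the path is hypothetical, and it assumes the stream dictionaries themselves are stored as readable text, which is the usual case.

```autoit
; Hypothetical path: point this at your own PDF
Local $sPdfPath = @ScriptDir & "\Calculation.pdf"

; Open in binary mode (16) so AutoIt does not try to guess a text encoding
Local $hFile = FileOpen($sPdfPath, 16)
If $hFile = -1 Then
    MsgBox(16, "Error", "Could not open " & $sPdfPath)
    Exit
EndIf
Local $sRaw = BinaryToString(FileRead($hFile))
FileClose($hFile)

; Each compressed stream names its filter, e.g. /Filter /FlateDecode
Local $bFlate = StringInStr($sRaw, "/FlateDecode") > 0
Local $bLzw = StringInStr($sRaw, "/LZWDecode") > 0

MsgBox(64, "PDF stream filters", "FlateDecode used: " & $bFlate & @CRLF & "LZWDecode used: " & $bLzw)
```

If either filter is reported, a plain StringInStr over the raw file will most likely miss text that is visible in a viewer.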


Ok, that makes sense.

In that case, is there some other way of converting the PDF document into a text file using AutoIt for this purpose? If not, it looks like I will need to use a PDF-to-DOC converter first...

Thanks for your help...


@all

Of course, everything is searchable on a PC if you use the right tools and approach.

For your needs, you can fall back on the native MS Indexing Service and extend it with the PDF IFilter.

Read more here: MS Indexing Service

I hope this gets you started.

regards

ptrex


I've made a request for an LZW algorithm in Ward's machine code thread, mainly for this purpose. I got no response, but I guess you never know. I'm still waiting.


With the free command-line tool PDFTK.exe you can uncompress the PDF and then search for text.

This does not work for images, scanned documents and the like, but normal text is no problem.
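A minimal sketch of that approach, assuming pdftk.exe is on the PATH; both file paths are hypothetical. Note that even in an uncompressed PDF a word can be split into fragments or stored in a font-specific encoding, so a miss here does not prove the word is absent.

```autoit
; Assumptions: pdftk.exe is on the PATH; both paths are hypothetical
Local $sInput = @ScriptDir & "\Calculation.pdf"
Local $sOutput = @TempDir & "\Calculation_uncompressed.pdf"

; "uncompress" is the pdftk output option that removes stream compression
Local $iExit = RunWait('pdftk "' & $sInput & '" output "' & $sOutput & '" uncompress', "", @SW_HIDE)
If @error Or $iExit <> 0 Then
    MsgBox(16, "Error", "pdftk could not be run or reported a failure.")
    Exit
EndIf

; Read the uncompressed copy in binary mode and convert it to a string
Local $hFile = FileOpen($sOutput, 16)
If $hFile = -1 Then
    MsgBox(16, "Error", "Could not open " & $sOutput)
    Exit
EndIf
Local $sRaw = BinaryToString(FileRead($hFile))
FileClose($hFile)

; Same search as in the original script, now on the uncompressed data
Local $iPos = StringInStr($sRaw, "income")
If $iPos = 0 Then
    MsgBox(0, "No String Found", "Please try again!")
Else
    MsgBox(0, "Found", "First match at character position " & $iPos)
EndIf
```

The position reported is an offset into the raw file, not a page number.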

You can also search for text using AutoIt and Adobe Reader, either through the Find menu item or with some Adobe JS code that returns the page(s) the text is on.

Which way you go also depends on the results you want: page numbers, just whether it occurs, or something else.

So you may want to tell us a little more.

Best regards,

Reinhard
