randallc

pdf to text?

14 posts in this topic




hi,

I know that people have this running;

I need a "quick and dirty" "Save pdf as text" to add to a script..

Has anyone done this? I've run out of ideas after researching methods...

Randall

give me a minute and i'll try to whip something up. are we talking about a pdf that's being displayed, or just any arbitrary pdf file?

1100111 00001011101111 00011101101111 00010111100100 00001111110100 00110111110010 00101101111001 0011100
i didn't make up this form of encryption, but i like it. credit to the lvl 6 challenge on arcanum.co.nz


give me a minute and i'll try to whip something up. are we talking about a pdf that's being displayed, or just any arbitrary pdf file?

It's been 33 minutes already, you have 30 + ones done already? :lmao:


Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.


It's been 33 minutes already, you have 30 + ones done already? :lmao:

i actually haven't started yet because i asked for elaboration regarding what exactly the script should do. the last thing i want to do is write a script that doesn't do what it's supposed to. (again)


i actually haven't started yet because i asked for elaboration regarding what exactly the script should do. the last thing i want to do is write a script that doesn't do what it's supposed to. (again)

I resemble that remark!


You might be able to incorporate a third-party program such as http://www.pdf-to-html-word.com/pdf-to-text/

http://www.pdfzone.com/article2/0,1895,1864583,00.asp

Another route is to add a printer:

Port is "FILE:"

Manufacturer is "Generic"

Model is "Generic / Text Only"

But I don't know of a command line option with Adobe Reader (or Foxit Reader) for printing a PDF document.


Use Mozilla | Take a look at My Disorganized AutoIt stuff | Very very old: AutoBuilder 11 Jan 2005 prototype I need to update my sig!


A COM based solution is available commercially for about $100 from

http://www.isedquickpdf.com

Using that library, I have developed a free PDF to text converter, including both GUI and command-line interfaces, available at

http://www.empowermentzone.com/p2tsetup.exe

neat


Hi,

Great!

Is the exe written in AutoIt? Is the source available?

But is it really that complicated?

You couldn't use this in your own program for copyright reasons, I presume; so how is it "free"?

We have at least one object....?

$oPDF = ObjCreate("AcroPDF.PDF.1")
$Version = $oPDF.GetVersions()

Is it really hard to do with some obj calls?

It is going to be part of (or already is in) Word12 beta, so will be fairly easily accessible soon for many of us.

Randall


My "PDF to TXT" program is written in PowerBasic. The DLL for manipulating PDF files can be distributed with a program that uses it, as long as the license key is not exposed (the source code initializes the library with a function call that passes the developer's license key as a parameter).

Let me clarify that the primary purpose of the library is creating PDF files and forms. I just took advantage of a feature that supports text extraction from an existing PDF.

Unfortunately, I don't think there is a free COM-based solution for converting a PDF to text--I looked far and wide. The Adobe COM libraries are commercial--in fact, they require an installation of Adobe Acrobat or equivalent.

There are other free executables (as opposed to COM or standard DLL libraries) for converting PDFs to text. Based on testing by myself and others, the one I developed with the ISEDQUICKPDF library works as well as any of them. Moreover, it permits convenient batch conversions, including converting all PDFs linked to an Internet web page.
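To illustrate the batch idea in general terms, here is a minimal sketch in Python (not AutoIt, and not the code of the program described above). The function name `batch_convert` and the `convert` callback are made up for this example; `convert` stands in for whichever converter you actually use. The point is only that one bad PDF should not stop the rest of the batch.

```python
from pathlib import Path


def batch_convert(folder, convert):
    """Run convert(pdf_path, txt_path) for every *.pdf file in folder.

    `convert` is a hypothetical callback supplied by the caller; it is
    expected to write the text file and to raise an exception on failure.
    Returns a dict that maps each PDF file name to "ok" or "failed: ...".
    """
    results = {}
    for pdf in sorted(Path(folder).glob("*.pdf")):
        # Output goes next to the PDF, same name with a .txt extension.
        target = pdf.with_suffix(".txt")
        try:
            convert(pdf, target)
            results[pdf.name] = "ok"
        except Exception as exc:
            # Record the failure and carry on with the remaining files.
            results[pdf.name] = f"failed: {exc}"
    return results
```

Collecting a per-file status instead of aborting makes it easy to re-run only the files that failed.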


As far as I know, Adobe Reader allows one to open or print a PDF with command line parameters, but not to save it as text (that requires operating the user interface). I have tried automating "save as text" by sending keystrokes, but found that it does not work reliably, at least for converting a batch of PDFs to text. Apparently, Adobe Reader is programmed to defy such automation attempts. No matter what delays I tried between keystrokes, a lockup would occur. Also, Adobe Reader does not use standard menus, so the menu automation technique would not work either.

If anyone can demonstrate an automation solution with Adobe Reader, I would also be interested. In the meantime, my suggestion, based on experience, is to use a 3rd party command-line utility (for a free approach).
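For what it's worth, the command-line route could look roughly like the sketch below. It is written in Python rather than AutoIt and it assumes a specific tool that nobody in this thread has named: `pdftotext` (shipped with Xpdf and Poppler), installed and on PATH, called as `pdftotext -layout input.pdf output.txt`. The helper names `build_command` and `pdf_to_text` are invented for the example.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(pdf_path, txt_path=None, tool="pdftotext"):
    """Return the argument list for one conversion.

    Assumption: the tool is invoked as `tool -layout input.pdf output.txt`,
    which matches pdftotext. Another utility may need different arguments.
    """
    pdf = Path(pdf_path)
    # Default output: same name as the PDF, with a .txt extension.
    txt = Path(txt_path) if txt_path else pdf.with_suffix(".txt")
    return [tool, "-layout", str(pdf), str(txt)]


def pdf_to_text(pdf_path, txt_path=None, tool="pdftotext"):
    """Convert one PDF and return the path of the resulting text file."""
    if shutil.which(tool) is None:
        raise FileNotFoundError(f"{tool} was not found on PATH")
    cmd = build_command(pdf_path, txt_path, tool)
    # check=True raises CalledProcessError if the tool exits with an error.
    subprocess.run(cmd, check=True)
    return Path(cmd[-1])
```

Calling `pdf_to_text("manual.pdf")` would write `manual.txt` next to the PDF. From an AutoIt script the same command line could be passed to `RunWait`, which avoids driving the Adobe Reader interface at all.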

