
Automate Adobe Reader or Foxit Reader "Save as text"


mkSa

Hello,

I would like to know whether it is possible to create an AutoIt script that uses Adobe Reader or Foxit Reader to convert a PDF to text. Using some other program is not an option, because only these two give me the result I need.

Neither program has a command-line option for this.

The script would be executed from a PHP script, so is it possible for all of this to happen in the background?


There are PHP functions that can convert PDF to text directly. Going through AutoIt and then Adobe Reader is weird and way too complicated.

Here is an example I found: http://community.livejournal.com/php/295413.html

I know there are, but they don't do a good job. These functions have problems extracting tables: they extract column by column, and I need row-by-row extraction so I don't have to use magic to parse the data :)

Can't you rewrite the tools to extract row by row? The PHP code seems simple enough to at least give it a try. Driving the conversion from PHP through AutoIt requires a lot more "magic" than changing the PHP code. The AutoIt route relies on "magic" to do the conversion, whereas the PHP route relies on a solid technique.

Going through AutoIt and an external program has quite a few drawbacks. Here is a short list of them:

  • Platform specific.

    Your PHP solution, which runs on any platform, suddenly becomes platform specific, and not for a very good reason either.

  • Unstable.

    The solution you are going for is unstable by nature. You will have to build in a lot of error checking to reach a success rate of around 90%. Then, after it has worked for a few months, Adobe Reader gets an update and you forgot to configure it not to update. Now your application is down because Adobe Reader is stuck in a dialog box asking whether you want to update. Customers get angry, etc. You have been there.

  • Needs GUI interaction.

    Running as a Windows service with GUI interaction is a nightmare. You will probably end up with a solution where an (administrative) user has to be logged in all the time. Every time the server reboots, you will have to log in before everything works properly, or you will have to set up automatic logon.

  • Performance.

    You probably cannot get PHP to run AutoIt directly because of permission issues. AutoIt will have to run constantly in the background, checking some shared resource such as a file for a list of conversion jobs every X seconds. Once a job is done, the client will have to refresh the page (you can use a JavaScript solution for this) so the customer knows it has finished. I think a conversion job will take 30 seconds via AutoIt, where the native solution requires 5 seconds at most.
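To make the polling cost concrete, here is a minimal sketch of the "check a shared resource every X seconds" worker described in the last item. It is written in Python rather than AutoIt purely for illustration, and everything in it is an assumption: the queue directory, the `.job` and `.working` suffixes, the filename-order processing, and the helper names `claim_next_job` and `poll_forever` are all hypothetical.

```python
import os
import time
from pathlib import Path
from typing import Callable, Optional


def claim_next_job(queue_dir: Path) -> Optional[Path]:
    """Claim the first pending job (in filename order), or return None if idle.

    Hypothetical convention: the web script drops "<name>.job" files into
    queue_dir, and the worker renames one to "<name>.working" to claim it.
    """
    for job in sorted(queue_dir.glob("*.job")):
        claimed = job.with_suffix(".working")
        try:
            os.rename(job, claimed)  # the rename doubles as a simple claim
        except OSError:
            continue  # the file vanished or could not be claimed; try the next
        return claimed
    return None


def poll_forever(queue_dir: Path,
                 handle: Callable[[Path], None],
                 interval_s: float = 5.0) -> None:
    """Run handle() on each claimed job, sleeping interval_s when idle."""
    while True:
        job = claim_next_job(queue_dir)
        if job is None:
            time.sleep(interval_s)  # this wait is the latency the post warns about
        else:
            handle(job)
            job.unlink()  # drop the claimed file once the job is handled
```

Even in this small form, each job waits up to `interval_s` before it is picked up, and the page that requested the conversion still needs some separate way to learn that the result is ready.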

You see, there are already a lot of downsides to this solution and I haven't even begun coding anything. The native PHP method is the way to go and probably the only proper, stable solution. It also scales very well. I urge you to exhaust every resource in this direction before reconsidering AutoIt.

Edited by Manadar

Thank you all for the replies.

I found a solution. Xpdf includes a console tool, pdftotext, that can export row by row. It is also available on Windows and Linux.

I had found this application before, but I didn't know about the -layout parameter, which preserves the table layout.
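For anyone landing here later, a minimal sketch of calling it from a script. It is shown in Python, but the same command line can be run from PHP. It assumes the `pdftotext` executable from Xpdf is on the PATH; the helper names `build_pdftotext_command`, `pdf_to_text`, and `split_row` are hypothetical. Splitting rows on runs of two or more spaces is also only an assumption about how the table looks in the `-layout` output; it breaks on empty cells or cells that contain double spaces.

```python
import re
import subprocess
from typing import List


def build_pdftotext_command(pdf_path: str, txt_path: str) -> List[str]:
    """Build the pdftotext call; -layout keeps the physical page layout."""
    return ["pdftotext", "-layout", pdf_path, txt_path]


def pdf_to_text(pdf_path: str, txt_path: str) -> None:
    """Run pdftotext and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_pdftotext_command(pdf_path, txt_path), check=True)


def split_row(line: str) -> List[str]:
    """Split one layout-preserved line into cells on 2+ whitespace characters.

    Assumption: columns are separated by at least two spaces and no cell
    contains a double space itself.
    """
    return re.split(r"\s{2,}", line.strip())
```

With the text file written, each line is one table row, so `split_row("Widget      12     3.50")` gives the three cells of that row without any column-by-column reassembly.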

