Saving a pdf from a web site


In a major effort to reduce reliance on paper, I am trying to automate the retrieval of pdf documents from various web sites. After spending about 20 hours learning AutoIt, mostly with help from the forums, I am ready for more specific help.

I have a function which opens an IE browser, navigates to a URL pointing to a pdf, then saves the pdf to disk. The function works only some of the time:

CODE
; Given the URL and filename, open a pdf and save it.
Func OpenPdfSave($URL, $FileName)
    $oIEpdf = _IECreate("about:blank")
    _IENavigate($oIEpdf, $URL) ; Load URL
    Sleep(500)
    While _IEPropertyGet($oIEpdf, "statustext") <> "Done"
        Sleep(500)
    WEnd
    Sleep(500)
    Send("{ALT down}fa{ALT up}") ; Menu: File >> Save As
    Sleep(500)
    WinWaitActive("Save a Copy...") ; Wait til window opens
    ControlSend("Save a Copy...", "File &name:", "Edit1", $FileName) ; Enter filename
    ControlClick("Save a Copy...", "File &name:", "Button2") ; Save
    ; Check to make sure file is written with wait and closure of pop-ups.
    For $j = 1 To 5
        Sleep(1000) ; Wait 1 second
        $var = WinList()
        For $i = 1 To $var[0][0]
            ; Check for Active "Save As" window
            If $var[$i][0] == "Save As" And WinActive("Save As", "The file already exists.") Then
                ControlClick("Save As", "The file already exists.", "Button1") ; click "Yes"
            EndIf
        Next
    Next
    _IEAction($oIEpdf, "quit")
EndFunc

I really hate to add Sleeps to code. It is a very unreliable methodology. Here are some of the things I have tried which do not work:

1. InetGet(): this returns html from the browser, not the pdf file.

2. _IENavigate is supposed to wait until the page loads before proceeding. It doesn't.

3. Adding _IELoadWait($oIE) after _IENavigate just creates a permanent wait.

4. Even with all of the sleeps and the explicit wait until "Done" appears in the status bar, the Send command is occasionally executed before the pdf is completely downloaded.

I suspect that the reason this is so challenging is that IE is using a plug-in from Acrobat.

Can anyone offer a more reliable way of performing this function?

I'm using IE7 and Adobe Acrobat v8.0
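
One way to cut down on the fixed Sleeps after the Save click is to poll for the result with a timeout. The following is only a minimal sketch: `_WaitForFileStable` is a hypothetical helper name, not part of AutoIt or IE.au3, and it assumes that a file whose size is non-zero and unchanged between two polls has finished being written.

```autoit
; Hypothetical helper (not a built-in): wait until $sPath exists and its
; size stops changing. Returns True on success, False on timeout.
Func _WaitForFileStable($sPath, $iTimeoutMs = 30000, $iPollMs = 250)
    Local $hTimer = TimerInit()
    Local $iLastSize = -1
    Local $iSize = 0
    While TimerDiff($hTimer) < $iTimeoutMs
        If FileExists($sPath) Then
            $iSize = FileGetSize($sPath)
            ; Assumption: two identical, non-zero readings mean the write is done
            If $iSize > 0 And $iSize = $iLastSize Then Return True
            $iLastSize = $iSize
        EndIf
        Sleep($iPollMs)
    WEnd
    Return False
EndFunc
```

Inside a function like OpenPdfSave this could be called right after the Save click, provided $FileName holds the full path of the target file. It does not handle the "The file already exists." prompt, which would still need its own check.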

Are you sure you can't use InetGet to get the file? I don't think it returns html unless you ask it to. I use InetGet to download files with various extensions and have never had a problem.
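
For reference, a direct download with InetGet can be sketched as below. The URL and file name are placeholders, and the return-value handling assumes the behaviour of current AutoIt versions (bytes downloaded on success, 0 with @error set on failure); older releases may differ.

```autoit
; Placeholder values (assumptions): substitute the real URL and target path.
Local $sUrl = "https://example.com/files/report.pdf"
Local $sFile = @ScriptDir & "\report.pdf"

; Third argument 1 = force a reload from the remote site instead of the cache.
Local $iBytes = InetGet($sUrl, $sFile, 1)
If @error Or $iBytes = 0 Then
    MsgBox(16, "Download failed", "InetGet could not fetch " & $sUrl)
Else
    MsgBox(64, "Download complete", $iBytes & " bytes written to " & $sFile)
EndIf
```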

The URL that I have to fetch the pdf file looks like this:

https://esp.srcponline.com/w...r/viewpdf.asp?tocken=D62... (I've truncated for security)

and the URL which shows up on the page that is returned is:

https://esp.srcponline.com/w...r/Showdoc.asp?tocken=1A2...

I'm guessing that, because there is no *.pdf file name readily apparent in the URL, InetGet ends up acting on the HTML rather than on the pdf.
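
Whatever the URL looks like, it is possible to check what was actually saved, since a PDF file begins with the bytes "%PDF". The helper below is a sketch with a hypothetical name (`_LooksLikePdf`); it only inspects the first four bytes and says nothing about why the server returned HTML.

```autoit
; Hypothetical helper (not a built-in): True if the file starts with "%PDF".
Func _LooksLikePdf($sPath)
    Local $hFile = FileOpen($sPath, 16) ; 16 = open for reading in binary mode
    If $hFile = -1 Then Return False    ; file could not be opened
    Local $bHead = FileRead($hFile, 4)  ; read the first four bytes
    FileClose($hFile)
    ; == is the case-sensitive string comparison in AutoIt
    Return BinaryToString($bHead) == "%PDF"
EndFunc
```

If this returns False for a file fetched with InetGet, opening the file in a text editor should show whether it is an HTML page (for example a login or viewer page) instead of the document.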

Your suspicion about the trouble being caused by the Adobe plugin is correct. The plugin is hosted by the browser and replaces the normal DOM (document object model) that is used for HTML pages. It has its own set of properties and methods and, for example, the document.readyState property that _IELoadWait relies upon is not used.

This and any other plug-in object model is currently beyond the scope of IE.au3, but since PDF is as common as it is, I would be interested in extending it to include it. Unfortunately, I've never researched the detail and don't know how easy or hard it would be.

With a little research you may be able to figure it out since you already have the $oIE object as the parent.

Dale
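
Since document.readyState is not available for a plug-in document, the open-ended wait can at least be bounded. This is a sketch under assumptions: the URL is a placeholder, and it relies on the optional wait and timeout parameters that current versions of IE.au3 document for _IENavigate and _IELoadWait. It does not make the plug-in report readiness; it only stops the script from hanging.

```autoit
#include <IE.au3>

; Placeholder URL (assumption): substitute the real document address.
Local $sUrl = "https://example.com/viewpdf.asp"
Local $oIE = _IECreate("about:blank")

; Third argument 0 = return immediately instead of waiting for the load.
_IENavigate($oIE, $sUrl, 0)

; Wait at most 15 seconds; with a plug-in document this may never succeed.
_IELoadWait($oIE, 0, 15000)
If @error Then
    ConsoleWrite("_IELoadWait gave up, @error = " & @error & @CRLF)
    ; Fall back to some other readiness check here.
EndIf
```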
