automator Posted May 6, 2007 (edited)

In a major effort to reduce reliance on paper, I am trying to automate the retrieval of PDF documents from various web sites. After spending about 20 hours learning AutoIt, mostly with help from the forums, I am ready for more specific help.

I have a function which opens an IE browser, navigates to a URL pointing to a PDF, then saves the PDF to disk. The function works only some of the time:

```autoit
#include <IE.au3>   ; needed for _IECreate, _IENavigate, _IEPropertyGet, _IEAction

Func OpenPdfSave($URL , $FileName)
    ;
    ; Given the URL and filename, open a pdf and save it.
    ;
    $oIEpdf = _IECreate( "about:blank" )
    _IENavigate( $oIEpdf , $URL )       ; Load URL
    Sleep (500)
    While _IEPropertyGet ( $oIEpdf, "statustext" ) <> "Done"
        Sleep ( 500 )
    WEnd
    Sleep (500)
    Send( "{ALT down}fa{ALT up}" )      ; Menu: File >> Save As
    Sleep (500)
    WinWaitActive( "Save a Copy...")    ; Wait til window opens
    ControlSend( "Save a Copy..." , "File &name:" , "Edit1", $FileName )  ; Enter filename
    ControlClick( "Save a Copy..." , "File &name:" , "Button2" )           ; Save
    ; Check to make sure file is written with wait and closure of pop-ups.
    For $j = 1 to 5
        Sleep(1000)    ; Wait 1 second
        $var = WinList()
        For $i = 1 to $var[0][0]
            ; Check for Active "Save As" window
            If $var[$i][0] == "Save As" AND WinActive( "Save As" , "The file already exists." ) Then
                ControlClick( "Save As" , "The file already exists." , "Button1" )  ; click "Yes"
            EndIf
        Next
    Next
    _IEAction( $oIEpdf , "quit")
EndFunc
```

I really hate adding Sleeps to code; it is a very unreliable method. Here are some of the things I have tried that do not work:

1. InetGet(): this returns HTML from the browser, not the PDF file.
2. _IENavigate is supposed to wait until the page loads before proceeding. It doesn't.
3. Adding _IELoadWait($oIE) after _IENavigate just creates a permanent wait.
4. Even with all of the Sleeps and an explicit wait until "Done" appears in the status bar, the Send command is occasionally executed before the PDF is completely downloaded.
I suspect that the reason this is so challenging is that IE is using a plug-in from Acrobat. Can anyone offer a more reliable way of performing this function? I'm using IE7 and Adobe Acrobat v8.0.

Edited May 6, 2007 by automator
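One way to replace the fixed Sleep loop at the end of OpenPdfSave is to poll for the saved file with a timeout. The sketch below is a hypothetical helper (_WaitForStableFile is not part of AutoIt or IE.au3). It assumes the save is finished once the file exists and its size stops changing between two polls, which is a heuristic rather than a guarantee: a writer that stalls for longer than the poll interval could fool it.

```autoit
; Hypothetical helper: wait until $sFileName exists and its size has
; stopped changing, or until $iTimeoutMs expires.
; Returns True if the file looks complete, False on timeout.
Func _WaitForStableFile($sFileName, $iTimeoutMs = 30000, $iPollMs = 250)
    Local $hTimer = TimerInit()
    Local $iLastSize = -1, $iSize
    While TimerDiff($hTimer) < $iTimeoutMs
        If FileExists($sFileName) Then
            $iSize = FileGetSize($sFileName)
            ; Two equal, non-zero readings in a row: assume the size has settled
            If $iSize > 0 And $iSize = $iLastSize Then Return True
            $iLastSize = $iSize
        EndIf
        Sleep($iPollMs)
    WEnd
    Return False
EndFunc
```

In OpenPdfSave, calling FileDelete($FileName) before opening the Save dialog avoids the "file already exists" prompt and keeps an old copy from satisfying the check; the For $j loop could then become `If Not _WaitForStableFile($FileName, 30000) Then ...`. WinWaitActive also accepts a timeout in seconds as its third parameter, e.g. `WinWaitActive("Save a Copy...", "", 10)`, and returns 0 if the window never appears, so the script can stop instead of typing into the wrong window.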
martin Posted May 6, 2007

Are you sure you can't use InetGet to get the file? I don't think it returns HTML unless you ask it to. I use InetGet to download files with various extensions and have never had a problem.
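A minimal sketch of the InetGet approach, written against current AutoIt 3.3 behaviour: in foreground mode InetGet returns the number of bytes downloaded and sets @error on failure (the 2007-era 3.2 releases returned 1 or 0 instead). The helper names are hypothetical. Because a server can answer with an HTML page (a login or error page, for example) instead of the document, the sketch checks for the "%PDF-" header that a well-formed PDF begins with rather than trusting the URL.

```autoit
; Hypothetical helper: download $sURL straight to $sFileName, bypassing the
; browser and the Acrobat plug-in, then confirm the result is a PDF.
Func _DownloadAndCheckPdf($sURL, $sFileName)
    ; Option 1 = force reload (skip the cache); background 0 = wait for completion
    Local $iBytes = InetGet($sURL, $sFileName, 1, 0)
    If @error Or $iBytes = 0 Then Return SetError(1, 0, False)
    Return _IsPdfFile($sFileName)
EndFunc

; Hypothetical helper: True if the file starts with the PDF header "%PDF-".
Func _IsPdfFile($sFileName)
    Local $hFile = FileOpen($sFileName, 16) ; 16 = read in binary mode
    If $hFile = -1 Then Return False         ; missing or unreadable file
    Local $bHeader = FileRead($hFile, 5)     ; first 5 bytes only
    FileClose($hFile)
    Return BinaryToString($bHeader) == "%PDF-" ; == is case-sensitive
EndFunc
```

Usage might look like `If Not _DownloadAndCheckPdf($URL, $FileName) Then MsgBox(0, "Download", "Did not receive a PDF")`. InetGet runs outside the IE window, so if the site ties the download to a browser session it may hand back an HTML page instead; the header check is what exposes that case.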
automator Posted May 6, 2007 (edited)

The URL that I have to fetch the PDF file looks like this:
https://esp.srcponline.com/w...r/viewpdf.asp?tocken=D62... (I've truncated it for security)
and the URL which shows up on the page that is returned is:
https://esp.srcponline.com/w...r/Showdoc.asp?tocken=1A2...
I'm guessing that because there is no *.pdf readily apparent in the URL, InetGet acts on the HTML rather than on the PDF.

Edited May 6, 2007 by automator
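If viewpdf.asp answers with an HTML page that links to Showdoc.asp (rather than an HTTP redirect, which the WinINet-based InetGet normally follows on its own), one option is to fetch that page, pull out the link, and hand it to InetGet. The thread doesn't show the page's markup, so the pattern below is an assumption, and _FindShowdocUrl is a hypothetical helper. A relative link would still need the site's base path prepended.

```autoit
; Hypothetical helper: return the first quoted Showdoc.asp link found in
; an HTML string, or "" if there is none.
Func _FindShowdocUrl($sHtml)
    ; Pattern: a quoted attribute value containing "Showdoc.asp?"
    Local $aMatch = StringRegExp($sHtml, '(?i)["'']([^"'']*Showdoc\.asp\?[^"'']*)["'']', 1)
    If @error Then Return ""
    ; Inside HTML attributes, & is usually written as &amp;
    Return StringReplace($aMatch[0], "&amp;", "&")
EndFunc
```

In current AutoIt versions the page itself could be read with `$sHtml = BinaryToString(InetRead($sViewUrl, 1))` (InetRead was added in the 3.3 series), where $sViewUrl is the full viewpdf.asp address.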
DaleHohm Posted May 6, 2007

Your suspicion about the trouble being caused by the Adobe plugin is correct. The plugin is hosted by the browser and replaces the normal DOM (Document Object Model) that is used for HTML pages. It has its own set of properties and methods; for example, the document.readyState property that _IELoadWait relies upon is not used. This and any other plug-in object model is currently beyond the scope of IE.au3, but since PDF is as common as it is, I would be interested in extending IE.au3 to support it. Unfortunately, I've never researched the details and don't know how easy or hard it would be. With a little research you may be able to figure it out, since you already have the $oIE object as the parent.

Dale