How to login to Site, crawl list of URLS save each page as HTML


So I've been tasked with logging into a site with IE, then reading a text file that contains a list of URLs on the site to crawl. For each URL I need to load the page, save the HTML to a specified file name, then move on to the next URL and repeat the process.

So far I am able to login to the site in IE with this:

#include <IE.au3>

$uname = "tony"
$pwd = "tony"

; Get ready to login!
$oIE = _IECreate ("http://localhost/books/login.aspx")
$oForm = _IEFormGetObjByName ($oIE, "form1")
$oQuery1 = _IEFormElementGetObjByName ($oForm, "userNameTextBox")
$oQuery2 = _IEFormElementGetObjByName ($oForm, "passwordTextBox")

; Start sending form values and then simulate a click to login
_IEFormElementSetValue($oQuery1, $uname)
_IEFormElementSetValue($oQuery2, $pwd)
$oButton = _IEGetObjById($oIE, "loginButton")
_IEAction($oButton, "click")
_IELoadWait($oIE, 0)

That gets me logged in.

My text file (c:\urls.txt) looks like this:

http://localhost/books/book1.html
http://localhost/books/book2.html
http://localhost/guies/book1.html

I then have to point IE at the first URL, save the page's HTML as a file, and move on to the next one.
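Roughly, I'm picturing a loop like this (untested sketch; it assumes the logged-in $oIE object from the code above, one URL per line in c:\urls.txt, and a made-up output folder c:\saved\ that would need to exist):

```autoit
#include <IE.au3>
#include <File.au3>

; Read the URL list into an array; element [0] holds the line count.
Local $aUrls
_FileReadToArray("c:\urls.txt", $aUrls)

For $i = 1 To $aUrls[0]
    ; Navigate the already-logged-in browser to the next URL
    ; (_IENavigate waits for the page to finish loading by default).
    _IENavigate($oIE, $aUrls[$i])

    ; Grab the full HTML of the loaded document.
    Local $sHtml = _IEDocReadHTML($oIE)

    ; Derive a file name from the last path segment of the URL,
    ; e.g. http://localhost/books/book1.html -> book1.html
    Local $sName = StringMid($aUrls[$i], StringInStr($aUrls[$i], "/", 0, -1) + 1)

    ; Write the HTML out (mode 2 = open for writing, erase existing).
    Local $hFile = FileOpen("c:\saved\" & $sName, 2)
    FileWrite($hFile, $sHtml)
    FileClose($hFile)
Next
```

One wrinkle: my list has book1.html under both /books/ and /guies/, so naming files by the last URL segment alone would overwrite one with the other; the naming scheme would need to include the folder too.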

Any suggestions?

Thanks!
-Tony
