Mingi

Waiting for IE tabs to load if it's not the main tab


I am trying to automate downloading several pages from a website. The URLs I access differ only in the number at the end, so I am using a For loop to go through them one by one and save each page as I go.

I have already figured out how to make IE wait until the page is loaded with _IELoadWait, and I use WinExists to wait until the save dialog closes. But because loading takes about 9 seconds per page, it would be great if I could load several pages at once in different tabs and save them one by one, so my program would not sit idle for the whole loading time. Eventually I want to download about 200,000 pages, so every second saved per page would add up to a lot of time in the end.
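For reference, the single-tab loop described above might look roughly like the following. This is only a minimal sketch: the URL pattern, the ID range, and the output folder are placeholders, and it writes the HTML to disk with _IEDocReadHTML and FileWrite instead of driving the save dialog, which is a swapped-in technique and not what the original script does.

```autoit
#include <IE.au3>
#include <FileConstants.au3>

; Assumptions: placeholder URL pattern, ID range, and output folder.
Local Const $sBaseUrl = "http://example.com/article?id="
Local Const $sOutDir = @ScriptDir & "\pages"
DirCreate($sOutDir)

Local $oIE = _IECreate("about:blank")

For $i = 1 To 5
    ; With the default $iWait = 1, _IENavigate calls _IELoadWait internally.
    _IENavigate($oIE, $sBaseUrl & $i)
    If @error Then
        ConsoleWrite("Page " & $i & " failed to load, @error = " & @error & @CRLF)
        ContinueLoop
    EndIf

    ; Swapped-in technique: write the HTML to disk instead of using the save dialog.
    Local $hFile = FileOpen($sOutDir & "\" & $i & ".html", $FO_OVERWRITE + $FO_UTF8)
    If $hFile = -1 Then ContinueLoop
    FileWrite($hFile, _IEDocReadHTML($oIE))
    FileClose($hFile)
Next

_IEQuit($oIE)
```

Saving the HTML directly avoids waiting on a dialog window, but it only captures the document source, not images or other linked resources.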

So far I have not figured out a way to make AutoIt wait until the page is loaded when it is not in the first tab. At the moment, my program usually saves the pages before they are fully loaded.

Is there any way to wait for the focused tab to load? I checked IE.au3, but there only seems to be a timeout parameter for _IELoadWait.

If there is another way around the loading problem, any hints would be great.
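One approach that might work for the question above is to start several loads in background tabs, then attach to each tab by its URL and call _IELoadWait on that tab's own object. The sketch below is untested against the real site and rests on several assumptions: the URL pattern and batch size are placeholders, _WaitForTab is a hypothetical helper (not part of IE.au3), the pages must not redirect to a different address, and 0x1000 is the navOpenInBackgroundTab flag of the browser's Navigate2 method.

```autoit
#include <IE.au3>

; Assumptions: placeholder URL pattern and batch size, not the real site.
Local Const $sBaseUrl = "http://example.com/article?id="
Local Const $iBatch = 4
Local Const $NAV_BACKGROUND_TAB = 0x1000 ; navOpenInBackgroundTab flag for Navigate2

Local $oIE = _IECreate("about:blank")
Local $aUrls[$iBatch]

; Start all page loads first so they can download in parallel.
For $i = 0 To $iBatch - 1
    $aUrls[$i] = $sBaseUrl & ($i + 1)
    $oIE.Navigate2($aUrls[$i], $NAV_BACKGROUND_TAB)
Next

; Then visit the tabs one by one, waiting for each before reading it.
For $i = 0 To $iBatch - 1
    Local $oTab = _WaitForTab($aUrls[$i], 60000)
    If Not IsObj($oTab) Then
        ConsoleWrite("Gave up on " & $aUrls[$i] & @CRLF)
        ContinueLoop
    EndIf
    ConsoleWrite($aUrls[$i] & " loaded, " & StringLen(_IEDocReadHTML($oTab)) & " characters" & @CRLF)
Next

; Hypothetical helper: attach to the tab showing $sUrl and wait until it has loaded.
; Note: _IEAttach in "url" mode matches substrings, so "id=1" would also match "id=10".
Func _WaitForTab($sUrl, $iTimeout)
    Local $hTimer = TimerInit(), $oTab
    Do
        $oTab = _IEAttach($sUrl, "url")
        If IsObj($oTab) Then
            _IELoadWait($oTab, 0, $iTimeout)
            If @error Then Return 0
            Return $oTab
        EndIf
        Sleep(250)
    Until TimerDiff($hTimer) > $iTimeout
    Return 0
EndFunc
```

The retry loop is there because a new tab may not report its target URL immediately after Navigate2 returns. For a long run, each tab would also need to be closed once it has been saved, which this sketch leaves out.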

Thanks a lot for any comments!

#2 ·  Posted (edited)

Welcome to AutoIt and the forum!

Can you please tell us what you are trying to download? 200,000 pages is quite a lot.

Edited by water

My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2017-04-18 - Version 1.4.8.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX (NEW 2017-02-27 - Version 1.3.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2015-04-01 - Version 0.4.0.0) - Download - General Help & Support - Example Scripts
Excel - Example Scripts - Wiki
Word - Wiki
PowerPoint (2015-06-06 - Version 0.0.5.0) - Download - General Help & Support

Tutorials:
ADO - Wiki

 


#3 ·  Posted (edited)

Ah, a good question. Thanks for asking. Such a large number probably looks a little bit dubious.

The website I am accessing is a database of newspaper articles on which I want to run a quantitative analysis.

Basically, I just want to check how often certain words occur and in which contexts.

200,000 would be the upper limit, though I have no idea whether I will even have the time to analyze them all.

But eventually I hope to get hold of about 50,000 of the most recent ones, which would cover the last three years.

Currently with only one tab, I can download about 250 articles per hour. So anything faster would help me a lot.

Even with a more modest data sample, it would take me days just running the program to download them.

I would actually have looked them up online without downloading, but every article takes about 10 seconds to load.

So I would spend lots of time just sitting and waiting, and AutoIt would at least save me the loading time.

There is of course a search function on the website which returns the articles that mention the terms I am actually looking for. I would love to just download these results, as it is "only" 18,000 articles for the one term I am focusing on right now. But because the position of the links I would have to click shifts when some article titles are too long, I cannot use simple MouseClick and Send commands to automate the saving. I have not yet figured out how to find a link automatically by its text, and because the links are JavaScript, it is probably a lot more complicated. So this option is not really available to me right now.

Eventually, I also want to check for co-occurrences of certain terms, so I would have to download the results for the other terms later as well. I am therefore assuming it would be a lot faster to just download all the articles for a certain period and then see which ones include the terms I am looking for.

To make a long story short, it would all be in the name of science. But I understand if someone does not believe me right away.

Edited by Mingi


Thanks for the detailed explanation!

Another question before we start: Did you check the terms of use for the site? Do they allow automated mass downloading? Can you by any chance post the URL?

I assume you are using Internet Explorer to access the site. You could access the page with the search results, extract all links by using the IE UDF and then process the links.
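As a rough illustration of extracting links with the IE UDF, a minimal sketch could look like this. The search URL is a placeholder, and the script simply lists every link on the page without filtering for article links.

```autoit
#include <IE.au3>

; Assumption: placeholder address of the search results page.
Local $oIE = _IECreate("http://example.com/search?q=term")

; _IELinkGetCollection returns all links; @extended holds the number found.
Local $oLinks = _IELinkGetCollection($oIE)
Local $iCount = @extended
ConsoleWrite($iCount & " links found" & @CRLF)

For $oLink In $oLinks
    ; innerText is the visible link text, href the target address.
    ConsoleWrite($oLink.innerText & " -> " & $oLink.href & @CRLF)
Next
```

Because the links are found through the document object and not by screen position, long article titles shifting the layout should not matter. If the links are JavaScript links, href will probably not be a plain article URL, so further processing would likely be needed.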

Let's first concentrate on getting the list of links.

If you search for the terms, do you get the results on a single page or on multiple pages?


