mdwerne Posted May 12, 2020

Hello,

Before Broadcom took over Symantec, I was able to use the following code as the base to scrape daily definition downloads from the web. Since the pages were moved over to the Broadcom web servers, I get a page full of what I believe may be JavaScript instead of the fully rendered page that lies behind it. Does anyone have any suggestions as to how I can read the fully rendered webpage using AutoIt?

#include <IE.au3>

$oIE_DEFS = _IECreate("https://www.broadcom.com/support/security-center/definitions/download/detail?gid=sep14", 0, 0, 1, 1)
$sString_DEFS = _IEBodyReadHTML($oIE_DEFS)
MsgBox(0, "https://www.broadcom.com/support/security-center/definitions/download/detail?gid=sep14", $sString_DEFS)

The above code will show the JavaScript, but if you go to the URL in a browser that has JavaScript enabled, you will see the fully rendered page that I would like to access. I hope my question makes sense, and I would appreciate any suggestions to get this working again.

All the best,
-Mike
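For reference, a minimal sketch of the same IE.au3 approach with an explicit wait, on the assumption that the page is able to finish rendering inside Internet Explorer at all (it may never do so if the site requires a modern browser engine):

#include <IE.au3>

; Sketch only - assumes the Broadcom page will eventually render in IE.
; The extra _IELoadWait plus Sleep gives the page's JavaScript a chance to run
; before the body HTML is read back.
Local $sUrl = "https://www.broadcom.com/support/security-center/definitions/download/detail?gid=sep14"
Local $oIE = _IECreate($sUrl, 0, 0, 0, 1) ; do not wait inside _IECreate itself
_IELoadWait($oIE)                          ; wait until the document reports complete
Sleep(5000)                                ; arbitrary extra time for script-driven rendering
Local $sHtml = _IEBodyReadHTML($oIE)       ; may still be unrendered markup on script-heavy pages
_IEQuit($oIE)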
faustf Posted May 12, 2020

If you can find the links in the page source, use StringRegExp and match all the href attributes.
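As a rough illustration of that suggestion (a sketch only; the pattern and variable names are assumptions, and $sHtml is assumed to already hold the page source however it was obtained):

; Sketch: extract every href="..." value from an HTML string.
Local $aLinks = StringRegExp($sHtml, '(?i)href="([^"]+)"', 3) ; flag 3 = return all captured groups
If Not @error Then
    For $sLink In $aLinks
        ConsoleWrite($sLink & @CRLF)
    Next
EndIf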
mdwerne (Author) Posted May 12, 2020

Thank you for the suggestion, but unfortunately the links to the files don't appear to be within the JavaScript; they only appear after the page is fully rendered.
faustf Posted May 12, 2020

Scroll the page, and after it has loaded find a link, then use StringRegExp and match all the href attributes. Also, I remember that the big companies often have an anonymous FTP site for downloading tools and other files; try to check whether Symantec has one. Google is our friend. Bye
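If such an anonymous FTP mirror existed, downloading from it in AutoIt would be straightforward. The sketch below uses a purely hypothetical FTP URL, since no actual Symantec/Broadcom FTP address is given in the thread:

; Sketch with a HYPOTHETICAL FTP URL - not a real Symantec/Broadcom address.
; InetGet handles ftp:// URLs as well as http(s)://.
Local $sFtpUrl = "ftp://ftp.example.com/public/av-definitions/latest.exe"
Local $sDest   = @ScriptDir & "\latest.exe"
Local $iBytes  = InetGet($sFtpUrl, $sDest, 1) ; 1 = force reload from the remote site
If $iBytes > 0 Then ConsoleWrite("Downloaded " & $iBytes & " bytes" & @CRLF)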
TheXman Posted May 12, 2020 (edited)

<SNIP> When I ran my example, I thought I saw the rendered HTML, but I appear to have been mistaken, so I removed it.

Edited May 12, 2020 by TheXman: Removed example since it was incorrect.
MattHiggs Posted May 14, 2020

On 5/12/2020 at 3:29 PM, mdwerne said: Before Broadcom took over Symantec, I was able to use the following code as the base to scrape daily definition downloads from the web. ... Does anyone have any suggestions as to how I can read the fully rendered webpage using AutoIt?

When I am writing a script to scrape a website, there are several ways that I go about it:

1) If the HTML is fully available, then that is the easiest way to go about it.

2) I download the file manually in the browser, then copy the download link from the browser itself and see if the actual download links have a commonality that would allow me to download the files I need without actually pulling the download link from the web page.

3) In the event that the file download link doesn't directly reference the file, like in the case of the file at this URL: https://www.sordum.org/files/downloads.php?easy-context-menu then I use the wget tool and have it handle the redirections and force the downloaded file into the file format that I know it is supposed to be in. So, using the above example, I know the file is supposed to be a zip file, so I would run the wget command:

wget https://www.sordum.org/files/downloads.php?easy-context-menu -O "Path\to\save\file\file.zip"

Oh, also, wget is native to Linux operating systems, but there is a very decent Windows port of wget available here.
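To tie that back into the original AutoIt script, the wget call can be driven with RunWait. This is only a sketch; the wget.exe path and the output file name are assumptions, not values from the thread:

; Sketch: call a Windows port of wget from AutoIt (paths and names are assumptions).
Local $sWget = "C:\Tools\wget.exe"                  ; wherever the Windows wget port was installed
Local $sUrl  = "https://www.sordum.org/files/downloads.php?easy-context-menu"
Local $sOut  = @ScriptDir & "\file.zip"             ; force the expected .zip file name
Local $iExit = RunWait('"' & $sWget & '" "' & $sUrl & '" -O "' & $sOut & '"', @ScriptDir, @SW_HIDE)
If $iExit <> 0 Then ConsoleWrite("wget exited with code " & $iExit & @CRLF)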
mdwerne (Author) Posted May 19, 2020 (edited)

Thanks for the reply, @MattHiggs. This is something new to try, as I have been unable to get at the files any other way. I haven't played with wget much, but I'll give it a shot. Manually downloading a handful of different definition sets, every night, from this site: https://www.broadcom.com/support/security-center/definitions/download/detail?gid=sep14 is getting kind of old.

Again, thanks for your reply, I was at an impasse.

-Mike

P.S. Part of the issue is also that the download file names change a few times a day, which is why I need to scrape the fully rendered JavaScript (HTML) page each time.

Edited May 19, 2020 by mdwerne: Forgot some info...
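Because the file names change during the day, the end-to-end flow (once the rendered HTML has been obtained by some means) would look roughly like the sketch below. The regular expression and the assumption that the definition links end in .exe are guesses for illustration, not taken from the actual Broadcom page:

; Sketch: given rendered HTML in $sHtml, pull out definition download links and fetch them.
; The href pattern and the .exe extension are assumptions about how the page is built.
Local $aDefs = StringRegExp($sHtml, '(?i)href="([^"]*?\.exe)"', 3)
If Not @error Then
    For $sLink In $aDefs
        Local $sName = StringRegExpReplace($sLink, '^.*/', '') ; file name portion of the URL
        InetGet($sLink, @ScriptDir & "\" & $sName, 1)          ; 1 = force reload from the server
    Next
EndIf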