Burgs

'modern day' website data scrape...?

Recommended Posts

Burgs

Greetings,

  I'd like to be able to scrape data from websites, mostly official corporate sites, for name and address information.  I know about the "_IE" functions for 'tables', 'forms', 'body', and other page elements.  However, nowadays many websites instead use JavaScript (with PHP/SQL on the back end) to write database query results directly onto the page...and this seems to make the "_IE" functions problematic...if not defeat them entirely.

  Is there an effective way to get data off a website in these situations...?  Thanks in advance for any hints.
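(For reference, here is a minimal sketch of the classic table-based approach I mean; the URL and table index are placeholders, not a real target. This works on pages that still render data as plain HTML tables:)

```autoit
#include <IE.au3>
#include <Array.au3>

; Classic approach for pages that render data in plain HTML tables.
; The URL and table index below are placeholders, not a real target.
Local $oIE = _IECreate("https://example.com/stores")
Local $oTable = _IETableGetCollection($oIE, 0) ; first table on the page
If IsObj($oTable) Then
    Local $aData = _IETableWriteToArray($oTable)
    _ArrayDisplay($aData, "Scraped table")
EndIf
_IEQuit($oIE)
```

This is exactly the approach that breaks down when the data is injected by JavaScript after the page loads.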

Regards

Burgs

Hi,

  Well, I can give you one quick example off the top of my head...go to the official McDonald's corporate website (www.mcdonalds.com) and click 'Locate' in the top menu bar.  Then search however you want...by state, zip code, etc.  The site will then list the stores in the area you searched...

  It seems the results are not in 'tables' or 'frames', etc....they are pulled from a database...so the names/addresses for the listed stores do not even appear in the 'body' tag of the HTML...

   Many sites are like that...just about any corporate or transit website seems to work that way now...so how are the "_IE" functions supposed to scrape that information?

 

Burgs

Hmmm, well it seems not, unless you care to explain...

If you do a search on any locality...the information I want to scrape, "123 Main Street, Anytown, Anystate" for example...is not in the body of the HTML or anywhere in the page source...so how can I scrape it?  If it ain't there...it ain't there...

Danp2

Using Firefox, I right-clicked on the desired element and chose Inspect Element from the popup menu. This opened the Developer Tools' Inspector tab and highlighted the designated element. FWIW, I was able to view the address. IIRC, it was contained in a bunch of nested DIV elements.

If it's visible, it's there somewhere... B)

Burgs

Well, that's what I would have figured...however, for some reason when I searched the page source (Ctrl+F), the address info did not come up...

However, yes, I followed your instructions and was able to find the DIV element (with H3/H4 children) where the address and town are listed...now my obvious question: how can I use that process to scrape the info...?  I mean, it would not be feasible to send clicks and read 'Inspect Element'...on that particular site there are countless DIV elements...there must be a way to automate it...correct?

Danp2
21 minutes ago, Burgs said:

there must be a way to automate it...correct?

Yes... the same way you would for any other site. In this case, you could use the _IE commands to retrieve all DIV elements with class "restaurant-location__address-container". From there, retrieve the H3/H4 elements and grab their innerText values.

If I were trying to automate this, I would probably invoke the jQuerify solution from @Chimp.
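A rough sketch of the _IE approach described above (the class name comes from inspecting the page and may have changed since; this assumes an IE window with the results is already open):

```autoit
#include <IE.au3>

; Attach to an already-open IE window whose title contains "McDonald".
; The class name below was observed on the locator page and may change.
Local $oIE = _IEAttach("McDonald", "title")
Local $oDivs = _IETagNameGetCollection($oIE, "div")
For $oDiv In $oDivs
    If StringInStr($oDiv.className, "restaurant-location__address-container") Then
        ; the name/address lines sit in nested H3/H4 elements
        For $oHdr In $oDiv.getElementsByTagName("h3")
            ConsoleWrite($oHdr.innerText & @CRLF)
        Next
        For $oHdr In $oDiv.getElementsByTagName("h4")
            ConsoleWrite($oHdr.innerText & @CRLF)
        Next
    EndIf
Next
```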

Burgs

Thanks for the reply.  I will look at that link; I'm not familiar with it.  This particular website (McDonald's) is only one example...I want to be able to scrape similar information (name, address, and other related info) from a great multitude of random websites...so that particular container may be fine for the McDonald's website...however not valid for, say, KFC...which is likely laid out differently.

It seems it may not be possible to do what I would like...as I mentioned earlier, perhaps the 'good ole days' of data being neatly available in forms, tables, and frames are gone...?

Edited by Burgs

Danp2

I don't know of a single AutoIt solution that will automatically grab the address from any website. Even in the "good old days" you still had to know which elements (forms, tables, frames, etc.) contained the desired data.

FWIW, jQuerify is an AutoIt solution. It just gives you access to the jQuery variable so that you can invoke commands that may not be available via the standard _IE* commands.

Burgs

OK, yes, I see...I am looking at that jQuerify link now.  All I mean is that the information was more readily available when it was located in an element like a table...it seemed easier to parse and track down what you are looking for.  Thanks again for the information.  I will study that jQuery link you supplied.  Regards.

