'modern day' website data scrape...?


Greetings,

  I'd like to scrape data from websites, mostly official corporate websites, for name and address information.  I know about the "_IE" functions for 'tables', 'forms', 'body', and other page elements.  However, it seems many websites nowadays use JavaScript/PHP/SQL to post data directly onto the page from a database query...and this seems to make the "_IE" functions problematic...if not defeat them entirely.

  Is there an effective way of getting data off a website when these situations are encountered...?  Thanks in advance for any hints.

Regards

Hi,

  Well I can give you one quick example off the top of my head...go to the official corporate website for McDonalds (www.mcdonalds.com)...click on 'locate' at the top menu bar.  Then put in whatever you want...state, zip code...however you want to search.  The site will then give a listing of stores in the area you searched in...

  It seems they are not in 'tables' or 'frames' etc...they are pulled from a database...so the names/addresses for each listed store do not even appear in the 'body' tag of the HTML...

   There are many sites like that...just about any corporate or transit website, many seem to be like that now...so how are "_IE" functions supposed to scrape that information from them?

 

Hmmm well it seems not, unless you care to explain...

if you do a search on any locality...the information I want to scrape, "123 main street, anytown, anystate" for example...that information is not in the body of the HTML or anywhere in the source of the page...so how can I scrape it?  If it ain't there...it ain't there...

Using Firefox, I right-clicked on the desired element and chose Inspect Element from the popup menu. This opened the Developer Tools' Inspector tab and highlighted the designated element. FWIW, I was able to view the address. IIRC, it was contained in a bunch of nested DIV elements.

If it's visible, it's there somewhere... B)

Well, that's what I would have figured...however, for some reason when I searched (Ctrl+F) the source for the page, the address info did not come up...

However, yes, I followed your instructions and was able to find the 'DIV' element (h3/h4) where the address and town are listed...now my obvious question...how can I use that process to scrape that info...?  I mean, it would not be feasible to send 'clicks' and read the 'Inspect Element' output...on that particular site there are countless 'DIV' elements...there must be a way to automate it...correct?

21 minutes ago, Burgs said:

there must be a way to automate it...correct?

Yes... the same way you would for any other site. In this case, you could use the _IE commands to retrieve all DIV elements with class "restaurant-location__address-container". From there, retrieve the H3/H4 elements inside each one and grab their innertext values.
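To make the idea concrete, here is a minimal sketch of that "find the container DIVs, then read the H3/H4 text" logic. It is written in Python rather than AutoIt purely so it can be run standalone, and it works on a made-up HTML snippet: only the class name and the H3/H4 structure come from this thread, while the sample addresses and the names `SAMPLE_HTML`, `AddressParser`, and `extract_addresses` are assumptions for illustration. On a real page the input would have to be the rendered DOM (what Inspect Element shows), not the raw "view source" text.

```python
from html.parser import HTMLParser

# Made-up markup standing in for the rendered page (assumption, not the real site).
SAMPLE_HTML = """
<div class="restaurant-location__address-container">
  <h3>123 Main Street</h3>
  <h4>Anytown, Anystate</h4>
</div>
<div class="something-else"><h3>Not an address</h3></div>
<div class="restaurant-location__address-container">
  <h3>456 Oak Avenue</h3>
  <h4>Othertown, Anystate</h4>
</div>
"""

# Class name quoted in this thread.
TARGET_CLASS = "restaurant-location__address-container"


class AddressParser(HTMLParser):
    """Collects the text of H3/H4 elements nested inside the target DIVs."""

    def __init__(self):
        super().__init__()
        self.results = []         # one list of heading texts per container
        self._div_depth = 0       # > 0 while inside a target container
        self._in_heading = False  # True while inside an H3/H4 of a container

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self._div_depth:
                self._div_depth += 1          # nested DIV inside a container
            elif TARGET_CLASS in classes:
                self._div_depth = 1           # entering a new container
                self.results.append([])
        elif tag in ("h3", "h4") and self._div_depth:
            self._in_heading = True
            self.results[-1].append("")

    def handle_endtag(self, tag):
        if tag == "div" and self._div_depth:
            self._div_depth -= 1
        elif tag in ("h3", "h4"):
            self._in_heading = False

    def handle_data(self, data):
        if self._in_heading:
            self.results[-1][-1] += data


def extract_addresses(html):
    """Return a list of [street, town] style lists, one per container."""
    parser = AddressParser()
    parser.feed(html)
    return [[text.strip() for text in group] for group in parser.results]


addresses = extract_addresses(SAMPLE_HTML)
for group in addresses:
    print(" | ".join(group))
```

The same two-step shape (filter DIVs by class, then read the headings inside each) should carry over to whatever tool does the DOM access; the DIV with the unrelated class is skipped even though it also contains an H3.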

If I were trying to automate this, I would probably invoke the jQuerify solution from @Chimp.

Thanks for the reply.  I will look at that link; I'm not familiar with it.  This particular website (McDonalds) is only one example...I want to be able to scrape similar information (name, address, and other related info) from a great many arbitrary websites...so that particular container may be fine for the McDonalds website...but not valid for KFC, for example...which is likely laid out differently.

It seems it may not be possible to do what I would like...as I mentioned earlier, perhaps the "good ol' days" of data being neatly available in forms, tables, and frames are gone...???

Edited by Burgs
I don't know of a single AutoIt solution that will automatically grab the address from any website. Even in the "good old days" you still had to know which elements (forms, tables, frames, etc.) contained the desired data.
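One common way to live with that is to keep the per-site knowledge in a small lookup table and leave the scraping logic generic. The sketch below (Python, for illustration only) assumes a hypothetical `SITE_CONTAINERS` table and `container_class_for` helper; the McDonalds class name is the one quoted in this thread, while the KFC entry is a made-up placeholder that would have to be found by inspecting that site.

```python
from urllib.parse import urlparse

# Hypothetical per-site table: host name -> class of the DIV holding the address.
SITE_CONTAINERS = {
    "www.mcdonalds.com": "restaurant-location__address-container",  # from this thread
    "www.kfc.com": "store-address",  # made-up placeholder, not verified
}


def container_class_for(url):
    """Return the known container class for a URL's host, or None if unknown."""
    host = urlparse(url).hostname
    return SITE_CONTAINERS.get(host)


print(container_class_for("https://www.mcdonalds.com/"))
print(container_class_for("https://example.com/"))
```

A site that is not in the table returns `None`, which is the cue to inspect that site by hand and add an entry.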

FWIW, jQuerify is an AutoIt solution. It just gives you access to the jQuery variable so that you can invoke commands that may not be available via the standard _IE* commands.

OK, yes, I see...I am looking at that jQuerify link now.  All I mean is that the information was more readily available when it was located in an element like a table...it seemed easier to parse and track down what you were looking for.  Thanks again for the information.  I will study that jQuery link you supplied.  Regards.
