
IE.au3 - Grabbing Data from Webpage



Hello Everybody!

I want to grab the Movie Titles for movies 'Now Playing' at the below web address. Normally, I would use _IETableWriteToArray() to get the data, but I can already foresee some issues getting just the Movie Title and not a bunch of other data. If that doesn't work, I normally go to _IELinkGetCollection() and iterate through the links with some sort of criteria that returns only the links I want, but that won't work here because all of the links have the same properties (from what I can tell).

Maybe if there were some way to use _IELinkGetCollection(), from just an indexed table, that might work. Or, as is common practice, the AutoIt community enlightens me by letting me know of some alternate way I have never thought of. I am partial to working with IE.au3 or COM, so I would like to think in that framework if possible. I wanted some feedback to see if there are other options that I am overlooking. Thanks!

$sUrl = "http://www.fandango.com/edwardsfresnostadi...fyg/theaterpage"


So, if you use DebugBar to look at the structure of the page and look for a pattern, you'll see that the movie titles are the text of a link (<a>) inside an <h4> element.

So, if you get the collection of h4s with _IETagNameGetCollection, loop through them, and use _IEPropertyGet to read the innertext of the first link inside each h4, you should have what you want.
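The pattern above (collect the <h4> elements, then read the innertext of the first <a> inside each) can be tried without the live page, whose markup has likely changed since this thread. A minimal sketch, assuming Windows with Internet Explorer still available for COM automation: it writes a small, hypothetical HTML sample into a hidden about:blank window with _IEDocWriteHTML. The sample markup and the title "Sample Movie" are made up for illustration.

```autoit
#include <IE.au3>

; Hypothetical sample markup (an assumption, not the real page):
; each title is the text of the first <a> inside an <h4>.
Local $sHTML = '<html><body>' & _
        '<h4><a href="http://www.fandango.com/madeofhonor_102109/movieoverview">Made of Honor</a></h4>' & _
        '<h4><a href="#">Sample Movie</a> <a href="#">trailer</a></h4>' & _
        '</body></html>'

; Hidden IE window on about:blank (no attach, not visible), then write the sample into it
Local $oIE = _IECreate("about:blank", 0, 0)
_IEDocWriteHTML($oIE, $sHTML)

Local $sTitles = "" ; titles joined with "|" so the result is easy to inspect
Local $oH4s = _IETagNameGetCollection($oIE, "h4") ; every <h4> in the document
Local $oA
For $oH4 In $oH4s
    $oA = _IETagNameGetCollection($oH4, "a", 0) ; index 0 = first <a> inside this <h4>
    If IsObj($oA) Then ; skip any <h4> that has no link
        ConsoleWrite("MovieName: " & _IEPropertyGet($oA, "innertext") & @CRLF)
        $sTitles &= _IEPropertyGet($oA, "innertext") & "|"
    EndIf
Next
$sTitles = StringTrimRight($sTitles, 1) ; drop the trailing "|"

_IEQuit($oIE)
```

The second sample <h4> contains two links; passing index 0 makes sure only the first link's text is read.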

Take a stab at it and post some code if you have trouble.

Dale

Free Internet Tools: DebugBar, AutoIt IE Builder, HTTP UDF, MODIV2, IE Developer Toolbar, IEDocMon, Fiddler, HTML Validator, WGet, curl

MSDN docs: InternetExplorer Object, Document Object, Overviews and Tutorials, DHTML Objects, DHTML Events, WinHttpRequest, XmlHttpRequest, Cross-Frame Scripting, Office object model

Automate input type=file (Related)

Alternative to _IECreateEmbedded? better: _IECreatePseudoEmbedded  Better Better?

IE.au3 issues with Vista - Workarounds

SciTe Debug mode - it's magic: #AutoIt3Wrapper_run_debug_mode=Y

"Doesn't work" needs to be ripped out of the troubleshooting lexicon. It means that what you tried did not produce the results you expected. It raises the questions: 1) what did you try? 2) what did you expect? and 3) what happened instead?

Reproducer: a small (the smallest?) piece of stand-alone code that demonstrates your trouble

Pfffft... 2 hours later litlmike taps out. I looked through the help file for DebugBar (lol), then found it in your sig; though it looks awesome, I don't understand how to use it to debug, but I can learn more about it later. I did notice the pattern you mentioned earlier by just looking through the HTML, but seeing a pattern and manipulating it are different. :D

MSDN's search engine isn't working ATM, which makes learning about something I know nothing about very cumbersome. I think I get what you are saying, though: I need to get a collection of 'h4' (I think it means headers, text size 4), then get the innertext property of each h4 in that collection, and voilà. But I can't find ANYWHERE in MSDN how to work with headers or h4. I assume _IETagNameGetCollection returns a collection object, but how do I tell that function to get the h4? 'head' doesn't seem to return what I am looking for. I tried finding a list of things a tag name applies to, but whatever I found didn't produce the result. I tried A, head, etc. I abandoned the first set of code below for the second set, which returned too much data, but at least some of it contained the intended result. From there, I did not know how to narrow it down to h4 elements.

This may be TMI, but I thought it important to show my work to the teacher :D

#include <IE.au3>
$oIE = _IECreate ("http://www.fandango.com/edwardsfresnostadium22andimax_aafyg/theaterpage", 1)
$oInputs = _IETagNameGetCollection ($oIE, "h4")

For $oInput In $oInputs
    ; read the property from the loop variable ($oInput), not from the whole collection ($oInputs)
    ConsoleWrite ( _IEPropertyGet ($oInput, "innertext") & @CRLF)
Next

#include <IE.au3>
$oIE = _IECreate ("http://www.fandango.com/edwardsfresnostadium22andimax_aafyg/theaterpage", 1)
$oLinks = _IELinkGetCollection ($oIE)

For $oLink In $oLinks
    ConsoleWrite ( _IEPropertyGet ($oLink, "innertext") & @CRLF)
Next

hi

maybe another approach would help?

this is not as elegant as dale's solution, but i approached the source from a different side (split out all relevant parts of the html, then split the results again etc..)

..coding time was 5min :D

CODE

; theater
#include <Inet.au3>
#include <Array.au3>
#include <String.au3>

$start_url = "http://www.fandango.com/edwardsfresnostadium22andimax_aafyg/theaterpage"

$sCode = _INetGetSource($start_url)

ClipPut($sCode) ; just to check what we get

; everything between the "Now Playing" heading and the end of its table
$tmp = _StringBetween($sCode, 'Now Playing</a></h2>', '</table>')
If @error <> 1 Then
    ; _ArrayDisplay($tmp, 'Stringbetween Search')

    ; general selection: the content of each list item
    $tmp1 = _StringBetween($tmp[0], '<LI>', '</LI>')
    If @error <> 1 Then
        ;_ArrayDisplay($tmp1, 'Stringbetween Search')

        ;; all entries
        For $iCC = 0 To UBound($tmp1, 1) - 1
            ; text between the first '>' and the first '</a>' of the list item
            $tmp2 = _StringBetween($tmp1[$iCC], '>', '</a>')
            If @error <> 1 Then
                $title = $tmp2[0]
                MsgBox(0, "Now playing", $title) ; --> now get all titles
            EndIf
        Next
    EndIf
EndIf
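For comparison, here is a different string-only route to the same kind of result, run against a hard-coded sample instead of _INetGetSource so it works offline. It swaps the nested _StringBetween calls for a single StringRegExp with flag 3 (return an array of all matches), keyed on the <h4><a> pattern mentioned earlier in the thread. The helper name _ExtractTitles and the sample markup are hypothetical; like any regex over HTML, it is tied to the exact markup and will need adjusting when the page layout changes.

```autoit
; Hypothetical helper: return the text of the first link inside every <h4>
Func _ExtractTitles($sSource)
    ; (?is)              case-insensitive, "." also matches newlines
    ; <h4[^>]*>\s*<a[^>]*> opening <h4>, then the opening tag of its first <a>
    ; (.*?)</a>          capture the link text up to the closing </a> (lazy)
    Local $aTitles = StringRegExp($sSource, '(?is)<h4[^>]*>\s*<a[^>]*>(.*?)</a>', 3)
    If @error Then Return SetError(1, 0, 0) ; no match (or invalid pattern)
    Return $aTitles
EndFunc

; Hypothetical sample standing in for the page source returned by _INetGetSource()
Local $sSample = '<ul class="showtimes"><li><h4><a href="http://www.fandango.com/madeofhonor_102109/movieoverview">Made of Honor</a></h4></li>' & _
        '<li><h4><a href="#">Sample Movie</a></h4></li></ul>'

Local $aFound = _ExtractTitles($sSample)
If Not @error Then
    For $i = 0 To UBound($aFound) - 1
        ConsoleWrite("Now playing: " & $aFound[$i] & @CRLF)
    Next
EndIf
```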


ok - me back again: after another 3 min research i came up with this

CODE

#include <IE.au3>

$oIE = _IECreate ("http://www.fandango.com/edwardsfresnostadium22andimax_aafyg/theaterpage", 1)

$oLinks = _IELinkGetCollection ($oIE)

For $oLink In $oLinks
    ; keep only links whose URL ends in "date="
    If StringRight($oLink.href, 5) == "date=" Then
        ConsoleWrite ( $oLink.href & @CRLF)
        ConsoleWrite ( _IEPropertyGet ($oLink, "innertext") & @CRLF)
    EndIf
Next

Well, this does work, but for learning purposes I would like to bridge the gap with Dale's method of grabbing the H4 elements, because I think that will come in handy in the future. Also, how does this method work at all? When I look at the HTML, here is how the link is coded, and I see no mention of "date=":

<a href="http://www.fandango.com/madeofhonor_102109/movieoverview">Made of Honor</a>
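One way to reason about that question is to apply the same test to the link shown above. The sketch below (plain string handling, no browser needed) does exactly that: because that href does not end in "date=", the StringRight check is False for it, so whatever the filter printed must have come from other <a> elements whose href (as the browser reports it) does end in "date=". Dumping every link's href and innertext from the live page, with the If removed from the loop, would show which links those are. The helper name _EndsWithDate is hypothetical, and the second URL is a made-up placeholder, not a real Fandango address.

```autoit
; Hypothetical helper mirroring the filter used above: True when the URL ends in "date="
Func _EndsWithDate($sHref)
    Return StringRight($sHref, 5) == "date="
EndFunc

; The href posted above, plus a made-up URL of the kind the filter would match
Local $aHrefs[2] = ["http://www.fandango.com/madeofhonor_102109/movieoverview", _
        "http://www.example.com/showtimes?movie=123&date="]

For $i = 0 To UBound($aHrefs) - 1
    ConsoleWrite($aHrefs[$i] & " -> " & _EndsWithDate($aHrefs[$i]) & @CRLF)
Next
```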

hi

so i guess i can't help anymore :D

in life & programming there is more than one solution to a problem..

Well you can still help by answering this question from the last post:

Also, how is it that this method works, when I look at the HTML here is how the link is coded, but I see no mention of "date="

<a href="http://www.fandango.com/madeofhonor_102109/movieoverview">Made of Honor</a>

Ok, look this over:

#include <IE.au3>
$oIE = _IECreate ("http://www.fandango.com/edwardsfresnostadium22andimax_aafyg/theaterpage", 1)

$oH4s = _IETagNameGetCollection ($oIE, "h4")

For $oH4 in $oH4s ; loop through <H4>s
    $oA = _IETagNameGetCollection($oH4, "a", 0) ; get the first <A> inside the <H4>
    ConsoleWrite("MovieName: " & _IEPropertyGet($oA, "innertext") & @CR)
Next

Let me know if you have questions.

Dale

p.s. Regarding DebugBar: drag the target icon over the element you are interested in on the webpage, and then examine the source it shows you on the left.

Edited by DaleHohm

Ahhh, I was so close yet so far. This makes sense now: once you have the collection of h4s, get the first link inside each one, then read its innertext property. How could I have known which tag names are acceptable? The help file mentions IMG and TR, but where in MSDN can I find a list of all the elements that can be collected?

I see I have more to learn about how HTML is structured with objects, etc. I originally thought that I could only use _IETagNameGetCollection once, and that it must be either 'h4', 'a', 'head' or something like that. It is helpful to understand that once a collection of objects is made, I can call it again on an element from that collection and get another collection.

How do you find that DebugBar helps you in scripting? To identify where objects are located in the HTML, or are there more uses?

Thanks as always


TagNames are the core elements of HTML... BODY, TABLE, A, UL, LI, DIV, IMG, TR, OBJECT, P, etc. - essentially anything inside <>

There are many types of collections... TagName collections are just one of them.

It isn't easy and the documentation can be confusing -- if this were not the case, there would be much less need for IE.au3.

I use DebugBar primarily to understand the HTML and document structure and to examine page source. It is also good for finding and digging into frames and examining scripts and more...

Dale

"essentially anything inside <>"

Excellent to know! It seems then that potentially one could use _IETagnameGetCollection($oIE, "table"), instead of _IETableGetCollection() (though probably not advisable). Interesting to see the interconnectivity. Thanks for your help and your creation of the IE UDF, it does me wonders on a daily basis.
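To see that interconnectivity concretely, here is a minimal sketch that counts tables both ways on a small page. It assumes IE is available for COM automation, and the markup is made up. Both calls hand back a DOM element collection, so its length property gives the count; on this sample both counts are expected to be 2.

```autoit
#include <IE.au3>

; Hypothetical page with two tables and one list
Local $sHTML = '<html><body>' & _
        '<table><tr><td>A</td></tr></table>' & _
        '<ul><li>item</li></ul>' & _
        '<table><tr><td>B</td></tr></table>' & _
        '</body></html>'

Local $oIE = _IECreate("about:blank", 0, 0) ; hidden, blank window
_IEDocWriteHTML($oIE, $sHTML)

; Generic tag-name collection vs. the dedicated table helper
Local $oByTag = _IETagNameGetCollection($oIE, "table")
Local $oTables = _IETableGetCollection($oIE)
Local $iByTag = $oByTag.length ; DOM collections expose a length property
Local $iTables = $oTables.length

ConsoleWrite("TagName collection: " & $iByTag & @CRLF)
ConsoleWrite("Table collection:   " & $iTables & @CRLF)

_IEQuit($oIE)
```

The dedicated helpers mainly save you from remembering tag names and pair naturally with their companions, such as _IETableWriteToArray for tables.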


Oops, I just noticed we need an extra step in this script. The current script you produced returns too much data: it should only display the movies under the "Showtimes" heading, but it also includes those under "Tickets Now Available for these Coming Attractions". After looking at DebugBar, it looks like <UL class=showtimes> is another pattern, and it is not shared by the data I don't need. So I gave it a try, but I failed miserably; the lesson is never try. Below I included the code I am working on, and also my interpretation of what my script and your script are saying.

#include <IE.au3>
$oIE = _IECreate ("http://www.fandango.com/edwardsfresnostadium22andimax_aafyg/theaterpage", 0)
$oULs = _IETagNameGetCollection ($oIE, "UL")
For $oUL in $oULs
    $oShowTimes = _IETagNameGetCollection ($oULs, "showtimes")
        For $oShowTime in $oShowTimes ; loop though
            $oH4s = _IETagNameGetCollection ($oIE, "h4")
                For $oH4 in $oH4s ; loop though <H4>s
                    $oA = _IETagNameGetCollection($oH4, "a", 0) ; get the first <A> inside the <H4>
                    ConsoleWrite("MovieName: " & _IEPropertyGet($oA, "innertext") & @CR)
                Next
        Next
Next

#cs
Dale's
Make a collection object that Gets all the h4
From that object, make a collection object, return the 1st indexed
From that object, PropertyGet the innertext
#ce

#cs
Mine
Make a collection object that Gets all UL
But how do I refine it to class = showtimes?
    [this should refine it to only the elements under Showtimes (I believe)]
From that object, make a collection object, Loop and get the Tagname collection of h4s
From that object, make a collection object, Loop and get the Tagname collection of <a>
From that object, PropertyGet the innertext

#ce

You were on the right track. The string in class= is not a tag, however; UL is. Also, you need to know that the property for class is className... therefore:

#include <IE.au3>
$oIE = _IECreate ("http://www.fandango.com/edwardsfresnostadium22andimax_aafyg/theaterpage", 1)

$oULs = _IETagNameGetCollection ($oIE, "ul")

For $oUL in $oULs
    If String($oUL.className) = "showtimes" Then ; only lists with class="showtimes"
        $oH4s = _IETagNameGetCollection ($oUL, "h4") ; the <h4>s inside this list only
        For $oH4 in $oH4s
            $oA = _IETagNameGetCollection($oH4, "a", 0) ; first <a> inside the <h4>
            ConsoleWrite("MovieName: " & _IEPropertyGet($oA, "innertext") & @CR)
        Next
    EndIf
Next
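To check the className filter without the live page, the same logic can be wrapped in a small function and run against hypothetical markup that has one <ul class="showtimes"> list and one list with some other class. The function name _GetShowtimeTitles, the "coming" class, and the sample titles are made up for illustration, and IE must be available for COM automation.

```autoit
#include <IE.au3>

; Hypothetical helper: titles (first <a> in each <h4>) inside <ul class="showtimes"> only
Func _GetShowtimeTitles($oIE)
    Local $sTitles = "", $oH4s, $oA
    Local $oULs = _IETagNameGetCollection($oIE, "ul")
    For $oUL In $oULs
        If String($oUL.className) <> "showtimes" Then ContinueLoop ; skip other lists
        $oH4s = _IETagNameGetCollection($oUL, "h4") ; only the <h4>s inside this <ul>
        For $oH4 In $oH4s
            $oA = _IETagNameGetCollection($oH4, "a", 0) ; first <a> inside the <h4>
            If IsObj($oA) Then $sTitles &= _IEPropertyGet($oA, "innertext") & "|"
        Next
    Next
    Return StringTrimRight($sTitles, 1) ; "Title1|Title2", without the trailing "|"
EndFunc

; Hypothetical markup: one "showtimes" list and one coming-attractions list
Local $sHTML = '<html><body>' & _
        '<ul class="showtimes"><li><h4><a href="#">Made of Honor</a></h4></li></ul>' & _
        '<ul class="coming"><li><h4><a href="#">Coming Attraction</a></h4></li></ul>' & _
        '</body></html>'

Local $oIE = _IECreate("about:blank", 0, 0) ; hidden, blank window
_IEDocWriteHTML($oIE, $sHTML)
Local $sResult = _GetShowtimeTitles($oIE)
ConsoleWrite($sResult & @CRLF)
_IEQuit($oIE)
```

An exact comparison like this misses elements that carry several classes (for example class="showtimes today"); StringInStr, or a StringRegExp with word boundaries, would be a looser match.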

Dale

Niiiiice....there is so much satisfaction in discovering a solution. Thanks.