hallaplay835

Interacting with the HTML of a webpage

13 posts in this topic

Hello there, I seem to be unable to find a way to solve this.

Suppose you browsed to a webpage that contains the following HTML code:

<html>
<head><title>Sample title</title></head>
<body>
<br />
<table>
<tr><td>Random text</td></tr>
</table>
<div>233</div>
<div>Not this div tag</div>
<div>Nor this other</div>
</body>
</html>

Now suppose the text inside the first div tag is dynamic (i.e. each time the web page renders, the number inside the div tag is different).

Because the div tag has no id or class, there is no way to store its contents in a variable using the _IEGetObjById() function.

I have tried using _IEBodyReadHTML() to store the whole HTML code in a variable, and then using string functions such as StringStripCR(), StringInStr() or _StringBetween() to extract the contents of the div tag, but without success. Everything seems too complicated, because the HTML of the real page is much more extensive than the sample above: there are thousands of div tags and other elements, and I want the contents of only one of them.

Any ideas? I am sure there is a much simpler approach, but I cannot seem to work it out.

Thanks.



Do you always need the content of the first div?



Yes. There are always the same number of div tags, and I only want to store the contents of the first one.



This should work for all divs based on index:

#include <String.au3>

; Sample markup; in your case use: $HTML = _IEBodyReadHTML($oIE)
Local $HTML = '<html>'
$HTML &= '<head><title>Sample title</title></head>'
$HTML &= '<body>'
$HTML &= '<br />'
$HTML &= '<table>'
$HTML &= '<tr><td>Random text</td></tr>'
$HTML &= '</table>'
$HTML &= '<div>233</div>'
$HTML &= '<div>Not this div tag</div>'
$HTML &= '<div>Nor this other</div>'
$HTML &= '</body>'
$HTML &= '</html>'

MsgBox(0, "", HTML_Div($HTML, 1)) ; 1 means the content of the first div

; Returns the content of the $DIV_INDEX-th (1-based) <div>...</div> pair.
; Note: only bare "<div>" tags are matched, i.e. tags without attributes.
Func HTML_Div($HTML, $DIV_INDEX)
    Local $DIV = _StringBetween($HTML, "<div>", "</div>")
    ; _StringBetween() returns a non-array value when nothing is found
    If Not IsArray($DIV) Then Return "No_div_found"
    ; Reject indexes below 1 or beyond the number of matches
    If $DIV_INDEX < 1 Or $DIV_INDEX > UBound($DIV) Then Return "Out_of_range"
    Return $DIV[$DIV_INDEX - 1]
EndFunc


I have tried this using _IEBodyReadHTML and it does not work. In fact, if I try to search for any string between any tags using _StringBetween(), _StringBetween() will always return 0. Any ideas why? With normal strings created manually it does work, but when I use _IEBodyReadHTML and store the HTML as a string in a variable, it simply will not let me perform any string management functions on it whatsoever. Why?


Always the first DIV?

$oDiv = _IETagNameGetCollection($oIE, "div", 0)

$sTheTextYouWant = _IEPropertyGet($oDiv, "innertext")
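
For anyone who wants to try the two lines above in isolation, here is a minimal sketch with basic error checking. It assumes the page shipped with IE.au3, opened via _IE_Example("basic"), as a stand-in for the real page; the variable names are only illustrative.

```autoit
#include <IE.au3>

; Assumption: _IE_Example("basic") stands in for the real page.
Local $oIE = _IE_Example("basic")

; Index 0 requests the first <div> element in the document.
Local $oDiv = _IETagNameGetCollection($oIE, "div", 0)
If @error Or Not IsObj($oDiv) Then
    MsgBox(0, "Error", "No <div> element found")
    Exit 1
EndIf

; Read the text content of that element.
Local $sText = _IEPropertyGet($oDiv, "innertext")
MsgBox(0, "First div", $sText)
```

Because this works on the DOM object rather than on the HTML string, it does not depend on how the tag is written in the source.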

Dale



@hallaplay835

Check the value returned by _IEBodyReadHTML() to make sure the call did not fail.



The value returned by _IEBodyReadHTML() is fine; the HTML markup is retrieved as expected. The part that does not work is the search with _StringBetween(), which always returns a non-array value of 0. According to the documentation, this means that _StringBetween() failed to find anything between the tags. The strange thing is that when I do the same thing with simple HTML markup, say, the following:

#include <IE.au3>
#include <String.au3>
#include <Array.au3>

$oIE_basic = _IE_Example("basic")
$HTML = _IEBodyReadHTML($oIE_basic)

$result = _StringBetween($HTML, '<div id=line2>', '</div>')
_ArrayDisplay($result, "Title")
$result2 = $result[0]
MsgBox(0, "Title", $result2)

it does work! I do not know why the function then fails to find the string in a much longer HTML document.
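
One possible cause, and this is only an assumption because the real markup is not shown, is that the tags in the real page carry attributes, so the literal start string never matches. A hedged sketch of a more tolerant search uses StringRegExp() with a pattern that allows attributes; the _IE_Example("basic") page again stands in for the real one.

```autoit
#include <IE.au3>

; Assumption: the bundled example page stands in for the real page.
Local $oIE = _IE_Example("basic")
Local $sHTML = _IEBodyReadHTML($oIE)

; (?i) ignores case, (?s) lets "." match line breaks,
; [^>]* tolerates any attributes inside the opening tag.
; Flag 3 returns an array with all captured groups.
Local $aDivs = StringRegExp($sHTML, '(?is)<div[^>]*>(.*?)</div>', 3)
If @error Then
    MsgBox(0, "Result", "No div elements matched")
Else
    MsgBox(0, "Result", "First div content: " & $aDivs[0])
EndIf
```

Note that a pattern like this still does not handle div tags nested inside other div tags correctly.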



Maybe you guys can find something in the REAL HTML markup that prevents the function from finding the string.

The HTML is too long to post, so below is an image outlining the main structure of the document:

Posted Image

As you can see, in reality I did not want the contents of the first div tag of the document (I only said that to make things simpler); I am trying to find the contents of:

html---body---table---tbody---tr---td---2nd table---tbody---tr---td---table---tbody---tr---table---tbody---tr---3rd td.

This td contains an <a> tag, a <br> tag and a number, usually between 100 and 100000. The number is random and that is why I want the script to be able to read it each time the page is rendered.

I know it is a pain in the ass with all those tables, but is there a way of telling the script to look for that td and retrieve its contents? The structure of the HTML markup will always be the same (i.e. the same number of tables, tr tags, td tags, etc.).

Thanks for the support, I really appreciate this.



I would grab the nested table (the 5th one, index 4), write its contents to an array, and then grab the array element you want:

$oTable = _IETableGetCollection($oIE, 4)
$aValues = _IETableWriteToArray($oTable, True)

With the transpose flag set to True the array is indexed [row][column], so $aValues[0][2] should contain the value you want.

Dale



I do not understand. How can the function retrieve the specified table if the tables are nested within each other? I want to grab a table that is nested within a table that is nested within another table. There is simply no way the function can account for this 'nesting'; I've tried it and it returns zero. Moreover, there are also <tbody> tags, and I do not think the _IETableWriteToArray() function can account for them (maybe I am wrong).

Could you please be more explicit, or write the full code that will work? Maybe I am stupid and I am missing something, but it simply does not seem to work for me.

Thanks.



That IS essentially the full code, let me get a spoon...

#include <IE.au3>

$oIE = _IECreate("your-url-here")
$oTable = _IETableGetCollection($oIE, 4) ; index 4 = 5th <table> tag on the page
If Not IsObj($oTable) Then Exit 1 ; no table at that index
$aValues = _IETableWriteToArray($oTable, True) ; True = array indexed [row][column]
$sTheValueYouWant = $aValues[0][2] ; first row, third cell

You need to read about _IETableGetCollection and see that the index, 4, will grab a reference to the 5th <table> tag on the page.

You keep saying things return '0'. Please make sure you understand what an object is. _IETableGetCollection returns an object, not an integer. If you want to test success, use IsObj($oTable).

Look around the forum for examples of the functions referenced here and read the help file, both for the specific functions and for the Obj/COM reference section.
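
If it is unclear which index belongs to the table you are after, one way to find out is to display every table in turn. The collection lists all <table> elements in document order, nested ones included. This is only a sketch: the URL is a placeholder, and the variable names are illustrative.

```autoit
#include <IE.au3>
#include <Array.au3>

; Placeholder URL: replace it with the page you are working on.
Local $oIE = _IECreate("http://www.example.com/")

; With no index, the whole collection is returned and
; @extended holds the number of tables found.
Local $oTables = _IETableGetCollection($oIE)
Local $iCount = @extended

For $i = 0 To $iCount - 1
    Local $oTable = _IETableGetCollection($oIE, $i)
    If Not IsObj($oTable) Then ContinueLoop
    ; True transposes the result so it is indexed [row][column].
    Local $aValues = _IETableWriteToArray($oTable, True)
    _ArrayDisplay($aValues, "Table index " & $i)
Next
```

The window title shows the index to pass to _IETableGetCollection once the right table appears.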

Dale



Sorry for posting so late. I understand perfectly, and the function works fine; I just realised I was grabbing a reference to the wrong table, and I have now managed to get it working. Thanks for your support.
