Sign in to follow this  
Followers 0
lilx

[Question] Cutting Div content from html page

10 posts in this topic

Hello,

i have the following question i am trying to extract the content from a specific div. i know that there is a command "_IEGetObjById($oIE, "divID")"

but i end up with to much garbage because the div containing a id; have a couple of child div which only have a class. so i wondered is there a possibility to look for class name in stat of a ID?

Share this post


Link to post
Share on other sites



provide a snippet of the HTML, and I can show you how to use my signature...or you can use DOM methods to loop through the children of your parent returned by:_IEGetObjById($oIE, "divID")

http://www.w3schools.com/htmldom/dom_methods.asp


IEbyXPATH-Grab IE DOM objects by XPATH IEscriptRecord-Makings of an IE script recorder ExcelFromXML-Create Excel docs without excel installed GetAllWindowControls-Output all control data on a given window.

Share this post


Link to post
Share on other sites

#3 ·  Posted (edited)

You can use _IETagnameGetCollection, DIV starting with the DIV you have and use the 0-based index to specify the nested DIV you want. Or, you can loop through the DIV collection returned without the index and look at attributes like classname for a match.

Dale

Edited by DaleHohm

Free Internet Tools: DebugBar, AutoIt IE Builder, HTTP UDF, MODIV2, IE Developer Toolbar, IEDocMon, Fiddler, HTML Validator, WGet, curl

MSDN docs: InternetExplorer Object, Document Object, Overviews and Tutorials, DHTML Objects, DHTML Events, WinHttpRequest, XmlHttpRequest, Cross-Frame Scripting, Office object model

Automate input type=file (Related)

Alternative to _IECreateEmbedded? better: _IECreatePseudoEmbedded  Better Better?

IE.au3 issues with Vista - Workarounds

SciTe Debug mode - it's magic: #AutoIt3Wrapper_run_debug_mode=Y Doesn't work needs to be ripped out of the troubleshooting lexicon. It means that what you tried did not produce the results you expected. It begs the questions 1) what did you try?, 2) what did you expect? and 3) what happened instead?

Reproducer: a small (the smallest?) piece of stand-alone code that demonstrates your trouble

Share this post


Link to post
Share on other sites

provide a snippet of the HTML, and I can show you how to use my signature...or you can use DOM methods to loop through the children of your parent returned by:_IEGetObjById($oIE, "divID")

http://www.w3schools.com/htmldom/dom_methods.asp

hi here is the part of the html page i am intrested in.

<div id="rt-main" class="mb8-sa4">
<div class="rt-mainsection">
     <div class="rt-mainrow">
                                             <div class="rt-grid-8 rt-alpha">
                                                                     <div class="rt-block component-block">
             <div class="component-content">
                 <div class="item-page">
<h2>
<a href="/nokia/mock-up-windows-rt-tablet-van-nokia-duikt-op">
[b]Mock-up Windows RT tablet van Nokia duikt op[/b]</a>
</h2>

<dl class="article-info">
<dt class="article-info-term">Gegevens</dt>
[b]<dd class="category-name">
Categorie: <a href="/nokia">Nokia</a> </dd>
<dd class="published">
Gepubliceerd op woensdag, 02 januari 2013 </dd>[/b]
</dl>

[b]<p><img src="/images/Nokia/Nokia-Windows-RT.jpg" border="0" alt="Nokia Windows RT Mockup" style="border: 0px; display: block; margin-left: auto; margin-right: auto;" /></p>………………..[/b]

this site provide the latest news on smartphones, I want to able to extract the data of the post to provide it as a news feed on a project im working on, for the record the source will be added.

The information that the program will need to produce lies in the bold marked code. Do you have any suggestions how to accomplish this?

Share this post


Link to post
Share on other sites

bump..

Share this post


Link to post
Share on other sites

lilx,

Kludgy, non-IE solution (FWIW)

#include <array.au3>

local $str = '<div id="rt-main" class="mb8-sa4"> ' & _
'<div class="rt-mainsection">' & _
     '<div class="rt-mainrow">' & _
                                             '<div class="rt-grid-8 rt-alpha">' & _
                                                                     '<div class="rt-block component-block">' & _
             '<div class="component-content">' & _
                 '<div class="item-page">' & _
'<h2>' & _
'<a href="/nokia/mock-up-windows-rt-tablet-van-nokia-duikt-op">' & _
'[b]Mock-up Windows RT tablet van Nokia duikt op[/b]</a>' & _
'</h2>' & _
'<dl class="article-info">' & _
'<dt class="article-info-term">Gegevens</dt>' & _
'[b]<dd class="category-name">' & _
'Categorie: <a href="/nokia">Nokia</a> </dd>' & _
'<dd class="published">' & _
'Gepubliceerd op woensdag, 02 januari 2013 </dd>[/b]' & _
'</dl>' & _
'[b]<p><img src="/images/Nokia/Nokia-Windows-RT.jpg" border="0" alt="Nokia Windows RT Mockup" style="border: 0px; display: block; margin-left: auto; margin-right: auto;" /></p>………………..[/b]'

$str = stringregexpreplace($str,'\[b\]','')
$str = stringregexpreplace($str,'\[/b\]','')
$ret = stringregexp($str,'>(.*?)<',3)

for $1 = ubound($ret) - 1 to 0 step -1
    if stringlen(stringstripws($ret[$1],3)) = 0 then    _arraydelete($ret,$1)
next

 _arraydisplay($ret)

kylomas


Forum Rules         Procedure for posting code

"I like pigs.  Dogs look up to us.  Cats look down on us.  Pigs treat us as equals."

- Sir Winston Churchill

Share this post


Link to post
Share on other sites

lilx,

This might give you some ideas for an IE solution.

#include <array.au3>
#include <ie.au3>

local $str = '<div id="rt-main" class="mb8-sa4"> ' & _
'<div class="rt-mainsection">' & _
     '<div class="rt-mainrow">' & _
                                             '<div class="rt-grid-8 rt-alpha">' & _
                                                                     '<div class="rt-block component-block">' & _
             '<div class="component-content">' & _
                 '<div class="item-page">' & _
'<h2>' & _
'<a href="/nokia/mock-up-windows-rt-tablet-van-nokia-duikt-op">' & _
'[b]Mock-up Windows RT tablet van Nokia duikt op[/b]</a>' & _
'</h2>' & _
'<dl class="article-info">' & _
'<dt class="article-info-term">Gegevens</dt>' & _
'[b]<dd class="category-name">' & _
'Categorie: <a href="/nokia">Nokia</a> </dd>' & _
'<dd class="published">' & _
'Gepubliceerd op woensdag, 02 januari 2013 </dd>[/b]' & _
'</dl>' & _
'[b]<p><img src="/images/Nokia/Nokia-Windows-RT.jpg" border="0" alt="Nokia Windows RT Mockup" style="border: 0px; display: block; margin-left: auto; margin-right: auto;" /></p>………………..[/b]'

Local $ohtml = ObjCreate('HTMLFILE')

If Not IsObj($ohtml) Then SetError(-1)

$ohtml.open()
$ohtml.write($str)
$ohtml.close()

Local $odivs = _IETagnameGetCollection($ohtml, 'div'), $o_str

if not isobj($odivs) then seterror(-2)

for $odiv in $odivs
    ConsoleWrite('!  Classname = ' & $odiv.classname & '   title = ' & $odiv.title & '   id = ' & $odiv.id & @LF)
    consolewrite('>' & @tab & @tab & $odiv.innertext & @lf)
Next

kylomas


Forum Rules         Procedure for posting code

"I like pigs.  Dogs look up to us.  Cats look down on us.  Pigs treat us as equals."

- Sir Winston Churchill

Share this post


Link to post
Share on other sites

Your answer in in the reply I left for you. If you don't understand it, do some reading and ask questions, but I wouldn't suggest you just ignore it, bump and hope someone writes it for you.

Dale


Free Internet Tools: DebugBar, AutoIt IE Builder, HTTP UDF, MODIV2, IE Developer Toolbar, IEDocMon, Fiddler, HTML Validator, WGet, curl

MSDN docs: InternetExplorer Object, Document Object, Overviews and Tutorials, DHTML Objects, DHTML Events, WinHttpRequest, XmlHttpRequest, Cross-Frame Scripting, Office object model

Automate input type=file (Related)

Alternative to _IECreateEmbedded? better: _IECreatePseudoEmbedded  Better Better?

IE.au3 issues with Vista - Workarounds

SciTe Debug mode - it's magic: #AutoIt3Wrapper_run_debug_mode=Y Doesn't work needs to be ripped out of the troubleshooting lexicon. It means that what you tried did not produce the results you expected. It begs the questions 1) what did you try?, 2) what did you expect? and 3) what happened instead?

Reproducer: a small (the smallest?) piece of stand-alone code that demonstrates your trouble

Share this post


Link to post
Share on other sites

hi guys,

sorry for the late response but indeed i found my answer with _IETagNameGetCollection.

I bumped the topic because i couldn't get it to work the first time with _IETagNameGetCollection and other commands, but now i gto it working thank you for tip.

Here is what i got so far:

$divs = _IETagNameGetCollection($oIE, "div")

For $div In $divs
Local $Title, $ImgSource, $Publised
If $div.className == "item-page" Then
$htmlcontent = String($div.innerHTML)

$Title = _StringBetween($htmlcontent, '">', '')

$ImgSource = _StringBetween($htmlcontent, 'src="', '"')

$rawPublised = _StringBetween($htmlcontent, '<dd class="published">', '</dd>')
    _formatdate($rawPublised)
    EndIf
    Next
    
    
    _IEQuit($oIE)
    
    Func _formatdate($sString) ; " Gepubliceerd op woensdag, 02 januari 2013 "
    $day = StringRegExp ( $sString, '([0-9]{2})', 1)
    $year = StringRegExp ( $sString, '([0-9]{4})', 1)
    $month = StringRegExp ( $sString, '([a-z]{3-9})(:? [0-9]{4})', 1)
    ConsoleWrite ($day[0] & ' ' & $month[0] & ' ' & $year[0])
    EndFunc

i now have a question about StringRegEXP hope you can help me with this.

as you can see $sString contains this value "Gepubliceerd op woensdag, 02 januari 2013" and i was able the extract the 02 and 2013 value.

But now i am tryin to extract the month but my criteria isent working, my idea was to look at the string as followed "02 " extract " 2013" any idea's?

Share this post


Link to post
Share on other sites

nvm found what i was looking for, StringRegExp ( $sString, '([a-z]{3,9})(?: ([0-9]{4}))', 1).

i think that i can finish it from here off.

thank you guys for the tips

Topic can be closed

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!


Register a new account

Sign in

Already have an account? Sign in here.


Sign In Now
Sign in to follow this  
Followers 0

  • Similar Content

    • FrancescoDiMuro
      By FrancescoDiMuro
      Good evening everyone
      Before all, I want to say that I'm doing this script to see how _IE* functions work, and see if my studs can hack a quiz I'm working on.
      I want to clarify that I'm not automating any game, bypassing any CAPTCHAs, or anything that could damage anyone.
      I was trying to autofill a form, based on which question is displayed.
      The question is always stored in here:
      <header> <h1><span class="questionid">1. </span>Here goes the question</h1> </header> And answers are stored in here:
      <ul class="answers"> <li><label><span><input id="answer_0" name="answer[]" type="radio" value="0">Answer 1</span></label></li> <li><label><span><input id="answer_1" name="answer[]" type="radio" value="1">Answer 2</span></label></li> <li><label><span><input id="answer_2" name="answer[]" type="radio" value="2">Anwser 3</span></label></li> <li><label><span><input id="answer_3" name="answer[]" type="radio" value="3">Answer 4</span></label></li> </ul></fieldset></form></div> And, there are 15 questions like this.
      How can automatically fill my form?
      Thanks in advance
      Francesco
    • houser747
      By houser747
      I have previously used _IEFormElementGetObjByName and _IEFormElementSetValue to enter text into a search box on a form and then submit the form.
      I am now trying to enter text into a search box which is not part of a form. 
      Here is the HTML from the website that i'm trying to enter the data on and then submit the search.
      <div class="row">
          <div class="form-group col-xs-12">
              <span id="FullWidthWithSubmenuContent_FullWidthContent_MainContent_AircraftRegistry_lblSearchText" for="input-search">Registreringsbeteckning</span>
              <div class="input-group col-xs-12">
                  <span id="FullWidthWithSubmenuContent_FullWidthContent_MainContent_AircraftRegistry_preSearchText" class="input-group-addon">SE -</span>
                  <input name="ctl00$FullWidthWithSubmenuContent$FullWidthContent$MainContent$AircraftRegistry$txtSearchText" type="text" value="DTH" id="FullWidthWithSubmenuContent_FullWidthContent_MainContent_AircraftRegistry_txtSearchText" class="form-control" />
              </div>
          </div>
      </div>
      <div class="row">
          <div class="form-group col-xs-12">
              <label class="sr-only" for="">Sök</label>
              <input type="submit" name="ctl00$FullWidthWithSubmenuContent$FullWidthContent$MainContent$AircraftRegistry$btnSearch" value="Sök" id="FullWidthWithSubmenuContent_FullWidthContent_MainContent_AircraftRegistry_btnSearch" class="btn btn-primary ladda-button" data-style="expand-right" />
          </div>
      </div>
      Many thanks in advance
      cheers
      Roger
    • boy233
      By boy233
      I need to click on the text "Batch submission" but I can not!
      <div class="batchmenu2" onclick="Go('/lot/')" style="background-color: rgb(255, 255, 255);"> <span class="iconep">l</span> <div class="menu"> <b>Batch submission</b> <br> Bulk messages via file </div> </div> How could I do it?
      How can I click the specific OnClick?
       
    • SkysLastChance
      By SkysLastChance
      #include <IE.au3> #include <MsgBoxConstants.au3> Local $oIE = _IECreate("website") Local $oForms = _IEFormGetCollection($oIE) For $oForm In $oForms MsgBox($MB_SYSTEMMODAL, "Form Info", $oForm.name) _IEImgClick($oForm.name, "Reserved.ReportViewerWebControl.axd?OpType=Resource&amp;Version=12.0.5522.0&amp;Name=Microsoft.ReportingServices.Rendering.HtmlRenderer.RendererResources.TogglePlus.gif", "src") Next I am having trouble clicking an image. Here is what I have tried.