lilx

[Question] Cutting Div content from html page


Hello,

I have the following question: I am trying to extract the content from a specific div. I know that there is a command, _IEGetObjById($oIE, "divID"),

but I end up with too much garbage, because the div that has the ID contains a couple of child divs which only have a class. So I wondered: is it possible to look for a class name instead of an ID?


Provide a snippet of the HTML and I can show you how to use the tools in my signature, or you can use DOM methods to loop through the children of the parent returned by _IEGetObjById($oIE, "divID"):

http://www.w3schools.com/htmldom/dom_methods.asp


IEbyXPATH-Grab IE DOM objects by XPATH IEscriptRecord-Makings of an IE script recorder ExcelFromXML-Create Excel docs without excel installed GetAllWindowControls-Output all control data on a given window.


You can use _IETagNameGetCollection with the tag name "DIV", starting from the DIV you already have, and use the 0-based index to specify the nested DIV you want. Or you can loop through the DIV collection returned when you omit the index and look at attributes like className for a match.
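
The two options described above could look roughly like the following sketch. The URL, the ID "divID" and the class name "someClass" are placeholders, not values from the thread, so adjust them to the actual page.

```autoit
#include <IE.au3>

; Assumptions: the URL, "divID" and "someClass" are hypothetical placeholders.
Local $oIE = _IECreate("http://www.example.com/")
Local $oParent = _IEGetObjById($oIE, "divID")
If @error Then Exit 1 ; parent DIV not found

; Option 1: take a nested DIV by its 0-based index inside the parent
Local $oFirstChildDiv = _IETagNameGetCollection($oParent, "div", 0)
If IsObj($oFirstChildDiv) Then ConsoleWrite($oFirstChildDiv.innerText & @CRLF)

; Option 2: omit the index to get the whole collection, then match on className
Local $oDivs = _IETagNameGetCollection($oParent, "div")
For $oDiv In $oDivs
    If $oDiv.className = "someClass" Then
        ConsoleWrite($oDiv.innerText & @CRLF)
        ExitLoop
    EndIf
Next

_IEQuit($oIE)
```

Option 1 is shorter but breaks as soon as the page adds or removes a DIV before the one you want; matching on className is usually the more robust choice.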

Dale


Free Internet Tools: DebugBar, AutoIt IE Builder, HTTP UDF, MODIV2, IE Developer Toolbar, IEDocMon, Fiddler, HTML Validator, WGet, curl

MSDN docs: InternetExplorer Object, Document Object, Overviews and Tutorials, DHTML Objects, DHTML Events, WinHttpRequest, XmlHttpRequest, Cross-Frame Scripting, Office object model

Automate input type=file (Related)

Alternative to _IECreateEmbedded? better: _IECreatePseudoEmbedded  Better Better?

IE.au3 issues with Vista - Workarounds

SciTe Debug mode - it's magic: #AutoIt3Wrapper_run_debug_mode=Y

"Doesn't work" needs to be ripped out of the troubleshooting lexicon. It means that what you tried did not produce the results you expected. It begs the questions 1) what did you try?, 2) what did you expect? and 3) what happened instead?

Reproducer: a small (the smallest?) piece of stand-alone code that demonstrates your trouble


Hi, here is the part of the HTML page I am interested in:

<div id="rt-main" class="mb8-sa4">
<div class="rt-mainsection">
     <div class="rt-mainrow">
                                             <div class="rt-grid-8 rt-alpha">
                                                                     <div class="rt-block component-block">
             <div class="component-content">
                 <div class="item-page">
<h2>
<a href="/nokia/mock-up-windows-rt-tablet-van-nokia-duikt-op">
[b]Mock-up Windows RT tablet van Nokia duikt op[/b]</a>
</h2>

<dl class="article-info">
<dt class="article-info-term">Gegevens</dt>
[b]<dd class="category-name">
Categorie: <a href="/nokia">Nokia</a> </dd>
<dd class="published">
Gepubliceerd op woensdag, 02 januari 2013 </dd>[/b]
</dl>

[b]<p><img src="/images/Nokia/Nokia-Windows-RT.jpg" border="0" alt="Nokia Windows RT Mockup" style="border: 0px; display: block; margin-left: auto; margin-right: auto;" /></p>………………..[/b]

This site provides the latest news on smartphones. I want to be able to extract the data of the post so I can offer it as a news feed in a project I'm working on; for the record, the source will be credited.

The information that the program needs to extract is in the parts marked in bold ([b]...[/b]). Do you have any suggestions on how to accomplish this?


lilx,

Kludgy, non-IE solution (FWIW)

#include <array.au3>

local $str = '<div id="rt-main" class="mb8-sa4"> ' & _
'<div class="rt-mainsection">' & _
     '<div class="rt-mainrow">' & _
                                             '<div class="rt-grid-8 rt-alpha">' & _
                                                                     '<div class="rt-block component-block">' & _
             '<div class="component-content">' & _
                 '<div class="item-page">' & _
'<h2>' & _
'<a href="/nokia/mock-up-windows-rt-tablet-van-nokia-duikt-op">' & _
'[b]Mock-up Windows RT tablet van Nokia duikt op[/b]</a>' & _
'</h2>' & _
'<dl class="article-info">' & _
'<dt class="article-info-term">Gegevens</dt>' & _
'[b]<dd class="category-name">' & _
'Categorie: <a href="/nokia">Nokia</a> </dd>' & _
'<dd class="published">' & _
'Gepubliceerd op woensdag, 02 januari 2013 </dd>[/b]' & _
'</dl>' & _
'[b]<p><img src="/images/Nokia/Nokia-Windows-RT.jpg" border="0" alt="Nokia Windows RT Mockup" style="border: 0px; display: block; margin-left: auto; margin-right: auto;" /></p>………………..[/b]'

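; Strip the [b] and [/b] forum markers that were used to highlight the wanted parts
$str = StringRegExpReplace($str, '\[/?b\]', '')

; Capture everything between a '>' and the next '<' (flag 3 = return all matches)
Local $ret = StringRegExp($str, '>(.*?)<', 3)

; Walk backwards so that deleting an element does not shift the ones still to be checked
For $i = UBound($ret) - 1 To 0 Step -1
    ; Drop entries that contain only whitespace
    If StringLen(StringStripWS($ret[$i], 3)) = 0 Then _ArrayDelete($ret, $i)
Next

_ArrayDisplay($ret)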

kylomas


Forum Rules         Procedure for posting code

"I like pigs.  Dogs look up to us.  Cats look down on us.  Pigs treat us as equals."

- Sir Winston Churchill


lilx,

This might give you some ideas for an IE solution.

#include <array.au3>
#include <ie.au3>

local $str = '<div id="rt-main" class="mb8-sa4"> ' & _
'<div class="rt-mainsection">' & _
     '<div class="rt-mainrow">' & _
                                             '<div class="rt-grid-8 rt-alpha">' & _
                                                                     '<div class="rt-block component-block">' & _
             '<div class="component-content">' & _
                 '<div class="item-page">' & _
'<h2>' & _
'<a href="/nokia/mock-up-windows-rt-tablet-van-nokia-duikt-op">' & _
'[b]Mock-up Windows RT tablet van Nokia duikt op[/b]</a>' & _
'</h2>' & _
'<dl class="article-info">' & _
'<dt class="article-info-term">Gegevens</dt>' & _
'[b]<dd class="category-name">' & _
'Categorie: <a href="/nokia">Nokia</a> </dd>' & _
'<dd class="published">' & _
'Gepubliceerd op woensdag, 02 januari 2013 </dd>[/b]' & _
'</dl>' & _
'[b]<p><img src="/images/Nokia/Nokia-Windows-RT.jpg" border="0" alt="Nokia Windows RT Mockup" style="border: 0px; display: block; margin-left: auto; margin-right: auto;" /></p>………………..[/b]'

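; Build an in-memory HTML document (no browser window) and load the markup into it
Local $ohtml = ObjCreate('HTMLFILE')
If Not IsObj($ohtml) Then Exit 1 ; could not create the HTMLFILE object

$ohtml.open()
$ohtml.write($str)
$ohtml.close()

; Collect every div in the document
Local $odivs = _IETagNameGetCollection($ohtml, 'div')
If Not IsObj($odivs) Then Exit 2 ; no collection returned

; Print class name, title, id and text of each div so the wanted one can be picked by class
For $odiv In $odivs
    ConsoleWrite('!  Classname = ' & $odiv.className & '   title = ' & $odiv.title & '   id = ' & $odiv.id & @LF)
    ConsoleWrite('>' & @TAB & @TAB & $odiv.innerText & @LF)
Next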

kylomas



Your answer is in the reply I left for you. If you don't understand it, do some reading and ask questions, but I wouldn't suggest you just ignore it, bump and hope someone writes it for you.

Dale



Hi guys,

Sorry for the late response, but I did indeed find my answer with _IETagNameGetCollection.

I bumped the topic because I couldn't get it to work the first time with _IETagNameGetCollection and other commands, but now I've got it working. Thank you for the tip.

Here is what I have so far:

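#include <IE.au3>
#include <String.au3>

; $oIE is assumed to have been created earlier in the script (for example with _IECreate)
Local $Title, $ImgSource, $rawPublised, $htmlcontent
Local $divs = _IETagNameGetCollection($oIE, "div")

For $div In $divs
    ; Match on the class name, because the inner divs have no ID
    If $div.className == "item-page" Then
        $htmlcontent = String($div.innerHTML)

        ; _StringBetween returns an array of matches; element [0] is the first one.
        ; The end delimiter is an empty string in the original post; '</a>' is assumed here.
        $Title = _StringBetween($htmlcontent, '">', '</a>')
        $ImgSource = _StringBetween($htmlcontent, 'src="', '"')
        $rawPublised = _StringBetween($htmlcontent, '<dd class="published">', '</dd>')
        If IsArray($rawPublised) Then _formatdate($rawPublised[0])
    EndIf
Next

_IEQuit($oIE)

Func _formatdate($sString) ; e.g. " Gepubliceerd op woensdag, 02 januari 2013 "
    Local $day = StringRegExp($sString, '([0-9]{2})', 1)
    Local $year = StringRegExp($sString, '([0-9]{4})', 1)
    ; This month pattern is the one that does not match yet (see the question below)
    Local $month = StringRegExp($sString, '([a-z]{3-9})(:? [0-9]{4})', 1)
    If IsArray($day) And IsArray($month) And IsArray($year) Then
        ConsoleWrite($day[0] & ' ' & $month[0] & ' ' & $year[0] & @CRLF)
    EndIf
EndFunc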

I now have a question about StringRegExp; I hope you can help me with this.

As you can see, $sString contains the value "Gepubliceerd op woensdag, 02 januari 2013", and I was able to extract the 02 and 2013 values.

But now I am trying to extract the month, and my pattern isn't working. My idea was to treat the string as "02 " + month + " 2013" and extract the part in between. Any ideas?


Never mind, I found what I was looking for: StringRegExp($sString, '([a-z]{3,9})(?: ([0-9]{4}))', 1).
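
As a variation on that pattern, day, month and year could probably be captured in a single call instead of three. This is only a sketch: it assumes the date always appears as two digits, a lowercase month name and four digits, separated by single spaces, as in the sample string.

```autoit
; Sample value taken from the post; the single-pattern approach is an assumption
Local $sString = " Gepubliceerd op woensdag, 02 januari 2013 "

; Three capture groups: day, month name, year (flag 1 = return the groups of the first match)
Local $aDate = StringRegExp($sString, '([0-9]{2}) ([a-z]{3,9}) ([0-9]{4})', 1)

If @error Then
    ConsoleWrite("No date found" & @CRLF)
Else
    ConsoleWrite($aDate[0] & ' ' & $aDate[1] & ' ' & $aDate[2] & @CRLF)
EndIf
```

Checking @error before indexing the result matters here, because StringRegExp does not return an array when nothing matches.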

I think I can finish it from here.

Thank you guys for the tips.

The topic can be closed.
