
Extracting string from <p> tags


jobotATX

Solved by SmOke_N


Hello, I'm new at this. I'm trying to scrape the post ID number at the bottom of the page so that I can put it into a spreadsheet. The HTML for a Craigslist ad is below:

<body class="posting en desktop w1024">
    <script type="text/javascript"><!--    function C(k){return(document.cookie.match('(^|; )'+k+'=([^;]*)')||0)[2]}

   ....etc..

    <div class="postinginfos">
        <p class="postinginfo">post id: 4806467016</p>
        <p class="postinginfo">posted: <time datetime="2014-12-15T12:58:21-0600" title="2014-12-15 12:58pm">2 hours ago</time></p>

I can get to the div using the class name, but I can't go any further. Here is my script:

; Script Start - Add your code below here
#include <IE.au3>

Local $oIE = _IECreate("http://www.craigslist.org")
WinWait("craigslist: austin jobs, apartments, personals, for sale, services, community, and events - Internet Explorer provided by Dell")
_IELinkClickByText($oIE, "cars+trucks")
_IELinkClickByText($oIE, "ALL CARS & TRUCKS")
_IELinkClickByText($oIE, "list")
_IELinkClickByIndex ($oIE, 27)

; - On the page, get the post ID, and copy it
$tags = $oIE.document.GetElementsByTagName("div")
For $tag in $tags
    $class_value = $tag.ClassName
    If $class_value = "postinginfos" Then

       ;not sure where to go from here

    EndIf
Next 

I can do things like get everything in the postinginfos div and display it in a message box, but I cannot extract the post id out of the line: <p class="postinginfo">post id: 4806467016</p>

 

I think I need to use the .innerText property, but I am unsure how to continue.

Any help is appreciated!


  • Moderators
  • Solution

That's the "innerText" or "outerText" of the paragraph tag.

BTW, Austin?  You're right up the road from me ;)

pseudo code:

#cs
    <div class="postinginfos">
        <p class="postinginfo">post id: 4806467016</p>
#ce

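; Collect every <div> on the current page.
$goDiv = _IETagNameGetCollection($oIE, "div")
If @error Then Exit 2

Local $goPar
For $oDiv In $goDiv
    ; Only the container that holds the posting details is of interest.
    If $oDiv.className = "postinginfos" Then
        ; Collect the <p> tags inside this div.
        $goPar = _IETagNameGetCollection($oDiv, "p")
        If @error Then ContinueLoop
        For $oPar In $goPar
            If $oPar.className = "postinginfo" Then
                ; innerText is the visible text, e.g. "post id: 4806467016".
                ConsoleWrite("Found: " & $oPar.innerText & @CRLF)
            EndIf
        Next
    EndIf
Next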

Replace your $tags = section (everything down to Next) with that.
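
If you only want the number itself for the spreadsheet, one option is to run the paragraph text through StringRegExp. This is just a minimal sketch, assuming the text always looks like "post id: <digits>" as in your HTML; _ExtractPostId is a made-up helper name for illustration, not something from IE.au3:

```autoit
; Hypothetical helper (not part of IE.au3): returns the digits that follow
; "post id:" in $sText, or "" with @error set to 1 when no ID is found.
Func _ExtractPostId($sText)
    ; (?i) makes the match case-insensitive; (\d+) captures the ID digits.
    ; Flag 1 asks StringRegExp for an array holding the captured group.
    Local $aMatch = StringRegExp($sText, "(?i)post id:\s*(\d+)", 1)
    If @error Then Return SetError(1, 0, "")
    Return $aMatch[0]
EndFunc

; Sample text copied from the HTML in the question.
Local $sId = _ExtractPostId("post id: 4806467016")
If Not @error Then ConsoleWrite("Post ID: " & $sId & @CRLF)
```

Inside the loop you would pass $oPar.innerText to the helper instead of the sample string. The "posted: ..." paragraph has the same class name but no "post id:" text, so the helper sets @error for it and you can skip it.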


