
Extracting string from <p> tags


Hello, I'm new at this. I'm trying to scrape the post ID number at the bottom of a Craigslist ad so that, ideally, I can put it into a spreadsheet. The relevant HTML for a Craigslist ad is below:

<body class="posting en desktop w1024">
    <script type="text/javascript"><!--    function C(k){return(document.cookie.match('(^|; )'+k+'=([^;]*)')||0)[2]}


    <div class="postinginfos">
        <p class="postinginfo">post id: 4806467016</p>
        <p class="postinginfo">posted: <time datetime="2014-12-15T12:58:21-0600" title="2014-12-15 12:58pm">2 hours ago</time></p>

I can get to the div using its class name, but I can't get any further. Here is my script:

; Script Start - Add your code below here
#include <IE.au3>

Local $oIE = _IECreate("http://www.craigslist.org")
WinWait("craigslist: austin jobs, apartments, personals, for sale, services, community, and events - Internet Explorer provided by Dell")
_IELinkClickByText($oIE, "cars+trucks")
_IELinkClickByText($oIE, "ALL CARS & TRUCKS")
_IELinkClickByText($oIE, "list")
_IELinkClickByIndex($oIE, 27)

; - On the page, get the post ID, and copy it
$tags = $oIE.document.GetElementsByTagName("div")
For $tag in $tags
    $class_value = $tag.ClassName
    If $class_value = "postinginfos" Then

        ;not sure where to go from here

    EndIf
Next

I can get everything in the postinginfos div and display it in a message box, but I can't extract the post ID from the line: <p class="postinginfo">post id: 4806467016</p>

I think I need to use the .innerText property, but I'm unsure how to continue.

Any help is appreciated!


That's the "innerText" or "outerText" of the paragraph tag.
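Note that innerText of that paragraph is the whole line, "post id: 4806467016", so the "post id: " label still has to be stripped before the number goes into a spreadsheet. Below is a minimal sketch using StringRegExp; the helper name _ExtractPostId is hypothetical, and the pattern assumes the text keeps the "post id: <digits>" format shown in the HTML above.

```autoit
; Hypothetical helper: pull the digits out of text like "post id: 4806467016".
; Returns the ID as a string, or "" with @error = 1 if no ID is found.
Func _ExtractPostId($sText)
    ; Flag 1 = return an array of the capture-group matches
    ; (@error is set to 1 when nothing matches)
    Local $aMatch = StringRegExp($sText, "(?i)post id:\s*(\d+)", 1)
    If @error Then Return SetError(1, 0, "")
    Return $aMatch[0]
EndFunc

ConsoleWrite(_ExtractPostId("post id: 4806467016") & @CRLF) ; prints 4806467016
```

Inside the paragraph loop you would call it as _ExtractPostId($oPar.innerText) and skip the paragraph when @error is set, which also filters out the "posted:" line.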

BTW, Austin?  You're right up the road from me ;)

pseudo code:

    <div class="postinginfos">
        <p class="postinginfo">post id: 4806467016</p>

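$goDiv = _IETagNameGetCollection($oIE, "div")
If @error Then Exit 2

Local $goPar
For $oDiv In $goDiv
    If $oDiv.className = "postinginfos" Then
        $goPar = _IETagNameGetCollection($oDiv, "p")
        If @error Then ContinueLoop
        For $oPar In $goPar
            ; Both <p> tags have class "postinginfo", so also check for the "post id:" text
            If $oPar.className = "postinginfo" And StringInStr($oPar.innerText, "post id:") Then
                ConsoleWrite("Found: " & $oPar.innerText & @CRLF) ; e.g. "Found: post id: 4806467016"
                ExitLoop 2 ; stop scanning both loops once the ID paragraph is found
            EndIf
        Next
    EndIf
Next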

Replace your "$tags =" section with that.
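For the spreadsheet part of the question, one simple option is to append each ID to a CSV file, which Excel can open directly. This is a minimal sketch only; the helper name _AppendPostIdToCsv, the file path you pass in, and the second "scraped at" column are illustrative assumptions, not anything Craigslist or IE.au3 provides.

```autoit
; Hypothetical helper: append one row (post ID, scrape time) to a CSV file.
; Returns 1 on success, or 0 with @error = 1 if the file could not be opened.
Func _AppendPostIdToCsv($sCsvPath, $sPostId)
    ; Mode 1 = append; the file is created if it does not exist yet
    Local $hFile = FileOpen($sCsvPath, 1)
    If $hFile = -1 Then Return SetError(1, 0, 0)
    ; Illustrative second column built from AutoIt's date/time macros
    Local $sStamp = @YEAR & "-" & @MON & "-" & @MDAY & " " & @HOUR & ":" & @MIN
    FileWriteLine($hFile, $sPostId & "," & $sStamp) ; FileWriteLine adds @CRLF
    FileClose($hFile)
    Return 1
EndFunc
```

For example, call _AppendPostIdToCsv(@ScriptDir & "\postids.csv", $sId) once per ad after extracting $sId; each run adds one line to the file.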

Edited by SmOke_N

