
Extracting string from <p> tags

4 posts in this topic

Hello, I'm new at this. I'm trying to scrape the post ID number at the bottom of the page so that I can put it into a spreadsheet. The HTML for a craigslist ad is below:

<body class="posting en desktop w1024">
    <script type="text/javascript"><!--    function C(k){return(document.cookie.match('(^|; )'+k+'=([^;]*)')||0)[2]}


    <div class="postinginfos">
        <p class="postinginfo">post id: 4806467016</p>
        <p class="postinginfo">posted: <time datetime="2014-12-15T12:58:21-0600" title="2014-12-15 12:58pm">2 hours ago</time></p>

I can get to the div using the class name, but I can't go any further. Here is my script:

; Script Start - Add your code below here
#include <IE.au3>

Local $oIE = _IECreate("")
WinWait("craigslist: austin jobs, apartments, personals, for sale, services, community, and events - Internet Explorer provided by Dell")
_IELinkClickByText($oIE, "cars+trucks")
_IELinkClickByText($oIE, "ALL CARS & TRUCKS")
_IELinkClickByText($oIE, "list")
_IELinkClickByIndex ($oIE, 27)

; - On the page, get the post ID, and copy it
$tags = $oIE.document.GetElementsByTagName("div")
For $tag in $tags
    $class_value = $tag.ClassName
    If $class_value = "postinginfos" Then

       ;not sure where to go from here

    EndIf
Next


I can do things like get everything in the postinginfos div and display it in a message box, but I cannot extract the post id out of the line: <p class="postinginfo">post id: 4806467016</p>


I think I need to use the .innertext method, but I am unsure how to continue.

Any help is appreciated!


#2 ·  Posted (edited)

That's the "innerText" or "outerText" of the paragraph tag.

BTW, Austin?  You're right up the road from me ;)

pseudo code:

    <div class="postinginfos">
        <p class="postinginfo">post id: 4806467016</p>

; Collect every <div> on the page
$goDiv = _IETagNameGetCollection($oIE, "div")
If @error Then Exit 2

Local $goPar
For $oDiv In $goDiv
    If $oDiv.className = "postinginfos" Then
        ; Collect the <p> tags inside the matching div
        $goPar = _IETagNameGetCollection($oDiv, "p")
        If @error Then ContinueLoop
        For $oPar In $goPar
            If $oPar.className = "postinginfo" Then
                ConsoleWrite("Found: " & $oPar.innerText & @CRLF)
            EndIf
        Next
    EndIf
Next

Replace the part of your script that starts at $tags = with that.
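
Since the goal is to put only the number into a spreadsheet, the innerText string ("post id: 4806467016") still needs trimming down to its digits. Here is a minimal sketch of one way to do that with StringRegExp. The helper name _GetPostId is made up for illustration and is not part of IE.au3, and the pattern assumes the text always looks like "post id: <digits>":

```autoit
; Hypothetical helper (not from IE.au3): returns the digits that follow
; "post id:" in $sText, or "" with @error set to 1 if the text does not match.
Func _GetPostId($sText)
    ; Flag 1 makes StringRegExp return an array of the captured groups
    ; and set @error when there is no match.
    Local $aMatch = StringRegExp($sText, "(?i)post id:\s*(\d+)", 1)
    If @error Then Return SetError(1, 0, "")
    Return $aMatch[0]
EndFunc

; Stand-alone check using the text from the posted HTML
ConsoleWrite(_GetPostId("post id: 4806467016") & @CRLF)
```

Inside the inner loop you could call _GetPostId($oPar.innerText) and skip the paragraph when @error is set. The "posted: ..." paragraph has the same class name, but it would not match the pattern.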

Edited by SmOke_N



Thank you so much for the replies, I got it!

