Posted (edited)
#include <Inet.au3>
#include <Array.au3>

$sUrl       = "https://deadline.com/"

$sRegEx     = '(?<=(?:post-title">))((\n)|.)*?(?=(?:<p class="post-author-time))'

$sHTML      = _INetGetSource($sUrl)

;~ MsgBox(0,"",$sHTML)

$aArticles = StringRegExp($sHTML,$sRegEx,3) ; get articles

_ArrayDisplay($aArticles)

;~ ConsoleWrite($aArticles[0] & @CRLF)

I want to grab the HTML text of each article on this news site. I know that the site has 12 articles on its front page, and after I apply the regex to split the articles into an array, I can see that it has 12 elements as well, but they are empty. I assume it has something to do with the line breaks, because when I do the same for just single lines, the elements in the array are no longer empty. How do I fix this so that the elements contain the article info instead of being empty?

Edited by yyywww
Posted (edited)

@FrancescoDiMuro

Edit: No, it's actually everything in between post-title"> and <p class="post-author-time

But what exactly I get is not very important; it could be anything from this site, as long as it spans multiple lines at once (when I get single lines, it does work). I'm more interested in why the array contains empty elements when I do it with the code above, or what I need to change so that the array contains the HTML between those tags instead of empty elements.

Edited by yyywww
Posted (edited)

@yyywww
Something like this?

#include <Array.au3>
#include <Inet.au3>
#include <StringConstants.au3>

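Global $strUrl = "https://deadline.com/", _
       $strHTML = "", _
       $arrResult

; Download the page source; True returns the data as a string
$strHTML = _INetGetSource($strUrl, True)

; (?s) lets "." match line breaks too; (.*?) captures the whole block between the two tags
$arrResult = StringRegExp($strHTML, '(?s)<h2 class="post-title">(.*?)<p class="post-author-time">', $STR_REGEXPARRAYGLOBALMATCH)

_ArrayDisplay($arrResult)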

:)

Edited by FrancescoDiMuro

Posted

@FrancescoDiMuro

With the help of your script I was able to narrow down the issue: In my faulty script I used (.)*?, but I should have used (.*?) instead. I also learned about the usage of (?s) which was very helpful. Thanks.
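
For anyone who finds this later, here is a small sketch of the difference as I understand it. It runs without a network connection because it uses a made-up two-article string ($sSample is hypothetical, not the real page source). When the quantifier sits outside the group, the group is overwritten on each repetition, so only the last character before the closing tag is kept. In this sample that is a line feed, which would explain why the elements looked empty in _ArrayDisplay.

```autoit
#include <Array.au3>
#include <StringConstants.au3>

; Hypothetical sample standing in for the downloaded page source
Local $sSample = '<h2 class="post-title">' & @CRLF & 'First headline' & @CRLF & '<p class="post-author-time">' & @CRLF & _
        '<h2 class="post-title">' & @CRLF & 'Second headline' & @CRLF & '<p class="post-author-time">'

; Quantifier outside the group: the group is re-captured on every repetition,
; so each element holds only the final character before the closing tag
Local $aLastChar = StringRegExp($sSample, '(?s)post-title">(.)*?<p class="post-author-time', $STR_REGEXPARRAYGLOBALMATCH)

; Quantifier inside the group: the group spans the whole block between the tags
Local $aWhole = StringRegExp($sSample, '(?s)post-title">(.*?)<p class="post-author-time', $STR_REGEXPARRAYGLOBALMATCH)

_ArrayDisplay($aLastChar, "(.)*?")
_ArrayDisplay($aWhole, "(.*?)")
```

Both patterns use (?s) here so that the position of the quantifier is the only difference between them. Without (?s), neither pattern should match this sample at all, since "." would stop at the line breaks.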
