yyywww

Global StringRegExp returns Array but its elements are empty

#include <Inet.au3>
#include <Array.au3>

$sUrl       = "https://deadline.com/"

$sRegEx     = '(?<=(?:post-title">))((\n)|.)*?(?=(?:<p class="post-author-time))'

$sHTML      = _INetGetSource($sUrl)

;~ MsgBox(0,"",$sHTML)

$aArticles = StringRegExp($sHTML,$sRegEx,3) ; get articles

_ArrayDisplay($aArticles)

;~ ConsoleWrite($aArticles[0] & @CRLF)

I want to grab the HTML text of each article on this news site. I know the front page has 12 articles, and after I apply the regex to split the articles into an array, I can see that it has 12 elements as well, but they are empty. I assume it has something to do with the line breaks, because when I do the same for single lines, the elements in the array are no longer empty. How do I fix this so that the elements contain the article info instead of being empty?

Edited by yyywww

@yyywww
So you want to get everything between <p class="post-author-time"> and the end of the </p> ? :)


Click here to see my signature:

Spoiler

Thoughts:

  • I will always thank you for the time you spent for me.
    I'm here to ask, and from your response, I'd like to learn.
    By my knowledge, I can help someone else, and "that someone" could help in turn another, and so on.

/*--------------------------------------------------------------------------------------------------------------------------------------------------------------------------*/

ALWAYS GOOD TO READ:

 

@FrancescoDiMuro

Edit: No, it's actually everything in between post-title"> and <p class="post-author-time

What exactly I extract is not very important; it could be anything from this site, but it needs to span multiple lines at once (when I match single lines, it works). I'm more interested in why the array contains empty elements when I use the code above, or what I need to change so that the array contains the HTML between those tags instead of empty elements.

Edited by yyywww

@yyywww
Something like this?

#include <Array.au3>
#include <Inet.au3>
#include <StringConstants.au3>

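Global $strUrl = "https://deadline.com/", _
       $strHTML = "", _
       $arrResult

; Download the page source as a string
$strHTML = _INetGetSource($strUrl, True)

; (?s) lets the dot match newlines, so (.*?) can capture text spanning several lines
$arrResult = StringRegExp($strHTML, '(?s)<h2 class="post-title">(.*?)<p class="post-author-time">', $STR_REGEXPARRAYGLOBALMATCH)

; Show one array element per captured article block
_ArrayDisplay($arrResult)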

:)

Edited by FrancescoDiMuro
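
To illustrate what the (?s) option in the pattern above changes, here is a minimal offline sketch. The sample string is hypothetical and only stands in for the downloaded page; nothing is fetched from the site. Without (?s) the dot does not match newline characters, so a capture cannot cross a line break.

```autoit
#include <StringConstants.au3>

; Hypothetical two-line sample; the real input would come from _INetGetSource().
Local $sSample = '<h2 class="post-title">Line one' & @CRLF & 'Line two<p class="post-author-time">'
Local $sPattern = '<h2 class="post-title">(.*?)<p class="post-author-time">'

; Without (?s) the dot does not match newline characters,
; so the match cannot span both lines and StringRegExp sets @error.
StringRegExp($sSample, $sPattern, $STR_REGEXPARRAYGLOBALMATCH)
ConsoleWrite("Without (?s): @error = " & @error & @CRLF)

; With (?s) the dot also matches newlines, so the capture covers both lines.
Local $aMatch = StringRegExp($sSample, '(?s)' & $sPattern, $STR_REGEXPARRAYGLOBALMATCH)
If @error Then
    ConsoleWrite("With (?s): no match" & @CRLF)
Else
    ConsoleWrite("With (?s): " & $aMatch[0] & @CRLF)
EndIf
```

Checking @error after the call is worth keeping in a real script, since indexing the return value when nothing matched would fail.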

@FrancescoDiMuro

With the help of your script I was able to narrow down the issue: In my faulty script I used (.)*?, but I should have used (.*?) instead. I also learned about the usage of (?s) which was very helpful. Thanks.
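
The difference between (.)*? and (.*?) can be reproduced offline. When the quantifier sits outside a capturing group, the group is overwritten on every repetition and keeps only the last character it matched; when the quantifier sits inside, the group captures the whole span. The sample text below is hypothetical. On the real page the last character before the closing marker was presumably whitespace or a line break, which would explain elements that look empty, but that part is an assumption.

```autoit
#include <Array.au3>
#include <StringConstants.au3>

; Hypothetical sample text standing in for the downloaded HTML.
Local $sSample = 'post-title">First' & @LF & 'headline<p class="post-author-time' & @LF & _
                 'post-title">Second' & @LF & 'headline<p class="post-author-time'

; Quantifier outside the group: each element holds only the last character matched by (.)
Local $aLastChar = StringRegExp($sSample, '(?s)post-title">(.)*?<p class="post-author-time', $STR_REGEXPARRAYGLOBALMATCH)

; Quantifier inside the group: each element holds the whole text between the markers
Local $aWhole = StringRegExp($sSample, '(?s)post-title">(.*?)<p class="post-author-time', $STR_REGEXPARRAYGLOBALMATCH)

_ArrayDisplay($aLastChar, "(.)*?")
_ArrayDisplay($aWhole, "(.*?)")
```

Both calls find the same two matches; only the content of the captured group differs.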

@yyywww
Happy to have helped, and you're welcome :)

