youtuber

Data retrieval problem with loop

Friends, I'm having trouble with either my loop or my regex, and I can't get the right data.

My code is as follows:

$aRegexGet = _HttpGetRegexTest("https://autoitscripttr.blogspot.com/atom.xml?redirect=false&start-index=1&max-results=500")
;a href="https://www.nofollow.com/" rel="nofollow" target="_blank"
;a href="https://www.dofollow.com/" target="_blank"
$RegExp0 = StringRegExp($aRegexGet, "(?i)href=(?:'|(?:&quot;))(.*)nofollow", 3)
$RegExp1 = StringRegExp($aRegexGet, "(?i)href=(?:'|(?:&quot;))([^'&]+)", 3)
If IsArray($RegExp0) Then
    For $i = 0 To UBound($RegExp0) - 1
        ConsoleWrite($RegExp1[$i] & " " & "Nofollow" & @CRLF)
    Next
Else
    For $c = 0 To UBound($RegExp1) - 1
        ConsoleWrite($RegExp1[$c] & " " & "Dofollow" & @CRLF)
    Next
EndIf


Func _HttpGetRegexTest($aUrl)
    Local $oHTTP = ObjCreate("winhttp.winhttprequest.5.1")
    $oHTTP.Open("GET", $aUrl, False)
    $oHTTP.Send()
    If @error Then
        $error = 1
        $oHTTP = 0
        Return SetError(1)
    EndIf
    If $oHTTP.Status = 200 Then
        Local $sReceived = $oHTTP.ResponseText
        $oHTTP = Null
        Return $sReceived
    EndIf
    $oHTTP = Null
    Return -1
EndFunc   ;==>_HttpGetRegexTest


What data are you trying to get, what data are you actually getting?

We can't help you if you don't tell us what you need and what you get. That should be blatantly obvious to you by this time.



I want to parse the link structures shown below.

This is a nofollow URL:
;a href="https://www.nofollow.com/" rel="nofollow" target="_blank"

This is a dofollow URL:
;a href="https://www.dofollow.com/" target="_blank"


Is there a reason you can't just use a much simpler StringInStr?

 

If StringLeft($sString, 7) = "a href=" Then
    If StringInStr($sString, ";nofollow") > 0 Then
        ConsoleWrite("This is a NO follow" & @CRLF)
    Else
        ConsoleWrite("This is a FOLLOW" & @CRLF)
    EndIf
EndIf


Your code doesn't work:

$aRegexGet = _HttpGetRegexTest("https://autoitscripttr.blogspot.com/atom.xml?redirect=false&start-index=1&max-results=500")

$RegExp1 = StringRegExp($aRegexGet, "(?i)href=(?:'|(?:&quot;))([^'&]+)", 3)

If StringLeft($aRegexGet, 7) = "a href=" Then
    If StringInStr($aRegexGet, ";nofollow") > 0 Then
        ConsoleWrite($RegExp1[0] & "This is a NO follow" & @CRLF)
    Else
        ConsoleWrite($RegExp1[0] & "This is a FOLLOW" & @CRLF)
    EndIf
EndIf

Func _HttpGetRegexTest($aUrl)
    Local $oHTTP = ObjCreate("winhttp.winhttprequest.5.1")
    $oHTTP.Open("GET", $aUrl, False)
    $oHTTP.Send()
    If @error Then
        $error = 1
        $oHTTP = 0
        Return SetError(1)
    EndIf
    If $oHTTP.Status = 200 Then
        Local $sReceived = $oHTTP.ResponseText
        $oHTTP = Null
        Return $sReceived
    EndIf
    $oHTTP = Null
    Return -1
EndFunc


It works fine for the 2 example strings you posted above.

If it's not putting anything in the Console then it's not an a href line and it's ignored.
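
A minimal sketch of applying the same check to every link in a larger string rather than to one line. It assumes the page contains plain `<a ...>` tags; `$sHtml` is a hypothetical sample, not the real feed, whose markup may be entity-encoded and need a different pattern.

```autoit
; Hypothetical sample input standing in for the downloaded page.
Local $sHtml = '<a href="https://www.nofollow.com/" rel="nofollow" target="_blank">x</a>' & _
        '<a href="https://www.dofollow.com/" target="_blank">y</a>'

; With flag 3 and no capture group, each array element is one whole <a ...> opening tag.
Local $aTags = StringRegExp($sHtml, '(?i)<a\s[^>]*>', 3)

If IsArray($aTags) Then
    For $i = 0 To UBound($aTags) - 1
        ; Same StringInStr test as above, applied to one tag at a time.
        If StringInStr($aTags[$i], "nofollow") > 0 Then
            ConsoleWrite($aTags[$i] & " -> NO follow" & @CRLF)
        Else
            ConsoleWrite($aTags[$i] & " -> FOLLOW" & @CRLF)
        EndIf
    Next
EndIf
```

Checking each tag separately means a "nofollow" elsewhere in the page cannot affect the result for an unrelated link.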



The webmaster who wrote the source code of the page concerned should be fired  :)

#Include <Array.au3>

$aRegexGet = _HttpGetRegexTest("https://autoitscripttr.blogspot.com/atom.xml?redirect=false&start-index=1&max-results=500")
$RegExp0 = StringRegExp($aRegexGet, '(https?[^;"]+)(?=(?:&quot;|")[\w=\h"]*rel=(?:&quot;|")?nofollow)', 3)

_ArrayDisplay($RegExp0, "nofollow")


@mikell how should the pattern for this HTML be shaped?
Is my pattern right?

nofollow
<a href="https://testregex.nofollow.com" rel="nofollow">https://testregex.nofollow.com</a>
<a href="https://testregex.nofollowopennewtab.com" target="_blank" rel="nofollow noopener">https://testregex.nofollowopennewtab.com</a>
(https?[^"]+)(?=(?:|")[\w=\h"]*rel=(?:|")?nofollow)

dofollow
<a href="https://testregex.dofollowopennewtab.com" target="_blank" rel="noopener">https://testregex.dofollowyenisekme.com</a>
<a href="https://testregex.dofollow.com">https://testregex.dofollow.com</a>
<a\s+(?:[^>]*?\s+)?href="(?:[^>]*)>
or this dofollow regex pattern:
<a\s+(?:[^>]*?\s+)?href="(https?[^"]+)(?:["^>]*)>

22 hours ago, youtuber said:

Is my pattern right?

I don't know, because I don't know exactly what you want to do, what precisely the expected results should be, etc.
You can play with this snippet, which will - maybe... - fit your needs:

#Include <Array.au3>

$txt = 'nofollow' & _
'<a href="https://testregex.nofollow.com" rel="nofollow">https://testregex.nofollow.com</a>' & _
'<a href="https://testregex.nofollowopennewtab.com" target="_blank" rel="nofollow noopener">"https://testregex.nofollowopennewtab.com</a>' & _
'dofollow' & _
'<a href="https://testregex.dofollowopennewtab.com" target="_blank" rel="noopener">https://testregex.dofollowyenisekme.com</a>' & _
'<a href="https://testregex.dofollow.com">https://testregex.dofollow.com</a>'
;Msgbox(0,"", $txt)

$link = '(https?[^"<>]++)"'
$check = '[\w=\h"]*rel="nofollow'

$a = StringRegExp($txt, $link & '(?=' & $check & ')', 3)
$b = StringRegExp($txt, $link & '(?!' & $check & ')', 3)
_ArrayDisplay($a, "nofollow")
_ArrayDisplay($b, "follow")
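
For console output in the style of the original loop, the two result arrays can each be printed with their own loop. This is only a sketch: the sample string is shortened from the one above, and the helper name `_PrintLinks` is hypothetical.

```autoit
; Shortened sample input; the link text is replaced by single letters.
Local $sTxt = '<a href="https://testregex.nofollow.com" rel="nofollow">a</a>' & _
        '<a href="https://testregex.dofollow.com">b</a>'

; Same two pattern parts as in the snippet above.
Local $sLink = '(https?[^"<>]++)"'
Local $sCheck = '[\w=\h"]*rel="nofollow'

_PrintLinks(StringRegExp($sTxt, $sLink & '(?=' & $sCheck & ')', 3), "Nofollow")
_PrintLinks(StringRegExp($sTxt, $sLink & '(?!' & $sCheck & ')', 3), "Dofollow")

; Hypothetical helper: prints every element of one result array with a label.
Func _PrintLinks($aLinks, $sLabel)
    ; StringRegExp does not return an array when nothing matched.
    If Not IsArray($aLinks) Then Return
    For $i = 0 To UBound($aLinks) - 1
        ConsoleWrite($aLinks[$i] & " " & $sLabel & @CRLF)
    Next
EndFunc   ;==>_PrintLinks
```

Each array is indexed only by its own loop counter, so an index from one array is never used against the other.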
