
[solved] StringRegExp command is kidding me


Marc

Hi all,

I'm confused. (Even more than usual, that is)

I am trying to capture the URL of the translated German Dilbert comic.

If I put the code from the web page into the RegExpQuickTester, I get the desired URL back.

Using the very same RegEx in my script, the URL is not matched.

Most likely it's a very stupid error I made, but I can't figure it out. New year really starts great 🤣

 

HttpSetUserAgent('Mozilla / 5.0')

; Tag in the source code of the web page is: src="https://www.ingenieur.de/wp-content/uploads/2020/12/Dilbert_d_3054_2021-01-01_F-980x305.jpg"
; so the complete source code of the webpage should be replaced with the match $1 of the regex. Result should be
; https://www.ingenieur.de/wp-content/uploads/2020/12/Dilbert_d_3054_2021-01-01_F-980x305.jpg

Local $source = _INetGetSourceEx('http://www.ingenieur.de/Spiel-Spass/Dilbert')
Local $suche = '(?is).* src="(https://www.ingenieur.de/wp-content/uploads/.+?/Dilbert_._.*?_2021-01-01_.*?\.jpg).*'
Local $url = ""

If StringRegExp($source, $suche) Then
    $url = StringRegExpReplace($source, $suche, "$1")
    MsgBox(0,"", $url)
Else
    MsgBox(16,"oops", "RegEx Problem" & @error)
    ClipPut($suche & @CRLF & $source)
EndIf

Func _INetGetSourceEx($s_URL, $bString = True)
    ; https://www.autoitscript.com/forum/topic/107500-inetgetsource-utf-8-problem/
    Local $sString = InetRead($s_URL, 1)
    Local $nError = @error, $nExtended = @extended
    If $bString Then $sString = BinaryToString($sString, 4)
    Return SetError($nError, $nExtended, $sString)
EndFunc   ;==>_INetGetSourceEx

best regards,

Marc

Edited by Marc
improved demo source code for more clarity

Any of my own codes posted on the forum are free for use by others without any restriction of any kind. (WTFPL)


I'm trying to get the very specific Image-URL out of the complete source code of the web page, so this regex would be a little bit too unspecific.

No idea why the regex works in the regex tool and in regex101, but not in my script... 

Update: If the complete source code is copied into the regex tester tool, it does not find the URL.

If the text before line 567 is removed, it works.

What am I missing here? (the point, obviously)

source.txt

Edited by Marc


@Nine: Indeed, I am trying to catch the current comic of the day. If available, which is not guaranteed.

@pixelsearch: hm, using (?i) matches, but the result of the StringRegExpReplace is the whole source code of the page
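
A likely explanation for that result, sketched on a made-up three-line sample (the `$sample` text and the `example.invalid` URL are placeholders, not the real page): without the `s` option, `.` does not match line breaks, so `.*` can only consume the line that contains the match. StringRegExpReplace then swaps that single line for `$1` and leaves every other line untouched, which on a full page looks like the whole source coming back.

```autoit
; Hypothetical sample standing in for the page source (not the real page).
Local $sample = '<html>' & @CRLF & _
        '<img src="https://example.invalid/uploads/2020/12/Dilbert_d_1.jpg">' & @CRLF & _
        '</html>'

; No (?s): ".*" stops at line breaks, so only the middle line is replaced.
Local $pattern = '(?i).* src="(https://example\.invalid/uploads/.+?\.jpg).*'
Local $replaced = StringRegExpReplace($sample, $pattern, "$1")

; $replaced still contains "<html>" and "</html>" around the URL.
ConsoleWrite($replaced & @CRLF)
```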

In older versions of my script, I worked with StringInStr to get the position, then switched to regex to be more flexible and have a shorter syntax, because the site is sometimes doing funny things like storing the image in a different folder.

For today's comic (2021-01-02), the right URL is

https://www.ingenieur.de/wp-content/uploads/2020/12/Dilbert_d_3055_2021-01-02_F-980x305.jpg

To get the URL, this one works:

Local $suche = '(?is).* src="(https://www.ingenieur.de/wp-content/uploads/2020/12/Dilbert_._.*?_' & @YEAR & '-' & @MON & '-' & @MDAY & '.*?\.jpg).*'

But as you can see, I had to change the path of the image to "2020/12" instead of "2021/01". So I wanted to skip the subfolder-part with wildcards, but surprisingly, this one does not match:

Local $suche = '(?is).* src="(https://www.ingenieur.de/wp-content/uploads/.+?/Dilbert_._.*?_' & @YEAR & '-' & @MON & '-' & @MDAY & '.*?\.jpg).*'

Hmm.

Edited by Marc


This works for me using the provided source file

$source = FileRead("source.txt")
$date = @YEAR & '-' & @MON & '-' & StringFormat("%02i", @MDAY-1)
$pattern = '(?i)src="(https://www.ingenieur.de/wp-content/uploads/.+?/Dilbert.+?' & $date & '.+?\.jpg)'
$res = StringRegExp($source, $pattern, 1)
Msgbox(0,"", IsArray($res) ? $res[0] : "nothing today")

 

Edited by mikell

@pixelsearch: after some thinking, you're right.

@mikell: yes, that works :) 

@all: seems I tricked myself by trying to replace the whole source code of the site with the match instead of just keeping the resulting match.
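
A minimal sketch of the difference, on a hypothetical sample string rather than the real page: asking StringRegExp for the captured group (flag 1, as in mikell's snippet) returns the URL directly, so the pattern needs no leading or trailing `.*` to swallow the rest of the document. The `example.invalid` URL, the fixed date, and the `[^"]` character classes are assumptions made for illustration.

```autoit
; Hypothetical sample standing in for the page source (not the real page).
Local $sample = '<p>intro</p>' & @CRLF & _
        '<img src="https://example.invalid/uploads/2020/12/Dilbert_d_1_2021-01-02_F.jpg">' & @CRLF & _
        '<p>footer</p>'

; [^"] keeps the match inside a single src attribute (an assumption about the markup).
Local $pattern = '(?i)src="(https://example\.invalid/uploads/[^"]+?/Dilbert_[^"]+?2021-01-02[^"]*?\.jpg)"'

; Flag 1 returns an array of the captured groups; if nothing matches, the result is not an array.
Local $match = StringRegExp($sample, $pattern, 1)
Local $url = IsArray($match) ? $match[0] : ""
ConsoleWrite($url & @CRLF)
```

Because only the capture group is returned, the size of the surrounding page no longer affects what ends up in `$url`.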

🤦‍♂️

See also: Self-Awareness (savagechickens.com)

Edited by Marc


  • Marc changed the title to [solved] StringRegExp command is kidding me
