Marc, posted January 1 (edited)

Hi all,

I'm confused (even more than usual, that is). I am trying to capture the URL of the German translation of the Dilbert comic. If I paste the page's source code into the RegExpQuickTester, I get the desired URL back. Using the very same regex in my script, the URL is not matched. Most likely it's a very stupid error on my part, but I can't figure it out. The new year is really off to a great start 🤣

HttpSetUserAgent('Mozilla / 5.0')

; Tag in the source code of the web page is:
;   src="https://www.ingenieur.de/wp-content/uploads/2020/12/Dilbert_d_3054_2021-01-01_F-980x305.jpg"
; so the complete source code of the web page should be replaced with match $1 of the regex. Result should be:
;   https://www.ingenieur.de/wp-content/uploads/2020/12/Dilbert_d_3054_2021-01-01_F-980x305.jpg

Local $source = _INetGetSourceEx('http://www.ingenieur.de/Spiel-Spass/Dilbert')
Local $suche = '(?is).* src="(https://www.ingenieur.de/wp-content/uploads/.+?/Dilbert_._.*?_2021-01-01_.*?\.jpg).*'
Local $url = ""

If StringRegExp($source, $suche) Then
    $url = StringRegExpReplace($source, $suche, "$1")
    MsgBox(0, "", $url)
Else
    MsgBox(16, "oops", "RegEx Problem" & @error)
    ClipPut($suche & @CRLF & $source)
EndIf

Func _INetGetSourceEx($s_URL, $bString = True)
    ; https://www.autoitscript.com/forum/topic/107500-inetgetsource-utf-8-problem/
    Local $sString = InetRead($s_URL, 1)
    Local $nError = @error, $nExtended = @extended
    If $bString Then $sString = BinaryToString($sString, 4)
    Return SetError($nError, $nExtended, $sString)
EndFunc   ;==>_INetGetSourceEx

best regards,
Marc

Edited January 1 by Marc: improved demo source code for more clarity

It's my job to comfort the disturbed and to disturb the comfortable.
My Projects: Profiler, MakeSFX, UserInfo, Simple Robocopy Progressbar
Nine, posted January 1

Maybe: 'src="([^"]*)'

Not much of a signature, but working on it...
Block all input without UAC, Save/Retrieve Images to/from Text, Tool to search content in au3 files, Date Range Picker, Sudoku Game 2020, Overlapped Named Pipe IPC, x64 Bitwise Operations, Fast and simple WCD IPC, GIF Animation (cached)
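Nine's generic pattern can still lead to the very specific URL if the regex only collects candidates and the selection happens afterwards. A minimal AutoIt sketch, assuming the file name always contains "/Dilbert_" and the date as "_YYYY-MM-DD_" (as in the URLs quoted in this thread); `_FindDilbertSrc` and the sample HTML are made up for illustration:

```autoit
; Sketch: collect every src="..." value with Nine's pattern, then keep the first
; one whose file name contains "/Dilbert_" and the wanted date.
; _FindDilbertSrc is a hypothetical helper name, not part of AutoIt.
Func _FindDilbertSrc($sSource, $sDate)
    ; Flag 3: return an array with the capture group of every match (@error = 1 if none)
    Local $aSrc = StringRegExp($sSource, 'src="([^"]*)', 3)
    If @error Then Return SetError(1, 0, "")
    For $i = 0 To UBound($aSrc) - 1
        ; StringInStr returns the position (> 0) if found, 0 otherwise
        If StringInStr($aSrc[$i], "/Dilbert_") And StringInStr($aSrc[$i], "_" & $sDate & "_") Then Return $aSrc[$i]
    Next
    ; candidates found, but none of them is the comic for $sDate
    Return SetError(2, 0, "")
EndFunc   ;==>_FindDilbertSrc

; Usage with a small made-up snippet instead of the real page
Local $sDemo = '<img src="https://www.ingenieur.de/logo.png">' & @CRLF & _
        '<img src="https://www.ingenieur.de/wp-content/uploads/2020/12/Dilbert_d_3054_2021-01-01_F-980x305.jpg" alt="">'
ConsoleWrite(_FindDilbertSrc($sDemo, "2021-01-01") & @CRLF)
```

The design choice here is to keep the regex trivial and move the "is this the comic of the day?" decision into plain string checks, which are easy to adjust if the site changes its folder layout again.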
Marc, posted January 1 (edited)

I'm trying to get one very specific image URL out of the complete source code of the web page, so this regex would be a little too unspecific. No idea why the regex works in the regex tool and on regex101, but not in my script...

Update: If the complete source code is pasted into the regex tester tool, it does not find the URL. If the text before line 567 is removed, it works. What am I missing here? (The point, obviously.)

Attachment: source.txt

Edited January 1 by Marc
pixelsearch, posted January 1

Hi Marc,

It works for me with (?i) but not with (?is). With (?is), it fails as soon as the first .*? in the pattern is reached.

Result display:
0: https://www.ingenieur.de/wp-content/uploads/2020/12/Dilbert_d_3054_2021-01-01_F-980x305.jpg
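One way to see what (?s) does to the lazy wildcards: with (?s), `.` also matches quotes and line breaks, so `.+?` is free to run from one src attribute across the rest of the page until it reaches a Dilbert file name. The sketch below uses made-up sample HTML and drops the leading/trailing `.*`, then compares Marc's wildcards with wildcards written as `[^"]*?`, which cannot leave the quoted attribute value. A plausible but unverified explanation for the tester failing on the full page is that the greedy leading `.*` combined with these unbounded wildcards makes PCRE retry a very large number of combinations on a long page; the sketch only illustrates the wildcard behaviour, not where PCRE gave up:

```autoit
; Sketch: the same search run as a plain match (no leading/trailing .*),
; once with (?is) and .+? / .*? and once with [^"]*? wildcards.
; The sample string is made up; only the Dilbert URL is taken from the thread.
Local $sDemo = '<img src="https://www.ingenieur.de/wp-content/uploads/2020/12/teaser.jpg">' & @LF & _
        '<img src="https://www.ingenieur.de/wp-content/uploads/2020/12/Dilbert_d_3054_2021-01-01_F-980x305.jpg">'

; With (?s) the lazy .+? may run past the closing quote and the line break,
; so the match starts at the first uploads URL (the teaser) and ends at the Dilbert .jpg
Local $aLoose = StringRegExp($sDemo, '(?is)src="(https://www\.ingenieur\.de/wp-content/uploads/.+?/Dilbert_._.*?_2021-01-01_.*?\.jpg)', 1)

; [^"]*? cannot leave the src="..." value, so only the Dilbert URL itself can match
Local $aTight = StringRegExp($sDemo, '(?i)src="(https://www\.ingenieur\.de/wp-content/uploads/[^"]*?/Dilbert_[^"]*?_2021-01-01_[^"]*?\.jpg)"', 1)

ConsoleWrite("loose: " & $aLoose[0] & @CRLF)
ConsoleWrite("tight: " & $aTight[0] & @CRLF)
```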
Nine, posted January 1

What is it you are looking for? The picture of the day? The last picture on the page? You didn't say what, only that it does not work!
Marc, posted January 2 (edited)

@Nine: Indeed, I am trying to catch the current comic of the day, if one is available, which is not guaranteed.

@pixelsearch: Hm, using (?i) matches, but the result of StringRegExpReplace is the whole source code of the page.

In older versions of my script, I worked with StringInStr to get the position, then switched to regex to be more flexible and have a shorter syntax, because the site sometimes does funny things like storing the image in a different folder.

For today's comic (2021-01-02), the right URL is
https://www.ingenieur.de/wp-content/uploads/2020/12/Dilbert_d_3055_2021-01-02_F-980x305.jpg

To get the URL, this one works:

Local $suche = '(?is).* src="(https://www.ingenieur.de/wp-content/uploads/2020/12/Dilbert_._.*?_' & @YEAR & '-' & @MON & '-' & @MDAY & '.*?\.jpg).*'

But as you can see, I had to hard-code the image folder as "2020/12" instead of "2021/01". So I wanted to skip the subfolder part with wildcards, but surprisingly, this one does not match:

Local $suche = '(?is).* src="(https://www.ingenieur.de/wp-content/uploads/.+?/Dilbert_._.*?_' & @YEAR & '-' & @MON & '-' & @MDAY & '.*?\.jpg).*'

Hmm.

Edited January 2 by Marc
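The "(?i) returns the whole source code" effect follows from how `.` treats line breaks: without (?s), the leading and trailing `.*` can only cover the line that holds the URL, so StringRegExpReplace replaces that one line and leaves every other line of the page as it was. A small sketch with a made-up three-line page; the `(*ANYCRLF)` prefix is an addition that only pins down which characters count as line breaks:

```autoit
; Sketch: why StringRegExpReplace "returns the whole page" once (?s) is dropped.
; (*ANYCRLF) makes CR, LF and CRLF all count as line breaks; without (?s)
; the dot does not match them, so .*...* only swallows the line with the URL.
Local $sDemo = "<html>" & @LF & _
        '<img src="https://www.ingenieur.de/wp-content/uploads/2020/12/Dilbert_d_3055_2021-01-02_F-980x305.jpg">' & @LF & _
        "</html>"
Local $sPattern = '(*ANYCRLF)(?i).*src="(https://www\.ingenieur\.de/[^"]*?\.jpg).*'
Local $sResult = StringRegExpReplace($sDemo, $sPattern, "$1")
; $sResult still contains "<html>" and "</html>"; only the middle line became the URL
ConsoleWrite($sResult & @CRLF)
```

With (?s) the same `.*` can span the whole page, which is why the original replace-based approach depended on it; extracting the match instead of replacing around it removes that dependency.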
mikell, posted January 2 (edited)

This works for me using the provided source file:

$source = FileRead("source.txt")
; @MDAY-1 because the attached source.txt is from the previous day (note: this gives day "00" on the 1st of a month)
$date = @YEAR & '-' & @MON & '-' & StringFormat("%02i", @MDAY-1)
$pattern = '(?i)src="(https://www.ingenieur.de/wp-content/uploads/.+?/Dilbert.+?' & $date & '.+?\.jpg)'
; flag 1: return an array with the capture group of the first match
$res = StringRegExp($source, $pattern, 1)
MsgBox(0, "", IsArray($res) ? $res[0] : "nothing today")

Edited January 2 by mikell
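mikell's `@MDAY-1` exists because the attached source.txt is from the previous day; on the first of a month it would produce day "00" and keep the current month. If a "yesterday" variant is kept, a sketch using `_DateAdd` from the standard Date.au3 UDF avoids that. `_DilbertDate` and its optional `$sBase` parameter are hypothetical names introduced here:

```autoit
#include <Date.au3>

; Sketch: compute the date part of the pattern with _DateAdd instead of @MDAY-1,
; so that "yesterday" also rolls over correctly on the first day of a month or year.
; _DilbertDate is a hypothetical helper; $sBase is an optional "YYYY/MM/DD" date.
Func _DilbertDate($iDaysBack = 0, $sBase = Default)
    ; _NowCalcDate() returns today's date as "YYYY/MM/DD"
    If IsKeyword($sBase) Then $sBase = _NowCalcDate()
    ; _DateAdd("D", n, ...) shifts the date by n days and keeps the "YYYY/MM/DD" format
    Local $sDate = _DateAdd("D", -$iDaysBack, $sBase)
    ; the image file names use dashes, e.g. ..._2021-01-01_F-980x305.jpg
    Return StringReplace($sDate, "/", "-")
EndFunc   ;==>_DilbertDate

; Pattern for yesterday's comic, built the same way as in mikell's post
Local $pattern = '(?i)src="(https://www.ingenieur.de/wp-content/uploads/.+?/Dilbert.+?' & _DilbertDate(1) & '.+?\.jpg)'
ConsoleWrite($pattern & @CRLF)
```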
Marc, posted January 2 (edited)

@pixelsearch: After some thinking, you're right.
@mikell: Yes, that works.
@all: It seems I tricked myself by trying to replace the whole source code of the site with the match instead of just keeping the resulting match. 🤦‍♂️

See also: Self-Awareness (savagechickens.com)

Edited January 2 by Marc
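The thread's conclusion in reusable form: return the capture group from StringRegExp (flag 1) and signal "nothing today" through @error, instead of replacing the page with `$1`. A minimal sketch, assuming the URL layout quoted in this thread stays the same; `_GetDilbertUrl` is a hypothetical name and the HTML line is made up:

```autoit
; Sketch of the fix from this thread: keep the capture group returned by
; StringRegExp instead of replacing the whole page with it.
; _GetDilbertUrl is a hypothetical helper; $sDate is expected as "YYYY-MM-DD".
Func _GetDilbertUrl($sSource, $sDate)
    ; [^"]*? keeps the folder and file-name wildcards inside the src="..." value
    Local $sPattern = '(?i)src="(https://www\.ingenieur\.de/wp-content/uploads/[^"]*?/Dilbert_[^"]*?_' & $sDate & '_[^"]*?\.jpg)"'
    ; Flag 1: array with the capture group(s) of the first match, @error = 1 if nothing matched
    Local $aMatch = StringRegExp($sSource, $sPattern, 1)
    If @error Then Return SetError(1, 0, "")
    Return $aMatch[0]
EndFunc   ;==>_GetDilbertUrl

; Made-up page line for illustration
Local $sDemo = '<p><img src="https://www.ingenieur.de/wp-content/uploads/2020/12/Dilbert_d_3055_2021-01-02_F-980x305.jpg" alt=""></p>'
Local $sUrl = _GetDilbertUrl($sDemo, "2021-01-02")
If @error Then
    ConsoleWrite("nothing today" & @CRLF)
Else
    ConsoleWrite($sUrl & @CRLF)
EndIf
```

In Marc's script, `$sDemo` would come from `_INetGetSourceEx('http://www.ingenieur.de/Spiel-Spass/Dilbert')` and the date from `@YEAR & '-' & @MON & '-' & @MDAY`.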