Marc, posted January 1, 2021 (edited)

Hi all,

I'm confused (even more than usual, that is). I am trying to capture the URL of the translated German Dilbert comic. If I put the code from the web page into the RegExpQuickTester, I get the desired URL back. Using the very same regex in my script, the URL is not matched. Most likely it's a very stupid error I made, but I can't figure it out. The new year really starts great 🤣

HttpSetUserAgent('Mozilla / 5.0')

; Tag in Sourcecode of Webpage is:
; src="https://www.ingenieur.de/wp-content/uploads/2020/12/Dilbert_d_3054_2021-01-01_F-980x305.jpg"
; so the complete source code of the webpage should be replaced with the match $1 of the regex. Result should be
; https://www.ingenieur.de/wp-content/uploads/2020/12/Dilbert_d_3054_2021-01-01_F-980x305.jpg
Local $source = _INetGetSourceEx('http://www.ingenieur.de/Spiel-Spass/Dilbert')
Local $suche = '(?is).* src="(https://www.ingenieur.de/wp-content/uploads/.+?/Dilbert_._.*?_2021-01-01_.*?\.jpg).*'
Local $url = ""

If StringRegExp($source, $suche) Then
    $url = StringRegExpReplace($source, $suche, "$1")
    MsgBox(0,"", $url)
Else
    MsgBox(16,"oops", "RegEx Problem" & @error)
    ClipPut($suche & @CRLF & $source)
EndIf

Func _INetGetSourceEx($s_URL, $bString = True) ; https://www.autoitscript.com/forum/topic/107500-inetgetsource-utf-8-problem/
    Local $sString = InetRead($s_URL, 1)
    Local $nError = @error, $nExtended = @extended
    If $bString Then $sString = BinaryToString($sString, 4)
    Return SetError($nError, $nExtended, $sString)
EndFunc ;==>_INetGetSourceEx

Best regards,
Marc

Edited January 1, 2021 by Marc: improved demo source code for more clarity

Any of my own codes posted on the forum are free for use by others without any restriction of any kind. (WTFPL)
Nine, posted January 1, 2021

Maybe:

'src="([^"]*)'

“They did not know it was impossible, so they did it” (Mark Twain)
Marc (author), posted January 1, 2021 (edited)

I'm trying to get one very specific image URL out of the complete source code of the web page, so this regex would be a little too unspecific. No idea why the regex works in the regex tool and in regex101, but not in my script...

Update: If the complete source code is copied into the regex tester tool, it does not find the URL. If the text before line 567 is removed, it works. What am I missing here? (The point, obviously.)

Attachment: source.txt

Edited January 1, 2021 by Marc
pixelsearch, posted January 1, 2021

Hi Marc,

It works for me with (?i) but not with (?is). With (?is), the match fails when the first .*? is encountered in the pattern.

Result display (truncated in the picture):

0: https://www.ingenieur.de/wp-content/uploads/2020/12/Dilbert_d_3054_2021-01-01_F-980x305.jpg
Nine, posted January 1, 2021

What is it you are looking for? The picture of the day? The last picture on the page? You didn't say what you want, only that it does not work!
Marc (author), posted January 2, 2021 (edited)

@Nine: Indeed, I am trying to catch the current comic of the day, if it is available, which is not guaranteed.

@pixelsearch: Hm, using (?i) matches, but the result of the StringRegExpReplace is the whole source code of the page.

In older versions of my script, I worked with StringInStr to get the position, then switched to regex to be more flexible and have a shorter syntax, because the site sometimes does funny things like storing the image in a different folder.

For today's comic (2021-01-02), the right URL is https://www.ingenieur.de/wp-content/uploads/2020/12/Dilbert_d_3055_2021-01-02_F-980x305.jpg

To get the URL, this one works:

Local $suche = '(?is).* src="(https://www.ingenieur.de/wp-content/uploads/2020/12/Dilbert_._.*?_' & @YEAR & '-' & @MON & '-' & @MDAY & '.*?\.jpg).*'

But as you can see, I had to change the path of the image to "2020/12" instead of "2021/01". So I wanted to skip the subfolder part with wildcards, but surprisingly, this one does not match:

Local $suche = '(?is).* src="(https://www.ingenieur.de/wp-content/uploads/.+?/Dilbert_._.*?_' & @YEAR & '-' & @MON & '-' & @MDAY & '.*?\.jpg).*'

Hmm.

Edited January 2, 2021 by Marc
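A likely explanation for getting "the whole source code" back with (?i): without the s flag, `.` does not match newline characters, so the `.*` on either side of the capture group only spans the one line that contains the tag. StringRegExpReplace then swaps just that line for $1 and leaves every other line in place. The sketch below illustrates this on a tiny made-up sample; `$sSample` and the example.invalid URL are hypothetical stand-ins, not the real page, and the sketch does not reproduce the failed match reported for the full page with (?is).

```autoit
; Minimal sketch, assuming a three-line stand-in for the page source.
; $sSample and the example.invalid URL are hypothetical.
Local $sSample = 'line one' & @LF & _
        'x src="http://example.invalid/a.jpg" y' & @LF & _
        'line three'

; Without (?s): .* stops at line breaks, so only the matching line is replaced
; and the other lines of the input survive in the result.
Local $sWithoutS = StringRegExpReplace($sSample, '(?i).* src="(.+?\.jpg).*', '$1')

; With (?s): .* may cross line breaks, so the pattern can cover the whole
; input and the result is reduced to the captured group.
Local $sWithS = StringRegExpReplace($sSample, '(?is).* src="(.+?\.jpg).*', '$1')

ConsoleWrite('--- (?i) ---' & @LF & $sWithoutS & @LF)
ConsoleWrite('--- (?is) ---' & @LF & $sWithS & @LF)
```

Comparing the two console outputs side by side should make it visible how much of the input each variant actually replaces.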
mikell, posted January 2, 2021 (edited)

This works for me using the provided source file:

$source = FileRead("source.txt")
; @MDAY-1: presumably so that the date matches the provided source.txt, which contains the comic of 2021-01-01
$date = @YEAR & '-' & @MON & '-' & StringFormat("%02i", @MDAY-1)
$pattern = '(?i)src="(https://www.ingenieur.de/wp-content/uploads/.+?/Dilbert.+?' & $date & '.+?\.jpg)'
; flag 1 returns an array with the captured groups of the first match
$res = StringRegExp($source, $pattern, 1)
Msgbox(0,"", IsArray($res) ? $res[0] : "nothing today")

Edited January 2, 2021 by mikell
Marc (author), posted January 2, 2021 (edited)

@pixelsearch: After some thinking, you're right.

@mikell: Yes, that works.

@all: It seems I tricked myself by trying to replace the whole source code of the site with the match instead of just keeping the resulting match. 🤦‍♂️

See also: Self-Awareness (savagechickens.com)

Edited January 2, 2021 by Marc
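The conclusion above, keeping the match rather than replacing the whole page with it, can be sketched roughly as follows. This is only an illustration under stated assumptions: `$sSample` is made-up stand-in text (only the src attribute comes from the thread), and the pattern is a tightened variant that uses `[^"]` instead of `.`, which is a swapped-in choice and not the pattern from the posts above.

```autoit
; Minimal sketch: extract a capture group instead of replacing the whole input.
; $sSample is hypothetical stand-in text, not the real page source.
Local $sSample = '<p>intro</p>' & @CRLF & _
        '<img src="https://www.ingenieur.de/wp-content/uploads/2020/12/Dilbert_d_3054_2021-01-01_F-980x305.jpg">' & @CRLF & _
        '<p>footer</p>'

; No leading or trailing .* is needed: the pattern only describes the attribute.
; [^"] keeps each wildcard inside the quoted attribute value.
Local $sPattern = '(?i)src="(https://www\.ingenieur\.de/wp-content/uploads/[^"]+?/Dilbert_[^"]*?_2021-01-01_[^"]*?\.jpg)"'

; Flag 1 returns an array holding the capture groups of the first match.
Local $aMatch = StringRegExp($sSample, $sPattern, 1)

If @error Then
    ; With flag 1, @error is set when nothing matches or the pattern is invalid.
    MsgBox(16, "oops", "No match, @error = " & @error)
Else
    MsgBox(0, "Comic URL", $aMatch[0])
EndIf
```

Because the pattern never has to consume the text before or after the tag, the size of the surrounding page should matter much less than with a `.*` on both ends.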