youtuber
Posted March 2, 2018

I would like to extract only this part of the URL: https://autoit.com/test1, but all my attempts are failing. Thanks for any help.

    $url = "https://autoit.com/test1/test2/"
    $Reg = StringRegExp($url, 'https?://*.*[^/]+', 3)
    ConsoleWrite($Reg[0] & @CRLF)

ripdad
Posted March 3, 2018

Much simpler than straining with a regex. Took me 2 minutes and I know it's accurate.

    Local $url = 'https://autoit.com/test1/test2/'
    Local $str = StringLeft($url, StringInStr($url, '/', 0, 4) - 1)
    MsgBox(0, '', $str)

"The mediocre teacher tells. The good teacher explains. The superior teacher demonstrates. The great teacher inspires." -William Arthur Ward

Bilgus
Posted March 3, 2018

Took me 5, and it's more flexible.

    $url = "https://autoit.com/test1/test2/test3/test4, http://autoet.com/test5/test6/, ftp://autovt.com/test7/test8/ "
    $Reg = StringRegExp($url, '(?i).*?(..?tp.?://[^/]*.?[^/]*)', 3)
    ; (?i)   caseless matching
    ; .*?    0 or more characters, matched lazily and not kept
    ; (      start a capturing group
    ; .      any character ('f' or 'h')
    ; .?     0 or 1 character ('t' or none)
    ; tp     literally tp
    ; .?     0 or 1 character (the 's' in https)
    ; :      literal colon
    ; //     literal slashes
    ; [^/]*  all characters up to a slash
    ; .?     0 or 1 character
    ; [^/]*  all characters up to a slash (again)
    ; )      end capturing group
    ConsoleWrite($Reg[0] & @CRLF & $Reg[1] & @CRLF & $Reg[2] & @CRLF)

Bilgus
Posted March 3, 2018

Oh, and it outputs:

    https://autoit.com/test1
    http://autoet.com/test5
    ftp://autovt.com/test7

youtuber (Author)
Posted March 3, 2018

@ripdad That approach is not suitable for me, because I also need to validate the URL at the same time.

@Bilgus Your regex worked well for me, thank you.

    (?i).*?(..?tp.?://[^/]*.?[^/]*)

Maybe there is another alternative, @mikell?

mikell
Posted March 3, 2018

Just for fun:

    $sReg = StringRegExpReplace($url, '(.*?/){3}[^/]+\K.*', "")

But the usual way is more understandable, accurate, and secure:

    $aReg = StringRegExp($url, 'https?://[^/]+/[^/]*', 1)

iamtheky
Posted March 3, 2018

2 hours ago, youtuber said:
    is also not suitable for me because I will do url verification at the same time.

Are you doing an InetGet on the return value to verify the URL, or are you just validating that the first 8 characters are https://? Either way, it's plenty suitable within the existing parameters.

mikell
Posted March 3, 2018

2 hours ago, iamtheky said:
    either way it's plenty suitable within the existing parameters

I totally agree. The code from ripdad in post #2 works the same way as my StringRegExpReplace: find the 4th occurrence of "/" and grab all characters to its left. A less classy look, maybe (though this could be discussed, for sure), but equal efficiency.

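To make the comparison above concrete, here is a minimal sketch that runs both approaches on the sample URL from the first post. The variable names ($sUrl, $iPos, $sLeft, $sSrer) are illustrative and do not come from the thread; the guard for a missing fourth slash is an added assumption.

```autoit
Local $sUrl = 'https://autoit.com/test1/test2/'

; String functions: locate the 4th "/" and keep everything to its left.
Local $iPos = StringInStr($sUrl, '/', 0, 4)
Local $sLeft
If $iPos > 0 Then
    $sLeft = StringLeft($sUrl, $iPos - 1)
Else
    ; Assumption: with fewer than four slashes, keep the whole string.
    $sLeft = $sUrl
EndIf

; Regex replace: consume three "/"-terminated chunks plus one more segment,
; then \K discards what was matched so far, so only the remainder (.*) is
; replaced with an empty string.
Local $sSrer = StringRegExpReplace($sUrl, '(.*?/){3}[^/]+\K.*', '')

ConsoleWrite($sLeft & @CRLF) ; https://autoit.com/test1
ConsoleWrite($sSrer & @CRLF) ; https://autoit.com/test1
```

Neither variant checks that the input actually looks like a URL; both only count slashes.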
Jury
Posted March 4, 2018

    $url = "https://autoit.com/test1/test2/"
    $Reg = StringRegExpReplace($url, '(https://.*?/\w+).*?$', '$1')
    ConsoleWrite($Reg & @CRLF)

Bilgus
Posted March 4, 2018 (edited)

The OP said he wanted to validate the URL at the same time, so how about we start getting crazy:

    $url = "https://autoit.com/test1/test2/test3/test4, http://autoet.com/test5/test6/, http://blank.com, http://blank.com/Nope/ ftp://autovt.com/test7/test8/ http://autoet.com/test9/test10/ "
    $aReg = StringRegExp($url, '(?i)(?x) (?(DEFINE) (?<scheme>..?tps?://)(?<host>\w+\.\w{3}/?)(?<path>[^/\s,]+/))(?&scheme)(?&host)(?&path)(?&path)', 3)
    For $i = 0 To UBound($aReg) - 1
        ConsoleWrite($aReg[$i] & @CRLF)
    Next

Output:

    https://autoit.com/test1/test2/
    http://autoet.com/test5/test6/
    ftp://autovt.com/test7/test8/
    http://autoet.com/test9/test10/

Edited March 4, 2018 by Bilgus (forgot the outputs)

youtuber (Author)
Posted March 4, 2018 (edited)

@Bilgus Thank you for your help.

@mikell and @Bilgus, I would like to ask you another question: what should the pattern be to cut off the http:// or https:// and the www. at the beginning?

    https://autoit.com/test1/test2/
    http://aut-oet.com/test5/test6/
    https://www.autovt.com/test7/test8/
    http://aut.oet.com/test9/test10/

My attempt:

    (?:http[s]?:\/\/)?(?:www\.)?([^\/]+\/[^\/]*)

Edited March 4, 2018 by youtuber

Bilgus
Posted March 4, 2018 (edited)

You have to turn them into non-capturing groups.

    $url = "test.com/test0/ https://autoit.com/test1/test2/test3/test4, http://autoet.com/test5/test6/, http://blank.com, http://blank.com/Nope/ ftp://autovt.com/test7/test8/ http://autoet.com/test9/test10/ "
    ;This one is wrong
    ;$aReg = StringRegExp($url, '(?i)(?(DEFINE)(?<host>\w+\.\w{3}/?)(?<path>[^/\s,]+/))(?!..?tps?://)(?&host)(?&path)', 3)
    $aReg = StringRegExp($url, '(?i)(?x) (?(DEFINE) (?<scheme>..?tps?://)(?<host>\w+\.\w{3}/?)(?<path>[^/\s,]+/))(?&scheme)\K(?&host)(?&path)', 3)
    For $i = 0 To UBound($aReg) - 1
        ConsoleWrite($aReg[$i] & @CRLF)
    Next
    ConsoleWrite(@CRLF)
    $bReg = StringRegExp($url, '(?i).*?..?tps?://([^/]*.?[^/]*)', 3)
    For $i = 0 To UBound($bReg) - 1
        ConsoleWrite($bReg[$i] & @CRLF)
    Next

Output:

    autoit.com/test1/
    autoet.com/test5/
    blank.com/Nope/
    autovt.com/test7/
    autoet.com/test9/

    autoit.com/test1
    autoet.com/test5
    blank.com, http:/
    autovt.com/test7
    autoet.com/test9

Edited March 4, 2018 by Bilgus (fixed)

Bilgus
Posted March 4, 2018

So, for the top one: first off, you can see the validation is a bit more robust. I added \K to restart the match after matching the scheme, since as far as I know you can't use a named group as a capturing group. We still want the scheme to be there for a valid match, so we match it and then reset the start of the reported match to just after it.

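The effect of \K described above can be seen in a small sketch. The sample text and the variable names ($sText, $aFull, $aTail) are illustrative assumptions; the patterns are simplified versions of the ones in this thread, restricted to http and https.

```autoit
Local $sText = 'see https://autoit.com/test1/test2/ for details'

; Without \K: the scheme is part of the returned match.
Local $aFull = StringRegExp($sText, '(?i)https?://\w+\.\w{3}/[^/\s,]+/', 3)
If Not @error Then ConsoleWrite($aFull[0] & @CRLF) ; https://autoit.com/test1/

; With \K: the scheme must still be present for the pattern to match,
; but the reported match starts right after it.
Local $aTail = StringRegExp($sText, '(?i)https?://\K\w+\.\w{3}/[^/\s,]+/', 3)
If Not @error Then ConsoleWrite($aTail[0] & @CRLF) ; autoit.com/test1/
```

Because neither pattern contains a capturing group, flag 3 returns the full matches, which is why \K changes what ends up in the array.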
Bilgus
Posted March 4, 2018

Really, for what you want it could be shortened to

    $aReg = StringRegExp($url, '(?i)..?tps?://\K\w+\.\w{3}/?[^/\s,]+/', 3)

or

    $aReg = StringRegExp($url, '(?i)(?x) (?(DEFINE)(?<path>[^/\s,]+/))..?tps?://\K\w+\.\w{3}/?(?&path)(?&path)', 3)

the second being just in case you decide you want more than a single portion of the path.

iamtheky
Posted March 4, 2018

Or you could just do this:

    ;~ $str = "https://autoit.com/test1/test2/"
    ;~ $str = "http://aut-oet.com/test5/test6/"
    $str = "https://www.autovt.com/test7/test8/"
    ;~ $str = "http://aut.oet.com/test9/test10/"

    $sMid = StringMid(StringLeft($str, StringInStr($str, "/", 0, 4) - 1), StringInStr($str, "aut"))

    MsgBox(0, '', $sMid)

youtuber (Author)
Posted March 4, 2018 (edited)

@Bilgus Unfortunately, some of your regex patterns fail on certain domains and skip them!

    $url = "https://autoitscript.com/test1/test2/" & @CRLF & _
            "http://aut-oit.com/test5/test6/" & @CRLF & _
            "https://www.autoit.com/test7/test8/" & @CRLF & _
            "http://blog.autoitscript.com/test9/test10/"

    $aReg1 = StringRegExp($url, '(?i)..?tps?://\K\w+\.\w{3}/?[^/\s,]+/', 3)
    $aReg2 = StringRegExp($url, '(?i).*?..?tps?://([^/]*.?[^/]*)', 3)
    $aReg3 = StringRegExp($url, '(?i)(?(DEFINE)(?<host>\w+\.\w{3}/?)(?<path>[^/\s,]+/))(?!..?tps?://)(?&host)(?&path)', 3)

    ConsoleWrite("----" & "$aReg1 " & "----" & @CRLF)
    For $i = 0 To UBound($aReg1) - 1
        ConsoleWrite($aReg1[$i] & @CRLF)
    Next
    ConsoleWrite(@CRLF)

    ConsoleWrite("----" & "$aReg2 " & "----" & @CRLF)
    For $i = 0 To UBound($aReg2) - 1
        ConsoleWrite($aReg2[$i] & @CRLF)
    Next
    ConsoleWrite(@CRLF)

    ConsoleWrite("----" & "$aReg3 " & "----" & @CRLF)
    For $i = 0 To UBound($aReg3) - 1
        ConsoleWrite($aReg3[$i] & @CRLF)
    Next
    ConsoleWrite(@CRLF)

Console:

    ----$aReg1 ----
    autoitscript.com/test1/
    www.autoit.com/
    blog.autoitscript.com/

    ----$aReg2 ----
    autoitscript.com/test1
    aut-oit.com/test5
    www.autoit.com/test7    ;----> here: I do not want the www. part
    blog.autoitscript.com/test9

    ----$aReg3 ----
    autoitscript.com/test1/
    oit.com/test5/
    www.autoit.com/
    blog.autoitscript.com/

This looks better:

    $aReg4 = StringRegExp($url, '(?i).*?..?tps?://?(?:www\.)?([^/]*.?[^/]*)', 3)

Edited March 4, 2018 by youtuber

Bilgus
Posted March 4, 2018

I updated the post, so you might want to recheck it. Sorry, I realized while writing a follow-up post that it wasn't right.

youtuber (Author)
Posted March 4, 2018 (edited)

@Bilgus @mikell Or is this better? I do not know.

    (?i)(?:https?://)(?:www\.)?([^/]*.?[^/]*)

Edited March 4, 2018 by youtuber

Bilgus
Posted March 5, 2018 (edited)

    $url = "https://autoitscript.com/test1/test2/" & @CRLF & _
            "http://aut-oit.com/test5/test6/" & @CRLF & _
            "https://www.autoit.com/test7/test8/" & @CRLF & _
            "https://google.com/test1/test2/" & @CRLF & _
            "http://qwerty.org/test5/test?/whatabout_a_page.html" & @CRLF & _
            "https://www.tungsten.com/test7/test8/metalisbetter.htm" & @CRLF & _
            "https://falsedomains.com/test1/test2/" & @CRLF & _
            "http://who%20is%20this/test5/test6/" & @CRLF & _ ;NO
            "https://www.autochecker.com/test7/test8/" & @CRLF & _
            "https://autobanks.com" & @CRLF & _ ;NO
            "http://aut_ofitbaddomain" & @CRLF & _ ;NO
            "https://www.autoit.biz/test7/test8/" & @CRLF

    $aReg = StringRegExp($url, '(?i)https?://(?:www\.)?\K\S+\.\w{3}/?[^/\s,]+/', 3)
    For $i = 0 To UBound($aReg) - 1
        ConsoleWrite($aReg[$i] & @CRLF)
    Next
    ConsoleWrite(@CRLF)
    $bReg = StringRegExp($url, '(?i)(?:https?://)(?:www\.)?([^/]*.?[^/]*)', 3)
    For $i = 0 To UBound($bReg) - 1
        ConsoleWrite($bReg[$i] & @CRLF)
    Next

It's easy enough to test:

    ;'(?i)https?://(?:www\.)?\K\S+\.\w{3}/?[^/\s,]+/'
    autoitscript.com/test1/
    aut-oit.com/test5/
    autoit.com/test7/
    google.com/test1/
    qwerty.org/test5/
    tungsten.com/test7/
    falsedomains.com/test1/
    autochecker.com/test7/
    autoit.biz/test7/

    ;'(?i)(?:https?://)(?:www\.)?([^/]*.?[^/]*)'
    autoitscript.com/test1
    aut-oit.com/test5
    autoit.com/test7
    google.com/test1
    qwerty.org/test5
    tungsten.com/test7
    falsedomains.com/test1
    who%20is%20this/test5
    autochecker.com/test7
    autobanks.com
    http:/
    autoit.biz/test7

As you can see, the first one is still more robust, although I still had to change it a bit to match your data. URIs have a list of valid characters, but I don't know that I'd want to build a regex for them, lol.

Edited March 5, 2018 by Bilgus

youtuber (Author)
Posted March 5, 2018

I do not want the trailing / slash at the end of the match:

    (?i)https?://(?:www\.)?\K\S+\.\w{3}/?[^/\s,]+/

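The thread ends without a reply to this last question. One possible approach, offered only as a sketch and not as an answer from any of the posters, is to turn the final slash into a lookahead, so that the slash is still required but is no longer part of the match. The sample URL and the variable names ($sUrl, $aHit) are illustrative.

```autoit
Local $sUrl = 'https://www.autoit.com/test7/test8/'

; Same pattern as in the post above, except that the final "/" is replaced
; by the lookahead (?=/): a slash must follow, but it is not consumed.
Local $aHit = StringRegExp($sUrl, '(?i)https?://(?:www\.)?\K\S+\.\w{3}/?[^/\s,]+(?=/)', 3)
If Not @error Then ConsoleWrite($aHit[0] & @CRLF) ; autoit.com/test7
```

Simply deleting the final slash from the pattern would also drop it from the output, but it would stop requiring that a slash follows the first path segment, which changes the validation behaviour.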