youtuber

Regex: match a URL up to the next slash?

youtuber

I would like to extract only this part of the URL: https://autoit.com/test1, but all my attempts are failing. Thanks.

$url = "https://autoit.com/test1/test2/"
$Reg = StringRegExp($url, 'https?://*.*[^/]+', 3)

ConsoleWrite($Reg[0] & @CRLF)

 

ripdad

Much simpler than straining with a regex. Took me 2 minutes and I know it's accurate.

Local $url = 'https://autoit.com/test1/test2/'
Local $str = StringLeft($url, StringInStr($url, '/', 0, 4) - 1)
MsgBox(0, '', $str)

 

Bilgus

Took me 5, and it's more flexible.

$url = "https://autoit.com/test1/test2/test3/test4, http://autoet.com/test5/test6/, ftp://autovt.com/test7/test8/ "

$Reg = StringRegExp($url, '(?i).*?(..?tp.?://[^/]*.?[^/]*)', 3)
; (?i) caseless matching
; .*? 0 or more characters but doesn't keep them
; ( start a capturing group
; . any character 'f or h'
; .? 0 or 1 char 't or `none`'
; tp literally tp
; .? 0 or 1 char 'http>S<'
; : literal colon
; // literal slashes
;[^/]* all characters till a slash
;.? 0 or 1 character
;[^/]* all characters till a slash (again)
;) end capturing group
ConsoleWrite($Reg[0] & @CRLF & $Reg[1] & @CRLF & $Reg[2] & @CRLF)

 

Bilgus

Oh, and the outputs:

https://autoit.com/test1
http://autoet.com/test5
 ftp://autovt.com/test7

 

youtuber

@ripdad, that is also not suitable for me because I will be doing URL verification at the same time.

@Bilgus, my friend, your regex worked well for me, thank you.

(?i).*?(..?tp.?://[^/]*.?[^/]*)

Maybe there is another alternative, @mikell :)

 

mikell

Just for fun :)

$sReg = StringRegExpReplace($url, '(.*?/){3}[^/]+\K.*', "")

But the usual way is more understandable/accurate/secure

$aReg = StringRegExp($url, 'https?://[^/]+/[^/]*', 1)
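
A minimal sketch of how the "usual way" could be used together with a match check, assuming a single URL in `$url` (the sample value and the messages are illustrative, not from the posts above). With flag 1 and no capturing group, element 0 holds the full match, and `@error` is set when nothing matches.

```autoit
; Sketch only: $url and the output messages are illustrative assumptions.
Local $url = "https://autoit.com/test1/test2/"

; Flag 1 returns an array of matches; without a capturing group,
; element 0 is the full match.
Local $aReg = StringRegExp($url, 'https?://[^/]+/[^/]*', 1)
If @error Then
    ConsoleWrite("No match: this does not look like an http(s) URL" & @CRLF)
Else
    ConsoleWrite($aReg[0] & @CRLF) ; https://autoit.com/test1
EndIf
```

Because the pattern requires the scheme, a failed match doubles as a rough validity check.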

 

iamtheky
2 hours ago, youtuber said:

is also not suitable for me because I will do url verification at the same time.

Are you doing an InetGet on the return value to verify the URL, or are you just validating that the first 8 characters are https://? Either way, it's plenty suitable within the existing parameters.
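
As a rough sketch of that point, the string-function approach can be combined with a simple scheme check. The variable names and the message text are assumptions for illustration; this only checks the scheme and the presence of a fourth slash, not whether the URL is reachable.

```autoit
; Sketch only: names and messages are illustrative assumptions.
Local $url = "https://autoit.com/test1/test2/"

; Position of the 4th "/" (0 if there are fewer than four slashes).
Local $iPos = StringInStr($url, "/", 0, 4)

; StringRegExp with the default flag returns 1 on a match and 0 otherwise.
If StringRegExp($url, '^https?://') And $iPos > 0 Then
    ConsoleWrite(StringLeft($url, $iPos - 1) & @CRLF) ; https://autoit.com/test1
Else
    ConsoleWrite("No http(s) scheme or no fourth slash" & @CRLF)
EndIf
```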


mikell
2 hours ago, iamtheky said:

either way it's plenty suitable within the existing parameters

I totally agree. The code from ripdad in post #2 works the same way as my SRER (find the 4th occurrence of "/" and grab all chars on the left).
A less classy look maybe (though this could be discussed, for sure) but equal efficiency  :)
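
To make the comparison concrete, here is a small side-by-side sketch of the two approaches on the sample URL from the first post (the variable names are illustrative). One difference to keep in mind: with fewer than four slashes, StringInStr returns 0, whereas the regex simply does not match and StringRegExpReplace returns the string unchanged, so the two may behave differently on short URLs.

```autoit
; Sketch only: variable names are illustrative.
Local $url = "https://autoit.com/test1/test2/"

; ripdad: the position of the 4th "/" minus one is the length to keep.
Local $sLeft = StringLeft($url, StringInStr($url, "/", 0, 4) - 1)

; mikell: match three "/"-terminated chunks plus one more segment, then \K
; discards what was matched so far, so only the remainder gets replaced.
Local $sReg = StringRegExpReplace($url, '(.*?/){3}[^/]+\K.*', "")

ConsoleWrite($sLeft & @CRLF) ; https://autoit.com/test1
ConsoleWrite($sReg & @CRLF)  ; https://autoit.com/test1
```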

Bilgus

OP said he wanted to validate the URL at the same time, so how about we start getting crazy:

$url = "https://autoit.com/test1/test2/test3/test4, http://autoet.com/test5/test6/, http://blank.com, http://blank.com/Nope/ ftp://autovt.com/test7/test8/ http://autoet.com/test9/test10/ "

$aReg = StringRegExp($url, '(?i)(?x) (?(DEFINE) (?<scheme>..?tps?://)(?<host>\w+\.\w{3}/?)(?<path>[^/\s,]+/))(?&scheme)(?&host)(?&path)(?&path)', 3)

For $i = 0 to UBound($aReg) - 1
    ConsoleWrite($aReg[$i] & @CRLF)
Next

Output:

https://autoit.com/test1/test2/
http://autoet.com/test5/test6/
 ftp://autovt.com/test7/test8/
http://autoet.com/test9/test10/

 

youtuber

@Bilgus Thank you for your help

@mikell and @Bilgus, I would like to ask you another question: what should the pattern be to cut off the http:// or https:// and the www. at the beginning?

https://autoit.com/test1/test2/
http://aut-oet.com/test5/test6/
https://www.autovt.com/test7/test8/
http://aut.oet.com/test9/test10/
(?:http[s]?:\/\/)?(?:www\.)?([^\/]+\/[^\/]*)

 

Bilgus

You have to turn them into non-capturing groups:

$url = "test.com/test0/ https://autoit.com/test1/test2/test3/test4, http://autoet.com/test5/test6/, http://blank.com, http://blank.com/Nope/ ftp://autovt.com/test7/test8/ http://autoet.com/test9/test10/ "

;This one is wrong
;$aReg = StringRegExp($url, '(?i)(?(DEFINE)(?<host>\w+\.\w{3}/?)(?<path>[^/\s,]+/))(?!..?tps?://)(?&host)(?&path)', 3)
$aReg = StringRegExp($url, '(?i)(?x) (?(DEFINE) (?<scheme>..?tps?://)(?<host>\w+\.\w{3}/?)(?<path>[^/\s,]+/))(?&scheme)\K(?&host)(?&path)', 3)

For $i = 0 to UBound($aReg) - 1
    ConsoleWrite($aReg[$i] & @CRLF)
Next
ConsoleWrite(@CRLF)

$bReg = StringRegExp($url, '(?i).*?..?tps?://([^/]*.?[^/]*)', 3)

For $i = 0 to UBound($bReg) - 1
    ConsoleWrite($bReg[$i] & @CRLF)
Next

Output:

autoit.com/test1/
autoet.com/test5/
blank.com/Nope/
autovt.com/test7/
autoet.com/test9/

autoit.com/test1
autoet.com/test5
blank.com, http:/
autovt.com/test7
autoet.com/test9

 

Bilgus

So for the top one, first off, you can see the validation is a bit more robust.

I added \K to restart the match after matching the scheme, since as far as I know you can't use a named group as a capturing group. We still want the scheme to be there for a valid match, so we match it and then reset the match position to just after it.
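
A minimal sketch of the idea, using a simplified pattern that is an assumption for illustration and not one of the patterns posted above: a capturing group and \K both require the scheme to match while leaving it out of the returned text.

```autoit
; Sketch only: the simplified patterns and sample string are assumptions.
Local $url = "https://autoit.com/test1/test2/ ftp://autovt.com/test7/test8/"

; Variant 1: capturing group. The scheme is matched, but only group 1 is returned.
Local $aCap = StringRegExp($url, '(?i)(?:https?|ftp)://([^/\s]+/[^/\s]*)', 3)

; Variant 2: \K. The scheme must still match, but \K resets the start
; of the reported match to the position right after it.
Local $aK = StringRegExp($url, '(?i)(?:https?|ftp)://\K[^/\s]+/[^/\s]*', 3)

For $i = 0 To UBound($aCap) - 1
    ConsoleWrite($aCap[$i] & " | " & $aK[$i] & @CRLF)
Next
; autoit.com/test1 | autoit.com/test1
; autovt.com/test7 | autovt.com/test7
```

\K is handy when the part to keep is built from subroutine calls such as (?&host), where wrapping everything in an extra capturing group would change what flag 3 returns.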

 

Bilgus

Really for what you want it could be shortened to 

$aReg = StringRegExp($url, '(?i)..?tps?://\K\w+\.\w{3}/?[^/\s,]+/', 3)

or 

$aReg = StringRegExp($url, '(?i)(?x) (?(DEFINE)(?<path>[^/\s,]+/))..?tps?://\K\w+\.\w{3}/?(?&path)(?&path)', 3)

The second is just in case you decide you want more than a single portion of the path.

iamtheky

or you could just do this

;~ $str = "https://autoit.com/test1/test2/"
;~ $str = "http://aut-oet.com/test5/test6/"
$str = "https://www.autovt.com/test7/test8/"
;~ $str =  "http://aut.oet.com/test9/test10/"


$sMid = Stringmid(stringleft($str , StringInStr($str , "/"  , 0 , 4) - 1) , stringinstr($str , "aut"))

msgbox(0, '' , $sMid)

 


youtuber

@Bilgus, unfortunately some of your regex patterns are unsuccessful; some domains are being skipped!

$url = "https://autoitscript.com/test1/test2/" & @CRLF & _
        "http://aut-oit.com/test5/test6/" & @CRLF & _
        "https://www.autoit.com/test7/test8/" & @CRLF & _
        "http://blog.autoitscript.com/test9/test10/"

$aReg1 = StringRegExp($url, '(?i)..?tps?://\K\w+\.\w{3}/?[^/\s,]+/', 3)
$aReg2 = StringRegExp($url, '(?i).*?..?tps?://([^/]*.?[^/]*)', 3)
$aReg3 = StringRegExp($url, '(?i)(?(DEFINE)(?<host>\w+\.\w{3}/?)(?<path>[^/\s,]+/))(?!..?tps?://)(?&host)(?&path)', 3)

ConsoleWrite("----" & "$aReg1 " & "----" & @CRLF)
For $i = 0 To UBound($aReg1) - 1
    ConsoleWrite($aReg1[$i] & @CRLF)
Next
ConsoleWrite(@CRLF)

ConsoleWrite("----" & "$aReg2 " & "----" & @CRLF)
For $i = 0 To UBound($aReg2) - 1
    ConsoleWrite($aReg2[$i] & @CRLF)
Next
ConsoleWrite(@CRLF)

ConsoleWrite("----" & "$aReg3 " & "----" & @CRLF)
For $i = 0 To UBound($aReg3) - 1

    ConsoleWrite($aReg3[$i] & @CRLF)
Next
ConsoleWrite(@CRLF)

Console output:

----$aReg1 ----
autoitscript.com/test1/
www.autoit.com/
blog.autoitscript.com/

----$aReg2 ----
autoitscript.com/test1
aut-oit.com/test5
www.autoit.com/test7 ;----> I do not want the www. part here
blog.autoitscript.com/test9

----$aReg3 ----
autoitscript.com/test1/
oit.com/test5/
www.autoit.com/
blog.autoitscript.com/

This looks better :)

$aReg4 = StringRegExp($url, '(?i).*?..?tps?://?(?:www\.)?([^/]*.?[^/]*)', 3)

 

Bilgus

I updated the post, so you might want to recheck it. Sorry, I realized while writing a follow-up post that it wasn't right.

youtuber

@Bilgus @mikell Or is this better? I do not know :)

(?i)(?:https?://)(?:www\.)?([^/]*.?[^/]*)

 

Bilgus
$url = "https://autoitscript.com/test1/test2/" & @CRLF & _
        "http://aut-oit.com/test5/test6/" & @CRLF & _
        "https://www.autoit.com/test7/test8/" & @CRLF & _
        "https://google.com/test1/test2/" & @CRLF & _
        "http://qwerty.org/test5/test?/whatabout_a_page.html" & @CRLF & _
        "https://www.tungsten.com/test7/test8/metalisbetter.htm" & @CRLF & _
        "https://falsedomains.com/test1/test2/" & @CRLF & _
        "http://who%20is%20this/test5/test6/" & @CRLF & _;NO
        "https://www.autochecker.com/test7/test8/" & @CRLF & _
        "https://autobanks.com" & @CRLF & _;NO
        "http://aut_ofitbaddomain" & @CRLF & _;NO
        "https://www.autoit.biz/test7/test8/" & @CRLF

$aReg = StringRegExp($url, '(?i)https?://(?:www\.)?\K\S+\.\w{3}/?[^/\s,]+/', 3)

For $i = 0 to UBound($aReg) - 1
    ConsoleWrite($aReg[$i] & @CRLF)
Next
ConsoleWrite(@CRLF)

$bReg = StringRegExp($url, '(?i)(?:https?://)(?:www\.)?([^/]*.?[^/]*)',3)

For $i = 0 to UBound($bReg) - 1
    ConsoleWrite($bReg[$i] & @CRLF)
Next

It's easy enough to test:

;'(?i)https?://(?:www\.)?\K\S+\.\w{3}/?[^/\s,]+/'
autoitscript.com/test1/
aut-oit.com/test5/
autoit.com/test7/
google.com/test1/
qwerty.org/test5/
tungsten.com/test7/
falsedomains.com/test1/
autochecker.com/test7/
autoit.biz/test7/


;'(?i)(?:https?://)(?:www\.)?([^/]*.?[^/]*)'

autoitscript.com/test1
aut-oit.com/test5
autoit.com/test7
google.com/test1
qwerty.org/test5
tungsten.com/test7
falsedomains.com/test1
who%20is%20this/test5
autochecker.com/test7
autobanks.com
http:/
autoit.biz/test7

As you can see, the first one is still more robust, although I had to change it a bit to match your data. URIs have a list of valid characters, but I don't know that I'd want to build a regex for them lol

youtuber

I do not want the trailing / slash at the end:

(?i)https?://(?:www\.)?\K\S+\.\w{3}/?[^/\s,]+/
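
One possible tweak, offered as a sketch and not as an answer from the thread: turn the final "/" into a lookahead, so it must still be present but is not part of the returned match. The sample data below is a subset of the test URLs used earlier.

```autoit
; Sketch only: the lookahead variant is an assumption, not a pattern from the thread.
Local $url = "https://autoitscript.com/test1/test2/" & @CRLF & _
        "https://www.autoit.com/test7/test8/" & @CRLF & _
        "https://autobanks.com" & @CRLF

; Same pattern as above, except the final "/" is now the lookahead (?=/):
; it is still required, but it is left out of the match.
Local $aReg = StringRegExp($url, '(?i)https?://(?:www\.)?\K\S+\.\w{3}/?[^/\s,]+(?=/)', 3)

For $i = 0 To UBound($aReg) - 1
    ConsoleWrite($aReg[$i] & @CRLF)
Next
; autoitscript.com/test1
; autoit.com/test7
```

An alternative that leaves the pattern untouched would be to strip the last character of each result with StringTrimRight($aReg[$i], 1).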

 
