
regex url next slash mark?



Much simpler than straining with a regex. Took me 2 minutes and I know it's accurate.

Local $url = 'https://autoit.com/test1/test2/'
Local $str = StringLeft($url, StringInStr($url, '/', 0, 4) - 1)
MsgBox(0, '', $str)

 



Took me 5, and it's more flexible.

$url = "https://autoit.com/test1/test2/test3/test4, http://autoet.com/test5/test6/, ftp://autovt.com/test7/test8/ "

$Reg = StringRegExp($url, '(?i).*?(..?tp.?://[^/]*.?[^/]*)', 3)
; (?i)    caseless matching
; .*?     0 or more characters, lazily; they sit outside the capturing group, so they are not returned
; (       start a capturing group
; .       any character ('f' or 'h')
; .?      0 or 1 character ('t' or nothing)
; tp      literally "tp"
; .?      0 or 1 character (the 's' in "https")
; :       literal colon
; //      literal slashes
; [^/]*   all characters up to the next slash
; .?      0 or 1 character (the slash itself)
; [^/]*   all characters up to the next slash (again)
; )       end capturing group
ConsoleWrite($Reg[0] & @CRLF & $Reg[1] & @CRLF & $Reg[2] & @CRLF)

 


2 hours ago, youtuber said:

is also not suitable for me because I will do url verification at the same time.

Are you doing an InetGet on the return to verify the URL, or are you just validating that the first 8 characters are https://? Either way, it's plenty suitable within the existing parameters.



2 hours ago, iamtheky said:

either way it's plenty suitable within the existing parameters

I totally agree. The code from ripdad in post #2 works the same way as my SRER (StringRegExpReplace): find the 4th occurrence of "/" and grab all the characters to its left.
A less classy look maybe (though this could be discussed, for sure) but equal efficiency  :)
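
For anyone reusing the "4th slash" idea, here is a minimal sketch that wraps it in a function and guards the case where the string has fewer than four slashes (StringInStr returns 0 then). The name _UrlFirstSegment is made up for illustration; it is not part of any UDF.

```autoit
; Hypothetical helper (not from any UDF): returns everything to the left of the
; 4th "/" in $sUrl, i.e. scheme + host + first path segment.
Func _UrlFirstSegment($sUrl)
    Local $iPos = StringInStr($sUrl, "/", 0, 4) ; position of the 4th slash, 0 if there is none
    If $iPos = 0 Then Return SetError(1, 0, "") ; fewer than four slashes: signal it via @error
    Return StringLeft($sUrl, $iPos - 1)
EndFunc   ;==>_UrlFirstSegment

ConsoleWrite(_UrlFirstSegment("https://autoit.com/test1/test2/") & @CRLF) ; https://autoit.com/test1
ConsoleWrite(_UrlFirstSegment("https://autobanks.com") & @CRLF)           ; prints an empty line (the function set @error to 1)
```

This only counts slashes; it does not check that the string is a well-formed URL.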


OP said he wanted to validate the URL at the same time, so how about we start getting crazy:

$url = "https://autoit.com/test1/test2/test3/test4, http://autoet.com/test5/test6/, http://blank.com, http://blank.com/Nope/ ftp://autovt.com/test7/test8/ http://autoet.com/test9/test10/ "

$aReg = StringRegExp($url, '(?i)(?x) (?(DEFINE) (?<scheme>..?tps?://)(?<host>\w+\.\w{3}/?)(?<path>[^/\s,]+/))(?&scheme)(?&host)(?&path)(?&path)', 3)

For $i = 0 to UBound($aReg) - 1
    ConsoleWrite($aReg[$i] & @CRLF)
Next

Output:

https://autoit.com/test1/test2/
http://autoet.com/test5/test6/
 ftp://autovt.com/test7/test8/
http://autoet.com/test9/test10/

 

Edited by Bilgus
Forgot the outputs

@Bilgus Thank you for your help

@mikell and @Bilgus  I would like to ask you another question: what pattern would strip the http:// or https:// and the www. at the beginning?

https://autoit.com/test1/test2/
http://aut-oet.com/test5/test6/
https://www.autovt.com/test7/test8/
http://aut.oet.com/test9/test10/
(?:http[s]?:\/\/)?(?:www\.)?([^\/]+\/[^\/]*)

 

Edited by youtuber

You have to turn them into non capturing groups

$url = "test.com/test0/ https://autoit.com/test1/test2/test3/test4, http://autoet.com/test5/test6/, http://blank.com, http://blank.com/Nope/ ftp://autovt.com/test7/test8/ http://autoet.com/test9/test10/ "

;This one is wrong
;$aReg = StringRegExp($url, '(?i)(?(DEFINE)(?<host>\w+\.\w{3}/?)(?<path>[^/\s,]+/))(?!..?tps?://)(?&host)(?&path)', 3)
$aReg = StringRegExp($url, '(?i)(?x) (?(DEFINE) (?<scheme>..?tps?://)(?<host>\w+\.\w{3}/?)(?<path>[^/\s,]+/))(?&scheme)\K(?&host)(?&path)', 3)

For $i = 0 to UBound($aReg) - 1
    ConsoleWrite($aReg[$i] & @CRLF)
Next
ConsoleWrite(@CRLF)

$bReg = StringRegExp($url, '(?i).*?..?tps?://([^/]*.?[^/]*)', 3)

For $i = 0 to UBound($bReg) - 1
    ConsoleWrite($bReg[$i] & @CRLF)
Next

Output of $aReg:

autoit.com/test1/
autoet.com/test5/
blank.com/Nope/
autovt.com/test7/
autoet.com/test9/

Output of $bReg:

autoit.com/test1
autoet.com/test5
blank.com, http:/
autovt.com/test7
autoet.com/test9

 

Edited by Bilgus
Fixed

So for the top one, first off, you can see the validation is a bit more robust.

I added \K to restart the match after matching the scheme,

since as far as I know you can't use a named group as a capturing group here.

We still want the scheme to be there for a valid match, so we match it and then reset the match start to just after it.
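
To make the \K point concrete, here is a small sketch comparing \K with a plain capturing group. The sample string and variable names are only for illustration. Both variants return just the part after the scheme, and the scheme still has to be present for a match.

```autoit
Local $sText = "https://autoit.com/test1/ autoet.com/test5/"

; \K discards everything matched so far, so the reported match starts after "https://".
Local $aWithK = StringRegExp($sText, '(?i)https?://\K[^/\s]+', 3)

; Same result with a capturing group: with flag 3 only the captured text is returned.
Local $aGroup = StringRegExp($sText, '(?i)https?://([^/\s]+)', 3)

ConsoleWrite($aWithK[0] & @CRLF)      ; autoit.com
ConsoleWrite($aGroup[0] & @CRLF)      ; autoit.com
ConsoleWrite(UBound($aWithK) & @CRLF) ; 1, because "autoet.com/test5/" has no scheme and is skipped
```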

 


Really for what you want it could be shortened to 

$aReg = StringRegExp($url, '(?i)..?tps?://\K\w+\.\w{3}/?[^/\s,]+/', 3)

or 

$aReg = StringRegExp($url, '(?i)(?x) (?(DEFINE)(?<path>[^/\s,]+/))..?tps?://\K\w+\.\w{3}/?(?&path)(?&path)', 3)

The second is there in case you decide you want more than a single portion of the path.


or you could just do this

;~ $str = "https://autoit.com/test1/test2/"
;~ $str = "http://aut-oet.com/test5/test6/"
$str = "https://www.autovt.com/test7/test8/"
;~ $str =  "http://aut.oet.com/test9/test10/"


$sMid = StringMid(StringLeft($str, StringInStr($str, "/", 0, 4) - 1), StringInStr($str, "aut"))

MsgBox(0, '', $sMid)

 



@Bilgus unfortunately some of your regex patterns fail on certain domains!

$url = "https://autoitscript.com/test1/test2/" & @CRLF & _
        "http://aut-oit.com/test5/test6/" & @CRLF & _
        "https://www.autoit.com/test7/test8/" & @CRLF & _
        "http://blog.autoitscript.com/test9/test10/"

$aReg1 = StringRegExp($url, '(?i)..?tps?://\K\w+\.\w{3}/?[^/\s,]+/', 3)
$aReg2 = StringRegExp($url, '(?i).*?..?tps?://([^/]*.?[^/]*)', 3)
$aReg3 = StringRegExp($url, '(?i)(?(DEFINE)(?<host>\w+\.\w{3}/?)(?<path>[^/\s,]+/))(?!..?tps?://)(?&host)(?&path)', 3)

ConsoleWrite("----" & "$aReg1 " & "----" & @CRLF)
For $i = 0 To UBound($aReg1) - 1
    ConsoleWrite($aReg1[$i] & @CRLF)
Next
ConsoleWrite(@CRLF)

ConsoleWrite("----" & "$aReg2 " & "----" & @CRLF)
For $i = 0 To UBound($aReg2) - 1
    ConsoleWrite($aReg2[$i] & @CRLF)
Next
ConsoleWrite(@CRLF)

ConsoleWrite("----" & "$aReg3 " & "----" & @CRLF)
For $i = 0 To UBound($aReg3) - 1

    ConsoleWrite($aReg3[$i] & @CRLF)
Next
ConsoleWrite(@CRLF)

Console output:

----$aReg1 ----
autoitscript.com/test1/
www.autoit.com/
blog.autoitscript.com/

----$aReg2 ----
autoitscript.com/test1
aut-oit.com/test5
www.autoit.com/test7 ;----> here, I do not want the www. part
blog.autoitscript.com/test9

----$aReg3 ----
autoitscript.com/test1/
oit.com/test5/
www.autoit.com/
blog.autoitscript.com/

Looks better :)

$aReg4 = StringRegExp($url, '(?i).*?..?tps?://?(?:www\.)?([^/]*.?[^/]*)', 3)

 

Edited by youtuber

$url = "https://autoitscript.com/test1/test2/" & @CRLF & _
        "http://aut-oit.com/test5/test6/" & @CRLF & _
        "https://www.autoit.com/test7/test8/" & @CRLF & _
        "https://google.com/test1/test2/" & @CRLF & _
        "http://qwerty.org/test5/test?/whatabout_a_page.html" & @CRLF & _
        "https://www.tungsten.com/test7/test8/metalisbetter.htm" & @CRLF & _
        "https://falsedomains.com/test1/test2/" & @CRLF & _
        "http://who%20is%20this/test5/test6/" & @CRLF & _;NO
        "https://www.autochecker.com/test7/test8/" & @CRLF & _
        "https://autobanks.com" & @CRLF & _;NO
        "http://aut_ofitbaddomain" & @CRLF & _;NO
        "https://www.autoit.biz/test7/test8/" & @CRLF

$aReg = StringRegExp($url, '(?i)https?://(?:www\.)?\K\S+\.\w{3}/?[^/\s,]+/', 3)

For $i = 0 to UBound($aReg) - 1
    ConsoleWrite($aReg[$i] & @CRLF)
Next
ConsoleWrite(@CRLF)

$bReg = StringRegExp($url, '(?i)(?:https?://)(?:www\.)?([^/]*.?[^/]*)',3)

For $i = 0 to UBound($bReg) - 1
    ConsoleWrite($bReg[$i] & @CRLF)
Next

It's easy enough to test:

;'(?i)https?://(?:www\.)?\K\S+\.\w{3}/?[^/\s,]+/'
autoitscript.com/test1/
aut-oit.com/test5/
autoit.com/test7/
google.com/test1/
qwerty.org/test5/
tungsten.com/test7/
falsedomains.com/test1/
autochecker.com/test7/
autoit.biz/test7/


;'(?i)(?:https?://)(?:www\.)?([^/]*.?[^/]*)'

autoitscript.com/test1
aut-oit.com/test5
autoit.com/test7
google.com/test1
qwerty.org/test5
tungsten.com/test7
falsedomains.com/test1
who%20is%20this/test5
autochecker.com/test7
autobanks.com
http:/
autoit.biz/test7

As you can see, the first one is still more robust, although I still had to change it a bit to match your data. URIs have a defined list of valid characters, but I don't know that I'd want to build a regex for them lol

Edited by Bilgus
