Posted

Took me 5, and it's more flexible

$url = "https://autoit.com/test1/test2/test3/test4, http://autoet.com/test5/test6/, ftp://autovt.com/test7/test8/ "

$Reg = StringRegExp($url, '(?i).*?(..?tp.?://[^/]*.?[^/]*)', 3)
; (?i) caseless matching
; .*? 0 or more characters but doesn't keep them
; ( start a capturing group
; . any character 'f or h'
; .? 0 or 1 char 't or `none`'
; tp literally tp
; .? 0 or 1 char 'http>S<'
; : literal colon
; // literal slashes
;[^/]* all characters till a slash
;.? 0 or 1 character
;[^/]* all characters till a slash (again)
;) end capturing group
ConsoleWrite($Reg[0] & @CRLF & $Reg[1] & @CRLF & $Reg[2] & @CRLF)
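One thing to watch with the loose `..?tp.?` scheme: because the first `.` accepts any character, the space in front of `ftp` appears to end up inside the third match. A minimal sketch of a stricter variant follows, assuming only http, https and ftp need to be recognised; the variable names and the `(?:https?|ftp)` alternation are illustrative and not part of the pattern above.

```autoit
; Hypothetical variant: the scheme is spelled out, so no stray leading
; character can be pulled into the captured text.
Local $sUrls = "https://autoit.com/test1/test2/test3/test4, http://autoet.com/test5/test6/, ftp://autovt.com/test7/test8/ "

; Capture: scheme + host + first path segment (no trailing slash)
Local $aBase = StringRegExp($sUrls, '(?i)((?:https?|ftp)://[^/\s]+/[^/\s,]*)', 3)

If @error Then
    ConsoleWrite("No match" & @CRLF)
Else
    For $i = 0 To UBound($aBase) - 1
        ConsoleWrite($aBase[$i] & @CRLF)
    Next
EndIf
; Prints:
; https://autoit.com/test1
; http://autoet.com/test5
; ftp://autovt.com/test7
```

Checking `@error` before indexing avoids a crash when nothing matches, which the fixed `$Reg[0]`..`$Reg[2]` indexing does not guard against.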

 

Posted
2 hours ago, youtuber said:

is also not suitable for me because I will do url verification at the same time.

Are you doing an InetGet on the return to verify the URL, or are you just validating that the first 8 characters are https://? Either way, it's plenty suitable within the existing parameters.

,-. .--. ________ .-. .-. ,---. ,-. .-. .-. .-.
|(| / /\ \ |\ /| |__ __||| | | || .-' | |/ / \ \_/ )/
(_) / /__\ \ |(\ / | )| | | `-' | | `-. | | / __ \ (_)
| | | __ | (_)\/ | (_) | | .-. | | .-' | | \ |__| ) (
| | | | |)| | \ / | | | | | |)| | `--. | |) \ | |
`-' |_| (_) | |\/| | `-' /( (_)/( __.' |((_)-' /(_|
'-' '-' (__) (__) (_) (__)

Posted
2 hours ago, iamtheky said:

either way it's plenty suitable within the existing parameters

I totally agree. The code from ripdad in post #2 works the same way as my SRER (find the 4th occurrence of "/" and grab all chars to its left)
A less classy look maybe (though this could be discussed, for sure) but equal efficiency  :)
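The "4th slash" idea can be sketched with plain string functions as below. The helper name `_BaseUrl` is made up for illustration and is not the code from post #2; it also returns an empty string when the URL has fewer than four slashes, which is an added assumption about how such input should be handled.

```autoit
; Hypothetical helper: returns everything to the left of the 4th "/" in $sUrl,
; or an empty string when there are fewer than four slashes.
Func _BaseUrl($sUrl)
    Local $iPos = StringInStr($sUrl, "/", 0, 4) ; casesense = 0, occurrence = 4
    If $iPos = 0 Then Return ""
    Return StringLeft($sUrl, $iPos - 1)
EndFunc

ConsoleWrite(_BaseUrl("https://autoit.com/test1/test2/test3/test4") & @CRLF) ; https://autoit.com/test1
ConsoleWrite(_BaseUrl("http://blank.com") & @CRLF) ; empty line: only two slashes
```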

Posted (edited)

OP said he wanted to validate the URL at the same time, so how about we start getting crazy:

$url = "https://autoit.com/test1/test2/test3/test4, http://autoet.com/test5/test6/, http://blank.com, http://blank.com/Nope/ ftp://autovt.com/test7/test8/ http://autoet.com/test9/test10/ "

$aReg = StringRegExp($url, '(?i)(?x) (?(DEFINE) (?<scheme>..?tps?://)(?<host>\w+\.\w{3}/?)(?<path>[^/\s,]+/))(?&scheme)(?&host)(?&path)(?&path)', 3)

For $i = 0 to UBound($aReg) - 1
    ConsoleWrite($aReg[$i] & @CRLF)
Next
https://autoit.com/test1/test2/
http://autoet.com/test5/test6/
 ftp://autovt.com/test7/test8/
http://autoet.com/test9/test10/

 

Edited by Bilgus
Forgot the outputs
Posted (edited)

@Bilgus Thank you for your help

@mikell and @Bilgus I would like to ask you another question: what should the pattern be to cut the http:// or https:// and the www. at the beginning?

https://autoit.com/test1/test2/
http://aut-oet.com/test5/test6/
https://www.autovt.com/test7/test8/
http://aut.oet.com/test9/test10/
(?:http[s]?:\/\/)?(?:www\.)?([^\/]+\/[^\/]*)

 

Edited by youtuber
Posted (edited)

You have to turn them into non-capturing groups

$url = "test.com/test0/ https://autoit.com/test1/test2/test3/test4, http://autoet.com/test5/test6/, http://blank.com, http://blank.com/Nope/ ftp://autovt.com/test7/test8/ http://autoet.com/test9/test10/ "

;This one is wrong
;$aReg = StringRegExp($url, '(?i)(?(DEFINE)(?<host>\w+\.\w{3}/?)(?<path>[^/\s,]+/))(?!..?tps?://)(?&host)(?&path)', 3)
$aReg = StringRegExp($url, '(?i)(?x) (?(DEFINE) (?<scheme>..?tps?://)(?<host>\w+\.\w{3}/?)(?<path>[^/\s,]+/))(?&scheme)\K(?&host)(?&path)', 3)

For $i = 0 to UBound($aReg) - 1
    ConsoleWrite($aReg[$i] & @CRLF)
Next
ConsoleWrite(@CRLF)

$bReg = StringRegExp($url, '(?i).*?..?tps?://([^/]*.?[^/]*)', 3)

For $i = 0 to UBound($bReg) - 1
    ConsoleWrite($bReg[$i] & @CRLF)
Next
autoit.com/test1/
autoet.com/test5/
blank.com/Nope/
autovt.com/test7/
autoet.com/test9/

autoit.com/test1
autoet.com/test5
blank.com, http:/
autovt.com/test7
autoet.com/test9

 

Edited by Bilgus
Fixed
Posted

So for the top one, first off, you can see the validation is a bit more robust.

I added \K to restart the match after matching the scheme,

since as far as I know you can't use a named group as a capturing group,

but we still want the scheme to be there for a valid match, so we match it and then reset the start of the match to the position just after it.
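A stripped-down illustration of what \K does, separate from the DEFINE version: the scheme must be present for the match to succeed, but everything consumed before \K is left out of the returned text. The sample string and variable names below are invented for the example.

```autoit
; Minimal \K sketch with assumed sample data (not the thread's URLs)
Local $sText = "http://example.com/a/ ftp://example.org/b/"

; The scheme is required, then \K drops it from the reported match,
; so only the host is returned. No capturing group is needed.
Local $aHosts = StringRegExp($sText, '(?i)(?:https?|ftp)://\K[^/\s]+', 3)

If @error Then
    ConsoleWrite("No match" & @CRLF)
Else
    For $i = 0 To UBound($aHosts) - 1
        ConsoleWrite($aHosts[$i] & @CRLF)
    Next
EndIf
; Prints:
; example.com
; example.org
```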

 

Posted

Really for what you want it could be shortened to 

$aReg = StringRegExp($url, '(?i)..?tps?://\K\w+\.\w{3}/?[^/\s,]+/', 3)

or 

$aReg = StringRegExp($url, '(?i)(?x) (?(DEFINE)(?<path>[^/\s,]+/))..?tps?://\K\w+\.\w{3}/?(?&path)(?&path)', 3)

the second being just in case you decide you want more than a single portion of the path

Posted

or you could just do this

;~ $str = "https://autoit.com/test1/test2/"
;~ $str = "http://aut-oet.com/test5/test6/"
$str = "https://www.autovt.com/test7/test8/"
;~ $str =  "http://aut.oet.com/test9/test10/"


$sMid = StringMid(StringLeft($str, StringInStr($str, "/", 0, 4) - 1), StringInStr($str, "aut"))

MsgBox(0, '', $sMid)

 


Posted (edited)

@Bilgus unfortunately, your regex patterns fail on some of the domains!

$url = "https://autoitscript.com/test1/test2/" & @CRLF & _
        "http://aut-oit.com/test5/test6/" & @CRLF & _
        "https://www.autoit.com/test7/test8/" & @CRLF & _
        "http://blog.autoitscript.com/test9/test10/"

$aReg1 = StringRegExp($url, '(?i)..?tps?://\K\w+\.\w{3}/?[^/\s,]+/', 3)
$aReg2 = StringRegExp($url, '(?i).*?..?tps?://([^/]*.?[^/]*)', 3)
$aReg3 = StringRegExp($url, '(?i)(?(DEFINE)(?<host>\w+\.\w{3}/?)(?<path>[^/\s,]+/))(?!..?tps?://)(?&host)(?&path)', 3)

ConsoleWrite("----" & "$aReg1 " & "----" & @CRLF)
For $i = 0 To UBound($aReg1) - 1
    ConsoleWrite($aReg1[$i] & @CRLF)
Next
ConsoleWrite(@CRLF)

ConsoleWrite("----" & "$aReg2 " & "----" & @CRLF)
For $i = 0 To UBound($aReg2) - 1
    ConsoleWrite($aReg2[$i] & @CRLF)
Next
ConsoleWrite(@CRLF)

ConsoleWrite("----" & "$aReg3 " & "----" & @CRLF)
For $i = 0 To UBound($aReg3) - 1

    ConsoleWrite($aReg3[$i] & @CRLF)
Next
ConsoleWrite(@CRLF)

console

----$aReg1 ----
autoitscript.com/test1/
www.autoit.com/
blog.autoitscript.com/

----$aReg2 ----
autoitscript.com/test1
aut-oit.com/test5
www.autoit.com/test7 ;----> here I do not want the www. part
blog.autoitscript.com/test9

----$aReg3 ----
autoitscript.com/test1/
oit.com/test5/
www.autoit.com/
blog.autoitscript.com/

Looks better :)

$aReg4 = StringRegExp($url, '(?i).*?..?tps?://?(?:www\.)?([^/]*.?[^/]*)', 3)

 

Edited by youtuber
Posted (edited)
$url = "https://autoitscript.com/test1/test2/" & @CRLF & _
        "http://aut-oit.com/test5/test6/" & @CRLF & _
        "https://www.autoit.com/test7/test8/" & @CRLF & _
        "https://google.com/test1/test2/" & @CRLF & _
        "http://qwerty.org/test5/test?/whatabout_a_page.html" & @CRLF & _
        "https://www.tungsten.com/test7/test8/metalisbetter.htm" & @CRLF & _
        "https://falsedomains.com/test1/test2/" & @CRLF & _
        "http://who%20is%20this/test5/test6/" & @CRLF & _;NO
        "https://www.autochecker.com/test7/test8/" & @CRLF & _
        "https://autobanks.com" & @CRLF & _;NO
        "http://aut_ofitbaddomain" & @CRLF & _;NO
        "https://www.autoit.biz/test7/test8/" & @CRLF

$aReg = StringRegExp($url, '(?i)https?://(?:www\.)?\K\S+\.\w{3}/?[^/\s,]+/', 3)

For $i = 0 to UBound($aReg) - 1
    ConsoleWrite($aReg[$i] & @CRLF)
Next
ConsoleWrite(@CRLF)

$bReg = StringRegExp($url, '(?i)(?:https?://)(?:www\.)?([^/]*.?[^/]*)',3)

For $i = 0 to UBound($bReg) - 1
    ConsoleWrite($bReg[$i] & @CRLF)
Next

It's easy enough to test

;'(?i)https?://(?:www\.)?\K\S+\.\w{3}/?[^/\s,]+/'
autoitscript.com/test1/
aut-oit.com/test5/
autoit.com/test7/
google.com/test1/
qwerty.org/test5/
tungsten.com/test7/
falsedomains.com/test1/
autochecker.com/test7/
autoit.biz/test7/


;'(?i)(?:https?://)(?:www\.)?([^/]*.?[^/]*)'

autoitscript.com/test1
aut-oit.com/test5
autoit.com/test7
google.com/test1
qwerty.org/test5
tungsten.com/test7
falsedomains.com/test1
who%20is%20this/test5
autochecker.com/test7
autobanks.com
http:/
autoit.biz/test7

As you can see, the first one is still more robust, although I still had to change it a bit to match your data. URIs have a list of valid characters, but I don't know that I'd want to build a regex for them lol
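Short of a full URI grammar, the host part could be tightened a little by allowing only letters, digits and inner hyphens in each dot-separated label. This is a rough sketch under that assumption and nowhere near a complete validator; the label rule, the `{2,}` top-level-domain length and the variable names are all illustrative choices, not something taken from the patterns above.

```autoit
; Hypothetical stricter host check: labels of [a-z0-9-] that neither start
; nor end with a hyphen, followed by an alphabetic TLD of 2+ letters.
Local $sList = "http://aut-oit.com/test5/test6/" & @CRLF & _
        "http://blog.autoitscript.com/test9/test10/" & @CRLF & _
        "http://who%20is%20this/test5/test6/" & @CRLF & _ ; meant to be rejected (% in host)
        "https://autobanks.com" & @CRLF & _ ; meant to be rejected (no path)
        "https://www.autoit.biz/test7/test8/" & @CRLF

Local $sHost = '(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}'
Local $aOk = StringRegExp($sList, '(?i)https?://(?:www\.)?\K' & $sHost & '/[^/\s,]+/', 3)

If @error Then
    ConsoleWrite("No match" & @CRLF)
Else
    For $i = 0 To UBound($aOk) - 1
        ConsoleWrite($aOk[$i] & @CRLF)
    Next
EndIf
```

Unlike `\S+\.\w{3}`, this does not limit the top-level domain to exactly three characters and does not let `%` or `_` into the host, at the cost of a longer pattern.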

Edited by Bilgus
