StringRegExp: get all URLs' domain names

As the title says, I am trying to find a pattern that extracts the URLs from different strings.

I want to capture everything from the protocol, which is http in most cases, to the suffix of the domain name.

The problem is that URLs use many different domain suffixes, which makes it a little tricky.

In the beginning I made something like this:

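#include <Array.au3>
Local $sUrl = "http://www.google.com/(random expanded link)"
; The dot before "com" is escaped so it only matches a literal "."
Local $aArray = StringRegExp($sUrl, '(?i)http://(.*?)\.com', 2)
If Not @error Then
    _ArrayDisplay($aArray) ; [0] = full match, [1] = captured group
EndIf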

However, if the suffix is something other than .com, for example .net or .org, this pattern fails.
Then I thought of creating an array of the most popular suffixes and looping over it to get all the domain names, but that would take a lot of code that better regex skills could avoid.
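
The suffix-list idea does not strictly need a loop: the suffixes can be joined into a single alternation. The following is only a sketch under assumptions. The three-entry suffix list and the example URL are hypothetical, and any suffix missing from the list will still not match.

```autoit
#include <Array.au3>

; Hypothetical suffix list and sample URL, for illustration only.
Local $aSuffixes[3] = ["com", "net", "org"]
Local $sUrl = "http://www.example.net/some/path"

; Join the suffixes into one alternation (com|net|org).
; [^/\s]*? keeps the match inside the host part, and \b stops ".com"
; from matching the start of a longer label such as ".commerce".
Local $sPattern = '(?i)http://([^/\s]*?)\.(?:' & _ArrayToString($aSuffixes, "|") & ')\b'

Local $aArray = StringRegExp($sUrl, $sPattern, 2)
If Not @error Then _ArrayDisplay($aArray) ; [0] = full match, [1] = captured group
```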

Finally I came up with this pattern, but I am not 100% sure that it captures everything, and sometimes I get some weird results.

$pattern = "(?<Protocol>\w+):\/\/(?<Domain>[\w.]+\/?)\S*"

Does anyone have any better ideas?


Edit: Hmm, I tried this simple pattern and it seems to work pretty well.

Local $aArray = StringRegExp($sUrl, '(?i)http://(.*?)/', 2)
Either I'm very tired or it was very simple. Any opinions?
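
One caveat with the pattern above is that it needs a "/" after the host, so a bare "http://www.google.com" would not match. A possible variation, offered as a sketch rather than a definitive answer, stops at the first "/", ":", "?", "#" or whitespace instead, accepts any scheme, and uses flag 3 to collect every host in a longer string. The sample text is made up, and user info such as "user@server" is not stripped here.

```autoit
#include <Array.au3>

; Hypothetical sample text, not taken from the original post.
Local $sText = "See http://www.google.com/search?q=x and https://example.org or http://sub.domain.net:8080/a/b"

; Scheme, then "://", then capture everything up to the first
; "/", ":", "?", "#" or whitespace character.
Local $sPattern = '(?i)\b[a-z][a-z0-9+.-]*://([^/:?#\s]+)'

; Flag 3 returns the captured group of every match in the string.
Local $aHosts = StringRegExp($sText, $sPattern, 3)
If Not @error Then _ArrayDisplay($aHosts, "Hosts")
```

To keep the protocol in the result as well, the capturing parentheses could be moved so that they wrap the scheme and "://" too.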
  Edited by AutID

For more complex URLs, you can use something like this regex:

Local $aUrl[9] = ["http://server:12345/path/blabla", _ 
                  "http://server.com:1234/path?query_string#fragment_id", _
                  "ftp://user:password@server:1234/path", _
                  "ftp://user@server:1234/path", _
                  "http://www.server.com", _
                  "www.server.com/path", _
                  "server.com", _
                  "http://user@server.com:1234/path?query_string#fragment_id", _
                  "user@server.com:1234" ]
Local $sPattern = "^(?i)(?:(?:[a-z]+):\/\/)?" & _ ; Protocol
                  "(?:(?:(?:[^@:]+))" & _         ; Username
                  "(?::(?:[^@]+))?@)?" & _        ; Password
                  "([^\/:]+)" & _                 ; Host
                  "(?::(?:\d+))?" & _             ; Port
                  "(?:\/(?:[^?]+)?)?" & _         ; Path
                  "(?:\?\N+)?"                    ; Query

For $i = 0 To UBound($aUrl) - 1
    ; Flag 1 returns the captured groups; [0] is the host
    Local $aHost = StringRegExp($aUrl[$i], $sPattern, 1)
    If Not @error Then ConsoleWrite($aHost[0] & @TAB & $aUrl[$i] & @CRLF)
Next

Edited by jguinch
