Posted (edited)
11 minutes ago, Jos said:

Try this regex line:

Local $aResult = StringRegExp($sData, "https?:\/\/(?:www\.)?(.*)\.\w*[\/]*", $STR_REGEXPARRAYGLOBALMATCH)

Jos

Yes, this is what I want. This works great, thanks :huggles:

Edit: but what about something like this, where the extension changes? :D

https://www.autoit.script.com.us

 

Edited by youtuber
Posted

Essentially a string-between operation, but I have to drop in a non-regex way :)

#include <Array.au3>
#include <Constants.au3>

example()

Func example()
    Local $aData = [ _
                    "http://autoit.script.com/blabla1/", _
                    "http://autoit-script.com/blabla2/blabla/", _
                    "http://autoitscript.com/bla-bla-bla3/", _
                    "http://autoit%20script.com", _
                    "http://autoit_script.com/", _
                    "https://www.autoit.script.com.us" _
                   ]

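    For $sUrl In $aData
        ; The host is the third "/"-delimited field: "http:", "", "host", ... (flag 2 = no count element)
        Local $sHost = StringSplit($sUrl, "/", 2)[2]
        ; Position of the last dot within the host, not within the whole URL (0 if there is none)
        Local $iDot = StringInStr($sHost, ".", 0, -1)
        ; Keep everything before the last dot, i.e. drop the final ".xxx"
        If $iDot Then $sHost = StringLeft($sHost, $iDot - 1)
        MsgBox(0, '', $sHost)
    Next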


EndFunc

 

Posted (edited)

I'm no longer sure what you want. Is it what your title says, which is the first word after the "//", or is it the host name excluding the top-level domain? The code below does both: :)

#include <Array.au3>
#include <Constants.au3>

example()

Func example()
    Local $aResult
    Local $aData = [ _
                    "http://host_no_tld/blah", _
                    "http://autoit.script.com/blabla1/", _
                    "http://www.autoit-script.com/blabla2/blabla/", _
                    "http://autoitscript.com/bla-bla-bla3/", _
                    "http://autoit%20script.com", _
                    "http://autoit%20script.scripts.com", _
                    "http://autoit_script.com/", _
                    "https://www.autoit.script.com.us" _
                   ]
    Local $sData = _ArrayToString($aData, @CRLF)

    $aResult = StringRegExp($sData, "https?://(?:www\.)?([-.\w~%]+)(?:\.[-.\w~%]+)", $STR_REGEXPARRAYGLOBALMATCH) ; host name excluding top level domain
    If IsArray($aResult) Then _ArrayDisplay($aResult, "host name excluding top level domain")

    $aResult = StringRegExp($sData, "https?://(?:www\.)?([-\w~%]+)", $STR_REGEXPARRAYGLOBALMATCH) ;Only 1st word
    If IsArray($aResult) Then _ArrayDisplay($aResult, "Only 1st word")
EndFunc

 

Edited by TheXman
Corrected first regex
Posted
9 hours ago, youtuber said:

I wonder why it doesn't match here

It's because the expression is not correct for these test strings. The trailing slash must be optional: \/\/(.+)\.\w+\/?
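To see the difference the optional slash makes, here is a minimal sketch. The "strict" pattern is a hypothetical variant that requires the trailing slash, written only for comparison, and the two test strings are taken from the examples earlier in the thread.

```autoit
#include <StringConstants.au3>

; One URL with a trailing slash and one without.
Local $aUrls[] = ["http://autoit_script.com/", "http://autoit%20script.com"]

For $sUrl In $aUrls
    ; Slash required: only matches when a "/" follows the last ".xxx"
    Local $aStrict = StringRegExp($sUrl, "\/\/(.+)\.\w+\/", $STR_REGEXPARRAYMATCH)
    ; Slash optional: "\/?" also accepts a URL that ends right after the suffix
    Local $aLoose = StringRegExp($sUrl, "\/\/(.+)\.\w+\/?", $STR_REGEXPARRAYMATCH)

    ConsoleWrite($sUrl & " -> strict: " & (IsArray($aStrict) ? $aStrict[0] : "no match") & _
            ", optional: " & (IsArray($aLoose) ? $aLoose[0] : "no match") & @CRLF)
Next
```

Note that (.+) is greedy, so a dot later in the path (for example in a file name) would extend the capture; that case is not covered by the test strings used here.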

Posted
10 hours ago, TheXman said:

I'm no longer sure what you want. Is it what your title says, which is the first word after the "//", or is it the host name excluding the top-level domain? The code below does both: :)

 

 

I changed the subject title.

The regex patterns I want are like these:

https?:\/\/(?:www\.)?(.+)\.\w+\/?
and
https?:\/\/(?:www\.)?(.*)\.\w*[\/]*


If the URL structure changes to something like

https://www.autoit.script.com.us

I would like the output to be

autoit.script

  • Developers
Posted (edited)
4 minutes ago, youtuber said:

If the URL structure changes to something like


https://www.autoit.script.com.us

I would like the output to be

autoit.script

So you are expecting miracles? :)

The previously posted regex simply strips the last ".xxx"; when a suffix itself contains a dot (such as ".com.us"), you will have to hardcode it. One option is to create an array of all the domain suffixes you need to handle and use it to strip the end of the domain name.
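A minimal sketch of that array idea follows. The suffix list is a tiny hypothetical sample, and _StripSuffix is a made-up helper name, not an existing UDF; a real list would have to be much longer.

```autoit
#include <StringConstants.au3>

; Hypothetical sample list. Longer suffixes come first so that ".com.us"
; is tried before ".us" or ".com".
Local $aSuffixes[] = [".com.us", ".co.uk", ".com", ".org", ".net", ".us"]

ConsoleWrite(_StripSuffix("https://www.autoit.script.com.us", $aSuffixes) & @CRLF)
ConsoleWrite(_StripSuffix("http://autoitscript.com/bla-bla-bla3/", $aSuffixes) & @CRLF)

; Returns the host name of $sUrl without "www." and without a listed suffix.
Func _StripSuffix($sUrl, ByRef $aList)
    ; Capture the host: everything after "//" (and an optional "www.") up to "/", ":", "?" or "#"
    Local $aHost = StringRegExp($sUrl, "(?i)^https?://(?:www\.)?([^/:?#]+)", $STR_REGEXPARRAYMATCH)
    If Not IsArray($aHost) Then Return SetError(1, 0, "")
    Local $sHost = $aHost[0]

    For $sSuffix In $aList
        ; "=" compares strings case-insensitively in AutoIt
        If StringRight($sHost, StringLen($sSuffix)) = $sSuffix Then
            Return StringTrimRight($sHost, StringLen($sSuffix))
        EndIf
    Next
    Return $sHost ; no listed suffix found: return the host unchanged
EndFunc
```

With this sample list, the first call should strip ".com.us" as a whole rather than only ".us", which is the behaviour asked for above.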

Jos

Edited by Jos

SciTE4AutoIt3 Full installer Download page   - Beta files       Read before posting     How to post scriptsource   Forum etiquette  Forum Rules 
 
Live for the present,
Dream of the future,
Learn from the past.
  :)

Posted
3 hours ago, youtuber said:

If the URL structure changes to something like


https://www.autoit.script.com.us

I would like the output to be

autoit.script

 

This changes the rules completely, both of the regex and of subdomaining in general. If you always want it to end at 'script', I would specify that in the expression. It's not a cool regex, but it seems to meet all criteria.
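One way that idea could look, as a sketch rather than the poster's actual expression, assuming every host of interest contains the literal word "script":

```autoit
#include <StringConstants.au3>

; Test strings taken from the examples earlier in the thread.
Local $aUrls[] = [ _
        "http://autoit.script.com/blabla1/", _
        "http://autoit-script.com/blabla2/blabla/", _
        "http://autoitscript.com/bla-bla-bla3/", _
        "https://www.autoit.script.com.us"]

For $sUrl In $aUrls
    ; (.*?script) lazily captures from the start of the host up to and
    ; including the first occurrence of the literal word "script".
    Local $aMatch = StringRegExp($sUrl, "https?://(?:www\.)?(.*?script)", $STR_REGEXPARRAYMATCH)
    If IsArray($aMatch) Then ConsoleWrite($sUrl & " -> " & $aMatch[0] & @CRLF)
Next
```

This only works because the word is hardcoded; a host without "script" in it produces no match at all.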

  • Developers
Posted

A complete public suffix domain list can be found here: https://publicsuffix.org/list/public_suffix_list.dat

So you can imagine the "can of worms" you are dabbling in. :) 
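To give a feel for what using that list involves, here is a rough sketch of reading its line format: one rule per line, "//" for comments, plus wildcard ("*.") and exception ("!") rules. The $sList text is a hand-written stand-in for the downloaded file, _ParseSuffixList is a hypothetical helper name, and wildcard and exception rules are simply skipped, so this is far from a complete implementation.

```autoit
#include <StringConstants.au3>

; Hand-written stand-in for the downloaded file, using the same line format.
Local $sList = "// a comment line" & @LF & _
        "com" & @LF & _
        "us" & @LF & _
        "" & @LF & _
        "*.ck" & @LF & _
        "!www.ck" & @LF

Local $aRules = _ParseSuffixList($sList)
For $sRule In $aRules
    ConsoleWrite($sRule & @CRLF)
Next

; Returns an array of the plain rules found in $sText. Comments, blank lines,
; and wildcard ("*.") or exception ("!") rules are skipped in this sketch.
Func _ParseSuffixList($sText)
    Local $aLines = StringSplit(StringStripCR($sText), @LF, $STR_NOCOUNT)
    Local $sKeep = ""
    For $sLine In $aLines
        Local $sEntry = StringStripWS($sLine, $STR_STRIPLEADING + $STR_STRIPTRAILING)
        If $sEntry = "" Then ContinueLoop ; blank line
        If StringLeft($sEntry, 2) = "//" Then ContinueLoop ; comment
        If StringLeft($sEntry, 1) = "*" Or StringLeft($sEntry, 1) = "!" Then ContinueLoop
        $sKeep &= $sEntry & @LF
    Next
    ; Drop the final @LF before splitting so no empty element is appended
    Return StringSplit(StringTrimRight($sKeep, 1), @LF, $STR_NOCOUNT)
EndFunc
```

Handling the skipped rule types correctly, and picking the longest matching rule for each host, is where most of the remaining work would be.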

Jos
