youtuber

replace url and split?

19 posts in this topic

Is there a shorter way to do this operation?

$aUrl = "https://www.autoitscript.com.tr/"

$pattern1 = '(.*?)(http://|https://|www\.)'
$pattern = '(\.com\.tr|\.com|\.net|\.org|\.info|\.biz|\.eu|\.fr|\.ch|\.kr|\.edu|\.xyz)(.*)'
$ReplaceAndSplitUrl = StringRegExpReplace($aUrl,$pattern1, '$1')
$ReplaceAndSplitUrl = StringRegExpReplace($ReplaceAndSplitUrl,$pattern, '$1')

ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : $ReplaceAndSplitUrl = ' & $ReplaceAndSplitUrl & @CRLF & '>Error code: ' & @error & @CRLF)

This is all I need as the result: autoitscript.com.tr

#2 ·  Posted (edited)

Hi.

This should work (only tested at https://regex101.com/r/e0X9Si/2):

#include <Array.au3>
$aUrl = "https://www.autoitscript.com.tr/"
$pattern = '(?:http[s]?:\/\/)?(?:www\.)?([\w+\.]+)'
$aArray = StringRegExp($aUrl, $pattern, 3)
_ArrayDisplay($aArray)

If your URL is more complex (special characters and so on), you will have to tune the character class inside the last pair of parentheses.

Conrad 

Edited by Simpel
Link and added hint for complex URL

SciTE = 3.6.2.0/full   AutoIt = 3.3.14.2   AutoItX64 = 0   OS = Win7Pro SP1   OSArch = X64   Language = 0407/german
H:\...\AutoIt3\SciTE   H:\...\AutoIt3   H:\...\AutoIt3\Include   H: = Network Drive

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind.

#3 ·  Posted (edited)

Update (https://regex101.com/r/e0X9Si/5):

$pattern = "(?m)(?:https?:\/\/)?(?:www\.)?((?:[a-zA-Z\x{00a1}-\x{ffff}0-9.\-])+(?:\.[a-zA-Z]{2,63}))"

- more characters are allowed in the subdomain part

- fewer characters are allowed in the top-level domain

- the top-level domain after the last dot must be 2-63 characters long

Conrad


I do not want the part of each URL to the right of the domain. How do I do that?

Local $Count = 1
$aFindofrowsReadUrl = "https://www.autoitscript.com"
$PatternURL = '(?:http[s]?:\/\/)?(?:www\.)?([\w+\.]+)'
$aUrlArrayFind = StringRegExp($aFindofrowsReadUrl, $PatternURL, 1)

$aUrl = "https://www.autoitscript.com/testurlsimilar" & @CRLF & _
"https://www.autoitscript.com/autoitscript2.com/autoitscriptasdf.com" & @CRLF & _
"https://www.autoitscript3.com/autoitscript.com" & @CRLF & _
"https://www.autoitscript4.com/autoitscript5.com/autoitscript.com"

$pattern = '(?:http[s]?:\/\/)?(?:www\.)?([\w+\.]+)'
$aArray = StringRegExp($aUrl, $pattern, 3)

For $i = 0 To UBound($aArray) - 1
    If StringInStr($aArray[$i], $aUrlArrayFind[0]) Then
        ConsoleWrite($Count & "." & $aArray[$i] & @CRLF)
        $Count = $Count + 1
    EndIf
Next

 


#5 ·  Posted (edited)

This will match for your first and second example:

(?:https?:\/\/)?(?:www\.)?([\w+\.]+\.[\w]{2,63})

As far as I know, your examples 3-5 are not valid URLs.

Conrad


The result is not correct


I meant the example you marked as #1 and the one above in the variable $aFindofrowsReadUrl. All the other URLs are invalid, but the pattern still finds valid parts inside the last 3 invalid URLs. Avoiding this is a bit more complicated.

Here are some of the possibilities for matching a URL: https://mathiasbynens.be/demo/url-regex

Once you have decided which one to use, you still have to work out how to strip 'https://www.'
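
A minimal sketch of that stripping step, assuming the input is a single URL on one line. The function name _HostFromUrl is made up for this illustration and is not part of any UDF; only StringRegExpReplace and ConsoleWrite are built-in.

```autoit
; Sketch only: _HostFromUrl is a hypothetical helper name.
Func _HostFromUrl($sUrl)
    ; Remove an optional "http://" or "https://" and an optional "www." at the start.
    Local $sHost = StringRegExpReplace($sUrl, '(?i)^(?:https?://)?(?:www\.)?', '')
    ; Drop everything from the first slash onward (path, query string).
    Return StringRegExpReplace($sHost, '/.*$', '')
EndFunc   ;==>_HostFromUrl

ConsoleWrite(_HostFromUrl("https://www.autoitscript.com.tr/") & @CRLF)
```

Doing it in two replacements keeps each pattern short. It does not validate the URL at all, so one of the stricter patterns from the page linked above would still be needed for that.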



The pattern is not the problem; your pattern works well.

What I need is the first URL on the left.

 


I am thinking of something like this:

If StringLeft($aArray[$i], StringLen($aUrlArrayFind[0])) = $aUrlArrayFind[0] Then


It's not very clear. Is this what you want? (the first URLs on the left)

#Include <Array.au3>

$aUrl = "https://www.autoitscript.com/testurlsimilar" & @CRLF & _
"https://www.autoitscript.com/autoitscript2.com/autoitscriptasdf.com" & @CRLF & _
"https://www.autoitscript3.com/autoitscript.com" & @CRLF & _
"https://www.autoitscript4.com/autoitscript5.com/autoitscript.com"

$aArray = StringRegExp($aUrl, 'https?://[^/]+', 3)
_ArrayDisplay($aArray)

 


#12 ·  Posted (edited)

No, I just need the left part --> autoitscript.com

;I'm not sure, but maybe something like this code

#Include <Array.au3>
Global $Count = 1
Global $aUrl ,$aUrlArrayFind
$aFindofrowsReadUrl = "https://www.autoitscript.com"
$PatternURL = '(?:http[s]?:\/\/)?(?:www\.)?([\w+\.]+)'
$aUrlArrayFind = StringRegExp($aFindofrowsReadUrl, 'https?://[^/]+', 3)

$aUrl = "https://www.autoitscript.com/testurlsimilar" & @CRLF & _
"https://www.autoitscript.com/autoitscript2.com/autoitscriptasdf.com" & @CRLF & _
"https://www.autoitscript3.com/autoitscript.com" & @CRLF & _
"https://www.autoitscript4.com/autoitscript5.com/autoitscript.com"

$aArray = StringRegExp($aUrl, 'https?://[^/]+', 3)


; List every extracted entry that equals the entry built from the URL being searched for
For $i = 0 To UBound($aArray) - 1
    If $aArray[$i] = $aUrlArrayFind[0] Then
        ConsoleWrite($Count & "." & $aArray[$i] & @CRLF)
        $Count = $Count + 1
    EndIf
Next

 

Edited by youtuber


#13 ·  Posted (edited)

So you just want the ones that could be actual autoitscript.com URLs?

#Include <Array.au3>


$aUrl = "https://www.autoitscript.com/testurlsimilar" & @CRLF & _
"https://www.autoitscript.com/autoitscript2.com/autoitscriptasdf.com" & @CRLF & _
"https://www.autoitscript3.com/autoitscript.com" & @CRLF & _
"https://www.autoitscript4.com/autoitscript5.com/autoitscript.com"

$aArray = StringRegExp($aUrl , "https://www\.(autoitscript\.com/.*)" , 3)


_ArrayDisplay($aArray)

I left the remainder of the URL in, because unless you are just counting them you wouldn't be able to tell which line each match came from.

Edited by iamtheky


#14 ·  Posted (edited)

;~ $aFindofrowsReadUrl = "https://www.autoitscript.com"

Unfortunately, $aFindofrowsReadUrl comes from user input:

$aFindofrowsReadUrl = GUICtrlRead($InputUrL)

On 11.09.2017 at 11:02 PM, mikell said:

It's not very clear. Is this what you want? (the first URLs on the left)

#Include <Array.au3>

$aUrl = "https://www.autoitscript.com/testurlsimilar" & @CRLF & _
"https://www.autoitscript.com/autoitscript2.com/autoitscriptasdf.com" & @CRLF & _
"https://www.autoitscript3.com/autoitscript.com" & @CRLF & _
"https://www.autoitscript4.com/autoitscript5.com/autoitscript.com"

$aArray = StringRegExp($aUrl, 'https?://[^/]+', 3)
_ArrayDisplay($aArray)

 

@mikell If so, what would the pattern be?

I need the result without http://, https:// and www.

$aUrl = "https://www.autoitscript.com/testurlsimilar" & @CRLF & _
"https://www.autoitscript.com/autoitscript2.com/autoitscriptasdf.com" & @CRLF & _
"https://www.autoitscript3.com/autoitscript.com" & @CRLF & _
"https://www.autoitscript4.com/autoitscript5.com/autoitscript.com" & @CRLF & _
"www.autoitscript3.com/asdf" & @CRLF & _
"http://www.autoitscript3.com/autoitscript.com" & @CRLF & _
"autoitscript3.com/asd.com" & @CRLF & _
"autoitscript.com"

 


Hi.

Hope this will fit:

(?i)(?m)^(?:https?:\/\/)?(?:www\.)?(\w[^\/]+)
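
As a rough sketch of how this pattern could be applied to a multi-line list with StringRegExp (the shortened sample list and the variable names $sUrls and $aHosts are assumptions for illustration):

```autoit
; Shortened sample list; the only line without a slash is deliberately the last one.
Local $sUrls = "https://www.autoitscript.com/testurlsimilar" & @CRLF & _
        "www.autoitscript3.com/asdf" & @CRLF & _
        "autoitscript3.com/asd.com" & @CRLF & _
        "autoitscript.com"

; (?m) lets ^ match at the start of every line; flag 3 returns all captures as an array.
Local $aHosts = StringRegExp($sUrls, '(?i)(?m)^(?:https?:\/\/)?(?:www\.)?(\w[^\/]+)', 3)
If @error Then Exit

For $i = 0 To UBound($aHosts) - 1
    ConsoleWrite($aHosts[$i] & @CRLF)
Next
```

One thing to watch: [^\/] also matches line-break characters, so a line without a slash in the middle of the list would run on into the next line. Splitting the text into lines first, or using [^\/\r\n] instead, avoids that.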

Conrad 


@Simpel Thank you very much 


If the site exists in the list, why does it still show the message box?

#include <Array.au3>
#include <ButtonConstants.au3>
#include <EditConstants.au3>
#include <GUIConstantsEx.au3>
#include <WindowsConstants.au3>

$Form1 = GUICreate("Form1", 338, 438)
$aRankingEdit = GUICtrlCreateEdit("", 32, 32, 225, 193)
GUICtrlSetData(-1, "https://www.autoitscript.com/testurlsimilar" & @CRLF & _
"https://www.autoitscriptasdf.com/autoitscript2.com/autoitscriptasdf.com" & @CRLF & _
"https://www.autoitscript3.com/autoitscript.com" & @CRLF & _
"https://www.autoitscript4.com/autoitscript5.com/autoitscript.com" & @CRLF & _
"www.autoitscript3.com/asdf" & @CRLF & _
"http://www.autoitscript3.com/autoitscript.com" & @CRLF & _
"autoitscript3.com/asd.com" & @CRLF & _
"autoitscriptasdf.com")
$Button1 = GUICtrlCreateButton("Button1", 96, 264, 75, 25)

GUISetState(@SW_SHOW)
While 1
    $nMsg = GUIGetMsg()
    Switch $nMsg
        Case $GUI_EVENT_CLOSE
            Exit
        Case $Button1
            $aFindofrowsReadUrl = "https://www.autoitscript.com/test213213"
            Local $aUrlArrayFind2 = StringRegExp($aFindofrowsReadUrl, '(?i)(?m)^(?:https?:\/\/)?(?:www\.)?(\w[^\/]+)', 1)
            Local $sitelisturl = StringSplit(StringStripCR(GUICtrlRead($aRankingEdit)), @LF)

            For $i = 1 To UBound($sitelisturl) - 1
                $sitenamestr = StringRegExp($sitelisturl[$i], '(?i)(?m)^(?:https?:\/\/)?(?:www\.)?(\w[^\/]+)', 3)
                $sitenames = $sitenamestr[0]
                If StringInStr($sitenames, $aUrlArrayFind2[0]) Then
                    ConsoleWrite($sitenames & " This site " & $i & "." & "ranking" & @CRLF)
                EndIf
            Next
            If Not StringInStr($sitenames, $aUrlArrayFind2[0]) Then
                MsgBox(0, "", $aUrlArrayFind2[0] & " Not in rankings")
            EndIf
    EndSwitch
WEnd

 


Added an additional bool to make it work:

#include <Array.au3>
#include <ButtonConstants.au3>
#include <EditConstants.au3>
#include <GUIConstantsEx.au3>
#include <WindowsConstants.au3>

$Form1 = GUICreate("Form1", 338, 438)
$aRankingEdit = GUICtrlCreateEdit("", 32, 32, 225, 193)
GUICtrlSetData(-1, "https://www.autoitscript.com/testurlsimilar" & @CRLF & _
"https://www.autoitscriptasdf.com/autoitscript2.com/autoitscriptasdf.com" & @CRLF & _
"https://www.autoitscript3.com/autoitscript.com" & @CRLF & _
"https://www.autoitscript4.com/autoitscript5.com/autoitscript.com" & @CRLF & _
"www.autoitscript3.com/asdf" & @CRLF & _
"http://www.autoitscript3.com/autoitscript.com" & @CRLF & _
"autoitscript3.com/asd.com" & @CRLF & _
"autoitscriptasdf.com")
$Button1 = GUICtrlCreateButton("Button1", 96, 264, 75, 25)

GUISetState(@SW_SHOW)
While 1
    $nMsg = GUIGetMsg()
    Switch $nMsg
        Case $GUI_EVENT_CLOSE
            Exit
        Case $Button1
            $bFound = False
            $aFindofrowsReadUrl = "https://www.autoitscript.com/test213213"
            Local $aUrlArrayFind2 = StringRegExp($aFindofrowsReadUrl, '(?i)(?m)^(?:https?:\/\/)?(?:www\.)?(\w[^\/]+)',1)
            Local $sitelisturl = StringSplit(StringStripCR(GUICtrlRead($aRankingEdit)), @LF)

            For $i = 1 To UBound($sitelisturl) - 1
                $sitenamestr = StringRegExp($sitelisturl[$i], '(?i)(?m)^(?:https?:\/\/)?(?:www\.)?(\w[^\/]+)', 3)
                $sitenames = $sitenamestr[0]
                If StringInStr($sitenames, $aUrlArrayFind2[0]) Then
                    $bFound = True
                    ConsoleWrite($sitenames & " This site " & $i & "." & "ranking" & @CRLF)
                    ExitLoop
                EndIf
            Next
            If Not $bFound Then
                MsgBox(0,"", $aUrlArrayFind2[0] & " Not in rankings")
            EndIf
    EndSwitch
WEnd

 


Thank you, but this time it does not report the ranking for the other rows :(

For example, with this list:

$aRankingEdit = GUICtrlCreateEdit("", 32, 32, 225, 193)
GUICtrlSetData(-1, "https://www.autoitscript.com/testurlsimilar" & @CRLF & _
"https://www.autoitscriptasdf.com/autoitscript2.com/autoitscriptasdf.com" & @CRLF & _
"https://www.autoitscript3.com/autoitscript.com" & @CRLF & _
"https://www.autoitscript.com/autoitscript5.com/autoitscript.com" & @CRLF & _
"www.autoitscript.com/asdf" & @CRLF & _
"http://www.autoitscript3.com/autoitscript.com" & @CRLF & _
"autoitscript3.com/asd.com" & @CRLF & _
"autoitscriptasdf.com")

 ConsoleWrite($sitenames & " This site " & $i & "." & "ranking" & @CRLF)

Console output:

--> Press Ctrl+Alt+Break to Restart or Ctrl+Break to Stop
autoitscript.com This site 1.ranking
+>04:04:49 AutoIt3.exe ended.rc:0
+>04:04:49 AutoIt3Wrapper Finished.
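
Only row 1 is reported here most likely because the loop in the previous post leaves via ExitLoop at the first hit. A minimal sketch without the GUI that keeps going and counts hits instead; $sList stands in for GUICtrlRead($aRankingEdit), and the names $sPattern, $aWanted, $aLines and $iFound are assumptions for illustration:

```autoit
; Sketch only: $sList replaces the text read from the edit control.
Local $sPattern = '(?i)^(?:https?:\/\/)?(?:www\.)?(\w[^\/]+)'
Local $sList = "https://www.autoitscript.com/testurlsimilar" & @CRLF & _
        "https://www.autoitscript3.com/autoitscript.com" & @CRLF & _
        "www.autoitscript.com/asdf" & @CRLF & _
        "autoitscriptasdf.com"

; Host of the URL being searched for.
Local $aWanted = StringRegExp("https://www.autoitscript.com/test213213", $sPattern, 1)
If @error Then Exit

Local $aLines = StringSplit(StringStripCR($sList), @LF)
Local $iFound = 0

For $i = 1 To $aLines[0]
    Local $aHost = StringRegExp($aLines[$i], $sPattern, 1)
    If @error Then ContinueLoop ; skip empty or unmatched lines
    ; Compare whole host names; StringInStr would also accept e.g. "notautoitscript.com".
    If $aHost[0] = $aWanted[0] Then
        $iFound += 1 ; no ExitLoop, so later rows are still checked
        ConsoleWrite($aHost[0] & " This site " & $i & ". ranking" & @CRLF)
    EndIf
Next

If $iFound = 0 Then ConsoleWrite($aWanted[0] & " Not in rankings" & @CRLF)
```

With this sample list, rows 1 and 3 are expected to be reported. In the GUI version the final ConsoleWrite would become the MsgBox call again.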

 
