replace url and split?


youtuber

Hi.

This should work (only tested at https://regex101.com/r/e0X9Si/2):

#include <Array.au3>
$aUrl = "https://www.autoitscript.com.tr/"
$pattern = '(?:http[s]?:\/\/)?(?:www\.)?([\w+\.]+)'
$aArray = StringRegExp($aUrl, $pattern, 3)
_ArrayDisplay($aArray)

If your URL is more complex (special characters and the like), you will have to tune the part inside the last pair of parentheses, i.e. the capturing group.

Conrad 

Edited by Simpel
Link and added hint for complex URL
SciTE4AutoIt = 3.7.3.0   AutoIt = 3.3.14.2   AutoItX64 = 0   OS = Win_10   Build = 19044   OSArch = X64   Language = 0407/german
H:\...\AutoIt3\SciTE     H:\...\AutoIt3      H:\...\AutoIt3\Include     (H:\ = Network Drive)

   88x31.png  Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind.


Update (https://regex101.com/r/e0X9Si/5):

$pattern = "(?m)(?:https?:\/\/)?(?:www\.)?((?:[a-zA-Z\x{00a1}-\x{ffff}0-9.\-])+(?:\.[a-zA-Z]{2,63}))"

- more characters are allowed in the subdomain part

- fewer characters are allowed in the top-level domain

- the top-level domain after the last dot must be 2-63 characters long

Conrad

Edited by Simpel
Link to regex101.com

I do not want the right-hand part of the URL (everything after the domain). How do I do that?

Local $Count = 1
$aFindofrowsReadUrl = "https://www.autoitscript.com"
$PatternURL = '(?:http[s]?:\/\/)?(?:www\.)?([\w+\.]+)'
$aUrlArrayFind = StringRegExp($aFindofrowsReadUrl, $PatternURL, 1)

$aUrl = "https://www.autoitscript.com/testurlsimilar" & @CRLF & _
"https://www.autoitscript.com/autoitscript2.com/autoitscriptasdf.com" & @CRLF & _
"https://www.autoitscript3.com/autoitscript.com" & @CRLF & _
"https://www.autoitscript4.com/autoitscript5.com/autoitscript.com"

$pattern = '(?:http[s]?:\/\/)?(?:www\.)?([\w+\.]+)'
$aArray = StringRegExp($aUrl, $pattern, 3)

For $i = 0 To UBound($aArray) - 1
    If StringInStr($aArray[$i], $aUrlArrayFind[0]) Then
        ConsoleWrite($Count & "." & $aArray[$i] & @CRLF)
        $Count = $Count + 1
    EndIf
Next

 


This will match for your first and second example:

(?:https?:\/\/)?(?:www\.)?([\w+\.]+\.[\w]{2,63})

As far as I know, your examples 3-5 are not valid URLs.

Conrad

Edited by Simpel
typo

I meant the example you marked as #1 and the one above it in the variable $aFindofrowsReadUrl. All the other URLs are invalid, but the pattern still finds valid parts inside the last three invalid URLs. Avoiding that is a bit more complicated.

Here are some possible patterns for matching a URL: https://mathiasbynens.be/demo/url-regex

Once you have decided which one to use, you still have to work out how to strip 'https://www.'
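
One possible way to do the stripping step, shown only as a minimal sketch: it uses StringRegExpReplace to remove an optional scheme and an optional "www." from the start of the string. The sample URL is taken from this thread; the variable names and the pattern are my own assumptions, not part of any of the linked patterns.

```autoit
; Sample URL from this thread, used only for illustration.
Local $sUrl = "https://www.autoitscript.com/testurlsimilar"

; Remove an optional "http://" or "https://" and an optional "www."
; from the start of the string (case-insensitive).
Local $sStripped = StringRegExpReplace($sUrl, "(?i)^(?:https?://)?(?:www\.)?", "")

ConsoleWrite($sStripped & @CRLF)
```

With this input the console should show the URL without its "https://www." prefix; the path after the domain is left untouched.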


It's not very clear. Is this what you want? (the first URL part on the left)

#Include <Array.au3>

$aUrl = "https://www.autoitscript.com/testurlsimilar" & @CRLF & _
"https://www.autoitscript.com/autoitscript2.com/autoitscriptasdf.com" & @CRLF & _
"https://www.autoitscript3.com/autoitscript.com" & @CRLF & _
"https://www.autoitscript4.com/autoitscript5.com/autoitscript.com"

$aArray = StringRegExp($aUrl, 'https?://[^/]+', 3)
_ArrayDisplay($aArray)

 


No, I only need the left part --> autoitscript.com

; I'm not sure, but something like this code

#Include <Array.au3>
Global $Count = 1
Global $aUrl ,$aUrlArrayFind
$aFindofrowsReadUrl = "https://www.autoitscript.com"
$PatternURL = '(?:http[s]?:\/\/)?(?:www\.)?([\w+\.]+)'
$aUrlArrayFind = StringRegExp($aFindofrowsReadUrl, 'https?://[^/]+', 3)

$aUrl = "https://www.autoitscript.com/testurlsimilar" & @CRLF & _
"https://www.autoitscript.com/autoitscript2.com/autoitscriptasdf.com" & @CRLF & _
"https://www.autoitscript3.com/autoitscript.com" & @CRLF & _
"https://www.autoitscript4.com/autoitscript5.com/autoitscript.com"

$aArray = StringRegExp($aUrl, 'https?://[^/]+', 3)


$aUniques = _ArrayUnique($aUrl, 0, 0, 1)
For $i = 0 To UBound($aUniques) - 1
    $aPos = _ArrayFindAll($aUrl, $aUniques[$i], 0, 0, 1)
    If UBound($aPos) = $aUrlArrayFind Then
        ConsoleWrite($Count & "." & $aUniques[$i] & @CRLF)
        $Count = $Count + 1
    EndIf
Next

 

Edited by youtuber

So you just want the ones that could be actual autoitscript.com URLs?

#Include <Array.au3>


$aUrl = "https://www.autoitscript.com/testurlsimilar" & @CRLF & _
"https://www.autoitscript.com/autoitscript2.com/autoitscriptasdf.com" & @CRLF & _
"https://www.autoitscript3.com/autoitscript.com" & @CRLF & _
"https://www.autoitscript4.com/autoitscript5.com/autoitscript.com"

$aArray = StringRegExp($aUrl , "https://www\.(autoitscript\.com/.*)" , 3)


_ArrayDisplay($aArray)

I left the remainder of the URL in the result because, unless you are just counting them, you wouldn't be able to tell which line each match came from.

Edited by iamtheky


On 11.09.2017 at 11:02 PM, mikell said:

It's not very clear. Is this what you want? (the first URL part on the left)


@mikell If so, what would the pattern be to get the result without http://, https:// and www?

$aUrl = "https://www.autoitscript.com/testurlsimilar" & @CRLF & _
"https://www.autoitscript.com/autoitscript2.com/autoitscriptasdf.com" & @CRLF & _
"https://www.autoitscript3.com/autoitscript.com" & @CRLF & _
"https://www.autoitscript4.com/autoitscript5.com/autoitscript.com" & @CRLF & _
"www.autoitscript3.com/asdf" & @CRLF & _
"http://www.autoitscript3.com/autoitscript.com" & @CRLF & _
"autoitscript3.com/asd.com" & @CRLF & _
"autoitscript.com"

 


Hi.

Hope this will fit:

(?i)(?m)^(?:https?:\/\/)?(?:www\.)?(\w[^\/]+)

Conrad 
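
A minimal sketch of how a pattern like the one above might be applied to a multi-line list in one call. The sample lines come from this thread. One change is my own assumption: the character class also excludes \r and \n, so that a line without a path cannot run on into the following line. The variable names are hypothetical.

```autoit
#include <Array.au3>

; Sample list taken from the posts above, one URL per line.
Local $sUrls = "https://www.autoitscript.com/testurlsimilar" & @CRLF & _
        "www.autoitscript3.com/asdf" & @CRLF & _
        "autoitscript.com" & @CRLF & _
        "http://www.autoitscript3.com/autoitscript.com"

; Same idea as the pattern above, but the class also excludes \r and \n
; (my addition) so each match stays on its own line.
Local $sPattern = "(?im)^(?:https?://)?(?:www\.)?(\w[^/\r\n]+)"

; Flag 3 = return an array of all (global) matches.
Local $aHosts = StringRegExp($sUrls, $sPattern, 3)
If @error Then
    ConsoleWrite("No match" & @CRLF)
Else
    _ArrayDisplay($aHosts)
EndIf
```

If the list is split into single lines first, as in the later posts, the extra \r\n exclusion is not needed.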


@Simpel Thank you very much 


If the site exists in the list, why does it still show the message box?

#include <Array.au3>
#include <ButtonConstants.au3>
#include <EditConstants.au3>
#include <GUIConstantsEx.au3>
#include <WindowsConstants.au3>

$Form1 = GUICreate("Form1", 338, 438)
$aRankingEdit = GUICtrlCreateEdit("", 32, 32, 225, 193)
GUICtrlSetData(-1, "https://www.autoitscript.com/testurlsimilar" & @CRLF & _
"https://www.autoitscriptasdf.com/autoitscript2.com/autoitscriptasdf.com" & @CRLF & _
"https://www.autoitscript3.com/autoitscript.com" & @CRLF & _
"https://www.autoitscript4.com/autoitscript5.com/autoitscript.com" & @CRLF & _
"www.autoitscript3.com/asdf" & @CRLF & _
"http://www.autoitscript3.com/autoitscript.com" & @CRLF & _
"autoitscript3.com/asd.com" & @CRLF & _
"autoitscriptasdf.com")
$Button1 = GUICtrlCreateButton("Button1", 96, 264, 75, 25)

GUISetState(@SW_SHOW)
While 1
    $nMsg = GUIGetMsg()
    Switch $nMsg
        Case $GUI_EVENT_CLOSE
            Exit
        Case $Button1
            $aFindofrowsReadUrl = "https://www.autoitscript.com/test213213"
            Local $aUrlArrayFind2 = StringRegExp($aFindofrowsReadUrl, '(?i)(?m)^(?:https?:\/\/)?(?:www\.)?(\w[^\/]+)',1)
            Local $sitelisturl = StringSplit(StringStripCR(GUICtrlRead($aRankingEdit)), @LF)

            For $i = 1 To UBound($sitelisturl) - 1
                $sitenamestr = StringRegExp($sitelisturl[$i], '(?i)(?m)^(?:https?:\/\/)?(?:www\.)?(\w[^\/]+)', 3)
                $sitenames = $sitenamestr[0]
                If StringInStr($sitenames, $aUrlArrayFind2[0]) Then
                    ConsoleWrite($sitenames & " This site " & $i & "." & "ranking" & @CRLF)
                EndIf
            Next
            If Not StringInStr($sitenames, $aUrlArrayFind2[0]) Then
                MsgBox(0,"", $aUrlArrayFind2[0] & " Not in rankings")
            EndIf
    EndSwitch
WEnd

 


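The check after the loop only tested the last value of $sitenames, so the message box appeared whenever the last row did not match. I added a Boolean flag to make it work: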

#include <Array.au3>
#include <ButtonConstants.au3>
#include <EditConstants.au3>
#include <GUIConstantsEx.au3>
#include <WindowsConstants.au3>

$Form1 = GUICreate("Form1", 338, 438)
$aRankingEdit = GUICtrlCreateEdit("", 32, 32, 225, 193)
GUICtrlSetData(-1, "https://www.autoitscript.com/testurlsimilar" & @CRLF & _
"https://www.autoitscriptasdf.com/autoitscript2.com/autoitscriptasdf.com" & @CRLF & _
"https://www.autoitscript3.com/autoitscript.com" & @CRLF & _
"https://www.autoitscript4.com/autoitscript5.com/autoitscript.com" & @CRLF & _
"www.autoitscript3.com/asdf" & @CRLF & _
"http://www.autoitscript3.com/autoitscript.com" & @CRLF & _
"autoitscript3.com/asd.com" & @CRLF & _
"autoitscriptasdf.com")
$Button1 = GUICtrlCreateButton("Button1", 96, 264, 75, 25)

GUISetState(@SW_SHOW)
While 1
    $nMsg = GUIGetMsg()
    Switch $nMsg
        Case $GUI_EVENT_CLOSE
            Exit
        Case $Button1
            $bFound = False
            $aFindofrowsReadUrl = "https://www.autoitscript.com/test213213"
            Local $aUrlArrayFind2 = StringRegExp($aFindofrowsReadUrl, '(?i)(?m)^(?:https?:\/\/)?(?:www\.)?(\w[^\/]+)',1)
            Local $sitelisturl = StringSplit(StringStripCR(GUICtrlRead($aRankingEdit)), @LF)

            For $i = 1 To UBound($sitelisturl) - 1
                $sitenamestr = StringRegExp($sitelisturl[$i], '(?i)(?m)^(?:https?:\/\/)?(?:www\.)?(\w[^\/]+)', 3)
                $sitenames = $sitenamestr[0]
                If StringInStr($sitenames, $aUrlArrayFind2[0]) Then
                    $bFound = True
                    ConsoleWrite($sitenames & " This site " & $i & "." & "ranking" & @CRLF)
                    ExitLoop
                EndIf
            Next
            If Not $bFound Then
                MsgBox(0,"", $aUrlArrayFind2[0] & " Not in rankings")
            EndIf
    EndSwitch
WEnd

 


Thank you, but

this time it does not report the rankings of the other matching rows :(

For example, with this list:

$aRankingEdit = GUICtrlCreateEdit("", 32, 32, 225, 193)
GUICtrlSetData(-1, "https://www.autoitscript.com/testurlsimilar" & @CRLF & _
"https://www.autoitscriptasdf.com/autoitscript2.com/autoitscriptasdf.com" & @CRLF & _
"https://www.autoitscript3.com/autoitscript.com" & @CRLF & _
"https://www.autoitscript.com/autoitscript5.com/autoitscript.com" & @CRLF & _
"www.autoitscript.com/asdf" & @CRLF & _
"http://www.autoitscript3.com/autoitscript.com" & @CRLF & _
"autoitscript3.com/asd.com" & @CRLF & _
"autoitscriptasdf.com")

 ConsoleWrite($sitenames & " This site " & $i & "." & "ranking" & @CRLF)

Console output:

--> Press Ctrl+Alt+Break to Restart or Ctrl+Break to Stop
autoitscript.com This site 1.ranking
+>04:04:49 AutoIt3.exe ended.rc:0
+>04:04:49 AutoIt3Wrapper Finished.
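
The loop above stops at the first hit because of ExitLoop. A possible variant, offered only as a sketch and not as a tested answer from the thread: drop the ExitLoop so every matching row is reported, and keep the flag for the "not found" message. The exact string comparison instead of StringInStr is my own substitution, and the variable names are hypothetical. The GUI is left out so the snippet runs on its own.

```autoit
; Sample rows from the post above; the target URL is the one used earlier in the thread.
Local $sList = "https://www.autoitscript.com/testurlsimilar" & @CRLF & _
        "https://www.autoitscriptasdf.com/autoitscript2.com/autoitscriptasdf.com" & @CRLF & _
        "https://www.autoitscript3.com/autoitscript.com" & @CRLF & _
        "https://www.autoitscript.com/autoitscript5.com/autoitscript.com" & @CRLF & _
        "www.autoitscript.com/asdf"
Local $sTarget = "https://www.autoitscript.com/test213213"
Local $sPattern = "(?i)^(?:https?://)?(?:www\.)?(\w[^/]+)"

Local $aTarget = StringRegExp($sTarget, $sPattern, 1)
Local $aLines = StringSplit(StringStripCR($sList), @LF)
Local $bFound = False

For $i = 1 To $aLines[0]
    Local $aHost = StringRegExp($aLines[$i], $sPattern, 1)
    If @error Then ContinueLoop ; skip empty or non-matching lines
    ; Exact comparison (= is case-insensitive for strings), my substitution for StringInStr.
    If $aHost[0] = $aTarget[0] Then
        $bFound = True
        ConsoleWrite($aHost[0] & " This site " & $i & ".ranking" & @CRLF)
        ; No ExitLoop here, so later matching rows are reported as well.
    EndIf
Next

If Not $bFound Then MsgBox(0, "", $aTarget[0] & " Not in rankings")
```

With the sample rows above, this should report rows 1, 4 and 5, since those are the ones whose host is autoitscript.com.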

 
