lordsocke

Count links in a txt file

23 posts in this topic

Hi guys, is there a function to count the number of links in a txt file? Or maybe to count the occurrences of "https://", which is in every link?

Thanks already :D




How large is the file?


My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2017-04-18 - Version 1.4.8.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX (NEW 2017-02-27 - Version 1.3.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2015-04-01 - Version 0.4.0.0) - Download - General Help & Support - Example Scripts
Excel - Example Scripts - Wiki
Word - Wiki
PowerPoint (2015-06-06 - Version 0.0.5.0) - Download - General Help & Support

Tutorials:
ADO - Wiki

 


#3 ·  Posted (edited)

Links don't always start with https; they can also use http or other schemes. There may be a function for this, but it depends on how the links are formatted within the txt file.

Edited by RaiNote

  • C++/AutoIt/OpenGL Easy Coder
  • I will be Kind to you and try to help you
  • till what you want isn't against the Forum
  • Rules~

 


The OP posted that all his links start with "https://"

"https://" which is in every link.



If the file isn't too long I would do it this way

Global $sFile = FileRead("Your filename goes here") ; Read the whole file into a variable
StringReplace($sFile, "https://", "https://") ; Replace the link with itself
ConsoleWrite("Number of links in the file: " & @extended) ; @extended holds the number of replacements

 



Did you insert the space intentionally?

"https: //"

 



Thanks a lot, the file is about 5 KB, so this should work for me :D


#9 ·  Posted (edited)

@water Quick question: what exactly does @extended do? Does it return the number of operations a function performed, or something else?

Edited by RaiNote


@extended is set by StringReplace and returns the number of replacements that have been done.


Ah OK, thank you very much.



It is described in the help file for StringReplace:

Return Value

Returns the new string with the number of replacements performed stored in the @extended macro.



If the file isn't too long I would do it this way

Global $sFile = FileRead("Your filename goes here") ; Read the whole file into a variable
StringReplace($sFile, "https://", "https://") ; Replace the link with itself
ConsoleWrite("Number of links in the file: " & @extended) ; @extended holds the number of replacements

 

Sorry if my question is really stupid, but how can I save the number of counted links into a variable?


$count = @extended   :)
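
For what it's worth, here is a minimal sketch of how that assignment could be wrapped up for reuse. The function name _CountOccurrences and the sample text are made up for illustration and are not from the posts above. The one thing to watch is that @extended should be copied straight after the StringReplace call, since a later function call may overwrite it.

```autoit
; Hypothetical helper (name invented for this sketch): counts how often
; $sSearch occurs in $sText by replacing it with itself.
Func _CountOccurrences($sText, $sSearch = "https://")
    StringReplace($sText, $sSearch, $sSearch) ; return value is discarded, only @extended matters
    Local $iCount = @extended ; copy immediately, before any other function call
    Return $iCount
EndFunc   ;==>_CountOccurrences

; Sample text is an assumption; in practice this would be FileRead("Your filename goes here")
Local $iLinks = _CountOccurrences("see https://a.example and https://b.example")
ConsoleWrite("Number of links: " & $iLinks & @CRLF)
```

The count then lives in $iLinks and can be used like any other variable.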


Thanks :sweating:

 


Try this (code tested and verified): it looks for links and can also check whether they are real. The check requires an internet connection; I added it to make your functions better.

#include <string.au3>
#include <Array.au3>

$occur = _FindlinkOcuurance("https://www.autoitscript.com text in between https://www.autoitscript.com just some text " & @CRLF & "https://www.google.com https://thereisnoserverlikethis.com")
_ArrayDisplay($occur)

; #FUNCTION# ====================================================================================================================
; Name ..........: _FindlinkOcuurance
; Description ...: Counts "https://" links in a string and optionally pings each host to check that it is reachable
; Syntax ........: _FindlinkOcuurance($string[, $check = True[, $timout = 4000]])
; Parameters ....: $string              - the main string to be checked
;                  $check               - [optional] True or False. Default is True. If True, each link is
;                                         checked for existence on the internet (requires a data connection)
;                  $timout              - [optional] the timeout in milliseconds when checking a link on the
;                                         internet; set a larger value for a poor network connection
; Return values .: $find                - a two-dimensional array. [0][0] is the number of links found and
;                                         [0][1] holds those links, separated by "|". [1][0] is the number of
;                                         links that answered the ping and [1][1] holds those links, also
;                                         separated by "|"
; Author ........: Surya Saradhi.B
; Modified ......: 05/09/15
; Remarks .......: Requires an internet connection if the links are to be checked. [1][0] and [1][1] are
;                  only filled in when $check is True
; ===============================================================================================================================
Func _FindlinkOcuurance($string, $check = True, $timout = 4000)
    $strs = StringSplit(StringReplace($string, @CRLF, " "), " ")
    Local $find[2][2] = [[0, ""], [0, ""]]
    For $i = 1 To $strs[0]
        If StringInStr($strs[$i], "https://") Then
            $find[0][0] += 1
            $subs = _StringBetween($strs[$i], "https://", ".com")
            If Not @error Then $find[0][1] = $find[0][1] & "|" & "https://" & $subs[0] & ".com"
            If $check Then
                $linked = _StringBetween($strs[$i], "https://", ".com")
                If Not @error Then
                    $link = $linked[0] & ".com"
                    $pin = Ping($link, $timout)
                    If Not @error Then
                        $find[1][0] += 1
                        $find[1][1] = $find[1][1] & "|" & "https://" & $link
                    EndIf
                EndIf
            EndIf
        EndIf
    Next
    $find[1][1] = StringTrimLeft($find[1][1], 1)
    $find[0][1] = StringTrimLeft($find[0][1], 1)
    Return $find
EndFunc   ;==>_FindlinkOcuurance

 


No matter whatever the challenge maybe control on the outcome its on you its always have been.

MY UDF: Transpond UDF (Sent vriables to Programs) , Utter UDF (Speech Recognition)


Just because it starts with https:// doesn't mean it ends with .com; moreover, you can't assume it can be pinged. I don't really know what a solid method would be; maybe testing the @extended from _INetGetSource for a value greater than 0?
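
As a rough sketch of the first point, the host could be pulled out with a regular expression instead of assuming ".com". The helper name _HostFromUrl and the pattern are my own assumptions, not something posted above, and the pattern ignores edge cases such as "user@host" URLs.

```autoit
; Hypothetical helper: returns the host part of an http/https URL,
; or an empty string when the text does not look like such a URL.
Func _HostFromUrl($sUrl)
    ; Capture everything after the scheme up to the first "/", ":", "?", "#" or whitespace
    Local $aMatch = StringRegExp($sUrl, "^https?://([^/:?#\s]+)", 1)
    If @error Then Return ""
    Return $aMatch[0]
EndFunc   ;==>_HostFromUrl

ConsoleWrite(_HostFromUrl("https://www.autoitscript.com/forum/") & @CRLF)
ConsoleWrite(_HostFromUrl("https://example.org") & @CRLF)
```

Whether the resulting host answers a Ping is a separate question, as noted above.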


,-. .--. ________ .-. .-. ,---. ,-. .-. .-. .-.
|(| / /\ \ |\ /| |__ __||| | | || .-' | |/ / \ \_/ )/
(_) / /__\ \ |(\ / | )| | | `-' | | `-. | | / __ \ (_)
| | | __ | (_)\/ | (_) | | .-. | | .-' | | \ |__| ) (
| | | | |)| | \ / | | | | | |)| | `--. | |) \ | |
`-' |_| (_) | |\/| | `-' /( (_)/( __.' |((_)-' /(_|
'-' '-' (__) (__) (_) (__)

$sContent = FileRead("source.html")

$timer = TimerInit()
StringReplace($sContent, "https://", "")
$count = @extended
ConsoleWrite($count & @TAB & TimerDiff($timer) & @CRLF)

$timer = TimerInit()
StringRegExpReplace($sContent, "https://", "")
$count = @extended
ConsoleWrite($count & @TAB & TimerDiff($timer) & @CRLF)

$timer = TimerInit()
$count = UBound( StringRegExp($sContent, "https://", 3) )
ConsoleWrite($count & @TAB & TimerDiff($timer) & @CRLF)

 


If you bring it in with the whitespace stripped, it's even quicker, naturally.

 

#include <Inet.au3>
 $sContent = _INetGetSource("https://autoitscript.com")


$timer = TimerInit()
StringReplace($sContent, "https://", "https://")
$count = @extended
ConsoleWrite($count & @TAB & TimerDiff($timer) & @CRLF)

$timer = TimerInit()
StringRegExpReplace($sContent, "https://", "")
$count = @extended
ConsoleWrite($count & @TAB & TimerDiff($timer) & @CRLF)

$timer = TimerInit()
$count = UBound( StringRegExp($sContent, "https://", 3) )
ConsoleWrite($count & @TAB & TimerDiff($timer) & @CRLF)

$timer = TimerInit()
$count = UBound( Stringsplit($sContent, "https://", 3)) - 1
ConsoleWrite($count & @TAB & TimerDiff($timer) & @CRLF)

;stripping ws

$sContent = stringstripws(_INetGetSource("https://autoitscript.com") , 8)

$timer = TimerInit()
StringReplace($sContent, "https://", "https://")
$count = @extended
ConsoleWrite($count & @TAB & TimerDiff($timer) & @CRLF)

$timer = TimerInit()
StringRegExpReplace($sContent, "https://", "")
$count = @extended
ConsoleWrite($count & @TAB & TimerDiff($timer) & @CRLF)

$timer = TimerInit()
$count = UBound( StringRegExp($sContent, "https://", 3) )
ConsoleWrite($count & @TAB & TimerDiff($timer) & @CRLF)

$timer = TimerInit()
$count = UBound( Stringsplit($sContent, "https://", 3)) - 1
ConsoleWrite($count & @TAB & TimerDiff($timer) & @CRLF)

 



These StringRegExpReplace examples count all the links in the HTML document, not just the links within the body tag.
The "s?" in the RE pattern means "s" can appear once or not at all.

#include <Inet.au3>

$sContent = _INetGetSource("https://autoitscript.com")

StringRegExpReplace($sContent, 'https://', "")
$count = @extended
ConsoleWrite($count & @TAB & 'https://' & @CRLF)

StringRegExpReplace($sContent, 'https?://', "")
$count = @extended
ConsoleWrite($count & @TAB & 'https?://' & @CRLF)

StringRegExpReplace($sContent, '"https?://', "")
$count = @extended
ConsoleWrite($count & @TAB & '"https?://' & @CRLF)

#cs ; Returns:
    80  https://
    89  https?://
    71  "https?://
#ce
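
If the links themselves are wanted rather than just the count, something along these lines might work. This is only a sketch: the function name _ListLinks, the sample text, and the character class used to decide where a link ends are all assumptions on my part.

```autoit
; Hypothetical helper: returns an array of all http/https links found in $sText.
; On failure (no link found) it sets @error to 1 and returns 0.
Func _ListLinks($sText)
    ; A link is taken to run until the next whitespace, double quote or angle bracket
    Local $aLinks = StringRegExp($sText, 'https?://[^\s"<>]+', 3)
    If @error Then Return SetError(1, 0, 0)
    Return $aLinks
EndFunc   ;==>_ListLinks

; Sample text is an assumption; it could just as well come from FileRead or _INetGetSource
Local $sSample = '<a href="https://www.autoitscript.com/forum/">forum</a> and http://example.org/page'
Local $aFound = _ListLinks($sSample)
If @error Then
    ConsoleWrite("No links found" & @CRLF)
Else
    For $i = 0 To UBound($aFound) - 1
        ConsoleWrite($aFound[$i] & @CRLF)
    Next
EndIf
```

UBound($aFound) then gives the same kind of count as the @extended examples above.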

 
