Links Inspector

This AutoIt script scans a text-based file (e.g., TXT, HTML, XML, MD) for URLs and checks the current HTTP status code of each one, to see whether the link is still active.

    Results can be filtered to show:
        All links  '*'
        All non-200 codes  '!'
        Specific codes e.g.  '404, 503, 301'
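
As a rough sketch of how a filter is passed in, assuming the `_LinksInspector()` function from the script below is defined in the same script (the file path is only a placeholder, not something from the original post):

```autoit
; Sketch only: assumes _LinksInspector() from _LinksInspector.au3 is defined in this script.
; "C:\docs\readme.md" is a hypothetical path used for illustration.
Local $sFile = "C:\docs\readme.md"

; Pick one filter:
;   "*"              every link, whatever its status
;   "!"              every link whose status is not 200
;   "404, 503, 301"  only links with one of these status codes
Local $aLinks = _LinksInspector($sFile, "404, 503, 301")

If @error Then
    ; 1 = file could not be read, 2 = nothing to report
    ConsoleWrite("No results, @error = " & @error & @CRLF)
Else
    ; Result columns: [0] Line(s), [1] Code, [2] Status, [3] URL
    For $i = 0 To UBound($aLinks) - 1
        ConsoleWrite($aLinks[$i][1] & "  " & $aLinks[$i][3] & "  (line " & $aLinks[$i][0] & ")" & @CRLF)
    Next
EndIf
```

Each call sends a request for every unique URL in the file, so on large files it is usually cheaper to run once with '*' and filter the returned array afterwards.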
 

HTTP response status codes


_LinksInspector.au3

; https://www.autoitscript.com/forum/topic/213221-_linksinspector/
;----------------------------------------------------------------------------------------
; Title...........: _LinksInspector.au3
; Description.....: Searches a file for URLs and checks their status codes.
; AutoIt Version..: 3.3.16.1   Author: ioa747  Script Version: 0.4
; Note............: Tested on Win10 22H2      Date:03/10/2025
;----------------------------------------------------------------------------------------
#AutoIt3Wrapper_AU3Check_Parameters=-d -w 1 -w 2 -w 3 -w 4 -w 5 -w 6 -w 7
#NoTrayIcon
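; Array.au3 provides _ArrayAdd, _ArraySearch and _ArrayToString (listed explicitly rather than relying on a transitive include)
#include <Array.au3>
#include <GuiListView.au3>
#include <GUIConstants.au3>
; MsgBoxConstants.au3 provides $MB_YESNO, $MB_ICONWARNING and $IDNO
#include <MsgBoxConstants.au3>
#include <WinAPIShellEx.au3>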

; Constants for Filtering
Global Const $LINKS_BROKEN = "0, 404, 500, 501, 502, 503, 504"
Global Const $LINKS_NEEDS_REVIEW = "301, 302, 307, 400, 401, 403" ; (Redirections, Unauthorized, Forbidden)

; Constant for WinHttp Options
Global Const $WinHttpRequestOption_EnableRedirects = 6

; Global variable
Global $g_hListView, $g_iListIndex = -1
Global $g_ObjErr = ObjEvent("AutoIt.Error", "__ObjAutoItErrorEvent")
Global $g_aLastComError[0] ;  Global variable to store the last COM error: [Description, Number, Source, ScriptLine]
Global $g_oHTTP = ObjCreate("WinHttp.WinHttpRequest.5.1")
If Not IsObj($g_oHTTP) Then
    MsgBox(16, "Error", "Failed to create WinHttp.WinHttpRequest.5.1 COM object.")
    Exit
EndIf

; #FUNCTION# ====================================================================================================================
; Name...........: _LinksInspector
; Description....: Searches a file for URLs and checks their status codes, filtering based on specified criteria.
; Syntax.........: _LinksInspector($sFilePath [, $sFilter = "*" [, $bAttribOnly = False [, $idProgress = 0]]])
; Parameters.....: $sFilePath   - The path to the file containing the text to be searched.
;                  $sFilter     - [Optional] Filtering mode:
;                                     "*": Show all results (default for full review).
;                                     "!": Show all except 200 (i.e., all errors and redirects).
;                                     "400, 404": Show only the specific comma-separated status codes.
;                  $bAttribOnly - [Optional] True = Search ONLY for URLs within HTML/XML attributes (e.g., href="..."). (Default = False)
;                  $idProgress  - [Optional] The control ID of the progress bar to update, if there is a GUI. Default 0 (no update).
; Return values..: Success      - Returns a 2D array: [LineNumbers (delimited by ';'), StatusCode, StatusText, URL].
;                  Failure      - Returns an empty 2D array and sets @error:
;                                 1 - The file could not be read.
;                                 2 - No links found in the file, or no link matched the filter.
; Author ........: ioa747
; Modified ......:
; Remarks .......: This function uses the WinHttp.WinHttpRequest.5.1 COM object for efficient and reliable network requests.
;                  Checks each unique URL only once, regardless of how many times it appears in the file.
;                  Uses the HEAD method to retrieve status codes without downloading the full page content.
;                  Automatically follows redirects (3xx codes) to find the final status (e.g., 200 or 404).
;                  Utilizes ObjEvent to silently capture and log COM errors (like timeouts or DNS failures) as Status Code 0.
; Related .......: __CheckLinkStatus, __ObjAutoItErrorEvent
; Link ..........: https://www.autoitscript.com/forum/topic/213221-_linksinspector/
;                  https://learn.microsoft.com/en-us/windows/win32/winhttp/winhttprequest
;                  https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status
; Example .......: _LinksInspector("C:\example.txt", "400, 404") ; to find and check URLs with specific status codes.
; ===============================================================================================================================
Func _LinksInspector($sFilePath, $sFilter = "*", $bAttribOnly = False, $idProgress = 0)
    Local $aResults[0][4]
    Local $aUniqueLinks[0][2] ; [URL, Line_Numbers_String (e.g., "12;24")]

    ; Define Regex Patterns based on the optional flag
    ; Pattern for ATTRIBUTE SEARCH (Higher precision for HTML/XML): Finds URLs immediately preceded by =, " or '
    Local $sPatternAttrib = '(?i)[=""''](https?:\/\/[^""''\s<>]+)'

    ; Pattern for FULL SEARCH (Includes Attributes and Plain Text): The original broad pattern
    Local $sPatternFull = '(?i)(https?:\/\/[^""''\s<>]+)'


    Local $aFileLines = FileReadToArray($sFilePath)
    If @error Then
        MsgBox(16, "Error", "Failed to read file: " & $sFilePath)
        Return SetError(1, 0, $aResults)
    EndIf

    ; Filter preprocessing: strip all whitespace, then detect the "*" and "!" modes or split the list of codes
    $sFilter = StringStripWS($sFilter, 8)
    Local $bFilterAll = ($sFilter = "*")
    Local $bFilterExclude200 = ($sFilter = "!")
    Local $aFilterCodes = 0
    If Not $bFilterAll And Not $bFilterExclude200 Then
        $aFilterCodes = StringSplit($sFilter, ",", 2)
    EndIf

    Local $iLineCount = UBound($aFileLines)

    ; Select the appropriate pattern
    Local $sPattern = $sPatternFull
    If $bAttribOnly Then
        $sPattern = $sPatternAttrib
    EndIf

    ; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    ; STAGE 1: Extract all links and record all lines where they appear (Handling Duplicates)
    ; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    For $i = 0 To $iLineCount - 1
        Local $sLine = $aFileLines[$i]
        Local $iLineNum = $i + 1

        ; Use the selected pattern to find links
        Local $aLinks = StringRegExp($sLine, $sPattern, 3)

        If Not @error And IsArray($aLinks) Then
            For $j = 0 To UBound($aLinks) - 1
                Local $sCleanURL = StringReplace($aLinks[$j], "&amp;", "&")
                $sCleanURL = StringRegExpReplace($sCleanURL, '[\)\(\"''<>,\.]$', "")
                $sCleanURL = StringStripWS($sCleanURL, 3)

                ; Find if the URL already exists in our unique list
                Local $iIndex = _ArraySearch($aUniqueLinks, $sCleanURL, 0, 0, 0, 0, 1, 0)

                If $iIndex = -1 Then
                    ; URL is new, add it to the unique list ($iStart = 0, item delimiter = "|")
                    _ArrayAdd($aUniqueLinks, $sCleanURL & "|" & $iLineNum, 0, "|")
                Else
                    ; URL already exists, append the current line number to the string
                    $aUniqueLinks[$iIndex][1] = $aUniqueLinks[$iIndex][1] & ";" & $iLineNum
                EndIf
            Next
        EndIf
    Next

    If UBound($aUniqueLinks) = 0 Then Return SetError(2, 0, $aResults)

    ; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    ; STAGE 2: Check the status of each UNIQUE link and apply filter
    ; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Local $iUniqueCount = UBound($aUniqueLinks)

    For $i = 0 To $iUniqueCount - 1

        ; *** Update GUI only if a valid Progress Bar ID is given ***
        If $idProgress <> 0 Then
            Local $iPercent = Int((($i + 1) / $iUniqueCount) * 100)
            GUICtrlSetData($idProgress, $iPercent)
            Sleep(10) ; Short pause for GUI response
        EndIf

        Local $sURL = $aUniqueLinks[$i][0]
        Local $sLineNums = $aUniqueLinks[$i][1]

        Local $aStatus = __CheckLinkStatus($sURL)
        Local $iStatusCode = $aStatus[0]

        ; Filtering Logic
        Local $bAddResult = False
        If $bFilterAll Then
            $bAddResult = True
        ElseIf $bFilterExclude200 Then
            If $iStatusCode <> 200 Then $bAddResult = True
        ElseIf IsArray($aFilterCodes) Then
            If _ArraySearch($aFilterCodes, $iStatusCode) <> -1 Then $bAddResult = True
        EndIf

        If $bAddResult Then
            _ArrayAdd($aResults, $sLineNums & "|" & $iStatusCode & "|" & $aStatus[1] & "|" & $sURL)
        EndIf

        ; for debugging purposes only
        ConsoleWrite(($bAddResult ? "+ " : "- ") & $sLineNums & " |> " & $aStatus[1] & " |> " & $sURL & @CRLF)
    Next

    If UBound($aResults) = 0 Then Return SetError(2, 0, $aResults)

    Return $aResults
EndFunc   ;==>_LinksInspector
;---------------------------------------------------------------------------------------
Func __CheckLinkStatus($sURL)
    Local $iStatusCode = 0
    Local $sStatusText = "Failed - Connection/Timeout Error"

    ; Set timeouts for the current request
    ; ResolveTimeout: 5 sec ; ConnectTimeout: 5 sec ; SendTimeout: 10 sec ; ReceiveTimeout: 10 sec
    $g_oHTTP.SetTimeouts(5000, 5000, 10000, 10000)

    ; WinHttp will follow up to 10 redirects to find the final code (e.g., 200 or 404).
    $g_oHTTP.SetOption($WinHttpRequestOption_EnableRedirects, True)

    ; Clear the global COM error log before the call
    ReDim $g_aLastComError[0]

    ; Open and Send the Request
    $g_oHTTP.Open("HEAD", $sURL, False)
    $g_oHTTP.SetRequestHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")

    ; If a COM error occurs here (e.g. DNS fail), it will fill $g_aLastComError,
    ; but the script flow will continue without a MsgBox.
    $g_oHTTP.Send()
    Local $iSendError = @error ; capture now: the UBound() call below resets @error

    ; Check the global COM error log immediately after the call
    If UBound($g_aLastComError) > 0 Then ; a COM error was logged by the handler
        $iStatusCode = 0
        $sStatusText = "Failed - COM Error: (" & StringReplace($g_aLastComError[0], @CRLF, " ") & ")"
    ElseIf $iSendError Then ; an error that the handler did not log
        $iStatusCode = 0
        $sStatusText = "Failed - AutoIt Error (" & $iSendError & ")"
    Else ; The call was successful, retrieve the HTTP status
        $iStatusCode = $g_oHTTP.Status
        $sStatusText = $g_oHTTP.StatusText
    EndIf

    ; Process Status Text for final output
    Select
        Case $iStatusCode = 0
            ; Status text is already set
        Case $iStatusCode = 200
            $sStatusText = "Alive - OK"
        Case $iStatusCode >= 300 And $iStatusCode < 400
            ; With automatic redirect following, 3xx codes will rarely appear here,
            ; unless 10 redirects are exceeded.
            $sStatusText = "Redirected (Needs Review)"
        Case $iStatusCode = 404
            $sStatusText = "Not Found"
        Case $iStatusCode >= 400 And $iStatusCode < 500
            $sStatusText = "Client Error"
        Case $iStatusCode >= 500 And $iStatusCode < 600
            $sStatusText = "Server Error"
        Case Else
            If StringStripWS($sStatusText, 3) == "" Then $sStatusText = "Unknown Status (" & $iStatusCode & ")"
    EndSelect

    Local $aResults = [$iStatusCode, $sStatusText]
    Return $aResults
EndFunc   ;==>__CheckLinkStatus
;---------------------------------------------------------------------------------------
Func __ObjAutoItErrorEvent()

    If IsObj($g_ObjErr) Then

        ; This filters out false positives with an empty description.
        If $g_ObjErr.Number <> 0 And StringStripWS($g_ObjErr.Description, 3) <> "" Then

            ; Store the error details in the global array (instead of showing MsgBox)
            ReDim $g_aLastComError[4]
            $g_aLastComError[0] = $g_ObjErr.description
            $g_aLastComError[1] = Hex($g_ObjErr.Number, 8) ; $g_ObjErr.Number
            $g_aLastComError[2] = $g_ObjErr.Source
            $g_aLastComError[3] = $g_ObjErr.ScriptLine
            ; ConsoleWrite('@@(' & $g_aLastComError[3] & ') :: COM Error Logged: Desc.: "' & StringReplace($g_aLastComError[0], @CRLF, " ") & '"' & @CRLF)
        EndIf

        ; Clear the properties of ObjEvent
        $g_ObjErr.Description = ""
        $g_ObjErr.Number = 0
    EndIf

EndFunc   ;==>__ObjAutoItErrorEvent
;---------------------------------------------------------------------------------------
Func _LinksInspectorGUI($sFilePath = "") ; Function to create the main graphical user interface

    _WinAPI_SetCurrentProcessExplicitAppUserModelID(StringTrimRight(@ScriptName, 4))
    Local $hGUI = GUICreate("Links Inspector", 700, 500)
    GUISetIcon(@SystemDir & "\shell32.dll", -136)

    GUICtrlCreateLabel("File Path:", 10, 15, 50, 20) ; ***
    Local $idInputFile = GUICtrlCreateInput($sFilePath, 60, 10, 530, 24)
    Local $idBtnBrowse = GUICtrlCreateButton("Browse", 600, 10, 90, 24)
    GUICtrlCreateLabel("Filter:", 60, 45, 30, 20) ; ***
    GUICtrlSetTip(-1, " '*' Show all results" & @CRLF & " '!' Show all except 200" & @CRLF & " '400, 404' Show only the specific status codes.")
    Local $idInputFilter = GUICtrlCreateInput("*", 90, 40, 200, 24) ; ***
    GUICtrlSetFont(-1, 12)
    Local $idCheckboxAttrib = GUICtrlCreateCheckbox("Attribute Search Only", 310, 43, 150, 20) ; ***
    Local $idBtnInspect = GUICtrlCreateButton("Start Inspection", 600, 40, 90, 24)
    Local $idBtnSaveReport = GUICtrlCreateButton("Save Report", 500, 40, 90, 24)
    GUICtrlSetState(-1, $GUI_DISABLE)
    Local $idIconInfo = GUICtrlCreateIcon("wmploc.dll", -60, 20, 44, 16, 16)

    $g_hListView = _GUICtrlListView_Create($hGUI, "", 10, 80, 680, 380)
    Local $iExListViewStyle = BitOR($LVS_EX_FULLROWSELECT, $LVS_EX_SUBITEMIMAGES, $LVS_EX_GRIDLINES, $LVS_EX_DOUBLEBUFFER, $LVS_EX_INFOTIP)
    _GUICtrlListView_SetExtendedListViewStyle($g_hListView, $iExListViewStyle)

    Local $idProgress = GUICtrlCreateProgress(10, 470, 680, 20)
    GUISetState(@SW_SHOW)

    _GUICtrlListView_RegisterSortCallBack($g_hListView, 0)
    GUIRegisterMsg($WM_NOTIFY, "WM_NOTIFY")

    ; Add columns to $g_hListView "Line(s)|Code|Status|URL"
    _GUICtrlListView_AddColumn($g_hListView, "Line(s)", 50)
    _GUICtrlListView_AddColumn($g_hListView, "Code", 40)
    _GUICtrlListView_AddColumn($g_hListView, "Status", 80)
    _GUICtrlListView_AddColumn($g_hListView, "URL", 500)

    Local $mCODES[]
    Local $aHTTP_STATUS = _HTTP_STATUS($mCODES)

    Local $nMsg, $aResults, $iLastStatusID, $sTipTitle, $sTipText, $iLastIndex = -1

    While 1
        $nMsg = GUIGetMsg()
        Switch $nMsg
            Case $GUI_EVENT_CLOSE
                Exit
            Case $idBtnBrowse
                $sFilePath = FileOpenDialog("Select File to Inspect", @ScriptDir, "All Files (*.*)", 1, GUICtrlRead($idInputFile))
                If @error Then ContinueLoop
                GUICtrlSetData($idInputFile, $sFilePath)
                _GUICtrlListView_DeleteAllItems($g_hListView) ; Clear Listview
                GUICtrlSetState($idBtnSaveReport, $GUI_DISABLE) ; disable the Save_Report button
                $aResults = 0

            Case $idBtnInspect
                ; Reset UI elements
                _GUICtrlListView_DeleteAllItems($g_hListView) ; Clear Listview
                GUICtrlSetData($idProgress, 0)
                GUICtrlSetState($idBtnSaveReport, $GUI_DISABLE) ; disable the Save_Report button
                $aResults = 0

                ; Get user input
                $sFilePath = GUICtrlRead($idInputFile)
                Local $sFilter = GUICtrlRead($idInputFilter)
                Local $bAttribOnly = GUICtrlRead($idCheckboxAttrib) = $GUI_CHECKED

                ; Input validation
                If Not FileExists($sFilePath) Then
                    MsgBox(48, "Error", "File not found: " & $sFilePath)
                    ContinueLoop
                EndIf

                GUICtrlSetState($idBtnInspect, $GUI_DISABLE) ; Temporarily disable the Inspect button during inspection
                $aResults = _LinksInspector($sFilePath, $sFilter, $bAttribOnly, $idProgress)

                ; Handle results
                If @error = 1 Then
                    ; Error 1 is already handled inside _LinksInspector (FileReadToArray error)
                ElseIf @error = 2 Then
                    MsgBox(64, "Info", "No links found matching the criteria in the file.")
                Else
                    ; Add results to Listview
                    _GUICtrlListView_SetItemCount($g_hListView, UBound($aResults))
                    _GUICtrlListView_AddArray($g_hListView, $aResults)
                    ; MsgBox(64, "Success", "Inspection complete. Found " & $iCount & " results.")
                EndIf

                Sleep(500) ; give some time to show the ProgressBar
                GUICtrlSetData($idProgress, 0) ; Update progress bar to 0%
                GUICtrlSetState($idBtnInspect, $GUI_ENABLE) ; enable Inspect button
                If UBound($aResults) > 0 Then GUICtrlSetState($idBtnSaveReport, $GUI_ENABLE) ; enable Save_Report button

            Case $idBtnSaveReport
                Local $sReportPath = FileSaveDialog("Save LinksInspector Report", @ScriptDir, _
                        "Text Files (*.txt)", 1, "LinksInspector Report.txt")
                If Not @error And $sReportPath <> "" Then
                    If FileExists($sReportPath) Then
                        If MsgBox($MB_YESNO + $MB_ICONWARNING, "File already exists", $sReportPath & @CRLF & _
                                "Do you want to replace it?") = $IDNO Then ContinueLoop
                        FileDelete($sReportPath)
                    EndIf
                    Local $sReportData = _ArrayToString($aResults)
                    $sReportData = "Line(s)|Code|Status|URL" & @CRLF & $sReportData
                    FileWrite($sReportPath, $sReportData)
                EndIf

        EndSwitch

        ; Update the ToolTip of the info icon
        If $iLastIndex <> $g_iListIndex Then
            $iLastIndex = $g_iListIndex
            ConsoleWrite("$iLastIndex=" & $iLastIndex & @CRLF)
            $iLastStatusID = Int(_GUICtrlListView_GetItemText($g_hListView, $iLastIndex, 1))
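            If $iLastIndex < 0 Then
                ; The click landed outside any row: clear the tooltip instead of reporting a COM error
                $sTipTitle = ""
                $sTipText = ""
            ElseIf $iLastStatusID = 0 Then
                $sTipTitle = "(0) COM Error"
                $sTipText = _GUICtrlListView_GetItemText($g_hListView, $iLastIndex, 2)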
            Else
                $sTipTitle = ""
                $sTipText = ""
                If MapExists($mCODES, $iLastStatusID) Then
                    $sTipTitle = "(" & $aHTTP_STATUS[$mCODES[$iLastStatusID]][0] & ") " & $aHTTP_STATUS[$mCODES[$iLastStatusID]][1]
                    $sTipText = StringFormat($aHTTP_STATUS[$mCODES[$iLastStatusID]][2])
                EndIf
            EndIf
            GUICtrlSetTip($idIconInfo, $sTipText, $sTipTitle, $TIP_INFOICON)
        EndIf
    WEnd
EndFunc   ;==>_LinksInspectorGUI
;---------------------------------------------------------------------------------------
Func WM_NOTIFY($hWnd, $iMsg, $wParam, $lParam)
    #forceref $hWnd, $iMsg, $wParam
    Local $hWndFrom, $iCode, $tNMHDR, $tInfo, $index, $subitem, $sURL
    $tNMHDR = DllStructCreate($tagNMHDR, $lParam)
    $hWndFrom = HWnd(DllStructGetData($tNMHDR, "hWndFrom"))
    $iCode = DllStructGetData($tNMHDR, "Code")

    Switch $hWndFrom
        Case $g_hListView
            Switch $iCode
                Case $LVN_COLUMNCLICK
                    $tInfo = DllStructCreate($tagNMLISTVIEW, $lParam)
                    ;$index = DllStructGetData($tInfo, "Index")
                    $subitem = DllStructGetData($tInfo, "SubItem")
                    ; Kick off the sort callback
                    _GUICtrlListView_SortItems($hWndFrom, $subitem)
                    ; No return value
                Case $NM_DBLCLK
                    $tInfo = DllStructCreate($tagNMITEMACTIVATE, $lParam)
                    $index = DllStructGetData($tInfo, "Index")
                    $subitem = DllStructGetData($tInfo, "SubItem")
                    $g_iListIndex = $index
                    $sURL = _GUICtrlListView_GetItemText($g_hListView, $index, 3)
                    If $subitem = 3 Then ShellExecute($sURL)
                    ; No return value
                Case $NM_CLICK
                    $tInfo = DllStructCreate($tagNMITEMACTIVATE, $lParam)
                    $index = DllStructGetData($tInfo, "Index")
                    ;$subitem = DllStructGetData($tInfo, "SubItem")
                    $g_iListIndex = $index
                    ConsoleWrite("$g_iListIndex=" & $g_iListIndex & @CRLF)
                    ; No return value
            EndSwitch
    EndSwitch
    Return $GUI_RUNDEFMSG
EndFunc   ;==>WM_NOTIFY
;---------------------------------------------------------------------------------------
Func _HTTP_STATUS(ByRef $mMap)
    Local $aHTTP_STATUS_CODES[63][3] = [ _
            [100, "Continue", "This interim response indicates that the client \nshould continue the request or \nignore the response if the request is already finished."], _
            [101, "Switching Protocols", "This code is sent in response to \nan Upgrade request header from the \nclient and indicates the protocol the server is switching to."], _
            [102, "Processing (Deprecated)", "This code was used in WebDAV contexts to indicate \nthat a request has been received by the server, \nbut no status was available at the time of the response."], _
            [103, "Early Hints", "This status code is primarily intended to be used with the Link header, \nletting the user agent start preloading resources while the server \nprepares a response or preconnect to an origin from which the page will need resources."], _
            [200, "OK", "The request succeeded. The result and meaning of 'success' depends on the HTTP method:\nGET: The resource has been fetched and transmitted in the message body.\nHEAD: Representation headers are included in the response without any message body.\nPUT or POST: The resource describing the result of the action is transmitted in the message body. \nTRACE: The message body contains the request as received by the server."], _
            [201, "Created", "The request succeeded, \nand a new resource was created as a result. \nThis is typically the response sent after POST requests, \nor some PUT requests."], _
            [202, "Accepted", "The request has been received but not yet acted upon. \nIt is noncommittal, since there is no way in HTTP to later send an \nasynchronous response indicating the outcome of the request. \nIt is intended for cases where another process \nor server handles the request, or for batch processing."], _
            [203, "Non-Authoritative Information", "This response code means the returned metadata \nis not exactly the same as is available from the origin server, \nbut is collected from a local or a third-party copy. \nThis is mostly used for mirrors or backups of another resource. \nExcept for that specific case, the 200 OK response is preferred to this status."], _
            [204, "No Content", "There is no content to send for this request, but the headers are useful. \nThe user agent may update its cached headers for this resource with the new ones."], _
            [205, "Reset Content", "Tells the user agent to reset the document which sent this request."], _
            [206, "Partial Content", "This response code is used in response to a range request \nwhen the client has requested a part or parts of a resource."], _
            [207, "Multi-Status (WebDAV)", "Conveys information about multiple resources, \nfor situations where multiple status codes might be appropriate."], _
            [208, "Already Reported (WebDAV)", "Used inside a <dav:propstat> response element to avoid \nrepeatedly enumerating the internal members of multiple bindings to the same collection."], _
            [226, "IM Used (HTTP Delta encoding)", "The server has fulfilled a GET request for the resource, \nand the response is a representation of the result of one or more \ninstance-manipulations applied to the current instance."], _
            [300, "Multiple Choices", "In agent-driven content negotiation, \nthe request has more than one possible response and \nthe user agent or user should choose one of them. \nThere is no standardized way for clients to automatically \nchoose one of the responses, so this is rarely used."], _
            [301, "Moved Permanently", "The URL of the requested resource has been changed permanently. \nThe new URL is given in the response."], _
            [302, "Found", "This response code means that the URI of \nrequested resource has been changed temporarily. \nFurther changes in the URI might be made in the future, \nso the same URI should be used by the client in future requests."], _
            [303, "See Other", "The server sent this response to direct the client \nto get the requested resource at another URI with a GET request."], _
            [304, "Not Modified", "This is used for caching purposes. \nIt tells the client that the response has not been modified, \nso the client can continue to use the same cached version of the response."], _
            [305, "Use Proxy (Deprecated)", "Defined in a previous version of the HTTP specification \nto indicate that a requested response must be accessed by a proxy. \nIt has been deprecated due to security concerns regarding in-band configuration of a proxy."], _
            [306, "unused", "This response code is no longer used; \nbut is reserved. It was used in a previous version of the HTTP/1.1 specification."], _
            [307, "Temporary Redirect", "The server sends this response to direct the client to get the requested resource \nat another URI with the same method that was used in the prior request. \nThis has the same semantics as the 302 Found response code, \nwith the exception that the user agent must not change the HTTP method used: \nif a POST was used in the first request, a POST must be used in the redirected request."], _
            [308, "Permanent Redirect", "This means that the resource is now permanently located at another URI, \nspecified by the Location response header. \nThis has the same semantics as the 301 Moved Permanently HTTP response code, \nwith the exception that the user agent must not change the HTTP method used: \nif a POST was used in the first request, \na POST must be used in the second request."], _
            [400, "Bad Request", "The server cannot or will not process the request due \nto something that is perceived to be a client error \n(e.g., malformed request syntax, invalid request message framing, \nor deceptive request routing)."], _
            [401, "Unauthorized", "Although the HTTP standard specifies 'unauthorized', \nsemantically this response means 'unauthenticated'. \nThat is, the client must authenticate itself to get the requested response."], _
            [402, "Payment Required", "The initial purpose of this code was for digital payment systems, \nhowever this status code is rarely used and no standard convention exists."], _
            [403, "Forbidden", "The client does not have access rights to the content; \nthat is, it is unauthorized, so the server is refusing \nto give the requested resource. \nUnlike 401 Unauthorized, \nthe client's identity is known to the server."], _
            [404, "Not Found", "The server cannot find the requested resource. \nIn the browser, this means the URL is not recognized. \nIn an API, this can also mean that the endpoint is valid but the resource itself does not exist. \nServers may also send this response instead of 403 Forbidden \nto hide the existence of a resource from an unauthorized client. \nThis response code is probably the most well known \ndue to its frequent occurrence on the web."], _
            [405, "Method Not Allowed", "The request method is known by the server \nbut is not supported by the target resource. \nFor example, an API may not allow DELETE on a resource, \nor the TRACE method entirely."], _
            [406, "Not Acceptable", "This response is sent when the web server, \nafter performing server-driven content negotiation, \ndoesn't find any content that conforms to the criteria \ngiven by the user agent."], _
            [407, "Proxy Authentication Required", "This is similar to 401 Unauthorized but \nauthentication is needed to be done by a proxy."], _
            [408, "Request Timeout", "This response is sent on an idle connection by some servers, \neven without any previous request by the client. \nIt means that the server would like to shut down this unused connection. \nThis response is used much more since some browsers use HTTP pre-connection mechanisms to speed up browsing. \nSome servers may shut down a connection without sending this message."], _
            [409, "Conflict", "This response is sent when a request conflicts with the current state of the server. \nIn WebDAV remote web authoring, \n409 responses are errors sent to the client so that a user might be \nable to resolve a conflict and resubmit the request."], _
            [410, "Gone", "This response is sent when the requested content has been \npermanently deleted from server, \nwith no forwarding address. \nClients are expected to remove their caches and links to the resource. \nThe HTTP specification intends this status code to be used for 'limited-time, \npromotional services'. \nAPIs should not feel compelled to indicate resources \nthat have been deleted with this status code."], _
            [411, "Length Required", "Server rejected the request because \nthe Content-Length header field is not defined and \nthe server requires it."], _
            [412, "Precondition Failed", "In conditional requests, \nthe client has indicated preconditions in its headers \nwhich the server does not meet."], _
            [413, "Content Too Large", "The request body is larger than the limits defined by the server. \nThe server might close the connection or \nreturn a Retry-After header field."], _
            [414, "URI Too Long", "The URI requested by the client is \nlonger than the server is willing to interpret."], _
            [415, "Unsupported Media Type", "The media format of the requested data is not supported by the server, \nso the server is rejecting the request."], _
            [416, "Range Not Satisfiable", "The ranges specified by the Range header field in the request cannot be fulfilled. \nIt's possible that the range is outside the size of the target resource's data."], _
            [417, "Expectation Failed", "This response code means the expectation indicated by \nthe Expect request header field cannot be met by the server."], _
            [418, "I'm a teapot", "The server refuses the attempt to brew coffee with a teapot."], _
            [421, "Misdirected Request", "The request was directed at a server that is not able to produce a response. \nThis can be sent by a server that is not configured \nto produce responses for the combination of scheme and \nauthority that are included in the request URI."], _
            [422, "Unprocessable Content (WebDAV)", "The request was well-formed but was unable to be followed due to semantic errors."], _
            [423, "Locked (WebDAV)", "The resource that is being accessed is locked."], _
            [424, "Failed Dependency (WebDAV)", "The request failed due to failure of a previous request."], _
            [425, "Too Early (Experimental)", "Indicates that the server is unwilling to \nrisk processing a request that might be replayed."], _
            [426, "Upgrade Required", "The server refuses to perform the request using the current protocol but \nmight be willing to do so after the client upgrades to a different protocol. \nThe server sends an Upgrade header in a 426 response to indicate the required protocol(s)."], _
            [428, "Precondition Required", "The origin server requires the request to be conditional. \nThis response is intended to prevent the 'lost update' problem, \nwhere a client GETs a resource's state, \nmodifies it and PUTs it back to the server, \nwhen meanwhile a third party has modified the state on the server, \nleading to a conflict."], _
            [429, "Too Many Requests", "The user has sent too many requests in a given amount of time (rate limiting)."], _
            [431, "Request Header Fields Too Large", "The server is unwilling to process the request because its header fields are too large. \nThe request may be resubmitted after reducing the size of the request header fields."], _
            [451, "Unavailable For Legal Reasons", "The user agent requested a resource that cannot legally be provided, \nsuch as a web page censored by a government."], _
            [500, "Internal Server Error", "The server has encountered a situation it does not know how to handle. \nThis error is generic, indicating that the server cannot find \na more appropriate 5XX status code to respond with."], _
            [501, "Not Implemented", "The request method is not supported by the server and cannot be handled. \nThe only methods that servers are required to support \n(and therefore that must not return this code) are GET and HEAD."], _
            [502, "Bad Gateway", "This error response means that the server, \nwhile working as a gateway to get a response needed to handle the request, \ngot an invalid response."], _
            [503, "Service Unavailable", "The server is not ready to handle the request. \nCommon causes are a server that is down for maintenance or that is overloaded. \nNote that together with this response, \na user-friendly page explaining the problem should be sent. \nThis response should be used for temporary conditions and the Retry-After HTTP header should, \nif possible, contain the estimated time before the recovery of the service. \nThe webmaster must also take care about the caching-related headers that are sent along with this response, \nas these temporary condition responses should usually not be cached."], _
            [504, "Gateway Timeout", "This error response is given when the server is \nacting as a gateway and cannot get a response in time."], _
            [505, "HTTP Version Not Supported", "The HTTP version used in the request is not supported by the server."], _
            [506, "Variant Also Negotiates", "The server has an internal configuration error: during content negotiation, \nthe chosen variant is configured to engage in content negotiation itself, \nwhich results in circular references when creating responses."], _
            [507, "Insufficient Storage (WebDAV)", "The method could not be performed on the resource because the server is unable \nto store the representation needed to successfully complete the request."], _
            [508, "Loop Detected (WebDAV)", "The server detected an infinite loop while processing the request."], _
            [510, "Not Extended", "The client request declares an HTTP Extension (RFC 2774) that should be used to process the request, \nbut the extension is not supported."], _
            [511, "Network Authentication Required", "Indicates that the client needs to authenticate to gain network access."] _
            ]

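    ; Build a lookup map: status code (integer key) -> row index in $aHTTP_STATUS_CODES
    Local $m[]
    Local $STATUS_CODES
    For $i = 0 To UBound($aHTTP_STATUS_CODES) - 1
        $STATUS_CODES = Int($aHTTP_STATUS_CODES[$i][0])
        $m[$STATUS_CODES] = $i
    Next

    ; Store the map in $mMap and return the full status table
    $mMap = $m
    Return $aHTTP_STATUS_CODES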
EndFunc   ;==>_HTTP_STATUS
;---------------------------------------------------------------------------------------
; ##### Example Usage demonstrating filters #####
;---------------------------------------------------------------------------------------

_Example()

Func _Example()
    Local $sTestFilePath = @ScriptDir & "\links_test.txt"

    ; With GUI
    _LinksInspectorGUI($sTestFilePath)

    ; Or call the function directly: uncomment ONE of the lines below to fill $aLinks
    Local $aLinks

;~  $aLinks = _LinksInspector($sTestFilePath, "*")                 ; Show ALL links
;~  $aLinks = _LinksInspector($sTestFilePath, "*", True)           ; Show ALL, but ONLY for URLs within HTML/XML attributes (e.g., href="...").
;~  $aLinks = _LinksInspector($sTestFilePath, "!")                 ; Show ALL results except 200
;~  $aLinks = _LinksInspector($sTestFilePath, "400, 404")          ; Show ONLY the 400 and 404
;~  $aLinks = _LinksInspector($sTestFilePath, $LINKS_BROKEN)       ; Show $LINKS_BROKEN = "0, 404, 500, 501, 502, 503, 504"
;~  $aLinks = _LinksInspector($sTestFilePath, $LINKS_NEEDS_REVIEW) ; Show $LINKS_NEEDS_REVIEW = "301, 302, 307, 400, 401, 403"

    _ArrayDisplay($aLinks, "$aLinks", "", 0, Default, "Line(s)|Code|Status|URL")

EndFunc   ;==>_Example
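
The code-to-row map that _HTTP_STATUS builds allows direct lookups instead of scanning the whole table. Below is a minimal, self-contained sketch of that idea. The three-row table and the names $aCodes, $mIndex and $iCode are made up for illustration and are not part of _LinksInspector.au3. It assumes AutoIt 3.3.16 or later, where maps and MapExists are available.

```autoit
; Hypothetical mini table: [status code, reason phrase]
Local $aCodes[3][2] = [[200, "OK"], [404, "Not Found"], [503, "Service Unavailable"]]

; Build the index map: status code (integer key) -> row index in $aCodes
Local $mIndex[]
For $i = 0 To UBound($aCodes) - 1
    $mIndex[Int($aCodes[$i][0])] = $i
Next

; Look up a code through the map; fall back gracefully for unknown codes
Local $iCode = 404
If MapExists($mIndex, $iCode) Then
    Local $iRow = $mIndex[$iCode]
    ConsoleWrite($iCode & " " & $aCodes[$iRow][1] & @CRLF)
Else
    ConsoleWrite($iCode & " (unknown status code)" & @CRLF)
EndIf
```

Checking MapExists first keeps the sketch safe for codes that are missing from the table, such as the 0 that $LINKS_BROKEN lists.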

 

Please, every comment is appreciated!
Leave your comments and experiences here!
Thank you very much  :)

 

Edited by ioa747
update to version 0.4

I know that I know nothing

Posted

@wakillon Thanks to your comments, the regex is now stricter and excludes < and > characters, and LinksInspector has become better.

The first post has been updated with version 0.3.

Thank you very much for your intervention. :)

Posted

Hi @ioa747 👋 ,

It is my pleasure to see your script and your creativity, as so often. Thanks for sharing.

On 10/3/2025 at 4:59 AM, ioa747 said:

[...] and check their current HTTP status code.  (to see if the link is active)

To be honest, I am not sure I understand your approach and goal correctly. Let's assume I have an HTML, XML or Markdown file with links in it. Usually you can request such URLs as you describe and all is fine. But what conclusion would you draw when you get a 401 or a 403 status code? For URLs protected by basic authentication or an OAuth2 authentication mechanism, you expect such status codes. Under your general assumption, this would count as a failure, or the link would be declared unreachable or something like that.

I mean, if I understand the approach correctly, I like it, but in terms of analysis it lacks context for the specific URL. I don't see any use case where I can truly rely on the analysis. That's my problem. The implementation and logic are well done, but for what?

I hope you understand what I mean 😅?

Best regards

Sven

