
Retrieve all links from website and save to Notepad



Hi all,

I am new to AutoIt and hope you can help me with a fairly simple problem.

Problem: I have a website with multiple links that need to be extracted.

Task: First, I want to extract all links whose HTML tag contains the substring "https://www.twst.com/interview/". Second, I want to write those links (URLs) to a Notepad (.txt) file. Please find attached two pictures of example URLs in the HTML of the website.

Example2.PNG

Result: What the Notepad file should look like (one URL per line):

https://www.twst.com/interview/actively-managing-a-blockchain-and-cryptocurrency-etf

https://www.twst.com/interview/sticking-to-the-process-to-add-yield-and-deliver-returns

.....

Any ideas how to do that? Many thanks in advance!

Example.PNG


I'm bored so here is a working example:

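#include <IE.au3>

; Script-level state shared with _WriteLinks(), declared Global to make that explicit
Global $bFileOpen = False, $hFileOpen, $sFileOpen = @ScriptDir & "\twstlinks.txt"

; Only links whose href starts with this prefix are kept
Local $sSearch = "https://www.twst.com/interview/"
Local $iSearch = StringLen($sSearch)

; Open the page in Internet Explorer and wait for it to finish loading
Local $oIE = _IECreate("https://www.twst.com/", 1)

; Collect every link on the page and write the matching ones to the file
Local $oLinks = _IELinkGetCollection($oIE)
If IsObj($oLinks) Then
    For $oLink In $oLinks
        If StringLeft($oLink.href, $iSearch) = $sSearch Then _WriteLinks($oLink.href)
    Next
EndIf
If $bFileOpen Then FileClose($hFileOpen)

; Appends one link per line, opening the output file on first use
Func _WriteLinks($_sLink)
    If Not $bFileOpen Then
        $hFileOpen = FileOpen($sFileOpen, 1) ; 1 = append mode
        $bFileOpen = True
    EndIf
    FileWrite($hFileOpen, $_sLink & @CRLF)
EndFunc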
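
If you would rather not drive Internet Explorer, a rough alternative is to download the page source and pull the links out with a regular expression. This is only a sketch and rests on assumptions that are not confirmed in this thread: that the interview links appear as plain double-quoted href="..." attributes in the HTML the server returns (not added later by JavaScript), and that the page is UTF-8 encoded. The variable names are placeholders of my own.

```autoit
#include <Array.au3>
#include <File.au3>

Local $sUrl = "https://www.twst.com/"
Local $sOutFile = @ScriptDir & "\twstlinks.txt"

; Download the raw page source; InetRead returns binary data and sets @error on failure
Local $dData = InetRead($sUrl)
If @error Then
    MsgBox(16, "Error", "Could not download " & $sUrl)
    Exit 1
EndIf
Local $sHtml = BinaryToString($dData, 4) ; 4 = UTF-8 (assumed page encoding)

; Flag 3 returns an array with every match of the capturing group.
; Assumption: the links are written as href="https://www.twst.com/interview/..."
Local $aLinks = StringRegExp($sHtml, 'href="(https://www\.twst\.com/interview/[^"]+)"', 3)
If @error Then
    MsgBox(48, "No links", "No matching links were found in the page source.")
    Exit 2
EndIf

; Drop duplicates; with default parameters element [0] of the result holds the count
$aLinks = _ArrayUnique($aLinks)

; Write one URL per line, starting at index 1 to skip the count element
_FileWriteFromArray($sOutFile, $aLinks, 1)
```

Unlike the IE version above, which appends to the file, _FileWriteFromArray overwrites twstlinks.txt when given a file path, so running the script twice does not produce duplicate entries.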

 
