How to find and extract links on a website (keystroke / routine)


babyjoe

Hello,

I would be happy with some advice on my project:

Purpose: I need to write a crawler that opens a single web page and visits every link on that page.

For each visited link, I just need to copy and paste the page content into a DB file.

Then it goes back to the central page and visits the next link.

I have some mediocre programming skills in Java, but have just discovered AutoIt.

It seems to me that the simplest thing to do is to pass keystrokes to the browser, especially for the copy/paste.

However, this is my problem: how can I direct the browser to the next link? 'Tab' doesn't work as an advancing keystroke, so I do not know how to extract the links.

Should I try to find out which pixels are blue (only the hyperlinks are blue) and click those pixels?

Is there anyone who has had a similar project and has some advice?

Thanks!

Edited by babyjoe

So what you want to do is:

- Browse through X pages (or links)

Question: Are the links preset?

Or is it more like: browse through www.example.com/ plus any related pages (not preset)

i.e.

www.example.com, www.example.com/page1, www.example.com/page2, etc.

Edited by _Kurt


Browse through all links on a page (they do not link to other physical pages; they are DB generated).

I have come up with the following, but there is one caveat.

I used Firefox because it lets you use the cursor to select text. However, when you press the Back button, you return to the page but the link you clicked is no longer selected (in IE, when you go back, the last link you clicked is the active item).

If I could make FF select the last visited link, I could keep using the Down key to go to the next link.

For now I have to keep a counter that tracks how many times I pressed Down, and add 1 each time.

HotKeySet('{F9}', 'Zoek')

; Idle until the F9 hotkey is pressed.
While 1
    Sleep(100)
WEnd

; Runs on F9: move to the next link, open it, copy text, paste it into WordPad.
Func Zoek()
    Opt("WinTitleMatchMode", 2) ; match any substring of the window title
    Send("{DOWN}")
    Sleep(50)
    Send("{ENTER}")
    Sleep(50)
    Send("^F")
    Sleep(50)
    Send("^a")
    Sleep(50)
    Send("{SHIFTDOWN}")
    Sleep(50)
    Send("{END}")
    Sleep(50)
    Send("{SHIFTUP}")
    Sleep(50)
    Send("^c")
    Sleep(50)
    WinActivate("WordPad")
    Send("^v")
    Sleep(50)
    Send("{ENTER}")
    Sleep(50)
    WinActivate("Firefox")
EndFunc

Edited by babyjoe

  • Moderators

See if this is close to what you want.

#include <IE.au3>

$sURL = "http://www.google.com"
$oIE = _IECreate($sURL)
$oLinks = _IELinkGetCollection($oIE)
For $oLink In $oLinks
    $sHREF = $oLink.href
    $oIE2 = _IECreate($sHREF, 0, 0)
    $sText = _IEBodyReadText($oIE2)
    ConsoleWrite("<<<<<<<<<<>>>>>>>>>>" & @CR)
    ConsoleWrite($sText & @CR)
    ConsoleWrite(">>>>>>>>>><<<<<<<<<<" & @CR & @CR)
    _IEQuit($oIE2)
Next
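
Since the original goal was to save each page's content rather than print it, here is a possible variation on the same idea that appends the text to a file. Treat it as a rough sketch: the start URL and the output file name links_dump.txt are placeholders I made up, and it assumes every link on the index page can be opened by Internet Explorer. It also reuses one hidden IE window instead of creating a new one per link.

```autoit
#include <IE.au3>

; Assumptions: placeholder start page and output file; substitute your own.
Local $sURL = "http://www.example.com"
Local $sOutFile = @ScriptDir & "\links_dump.txt"

; Open the output file in append mode (1); FileOpen returns -1 on failure.
Local $hFile = FileOpen($sOutFile, 1)
If $hFile = -1 Then Exit 1

Local $oIE = _IECreate($sURL, 0, 0)           ; index page, hidden window
Local $oIE2 = _IECreate("about:blank", 0, 0)  ; hidden window reused for each link
Local $oLinks = _IELinkGetCollection($oIE)

For $oLink In $oLinks
    ; Load the linked page in the second window, so the index page
    ; (and its link collection) stays untouched.
    _IENavigate($oIE2, $oLink.href)
    FileWriteLine($hFile, "=== " & $oLink.href & " ===")
    FileWriteLine($hFile, _IEBodyReadText($oIE2))
Next

_IEQuit($oIE2)
_IEQuit($oIE)
FileClose($hFile)
```

Because the index page never navigates away, there is no need to press Back or keep a counter of how far down the list you are; the For loop does the bookkeeping.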