click link without knowing its text


drrehak
hi, been working on this project for hours now. I'd like to scrape info from this site:

My link

the info I need though is on the page by clicking "Show" under Interactions. It's about halfway down the page. The problem I am having is there are multiple links named "Show". I tried cycling through them with this code:

#include <IE.au3>

$id = "DB00001"
$oIE = _IECreate ("http://www.drugbank.ca/drugs/" & $id)
$linkcount = 0

_IELoadWait ($oIE)
$sMyString = "show"
$oLinks = _IELinkGetCollection($oIE)
For $oLink in $oLinks
    $sLinkText = _IEPropertyGet($oLink, "innerText")
    If StringInStr($sLinkText, $sMyString) Then
        $linkcount = $linkcount+1
        ;MsgBox(0, "Link Info", $oLink.href)
        If $linkcount=8 Then
            _IEAction($oLink, "click")
            ExitLoop
        EndIf
    EndIf
Next

however, my larger script will be cycling through drugs, and the Interaction link is not always the 8th "Show" link.

Is there any way to assign focus to an unknown link that is next to an html tag? Thanks in advance for any help/direction here.

-Kevin

I'm not sure how to do it using the _IE functions, as I usually evaluate the source with StringRegExp (it's faster).

Because I don't know whether you actually want the information visible in the browser, or are just using the browser to get to the information you want, I made a couple of examples.

This code tries to look exactly like your example. It navigates to the main page for a drug, then to its Interactions page (if available). This makes it pretty slow.

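#include <IE.au3>
$oIE = _IECreate ("about:blank")
For $i = 1 To 9
    ; Build the drug ID: DB00001, DB00002, ...
    $id = "DB" & StringFormat("%05i",$i)
    _IENavigate($oIE,"http://www.drugbank.ca/drugs/" & $id)
    ; _IEBodyReadHTML returns the markup as IE serializes it, so the pattern below matches that form
    $sSource = _IEBodyReadHTML($oIE)
    ; Capture the href of the link in the cell right after the "Interactions" label
    $aLink = StringRegExp($sSource,'<TD>Interactions</TD>\s+<TD><A class=link-out href="(.*?)"',1)
    If @error Then
        ConsoleWrite("No interactions link available for " & $id & @CRLF)
    Else
        _IENavigate($oIE,$aLink[0])
    EndIf
Next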
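
If you would rather stay with the _IE functions and click the link itself, one possible approach is to walk the table rows and pick the link by the label in the neighbouring cell instead of by its position among the "Show" links. This is only a sketch: it assumes the page keeps the label ("Interactions") in the first cell of a row and the link in the second cell, which is what the pattern above suggests. `_ClickLinkByRowLabel` is a hypothetical helper name, not part of IE.au3.

```autoit
#include <IE.au3>

; Hypothetical helper: click the first link in the cell that follows a
; cell whose text equals $sLabel.
; Assumption: rows look like <tr><td>Label</td><td><a ...>Show</a></td></tr>.
Func _ClickLinkByRowLabel($oIE, $sLabel)
    Local $oRows = _IETagNameGetCollection($oIE, "tr")
    If @error Then Return SetError(1, 0, False)
    For $oRow In $oRows
        Local $oCells = $oRow.cells
        If $oCells.length < 2 Then ContinueLoop
        ; Compare the trimmed cell text with the label ("=" ignores case)
        If StringStripWS($oCells.item(0).innerText, 3) = $sLabel Then
            Local $oLinks = $oCells.item(1).getElementsByTagName("a")
            If $oLinks.length = 0 Then Return SetError(2, 0, False)
            _IEAction($oLinks.item(0), "click")
            _IELoadWait($oIE)
            Return True
        EndIf
    Next
    ; No row with that label was found
    Return SetError(3, 0, False)
EndFunc

Local $oIE = _IECreate("http://www.drugbank.ca/drugs/DB00001")
If Not _ClickLinkByRowLabel($oIE, "Interactions") Then
    ConsoleWrite("No Interactions link found (@error = " & @error & ")" & @CRLF)
EndIf
```

Because the lookup is keyed on the label, it should not matter whether the Interactions link is the 8th "Show" link or the 3rd on a given drug page.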

This code navigates directly between Interaction pages without loading the main pages in the browser in between (their source is fetched with InetRead instead). Slightly faster.

#include <IE.au3>
$oIE = _IECreate ("about:blank")
For $i = 1 To 9
    $id = "DB" & StringFormat("%05i",$i)
    ; Download the page source without rendering it; InetRead returns binary data
    $sSource = InetRead("http://www.drugbank.ca/drugs/" & $id)
    $sSource = BinaryToString($sSource)
    ; Raw source uses lowercase tags, so the pattern differs from the IE version
    $aLink = StringRegExp($sSource,'<td>Interactions</td>\s+<td><a href="(.*?)"',1)
    If @error Then
        ConsoleWrite("No interactions link available for " & $id & @CRLF)
    Else
        _IENavigate($oIE,$aLink[0])
    EndIf
Next

This code doesn't open IE, but instead returns an array with direct links to the Interactions pages for the different drugs.

#include <Array.au3>
; Column 0 holds the drug ID, column 1 the link; row 0 is left unused
Local $aInteractions[10][2]
For $i = 1 To UBound($aInteractions) -1
    $aInteractions[$i][0] = "DB" & StringFormat("%05i",$i)
    $sSource = InetRead("http://www.drugbank.ca/drugs/" & $aInteractions[$i][0])
    $sSource = BinaryToString($sSource)
    ; Flag 1 returns an array of captured groups; @error is set when nothing matches
    $aLink = StringRegExp($sSource,'<td>Interactions</td>\s+<td><a href="(.*?)"',1)
    If @error Then
        ConsoleWrite("No interactions link available for " & $aInteractions[$i][0] & @CRLF)
        $aInteractions[$i][1] = "Not Available"
    Else
        ConsoleWrite("Link saved for " & $aInteractions[$i][0] & @CRLF)
        $aInteractions[$i][1] = $aLink[0]
    EndIf
Next
_ArrayDisplay($aInteractions)

To increase the speed even more you could do simultaneous downloads with InetGet, but that takes a bit more time to throw together, so unless you are interested I'm not going to supply an example.
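
For anyone who does want to try it, here is a rough sketch of what simultaneous downloads with InetGet could look like. It starts all downloads in background mode, polls InetGetInfo until each one reports completion, and then runs the same pattern over the saved files. The temp file names and the count of 9 are assumptions for illustration; for a long list of drugs you would probably want to start the downloads in small batches rather than all at once.

```autoit
; Sketch only: file locations and the number of pages are assumptions.
Local $iCount = 9
Local $aHandle[$iCount + 1], $aFile[$iCount + 1]

; Start every download in the background (4th parameter = 1), forcing a
; fresh copy from the server (3rd parameter = 1).
For $i = 1 To $iCount
    Local $sId = "DB" & StringFormat("%05i", $i)
    $aFile[$i] = @TempDir & "\" & $sId & ".html"
    $aHandle[$i] = InetGet("http://www.drugbank.ca/drugs/" & $sId, $aFile[$i], 1, 1)
Next

; Poll until every download reports completion (InetGetInfo index 2).
Local $bAllDone
Do
    Sleep(100)
    $bAllDone = True
    For $i = 1 To $iCount
        If Not InetGetInfo($aHandle[$i], 2) Then $bAllDone = False
    Next
Until $bAllDone

; Read each saved page, extract the link, then clean up.
For $i = 1 To $iCount
    Local $sDrug = "DB" & StringFormat("%05i", $i)
    Local $bOk = InetGetInfo($aHandle[$i], 3) ; index 3 = download succeeded
    InetClose($aHandle[$i])
    If $bOk Then
        Local $aLink = StringRegExp(FileRead($aFile[$i]), '<td>Interactions</td>\s+<td><a href="(.*?)"', 1)
        If @error Then
            ConsoleWrite("No interactions link available for " & $sDrug & @CRLF)
        Else
            ConsoleWrite($sDrug & ": " & $aLink[0] & @CRLF)
        EndIf
    Else
        ConsoleWrite("Download failed for " & $sDrug & @CRLF)
    EndIf
    FileDelete($aFile[$i])
Next
```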

Wow! This is a really good example. I believe this will help me with a project I'm working on. One question: in the "Else" branch, what do the brackets after the variables mean? "$aInteractions[$i][1] = $aLink[0]" If you can explain that, that would be awesome.

Edited by JayFran

No, it's the other way around:

$aInteractions[$i][1] = $aLink[0] means that the value held in the $aLink array at index 0 is copied into the $aInteractions array at position [$i][1]. To give an Excel-style example: both indices start at 0, so if $i = 1 the value of $aLink[0] ends up in row index 1, second column (column "B") of the $aInteractions table.
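
A tiny standalone example may make the bracket notation easier to see. The link value here is made up for illustration; in the real script it comes from StringRegExp.

```autoit
#include <Array.au3>

; A 1D array with one element, like the one StringRegExp returns with flag 1.
Local $aLink[1] = ["/example/interactions"] ; hypothetical value

; A 2D array: 3 rows, 2 columns. Both indices start at 0.
Local $aInteractions[3][2]

Local $i = 1
$aInteractions[$i][0] = "DB00001"   ; row 1, column 0: the drug ID
$aInteractions[$i][1] = $aLink[0]   ; row 1, column 1: copy of element 0 of $aLink

ConsoleWrite($aInteractions[1][0] & " -> " & $aInteractions[1][1] & @CRLF)
_ArrayDisplay($aInteractions) ; rows 0 and 2 stay empty
```

The first bracket after $aInteractions selects the row and the second selects the column, while $aLink needs only one bracket because it has a single dimension.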


you guys rock. thanks so much Jay! haven't had the time to try the examples, but I'm sure this will get me started. I think I will try whatever you say is fastest. I don't need to have the info visible in my browser; that's just the only way I know how to scrape. So I think I will experiment with InetGet. I plan on running a script that will step through about 1500 drugs and add this info to my database.

Ahh! I see how that works! Right on! Thank you for the understanding on that.