the123punch

Challenging question about IE link clicking

9 posts in this topic

#1 ·  Posted (edited)

Hi guys,

I need to do the following:

Go on a database page at the following link: http://mint.bio.uniroma2.it/mint/search/se...eryType=protein

I have to search for some proteins and extract information.

I know how to do almost everything but I am having a basic problem with some entries.

For those who are willing to help, when you go on that page, type in the following gene name: APLP2

Then click on the search. You get 2 results. What I need to do is to click on the one that has Homo Sapiens in it.

I know how to get there: I need to use the IE function _IELinkClickByText or _IELinkClickByIndex.

The problem is that both links have the same text, so I can't be sure of getting the Homo sapiens link.

I tried doing it by index, but I can't tell which index belongs to which link because the order is random. In other words, Homo sapiens is not always the first link, or the second, or the third; it could be any of them. So the only way to identify my link before clicking it is to look at the HTML text next to it and check whether it says Homo sapiens. I don't know how to do that. How can I refer to a link by the text that comes after it?

I am giving the example for only one entry here, but I have many of them, and some of them return more than one result, so I need a solution that works for all of them.

If anyone can help me, I'll be more than happy!

Thanks.

Edited by the123punch

I need to do some similar web-crawling on another site for work, so this interests me. The relevant HTML on that page is:

<b><a href="/mint/search/search.do?interactorAc=MINT-1177305&selectedStatus=clean">aplp2</a></b> 
        <b><font color="green">Mus musculus (10090)</font></b>

        <br><b>uniprotkb ac:</b>
        
        <a href="http://www.ebi.uniprot.org/entry/Q06335" target="_blank">Q06335</a>, 
        <!-- <a href="/mint/search/viewer.do?interactorAc=MINT-1177305" target="left">Viewer</a> -->
        <br><b>Amyloid-like protein 2 precursor</b> Aplp2  
        <br><b>domains:</b>
        
            
                <table>
                    <tr>

                        <td bgcolor="#FEFFE3">A4_APP</td><td bgcolor="#FEFFE3"><a href="http://www.ebi.ac.uk/interpro/DisplayIproEntry?ac=IPR008155" target="_blank">IPR008155</a></td><td bgcolor="#FEFFE3">A4_extra</td><td bgcolor="#FEFFE3"><a href="http://www.ebi.ac.uk/interpro/DisplayIproEntry?ac=IPR008154" target="_blank">IPR008154</a></td>
                    </tr>
                </table>
            
        <table width="100%" bgcolor="#FEFFE3"  border="0" cellspacing="0" cellpadding="0">
        <tr  bgcolor="#669999" >
            <td><img src="/mint/images/mint_blank.gif" /></td>
        </tr>

        </table>
    
        <b><a href="/mint/search/search.do?interactorAc=MINT-1186713&selectedStatus=clean">aplp2</a></b> 
        <b><font color="green">Homo sapiens (9606)</font></b>
        <br><b>uniprotkb ac:</b>

Looks like you could just pull all the <b> tags, find the one with "Homo sapiens" in it, and use the link from the <b> tag just before it. I'll take a crack at it when I have some time...

:)



Yup, yup, that sounds like a good solution. It might be easier to parse the HTML to get the href for the link instead of trying to reference it in the hierarchy, though (then you could use _IENavigate). Just a suggestion.
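
A minimal sketch of the parse-the-HTML idea, under stated assumptions: $sHtml is a hard-coded, trimmed copy of the markup quoted earlier in the thread, and the regular expression is only an illustration that depends on that exact tag layout. On the live page the source would come from something like _IEBodyReadHTML, and IE may re-serialize the markup, so the pattern would likely need adjusting.

```autoit
; Sketch only: $sHtml is a trimmed, hard-coded copy of the markup quoted above.
Local $sHtml = '<b><a href="/mint/search/search.do?interactorAc=MINT-1177305&selectedStatus=clean">aplp2</a></b> ' & _
        '<b><font color="green">Mus musculus (10090)</font></b>' & _
        '<b><a href="/mint/search/search.do?interactorAc=MINT-1186713&selectedStatus=clean">aplp2</a></b> ' & _
        '<b><font color="green">Homo sapiens (9606)</font></b>'

; Capture the href of an <a> whose immediately following <b><font> starts with "Homo sapiens".
; [^<]* keeps the match from running across other tags into a later entry.
Local $sPattern = '(?is)<a\s+href="([^"]+)"[^>]*>[^<]*</a>\s*</b>\s*<b>\s*<font[^>]*>\s*Homo sapiens'
Local $aMatch = StringRegExp($sHtml, $sPattern, 1) ; flag 1: return an array of captured groups

If @error Then
    ConsoleWrite("No Homo sapiens entry found" & @CRLF)
Else
    ConsoleWrite("Relative href: " & $aMatch[0] & @CRLF)
EndIf
```

The captured href is relative, so it would need the site's host prepended before being passed to _IENavigate.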



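#include <IE.au3>

$oIE = _IEAttach("MINT") ; assumes the MINT results page is already open in IE
$href = FindHSLink($oIE)
MsgBox(0, "", "Homo sapiens found: " & @CRLF & $href)

; Returns the href of the link in the <b> just before the <b> that contains
; "Homo sapiens", or "" with @error set if there is no such entry.
Func FindHSLink($ieObj)
    Local $oDoc = _IEDocGetObj($ieObj)
    Local $oBolds = $oDoc.getElementsByTagName("b")

    ; Start at 1 so that item($i - 1) always exists
    For $i = 1 To $oBolds.length - 1
        If StringInStr($oBolds.item($i).innerHTML, "Homo sapiens") Then
            Local $oLink = $oBolds.item($i - 1).firstChild ; the <a> inside the previous <b>
            $oLink.style.background = "red" ; highlight the match for visual confirmation
            Return $oLink.href
        EndIf
    Next

    Return SetError(1, 0, "")
EndFunc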



#5 ·  Posted (edited)

Is this any easier than lod3n's solution? You tell me.

#include <IE.au3>

_IEErrorHandlerRegister() ; survive COM errors when a link has no such sibling
_IEErrorNotify(False) ; suppress COM errors to console

$oIE = _IEAttach("MINT")

$oLinks = _IELinkGetCollection($oIE)

For $oLink In $oLinks
    ; text of the element two siblings after the link's parent <b>
    $adjacentText = String($oLink.parentNode.nextSibling.nextSibling.innerText)
    If StringInStr($adjacentText, "Homo sapiens") Then
        ConsoleWrite($oLink.href & @CR) ; show matching link href
        _IEAction($oLink, "click") ; click on it
        ExitLoop ; the page navigates away, so stop scanning the old links
    EndIf
Next

You really need a DOM analyzer tool to know the parent, child, sibling relationships. I've been having trouble with IE Developer Toolbar and I found another GREAT one called DebugBar -- free for non-commercial use.

Dale

Edit: $adjacentText has to be assigned inside the loop, because it depends on the current $oLink.
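
Without a separate tool, a rough way to see those parent and sibling relationships is to print them from a script. This is only a sketch: it assumes an IE window whose title contains "MINT" is open, and the three-sibling limit is an arbitrary choice for illustration.

```autoit
#include <IE.au3>

Local $oIE = _IEAttach("MINT") ; assumes the results page is already open
If @error Then Exit

Local $oLinks = _IELinkGetCollection($oIE)
For $oLink In $oLinks
    ConsoleWrite("Link: " & $oLink.href & @CRLF)
    ; Walk up to three nodes that follow the link's parent element
    Local $oNode = $oLink.parentNode.nextSibling
    Local $iStep = 1
    While IsObj($oNode) And $iStep <= 3
        ConsoleWrite("  sibling " & $iStep & ": " & $oNode.nodeName & @CRLF)
        $oNode = $oNode.nextSibling
        $iStep += 1
    WEnd
Next
```

The printed node names show how many nextSibling steps separate a link's parent from the element that holds the species name.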

Edited by DaleHohm


Hi,

Many thanks to all of you. lod3n's FindHSLink code returns the href of the link that I need, but I don't know how to click a link by its href. Is there any way?

Thanks.
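
One possible way to click by href, as a sketch: _ClickLinkByHref is a hypothetical helper name, not part of IE.au3. It assumes the href passed in is the full URL that the link's own href property reports, which is what FindHSLink returns.

```autoit
#include <IE.au3>

; Hypothetical helper: click the first link on the page whose href equals $sHref.
; Returns True if a link was clicked, False otherwise.
Func _ClickLinkByHref($oIE, $sHref)
    Local $oLinks = _IELinkGetCollection($oIE)
    If @error Then Return False

    For $oLink In $oLinks
        If String($oLink.href) = $sHref Then
            _IEAction($oLink, "click")
            _IELoadWait($oIE) ; wait for the target page to finish loading
            Return True
        EndIf
    Next

    Return False
EndFunc   ;==>_ClickLinkByHref
```

It would be called as _ClickLinkByHref($oIE, $href) with the value returned by FindHSLink. If an actual click is not required, _IENavigate($oIE, $href) should load the same page directly.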


See my reply entered while you were typing yours...

Dale



DaleHohm wrote: "You really need a DOM analyzer tool to know the parent, child, sibling relationships."

Another option: DOM Inspector
