the123punch Posted July 26, 2007 (edited)

Hi guys,

I need to do the following: go to a database page at this link: http://mint.bio.uniroma2.it/mint/search/se...eryType=protein

I have to search for some proteins and extract information. I know how to do almost everything, but I am having a basic problem with some entries.

For those who are willing to help: when you go to that page, type in the gene name APLP2 and click Search. You get 2 results. What I need to do is click on the one that has Homo sapiens in it.

I know how to get there: I need to use the IE function _IELinkClickByText or _IELinkClickByIndex. The problem is that both links have the same text, so I won't always get the Homo sapiens link. I tried to do it by index, but there is no way for me to tell which index belongs to which link because the order is random. In other words, Homo sapiens is not always the first link, or the second, or the third. It could be any of them.

So basically, the only way to identify my link before clicking it is by looking at the HTML text next to it and checking whether it says Homo sapiens. I don't know how to do that. How can I refer to a link by the text that comes after it?

I am giving you the example for only one entry, but I have a bunch of them, and for some of them I get more than one result, so I need a solution that will work for all of them, please.

If anyone can help me, I'll be more than happy!

Thanks.

Edited July 26, 2007 by the123punch
PsaltyDS Posted July 26, 2007

I need to do some similar web-crawling on another site for work, so this interests me. The relevant HTML on that page is:

    <b><a href="/mint/search/search.do?interactorAc=MINT-1177305&selectedStatus=clean">aplp2</a></b>
    <b><font color="green">Mus musculus (10090)</font></b>
    <br><b>uniprotkb ac:</b>
    <a href="http://www.ebi.uniprot.org/entry/Q06335" target="_blank">Q06335</a>,
    <!-- <a href="/mint/search/viewer.do?interactorAc=MINT-1177305" target="left">Viewer</a> -->
    <br><b>Amyloid-like protein 2 precursor</b> Aplp2
    <br><b>domains:</b>
    <table>
      <tr>
        <td bgcolor="#FEFFE3">A4_APP</td><td bgcolor="#FEFFE3"><a href="http://www.ebi.ac.uk/interpro/DisplayIproEntry?ac=IPR008155" target="_blank">IPR008155</a></td><td bgcolor="#FEFFE3">A4_extra</td><td bgcolor="#FEFFE3"><a href="http://www.ebi.ac.uk/interpro/DisplayIproEntry?ac=IPR008154" target="_blank">IPR008154</a></td>
      </tr>
    </table>
    <table width="100%" bgcolor="#FEFFE3" border="0" cellspacing="0" cellpadding="0">
      <tr bgcolor="#669999" >
        <td><img src="/mint/images/mint_blank.gif" /></td>
      </tr>
    </table>
    <b><a href="/mint/search/search.do?interactorAc=MINT-1186713&selectedStatus=clean">aplp2</a></b>
    <b><font color="green">Homo sapiens (9606)</font></b>
    <br><b>uniprotkb ac:</b>

Looks like you could just pull all the <b> tags, find the one with "Homo sapiens" in it, and use the link from the <b> tag just before it. I'll take a crack at it when I have some time...

Valuater's AutoIt 1-2-3, Class... Is now in Session!
For those who want somebody to write the script for them: RentACoder
"Any technology distinguishable from magic is insufficiently advanced." -- Geek's corollary to Clarke's law
mikehunt114 Posted July 26, 2007

Yup, yup, that sounds like a good solution. It might be easier to parse the HTML to get the href for the link instead of trying to reference it in the hierarchy, though (then you could use _IENavigate). Just a suggestion.

IE Dev Toolbar
MSDN: InternetExplorer Object
MSDN: HTML/DHTML Reference Guide
"It is surprising what a man can do when he has to, and how little most men will do when they don't have to." - Walter Linn
Post a reproducer with less than 100 lines of code.
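A minimal sketch of the "parse the HTML for the href" idea, assuming the results markup looks like the excerpt PsaltyDS posted (a bold link followed by a bold green species label). The function name _FindSpeciesHref and the regular expression are illustrative, not part of IE.au3. Text returned by IE itself (for example from _IEBodyReadHTML) may be normalized differently, with uppercase tags, unquoted attributes, or absolute URLs, so the pattern may need adjusting for the live page.

```autoit
; Hypothetical helper: returns the href of the first result link whose
; green species label contains $sSpecies, or "" (with @error set) if none does.
Func _FindSpeciesHref($sHtml, $sSpecies)
    ; Capture each href together with the species label that follows it.
    ; Flag 3 returns a flat array of all captures: href, label, href, label, ...
    Local $sPattern = '(?is)<b>\s*<a href="([^"]+)">[^<]*</a>\s*</b>\s*<b>\s*<font color="?green"?>([^<]*)</font>'
    Local $aPairs = StringRegExp($sHtml, $sPattern, 3)
    If @error Then Return SetError(1, 0, "") ; no link/label pairs found at all

    For $i = 0 To UBound($aPairs) - 2 Step 2
        If StringInStr($aPairs[$i + 1], $sSpecies) Then Return $aPairs[$i]
    Next
    Return SetError(2, 0, "") ; pairs found, but none for this species
EndFunc

; Usage with a shortened copy of the HTML quoted earlier in the thread
Local $sSample = '<b><a href="/mint/search/search.do?interactorAc=MINT-1177305&selectedStatus=clean">aplp2</a></b> ' & _
        '<b><font color="green">Mus musculus (10090)</font></b> ' & _
        '<b><a href="/mint/search/search.do?interactorAc=MINT-1186713&selectedStatus=clean">aplp2</a></b> ' & _
        '<b><font color="green">Homo sapiens (9606)</font></b>'
ConsoleWrite(_FindSpeciesHref($sSample, "Homo sapiens") & @CRLF)
```

The href in the page source is relative, so it would need the site's host prepended before being passed to _IENavigate. Reading the href property from the DOM link object avoids that step, at the cost of walking the document tree.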
lod3n Posted July 26, 2007

    #include <IE.au3>

    ; Assumption: an IE window whose title contains "MINT" already shows the results page
    $oIE = _IEAttach("MINT")
    $href = FindHSLink($oIE)
    MsgBox(0, "", "Homo sapiens found: " & @CRLF & $href)

    ; Returns the href of the link whose following <b> tag mentions "Homo sapiens"
    Func FindHSLink($ieObj)
        Local $oDoc = _IEDocGetObj($ieObj)
        Local $oBolds = $oDoc.getElementsByTagName("b")
        If $oBolds.length > 0 Then
            For $i = 1 To ($oBolds.length - 1)
                If StringInStr($oBolds.item($i).innerHTML, "Homo sapiens") Then
                    ; The link sits in the <b> just before the species label
                    $oBolds.item($i - 1).firstChild.style.background = "red"
                    Return ($oBolds.item($i - 1).firstChild.href)
                EndIf
            Next
        EndIf
        Return "" ; no match found
    EndFunc

All of my AutoIt Example Scripts
http://saneasylum.com
DaleHohm Posted July 26, 2007 (edited)

Is this any easier than lod3n's solution? You tell me.

    #include <IE.au3>
    _IEErrorHandlerRegister() ; survive COM errors with $adjacentText
    _IEErrorNotify(False) ; suppress COM errors to console

    $oIE = _IEAttach("MINT")
    $oLinks = _IELinkGetCollection($oIE)
    For $oLink In $oLinks
        ; Text of the element two siblings after the link's parent <b>
        $adjacentText = String($oLink.parentNode.nextSibling.nextSibling.innerText)
        If StringInStr($adjacentText, "Homo sapiens") Then
            ConsoleWrite($oLink.href & @CR) ; show matching link href
            _IEAction($oLink, "click") ; click on it
            ExitLoop ; the click navigates away, so stop at the first match
        EndIf
    Next

You really need a DOM analyzer tool to know the parent, child, and sibling relationships. I've been having trouble with IE Developer Toolbar, and I found another GREAT one called DebugBar -- free for non-commercial use.

Dale

Edit: moved the $adjacentText definition out of the If test...

Edited July 27, 2007 by DaleHohm

Free Internet Tools: DebugBar, AutoIt IE Builder, HTTP UDF, MODIV2, IE Developer Toolbar, IEDocMon, Fiddler, HTML Validator, WGet, curl
MSDN docs: InternetExplorer Object, Document Object, Overviews and Tutorials, DHTML Objects, DHTML Events, WinHttpRequest, XmlHttpRequest, Cross-Frame Scripting, Office object model
Automate input type=file (Related)
Alternative to _IECreateEmbedded? better: _IECreatePseudoEmbedded Better Better?
IE.au3 issues with Vista - Workarounds
SciTe Debug mode - it's magic: #AutoIt3Wrapper_run_debug_mode=Y
"Doesn't work" needs to be ripped out of the troubleshooting lexicon. It means that what you tried did not produce the results you expected. It begs the questions 1) what did you try?, 2) what did you expect? and 3) what happened instead?
Reproducer: a small (the smallest?) piece of stand-alone code that demonstrates your trouble
the123punch (Author) Posted July 26, 2007

(Quoting lod3n's FindHSLink code above.)

Hi,

Many thanks to all of you. This code seems to return the href of the link that I need. However, I don't know how to click a link by its href. Is there any way?

Thanks.
DaleHohm Posted July 26, 2007

See my reply entered while you were typing yours...

Dale
Will66 Posted July 27, 2007

DaleHohm wrote: "You really need a DOM analyzer tool to know the parent, child, sibling relationships. I've been having trouble with IE Developer Toolbar and I found another GREAT one called DebugBar -- free for non-commercial use."

DOM Inspector
lod3n Posted July 27, 2007

or:

    $href = FindHSLink($oIE)
    _IENavigate($oIE, $href)