drrehak Posted September 13, 2010

Hi, I've been working on this project for hours now. I'd like to scrape info from this site: My link

The info I need, though, is on the page you get by clicking "Show" under Interactions, about halfway down the page. The problem I'm having is that there are multiple links named "Show". I tried cycling through them with this code:

#include <IE.au3>

$id = "DB00001"
$oIE = _IECreate("http://www.drugbank.ca/drugs/" & $id)
$linkcount = 0
_IELoadWait($oIE)
$sMyString = "show"
$oLinks = _IELinkGetCollection($oIE)
For $oLink In $oLinks
    $sLinkText = _IEPropertyGet($oLink, "innerText")
    If StringInStr($sLinkText, $sMyString) Then
        $linkcount = $linkcount + 1
        ;MsgBox(0, "Link Info", $oLink.href)
        If $linkcount = 8 Then
            _IEAction($oLink, "click")
            ExitLoop
        EndIf
    EndIf
Next

However, my larger script will be cycling through drugs, and the Interactions link is not always the 8th "Show" link. Is there any way to give focus to an unknown link that sits next to a particular HTML tag?

Thanks in advance for any help/direction here.
-Kevin
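For reference, here is a minimal sketch of one way to do this with the _IE functions instead of counting "Show" links: find the table cell whose text is "Interactions" and click the first link in the same table row. It assumes (not verified against the live site) that the section is laid out as <tr><td>Interactions</td><td><a ...>Show</a></td></tr> and that the "Interactions" cell itself contains no link.

```autoit
#include <IE.au3>

; Sketch under an assumed layout: <tr><td>Interactions</td><td><a ...>Show</a></td></tr>
Local $sId = "DB00001"
Local $oIE = _IECreate("http://www.drugbank.ca/drugs/" & $sId)

Local $oCells = _IETagNameGetCollection($oIE, "td")
Local $oLink, $bFound = False
For $oCell In $oCells
    ; Compare the trimmed cell text (flag 3 = strip leading + trailing whitespace;
    ; AutoIt's = operator compares strings case-insensitively)
    If StringStripWS($oCell.innerText, 3) = "Interactions" Then
        ; Index 0 = the first <a> element anywhere in the same row, i.e. the "Show" link
        $oLink = _IETagNameGetCollection($oCell.parentNode, "a", 0)
        If Not @error Then
            _IEAction($oLink, "click")
            _IELoadWait($oIE)
            $bFound = True
        EndIf
        ExitLoop
    EndIf
Next

If Not $bFound Then ConsoleWrite("No Interactions link found for " & $sId & @CRLF)
```

Because the link is located by the cell next to it, its position among the other "Show" links no longer matters; in a loop over drug IDs you would call _IENavigate for each ID and repeat the search.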
Tvern Posted September 13, 2010

I'm not sure how to do it using the _IE functions, as I usually evaluate the source with StringRegExp (it's faster). Because I don't know whether you want to actually have the information visible in the browser, or are just using it to get to the information you want, I made a couple of examples.

This code tries to look exactly like your example. It navigates to the main page for a drug, then to its Interactions page (if available). This makes it pretty slow.

#include <IE.au3>

$oIE = _IECreate("about:blank")
For $i = 1 To 9
    $id = "DB" & StringFormat("%05i", $i)
    _IENavigate($oIE, "http://www.drugbank.ca/drugs/" & $id)
    $sSource = _IEBodyReadHTML($oIE)
    ; The HTML read back through IE can differ from the raw page source (e.g. upper-case tags),
    ; so this pattern differs from the one used with InetRead below
    $aLink = StringRegExp($sSource, '<TD>Interactions</TD>\s+<TD><A class=link-out href="(.*?)"', 1)
    If @error Then
        ConsoleWrite("No interactions link available for " & $id & @CRLF)
    Else
        _IENavigate($oIE, $aLink[0])
    EndIf
Next

This code navigates directly between Interactions pages, without loading the main pages in between. Slightly faster.

#include <IE.au3>

$oIE = _IECreate("about:blank")
For $i = 1 To 9
    $id = "DB" & StringFormat("%05i", $i)
    ; Download the raw page source without displaying it
    $sSource = InetRead("http://www.drugbank.ca/drugs/" & $id)
    $sSource = BinaryToString($sSource)
    $aLink = StringRegExp($sSource, '<td>Interactions</td>\s+<td><a href="(.*?)"', 1)
    If @error Then
        ConsoleWrite("No interactions link available for " & $id & @CRLF)
    Else
        _IENavigate($oIE, $aLink[0])
    EndIf
Next

This code doesn't open IE at all; instead it builds an array with direct links to the Interactions pages for the different drugs.
#include <array.au3>

; Row 0 is left unused; rows 1-9 hold [drug ID, Interactions link]
Local $aInteractions[10][2]
For $i = 1 To UBound($aInteractions) - 1
    $aInteractions[$i][0] = "DB" & StringFormat("%05i", $i)
    $sSource = InetRead("http://www.drugbank.ca/drugs/" & $aInteractions[$i][0])
    $sSource = BinaryToString($sSource)
    $aLink = StringRegExp($sSource, '<td>Interactions</td>\s+<td><a href="(.*?)"', 1)
    If @error Then
        ConsoleWrite("No interactions link available for " & $aInteractions[$i][0] & @CRLF)
        $aInteractions[$i][1] = "Not Available"
    Else
        ConsoleWrite("Link saved for " & $aInteractions[$i][0] & @CRLF)
        $aInteractions[$i][1] = $aLink[0]
    EndIf
Next
_ArrayDisplay($aInteractions)

To increase the speed even more you could run simultaneous downloads with InetGet, but that takes a bit more time to put together, so unless you are interested I'm not going to supply an example.
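A minimal sketch of that simultaneous-download idea, assuming InetGet's background mode: every page is requested up front (InetGet with background = 1 returns a handle immediately), InetGetInfo is polled until all downloads report complete, and each saved file is then scanned with the same regular expression as above. The temp-file names and the 250 ms polling interval are arbitrary choices for illustration, and the pattern still assumes the page source looks as it did when this thread was written.

```autoit
#include <Array.au3>

; Sketch: download several drug pages in parallel using InetGet's background mode
Local Const $iCount = 9
; Columns: [0] drug ID, [1] InetGet handle (0 if it failed to start), [2] temp file path
Local $aJobs[$iCount + 1][3]
For $i = 1 To $iCount
    $aJobs[$i][0] = "DB" & StringFormat("%05i", $i)
    $aJobs[$i][2] = @TempDir & "\drugbank_" & $aJobs[$i][0] & ".html"
    ; options = 1 forces a reload from the server; background = 1 returns a handle at once
    $aJobs[$i][1] = InetGet("http://www.drugbank.ca/drugs/" & $aJobs[$i][0], $aJobs[$i][2], 1, 1)
Next

; Poll until every started download reports completion (InetGetInfo index 2)
Local $bAllDone
Do
    Sleep(250)
    $bAllDone = True
    For $i = 1 To $iCount
        If $aJobs[$i][1] And Not InetGetInfo($aJobs[$i][1], 2) Then $bAllDone = False
    Next
Until $bAllDone

; Scan each successful download (InetGetInfo index 3) for the Interactions link
Local $aInteractions[$iCount + 1][2], $sSource, $aLink
For $i = 1 To $iCount
    $aInteractions[$i][0] = $aJobs[$i][0]
    $aInteractions[$i][1] = "Not Available"
    If $aJobs[$i][1] And InetGetInfo($aJobs[$i][1], 3) Then
        $sSource = FileRead($aJobs[$i][2])
        $aLink = StringRegExp($sSource, '<td>Interactions</td>\s+<td><a href="(.*?)"', 1)
        If Not @error Then $aInteractions[$i][1] = $aLink[0]
    EndIf
    ; Release the handle and clean up the temporary file
    If $aJobs[$i][1] Then InetClose($aJobs[$i][1])
    FileDelete($aJobs[$i][2])
Next
_ArrayDisplay($aInteractions)
```

For a run over something like 1500 drugs it is probably wiser to start the downloads in batches of a few dozen rather than all at once, to go easy on the server and keep the number of open handles manageable.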
wakillon Posted September 13, 2010

@Tvern Great examples! I learned something today... Thanks
JayFran Posted September 13, 2010 (edited)

Wow! This is a really good example. I believe this will help me with a project I'm working on. One question: in the "Else" branch, what do the brackets after the variables mean? "$aInteractions[$i][1] = $aLink[0]" If you can explain that, that would be awesome.

Edited September 13, 2010 by JayFran
Juvigy Posted September 13, 2010

Those are arrays. $aInteractions is a two-dimensional array: a declaration like $Array[2][2] means 2 rows and 2 columns, and $aInteractions[$i][1] refers to row $i, column 1 (AutoIt array indices start at 0).
JayFran Posted September 13, 2010

Thank you... So in $aInteractions[$i][1] = $aLink[0], whatever is found for $aInteractions will be stored as an array in $aLink?
Juvigy Posted September 13, 2010

No, the other way around: $aInteractions[$i][1] = $aLink[0] means that the value held in the $aLink array at index 0 is copied into the $aInteractions array at row $i, column 1. To give you an Excel example: think of column 0 as column "A" (the drug ID) and column 1 as column "B"; if $i = 1, the value of $aLink[0] ends up in column "B" of row 1, next to the drug ID.
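To make the direction of the assignment concrete, here is a tiny standalone illustration (the link value is a made-up placeholder, not real DrugBank data). Data flows from the right-hand side into the left-hand side, and UBound reports the size of each dimension:

```autoit
; Illustration only: the link below is a placeholder value
Local $aLink[1] = ["placeholder-interactions-link"] ; 1-D array with one element, index 0
Local $aInteractions[3][2]                          ; 3 rows x 2 columns, indices start at 0

Local $i = 1
$aInteractions[$i][0] = "DB00001" ; row 1, column 0 ("column A"): the drug ID
$aInteractions[$i][1] = $aLink[0] ; row 1, column 1 ("column B"): copied FROM $aLink[0]

; UBound($a, 1) = number of rows, UBound($a, 2) = number of columns
ConsoleWrite("Rows: " & UBound($aInteractions, 1) & ", columns: " & UBound($aInteractions, 2) & @CRLF)
ConsoleWrite($aInteractions[$i][0] & " -> " & $aInteractions[$i][1] & @CRLF)
```

Running it prints "Rows: 3, columns: 2" followed by "DB00001 -> placeholder-interactions-link"; $aLink itself is left unchanged by the assignment.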
drrehak Posted September 13, 2010

You guys rock, thanks so much Jay! I haven't had the time to try the examples yet, but I'm sure this will get me started. I think I will try whichever you say is fastest. I don't need to have the info visible in my browser; that's just the only way I know how to scrape. So I think I will experiment with InetGet. I plan on running a script that will step through about 1500 drugs and add this info to my database.
JayFran Posted September 13, 2010

Ahh! I see how that works! Right on! Thank you for explaining that.