Retrieve all the links after seeing a keyword


cwem
A lot of info exists on the html page,

<snippet>

Once I see the heading "String Downloads", I know that the links following it are the ones to "click" and download to the local hard drive. Since the page is dynamically generated, we do not know the order of the tables after "String Downloads". Is there any workaround for this problem?

<h2>String Downloads</h2>

<table class="offset-table">

<snippet>

<a href="/annotation/genome/tbdb/download/?sp=EASupercontigsFasta&amp;sp=SMT_H37RV_V2&amp;sp=S.zip" title="Download mycobacterium_tuberculosis_h37rv_supercontigs.fasta.zip"><img src="/annotation/genome/assets/images/download.gif" border="0"/></a>

<!--<span jwcid="@Else">—</span> --></td>

<td class="offset-table-content">

<a href="/annotation/genome/tbdb/download/?sp=EAContigsFasta&amp;sp=SMT_H37RV_V2&amp;sp=S.zip" title="Download mycobacterium_tuberculosis_h37rv_contigs.fasta.zip"><img src="/annotation/genome/assets/images/download.gif" border="0"/></a><snippet>


If you want to use Firefox and FF.au3, you can try this:

#include <Array.au3>
#include <FF.au3>

If _FFConnect() Then
    $a = _FFXPath("//td[@class='offset-table-content']/a","href",7)
    _ArrayDisplay($a)
EndIf

This works if every td that contains a link has the class "offset-table-content".

I can switch to FF, but I don't know how to replace _IECreate, _IENavigate, etc. with the corresponding functions in your library. Furthermore, the code above stores all the file links in an array, doesn't it? How do I then retrieve them? I would prefer to download the files to a local disk.

$oIE = _IECreate ()

_IELoadWait ($oIE)

_IENavigate ($oIE, $murl & $qurl)


You can download them with InetGet(), e.g.:

#include <FF.au3>

If _FFStart("http://www.foo.bar") Then
    $a = _FFXPath("//td[@class='offset-table-content']/a","href",7)
    If Not @error Then
        For $i = 1 To $a[0]
            $src = "http://www.foo.bar" & $a[$i]
            ; saving them with the original filename
            InetGet($src,"DestDir\" & StringMid($a[$i],StringInStr($a[$i],"/",2,-1)+1) )
        Next
    EndIf
EndIf
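One caveat with the file name logic above: for hrefs like the ones in the question, everything after the last "/" is a query string that starts with "?", and Windows does not allow "?" in file names. A minimal sketch of one way around this, assuming a hypothetical helper named _SafeFileName (not part of FF.au3 or AutoIt) that replaces the reserved characters with underscores:

```autoit
; Hypothetical helper: takes the part of a URL after the last "/" and
; replaces the characters Windows reserves in file names: \ / : * ? " < > |
Func _SafeFileName($sUrl)
    ; Occurrence -1 makes StringInStr search from the right
    Local $sName = StringMid($sUrl, StringInStr($sUrl, "/", 0, -1) + 1)
    Return StringRegExpReplace($sName, '[\\/:*?"<>|]', "_")
EndFunc

; Example with one of the hrefs from the first post
ConsoleWrite(_SafeFileName("/annotation/genome/tbdb/download/?sp=EAContigsFasta&sp=SMT_H37RV_V2&sp=S.zip") & @CRLF)
```

The result could be passed to InetGet() as the file name part of the destination path. Reading the name from the link's title attribute instead might give friendlier names, but that depends on the page keeping that attribute.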

_IECreate = _FFStart

_IELoadWait = _FFLoadWait

_IENavigate = _FFOpenURL

A complete list is here.

The complete documentation is here (all the docs will be back online tomorrow).


Would you please tell me why the following error occurs? I've already downloaded FF.au3 and included it.

_FFConnect ==> Timeout: Can not connect to FireFox/MozRepl on: 127.0.0.1:4242

C:\Test.au3 (12) : ==> Expected a variable in user function call.:

$a = _FFXPath("//td[@class='offset-table-content']/a","href",7)

$a = _FFXPath(^ ERROR

>Exit code: 1 Time: 65.012

Do you have FF.au3 version 0.5.x.x or later?

With IE.au3 you could:

If StringInStr(_IEBodyReadHTML($oIE), your-search-string) Then $oLinks = _IELinkGetCollection($oIE)

There would be more efficient ways to do this, but they would require more knowledge of the HTML structure.

Dale

I've thought about using that before, but after using "<h2>String Downloads</h2>" as the search string, how do I tell AutoIt to retrieve only the links in the one or more tables that follow it? Thanks again for your advice.
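One possible approach, shown here only as a rough sketch: cut the page HTML at the heading and pull the href values out of what follows. The helper name _LinksAfterMarker is hypothetical (it is not part of IE.au3), and the regular expression assumes that the href values are wrapped in double quotes in the HTML that _IEBodyReadHTML returns.

```autoit
#include <IE.au3>
#include <Array.au3>

; Hypothetical helper: returns an array of the href values that appear
; after the first occurrence of $sMarker in $sHtml, or sets @error.
Func _LinksAfterMarker($sHtml, $sMarker)
    ; StringInStr is case-insensitive by default
    Local $iPos = StringInStr($sHtml, $sMarker)
    If $iPos = 0 Then Return SetError(1, 0, 0) ; marker not found
    ; Keep only the part of the page that follows the marker
    Local $sTail = StringMid($sHtml, $iPos + StringLen($sMarker))
    ; The HTML source encodes "&" as "&amp;", so decode it for use in URLs
    $sTail = StringReplace($sTail, "&amp;", "&")
    ; Flag 3 returns an array with every captured href value
    Local $aLinks = StringRegExp($sTail, '(?i)<a\s[^>]*?href="([^"]*)"', 3)
    If @error Then Return SetError(2, 0, 0) ; no links after the marker
    Return $aLinks
EndFunc

Local $oIE = _IECreate("http://genome.tbdb.org/annotation/genome/tbdb/MultiDownloads.html")
Local $aLinks = _LinksAfterMarker(_IEBodyReadHTML($oIE), "String Downloads")
If Not @error Then _ArrayDisplay($aLinks)
```

This only limits where the search starts, so any unrelated links later on the page are returned too. If the download tables are followed by a recognizable piece of text, the tail could be cut there in the same way before running the regular expression.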

This error appears if you have an FF.au3 version older than 0.5.0.0.
I downloaded the file in the link you specified previously and the FF.au3 content says:

Global Const $_FF_AU3VERSION = "0.4.1.5b-0"

I renamed FF_3.au3 to FF.au3 and put it in AutoIt's Include directory so that my program can find it.


Hmm, the current version is here: FF.au3.

Thanks for the update, but the following error still appears:

>"C:\Program Files\AutoIt3\SciTE\..\autoit3.exe" /ErrorStdOut "C:\TEST.au3"

__FFStartProcess: ""C:\Program Files\Mozilla Firefox\firefox.exe" -new-window "http://genome.tbdb.org/annotation/genome/tbdb/MultiDownloads.html" -repl 4242

_FFConnect: OS: WIN_XP WIN32_NT 2600 Service Pack 3

_FFConnect: AutoIt: 3.3.0.0

_FFConnect: FF.au3: 0.5.0.1b-2

_FFConnect: IP: 127.0.0.1

_FFConnect: Port: 4242

_FFConnect: Delay: 2ms

_FFConnect ==> Timeout: Can not connect to FireFox/MozRepl on: 127.0.0.1:4242

__FFSend ==> Socket Error

>Exit code: 0 Time: 65.185


Did you install MozRepl and start it? If you're using Minefield or Shiretoko, you may need to check whether the browser is compatible with the latest version of MozRepl. If it's not, you need to use the regular Mozilla Firefox.

To start MozRepl automatically when Firefox starts, enable "Tools -> MozRepl -> Activate on startup". To check whether it is running right now, open the "Tools -> MozRepl" menu: if it shows "Start", MozRepl is not running, so click it; if it shows "Stop", it is running.
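To check from a script whether anything is listening on the MozRepl port at all, a small test with AutoIt's built-in TCP functions can help. This is only a sketch: it uses the address and port from the log above (127.0.0.1:4242), and it can only tell that some program accepts connections on that port, not that the program is MozRepl.

```autoit
; Quick check: does anything accept TCP connections on the MozRepl port?
TCPStartup()
Local $iSocket = TCPConnect("127.0.0.1", 4242)
If @error Or $iSocket = -1 Then
    ConsoleWrite("Nothing is listening on 127.0.0.1:4242; MozRepl is probably not started." & @CRLF)
Else
    ConsoleWrite("Port 4242 is open; MozRepl appears to be running." & @CRLF)
    TCPCloseSocket($iSocket)
EndIf
TCPShutdown()
```

If the first message appears while Firefox is open, starting MozRepl from the Tools menu as described above would be the first thing to try.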
