finet

Download files from a website without knowing their URLs

13 posts in this topic

Is there a way to download all files from a website directory?

E.g.

http://www.site.com/docs/1.htm

http://www.site.com/docs/2.htm

http://www.site.com/docs/3.htm

http://www.site.com/docs/x.htm

I would like to download all the files in http://www.site.com/docs/ without knowing how many files there are and without knowing their names.

(I have a file with URLs in it. Via _FileReadToArray and InetGet I can download them all. The problem with that is that I have to know the URL of each file.)
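
For reference, the URL-list approach described above could look roughly like this. It is only a sketch under assumptions: the list file name urls.txt (one URL per line, next to the script) is made up for illustration, and each local file name is simply the text after the last slash of the URL.

```autoit
#include <File.au3>

; Hypothetical list file: one URL per line, stored next to the script.
Local $sListFile = @ScriptDir & "\urls.txt"
Local $aURLs
If Not _FileReadToArray($sListFile, $aURLs) Then
    MsgBox(16, "Error", "Could not read " & $sListFile)
    Exit 1
EndIf

; By default $aURLs[0] holds the number of lines that were read.
For $i = 1 To $aURLs[0]
    Local $sURL = StringStripWS($aURLs[$i], 3) ; trim leading/trailing whitespace
    If $sURL = "" Then ContinueLoop
    ; Use the text after the last "/" as the local file name.
    Local $sName = StringTrimLeft($sURL, StringInStr($sURL, "/", 0, -1))
    If $sName = "" Then ContinueLoop
    InetGet($sURL, @ScriptDir & "\" & $sName)
    If @error Then ConsoleWrite("Failed: " & $sURL & @CRLF)
Next
```

This still needs every URL to be listed by hand, which is exactly the limitation in question.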

Thanks,

Dirk

#2 ·  Posted (edited)

If you made them, wouldn't you know them? :lmao:

Edit:

On another note, I do believe someone did this in FTP.

http://www.autoitscript.com/forum/index.ph...st&p=124032

Edited by SmOke_N

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.

#3 ·  Posted

The problem is creating such a URL file and keeping it up to date.

Thanks for your idea about FTP, but I am not a programmer, I am new to AutoIt, and I don't know anything about FTP.

It would be nice to have something like

InetGet("http://www.site.com/docs/*.*",...


#4 ·  Posted (edited)

If you enter "http://www.site.com/docs/" in a web browser does it give you a list of the files, or does it redirect you to the index page?

Edited by big_daddy

#5 ·  Posted

Neither.

It gives: Page not found. Error 404

#6 ·  Posted

The only other option I can think of is using a loop, but this will only work if the files are named as in your example above.

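; Try 1.htm, 2.htm, ... and stop at the first download that fails.
; Raise the upper bound (10) if there may be more files.
$sURL = "http://www.site.com/docs/"
For $i = 1 To 10
    InetGet($sURL & $i & ".htm", @ScriptDir & "\" & $i & ".htm")
    If @error Then ExitLoop
Next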

#7 ·  Posted

It seems it would be difficult, considering you can't get a listing of the contents. However, if links to each file are included *somewhere* on the website, then it can be done, I guess...

1) Direct your script to the site's search page (for example).

2) Have your script perform a search on this page that returns all results with the required name.

3) Loop through the results and pick out each URL that includes the name.

It really depends on how the server has been set up to deliver content.
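
The idea of looping through the links found on a page could be sketched as follows. This is a rough illustration, not a tested solution: the index page URL is hypothetical, the regular expression only catches simple href attributes ending in .htm or .html, and links that start with "/" (root-relative) are not handled.

```autoit
; Hypothetical page that links to the wanted files; replace with the real one.
Local $sPage = "http://www.site.com/docs/index.htm"
; Everything up to and including the last "/" is used as the base URL.
Local $sBase = StringLeft($sPage, StringInStr($sPage, "/", 0, -1))

; Fetch the page source (InetRead returns binary data).
Local $dData = InetRead($sPage)
If @error Then Exit 1
Local $sHtml = BinaryToString($dData)

; Collect the targets of all href attributes that end in .htm or .html.
Local $aLinks = StringRegExp($sHtml, '(?i)href\s*=\s*["'']([^"''#?]+\.html?)["'']', 3)
If @error Then Exit 1 ; no matching links were found

For $i = 0 To UBound($aLinks) - 1
    Local $sLink = $aLinks[$i]
    ; Relative links get the base URL prepended; absolute links are used as-is.
    If Not StringRegExp($sLink, '(?i)^https?://') Then $sLink = $sBase & $sLink
    ; Save under the text after the last "/" (duplicates overwrite each other).
    Local $sName = StringTrimLeft($sLink, StringInStr($sLink, "/", 0, -1))
    InetGet($sLink, @ScriptDir & "\" & $sName)
    If @error Then ConsoleWrite("Failed: " & $sLink & @CRLF)
Next
```

Like any link-following approach, this only finds files that are actually linked from the page; it cannot discover unlisted files in the directory.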


Please correct me if I am wrong in any of my posts. I like learning from my mistakes too.

#8 ·  Posted

Thanks for your fast reply! But with that loop I get only one file in the script directory: "1.htm" or "1.html".

BTW, the real address of the web page where all the files are listed is (Dutch-language site):

http://www2.vlaanderen.be/ned/sites/ruimte...besluiten2.html

The files to download are

ambtenarenmb.html

ambtenarennamen.html

gemachtigdeambtenaar.html

kbgewestplan.html

kleinewerken.html

merplicht.html

normatieve.html

voetgangersverkeer.html

etc.

Greetings,

Dirk

#9 ·  Posted

The administrator of the website can set Apache (or whatever web server service is used) to allow browsing the files. It appears the admin chose not to do so in this case. If you'd like that changed, contact the admin.

:lmao:


Valuater's AutoIt 1-2-3, Class... Is now in Session!
For those who want somebody to write the script for them: RentACoder
"Any technology distinguishable from magic is insufficiently advanced." -- Geek's corollary to Clarke's law

#10 ·  Posted

This has nothing to do with AutoIt, but if you want to retrieve the files, check out FlashGet http://www.flashget.com

It is a very nice, free download manager.

Specifically what you want is the "Site Explorer" in the Tools menu.

Trust me, check it out.

Dale


Free Internet Tools: DebugBar, AutoIt IE Builder, HTTP UDF, MODIV2, IE Developer Toolbar, IEDocMon, Fiddler, HTML Validator, WGet, curl

MSDN docs: InternetExplorer Object, Document Object, Overviews and Tutorials, DHTML Objects, DHTML Events, WinHttpRequest, XmlHttpRequest, Cross-Frame Scripting, Office object model

Automate input type=file (Related)

Alternative to _IECreateEmbedded? better: _IECreatePseudoEmbedded  Better Better?

IE.au3 issues with Vista - Workarounds

SciTe Debug mode - it's magic: #AutoIt3Wrapper_run_debug_mode=Y

"Doesn't work" needs to be ripped out of the troubleshooting lexicon. It means that what you tried did not produce the results you expected. It begs the questions 1) what did you try?, 2) what did you expect? and 3) what happened instead?

Reproducer: a small (the smallest?) piece of stand-alone code that demonstrates your trouble


#11 ·  Posted (edited)

That link is a 404 for me. Perhaps you meant FlashGot for Firefox?

Still, that's not a security hack. If the website admin doesn't allow it, I don't think you can list the directory's files. FlashGot follows the published links on the page(s); it doesn't search directories and list them if the admin hasn't already permitted that.

:lmao:

Edit: Aha, found reference to FlashGet in Wikipedia.

Edited by PsaltyDS

#12 ·  Posted

The link works fine for me. Correct - it is not bypassing the security, but they are using some sort of crawling technique to find what is on the site and in the folder.

And as I said, FlashGet, not FlashGot.

Dale


#13 ·  Posted

Thank you all for your kind assistance and advice!

Dirk
