Mcky

Finally, Mcky's Web Extractor is out!

21 posts in this topic

#1 ·  Posted (edited)

After 1 month of intense testing and debugging, Mcky's Web Extractor is out.

This script extracts http, ftp, https, news, e-mail and file:/// internet links from text and local HTML documents (the file type can be customized). Not only that, it can do batch processing; that is, it extracts links from all text and HTML files inside a folder. It saves the links to an HTML file whose appearance you can easily customize using the configuration file, WECFG.INI. See the TODO list further down.

Direct from the help file:

;The first and only link-extraction program to save the links to an HTML file complete with text and background formatting.

;Extracts http, https, news, ftp, e-mail and file links and saves them to a specified HTML file.

;Does not mess up the registry. It only keeps settings in the configuration file, WECFG.INI, which you can edit easily.

;In just a few clicks, you can make an HTML links file from the file you extract the links from.

;Easy-to-use, wizard-like interface, suitable for any user, beginner or advanced.

;Supports extraction of an unlimited number of links; no limit, just be patient.

;Supports folder processing; extracts links from all text and HTML files inside a folder.

;Supports custom link-extraction types, for example: gopher, res, aol, etc.

;Supports custom file-filter types. Extract links from file types other than HTML and text. (Applies to single-file operation.)

;Extracts links from text and HTML files, removing white space and other unnecessary characters, including "&quot;" and "&nbsp;". (See the FAQ if links don't work.)

;Supports a "BaseURL" setting: extract only the base URL of the link, stripping all other unnecessary characters.

;The resulting HTML page can be fully customized in the configuration file, WECFG.INI.

;The resulting HTML file can be imported into any HTML editor/WYSIWYG HTML editor for further customization. The resulting HTML page's code is clean and easy to understand.

;Gives a clear, detailed report at the end of processing; the report can be saved to a log file.

;ESC stops the program immediately, even while processing data.

;Nicely laid-out, HTML-based help file.
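To make the feature list above concrete: the core job (pulling scheme-prefixed links out of raw text while stripping HTML entities such as "&quot;" and "&nbsp;") can be sketched roughly as below. This is an illustrative Python sketch, not Mcky's actual AutoIt code; the scheme list and the regex (including using `mailto:` for e-mail links) are assumptions about how such matching might work.

```python
import re

# Link schemes the announcement claims to support; custom ones (gopher, res,
# aol, ...) could simply be appended to this list.
SCHEMES = ["http", "https", "ftp", "news", "mailto", "file"]

def extract_links(text):
    """Return all scheme-prefixed links found in a blob of text."""
    # Strip leftover HTML entities that would otherwise glue onto the links.
    for entity in ("&quot;", "&nbsp;", "&amp;"):
        text = text.replace(entity, " ")
    # Match scheme: followed by a run of non-whitespace, non-quote characters.
    pattern = r'(?:%s):[^\s"\'<>]+' % "|".join(SCHEMES)
    return re.findall(pattern, text)

links = extract_links('<a href="http://example.com/page">x</a>&nbsp;ftp://host/file.zip')
print(links)  # ['http://example.com/page', 'ftp://host/file.zip']
```

A regex like this naturally finds every link on a line (relevant to a bug reported later in this thread), since `re.findall` returns all non-overlapping matches.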

Right click on the link and select "Save Target As":

Download the script, zipped!

Take a look at the demo folder to see how the links are stored.

Please post your comments!!

TODO List:

Improve processing speed

Folder recursion (extract from files in folders within folders)

GUI

Cleaner, leaner, easier-to-understand code

Remove duplicate links

Alphabetical link sorting

Edited by Mcky

My website (lots of AutoIt compiled programs + GameMaker games): http://mcky.sitesled.com
My AutoIt projects: Mcky's CalEntry (calendar scheduling), Mcky's Web Extractor (web page link extractor), Mcky's Appkey (powerful hotkey-listing tool)
"I wish I was never born. I am just a lonely soul in this world... :("




Bad zip file. Just put the code here.


This file is hosted by Tripod, a Lycos®Network Site, and is not available for download.


#5 ·  Posted (edited)

Has everyone here gone batty? The message SlimShady got is EXACTLY why Mcky put in instructions like

Right click on the link and select "Save Target As":

(kinda duh...) Edited by this-is-me

Who else would I be?


Summary:

Lycos webhosting sucks.


#7 ·  Posted (edited)

Has everyone here gone batty? The message SlimShady got is EXACTLY why Mcky put in instructions

That's what I did, but when I try to open the 10.7 KB zip file, it appears corrupted...

EDIT: See this post if Lycos webhosting doesn't suffice

Edited by CyberSlug

Use Mozilla | Take a look at My Disorganized AutoIt stuff | Very very old: AutoBuilder 11 Jan 2005 prototype I need to update my sig!


In Internet Explorer, right-click on the link.

Select "Save Target As". Then you can save the zip file (85 KB).

This is a limitation of Tripod's remote-loading service.



Bad zip


The file is 79 KB here and I can edit webextract.au3.

Bad zip

Tip: Clear your IE cache and download again.


Got it .


Tip: Clear your IE cache and download again.

Looks to somehow be an issue with Firefox; I tried from IE and it downloaded the proper zip file.

I suggest the "IE View" Extension for Firefox for such situations.


"I'm not even supposed to be here today!" -Dante (Hicks)


Looks to somehow be an issue with Firefox; I tried from IE and it downloaded the proper zip file.

Worked for me under Firefox. I click it, the link opens in a new window, and it asks me where I want to save my zip...

Using Firefox 0.9.2


"Standing in the rain, twisted and insane, we are holding onto nothing. Feeling every breath, holding no regrets, we're still looking out for something."
Note: my projects are off-line until I can spend more time to make them compatible with syntax changes.


#14 ·  Posted (edited)

Well, post some comments on the script.

I will continue improving Web Extractor if I get enough comments, suggestions, and bug reports. Thanks in advance!

Btw, if you want to extract only the base link, open the configuration file, WECFG.INI, and set BaseURL=1.
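For anyone curious what the BaseURL setting conceptually does (keep only the scheme and host, drop the path and query), here is a rough Python illustration using the standard library. This is a sketch of the idea, not the script's actual AutoIt implementation.

```python
from urllib.parse import urlsplit

def base_url(link):
    """Reduce a full link to its scheme://host base, as a BaseURL=1 setting might."""
    parts = urlsplit(link)
    return "%s://%s" % (parts.scheme, parts.netloc)

print(base_url("http://www.mnsi.net/~jhlavac/nps/index.html?x=1"))  # http://www.mnsi.net
```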

Also, try this:

Open the script and choose to extract links from an internet webpage. Enter the address:

http://www.mnsi.net/~jhlavac/nps/

It will extract over 400 links in 4-6 seconds.

New download location (for people who can't download it from the Tripod server):

http://www.autoitscript.com/fileman/users/public/Mcky/webext_src.zip

Edited by Mcky



Zip file is fine.

Did you right-click on it as he suggested?

Rick

See the above post that says "Got it".


Comments on the actual script would be appreciated. Btw, the new download link is at:

http://www.autoitscript.com/fileman/users/public/Mcky/webext_src.zip




Comments on the actual script would be appreciated. Btw, the new download link is at:

http://www.autoitscript.com/fileman/users/public/Mcky/webext_src.zip

ok, here's one... I'm not sure exactly what I'd use this for, what kinds of pages would you parse with it to make the output valuable to you? I thought about news pages, but without the surrounding context, the links are fairly useless...

educate me :ph34r:




Hi!

For me, it only extracts one link per line of the file.

Please fix that.

Btw, is it possible to generate a plain-text file with the links in it?

peehtebee


German Forums: http://www.autoit.de
German Help File: http://autoit.de/hilfe


Nice. Yeah, you do need to fix not being able to extract multiple links in one line. I used this code, cut it down a lot, and used it to extract Myspace usernames/IDs... then made an auto-friend adder. I did the same for a few other sites. You should have two filters for the "custom filters": one for the start, one for the end.
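The start/end filter pair suggested here (grab whatever sits between two custom delimiters, e.g. when scraping usernames or IDs) could look roughly like this in Python. This is a hypothetical sketch; the `friendID=` delimiter and the page snippet are made-up examples, not an actual Myspace format.

```python
import re

def extract_between(text, start, end):
    """Return every substring found between a start and an end delimiter."""
    # re.escape makes the delimiters safe even if they contain regex metacharacters.
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    return re.findall(pattern, text)

html = 'friendID=1001">Alice</a> ... friendID=1002">Bob</a>'
print(extract_between(html, 'friendID=', '"'))  # ['1001', '1002']
```

Because `findall` with a non-greedy capture returns every match, this also handles multiple hits on a single line.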

