URLDownloadToFile

mcfr1es · August 9, 2004

Forgive my noobness....

Hypothetically speaking, let's say a webpage contains a lot of text that is useless but is also scattered with usernames (e.g. a forum). Using URLDownloadToFile, is there any way to save only the usernames listed on that page to a text file?

BTW, when I view the source I see that each username is contained within these tags...

<span class="blistSmall">"username"</span>

and also contained within this tag...

Is this info of any use to me, or do I need to obtain these usernames by other means? :ph34r:

Edited August 9, 2004 by mcfr1es

Bartokv · August 10, 2004

Yes, there should be several ways to extract the user names into a separate file...

Unfortunately, most automatically generated pages don't pretty-print their HTML source. (I've seen some web pages that have everything crammed into a few massive lines of code. Very ugly.)

If you're lucky, and the source uses line feeds after each line of code, then you could make a function using a simple while loop and a few calls to FileReadLine and StringInStr.
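As a rough illustration of that "lucky" case, here is a minimal sketch in Python rather than AutoIt, just to show the shape of the loop. The function name usernames_from_lines and its parameters are made up for this example; in AutoIt the same idea would be built from FileReadLine and StringInStr. It assumes each username and its closing tag sit on the same line.

```python
def usernames_from_lines(lines,
                         open_tag='<span class="blistSmall">',
                         close_tag='</span>'):
    """Collect the text between open_tag and close_tag, one line at a time."""
    names = []
    for line in lines:
        start = line.find(open_tag)
        while start != -1:
            start += len(open_tag)             # step past the opening tag
            end = line.find(close_tag, start)  # look for the matching close
            if end == -1:
                break  # closing tag is on another line; this sketch gives up
            names.append(line[start:end].strip())
            start = line.find(open_tag, end)   # another username on this line?
    return names


sample = [
    '<td><span class="blistSmall">alice</span></td>',
    '<p>junk</p>',
    '<span class="blistSmall">bob</span> <span class="blistSmall">carol</span>',
]
found = usernames_from_lines(sample)
print("\n".join(found))
```

To run it against a downloaded page, pass an open file object instead of the sample list, since iterating over a file yields one line at a time.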

However, if you're unlucky enough to get one of the ugly source pages then you'll have to do more work:

1) You could either cheat and call the DOS find command (probably the easier option if you're short on time),

2) or build your own native AutoIt routine that walks through the file character by character, looking for the desired tags.

For example: search until you find the '<' character. Do a quick check to ensure that the next character is not a slash '/' (which marks a closing tag). If it's an opening tag, save the current position and scan ahead to the end-of-tag marker '>'. Use StringInStr to see if the tag you found is the one you're looking for (<span class="blistSmall">), and then copy the characters between the end of the opening tag ('>') and the beginning of its closing tag ('</').
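The scan described above could look roughly like this. It is a sketch in Python, not AutoIt, and the name usernames_from_html is invented for the example. It works on the whole page as one string, so it does not care whether the source has line breaks. The tag is compared exactly here, whereas a substring test in the spirit of StringInStr would be more forgiving about extra attributes.

```python
def usernames_from_html(html, wanted='<span class="blistSmall">'):
    """Walk the page tag by tag and copy the text that follows each wanted tag."""
    names = []
    pos = 0
    while True:
        lt = html.find('<', pos)           # next tag start
        if lt == -1:
            break                          # no more tags
        if html[lt + 1:lt + 2] == '/':     # closing tag: skip it
            pos = lt + 1
            continue
        gt = html.find('>', lt)            # end of this opening tag
        if gt == -1:
            break                          # malformed tail, stop scanning
        if html[lt:gt + 1] == wanted:
            close = html.find('</', gt + 1)    # start of the closing tag
            if close == -1:
                break
            names.append(html[gt + 1:close])   # text between '>' and '</'
            pos = close
        else:
            pos = gt + 1
    return names


page = ('<html><body><b>junk</b><span class="blistSmall">alice</span>text'
        '<span class="other">x</span><span class="blistSmall">bob</span>'
        '</body></html>')
result = usernames_from_html(page)
print(result)
```

The check for a slash is just a look at the single character right after the '<' position, which is the step asked about in the follow-up post.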

I know this may sound a little confusing, but I hope that it makes at least a little sense. I would normally provide an example, but I don't have the time at the moment. :ph34r:

Hope this helps!

mcfr1es · August 10, 2004

Thank you, Bartokv, your help is much appreciated...

If anyone has anything else to add (especially an example), feel free.

I am currently at the point where I scan for "<", but I do not know how to check the text in front of the angle bracket for a slash, or how to do any of the rest, to tell you the truth :ph34r:

Edited August 10, 2004 by mcfr1es

trids · August 10, 2004

You could use this routine .. with one slight modification: edit the three lines in the section marked ;Break it into chewable bytes.

Just use the token you identified, <span class="blistSmall">, instead of the href= in the routine.
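The linked routine is not reproduced in this thread, so the following is not that routine. It is only a guess at the general idea of using the token as a delimiter, written as a Python sketch with invented names (usernames_by_token, terminator): split the page on the token, then keep whatever precedes the closing tag in each piece.

```python
def usernames_by_token(html,
                       token='<span class="blistSmall">',
                       terminator='</span>'):
    """Split the page on the token and keep the text up to each terminator."""
    chunks = html.split(token)
    names = []
    for chunk in chunks[1:]:          # chunks[0] is everything before the first token
        end = chunk.find(terminator)
        if end != -1:                 # ignore a piece with no closing tag
            names.append(chunk[:end])
    return names


page = ('<b>junk</b><span class="blistSmall">alice</span>text'
        '<span class="other">x</span><span class="blistSmall">bob</span>')
by_token = usernames_by_token(page)
print(by_token)
```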

Play around with it :ph34r:
