7121

Search HTML?

11 posts in this topic

#1 ·  Posted (edited)

Does anyone know if AutoIt is capable of searching the source of a certain website and finding a certain link? o_0?

FOR EXAMPLE:

There are multiple files that have been sorted into names like 1A, 1B, 1C and so on.

However, within the source, the download link for each of these files is given through an <a> tag.

Such as:

<a href="/download.aspx?objectId=25708" class="level1">1b</a>

Basically, is there a way for AutoIt to search for that certain link and retrieve it?

My basic plan is to just activate the view-source window, copy and paste it into Notepad and let AutoIt run through the list... but I just thought I'd ask.

OKAY, forget that plan, the main GOAL is to detect links on a website and click on them... anyone? haha

Edited by 7121


You can get the source by using _InetGetSource(). How much of the link do you want?
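
For anyone following along, here is a minimal sketch of that approach, assuming the page can be fetched without a login. The URL is a made-up placeholder, and the pattern is only a guess based on the sample link in the first post. _INetGetSource() comes from Inet.au3.

```autoit
#include <Inet.au3>
#include <Array.au3>

; Hypothetical page address, replace with the real one
Local $sUrl = "http://www.example.com/list.aspx"

; Download the HTML source of the page as a string
Local $sHtml = _INetGetSource($sUrl)
If @error Then Exit MsgBox(16, "Error", "Could not download " & $sUrl)

; Flag 3 = return an array holding every match of the capture group.
; The pattern assumes links shaped like the sample in the first post.
Local $aLinks = StringRegExp($sHtml, '(?i)<a\s+href="(/download\.aspx\?objectId=\d+)"', 3)
If @error Then Exit MsgBox(48, "No match", "No download links found.")

; Show the relative links that were found
_ArrayDisplay($aLinks, "Download links")
```

The captured links are relative, so the site's host name would still need to be prepended before they can be opened or downloaded.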

George

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.

Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.***

The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.

Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. it is soon to be closed and replaced with something else.

"Old age and treachery will always overcome youth and skill!"


#3 ·  Posted (edited)

EDIT: nvm I suck at using StringRegExp()

Edited by toonboon

~What can I say, I'm a Simplistic person


LIKE, around 10-30 links each time hahah.

_InetGetSource(), huh?

I'll have a look at it, thanks. :)


You may want to take a look at the example in the helpfile for _IELinkGetCollection.
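
As a rough illustration of that suggestion (this is not the helpfile example itself), the sketch below walks the link collection and clicks the link whose text is "1b". The URL and the link text are assumptions taken from the first post, and IE.au3 only drives Internet Explorer.

```autoit
#include <IE.au3>

; Hypothetical page address, replace with the real one
Local $oIE = _IECreate("http://www.example.com/list.aspx")
If @error Then Exit MsgBox(16, "Error", "Could not start Internet Explorer.")

; Collection of every <a> element on the page
Local $oLinks = _IELinkGetCollection($oIE)
If @error Then Exit MsgBox(16, "Error", "Could not read the links.")

For $oLink In $oLinks
    ; Compare the visible link text with the label we are looking for
    If StringStripWS($oLink.innerText, 3) = "1b" Then
        ConsoleWrite("Clicking " & $oLink.href & @CRLF)
        _IEAction($oLink, "click")
        _IELoadWait($oIE)
        ExitLoop
    EndIf
Next
```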

Dale




Thanks Dale, I took a look at _IELinkGetCollection; however, it's not what I wanted because I'm using Firefox.

I decided to go with the routine of copying the chunk of the source where the links are given. The links are then converted to strings and trimmed or sorted until a final link is left, which is then inserted into the browser for download.

However, I have a problem. During the sorting process, or the replacement process to be exact, I can't replace a certain pattern in a string.

For example:

http://watchview.com/watch.aspx?episodeId=26114" class=volume2>1a</a>

Now, I can remove the </a>, but the number after the = sign is hard to deal with.

Does anyone know a way to replace the 26114 so that AutoIt knows that pattern can be any number?

Also, the number varies from 1000-something to about 1000000-something (4-7 digits, to be exact). So how do I write the pattern match for that 26114, considering the number can get longer... or shorter?

So the bottom line is basically this:

You've got the number 123456.

Those digits can be any number, ranging from 4 to 7 digits.

Example: 0001, 12353, 1375091, 7654321, and so on....

How do I make a matching pattern so that AutoIt can replace that number with a number I desire?


#7 ·  Posted (edited)

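; Sample link as copied from the page source
$sStr = 'http://watchview.com/watch.aspx?episodeId=26114" class=volume2>1a</a>'

; Keep everything up to and including the "=" (group 1); drop the 4-7 digit ID and the rest of the tag
$sStr = StringRegExpReplace($sStr, "(?i)(.+=)\d{4,7}.+</a>", "\1")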
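
If the aim is to swap the number for a different one in a single step, a variation like the following might do it. This is only a sketch: $sNewId is a made-up value, and the lookbehind (?<=episodeId=) assumes the parameter is always named episodeId, as in the sample link.

```autoit
Local $sStr = 'http://watchview.com/watch.aspx?episodeId=26114" class=volume2>1a</a>'

; Hypothetical replacement ID
Local $sNewId = "99999"

; (?<=episodeId=) requires "episodeId=" right before the digits without
; making it part of the match, so only the 4-7 digits get replaced.
Local $sNew = StringRegExpReplace($sStr, "(?<=episodeId=)\d{4,7}", $sNewId)

ConsoleWrite($sNew & @CRLF)
```

Because the prefix is not part of the match, the replacement string is just the new number, which avoids any confusion between a back-reference such as \1 and the digits that follow it.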

Edited by GEOSoft

George



ZOMG!!! Thank you...

But can you explain it too, haha? I know that whole line right there solved my problem, but can you tell me what each match did? Like the (.+=), for example?

Or do I have to read the manual :)?


#9 ·  Posted (edited)

Reading the manual is never a bad idea anyway.

(/i) just says that it's not case sensitive (wouldn't matter here anyway).

(.+=) means capture anything up to and including the "=" sign.

\d{4,7}.+</a>, because it's outside the brackets, is matched but not captured: the 4 to 7 digits and everything after them through </a> get dropped when the match is replaced with \1.

If there was a more specific portion of the string that you wanted, then it could all be done in one step. Very likely the whole list can be extracted (as an array) from the block with a single RegExp, but I didn't have a sample block of code to work from.
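
To give an idea of what such a single-RegExp extraction might look like, here is a sketch that runs against a made-up two-link block. The block itself is an assumption modelled on the one sample link posted above, so the pattern would likely need adjusting for the real page source.

```autoit
; Made-up sample block, modelled on the single link posted earlier
Local $sBlock = '<a href="http://watchview.com/watch.aspx?episodeId=26114" class=volume2>1a</a>' & @CRLF & _
        '<a href="http://watchview.com/watch.aspx?episodeId=1375091" class=volume2>1b</a>'

; Group 1 = the URL ending in a 4-7 digit episodeId, group 2 = the link text.
; Flag 3 returns all matches in one flat array: url, text, url, text, ...
Local $aHits = StringRegExp($sBlock, '(?i)href="([^"]+episodeId=\d{4,7})"[^>]*>([^<]+)</a>', 3)
If @error Then Exit MsgBox(48, "No match", "The pattern did not match anything.")

; Walk the array two elements at a time
For $i = 0 To UBound($aHits) - 1 Step 2
    ConsoleWrite($aHits[$i + 1] & " -> " & $aHits[$i] & @CRLF)
Next
```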

Edited by GEOSoft

George



THAAANK YOOOU, I shall continue to study it more.

I also didn't really have any idea how to use the matches like (/i), but this example might be able to help. THNX


Actually, (/i) is a mistake on my part. It should have been (?i).

George

