Rydextillxixdiex

Read a webpage's source code and assign to variable?

17 posts in this topic

I am trying to read a webpage's source code and assign a URL found in it to a variable for use in my script. The URL is always in the same place in the source code and is the URL of an iframe. But the URL changes daily, and I need to pick up the new URL each time. Is it possible to obtain this without constant manual updates? Thanks in advance.


...will never learn all there is to know about autoit, no worries...i came to the forums :)




have you tried looking up the _IE() functions in the help file?


Lofting the cyberwinds on teknoleather wings, I am...The Blue Drache


#3 ·  Posted (edited)

Well, I kind of have it working. I am able to display it in a message box (as per the help file example); however, when I try to assign it to a variable, the value isn't accurate.

This works and outputs the correct url to a msgbox:

#include <IE.au3>
$oIE =  _IECreate("mytesturlhere", 0, 1, 1)
$oFrames = _IEFrameGetCollection ($oIE)
$iNumFrames = @extended
For $i = 0 to ($iNumFrames - 1)
    $oFrame = _IEFrameGetCollection ($oIE, $i)
    MsgBox(0, "Frame Info", _IEPropertyGet ($oFrames, "locationurl"))
Next

This, however, does not work. I tried assigning the URL to a variable and then checking it by displaying it outside the loop, and it isn't accurate. It displays 'mytesturlhere' and not the iframe URL. I don't need it displayed; I need it assigned to a variable.

#include <IE.au3>
$oIE =  _IECreate("mytesturlhere", 0, 1, 1)
$oFrames = _IEFrameGetCollection ($oIE)
$iNumFrames = @extended
For $i = 0 to ($iNumFrames - 1)
    $oFrame = _IEFrameGetCollection ($oIE, $i)
    $test = _IEPropertyGet ($oFrames, "locationurl")
Next
MsgBox(0,"Info", $test)

There is only one frame in the page, if that helps.
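One thing that stands out in both snippets: the loop fetches a single frame into $oFrame, but _IEPropertyGet is then called on $oFrames, the whole collection. That may be why the main page URL comes back instead of the iframe URL. A minimal sketch of the assignment, assuming that is the cause ("mytesturlhere" is the placeholder from the post, and $sFrameUrl is a hypothetical variable name):

```autoit
#include <IE.au3>

; "mytesturlhere" is the placeholder URL from the post above.
Local $oIE = _IECreate("mytesturlhere", 0, 1, 1)
Local $oFrames = _IEFrameGetCollection($oIE) ; the whole frame collection
Local $iNumFrames = @extended                ; number of frames found
Local $sFrameUrl = ""                        ; hypothetical name for the result

For $i = 0 To $iNumFrames - 1
    ; Fetch one frame by index, then query that frame (not the collection).
    Local $oFrame = _IEFrameGetCollection($oIE, $i)
    $sFrameUrl = _IEPropertyGet($oFrame, "locationurl")
Next

; With a single frame on the page, this holds that frame's URL.
MsgBox(0, "Info", $sFrameUrl)
```

With more than one frame, the variable would end up holding the URL of the last frame, so an array or an early ExitLoop might be a better fit in that case.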

Edited by Rydextillxixdiex

...will never learn all there is to know about autoit, no worries...i came to the forums :)



The _INetGetSource function, together with string functions such as StringInStr, StringSplit, StringMid, and _StringBetween, can probably be used for this.

- Bruce /*somdcomputerguy */  If you change the way you look at things, the things you look at change.


#5 ·  Posted (edited)

I have tooled around with the above suggestions, and am still not having any luck.

#Include <String.au3>
#include <INet.au3>
$s_URL = "myurl.com"
$source = _INetGetSource ( $s_URL)
$string = BinaryToString($source)
$url = _StringBetween($string, "<frame src=", "></frameset>")
MsgBox(0, "out", $url)

If I output "$string" to the messagebox, i.e. the source code in string format, it shows me the source. However, _StringBetween isn't working and is giving me an empty output. Any input is appreciated.

Update: I even tried simply using _StringBetween with <html> and </html> to see if it would output everything between the page's html tags, and it still gives me a null output. Hmm.

Edited by Rydextillxixdiex

...will never learn all there is to know about autoit, no worries...i came to the forums :)


Try taking out the BinaryToString line. I don't think you need that. At least, in my script, _StringBetween works as expected without using that function.


- Bruce /*somdcomputerguy */  If you change the way you look at things, the things you look at change.



While the BinaryToString line may be unnecessary, removing it didn't fix my issue, unfortunately.


...will never learn all there is to know about autoit, no worries...i came to the forums :)


Now I'm thinking the quotes around the frame source target (in the page's source code) may be the problem. Try this:

$url = _StringBetween($string, '<frame src="', '"></frameset>')

- Bruce /*somdcomputerguy */  If you change the way you look at things, the things you look at change.



I had actually already tried this. My output goes from null to '0'. This doesn't make any sense.


...will never learn all there is to know about autoit, no worries...i came to the forums :)


No, it doesn't make sense. I'm perplexed. Would you mind posting the URL here? I'll give it a shot too.


- Bruce /*somdcomputerguy */  If you change the way you look at things, the things you look at change.



http://omegle.com/. They use iframes as a load-balancing measure, and I was trying to extract the iframed URL to access the page directly. This should be rather simple but is proving difficult.


...will never learn all there is to know about autoit, no worries...i came to the forums :)


_StringBetween returns an array. I feel like a dumbass for forgetting this. This code works as I think you expect:

#Include <String.au3>
#include <INet.au3>
$s_URL = "http://omegle.com/"
$source = _INetGetSource ($s_URL)
$url = _StringBetween($source, '<frame src="', '"')
MsgBox(0, "out", $url[0])

I also changed the delimiters passed to the function so that the match ends at the closing quote on the same line. The end frameset tag is on the next line, so using it as the end delimiter would probably include a CRLF in the result, which is probably not what you want.
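Since _StringBetween sets @error when nothing is found, it may be worth checking that before indexing the result. A minimal sketch along those lines, reusing the URL and delimiters from the code above ($aMatches and $sUrl are hypothetical names):

```autoit
#include <INet.au3>
#include <String.au3>

Local $sSource = _INetGetSource("http://omegle.com/")

; On success _StringBetween returns an array of matches;
; on failure it sets @error instead.
Local $aMatches = _StringBetween($sSource, '<frame src="', '"')
If @error Then
    MsgBox(0, "out", "No frame src found in the page source")
Else
    ; The first match is element [0] of the array.
    Local $sUrl = $aMatches[0]
    MsgBox(0, "out", $sUrl)
EndIf
```

Passing the array itself to MsgBox, as in the earlier attempts, does not show its contents, which would fit the empty output reported above.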


- Bruce /*somdcomputerguy */  If you change the way you look at things, the things you look at change.



Ah, how about that. It was working all along; it just required some simple indexing. Thanks for all of your help!

...will never learn all there is to know about autoit, no worries...i came to the forums :)



You bet. Good luck with the rest of your project.

- Bruce /*somdcomputerguy */  If you change the way you look at things, the things you look at change.


#15 ·  Posted (edited)

Sorry for replying to an old post.

somdcomputerguy, I am trying to use your code to get some info out of a URL's source code. I tried one site and it worked; I tried another and I get a ton of information. You said you wrote that to only search that line. Might that be my problem?

Working to get the correct version:

Func funcGetVLCVersion()
   $sVersionURL = "http://www.videolan.org/vlc/download-windows.html"
   $sVersionsource = _INetGetSource ($sVersionURL)
   $sVersion = _StringBetween($sVersionsource, '<h1>Download latest VLC - ', '</h1>')
   MsgBox(0, "out", $sVersion[0])
EndFunc

Not working, same code different URL and betweens:

Func funcGetFilebotVersion()
   $sVersionURL = "http://www.filebot.net/"
   $sVersionsource = _INetGetSource ($sVersionURL)
   $sVersion = _StringBetween($sVersionsource, 'FileBot_', '-setup.exe')
   MsgBox(0, "out", $sVersion[0])
EndFunc

Thanks for any help!
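One possible cause of the "ton of information": _StringBetween returns everything from an occurrence of the start delimiter up to the next end delimiter, so if 'FileBot_' appears in the page before the installer link, the match can span a lot of markup. A hedged alternative is StringRegExp with a pattern that only allows version-like characters between the delimiters. The sketch below runs on a made-up sample string (the version number 4.7.9 and the surrounding HTML are invented for illustration), not on the real page:

```autoit
; Made-up sample: 'FileBot_' appears in other text before the installer link.
Local $sSample = 'FileBot_ downloads: <a href="FileBot_4.7.9-setup.exe">Windows</a>'

; Flag 1 returns an array of the captured groups.
; [0-9.]+ only matches digits and dots, so the capture cannot run across markup.
Local $aVersion = StringRegExp($sSample, 'FileBot_([0-9.]+)-setup\.exe', 1)
If @error Then
    ConsoleWrite("Version not found" & @CRLF)
Else
    ConsoleWrite($aVersion[0] & @CRLF)
EndIf
```

The same pattern could be applied to the string returned by _INetGetSource, assuming the installer file name on the page follows that FileBot_<version>-setup.exe shape.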

Edited by esullivan


#16 ·  Posted (edited)

Must be an invalid character. I tried different strings and it's working.

EDIT: Still having issues here and there with some. Sorry, I'm new to AutoIt. What is the [0] after the variable?

Edited by esullivan



OK, as a new user you should rush to the help file...
_StringBetween > Return Value > Success: > ... guess what?
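To spell the hint out: on success _StringBetween returns an array with one element per match, and [0] picks the first element. A small sketch with a made-up sample string ($sSample and $aLinks are hypothetical names) that lists every match with its index:

```autoit
#include <String.au3>

; Made-up sample text containing two matches.
Local $sSample = '<a href="one.html">1</a> <a href="two.html">2</a>'
Local $aLinks = _StringBetween($sSample, 'href="', '"')

If Not @error Then
    ; UBound gives the number of elements; indexes run from 0 to UBound - 1.
    For $i = 0 To UBound($aLinks) - 1
        ConsoleWrite("[" & $i & "] " & $aLinks[$i] & @CRLF)
    Next
EndIf
```

So $sVersion[0] in the functions above is simply the first match found on the page.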

 

