Pick URL out of html source


mwpeck

Basically, I need to pick a URL out of the source code of a web page. The URL is always different, but it is always in the src="" attribute of an iframe. Using AutoIt, how can I read the HTML source and find the URL (inside src="<url is here>") within the iframe element? There are multiple iframes on the page, but the ID of the iframe with the URL is always the same.

Link to comment
Share on other sites

I can't link to it (it's on an internal company page), but here's an excerpt of the page, in particular the iframe section. I had to change parts of the link, but it should still get my question across:

<iframe id="mainMenu" name="mainMenu" src="http://google.com/render.app#Nb13TXIyCqJuienk5jafPRlFSdC14N3n73xPQH1XHDIZQnkCzxHbGdC6%2baYZ2oFwzoLygr8t2iwpmqnfnzeXzP6i3pOSi%2ftnZMK9RcxY6kI%3d&installState=1&country=US&lang=en" width="100%" height="3600" scrolling="no" frameborder="0" allowtransparency="true" style="border: none; background:transparent;"></iframe>

Basically, the encrypted string changes every time, and I need to be able to find the full URL every time the program runs.

Link to comment
Share on other sites

Does the rest of it ever change?

But this should work :mellow:

#include <String.au3>
#include <Array.au3>

$string = '<iframe id="mainMenu" name="mainMenu" src="http://google.com/render.app#Nb13TXIyCqJuienk5jafPRlFSdC14N3n73xPQH1XHDIZQnkCzxHbGdC6%2baYZ2oFwzoLygr8t2iwpmqnfnzeXzP6i3pOSi%2ftnZMK9RcxY6kI%3d&installState=1&country=US&lang=en" width="100%" height="3600" scrolling="no" frameborder="0" allowtransparency="true" style="border: none; background:transparent;"></iframe>'
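; _StringBetween returns an array of the substrings found between the two delimiters
$aArray1 = _StringBetween($string, '<iframe id="mainMenu" name="mainMenu" src="', '" width', 1)
; Show the matches; element [0] holds the URL
_ArrayDisplay($aArray1, 'Default Search')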
Link to comment
Share on other sites

That's the problem: the URL

http://google.com/render.app#Nb13TXIyCqJuienk5jafPRlFSdC14N3n73xPQH1XHDIZQnkCzxHbGdC6%2baYZ2oFwzoLygr8t2iwpmqnfnzeXzP6i3pOSi%2ftnZMK9RcxY6kI%3d&installState=1&country=US&lang=en
does change, but none of the rest does. So how do I plug the HTML source into that? I tried:

$string = _IEDocReadHTML($oIE)

Here $oIE refers to the base web page (the page that contains the iframe), but that didn't do anything. How would I plug the HTML source into it?

Link to comment
Share on other sites

Bert has already solved extracting the string with _StringBetween, so no more worries about that.

If your _IEDocReadHTML call is failing, make sure $oIE is a valid object. Use $oIE = _IEAttach or $oIE = _IECreate first.

You can also check the SciTE output pane for any errors.
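
The check described above could look roughly like the following. This is only a sketch: the window title "Example title" is a placeholder, not something from this thread, and it assumes an Internet Explorer window with that title is already open.

```autoit
#include <IE.au3>

; "Example title" is a placeholder; use the title of your own IE window.
Local $oIE = _IEAttach("Example title")
Local $iErr = @error ; save @error before any other call resets it
If $iErr Or Not IsObj($oIE) Then
    ConsoleWrite("_IEAttach failed, @error = " & $iErr & @CRLF)
    Exit 1
EndIf

; Read the document source and check @error before using the result.
Local $sHtml = _IEDocReadHTML($oIE)
$iErr = @error
If $iErr Then
    ConsoleWrite("_IEDocReadHTML failed, @error = " & $iErr & @CRLF)
    Exit 1
EndIf

ConsoleWrite("Read " & StringLen($sHtml) & " characters of HTML" & @CRLF)
```

Saving @error into a variable right after each call matters because the next function call can overwrite it.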

Link to comment
Share on other sites

$oIE is completely valid. I have other functions that use $oIE, and they work fine.

For example:

Func Refresh(ByRef $oIE)
    $oDivs = _IETagNameGetCollection($oIE, "DIV")
    $sSearch = "Refresh"
    
    For $oDiv In $oDivs
        If String(_IEPropertyGet($oDiv, "InnerText")) = $sSearch Then
            _IEAction($oDiv, "click")
            _IELoadWait($oIE)
            ExitLoop
        EndIf
    Next
EndFunc

This works perfectly fine. It looks in the page for a DIV whose text is "Refresh"; if it finds one, it clicks it, otherwise nothing happens. I can call that function right after I call $oIE = _IEAttach, and because the page I'm using has the Refresh button, it finds and clicks it.

However, the following doesn't work:

$string = _IEDocReadHTML($oIE)

$aArray1 = _StringBetween($string, '<iframe id="mainMenu" name="mainMenu" src="', '" width', -1)
_ArrayDisplay($aArray1, 'Default Search')

If I go to the page that the $oIE instance is sitting at, I can view the source and find that particular iframe, but for some reason the AutoIt script doesn't find it; it simply runs and exits straight away. I even tried popping up a message box with $string, and it just said "0", as if it couldn't find any source code on the page. Yet I can call a function that also uses $oIE right before my $string line (or right after) and it works perfectly fine.

OK, it turns out my search string was incorrect, and the reason is this note in the _IEDocReadHTML documentation: "This function returns the document source after any client-side modifications (e.g. by AutoIt or by client-side Javascript)." After writing the result of $string = _IEDocReadHTML($oIE) to a text file and finding the link manually, I saw that AutoIt was seeing the tag as:

<IFRAME id=mainMenu style="BACKGROUND: none transparent scroll repeat 0% 0%; BORDER-TOP-STYLE: none; BORDER-RIGHT-STYLE: none; BORDER-LEFT-STYLE: none; HEIGHT: 3600px; BORDER-BOTTOM-STYLE: none" name=mainMenu src="
instead of what I was seeing through the page source in my browser.
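
Because the two forms differ in case, quoting, and attribute order, a pattern match that tolerates those differences may be less brittle than fixed _StringBetween delimiters. A minimal sketch follows; _GetIframeSrc is a hypothetical helper (not part of any UDF), the example.com URLs are placeholders, and it assumes the src value is always double-quoted and the id contains no regular-expression metacharacters.

```autoit
; Hypothetical helper: returns the src of the iframe with the given id.
; On failure returns "" with @error = 1 (tag not found) or 2 (no src found).
Func _GetIframeSrc($sHtml, $sId)
    ; Step 1: grab the whole opening tag of the iframe with this id.
    ; (?i) ignores case, "? allows quoted or unquoted id values.
    Local $sTagPattern = '(?i)(<iframe\b[^>]*\sid="?' & $sId & '"?(?=[\s>])[^>]*>)'
    Local $aTag = StringRegExp($sHtml, $sTagPattern, 1) ; flag 1: array of captures
    If @error Then Return SetError(1, 0, "")

    ; Step 2: pull the double-quoted src value out of that tag,
    ; wherever it appears among the attributes.
    Local $aSrc = StringRegExp($aTag[0], '(?i)\ssrc="([^"]*)"', 1)
    If @error Then Return SetError(2, 0, "")
    Return $aSrc[0]
EndFunc

; Usage with both shapes of the tag (shortened placeholder markup).
Local $sBrowser = '<iframe id="mainMenu" name="mainMenu" src="http://example.com/a?x=1" width="100%"></iframe>'
Local $sIE = '<IFRAME id=mainMenu style="BORDER-TOP-STYLE: none" name=mainMenu src="http://example.com/a?x=1"></IFRAME>'
ConsoleWrite(_GetIframeSrc($sBrowser, "mainMenu") & @CRLF)
ConsoleWrite(_GetIframeSrc($sIE, "mainMenu") & @CRLF)
```

Matching the tag first and the src second means the order of id and src inside the tag should not matter.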

With all that said and done, it's working fine now. Thanks.
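
Since the iframe ID is always the same, another option that avoids parsing the HTML text at all is to ask the DOM for the element and read its src property. This is a sketch of an alternative, not the approach used in this thread; "Example title" is a placeholder window title, and it assumes the iframe is present in the top-level document.

```autoit
#include <IE.au3>

; "Example title" is a placeholder; use the title of your own IE window.
Local $oIE = _IEAttach("Example title")
If Not IsObj($oIE) Then Exit 1

; Look the iframe element up by its id instead of searching the source text.
Local $oFrame = _IEGetObjById($oIE, "mainMenu")
If Not IsObj($oFrame) Then
    ConsoleWrite("No element with id mainMenu found" & @CRLF)
    Exit 1
EndIf

; Read the src property of the element through COM.
Local $sUrl = $oFrame.src
ConsoleWrite($sUrl & @CRLF)
```

This sidesteps the differences between the browser's "view source" text and what _IEDocReadHTML returns, because no string matching is involved.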

Edited by mwpeck
Link to comment
Share on other sites
