Jdop

INETGETSOURCE not loading valid web page

15 posts in this topic

I'm having problems with a particular web site. Maybe it's something obvious, but I can't figure it out.

This was working for me until recently. Could they be doing something weird on the server side?

The page loads perfectly in IE and Firefox, but the built-in AutoIt functions have suddenly started returning just the HTML header.

This is what I get:

<html><head><meta http-equiv="Refresh" content="0; URL=http://www.citronresearch.com/"></head><body></body></html>

Here's the simple code for testing.

#include <INet.au3>

$mUrl = "http://www.citronresearch.com"
$temp = _INetGetSource($mUrl) ; response body as a string
ConsoleWrite($temp & @CRLF)




#2 ·  Posted (edited)

You get that because it is the only HTML on the page outside of the JavaScript.

You need to use the IE.au3 library and read the innerHTML, or whatever else you want to get, through that.

Something like this, perhaps:

#include <IE.au3>
#include <GUIConstantsEx.au3> ; assumption: the name of the second include was lost from the post

$MainForm = GUICreate("hidden", 0, 0, 0, 0)
$oIE = _IECreateEmbedded() ; create an embedded IE control
GUICtrlCreateObj($oIE, 99999, 99999, 0, 0)
GUISetState(@SW_HIDE) ; keep the GUI hidden so the embedded IE control does not bother anyone
$mUrl = "www.citronresearch.com"

HotKeySet("{F2}", "_dostuff") ; press F2 to fetch the page
HotKeySet("{F3}", "_noloop") ; press F3 to quit the script
While 1
    Sleep(100)
WEnd

Func _dostuff()
    ConsoleWrite($mUrl & @CRLF)
    _IENavigate($oIE, $mUrl) ; waits for the page to finish loading by default
    $temp = _IEDocReadHTML($oIE) ; read the document's HTML from the browser
    ConsoleWrite($temp & @CRLF)
EndFunc   ;==>_dostuff

Func _noloop()
    Exit
EndFunc   ;==>_noloop
Edited by TagK

Programming Novice, interested in c++ (i know maybe 1%) AutoIT and many more.Projects : Anime renamer


OK, and I'm having the same problem with http://www.citronresearch.com/feed/, which is the RSS feed. Other sites work fine with similar code.

Isn't _INetGetSource supposed to load the entire source for the page?


If _INetGetSource is not the right call, isn't there a reliable way to retrieve the entire source of a web page into a string variable?


_INetGetSource works fine if the website does not hide its source within JavaScript. For those sites, use my previously posted example.


Share on other sites

JavaScript hides source?

Not exactly, I may have chosen my words badly.

Pages that use JavaScript seem to generate their markup as the page loads. _INetGetSource only downloads what the server sends and does not run any scripts, so it never sees that generated markup.

So you can of course see the code yourself if you navigate to the site and view it, but _INetGetSource does not.

The IE library does, because it drives a real browser and then reads the end result.
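
For a one-off read without a GUI or hotkeys, the same idea can be sketched more compactly. This is only a minimal sketch, assuming Internet Explorer is installed and that the page finishes loading normally. The variable names are made up for illustration. The IE.au3 functions used are _IECreate, _IEDocReadHTML and _IEQuit.

```autoit
#include <IE.au3>

Local $sUrl = "http://www.citronresearch.com"

; _IECreate(url, try attach, visible, wait): open an invisible IE window
; and wait until the page has finished loading
Local $oIE = _IECreate($sUrl, 0, 0, 1)
If @error Then
    ConsoleWrite("Could not start Internet Explorer" & @CRLF)
    Exit 1
EndIf

Local $sHtml = _IEDocReadHTML($oIE) ; HTML as the browser holds it after loading
_IEQuit($oIE) ; close the hidden browser window again
ConsoleWrite($sHtml & @CRLF)
```

Because a full browser is started for every call, this is slower than _INetGetSource. It is probably only worth it for pages where the raw download is not enough.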



I usually see JavaScript when I use INet* functions.

Maybe it's PHP you're thinking of, or maybe I'm just wrong.

As far as I could see, the Citron page you originally wanted the source from does not use PHP; it uses a combination of XML, HTML and JavaScript.

From what I have found out, _INetGetSource does not cope well when it encounters JavaScript, but then again, I do not know much about how the function was written.

Long story short, the code it fetches is not the correct one. If you try it on a "pure" HTML website, it works fine. For example, http://www.sau.no should give this result:

<html>
<head>
<title>www.sau.no</title>
</head>
<body bgcolor=#FFFFFF>
<center>
<br>
<br>
<font face="Arial,Helvetica">
<font size=+1><b>www.sau.no</b></font>
<br>
<br>
<img src="sau.jpg" width=288 height=211><br><br>
Sauer er dumme dyr.<br>Sauen er mat for bl.a. <a href="http://www.ulv.no/">ulv</a>.
<br>
<br>
<br>
<br>
<font size=0>Domenet er registrert gjennom <a href="http://www.domeneshop.no/">domeneshop.no</a>.</font>
</font>
</center>
</body>
</html>

Share on other sites

#10 ·  Posted (edited)

Is this not JavaScript returned here from this page?

#include <String.au3>
#include <INet.au3>

$start = '<script type="text/javascript">'
$end = '</script>'
$mUrl = "http://www.autoitscript.com/forum/topic/144503-inetgetsource-not-loading-valid-web-page/"
$temp = _INetGetSource($mUrl)

; collect everything between each opening and closing script tag
$aTemp = _StringBetween($temp, $start, $end)

For $i = 0 To UBound($aTemp) - 1
    ConsoleWrite($start & @LF)
    ConsoleWrite($aTemp[$i] & @CRLF & @CRLF & @CRLF)
    ConsoleWrite($end & @LF & @LF & @LF)
Next
Edited by JohnOne

Share on other sites

I did not try your code, but probably yes.

The only thing I can think of that may cause THAT one to work and not the others is the specific page's web server configuration, and that is well beyond me. Perhaps a bug or something?

I just tried the function in one of my scripts, found it did not do what I wanted, and exchanged it for something else. If you have now found a way to get it to read JavaScript, try it on the page you originally wanted to read and see if it works there.

^_^


Share on other sites

#13 ·  Posted (edited)

Probably.

But a better explanation is in the response itself:

<html><head><meta http-equiv="Refresh" content="0; URL=http://www.citronresearch.com/"></head><body></body></html>

It says to refresh the page immediately.

If you read http://en.wikipedia.org/wiki/Meta_refresh, you'll get the idea: they probably suppress the refresh after the first load (that is, on the second load), with PHP or maybe with JavaScript generated along with the page.

So INet works fine; the problem is that they refresh the page on the first load ;)
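
To check from a script whether a response is just such a refresh stub, the tag can be picked apart with StringRegExp. This is only a sketch: _GetMetaRefreshUrl is a hypothetical helper name, not part of INet.au3, and the pattern assumes the attribute order and quoting seen in the stub above.

```autoit
; Hypothetical helper, not part of INet.au3: returns the URL named in a
; meta refresh tag, or "" (with @error set) if the HTML has no such tag.
Func _GetMetaRefreshUrl($sHtml)
    ; flag 1: return an array of the captured groups, @error is set if nothing matches
    Local $aMatch = StringRegExp($sHtml, '(?i)<meta\s+http-equiv="?refresh"?\s+content="[^"]*?URL=([^"]+)"', 1)
    If @error Then Return SetError(1, 0, "")
    Return $aMatch[0]
EndFunc   ;==>_GetMetaRefreshUrl

; The stub quoted in this thread
Local $sStub = '<html><head><meta http-equiv="Refresh" content="0; URL=http://www.citronresearch.com/"></head><body></body></html>'
ConsoleWrite(_GetMetaRefreshUrl($sStub) & @CRLF) ; prints http://www.citronresearch.com/
```

A page that writes the tag differently (other attribute order, single quotes) would need a looser pattern.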

Edited by bogQ


Reading the discussion here, I THINK I was able to avoid rewriting already extensive code by loading the troublesome web page TWICE. The second load seems to get all the data.
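
The "load it twice" workaround could be wrapped up roughly like this. It is a sketch under assumptions, not a confirmed fix: _GetSourceWithRetry is a hypothetical helper name, the 500 ms pause is arbitrary, and the thread does not establish why the second request succeeds, so other sites may behave differently.

```autoit
#include <INet.au3>

; Hypothetical helper, not part of INet.au3: requests $sUrl up to $iMaxTries
; times and returns the first response that is not a meta refresh stub.
Func _GetSourceWithRetry($sUrl, $iMaxTries = 2)
    Local $sHtml = ""
    For $i = 1 To $iMaxTries
        $sHtml = _INetGetSource($sUrl)
        If @error Then Return SetError(1, 0, "") ; download failed
        ; flag 0: StringRegExp returns 1 on a match, 0 otherwise
        If Not StringRegExp($sHtml, '(?i)<meta\s+http-equiv="?refresh', 0) Then Return $sHtml
        Sleep(500) ; short pause before asking again (arbitrary value)
    Next
    Return SetError(2, 0, $sHtml) ; still only the stub after all tries
EndFunc   ;==>_GetSourceWithRetry

Local $sSource = _GetSourceWithRetry("http://www.citronresearch.com")
If @error Then
    ConsoleWrite("Could not get the full page, @error = " & @error & @CRLF)
Else
    ConsoleWrite($sSource & @CRLF)
EndIf
```

One caveat: a normal page that legitimately contains a meta refresh tag (for example a timed reload) would also be treated as a stub here, so the check may need tightening for other sites.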
