jksmurf

Difference between way wget saves html and FileWrite does?

6 posts in this topic

#1 ·  Posted (edited)

Is there a difference between the way wget saves html and FileWrite does it?

I'm having an odd problem with the way FileWrite writes HTML files: a program (TVxb) that I then use to parse the data in those files does not recognise them and goes on to try to download the pages itself, rather than using the cached files produced by the script.

TVxb uses wget (as below) to download the files. (There is a separate issue with TVxb not handling JavaScript, which is why I am using AutoIt to download the files.) My script writes each file like this:

FileWrite("C:\Users\MyName\Desktop\SetantaCache\TVxb-setanta.hk-" & StringReplace($savedate, "/", "") & ".html", $sHTML)

Here is what wget does in TVxb:

wget -E -t 5 --header="Accept-Language: en-us,en;q=0.5" http://www.setanta.com/hongkong/TV-Listings/

I have attached the HTML files produced by both the script and TVxb.

Thanks!

k.

htmls_from_two_versions.zip
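
For anyone wanting to check the two attached files, a byte-level comparison is one way to look for differences that a text editor hides, such as a byte-order mark, the text encoding, or the line endings. Nothing in this thread establishes which of these, if any, TVxb cares about, so the following is only a rough inspection sketch. It is written in Python rather than AutoIt purely as a neutral tool, and the function names are made up for illustration:

```python
import codecs
from pathlib import Path

# Byte-order marks to look for. UTF-32 is listed before UTF-16 because the
# UTF-32 LE mark begins with the same two bytes as the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def describe_bytes(data: bytes) -> dict:
    """Summarise byte-level properties of a file's raw content."""
    bom = next((name for mark, name in BOMS if data.startswith(mark)), None)
    crlf = data.count(b"\r\n")
    return {
        "size": len(data),
        "bom": bom,                            # None when no mark is present
        "crlf": crlf,                          # Windows-style line endings
        "bare_lf": data.count(b"\n") - crlf,   # Unix-style line endings
        "nul_bytes": data.count(b"\x00"),      # many NULs hint at UTF-16/32
        "head": data[:16],                     # first bytes, for eyeballing
    }


def describe_file(path) -> dict:
    """Read a file in binary mode (no newline translation) and describe it."""
    return describe_bytes(Path(path).read_bytes())


def compare(path_a, path_b) -> dict:
    """Return only the properties that differ, as (value_a, value_b) pairs."""
    a, b = describe_file(path_a), describe_file(path_b)
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}
```

Calling `compare("from_script.html", "from_wget.html")` (both file names are placeholders for the two files in the zip) returns a dictionary of the properties that differ. An empty result would suggest the difference lies in the page content or the file name rather than in how the bytes were written.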

Edited by jksmurf




Apologies for the bump; anyone?


The cable provider pulled the EPG, so I now really need to get this working off the "Cable Provider" website.

Would anyone be able to shed some light on the difference between the way wget saves html and FileWrite does it?

Thanks

k.


I have NetZero, and my intuition and experience tell me it isn't worth the trouble of trying to hack your way out of all the spam that comes with certain ISPs and cable service providers. Look at the situation from their perspective for a moment: they have a budget, and they have ways to stop hackers from getting around the company's anti-hacking measures. These companies have certain expectations about their profit, and to stay in the business of giving people like you and me a chance to afford any cable or internet at all, they need to prove to the banks that help them with financing what the budget will be.

But as for your code, we would have to see the entire AutoIt script you are running. And what programming language is "wget" from, anyway? It looks a little familiar, but I can't remember...


Well, I'm not really trying to get past any spam. I'm just trying to set up a script that downloads a series of web pages for a TV EPG so that they can be processed by TVxb, a "scraper" that uses wget, for use in my software-based PVR (nPVR). Unfortunately, the site I am trying to scrape uses JavaScript, so wget doesn't work.

My script

1. Loads http://www.setanta.com/HongKong/TV-Listings/ which loads today's EPG.

2. Saves that web page to a local dir in the format TVxb-Setanta.hk-20110215.html so that TVxb can parse it.

3. Clicks the NEXT date, which uses JavaScript in the form javascript:__doPostBack('ctl00$cphForm$AllCols$tvlHeader$rptDays$ctl02$btnDay','') to load the next day's page.

4. Saves that web page to a local dir in the format TVxb-Setanta.hk-20110216.html so that TVxb can parse it.

5. and so on.
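
The file-naming pattern in steps 2 and 4 can be sketched as follows. This is only an illustration in Python, not the attached AutoIt script. The function name and its parameters are hypothetical, and the default prefix is simply copied from the example file names above:

```python
from datetime import date, timedelta
from typing import List


def cache_filenames(start: date, days: int,
                    prefix: str = "TVxb-Setanta.hk-") -> List[str]:
    """Build one cache file name per day, e.g. TVxb-Setanta.hk-20110215.html.

    The date is written as YYYYMMDD with no separators, matching the
    example names in the steps above.
    """
    return [
        f"{prefix}{(start + timedelta(days=offset)).strftime('%Y%m%d')}.html"
        for offset in range(days)
    ]
```

For example, `cache_filenames(date(2011, 2, 15), 2)` gives the two names used in steps 2 and 4. Using `timedelta` rather than incrementing the day number by hand keeps the names correct across month and year boundaries.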

The script itself is attached. Glad for any help at all.

k.

SetantapBSwithUpdatePost19Mod4.zip


:unsure: I think that you should contact the vendor(s) of the (non-AutoIt) software(s) you are using in order to find out more about those other technologies.

But good luck.

