Sign in to follow this  
Followers 0
kylomas

jksmurf tvxb processor

2 posts in this topic

jksmurf,

here's the code we have been talking about. Read the comments section before running!

kylomas

#cs

jksmurf,

This ran at 22:30 and produced valid output.  Hpoefully it will do the same for you.  After the last several hours
I am convinced that these web pages are possessed.

I spent about three hours looking for how other people do this but cannot find anyhting that I understand.  I'm sure that the
answer is there, just over my head.

One interesting thing I found was in IE8 under TOOLS|DEVELOPER TOOLS I could examine the web page complete with current data.

Things you need to be aware of:

    1 - I write the files to a work drive.  You will probably want to change that.
    2 - I found a way to embed the date from the web page in the file name.  I used this to match the result to the web page
        for verification.
    3 - I changed most of "savetbvxhtml".  Especially the way files are written.
    4 - "proc_files" gets all files from a dir on the work drive that I mentioned.
    5 - The final result is a 2X array.  You can process it with a simple row/col nested loop.  For an example, look at how
        the array is constructed in "proc_files".

Manadar and daleholm responded to your origional post, albeit vaguely.  I am REALLY interested in thier opinion and am going
to open another thread asking why the sleep works and ieloadwait does not.

The parser is a very rudimentary post processor, lot of room for improvement.

10:55PM - ran it again, the fucker still works!!!

I am going to be unavailable starting next week for the following several months except for short periods every couple days.
If you find anything interesting or have any problems that I can help with just PM me.

I'm sure you're aware of this but you can leave blank lines in your script without putting semi-colons all over the place.

kylomas

11:04 - still works, GOOD NIGHT!

#ce

;#include <Date.au3>
#include <IE.au3>
#include <array.au3>

global $file,$files,$aTV[30][15],$aTMP,$str2,$str1,$fhnd,$savedate

_IEErrorHandlerRegister()

$oIE = _IECreate()

_IENavigate($oIE, "http://www.setanta.com/HongKong/TV-Listings/")
sleep(3000)
_IENavigate($oIE, "javascript:__doPostBack('ctl00$cphForm$AllCols$tvlHeader$rptDays$ctl00$btnDay','')",0)
sleep(3000)
SaveTVxBhtml()
_IENavigate($oIE, "javascript:__doPostBack('ctl00$cphForm$AllCols$tvlHeader$rptDays$ctl01$btnDay','')",0)
sleep(3000)
SaveTVxBhtml()
_IENavigate($oIE, "javascript:__doPostBack('ctl00$cphForm$AllCols$tvlHeader$rptDays$ctl02$btnDay','')",0)
sleep(3000)
SaveTVxBhtml()
_IENavigate($oIE, "javascript:__doPostBack('ctl00$cphForm$AllCols$tvlHeader$rptDays$ctl03$btnDay','')",0)
sleep(3000)
SaveTVxBhtml()
_IENavigate($oIE, "javascript:__doPostBack('ctl00$cphForm$AllCols$tvlHeader$rptDays$ctl04$btnDay','')",0)
sleep(3000)
SaveTVxBhtml()
_IENavigate($oIE, "javascript:__doPostBack('ctl00$cphForm$AllCols$tvlHeader$rptDays$ctl05$btnDay','')",0)
sleep(3000)
SaveTVxBhtml()
_IENavigate($oIE, "javascript:__doPostBack('ctl00$cphForm$AllCols$tvlHeader$rptDays$ctl06$btnDay','')",0)
sleep(3000)
SaveTVxBhtml()
_IENavigate($oIE, "javascript:__doPostBack('ctl00$cphForm$AllCols$tvlHeader$btnNextWeek','')",0)
sleep(3000)
_IENavigate($oIE, "javascript:__doPostBack('ctl00$cphForm$AllCols$tvlHeader$rptDays$ctl00$btnDay','')",0)
sleep(3000)
SaveTVxBhtml()
_IENavigate($oIE, "javascript:__doPostBack('ctl00$cphForm$AllCols$tvlHeader$rptDays$ctl01$btnDay','')",0)
sleep(3000)
SaveTVxBhtml()
_IENavigate($oIE, "javascript:__doPostBack('ctl00$cphForm$AllCols$tvlHeader$rptDays$ctl02$btnDay','')",0)
sleep(3000)
SaveTVxBhtml()
_IENavigate($oIE, "javascript:__doPostBack('ctl00$cphForm$AllCols$tvlHeader$rptDays$ctl03$btnDay','')",0)
sleep(3000)
SaveTVxBhtml()
_IENavigate($oIE, "javascript:__doPostBack('ctl00$cphForm$AllCols$tvlHeader$rptDays$ctl04$btnDay','')",0)
sleep(3000)
SaveTVxBhtml()
_IENavigate($oIE, "javascript:__doPostBack('ctl00$cphForm$AllCols$tvlHeader$rptDays$ctl05$btnDay','')",0)
sleep(3000)
SaveTVxBhtml()
_IENavigate($oIE, "javascript:__doPostBack('ctl00$cphForm$AllCols$tvlHeader$rptDays$ctl06$btnDay','')",0)
sleep(3000)
SaveTVxBhtml()

_IEQuit($oie)

proc_files()

_arraydisplay($atv)

Func SaveTVxBhtml()
    $sHTML    = _IEbodyReadHTML($oIE)
    $savedate = stringregexp($shtml, 'class=selected><A(.+?)'', ''Tab-Date',3)
    local $fl = fileopen("d:\tvbtest\test-" & stringright($savedate[0],6),2)
    if $fl    = -1 then msgbox(0,'','Fileopen failed for file = ' & "d:\tvbtest\test-" & StringReplace($savedate, "/", ""))
    FileWrite($fl, $sHTML)
    fileclose($fl)
EndFunc

func proc_files()

    local $i = 0

    $files = FileFindFirstFile('d:\tvbtest\*.*')
    switch @error
        case -1
            msgbox(0,'','No Matching Folders / Directories')
            Exit
        case 1
            msgbox(0,'','Folder is empty')
            Exit
    endswitch

    while 1

        $file = FileFindNextFile($files)
        if @error then exitloop

        $fhnd = fileopen('d:\tvbtest\' & $file)
        if $fhnd = -1 then msgbox(0,'','Error opening - ' & $file)
        $str2 = stripall(fileread($fhnd))
        fileclose($fhnd)

        $str2 = StringRegExpReplace($str2,'`',' ')

        $aTV[0][$i] = stringright($file,6)

        $aTMP = stringsplit($str2,@crlf,1)

        local $row = 1

        for $j = 1 to $aTMP[0]
            if stringinstr(stringleft($aTMP[$j],5),':') > 0 then
                $aTV[$row][$i] = $aTMP[$j]
                if $atmp[$j+1] = '' then
                    $aTV[$row+1][$i] = $aTMP[$j+2]
                Else
                    $aTV[$row+1][$i] = $aTMP[$j+1]
                endif
                $row += 1
            endif
        Next

        $i += 1

    wend

endfunc

func STRIPALL($str,$debug = 0)

    local $file,$arr10,$out10 = '',$sport = '',$dte = '',$ep,$i

    $str = StringRegExpreplace($str, '(?i)(?s)<script.*?</script>', "")
    $str = StringRegExpreplace($str, '[\r\n\t]', "")
    $str = StringRegExpreplace($str, '(?i)(?s)<div', "<div>" & @crlf & "</div><div")
    $str = StringRegExpreplace($str, '(?i)(?s)<tr', "<tr>" & @crlf & "</tr><tr")
    $str = StringRegExpreplace($str, '(?i)(?s)<li', "<li>" & @crlf & "</li><li")

    if $debug then
        $file = fileopen("d:\sd\temp\test10.txt", 2)
        filewrite($file, $str)
        FileClose($file)
    endif

    $arr10 = stringregexp($str,'(?s)>([^<].*?)<',3)

    $str   = _ArrayToString($arr10,"`")

    return $str

EndFunc

Forum Rules         Procedure for posting code

"I like pigs.  Dogs look up to us.  Cats look down on us.  Pigs treat us as equals."

- Sir Winston Churchill

Share this post


Link to post
Share on other sites



Thanks kylomas will give it a try. I can't fathom why it sometimes produces duplicates and sometimes not, hopefully Dale or a similar expert throw some light on it.

K

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!


Register a new account

Sign in

Already have an account? Sign in here.


Sign In Now
Sign in to follow this  
Followers 0