regisma

Cannot extract

regisma

Hi,

I am having a lot of difficulty extracting the TV schedule from this web page with AutoIt v3.

hxxp://www.bell.ca/ExpressVuEPG/loadVuGuide.do?lang=en

My main goal is to extract the info so I can then program my PVR.

thanks

Regis

Uten

It would probably help if you told us what information you want to extract. As it is, all the rest of us can do is guess. Do you want the time? Do you want to click the order buttons? How about providing a working link? Or information on why it does not work?

I usually use

#include <array.au3>
#include <INet.au3>
;Obviously hxxp is not a valid protocol.
Local $url = 'hxxp://www.bell.ca/ExpressVuEPG/loadVuGuide.do?lang=en'
;Get the source
Local $data = _InetGetSource($url)
;Regular expression to extract data
Local $regexp = ".*" ;This will grab everything as I don't know what to extract
Local $arr = StringRegExp($data, $regexp, 3)
_ArrayDisplay($arr, "Title")
Edited by Uten

Thatsgreat2345

It would probably help if you told us what information you want to extract. As it is, all the rest of us can do is guess. Do you want the time? Do you want to click the order buttons? How about providing a working link? Or information on why it does not work?

I usually use

#inlcude <array.au3>
;Obviously hxxp is not a valide protocol.
Local $url = hxxp://www.bell.ca/ExpressVuEPG/loadVuGuide.do?lang=en
;Get the source
Local $data = _InetGetSource($url)
;Regular expression to extract data
Local $regexp = ".*" ;This will grab everything as I don't know what to extract
Local $arr = StringRegExp($data, $regexp, 3)
_ArrayDisplay($arr, "Title")
I'd fix those errors, such as misspelling include, not including INet.au3, using hxxp instead of http, and leaving out the quotes :whistle: just the little things

Uten

Yeah, thanks. AutoIt doesn't run well on Linux, so I wrote it off the top of my head. :whistle:

And the hxxp is from TP and is commented in the code, if you really pay attention to the task at hand. ;)

Edited by Uten

regisma

Hi,

I did not post a live link because most forums forbid it.

The information I want to extract is the channel number, the channel name, the description of the TV program that is playing, and its associated colspan, which represents the duration of that program.

I have copied part of the page source by hand so you can get an idea of how it is structured.

thanks

Reg

<tr>
<td bgcolor="#E5F2F8" width="4%" align="center">362</td>
<td bgcolor="#EDF6F9" width="6%" align="center">Vu12</td>
<td bgcolor="#FFFFFF" width="9%" align="center"><img src="images/thumb/vu12.gif" border=0></td>
<td bgcolor="#FFFFFF" valign="top" colspan="6" onmouseover="window.status='The Sentinel (All Day)';return true;" onmouseout="window.status='';return true;">
<img src="images/arrows_earlier.gif">
<img src="images/icon_ppv.gif">
<!-- check javascript for popup -->
<a href="javascript:popupwinPPV('9034349','EST', 'https://www.bell.ca');" class="ipg_gray" title="The Sentinel (All Day)">The Sentinel (All Day)<br><img src="images/fr/vu_bo_order_fr.gif" border="0"></a>
</td>
<td bgcolor="#FFFFFF" valign="top" colspan="3" onmouseover="window.status='FREE Pre-Vu! The Sentinel (All Day)';return true;" onmouseout="window.status='';return true;">
<a href="javascript:popupwin('9034350','EST');" class="ipg_gray" title="FREE Pre-Vu! The Sentinel (All Day)">FREE Pre-Vu! Th ..</a>
</td>
<td bgcolor="#FFFFFF" valign="top" colspan="15" onmouseover="window.status='The Sentinel (All Day)';return true;" onmouseout="window.status='';return true;">
<img src="images/icon_ppv.gif">
<!-- check javascript for popup -->
<a href="javascript:popupwinPPV('9034351','EST', 'https://www.bell.ca');" class="ipg_gray" title="The Sentinel (All Day)">The Sentinel (All Day)<br><img src="images/fr/vu_bo_order_fr.gif" border="0"></a>
<img src="images/arrows_later.gif">
</td>
</tr>

Edited by regisma

Uten

You could try something like

$regexp = 'title="([^"]+)"'
in the code I provided above. I'm sorry I can't test the code at the moment, so take it as a suggestion :whistle:
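
To pick up the colspan and the channel columns along with each title, the pattern could be extended along these lines. This is only a sketch: it assumes every guide row looks like the sample row posted above (two centred cells for channel number and name, then one td with colspan="N" per program containing an anchor with title="..."). $sRow is a shortened, hypothetical stand-in for the real page source, and the variable names are made up for illustration.

```autoit
; Hypothetical, shortened stand-in for one guide row (modelled on the sample row above)
Local $sRow = '<td bgcolor="#E5F2F8" width="4%" align="center">362</td>' & @CRLF & _
        '<td bgcolor="#EDF6F9" width="6%" align="center">Vu12</td>' & @CRLF & _
        '<td bgcolor="#FFFFFF" valign="top" colspan="6" onmouseover="...">' & @CRLF & _
        '<a href="#" class="ipg_gray" title="The Sentinel (All Day)">The Sentinel (All Day)</a>' & @CRLF & _
        '</td>'

; Channel number and channel name: the first two centred cells of the row.
; Flag 1 returns the capture groups of the first match only.
Local $aChannel = StringRegExp($sRow, '(?s)align="center">(\d+)</td>.*?align="center">([^<]+)</td>', 1)

; Each colspan followed (non-greedily) by the next title attribute.
; Flag 3 returns all captures in one flat array: colspan, title, colspan, title, ...
Local $aShows = StringRegExp($sRow, '(?s)colspan="(\d+)".*?title="([^"]*)"', 3)

If IsArray($aChannel) Then
    ConsoleWrite("Channel " & $aChannel[0] & " (" & $aChannel[1] & ")" & @CRLF)
EndIf

If IsArray($aShows) Then
    ; Step through the flat array two entries at a time
    For $i = 0 To UBound($aShows) - 2 Step 2
        ConsoleWrite("colspan=" & $aShows[$i] & "  title=" & $aShows[$i + 1] & @CRLF)
    Next
EndIf
```

On the real page you would feed $data from _InetGetSource in place of $sRow, probably after splitting the source on <tr> so that each channel's cells stay together. The (?s) option lets the dot match across line breaks, which matters because the td and its anchor sit on different lines.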

regisma

Hi,

Your RegExp will only extract the TV program description.

I have tried the following using _IETableGetCollection and _IETableWriteToArray, but it just returns 23 tables, and none of them contains the information I'm looking for: the channel number, the channel name, the description of the TV program that is playing, and its associated colspan, which represents the duration of that program.

Here is the sample code that I tried:

#include <IE.au3>

Opt("WinTitleMatchMode", 2) ; match any substring of the window title
$Title = "- Microsoft Internet Explorer"
$hwnd = WinGetHandle($Title)
$oIE = _IEAttach($hwnd, "HWND")
$oTable = _IETableGetCollection($oIE, 0) ; only the first table on the page
$iNumTables = @extended ; total number of tables found
$o_TableData = _IETableWriteToArray($oTable) ; indexed as [column][row]
ConsoleWrite("Number of Tables " & $iNumTables & @CRLF)
; Walk the rows of column 0 (the row count is the second dimension)
For $i = 0 To UBound($o_TableData, 2) - 1
    If $o_TableData[0][$i] == 0 Then ContinueLoop
    ConsoleWrite("[0][" & $i & "]: " & $o_TableData[0][$i] & @CRLF)
Next
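
Since index 0 only reads the first of the 23 tables, one variation would be to loop over every table and print its size, to see which one holds the guide. This is an untested sketch, assuming the same Internet Explorer window is open and that the guide really is one of the tables on the page. The variable names are only illustrative.

```autoit
#include <IE.au3>

Opt("WinTitleMatchMode", 2) ; match any substring of the window title
Local $hWnd = WinGetHandle("- Microsoft Internet Explorer")
Local $oIE = _IEAttach($hWnd, "HWND")
If @error Then
    ConsoleWrite("Could not attach to an Internet Explorer window" & @CRLF)
    Exit
EndIf

; With the default index (-1) the whole table collection is returned
; and @extended holds the number of tables
Local $oTables = _IETableGetCollection($oIE)
Local $iNumTables = @extended
ConsoleWrite("Number of Tables " & $iNumTables & @CRLF)

Local $iTable = 0, $aData
For $oTable In $oTables
    ; True transposes the result so it is indexed as [row][column]
    $aData = _IETableWriteToArray($oTable, True)
    If Not @error Then
        ConsoleWrite("Table " & $iTable & ": " & UBound($aData, 1) & " rows x " & UBound($aData, 2) & " columns" & @CRLF)
    EndIf
    $iTable += 1
Next
```

A table with many rows and many columns would be the likely candidate for the schedule grid. Note that _IETableWriteToArray only returns the cell text, so the colspan values would still have to come from the page source.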

Edited by regisma
