regisma

Cannot extract

9 posts in this topic

Hi,

I am having a lot of difficulty extracting the TV schedule from this web page with AutoIt v3.

hxxp://www.bell.ca/ExpressVuEPG/loadVuGuide.do?lang=en

My main goal is to extract the information so I can then program my PVR.

Thanks

Regis




#2 ·  Posted (edited)

It would probably help if you described the information you want to extract. As it is, all the rest of us can do is guess. Do you want the time? Do you want to click the order buttons? How about providing a working link, or explaining why it does not work?

I usually use

#include <Array.au3>
#include <INet.au3>
; hxxp is deliberately not a valid protocol; change it to http before running.
Local $url = 'hxxp://www.bell.ca/ExpressVuEPG/loadVuGuide.do?lang=en'
; Get the page source
Local $data = _INetGetSource($url)
; Regular expression to extract data
Local $regexp = ".*" ; This grabs everything, as I don't know what to extract
; Flag 3 = return an array of all global matches
Local $arr = StringRegExp($data, $regexp, 3)
_ArrayDisplay($arr, "Title")
Edited by Uten


It would probably help if you described the information you want to extract. As it is, all the rest of us can do is guess. Do you want the time? Do you want to click the order buttons? How about providing a working link, or explaining why it does not work?

I usually use

#inlcude <array.au3>
;Obviously hxxp is not a valide protocol.
Local $url = hxxp://www.bell.ca/ExpressVuEPG/loadVuGuide.do?lang=en
;Get the source
Local $data = _InetGetSource($url)
;Regular expression to extract data
Local $regexp = ".*" ;This will grab everything as I don't know what to extract
Local $arr = StringRegExp($data, $regexp, 3)
_ArrayDisplay($arr, "Title")
I'd fix those errors, such as the misspelled #include, not including INet.au3, hxxp instead of http, and no quotes around the URL :whistle: Just the little things.


#4 ·  Posted (edited)

Yeah, thanks. AutoIt doesn't run well on Linux, so I wrote it off the top of my head. :whistle:

And hxxp comes from the TP and is commented in the code, if you really pay attention to the task at hand. ;)

Edited by Uten


#5 ·  Posted (edited)

Hi,

I did not post a live link because most forums forbid it.

The information I'm looking for is the channel number, the channel name, the description of the TV program that is playing, and its associated colspan, which represents the duration of the program.

I have copied the relevant source code of the page by hand so you can get an idea of how to extract it.

Thanks

Reg

<tr>
  <td bgcolor="#E5F2F8" width="4%" align="center">362</td>
  <td bgcolor="#EDF6F9" width="6%" align="center">Vu12</td>
  <td bgcolor="#FFFFFF" width="9%" align="center"><img src="images/thumb/vu12.gif" border=0></td>
  <td bgcolor="#FFFFFF" valign="top" colspan="6" onmouseover="window.status='The Sentinel (All Day)';return true;" onmouseout="window.status='';return true;">
    <img src="images/arrows_earlier.gif">
    <img src="images/icon_ppv.gif">
    <!-- check javascript for popup -->
    <a href="javascript:popupwinPPV('9034349','EST', 'https://www.bell.ca');" class="ipg_gray" title="The Sentinel (All Day)">The Sentinel (All Day)<br><img src="images/fr/vu_bo_order_fr.gif" border="0"></a>
  </td>
  <td bgcolor="#FFFFFF" valign="top" colspan="3" onmouseover="window.status='FREE Pre-Vu! The Sentinel (All Day)';return true;" onmouseout="window.status='';return true;">
    <a href="javascript:popupwin('9034350','EST');" class="ipg_gray" title="FREE Pre-Vu! The Sentinel (All Day)">FREE Pre-Vu! Th ..</a>
  </td>
  <td bgcolor="#FFFFFF" valign="top" colspan="15" onmouseover="window.status='The Sentinel (All Day)';return true;" onmouseout="window.status='';return true;">
    <img src="images/icon_ppv.gif">
    <!-- check javascript for popup -->
    <a href="javascript:popupwinPPV('9034351','EST', 'https://www.bell.ca');" class="ipg_gray" title="The Sentinel (All Day)">The Sentinel (All Day)<br><img src="images/fr/vu_bo_order_fr.gif" border="0"></a>
    <img src="images/arrows_later.gif">
  </td>
</tr>

Edited by regisma
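One possible route around the table functions is to run regular expressions over the page source (for example the string returned by _INetGetSource in the earlier code). This is a minimal sketch that assumes every channel row has the shape of the sample above: channel number in the first cell, channel name in the second, and one cell per program carrying a colspan and a title attribute. The sample string is a trimmed copy of that row (mouseover handlers, icons and popup links removed), and $aGuide and $iCount are illustrative names, not part of any library. It has not been tried against the live page.

```autoit
; Trimmed copy of the sample row (event handlers, icons and popup links removed)
Local $sHtml = '<tr>' & _
        '<td bgcolor="#E5F2F8" width="4%" align="center">362</td>' & _
        '<td bgcolor="#EDF6F9" width="6%" align="center">Vu12</td>' & _
        '<td bgcolor="#FFFFFF" width="9%" align="center"><img src="images/thumb/vu12.gif" border=0></td>' & _
        '<td bgcolor="#FFFFFF" valign="top" colspan="6"><a class="ipg_gray" title="The Sentinel (All Day)">The Sentinel (All Day)</a></td>' & _
        '<td bgcolor="#FFFFFF" valign="top" colspan="3"><a class="ipg_gray" title="FREE Pre-Vu! The Sentinel (All Day)">FREE Pre-Vu! Th ..</a></td>' & _
        '<td bgcolor="#FFFFFF" valign="top" colspan="15"><a class="ipg_gray" title="The Sentinel (All Day)">The Sentinel (All Day)</a></td>' & _
        '</tr>'

; One result row per program: [n][0] channel number, [1] channel name, [2] colspan, [3] title
Local $aGuide[1][4], $iCount = 0
Local $aChan, $aProg, $i

; Inner HTML of every <tr> ((?is) = case-insensitive, dot also matches newlines)
Local $aRows = StringRegExp($sHtml, '(?is)<tr[^>]*>(.*?)</tr>', 3)
If Not @error Then
    For $sRow In $aRows
        ; Channel number and channel name are the first two cells of the row
        $aChan = StringRegExp($sRow, '(?is)<td[^>]*>\s*(\d+)\s*</td>\s*<td[^>]*>\s*([^<]+?)\s*</td>', 1)
        If @error Then ContinueLoop ; not a channel row (header, spacer, ...)

        ; colspan of each program cell, then the first title="..." inside that same cell
        $aProg = StringRegExp($sRow, '(?is)colspan="(\d+)"(?:(?!</td>).)*?title="([^"]*)"', 3)
        If @error Then ContinueLoop

        ; Flag 3 flattens the groups: colspan, title, colspan, title, ...
        For $i = 0 To UBound($aProg) - 1 Step 2
            If $iCount >= UBound($aGuide) Then ReDim $aGuide[$iCount + 1][4]
            $aGuide[$iCount][0] = $aChan[0]
            $aGuide[$iCount][1] = $aChan[1]
            $aGuide[$iCount][2] = Number($aProg[$i])
            $aGuide[$iCount][3] = $aProg[$i + 1]
            $iCount += 1
        Next
    Next
EndIf

; One console line per program
For $i = 0 To $iCount - 1
    ConsoleWrite($aGuide[$i][0] & " | " & $aGuide[$i][1] & " | colspan=" & $aGuide[$i][2] & " | " & $aGuide[$i][3] & @CRLF)
Next
```

Against the live page, assign the result of _INetGetSource to $sHtml instead of the sample. Rows whose first two cells are not a number followed by a name (headers, spacers) are skipped by the `If @error Then ContinueLoop` check, and the `(?:(?!</td>).)*?` part keeps each title paired with the colspan of its own cell.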


You could try something like

$regexp='title="([\w\s]+)"'
in the code I provided above. I'm sorry I can't test the code at the moment, so take it as a suggestion :whistle:
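A quick check of why the character class matters, using the three titles from the sample row: `[\w\s]+` accepts only letters, digits, underscore and whitespace, so titles containing "(", ")", "!" or "-" never reach the closing quote, while `[^"]*` takes everything up to it. This is a small sketch; `_CountMatches` is a hypothetical helper written for the comparison, not an AutoIt UDF.

```autoit
; Trimmed copy of the three title attributes from the sample row
Local $sHtml = '<a class="ipg_gray" title="The Sentinel (All Day)">The Sentinel (All Day)</a>' & _
        '<a class="ipg_gray" title="FREE Pre-Vu! The Sentinel (All Day)">FREE Pre-Vu! Th ..</a>' & _
        '<a class="ipg_gray" title="The Sentinel (All Day)">The Sentinel (All Day)</a>'

; Letters, digits, underscore and whitespace only: punctuation in the title breaks the match
Local $iOld = _CountMatches($sHtml, 'title="([\w\s]+)"')
; Anything except a double quote, up to the closing quote
Local $iNew = _CountMatches($sHtml, 'title="([^"]*)"')
ConsoleWrite("[\w\s]+ matches: " & $iOld & @CRLF & "[^""]* matches: " & $iNew & @CRLF)

; Returns how many times $sPattern matches $sText (0 when StringRegExp finds nothing)
Func _CountMatches($sText, $sPattern)
    Local $aMatch = StringRegExp($sText, $sPattern, 3)
    If @error Then Return 0
    Return UBound($aMatch)
EndFunc
```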


#8 ·  Posted (edited)

Hi,

Your regular expression will only extract the TV program description.

I have tried the following using _IETableGetCollection and _IETableWriteToArray, but it just returns 23 tables, and none of them contains the information I'm looking for: the channel number, the channel name, the description of the TV program that is playing, and its associated colspan, which represents the duration of the program.

Here is the sample code I have tried:

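#include <IE.au3>

; Match window titles by substring
Opt("WinTitleMatchMode", 2)

$Title = "- Microsoft Internet Explorer"
$hwnd = WinGetHandle($Title)
; Attach to the already-open IE window showing the guide
$oIE = _IEAttach($hwnd, "HWND")

; Index 0 = first table on the page; @extended = total number of tables
$oTable = _IETableGetCollection($oIE, 0)
$iNumTables = @extended

; By default the array is indexed [column][row]
$o_TableData = _IETableWriteToArray($oTable)
ConsoleWrite("Number of Tables " & $iNumTables & @CRLF)

; Print the first column of every row, skipping empty cells
For $i = 0 To UBound($o_TableData, 2) - 1
    If $o_TableData[0][$i] = "" Then ContinueLoop
    ConsoleWrite("[0][" & $i & "]: " & $o_TableData[0][$i] & @CRLF)
Next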

Edited by regisma

