Cannot extract


regisma
It would probably help if you told us what information you want to extract. As it is, all the rest of us can do is guess. Do you want the time? Do you want to click the order buttons? How about providing a working link, or some information on why it does not work?

I usually use

#include <array.au3>
#include <INet.au3>
;Obviously hxxp is not a valid protocol.
Local $url = 'hxxp://www.bell.ca/ExpressVuEPG/loadVuGuide.do?lang=en'
;Get the source
Local $data = _InetGetSource($url)
;Regular expression to extract data
Local $regexp = ".*" ;This will grab everything as I don't know what to extract
Local $arr = StringRegExp($data, $regexp, 3)
_ArrayDisplay($arr, "Title")
Edited by Uten

It would probably help if you told us what information you want to extract. As it is, all the rest of us can do is guess. Do you want the time? Do you want to click the order buttons? How about providing a working link, or some information on why it does not work?

I usually use

#inlcude <array.au3>
;Obviously hxxp is not a valide protocol.
Local $url = hxxp://www.bell.ca/ExpressVuEPG/loadVuGuide.do?lang=en
;Get the source
Local $data = _InetGetSource($url)
;Regular expression to extract data
Local $regexp = ".*" ;This will grab everything as I don't know what to extract
Local $arr = StringRegExp($data, $regexp, 3)
_ArrayDisplay($arr, "Title")
I'd fix those errors, such as misspelling include, not including INet.au3, hxxp instead of http, and the missing quotes :whistle: Just the little things.

Yeah, thanks. AutoIt doesn't run well on Linux, so I wrote it off the top of my head. :whistle:

And the hxxp is from TP and is commented in the code, if you really pay attention to the task at hand. ;)

Edited by Uten

Hi,

I did not post a live link because most forums forbid it.

What I want to extract is the channel number, the channel name, the description of the TV program that is playing, and its associated colspan, which represents the duration of that program.

I have copied part of the page source by hand so you can get an idea of how it could be extracted.

thanks

Reg

<tr>
<td bgcolor="#E5F2F8" width="4%" align="center">362</td>
<td bgcolor="#EDF6F9" width="6%" align="center">Vu12</td>
<td bgcolor="#FFFFFF" width="9%" align="center"><img src="images/thumb/vu12.gif" border=0></td>
<td bgcolor="#FFFFFF" valign="top" colspan="6" onmouseover="window.status='The Sentinel (All Day)';return true;" onmouseout="window.status='';return true;">
<img src="images/arrows_earlier.gif">
<img src="images/icon_ppv.gif">
<!-- check javascript for popup -->
<a href="javascript:popupwinPPV('9034349','EST', 'https://www.bell.ca');" class="ipg_gray" title="The Sentinel (All Day)">The Sentinel (All Day)<br><img src="images/fr/vu_bo_order_fr.gif" border="0"></a>
</td>
<td bgcolor="#FFFFFF" valign="top" colspan="3" onmouseover="window.status='FREE Pre-Vu! The Sentinel (All Day)';return true;" onmouseout="window.status='';return true;">
<a href="javascript:popupwin('9034350','EST');" class="ipg_gray" title="FREE Pre-Vu! The Sentinel (All Day)">FREE Pre-Vu! Th ..</a>
</td>
<td bgcolor="#FFFFFF" valign="top" colspan="15" onmouseover="window.status='The Sentinel (All Day)';return true;" onmouseout="window.status='';return true;">
<img src="images/icon_ppv.gif">
<!-- check javascript for popup -->
<a href="javascript:popupwinPPV('9034351','EST', 'https://www.bell.ca');" class="ipg_gray" title="The Sentinel (All Day)">The Sentinel (All Day)<br><img src="images/fr/vu_bo_order_fr.gif" border="0"></a>
<img src="images/arrows_later.gif">
</td>
</tr>

Edited by regisma

You could try something like

;[^"]+ takes everything up to the closing quote, so titles containing ( ) ! or - also match
$regexp='title="([^"]+)"'
in the code I provided above. Sorry, I can't test it at the moment, so take it as a suggestion :whistle:
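If the colspan is wanted as well, one pattern can capture both values from each program cell. This is only a minimal sketch: it assumes every program cell has a colspan attribute followed by a link with a title attribute, as in the row posted above. $sCells is a made-up, shortened sample, and the pattern has not been tried against the live page.

```autoit
; Hypothetical, shortened sample of two program cells (not the live page).
Local $sCells = '<td bgcolor="#FFFFFF" valign="top" colspan="6">' & @CRLF & _
        '<a href="#" class="ipg_gray" title="The Sentinel (All Day)">The Sentinel (All Day)</a>' & @CRLF & _
        '</td>' & @CRLF & _
        '<td bgcolor="#FFFFFF" valign="top" colspan="3">' & @CRLF & _
        '<a href="#" class="ipg_gray" title="FREE Pre-Vu! The Sentinel (All Day)">FREE Pre-Vu! Th ..</a>' & @CRLF & _
        '</td>'

; (?s) lets "." cross line breaks; group 1 is the colspan, group 2 the title.
Local $sPattern = '(?s)<td[^>]*colspan="(\d+)".*?title="([^"]*)"'

; Flag 3 returns all captures in one flat array: colspan, title, colspan, title, ...
Local $aPairs = StringRegExp($sCells, $sPattern, 3)
If @error Then
    ConsoleWrite("No program cells matched" & @CRLF)
Else
    For $i = 0 To UBound($aPairs) - 2 Step 2
        ConsoleWrite("colspan=" & $aPairs[$i] & "  title=" & $aPairs[$i + 1] & @CRLF)
    Next
EndIf
```

Reading the title attribute rather than the link text avoids the truncated captions such as "FREE Pre-Vu! Th ..".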

Hi,

Your RegExp will only extract the TV program description.

I have tried the following using _IETableGetCollection and _IETableWriteToArray, but it just returns 23 tables, and none of them contains the information I'm looking for: the channel number, the channel name, the description of the TV program that is playing, and its associated colspan, which represents the duration of that program.

Here is the sample code that I have tried:

#include <IE.au3>

Opt("WinTitleMatchMode", 2) ; Match any substring of the window title

$Title = "- Microsoft Internet Explorer"
$hwnd = WinGetHandle($Title)
$oIE = _IEAttach($hwnd, "HWND")

; Index 0 returns only the first table on the page; @extended holds the total count
$oTable = _IETableGetCollection($oIE, 0)
$iNumTables = @extended
$o_TableData = _IETableWriteToArray($oTable)

ConsoleWrite("Number of Tables " & $iNumTables & @CRLF)

; Loop over the second dimension, since that is the one $i indexes below
For $i = 0 To UBound($o_TableData, 2) - 1
    If $o_TableData[0][$i] == 0 Then ContinueLoop
    ConsoleWrite("[" & 0 & "]" & "[" & $i & "]: " & $o_TableData[0][$i] & @CRLF)
Next
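
The script above asks only for table index 0, so the other tables are never inspected. A rough sketch of checking each table in turn follows. It assumes the guide is in the top-level document (not in a frame) and uses the ipg_gray class from the posted row as a marker; the window title is also an assumption.

```autoit
#include <IE.au3>

; Attach to an already open IE window by a substring of its title (assumed title).
Local $oIE = _IEAttach("Microsoft Internet Explorer", "title")
If @error Then Exit

; With the default index of -1 the whole collection is returned
; and @extended holds the number of tables.
Local $oTables = _IETableGetCollection($oIE)
Local $iNumTables = @extended

For $i = 0 To $iNumTables - 1
    Local $oTable = _IETableGetCollection($oIE, $i)
    ; An enclosing table will match as well as the guide table itself.
    If StringInStr(_IEPropertyGet($oTable, "outerhtml"), "ipg_gray") Then
        ConsoleWrite("Table " & $i & " contains guide links" & @CRLF)
    EndIf
Next
```

Note that _IETableWriteToArray returns cell text only, so the colspan values would still have to come from the HTML itself.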

Edited by regisma

Yes, it will only take the part defined in the regexp. The rest is left as an exercise. :whistle:
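
One possible way to approach that exercise is to split the source into rows first, so each channel stays with its own programs. This is a sketch under assumptions, not a tested solution: $sPage is a made-up, shortened version of the posted row, the first two centered text cells are assumed to be the channel number and name, and nested tables are not handled.

```autoit
; Hypothetical, shortened sample of one guide row (not the live page).
Local $sPage = '<tr>' & @CRLF & _
        '<td bgcolor="#E5F2F8" width="4%" align="center">362</td>' & @CRLF & _
        '<td bgcolor="#EDF6F9" width="6%" align="center">Vu12</td>' & @CRLF & _
        '<td bgcolor="#FFFFFF" valign="top" colspan="6">' & @CRLF & _
        '<a href="#" class="ipg_gray" title="The Sentinel (All Day)">The Sentinel (All Day)</a>' & @CRLF & _
        '</td>' & @CRLF & _
        '</tr>'

; Flag 3 returns every captured row body in one array.
Local $aRows = StringRegExp($sPage, '(?s)<tr[^>]*>(.*?)</tr>', 3)
If @error Then
    ConsoleWrite("No rows found" & @CRLF)
    Exit
EndIf

For $sRow In $aRows
    ; Centered cells that hold plain text: channel number, then channel name.
    Local $aChannel = StringRegExp($sRow, '<td[^>]*align="center">([^<]+)</td>', 3)
    If @error Or UBound($aChannel) < 2 Then ContinueLoop
    ConsoleWrite("Channel " & $aChannel[0] & " (" & $aChannel[1] & ")" & @CRLF)

    ; Program cells: colspan and title alternate in the flat result array.
    Local $aShows = StringRegExp($sRow, '(?s)<td[^>]*colspan="(\d+)".*?title="([^"]*)"', 3)
    If @error Then ContinueLoop
    For $i = 0 To UBound($aShows) - 2 Step 2
        ConsoleWrite("  " & $aShows[$i + 1] & " spans " & $aShows[$i] & " columns" & @CRLF)
    Next
Next
```

With real data, $sPage would be the string returned by _InetGetSource.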
