regisma Posted November 12, 2006

Hi,

I am having a lot of difficulty extracting the TV schedule from this web page with AutoIt V3:
hxxp://www.bell.ca/ExpressVuEPG/loadVuGuide.do?lang=en

My main goal is to extract the info so I can then program my PVR.

Thanks,
Regis

Uten Posted November 12, 2006 (edited)

It would probably help if you told us what information you want to extract. As it is, all the rest of us can do is guess. Do you want the time? Do you want to click the order buttons? How about providing a working link, or information on why it does not work?

I usually use:

    #include <array.au3>
    #include <INet.au3>
    ;Obviously hxxp is not a valid protocol.
    Local $url = 'hxxp://www.bell.ca/ExpressVuEPG/loadVuGuide.do?lang=en'
    ;Get the source
    Local $data = _InetGetSource($url)
    ;Regular expression to extract data
    Local $regexp = ".*" ;This will grab everything as I don't know what to extract
    Local $arr = StringRegExp($data, $regexp, 3)
    _ArrayDisplay($arr, "Title")

Edited November 12, 2006 by Uten

Please keep your sig. small! Use the help file. Search the forum. Then ask unresolved questions :)
Script plugin demo, Simple Trace udf, TrayMenuEx udf, IOChatter demo, freebasic multithreaded dll sample, PostMessage, Aspell, Code profiling

Thatsgreat2345 Posted November 12, 2006

Quoting Uten:

    I usually use

    #inlcude <array.au3>
    ;Obviously hxxp is not a valid protocol.
    Local $url = hxxp://www.bell.ca/ExpressVuEPG/loadVuGuide.do?lang=en
    ;Get the source
    Local $data = _InetGetSource($url)
    ;Regular expression to extract data
    Local $regexp = ".*" ;This will grab everything as I don't know what to extract
    Local $arr = StringRegExp($data, $regexp, 3)
    _ArrayDisplay($arr, "Title")

I'd fix those errors: "include" is misspelled, INet.au3 is not included, the URL uses hxxp instead of http, and the URL has no quotes around it. Just the little things.

Uten Posted November 12, 2006 (edited)

Yeah, thanks. AutoIt doesn't run well on Linux, so I wrote it off the top of my head. And the hxxp is from the TP and is commented on in the code, if you really pay attention to the task at hand.

Edited November 12, 2006 by Uten

regisma Posted November 12, 2006 (edited)

Hi,

I did not post a "live link" because most forums generally forbid it.

What I want to extract is the channel number, the channel name, the description of the TV program that is playing, and its associated colspan, which represents the duration of that TV program.

I have copied the source of one row of the page by hand so you can get an idea of how to extract it.

Thanks,
Reg

    <tr>
    <td bgcolor="#E5F2F8" width="4%" align="center">362</td>
    <td bgcolor="#EDF6F9" width="6%" align="center">Vu12</td>
    <td bgcolor="#FFFFFF" width="9%" align="center"><img src="images/thumb/vu12.gif" border=0></td>
    <td bgcolor="#FFFFFF" valign="top" colspan="6" onmouseover="window.status='The Sentinel (All Day)';return true;" onmouseout="window.status='';return true;">
    <img src="images/arrows_earlier.gif">
    <img src="images/icon_ppv.gif">
    <!-- check javascript for popup -->
    <a href="javascript:popupwinPPV('9034349','EST', 'https://www.bell.ca');" class="ipg_gray" title="The Sentinel (All Day)">The Sentinel (All Day)<br><img src="images/fr/vu_bo_order_fr.gif" border="0"></a>
    </td>
    <td bgcolor="#FFFFFF" valign="top" colspan="3" onmouseover="window.status='FREE Pre-Vu! The Sentinel (All Day)';return true;" onmouseout="window.status='';return true;">
    <a href="javascript:popupwin('9034350','EST');" class="ipg_gray" title="FREE Pre-Vu! The Sentinel (All Day)">FREE Pre-Vu! Th ..</a>
    </td>
    <td bgcolor="#FFFFFF" valign="top" colspan="15" onmouseover="window.status='The Sentinel (All Day)';return true;" onmouseout="window.status='';return true;">
    <img src="images/icon_ppv.gif">
    <!-- check javascript for popup -->
    <a href="javascript:popupwinPPV('9034351','EST', 'https://www.bell.ca');" class="ipg_gray" title="The Sentinel (All Day)">The Sentinel (All Day)<br><img src="images/fr/vu_bo_order_fr.gif" border="0"></a>
    <img src="images/arrows_later.gif">
    </td>
    </tr>

Edited November 12, 2006 by regisma

Uten Posted November 12, 2006

Does the HTML source you pulled by hand contain anything worth extracting?

Uten Posted November 12, 2006

You could try something like

    $regexp='title="([\w\s]+)"'

in the code I provided above. I'm sorry I can't test the code at the moment, so take it as a suggestion.

regisma Posted November 12, 2006 (edited)

Hi,

Your RegExp will only extract the TV program description.

I have tried the following using _IETableGetCollection and _IETableWriteToArray, but it just returns 23 tables, and none of them contains the information I'm looking for: the channel number, the channel name, the description of the TV program that is playing, and its associated colspan, which represents the duration of that TV program.

Here is the sample code that I have tried:

    #include <IE.au3>
    Opt("WinTitleMatchMode", 2)
    $Title = "- Microsoft Internet Explorer"
    $hwnd = WinGetHandle($Title)
    $oIE = _IEAttach ($hwnd, "HWND")
    $oTable = _IETableGetCollection ($oIE, 0)
    $iNumTables = @extended
    $o_TableData = _IETableWriteToArray($oTable)
    ConsoleWrite("Number of Tables " & $iNumTables & @CRLF)
    For $i = 0 To UBound($o_TableData) - 1
        If $o_TableData[0][$i] == 0 Then ContinueLoop
        ConsoleWrite("[" & 3 & "]" & "[" & $i & "]: " & $o_TableData[0][$i] & @CRLF)
    Next

Edited November 12, 2006 by regisma

Uten Posted November 12, 2006

Yes, it will only take the part defined in the regexp. The rest is left as an exercise.
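
For readers picking up the exercise, here is a minimal, untested sketch of one way the regexp approach could be extended to pull all four fields from a single row. It assumes every row looks like the one regisma posted: the channel number and name sit in the first two align="center" cells, and each program cell carries a colspan followed by a link with a title attribute. The $sRow string is a hypothetical, shortened copy of that row, not real page data. Because the sample titles contain parentheses and "!", the sketch uses [^"]* for the title rather than [\w\s]+, which would not match those characters.

```autoit
; Hypothetical sample row, shortened from the HTML posted earlier in the thread.
Local $sRow = '<tr> <td bgcolor="#E5F2F8" width="4%" align="center">362</td> ' & _
        '<td bgcolor="#EDF6F9" width="6%" align="center">Vu12</td> ' & _
        '<td bgcolor="#FFFFFF" valign="top" colspan="6" onmouseover="x"> ' & _
        '<a href="#" class="ipg_gray" title="The Sentinel (All Day)">The Sentinel</a> </td> ' & _
        '<td bgcolor="#FFFFFF" valign="top" colspan="3" onmouseover="x"> ' & _
        '<a href="#" class="ipg_gray" title="FREE Pre-Vu! The Sentinel (All Day)">FREE</a> </td> </tr>'

; Channel number and name: the first two centered cells (flag 1 = first match, groups as array).
Local $aChannel = StringRegExp($sRow, '(?s)align="center">(\d+)</td>.*?align="center">([^<]+)</td>', 1)
If @error Then Exit ConsoleWrite("No channel cells found" & @CRLF)

; Program cells: each colspan, then (lazily) the next title attribute.
; Flag 3 = all matches, with the two groups flattened into one array: colspan, title, colspan, title...
Local $aShows = StringRegExp($sRow, '(?s)colspan="(\d+)".*?title="([^"]*)"', 3)
If @error Then Exit ConsoleWrite("No program cells found" & @CRLF)

ConsoleWrite("Channel " & $aChannel[0] & " (" & $aChannel[1] & ")" & @CRLF)
For $i = 0 To UBound($aShows) - 1 Step 2
    ConsoleWrite("  colspan=" & $aShows[$i] & "  " & $aShows[$i + 1] & @CRLF)
Next
```

To use this on the whole page, one option would be to split the source returned by _InetGetSource into rows first (for example with a regexp on <tr> ... </tr>) and run the two patterns on each row, so that colspans and titles from different channels are not mixed together.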