phatzilla, posted August 7, 2010 (edited August 7, 2010 by phatzilla)

So I'd like to "collect" all of the video links on this page: http://www.youtube.com/videos?s=mp&t=t&cr=CA&p=1

    #include <IE.au3>
    $oIE = _IECreate("http://www.youtube.com/videos?s=mp&t=t&cr=CA&p=1")
    $oLinks = _IELinkGetCollection($oIE)
    $iNumLinks = @extended
    MsgBox(0, "Link Info", $iNumLinks & " links found")
    For $oLink In $oLinks
        MsgBox(0, "Link Info", $oLink)
    Next

That's the only script I could find, but it returns about 150 links, most of which are useless. I only want links to the actual 20 or so videos, and I don't want duplicates either. Can anyone tell me how I can approach this? I want the web links to be saved to a .txt file.
phatzilla, posted August 7, 2010 (edited August 7, 2010 by phatzilla)

Okay, after racking my brain, I got to this:

    #include <IE.au3>
    $oIE = _IECreate("http://www.youtube.com/videos?s=mp&t=t&cr=CA&p=1")
    $nOffset = 1
    $oLinks = _IELinkGetCollection($oIE)
    For $oLink In $oLinks
        Sleep(10)
        $findlink = StringInStr($oLink.href, "http://www.youtube.com/watch?v=")
        $done = StringReplace($oLink.href, "http://www.youtube.com/watch?v=", "")
        If $findlink = 0 Then
            Sleep(10)
        Else
            ;~ MsgBox(0,$oLink.href,$done)
            $file = FileOpen("links.txt", 1)
            FileWrite($file, $done & @CRLF)
            FileClose($file)
        EndIf
    Next

Now it extracts all the video links, but nearly every link shows up three or four times. How do I make sure no duplicates get added to my .txt? I'd also like to remove the lines with "hd" and "cc". Here's the output:

    uelHwf8o7_U
    uelHwf8o7_U
    uelHwf8o7_U
    uelHwf8o7_U&hd=1
    toBLte0n8z8
    toBLte0n8z8
    toBLte0n8z8
    m3lvxTo4Oq8
    m3lvxTo4Oq8
    m3lvxTo4Oq8
    m3lvxTo4Oq8&hd=1
    68XP29hivmY
    68XP29hivmY
    68XP29hivmY
    68XP29hivmY&hd=1
    1qghtxXiBMU
    1qghtxXiBMU
    1qghtxXiBMU
    1qghtxXiBMU&hd=1
    homJ9pE-OCc
    homJ9pE-OCc
    homJ9pE-OCc
    ch1UBta1sg4
    ch1UBta1sg4
    ch1UBta1sg4
    ch1UBta1sg4&hd=1
    nQEEQEBySTk
    nQEEQEBySTk
    nQEEQEBySTk
    -sDL98j3ii0
    -sDL98j3ii0
    -sDL98j3ii0
    -sDL98j3ii0&hd=1
    -sDL98j3ii0&cc=1
    4cqFbEDsZqE
    4cqFbEDsZqE
    4cqFbEDsZqE
    dDnvpGSskgc
    dDnvpGSskgc
    dDnvpGSskgc
    Lzf1v_Km2mc
    Lzf1v_Km2mc
    Lzf1v_Km2mc
    Lzf1v_Km2mc&hd=1
    mcdKB17PDlg
    mcdKB17PDlg
    mcdKB17PDlg
    c9R-lIs35fM
    c9R-lIs35fM
    c9R-lIs35fM
    c9R-lIs35fM&hd=1
    qqTMfmBEcPA
    qqTMfmBEcPA
    qqTMfmBEcPA
    9lL0Wj_IXOk
    9lL0Wj_IXOk
    9lL0Wj_IXOk
    9lL0Wj_IXOk&hd=1
    Q_j6F66K-hE
    Q_j6F66K-hE
    Q_j6F66K-hE
    Q_j6F66K-hE&hd=1
    OjNydJV4Iuw
    OjNydJV4Iuw
    OjNydJV4Iuw
    0DWHb6ZIPBs
    0DWHb6ZIPBs
    0DWHb6ZIPBs
    0DWHb6ZIPBs&hd=1
    L53gjP-TtGE
    L53gjP-TtGE
    L53gjP-TtGE
    L53gjP-TtGE&hd=1
    xIBCZfxUg6Q
    xIBCZfxUg6Q
    xIBCZfxUg6Q
    mS2L-dxeMOg
    mS2L-dxeMOg
    mS2L-dxeMOg
    hfhrqwe495g
    hfhrqwe495g
    hfhrqwe495g
DaleHohm, posted August 7, 2010

Examining $oLink in that example is pretty useless... it is an object variable and isn't useful on its own. Replace

    MsgBox(0, "Link Info", $oLink)

with

    ConsoleWrite("Link: " & $oLink.innerText & " href: " & $oLink.href)

examine the results, and you may be on your way.

Dale
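A minimal sketch of the first post's loop with that replacement applied, assuming the same page URL. The trailing `@CRLF` is an addition that is not in the suggestion above; it only puts each link on its own console line.

```autoit
#include <IE.au3>

$oIE = _IECreate("http://www.youtube.com/videos?s=mp&t=t&cr=CA&p=1")
$oLinks = _IELinkGetCollection($oIE)

For $oLink In $oLinks
    ; Print the visible text and the target URL of each link, one link per line
    ConsoleWrite("Link: " & $oLink.innerText & " href: " & $oLink.href & @CRLF)
Next
```

Reading the `href` values in the console output should make it easier to see which URL pattern identifies the video links.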
phatzilla, posted August 7, 2010

Dale, I think you missed my second post. I'm sort of past that, thanks though.

I got rid of the extra characters like "hd" and "cc" by using

    $done1 = StringLeft($done, 11)

so I only take the first 11 characters of each link name.

Now, how would I go about removing duplicate lines? Or maybe not even parsing them in the first place?
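One possible way to avoid writing duplicates in the first place is to remember which IDs have already been seen. The sketch below is not from the thread: it assumes the Windows `Scripting.Dictionary` COM object is available and uses it as a "seen" set, and the names `$oSeen`, `$sPrefix`, `$sId`, and `$hFile` are made up for illustration. It keeps the 11-character cut described above and checks that the `href` starts with the watch URL rather than merely containing it.

```autoit
#include <IE.au3>

$oIE = _IECreate("http://www.youtube.com/videos?s=mp&t=t&cr=CA&p=1")
$oLinks = _IELinkGetCollection($oIE)

; Assumption: Scripting.Dictionary (a Windows COM object) serves as a set of IDs already written
$oSeen = ObjCreate("Scripting.Dictionary")
$sPrefix = "http://www.youtube.com/watch?v="

; Open the output file once in append mode (1) instead of once per link
$hFile = FileOpen("links.txt", 1)

For $oLink In $oLinks
    ; Skip every link whose href does not start with the watch URL
    If StringLeft($oLink.href, StringLen($sPrefix)) <> $sPrefix Then ContinueLoop

    ; Keep only the 11 characters after the prefix, which drops suffixes such as &hd=1 and &cc=1
    $sId = StringMid($oLink.href, StringLen($sPrefix) + 1, 11)

    ; Skip IDs that were already written during this run
    If $oSeen.Exists($sId) Then ContinueLoop
    $oSeen.Add($sId, True)

    FileWriteLine($hFile, $sId)
Next

FileClose($hFile)
```

Because the file is opened in append mode, this only prevents duplicates within a single run; opening with mode 2 instead would overwrite links.txt each time the script runs.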
katekitten, posted August 7, 2010

I had figured it out. Thank you for your information.