wisem2540 Posted April 30, 2013
I would like to pull all of my info either from Apple Trailers or HD-TRAILERS.net.
So here are the things I am unsure about. I can use Inspect Element in Firefox to inspect a 1080p link here: http://www.hd-trailers.net/movie/fast-furious-6/
and I see this link: http://trailers.apple.com/movies/universal/fastandfurious6/fastandfurious6-tlr3_h1080p.mov
So now all I have to do is pass that link on to Wget (because Apple only allows QuickTime to download their trailers now) and it should work.
So the question is, how can I use AutoIt to get the above link?
This project is for personal use, to download upcoming trailers. My movie server will play them before my actual movie starts, similar to an actual movie theater. I will also create a routine to delete trailers more than 30 days old.
wisem2540 Posted April 30, 2013
So I found _INetGetSource. Now, what is the best way to target a link such as the one above? I am sure it's some type of string command. I looked at _StringBetween, but that doesn't seem to do what I want.
Vincor Posted April 30, 2013
I'm not gonna go into the point of whether this is legal, personal use or not. I don't know.
But talking about finding some text inside a larger text, the former following some specific patterns: that is definitely a case for regular expressions.
See more here: http://www.autoitscript.com/autoit3/docs/tutorials/regexp/regexp.htm
abberration Posted April 30, 2013
The regexp code can be complicated. I played with it one day a few weeks ago. Here's an example of reading the links on the movie trailer page:

    #include <Inet.au3>
    #include <Array.au3>

    ; Download the page source as one string
    $i = _INetGetSource("http://www.hd-trailers.net/movie/fast-furious-6/")
    ; Flag 3 returns an array of all matches
    $string = StringRegExp($i, '(http.*?\.mov)', 3)
    _ArrayDisplay($string)

It grabs everything that starts with 'http' and ends with '.mov'.
wisem2540 Posted April 30, 2013
Complicated is right... I'm playing around trying to get it to grab the ones with TLR in the middle now, as well as what you specified. My original code had me using _StringBetween with some pretty good success, but this seems much more flexible.
wisem2540 Posted April 30, 2013
At first glance it appears I may have to run the array through a second routine to return only the entries containing TLR. Is there a way to specify that filter in the same line as the first?
wisem2540 Posted April 30, 2013

    StringRegExp($Source, '(http://trailers.apple.com.*?tlr\d_h1080p.mov)', 3)

This will work for any TLR number at 1080 resolution. It's not perfect, as I would like to include 720 as well, but I guess it'll work for now. Any ideas are welcome.
Thanks
abberration Posted April 30, 2013
720 or 1080p:

    #include <Inet.au3>
    #include <Array.au3>

    $Source = _INetGetSource("http://www.hd-trailers.net/movie/fast-furious-6/")
    ; The | means OR: match either the 1080p link or the 720p link
    $string = StringRegExp($Source, '(http://trailers\.apple\.com.*?tlr3_h1080p\.mov|http://trailers\.apple\.com.*?tlr3_h720p\.mov)', 3)
    _ArrayDisplay($string)
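The two alternatives above differ only in the resolution, so the OR can be moved inside the pattern. The following is a minimal sketch rather than a tested replacement: the sample string is made up to stand in for the page source, and `tlr\d+` assumes every trailer is numbered the way the Fast & Furious 6 link is.

```autoit
; Made-up page fragment standing in for the _INetGetSource() result.
Local $sSource = '<a href="http://trailers.apple.com/movies/universal/fastandfurious6/fastandfurious6-tlr3_h1080p.mov">1080p</a>' & _
        '<a href="http://trailers.apple.com/movies/universal/fastandfurious6/fastandfurious6-tlr2_h720p.mov">720p</a>' & _
        '<a href="http://trailers.apple.com/movies/universal/fastandfurious6/fastandfurious6-tlr3_h480p.mov">480p</a>'

; [^"]*? keeps each match inside one href attribute.
; tlr\d+ accepts any trailer number.
; (?:1080|720) is a non-capturing group, so only the outer () is returned.
Local $aLinks = StringRegExp($sSource, '(http://trailers\.apple\.com[^"]*?tlr\d+_h(?:1080|720)p\.mov)', 3)

If @error Then
    ConsoleWrite("No matching trailers found" & @CRLF)
Else
    For $i = 0 To UBound($aLinks) - 1
        ConsoleWrite($aLinks[$i] & @CRLF)
    Next
EndIf
```

With this sample input, the 1080p and 720p links are listed and the 480p link is left out.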
wisem2540 Posted April 30, 2013
Oh duh... OR... Sorry. That documentation is huge. Thanks for the tip.
Vincor Posted April 30, 2013
More on the topic here: http://www.regular-expressions.info/reference.html
Then you can try it further with a regex test program when building your scripts. I recommend this one from AZJIO:
Edit: Can't make that link appear correctly - a regex bug in the forum!
wisem2540 Posted May 1, 2013

    #include <Inet.au3>
    #include <String.au3>
    #include <Array.au3>

    $Source = _INetGetSource('http://www.hd-trailers.net/')
    ;debugging
    ;FileWrite("source.txt", $Source)
    $String = StringRegExp($Source, '(movie/.*?/)', 3)
    ;_ArrayDisplay($String)
    $Unique = _ArrayUnique($String)
    _ArrayDelete($Unique, 0) ; Drop element [0], which holds the count
    _ArrayDisplay($Unique)

So I am getting closer. This should pull a list of all of the new trailers from the main page. Can someone tell me how I might use a regexp, or maybe even an _Array function, to remove the "movie/" part and return only the movie names? The dash is OK for now.
wisem2540 Posted May 1, 2013
Looks like StringTrim might do it. I will begin testing.
Vincor Posted May 1, 2013

    $string = StringRegExp($source, '(movie/.*?/)', 3)

The parentheses are the capturing element in your regex. Not everything that matches has to be captured. Try this:

    $string = StringRegExp($source, 'movie/(.*?/)', 3)

That is shown in the very first example in the documentation of the function you are using: http://www.autoitscript.com/autoit3/docs/functions/StringRegExp.htm
Now, how would you remove the dash only by changing the regular expression above?
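To see what moving the parentheses changes, both patterns can be run against the same input. This is only an illustration: the sample string below is invented and the real markup of the front page will differ.

```autoit
; Made-up fragment of the front page.
Local $sSource = '<a href="/movie/fast-furious-6/">A</a><a href="/movie/iron-man-3/">B</a>'

; Whole pattern inside (): the literal "movie/" is part of what is returned.
Local $aWhole = StringRegExp($sSource, '(movie/.*?/)', 3)

; "movie/" outside (): it still has to match, but it is not returned.
Local $aPart = StringRegExp($sSource, 'movie/(.*?/)', 3)

ConsoleWrite($aWhole[0] & @CRLF) ; movie/fast-furious-6/
ConsoleWrite($aPart[0] & @CRLF)  ; fast-furious-6/
```

Text outside the parentheses acts as an anchor for the match, and only the part inside them ends up in the returned array.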
wisem2540 Posted May 1, 2013
Well, I did get it working with StringTrim, but I guess I could try this. It could clean up my code a little.
wisem2540 Posted May 1, 2013
Meh, I am not really getting this. Let me ask this: everything outside of the () is not included in the return value, but it is used as a "starting point". Correct? I see that if I move the / outside of the parentheses then it too is not included. But I'm not understanding the .*? part. Even the help file does not make it clear. It seems to be under "repeating characters".
Vincor Posted May 2, 2013
That is right. In that regex, the .*? means the following:
- The dot will match any character (except a newline, by default).
- The * makes the dot match 0 or more times. To understand this better, please try with + instead of *. The plus means "match the previous element 1 or more times". You would be just as well served in your regex by the plus, if you assume there will always be a movie name between "movie/" and "/".
- The question mark tells the regex to take the smallest match instead of the largest. Not sure why you put that there, I just didn't remove it.
I would write it differently:

    $string = StringRegExp($source, 'movie/([^/]+)/', 3)

That means: find the string "movie/", then capture everything after it that is NOT a slash (/), until you find a slash. Then stop.
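The difference between the largest match, the smallest match and the negated character class shows up as soon as a line contains more than one link. A small sketch, using an invented input string with two movie links on one line:

```autoit
; Made-up input: two movie links on the same line.
Local $sSource = 'href="/movie/fast-furious-6/" href="/movie/iron-man-3/"'

; .*  is greedy: it runs to the LAST slash on the line, swallowing both links.
Local $aGreedy = StringRegExp($sSource, 'movie/(.*)/', 3)
; .*? is lazy: it stops at the FIRST slash after "movie/".
Local $aLazy = StringRegExp($sSource, 'movie/(.*?)/', 3)
; [^/]+ cannot cross a slash at all and requires at least one character.
Local $aNegated = StringRegExp($sSource, 'movie/([^/]+)/', 3)

ConsoleWrite("greedy:  " & UBound($aGreedy) & " match(es)" & @CRLF)  ; 1
ConsoleWrite("lazy:    " & UBound($aLazy) & " match(es)" & @CRLF)    ; 2
ConsoleWrite("negated: " & UBound($aNegated) & " match(es)" & @CRLF) ; 2
ConsoleWrite($aLazy[0] & " / " & $aNegated[1] & @CRLF)               ; fast-furious-6 / iron-man-3
```

On this input the lazy and negated-class patterns return the same two names. The negated class states the intent ("anything but a slash") directly, which is why it does not need the question mark.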
wisem2540 Posted May 2, 2013
I was going off of abberration's example. I guess that explains why I didn't understand the question mark either, lol. I'll play with this some more in a bit and respond back.
wisem2540 Posted May 2, 2013

    #include <Inet.au3>
    #include <Array.au3>

    $MainUrl = 'http://www.hd-trailers.net/' ; Home page URL
    $Source = _INetGetSource($MainUrl) ; Source code to get new trailer list
    ;debugging
    ;FileWrite("source.txt", $Source)
    $String = StringRegExp($Source, 'movie/(.*?)/', 3) ; Returns a movie list from the source; assumes everything after "movie/" is a movie title
    ;Debugging
    ;_ArrayDisplay($String)
    $MovieList = _ArrayUnique($String) ; Returns only unique movie entries; element [0] holds the count
    _ArrayDisplay($MovieList, "Movie List")

    For $x = 1 To $MovieList[0]
        $MoviePage = $MainUrl & "movie/" & $MovieList[$x]
        MsgBox(0, "Element " & $x, $MovieList[$x])
        ;MsgBox(0, $x, $MoviePage)
        $i = _INetGetSource($MoviePage)
        ;debugging
        ;FileWrite("source.txt", $i)
        ;MsgBox(0, $x, $i)
        $String = StringRegExp($i, '(http://trailers.apple.*?\.mov)', 3) ; Returns only trailers from Apple
        $String = _ArrayUnique($String) ; Filter out duplicates
        ;Debugging
        ;_ArrayDisplay($String)
        If Not IsArray($String) Then ; No trailers available: set $Choose to -2, used below
            $Choose = -2
        Else ; Otherwise choose a trailer containing "1080" for 1080p resolution
            $Choose = _ArraySearch($String, "1080", 1, 0, 0, 1) ; Start at 1 to skip the count; last argument enables partial matching
        EndIf
        If $Choose = -1 Then $Choose = _ArraySearch($String, "720", 1, 0, 0, 1) ; If no 1080p trailer is available, try 720p
        If $Choose = -1 Or $Choose = -2 Then ; No 720p (-1) or no links at all (-2): consider the trailer unavailable
            $Trailer = "Unavailable"
        Else
            $Trailer = $String[$Choose] ; A trailer was found, so use it
        EndIf
        MsgBox(0, "Result " & $x, $Trailer) ; Return download link
    Next
    ; More to come

I am sharing this code and opening it up for criticism and improvement. I am only slightly above a beginner with AutoIt, and am always looking for ways to improve.
What it does:
Reads the source code of HD-Trailers, enumerates a list of movies, then finds the trailers available for each movie and returns the download link (to be used later).

To do:
What I would like to see happen eventually is finer control over which trailers get downloaded. For instance, most trailers seem to contain the letters "TLR", whereas clips contain "clip" in the name. Right now I can only target one trailer per movie. I hope to be able to target all of them soon.
I will eventually add Wget to download the trailers to a directory. Apple recently changed the way they allow trailers to be downloaded: only QuickTime is allowed to do it now. It's my understanding that Wget will allow me to spoof this.
Also, I will be working on a GUI for this when it's complete.
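One possible way to separate trailers from clips is to test each link with a second regex instead of searching for "1080" or "720" alone. This is a rough sketch under assumptions: the link list is hypothetical, the "clip1" file name is invented for illustration, and the `-tlr<number>_h<resolution>p.mov` naming is inferred only from the one Fast & Furious 6 link quoted earlier in the thread.

```autoit
; Hypothetical link list; the "clip1" file name is invented for illustration.
Local $aLinks[3] = [ _
        'http://trailers.apple.com/movies/universal/fastandfurious6/fastandfurious6-tlr3_h1080p.mov', _
        'http://trailers.apple.com/movies/universal/fastandfurious6/fastandfurious6-clip1_h1080p.mov', _
        'http://trailers.apple.com/movies/universal/fastandfurious6/fastandfurious6-tlr2_h720p.mov']

Local $iTrailers = 0
For $i = 0 To UBound($aLinks) - 1
    ; (?i) makes the test case-insensitive; with the default flag 0,
    ; StringRegExp returns 1 for a match and 0 otherwise.
    If StringRegExp($aLinks[$i], '(?i)-tlr\d*_h(?:1080|720)p\.mov$') Then
        $iTrailers += 1
        ConsoleWrite("trailer: " & $aLinks[$i] & @CRLF)
    Else
        ConsoleWrite("skipped: " & $aLinks[$i] & @CRLF)
    EndIf
Next
ConsoleWrite($iTrailers & " trailer(s) kept" & @CRLF)
```

Because every link is tested rather than only the first hit, this would return all matching trailers for a movie, not just one. The same loop could collect the kept links into an array for the download step.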
Vincor Posted May 3, 2013
You should take a look at this project:
It does more or less the same as what you are doing: downloading files, progress bar, regexes... Only the focus is on downloading free software, but you get the idea.
wisem2540 Posted May 5, 2013
Here is what I came up with so far... Thanks for everyone's help.
Trailer Downloader.zip