wisem2540

Wanting to make an "Upcoming Movie Trailer" downloader, few questions


wisem2540

I would like to pull all of my info from either Apple Trailers or HD-Trailers.net.

So here are the things I am unsure about.

I can use Inspect Element in Firefox to inspect a 1080p link here: http://www.hd-trailers.net/movie/fast-furious-6/

and I see this link http://trailers.apple.com/movies/universal/fastandfurious6/fastandfurious6-tlr3_h1080p.mov

So now all I have to do is pass that link on to Wget (because Apple only allows QuickTime to download their trailers now) and it should work.

So the question is, how can I use AutoIt to get the above link?

This project is for personal use, to download upcoming trailers. My movie server will play them before my actual movie starts, similar to a real movie theater. I will also create a routine to delete trailers more than 30 days old.
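A minimal sketch of the 30-day cleanup routine, assuming the trailers are saved as .mov files in a single folder (the folder path and the function name `_DeleteOldTrailers` are hypothetical). It reads each file's modification time with FileGetTime and compares it with the current date using _DateDiff from Date.au3.

```autoit
#include <Date.au3>

; Hypothetical trailer folder; change this to wherever the downloads are saved
Global $sTrailerDir = @ScriptDir & "\Trailers"

; Deletes every .mov file in $sDir whose modification date is more than $iMaxDays days ago.
; Returns the number of files deleted.
Func _DeleteOldTrailers($sDir, $iMaxDays = 30)
	Local $iDeleted = 0
	Local $hSearch = FileFindFirstFile($sDir & "\*.mov")
	If $hSearch = -1 Then Return 0 ; folder missing or no .mov files

	Local $sFile, $sPath, $aTime, $sStamp
	While 1
		$sFile = FileFindNextFile($hSearch)
		If @error Then ExitLoop ; no more files
		$sPath = $sDir & "\" & $sFile
		; option 0 = modified time, format 0 = array: [0]=YYYY [1]=MM [2]=DD [3]=hh [4]=mm [5]=ss
		$aTime = FileGetTime($sPath, 0, 0)
		If @error Then ContinueLoop
		$sStamp = $aTime[0] & "/" & $aTime[1] & "/" & $aTime[2] & " " & $aTime[3] & ":" & $aTime[4] & ":" & $aTime[5]
		; _DateDiff("D", start, end) returns the number of whole days between the two dates
		If _DateDiff("D", $sStamp, _NowCalc()) > $iMaxDays Then
			If FileDelete($sPath) Then $iDeleted += 1
		EndIf
	WEnd
	FileClose($hSearch)
	Return $iDeleted
EndFunc
```

It could be called once per run, for example `MsgBox(0, "Cleanup", _DeleteOldTrailers($sTrailerDir) & " old trailer(s) deleted")`.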

wisem2540

So I found _INetGetSource. Now, what is the best way to target a link such as the one above? I am sure it's some type of string command. I looked at _StringBetween, but that doesn't seem to do what I want.
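For reference, _StringBetween (from String.au3) can pull links out if the quote characters around each href are used as the delimiters and the results are filtered afterwards. A minimal sketch run on a hypothetical HTML fragment (the real page markup may differ, and `_GetMovLinks` is a made-up helper name):

```autoit
#include <String.au3>

; Hypothetical HTML fragment standing in for _INetGetSource output
Global $sHtml = '<a href="http://www.hd-trailers.net/">Home</a>' & _
		'<a href="http://trailers.apple.com/movies/universal/fastandfurious6/fastandfurious6-tlr3_h1080p.mov">1080p</a>'

; Returns a 0-based array of every href="..." value ending in .mov,
; or 0 with @error = 1 if there are none
Func _GetMovLinks($sSource)
	Local $aHrefs = _StringBetween($sSource, 'href="', '"') ; text between each href=" and the next "
	If @error Then Return SetError(1, 0, 0)
	Local $sFound = ""
	For $i = 0 To UBound($aHrefs) - 1
		If StringRight($aHrefs[$i], 4) = ".mov" Then $sFound &= $aHrefs[$i] & @LF
	Next
	If $sFound = "" Then Return SetError(1, 0, 0)
	Return StringSplit(StringTrimRight($sFound, 1), @LF, 2) ; flag 2 = no count element
EndFunc
```

The catch is that the filtering is a separate step; a regular expression can do the matching and the filtering in one call.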

abberration

The regexp code can be complicated. I played with it one day a few weeks ago. Here's an example of reading the links on the movie trailer page:

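#include <Inet.au3>
#include <Array.au3>
; Download the HTML of the movie page
$i = _INetGetSource("http://www.hd-trailers.net/movie/fast-furious-6/")
; Flag 3 = return an array with every match of the captured group
$string = StringRegExp($i, '(http.*?\.mov)', 3)
_ArrayDisplay($string)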

It grabs everything that starts with 'http' and ends with '.mov'.
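One caveat: the match begins at the first "http" the engine finds, so 'http.*?\.mov' can swallow an unrelated URL that sits earlier on the same line. A minimal sketch of a tighter pattern that refuses to cross a quote, angle bracket or whitespace, tested on a made-up line of HTML (the link targets are hypothetical):

```autoit
; Hypothetical one-line HTML fragment: a normal page link followed by a trailer link
Global $sLine = '<a href="http://www.hd-trailers.net/">Home</a> <a href="http://trailers.apple.com/movies/x/x-tlr1_h720p.mov">720p</a>'

; Loose pattern from the post: starts at the first "http" and runs to the first ".mov"
Global $aLoose = StringRegExp($sLine, '(http.*?\.mov)', 3)
; $aLoose[0] starts at http://www.hd-trailers.net/ and includes the HTML up to the .mov

; Tighter pattern: the URL may not contain quotes, angle brackets or whitespace
; ('' inside an AutoIt single-quoted string is one literal ' character)
Global $aTight = StringRegExp($sLine, '(http[^"''<>\s]*?\.mov)', 3)
; $aTight[0] is 'http://trailers.apple.com/movies/x/x-tlr1_h720p.mov'
```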


RAID Calculator | Software Installer

The truth has been suppressed since the dawn of time.

wisem2540

Complicated is right... I'm playing around trying to get it to grab the ones with TLR in the middle now, as well as what you specified. My original code used _StringBetween with pretty good success, but this seems much more flexible.

wisem2540

At first glance it appears I may have to run the array through a different routine to return only the ones containing TLR? Is there a way to specify that routine in the same line as the first?

abberration

720p or 1080p:

#include <Inet.au3>
#include <Array.au3>
$Source = _INetGetSource("http://www.hd-trailers.net/movie/fast-furious-6/")
; The | means OR: match either the 1080p or the 720p Apple trailer link
$string = StringRegExp($Source, '(http://trailers\.apple\.com.*?tlr3_h1080p\.mov|http://trailers\.apple\.com.*?tlr3_h720p\.mov)', 3)
_ArrayDisplay($string)
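Since the two alternatives differ only in the resolution, the OR can also be moved inside the pattern with a non-capturing group (?:...). A sketch on a hypothetical snippet; it additionally assumes the trailer number varies (tlr1, tlr2, ...) and matches it with \d* instead of hard-coding tlr3:

```autoit
; Hypothetical page fragment with several Apple links (names follow the fastandfurious6 example)
Global $sSource = 'href="http://trailers.apple.com/movies/universal/fastandfurious6/fastandfurious6-tlr3_h1080p.mov"' & @CRLF & _
		'href="http://trailers.apple.com/movies/universal/fastandfurious6/fastandfurious6-tlr3_h480p.mov"' & @CRLF & _
		'href="http://trailers.apple.com/movies/universal/fastandfurious6/fastandfurious6-clip1_h720p.mov"' & @CRLF & _
		'href="http://trailers.apple.com/movies/universal/fastandfurious6/fastandfurious6-tlr2_h720p.mov"'

; (?:1080|720) groups the alternatives without creating a second capture,
; so flag 3 still returns one element per link; tlr\d* accepts tlr, tlr1, tlr2, ...
; [^"]*? keeps each match inside a single quoted URL
Global $aTrailers = StringRegExp($sSource, '(http://trailers\.apple\.com/[^"]*?tlr\d*_h(?:1080|720)p\.mov)', 3)
; $aTrailers[0] is the tlr3 1080p link, $aTrailers[1] the tlr2 720p link
```

The 480p trailer and the clip are skipped because they fail the resolution and "tlr" parts of the pattern.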

wisem2540

Oh duh... OR... Sorry. That documentation is huge. Thanks for the tip.

Vincor

More on the topic here: http://www.regular-expressions.info/reference.html

Then you can experiment further with a regex tester while building your scripts.

I recommend this one from AZJIO:

Edit: I can't get that link to display correctly; a regex bug in the forum!

wisem2540

#include <Inet.au3>
#include <String.au3>
#include <Array.au3>
$Source = _INetGetSource('http://www.hd-trailers.net/')
;debugging
;FileWrite("source.txt", $Source)

$String = StringRegExp($Source, '(movie/.*?/)', 3) ; every "movie/<name>/" link on the home page
;_ArrayDisplay($String)
$Unique = _ArrayUnique($String) ; remove duplicates; element [0] holds the count
_ArrayDelete($Unique, 0) ; drop the count so only the entries remain
_ArrayDisplay($Unique)

So I am getting closer. This should pull the list of new trailers from the main page. Can someone tell me how I might use a regexp, or maybe even an _Array function, to remove the movie/ part and return only the movie names? The dashes are OK for now.

wisem2540

Looks like StringTrim might do it. I will begin testing.
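AutoIt splits this into StringTrimLeft and StringTrimRight. A minimal sketch of that approach, assuming each entry looks like "movie/<name>/" as returned by the '(movie/.*?/)' pattern: "movie/" is 6 characters and the trailing "/" is 1. The sample names are hypothetical.

```autoit
; Hypothetical entries as returned by StringRegExp($Source, '(movie/.*?/)', 3)
Global $aEntries[2] = ["movie/fast-furious-6/", "movie/some-other-movie/"]

For $i = 0 To UBound($aEntries) - 1
	; drop the 6-character "movie/" prefix, then the 1-character trailing "/"
	$aEntries[$i] = StringTrimRight(StringTrimLeft($aEntries[$i], 6), 1)
Next
; $aEntries now holds "fast-furious-6" and "some-other-movie"
```

This works as long as every entry has exactly that shape; a capturing group in the regex avoids the extra loop.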

Vincor

$string = StringRegExp($source, '(movie/.*?/)', 3)

The parentheses are the capturing group in your regex. Not everything that matches has to be captured.

Try this:

$string = StringRegExp($source, 'movie/(.*?/)', 3)

That is shown in the very first example in the documentation of the function you are using: http://www.autoitscript.com/autoit3/docs/functions/StringRegExp.htm

Now, how would you remove the trailing slash only by changing the regular expression above? ;)

wisem2540

Well, I did get it working with StringTrim, but I guess I could try this. It could clean up my code a little.

wisem2540

Meh,

I am not really getting this. Let me ask this... Everything outside of the () is not included in the return value, but it is used as a "starting point"? Correct?

I see that if I move the / outside of the parentheses then it too is not included. But I'm not understanding the .*? part. Even the help file does not make it clear. It seems to be under "repeating characters".

Vincor

That is right.

In that regex, the .*? means the following:

- The dot matches any character (except a line break, by default)

- The * makes the dot match 0 or more times. To understand this better, try + instead of *. The plus means "match the previous item 1 or more times". The plus would serve you just as well in your regex, if you assume there will always be a movie name between "movie/" and "/"

- The question mark tells the regex to make the smallest possible match instead of the largest. Not sure why you put that there; I just didn't remove it.

I would write it differently:

$string = StringRegExp($source, 'movie/([^/]+)/', 3)

That means: find the string "movie/", then capture everything after it that is NOT a slash (/), until you reach a slash. Then stop.
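To see the variants side by side, here is a sketch run against a hypothetical line containing two movie links (the names are made up). The greedy .* runs to the last slash on the line, the lazy .*? stops at the first one, and [^/]+ cannot cross a slash at all:

```autoit
; Hypothetical line with two movie links
Global $sLine = '<a href="movie/fast-furious-6/">A</a><a href="movie/some-other-movie/">B</a>'

; Greedy: .* grabs as much as it can, then backs up to the LAST "/" on the line
Global $aGreedy = StringRegExp($sLine, 'movie/(.*)/', 3)
; -> one element, running from fast-furious-6 to the "/" inside the final </a>

; Lazy: .*? stops at the first "/" after "movie/"
Global $aLazy = StringRegExp($sLine, 'movie/(.*?)/', 3)
; -> "fast-furious-6", "some-other-movie"

; Negated class: [^/]+ can never include a slash, so greedy or lazy makes no difference
Global $aClass = StringRegExp($sLine, 'movie/([^/]+)/', 3)
; -> "fast-furious-6", "some-other-movie"
```

When several links share one line of HTML, the question mark (or the negated class) is what keeps the matches apart.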

wisem2540

I was going off of abberration's example. I guess that explains why I didn't understand the question mark either, lol... I'll play with this some more in a bit and report back.

wisem2540

#include <Inet.au3>
#include <Array.au3>

$MainUrl = 'http://www.hd-trailers.net/' ; Home page URL
$Source = _INetGetSource($MainUrl) ; Source code to get the new trailer list

;debugging
;FileWrite("source.txt", $Source)

$String = StringRegExp($Source, 'movie/(.*?)/', 3) ; Returns a movie list from the source code; assumes everything after "movie/" is a movie title

;Debugging
;_ArrayDisplay($String)

$MovieList = _ArrayUnique($String) ; Returns only unique movie entries; [0] holds the count
If Not IsArray($MovieList) Then ; Nothing matched (e.g. the page could not be downloaded)
	MsgBox(16, "Error", "No movies found on " & $MainUrl)
	Exit
EndIf

_ArrayDisplay($MovieList, "Movie List")

For $x = 1 To $MovieList[0]
	$Moviepage = $MainUrl & "movie/" & $MovieList[$x]
	MsgBox(0, "Element " & $x, $MovieList[$x])
	;MsgBox(0, $x, $Moviepage)

	$i = _INetGetSource($Moviepage)

	;debugging
	;FileWrite("source.txt", $i)
	;MsgBox(0, $x, $i)

	$String = StringRegExp($i, '(http://trailers\.apple.*?\.mov)', 3) ; Returns only trailers from Apple
	$String = _ArrayUnique($String) ; Filter out duplicates; not an array if nothing was found

	;Debugging
	;_ArrayDisplay($String)

	If Not IsArray($String) Then ; If no trailers are available, set $Choose to -2 to be used below
		$Choose = -2
	Else
		; Choose a trailer containing "1080" for 1080p resolution; start at index 1,
		; because index 0 holds the count from _ArrayUnique
		$Choose = _ArraySearch($String, "1080", 1, 0, 0, 1)
		If $Choose = -1 Then $Choose = _ArraySearch($String, "720", 1, 0, 0, 1) ; If no 1080p trailer is available, try 720p
	EndIf

	If $Choose < 0 Then ; -1 = no 1080p/720p link, -2 = no Apple links at all: consider the trailer unavailable
		$Trailer = "Unavailable"
	Else
		$Trailer = $String[$Choose] ; A trailer was found, so use it
	EndIf
	MsgBox(0, "Result " & $x, $Trailer) ; Return the download link
Next
; More to come

I am sharing this code and opening it up for criticism and improvement. I am only slightly above a beginner with AutoIt, and am always looking for ways to improve.

What it does:

Reads the source code of HD-Trailers.net, enumerates a list of movies, then finds the trailers available for each movie and returns the download link (to be used later).

To Do

What I would like to see eventually is finer control over which trailers get downloaded. For instance, most trailers seem to contain the letters "TLR", whereas clips contain "clip" in the name. Right now I can only target one trailer per movie. I hope to be able to target all of them soon.
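One way to get that finer control, sketched under the assumption stated above that trailer file names contain "tlr" and clips contain "clip": filter the unique Apple links by a tag instead of picking a single one. The function name `_FilterLinks` and the sample links are hypothetical.

```autoit
; Hypothetical Apple links for illustration
Global $aLinks[3] = ["http://trailers.apple.com/movies/x/x-tlr1_h720p.mov", _
		"http://trailers.apple.com/movies/x/x-clip1_h720p.mov", _
		"http://trailers.apple.com/movies/x/x-tlr2_h1080p.mov"]

Global $aTrailersOnly = _FilterLinks($aLinks, "tlr") ; the two tlr links
Global $aClipsOnly = _FilterLinks($aLinks, "clip") ; the clip link

; Returns the links whose file name (the part after the last "/") contains $sTag,
; case-insensitive. Sets @error = 1 and returns 0 when nothing matches.
; A numeric count in element [0] (as left by _ArrayUnique) never matches, so it is skipped.
Func _FilterLinks(Const ByRef $aSource, $sTag)
	If UBound($aSource) = 0 Then Return SetError(1, 0, 0)
	Local $aOut[UBound($aSource)], $iCount = 0, $sName
	For $i = 0 To UBound($aSource) - 1
		; occurrence -1 searches from the right, i.e. finds the last "/"
		$sName = StringMid($aSource[$i], StringInStr($aSource[$i], "/", 0, -1) + 1)
		If StringInStr($sName, $sTag) Then
			$aOut[$iCount] = $aSource[$i]
			$iCount += 1
		EndIf
	Next
	If $iCount = 0 Then Return SetError(1, 0, 0)
	ReDim $aOut[$iCount]
	Return $aOut
EndFunc
```

The 1080p/720p choice from the script could then be applied to each filtered list rather than to the whole page.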

I will eventually add Wget so I can download the trailers to a directory. Apple recently changed the way they allow trailers to be downloaded: only QuickTime is allowed to do it now. It's my understanding that Wget will allow me to spoof this.
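A minimal sketch of handing a link to Wget from AutoIt with RunWait. It assumes wget.exe sits next to the script and that a QuickTime-style user-agent string is enough to satisfy Apple's check; both are assumptions, and the exact user-agent value below is only an example. In Wget, -U sets the user agent, -P the target folder, and -nc skips files that were already downloaded. The name `_DownloadTrailer` is hypothetical.

```autoit
; Hypothetical locations: wget.exe next to the script, trailers saved under .\Trailers
Global $sWget = @ScriptDir & "\wget.exe"
Global $sTrailerDir = @ScriptDir & "\Trailers"

; Downloads $sUrl into $sDir with Wget, presenting a QuickTime-style user agent (assumed to be accepted).
; Returns Wget's exit code (0 = success), or -1 if Wget could not be started.
Func _DownloadTrailer($sUrl, $sDir)
	DirCreate($sDir)
	Local $sCmd = '"' & $sWget & '" -nc -U "QuickTime/7.6.2" -P "' & $sDir & '" "' & $sUrl & '"'
	Local $iExit = RunWait($sCmd, $sDir, @SW_HIDE)
	If @error Then Return -1 ; RunWait failed, e.g. wget.exe not found
	Return $iExit
EndFunc
```

In the script, this would be called as `_DownloadTrailer($Trailer, $sTrailerDir)` whenever `$Trailer` is not "Unavailable".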

Also, I will be working on a GUI for this when it's complete.

Vincor

You should take a look at this project:

More or less the same thing you are doing: downloading files, a progress bar, regexes... Only the focus is on downloading free software, but you get the idea.

wisem2540

Here is what I came up with so far...

Thanks for everyone's help.

Trailer Downloader.zip
