
Wanting to make an "Upcoming Movie Trailer" downloader, few questions



I would like to pull all of my info either from Apple Trailers or HD-TRAILERS.net

So here are the things I am unsure about.

I can use Inspect Element in Firefox to inspect a 1080p link here http://www.hd-trailers.net/movie/fast-furious-6/

and I see this link http://trailers.apple.com/movies/universal/fastandfurious6/fastandfurious6-tlr3_h1080p.mov

So now all I have to do is pass that link onto Wget (because apple only allows quicktime to download their trailers now) and it should work.

So the question is, how can I use AutoIt to get the above link?

This project will be used for personal use to download upcoming trailers. My movie server will play them before my actual movie starts, similar to an actual movie theater. I will also create a routine to delete trailers more than 30 days old.
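The 30-day cleanup routine mentioned above could look roughly like the sketch below. It is only an illustration: the folder name and the *.mov filter are assumptions, and it goes by the file's last-modified time rather than the download date.

```autoit
#include <File.au3>
#include <Date.au3>

Local $sTrailerDir = "D:\Trailers" ; hypothetical folder, adjust to your server
Local $iMaxAgeDays = 30

; 1 = return files only; element [0] of the result holds the count
Local $aFiles = _FileListToArray($sTrailerDir, "*.mov", 1)
If Not @error Then
    For $i = 1 To $aFiles[0]
        Local $sPath = $sTrailerDir & "\" & $aFiles[$i]
        ; 0 = last-modified time, returned as an array [YYYY, MM, DD, HH, MM, SS]
        Local $aTime = FileGetTime($sPath, 0)
        If @error Then ContinueLoop
        ; _DateDiff expects dates formatted as "YYYY/MM/DD HH:MM:SS"
        Local $sModified = $aTime[0] & "/" & $aTime[1] & "/" & $aTime[2] & " " & $aTime[3] & ":" & $aTime[4] & ":" & $aTime[5]
        If _DateDiff("D", $sModified, _NowCalc()) > $iMaxAgeDays Then FileDelete($sPath)
    Next
EndIf
```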


The regexp code can be complicated. I played with it one day a few weeks ago. Here's an example of reading the links on the movie trailer page:

#include <inet.au3>
#include <array.au3>
$i = _INetGetSource("http://www.hd-trailers.net/movie/fast-furious-6/")
$string = StringRegExp($i, '(http.*?\.mov)', 3)
_ArrayDisplay($string)

It grabs everything that starts with 'http' and ends with '.mov'.


Complicated is right... I'm playing around trying to get it to grab the ones with TLR in the middle now, as well as what you specified. My original code had me using _StringBetween with some pretty good success, but this seems much more flexible.

#include <Inet.au3>
#include <String.au3>
#include <Array.au3>
$Source = _INetGetSource('http://www.hd-trailers.net/')
; Debugging
;FileWrite("source.txt", $Source)

$String = StringRegExp($Source, '(movie/.*?/)', 3)
;_ArrayDisplay($String)
$Unique = _ArrayUnique($String)
_ArrayDelete($Unique, 0) ; Drop the count that _ArrayUnique stores in element 0
_ArrayDisplay($Unique)

So I am getting closer. This should pull a list of all the new trailers from the main page. Can someone tell me how I might use a regexp, or maybe an _Array function, to remove the "movie/" prefix and return only the movie names? The dash is OK for now.


$string = StringRegExp($source, '(movie/.*?/)', 3)

The parentheses are the capturing group in your regex. Not everything that matches has to be captured.

Try this:

$string = StringRegExp($source, 'movie/(.*?/)', 3)

That is shown in the very first example in the documentation of the function you are using: http://www.autoitscript.com/autoit3/docs/functions/StringRegExp.htm

Now, how would you remove the trailing slash only by changing the regular expression above? ;)


Meh,

I am not really getting this. Let me ask this: everything outside of the () is not included in the return value, but is it used as a "starting point"? Correct?

I see that if I move the / outside of the parentheses then it is not included either. But I'm not understanding the .*? part. Even looking at the help file does not make it clear. It seems to be under "repeating characters".

Edited by wisem2540

That is right.

In that regex, the .*? means the following:

- The dot will match any character

- The * makes the dot match 0 or more times. To understand this better, please try with + instead of *. The plus means "match the previous item 1 or more times". You would be just as well served in your regex by the plus, if you assume there will always be a movie name between "movie/" and "/"

- The question mark tells the regex to match the smallest match instead of the largest. Not sure why you put that there, I just didn't remove it.

I would write it differently:

$string = StringRegExp($source, 'movie/([^/]+)/', 3)

That means, find the string "movie/", then capture everything after that that is NOT a slash (/), until you find a slash. Then stop.
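To see the difference between the three variants side by side, here is a small sketch run against a made-up line of HTML (the sample string is an assumption, not the real page source). The greedy version should come back as one long match that runs on to the last slash on the line, while the lazy and negated-class versions should each return the two movie names.

```autoit
#include <Array.au3>

; Made-up sample line standing in for the page source
Local $sSample = '<a href="/movie/fast-furious-6/">x</a> <a href="/movie/iron-man-3/">y</a>'

; Greedy: .* takes as much as it can, then backs up to the last slash on the line
Local $aGreedy = StringRegExp($sSample, 'movie/(.*)/', 3)

; Lazy: .*? takes as little as it can, so it stops at the first slash after the name
Local $aLazy = StringRegExp($sSample, 'movie/(.*?)/', 3)

; Negated class: one or more characters that are not a slash
Local $aNegated = StringRegExp($sSample, 'movie/([^/]+)/', 3)

_ArrayDisplay($aGreedy, "Greedy .*")
_ArrayDisplay($aLazy, "Lazy .*?")
_ArrayDisplay($aNegated, "Negated [^/]+")
```

So with the plain dot, the question mark is what keeps each match from swallowing the rest of the line; the negated class gets the same result without needing it.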


#include <Inet.au3>
#include <Array.au3>

$MainUrl = 'http://www.hd-trailers.net/' ; Home page URL
$Source = _INetGetSource($MainUrl) ; Source code used to build the new trailer list

; Debugging
;FileWrite("source.txt", $Source)

; Returns a movie list from the source code; assumes everything after "movie/" is a movie title
$String = StringRegExp($Source, 'movie/(.*?)/', 3)
If @error Then ; StringRegExp sets @error when nothing matches
    MsgBox(0, "Movie List", "No movies found")
    Exit
EndIf

; Debugging
;_ArrayDisplay($String)

$MovieList = _ArrayUnique($String) ; Returns only unique movie entries; element [0] holds the count

_ArrayDisplay($MovieList, "Movie List")

For $x = 1 To $MovieList[0]
    $MoviePage = $MainUrl & "movie/" & $MovieList[$x]
    MsgBox(0, "Element " & $x, $MovieList[$x])
    ;MsgBox(0, $x, $MoviePage)

    $i = _INetGetSource($MoviePage)

    ; Debugging
    ;FileWrite("source.txt", $i)
    ;MsgBox(0, $x, $i)

    $String = StringRegExp($i, '(http://trailers\.apple.*?\.mov)', 3) ; Returns only trailers from Apple

    If @error Then ; No trailers available: set $Choose to -2 to be used below
        $Choose = -2
    Else
        $String = _ArrayUnique($String) ; Filter out duplicates; element [0] holds the count

        ; Debugging
        ;_ArrayDisplay($String)

        ; Choose a trailer containing "1080" for 1080p resolution (partial match, starting at element 1)
        $Choose = _ArraySearch($String, "1080", 1, 0, 0, 1)
    EndIf
    If $Choose = -1 Then $Choose = _ArraySearch($String, "720", 1, 0, 0, 1) ; If no 1080p trailer is available, try 720p
    If $Choose = -1 Or $Choose = -2 Then ; No 720p trailer (-1) or no trailers at all (-2): consider the trailer unavailable
        $Trailer = "Unavailable"
    Else
        $Trailer = $String[$Choose] ; A trailer is available, so use it
    EndIf
    MsgBox(0, "Result " & $x, $Trailer) ; Return the download link

Next
; More to come

I am sharing this code and opening it up for criticism and improvement. I am only slightly above a beginner with AutoIt, and am always looking for ways to improve.

What it does:

Reads the source code of HD-Trailers, enumerates a list of movies, then finds the trailers available for each movie and returns the download link (to be used later).

To Do

What I would like to see happen eventually is finer control over which trailers get downloaded. For instance, most trailers seem to contain the letters "TLR", whereas clips contain "clip" in the name. Right now, I can only target one trailer. I hope to be able to target all of them soon.
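One way to get that finer control would be to loop over every link instead of stopping at the first _ArraySearch hit. The sketch below keeps each 1080p link whose name contains "tlr" and skips the rest. The list is sample data: the first URL is the one quoted earlier in the thread, the other two are made up for illustration.

```autoit
#include <Array.au3>

; Sample links standing in for the StringRegExp result from a movie page
; (the "clip1" and "h720p" entries are made-up examples)
Local $aLinks[3] = ["http://trailers.apple.com/movies/universal/fastandfurious6/fastandfurious6-tlr3_h1080p.mov", _
        "http://trailers.apple.com/movies/universal/fastandfurious6/fastandfurious6-clip1_h1080p.mov", _
        "http://trailers.apple.com/movies/universal/fastandfurious6/fastandfurious6-tlr3_h720p.mov"]

Local $aTrailers[UBound($aLinks)], $iCount = 0
For $i = 0 To UBound($aLinks) - 1
    ; StringInStr is case-insensitive by default, so "TLR" and "tlr" both match
    If StringInStr($aLinks[$i], "tlr") And StringInStr($aLinks[$i], "1080p") Then
        $aTrailers[$iCount] = $aLinks[$i]
        $iCount += 1
    EndIf
Next

If $iCount = 0 Then
    MsgBox(0, "Trailers", "No 1080p trailers found")
Else
    ReDim $aTrailers[$iCount] ; Trim the unused slots
    _ArrayDisplay($aTrailers, "1080p trailers")
EndIf
```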

I will eventually add Wget to be able to download the trailers to a directory. Apple recently changed the way they allow trailers to be downloaded. Only QuickTime is allowed to do it now. It's my understanding that Wget will allow me to spoof this.
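A rough sketch of the Wget step, assuming wget.exe is on the PATH and that sending a QuickTime-style user agent is enough. The user-agent string and the download folder below are assumptions to be checked against what Apple's servers actually accept; --user-agent and -P (directory prefix) are standard Wget options.

```autoit
Local $sWget = "wget.exe" ; Assumes wget.exe is on the PATH or next to the script
Local $sUserAgent = "QuickTime/7.6.2" ; Assumed value, not confirmed by the thread
Local $sSaveDir = "D:\Trailers" ; Hypothetical download folder
Local $sUrl = "http://trailers.apple.com/movies/universal/fastandfurious6/fastandfurious6-tlr3_h1080p.mov"

; Build: "wget.exe" --user-agent="..." -P "folder" "url"
Local $sCmd = '"' & $sWget & '" --user-agent="' & $sUserAgent & '" -P "' & $sSaveDir & '" "' & $sUrl & '"'

; RunWait returns the program's exit code and sets @error if it could not be started
Local $iExit = RunWait($sCmd, "", @SW_HIDE)
If @error Or $iExit <> 0 Then MsgBox(0, "Download", "Wget failed for " & $sUrl)
```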

Also, I will be working on a GUI for this when it's complete.


You should take a look at this project:

More or less the same you are doing: download files, progress bar, regexes... Only the focus is on downloading free software, but you get the idea.
