URL extraction AND verification if alive or dead 404


Dear AutoIt community,

I would like to learn to write a script that extracts a set of specific URLs from a text file, from the clipboard, or directly from a website link, and lists them after checking whether each URL is dead or alive.

I have found this thread, Text Extraction / Regular Expressions Help, and it helped a little, but I am not really sure where to start.

Here is an example that might help you understand better.

Link:

http://www.mixcloud.com/api/1/cloudcast/bond694/queen-absolute-greatest-80s.json

The part of the content from above link I am interested in is this:

"audio_formats": {
        "m4a": {
            "64": [
                "http://stream2.mxcdn.com/cloudcasts/m4a/64/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.m4a", 
                "http://stream3.mxcdn.com/cloudcasts/m4a/64/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.m4a", 
                "http://stream4.mxcdn.com/cloudcasts/m4a/64/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.m4a", 
                "http://stream1.mxcdn.com/cloudcasts/m4a/64/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.m4a"
            ]
        }, 
        "mp3": [
            "http://stream2.mxcdn.com/cloudcasts/originals/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.mp3", 
            "http://stream3.mxcdn.com/cloudcasts/originals/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.mp3", 
            "http://stream4.mxcdn.com/cloudcasts/originals/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.mp3", 
            "http://stream1.mxcdn.com/cloudcasts/originals/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.mp3"
        ], 
        "aac": {
            "24": [
                "http://stream2.mxcdn.com/cloudcasts/aac/24/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.aac", 
                "http://stream3.mxcdn.com/cloudcasts/aac/24/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.aac", 
                "http://stream4.mxcdn.com/cloudcasts/aac/24/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.aac", 
                "http://stream1.mxcdn.com/cloudcasts/aac/24/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.aac"
            ], 
            "64": [
                "http://stream2.mxcdn.com/cloudcasts/m4a/64/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.m4a", 
                "http://stream3.mxcdn.com/cloudcasts/m4a/64/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.m4a", 
                "http://stream4.mxcdn.com/cloudcasts/m4a/64/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.m4a", 
                "http://stream1.mxcdn.com/cloudcasts/m4a/64/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.m4a"
            ]
        }

Those are the locations at which media files are stored in different file formats. The thing is that not all locations will return valid/alive results. Only some will return the media file while others will return a 404 page.

So how can I check with AutoIt if a link is alive or dead?

Ideally, instead of going to the .json link, users could simply input the original link, in this case

http://www.mixcloud.com/bond694/queen-absolute-greatest-80s/

and then the script would go to the related .json page, check whether each of the "audio_formats" links is alive or dead, and finally list only the valid links.
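
The first step of that flow, turning the page link into the .json link, can be sketched as follows. This is only a sketch: the mapping is inferred from the single pair of example links in this post, the helper name `_CloudcastJsonUrl` is hypothetical, and the site may build its API addresses differently for other pages.

```autoit
; Hypothetical helper: builds the .json address from a normal cloudcast page URL.
; Assumption: the pattern host/USER/NAME/ -> host/api/1/cloudcast/USER/NAME.json,
; taken from the one example pair in this thread.
Func _CloudcastJsonUrl($sPageUrl)
    ; capture host, user and cloudcast name; the trailing slash is optional
    Local $aParts = StringRegExp($sPageUrl, "(?i)^(https?://[^/]+)/([^/]+)/([^/]+)/?$", 1)
    If @error Then Return SetError(1, 0, "")
    Return $aParts[0] & "/api/1/cloudcast/" & $aParts[1] & "/" & $aParts[2] & ".json"
EndFunc

Global $sJsonUrl = _CloudcastJsonUrl("http://www.mixcloud.com/bond694/queen-absolute-greatest-80s/")
If Not @error Then
    ConsoleWrite($sJsonUrl & @CRLF)
    ; InetRead returns binary data, so convert it to text before searching it
    Global $sJson = BinaryToString(InetRead($sJsonUrl))
    If @error Then ConsoleWrite("Download failed" & @CRLF)
EndIf
```

For the example link above, the helper produces the same .json address that is quoted at the top of this post. The downloaded text in `$sJson` is what the "audio_formats" URLs would then be extracted from.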

If someone else is interested in this project it would be fantastic if we could collaborate on this. I am not seeking a pro but rather a beginner or intermediate coder to collaborate with and perhaps a couple of pros (if there is interest) that can sort of guide the project with reading and tutorial suggestions or help out when getting stuck with something.

So basically, if there are flaws in my idea/concept or there is something in it that cannot be done with AutoIt, I would very much appreciate it if someone would let me know before I embark on this journey.

I know that there are a lot of similar scripts for this task out there, but based on my research none of them manages to check whether the links are alive or dead.

So where do I start? Any help is much appreciated, and please let me stress that I am very willing to learn and understand what I am actually scripting instead of being given bits and pieces of code that I glue together. So I am not asking you pros to do the work, not at all; some simple (or perhaps later more advanced) guidance is all I seek at this moment.

Thanks.

:mellow:

  • Moderators

#cs
"audio_formats": {
        "m4a": {
            "64": [
                "http://stream2.mxcdn.com/cloudcasts/m4a/64/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.m4a",
                "http://stream3.mxcdn.com/cloudcasts/m4a/64/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.m4a",
                "http://stream4.mxcdn.com/cloudcasts/m4a/64/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.m4a",
                "http://stream1.mxcdn.com/cloudcasts/m4a/64/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.m4a"
            ]
        },
        "mp3": [
            "http://stream2.mxcdn.com/cloudcasts/originals/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.mp3",
            "http://stream3.mxcdn.com/cloudcasts/originals/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.mp3",
            "http://stream4.mxcdn.com/cloudcasts/originals/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.mp3",
            "http://stream1.mxcdn.com/cloudcasts/originals/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.mp3"
        ],
        "aac": {
            "24": [
                "http://stream2.mxcdn.com/cloudcasts/aac/24/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.aac",
                "http://stream3.mxcdn.com/cloudcasts/aac/24/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.aac",
                "http://stream4.mxcdn.com/cloudcasts/aac/24/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.aac",
                "http://stream1.mxcdn.com/cloudcasts/aac/24/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.aac"
            ],
            "64": [
                "http://stream2.mxcdn.com/cloudcasts/m4a/64/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.m4a",
                "http://stream3.mxcdn.com/cloudcasts/m4a/64/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.m4a",
                "http://stream4.mxcdn.com/cloudcasts/m4a/64/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.m4a",
                "http://stream1.mxcdn.com/cloudcasts/m4a/64/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.m4a"
            ]
        }
#ce

#include <Array.au3> ; needed for _ArrayDisplay

Global $gs_Html = ClipGet() ; however you get your html, here I'm just copying what you had
Global $ga_MP3 = __GetAudioFormatURLs($gs_Html, "mp3")
Global $ga_AAC = __GetAudioFormatURLs($gs_Html, "aac", "24")

_ArrayDisplay($ga_MP3, "MP3")
_ArrayDisplay($ga_AAC, "AAC - 24")

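Func __GetAudioFormatURLs($s_str, $s_format, $s_type = "")

    ; build a pattern for  "format": [ ... ]  or, with a type,  "format": { ... "type": [ ... ]
    ; \x22 is a double quote; \Q...\E treats the format/type name as literal text
    Local $s_pattern = "(?is)\x22\Q" & $s_format & "\E\x22: "
    If $s_type Then $s_pattern &= "\{.*?\x22\Q" & $s_type & "\E\x22: "
    $s_pattern &= "\[\s*(.*?)\s*\]"

    ; get audio batch: everything between the square brackets (flag 1 = first match only)
    Local $a_audiobatch = StringRegExp($s_str, $s_pattern, 1)
    If @error Then Return SetError(1, 0, 0)

    ; pull every quoted URL out of the batch (flag 3 = all matches)
    $s_pattern = "(?:\x22(.*?)\x22)"
    Local $a_http = StringRegExp($a_audiobatch[0], $s_pattern, 3)
    If @error Then Return SetError(2, 0, 0)

    Return $a_http
EndFunc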
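
The function above only extracts the URLs; it does not check whether they are alive. One possible way to do that, as a rough sketch, is to send a HEAD request through the Windows `WinHttp.WinHttpRequest.5.1` COM object and look at the status code. The names `_UrlIsAlive` and `_ComErrIgnore` are made up for this example, treating any 2xx status as "alive" is an assumption, and some servers refuse HEAD requests, so a GET may be needed instead.

```autoit
; Register a do-nothing COM error handler so a failed request (e.g. an
; unresolvable host) sets @error instead of stopping the script.
Global $g_oComErr = ObjEvent("AutoIt.Error", "_ComErrIgnore")

Func _ComErrIgnore($oError)
    ; intentionally empty: the caller checks @error after each COM call
EndFunc

; Hypothetical helper: True if the server answers a HEAD request with a 2xx status.
Func _UrlIsAlive($sUrl)
    Local $oHttp = ObjCreate("WinHttp.WinHttpRequest.5.1")
    If Not IsObj($oHttp) Then Return SetError(1, 0, False)
    $oHttp.Open("HEAD", $sUrl, False) ; False = synchronous; HEAD asks for headers only
    If @error Then Return SetError(2, 0, False)
    $oHttp.Send()
    If @error Then Return SetError(3, 0, False)
    Local $iStatus = $oHttp.Status ; e.g. 200 for OK, 404 for not found
    Return ($iStatus >= 200 And $iStatus < 300)
EndFunc

; Two of the URLs from the sample above, used here only as test input
Global $aUrls[2] = ["http://stream1.mxcdn.com/cloudcasts/originals/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.mp3", _
        "http://stream2.mxcdn.com/cloudcasts/originals/1/c/8/a/b1b7-8318-4034-aa64-b747cdf6b183.mp3"]

; print only the links that respond
For $i = 0 To UBound($aUrls) - 1
    If _UrlIsAlive($aUrls[$i]) Then ConsoleWrite($aUrls[$i] & @CRLF)
Next
```

The same loop can be run over an array returned by `__GetAudioFormatURLs`, such as `$ga_MP3`, to list only the links that do not come back as 404.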

Edited by SmOke_N

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.

SmOke_N,

thanks so much for your reply and sorry for not replying earlier. I am still on the Wiki and trying to make sense of what you posted. I am still a beginner. As soon as I have a better idea of what your code does (I assume it perfectly does what I actually want to achieve) I will post more here.

It seems that with the redesign of the site's player most of the other downloaders have started to fail, so creating a little tool for this purpose is a high priority for me, though given that I am a beginner this might still take some time. Oh well. I will surely learn a lot, and that is great! Thanks for your help!
