b0x4it

Extract MP3 URLs from a text

21 posts in this topic

How can I extract MP3 URLs from a text that is in the clipboard?

I tried this with no luck:

#include <Array.au3> ;Only for _ArrayDisplay

$gs_Html = ClipGet()

$aString = StringRegExp($gs_Html, "http://(.*)/.*\.mp3", 3)

_ArrayDisplay($aString)

Any suggestion would be appreciated!




#2 ·  Posted (edited)

This would work:

#include <Array.au3>
$sText = 'http://google.com/Song1.mp3' & @LF
$sText &= 'http://google.com/Song2.mp3'
$iArray = StringRegExp($sText, '(?im)http:// .* / (.*.mp3)', 3)
If Not @error Then _ArrayDisplay($iArray)

EDIT: You may need to pop this snippet out to read the code; the site thinks it is a media link.

EDIT 2: I had to add a space after the http://, the .*, and the / so the code would display. Remove those spaces to run the code.

Edited by Varian


Another way to skin the cat..

#include <String.au3>

Local $Start = "http://", $End = ".mp3", _
      $String  = "This is just text before the URL you want http://example.com/asong.mp3.", _
      $Array = _StringBetween($String, $Start, $End)

MsgBox(0, "", $Start & $Array[0] & $End)

- Bruce /*somdcomputerguy */  If you change the way you look at things, the things you look at change.


Thank you for your replies, but neither of them extracts the correct URLs.

The method in the first reply extracts only the file name and the link text, like this:

[0]|01%20-%20Ghara%20Nabood%20-%20Alireza%20Talischi.mp3">01 - Ghara Nabood - Alireza Talischi.mp3

and the second method extracts URLs like this:

[1]|intbaran.in/Albums/1390/Aban/index.php?icon=mp3" alt="" /> &nbsp;<a href="http://intbaran.in/Albums/1390/Aban/2/Minor/Minor%20Music%20Group%20-%20Top%2010/02%20-%20Alireza%20Talischi%20-%20Taghvim

Any suggestions?


#5 ·  Posted (edited)

Try the regular expression approach, but exclude results that contain a ?, since those are PHP query arguments in the URI and not direct links to an MP3 file.

What is the address of the webpage?
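
As a rough illustration of that filtering idea, here is a minimal sketch. The sample HTML, the variable names, and the deliberately loose pattern (it ends in mp3 without requiring a dot, so the icon=mp3 link is also caught and can then be filtered out) are all assumptions made for the example, not something taken from the page in question.

```autoit
; Made-up sample standing in for the clipboard contents.
Local $sHtml = '<img src="http://example.com/index.php?icon=mp3" /> ' & _
        '<a href="http://example.com/music/Song%201.mp3">Song 1</a>'

; [^"\s]+ stops at the first double quote or whitespace,
; so each match stays inside a single attribute value.
Local $aHits = StringRegExp($sHtml, '(?i)(http://[^"\s]+mp3)', 3)

Local $sKept = ""
If Not @error Then
    For $i = 0 To UBound($aHits) - 1
        ; Skip anything that carries a query string.
        If StringInStr($aHits[$i], "?") Then ContinueLoop
        $sKept &= $aHits[$i] & @CRLF
    Next
EndIf
ConsoleWrite($sKept)
```

To run it against real data, $sHtml could be replaced with ClipGet().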

Edited by gruntydatsun


Thanks for your reply. I tried to learn regular expressions, but I couldn't. The website is:

http://www.musicbaran.org/modules.php?name=News&file=article&sid=8077

I would appreciate it if you could fix it for me.

Also, do you have any document or web page with a very simple tutorial and examples of regular expressions?

Thanks


I'll try. I'm not much of an expert, but I've been learning too.

That is one aggressively designed website, lol!


b0x4it, gruntydatsun,

I have used this site to learn what little I know about SREs - everyone I have pointed to it in the past has found it very useful. :D

M23


Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind.

My UDFs:

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ---------- Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 


#9 ·  Posted (edited)

#include <IE.au3>
#include <String.au3>
#include <Array.au3>
$oIE = _IECreate ("http://www.musicbaran.org/modules.php?name=News&file=article&sid=8077",0,0) ;create a hidden IE instance and load the page
$text = _IEBodyReadHTML($oIE)
_IEQuit($oIE)
$array = StringRegExp($text,'a href="(http://.*.mp3)"',3)
_ArrayDisplay($array)

The regex, spaced out, is:

' a h r e f = " ( h t t p : / / . * . m p 3 ) " '

Sorry about the constant updating, but the forum kept replacing the h t t p : / / . * . m p 3 with media tags and showing a media player interface.

Edited by gruntydatsun


Thanks for all the replies, but I don't want to use IE, because the goal is to be able to do this for any web page whose source code is in the clipboard. Is there any other solution?


#include <Array.au3>
$text = ClipGet()
$array = StringRegExp($text,'a href="(http://.*.mp3"',3)
_ArrayDisplay($array)


Thanks for the reply, but something is wrong with the code. It does not run!


#13 ·  Posted (edited)

Typo. This works.

I left out the closing bracket after the mp3 bit.

I'm sick of trying to get this onto the page without the media player stuff popping up.

The answer is attached below:

Extract_MP3_URLS_From_Page.au3
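
The attachment itself is not reproduced in this thread. As a rough sketch, assuming the only change was the missing closing bracket, the corrected version might look something like the following. The function name _ExtractMp3Urls and the sample text are made up for illustration; for real use you would pass ClipGet() instead of the sample.

```autoit
; Hypothetical helper: returns an array of captured URLs,
; or an empty string if nothing matched.
Func _ExtractMp3Urls($sText)
    ; Same pattern as in the earlier post, with the capture group closed.
    ; Note that .* is greedy, so two links on one line can end up merged into one result.
    Local $aFound = StringRegExp($sText, 'a href="(http://.*.mp3)"', 3)
    If @error Then Return ""
    Return $aFound
EndFunc   ;==>_ExtractMp3Urls

; Made-up sample with one link per line; swap in ClipGet() for real use.
Local $sSample = '<a href="http://example.com/one.mp3">one</a>' & @LF & _
        '<a href="http://example.com/two.mp3">two</a>'
Local $aUrls = _ExtractMp3Urls($sSample)
If IsArray($aUrls) Then
    For $i = 0 To UBound($aUrls) - 1
        ConsoleWrite($aUrls[$i] & @CRLF)
    Next
EndIf
```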

Edited by gruntydatsun


Thanks for the reply. It works in most cases, but in some cases it extracts URLs incorrectly, like this:

[0]|http://yaser3h.persiangig.com/audio/Shahriyar%20Ebrahimi%20-%20Aye%20yaas.mp3">لینک دانلود شهریار ابراهیمی</a>&nbsp; -&nbsp; </font></font></font></font></font><a href="http://yaser3h.persiangig.com/audio/Farzad%20Farzin%20-%20Midoonam.mp3

Any suggestions?

Thanks in advance.


Hi b0x4it, it should capture that one; it fits the pattern.

What is the URL it doesn't work on?


The web page is:

http://www.musicbaran.org/modules.php?name=News&file=article&sid=8077

It should find

http://yaser3h.persiangig.com/audio/Shahriyar%20Ebrahimi%20-%20Aye%20yaas.mp3

as well as

http://yaser3h.persiangig.com/audio/Farzad%20Farzin%20-%20Midoonam.mp3

instead of this one:

[0]|http://yaser3h.persiangig.com/audio/Shahriyar%20Ebrahimi%20-%20Aye%20yaas.mp3">لینک دانلود شهریار ابراهیمی</a>&nbsp; -&nbsp; </font></font></font></font></font><a href="

Any ideas?


Sorry, this is the URL:

http://www.ganja2music123.com/modules.php?name=News&file=article&sid=2555


Hi b0x4it,

http://yaser3h.persiangig.com/audio/ is using relative addressing in the background, not absolute, so there is no http: or domain in the link for the regex to match.

My brain is dead. Awake too long. Need sleep.


Sure, thanks for all your help. I'm trying to find a way using the _StringBetween function.

Good night.
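
For the _StringBetween route, one possible sketch is to pull out every href value first and then keep only the ones that end in .mp3. This assumes the links are written as double-quoted href attributes; the sample HTML and the variable names are invented for illustration.

```autoit
#include <String.au3>

; Made-up sample standing in for the clipboard contents.
Local $sHtml = '<a href="http://example.com/a.mp3">A</a> ' & _
        '<a href="http://example.com/page.php?icon=mp3">B</a> ' & _
        '<a href="http://example.com/b.mp3">C</a>'

; Everything between href=" and the next double quote is one link target.
Local $aLinks = _StringBetween($sHtml, 'href="', '"')

Local $iCount = 0
If Not @error Then
    For $i = 0 To UBound($aLinks) - 1
        ; Keep only targets whose last four characters are .mp3
        ; (the = comparison is case-insensitive in AutoIt).
        If StringRight($aLinks[$i], 4) = ".mp3" Then
            ConsoleWrite($aLinks[$i] & @CRLF)
            $iCount += 1
        EndIf
    Next
EndIf
```

Because the end delimiter is the closing quote, each result is limited to a single attribute value, which avoids the problem of a match running on into the next link.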


The bad link is wrapped in some non-English character set. Maybe the regex is choking on that or something? Not sure. Wait till the smart people get online and they will probably help you.
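
Another possible explanation, judging from the output posted earlier: the bad result starts at one http:// and ends at a later .mp3, which is what a greedy .* does when two links sit on the same line. A minimal sketch of one way around that, assuming the links are double-quoted href attributes, is to replace .* with [^"]* so a match cannot run past the closing quote. The sample URLs below are made up.

```autoit
; Made-up sample: two links on one line, similar in shape to the problem output.
Local $sHtml = '<a href="http://example.com/audio/First%20Song.mp3">one</a>&nbsp;-&nbsp;' & _
        '<a href="http://example.com/audio/Second%20Song.mp3">two</a>'

; [^"]* cannot cross a double quote, so each href is matched on its own.
; \. matches a literal dot; a bare . would match any character.
Local $aUrls = StringRegExp($sHtml, '(?i)href="(http://[^"]*\.mp3)"', 3)
If @error Then
    ConsoleWrite("No MP3 links found" & @CRLF)
Else
    For $i = 0 To UBound($aUrls) - 1
        ConsoleWrite($aUrls[$i] & @CRLF)
    Next
EndIf
```

To use it on the clipboard, $sHtml could be replaced with ClipGet().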

