Extract MP3 URLs from a text


b0x4it
How can I extract MP3 URLs from a text that is in the clipboard?

I tried this with no luck:

#include <Array.au3> ;Only for _ArrayDisplay

$gs_Html = ClipGet()

$aString = StringRegExp($gs_Html, "http://(.*)/.*\.mp3", 3)

_ArrayDisplay($aString)

Any suggestions would be appreciated!

This should work:

#include <Array.au3> ; only for _ArrayDisplay

$sText = 'http://google.com/Song1.mp3' & @LF
$sText &= 'http://google.com/Song2.mp3'
; Capture the file name that follows the last slash of each URL
$aMatches = StringRegExp($sText, '(?im)http://.*/(.*\.mp3)', 3)
If Not @error Then _ArrayDisplay($aMatches)

Edited by Varian

Another way to skin the cat..

#include <String.au3>

Local $Start = "http://", $End = ".mp3", _
      $String  = "This is just text before the URL you want http://example.com/asong.mp3.", _
      $Array = _StringBetween($String, $Start, $End)

MsgBox(0, "", $Start & $Array[0] & $End)

- Bruce /*somdcomputerguy */  If you change the way you look at things, the things you look at change.

Thank you for your replies, but neither of them extracts the correct URLs.

The method in the first reply extracts URLs like this:

[0]|01%20-%20Ghara%20Nabood%20-%20Alireza%20Talischi.mp3">01 - Ghara Nabood - Alireza Talischi.mp3

and the second method extracts URLs like this:

[1]|intbaran.in/Albums/1390/Aban/index.php?icon=mp3" alt="" /> &nbsp;<a href="http://intbaran.in/Albums/1390/Aban/2/Minor/Minor%20Music%20Group%20-%20Top%2010/02%20-%20Alireza%20Talischi%20-%20Taghvim

Any suggestions?

Try the regular expression way but exclude results that contain a ? as those will be php arguments to the URI and not links directly to an MP3 file.

What is the address of the webpage?
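One way to apply the "exclude results that contain a ?" idea is to filter the array after matching. This is only a minimal sketch: the sample URLs and the variable names ($aFound, $aKept, $iKept) are made up for illustration and stand in for whatever StringRegExp returned.

```autoit
#include <Array.au3> ; only for _ArrayDisplay

; Hypothetical matches, standing in for the array returned by StringRegExp
Local $aFound[3] = ["http://example.com/audio/a.mp3", _
        "http://example.com/index.php?icon=mp3", _
        "http://example.com/audio/b.mp3"]

; Copy across only the entries that contain no "?"
Local $aKept[UBound($aFound)], $iKept = 0
For $i = 0 To UBound($aFound) - 1
    ; StringInStr returns 0 when the substring is not found
    If StringInStr($aFound[$i], "?") = 0 Then
        $aKept[$iKept] = $aFound[$i]
        $iKept += 1
    EndIf
Next

If $iKept > 0 Then
    ReDim $aKept[$iKept] ; trim the unused slots
    _ArrayDisplay($aKept)
EndIf
```

The same effect can probably be had inside the pattern itself by not allowing a ? in the matched text, but a separate filter step is easier to follow for someone new to regular expressions.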

Thanks for your reply. I tried to learn regular expressions, but I couldn't. The website is:

http://www.musicbaran.org/modules.php?name=News&file=article&sid=8077

I would appreciate it if you could fix it for me.

Also, do you have any document or web page with a very simple tutorial and examples of regular expressions?

Thanks

b0x4it, gruntydatsun,

I have used this site to learn what little I know about SREs - everyone I have pointed to it in the past has found it very useful. :D

M23

#include <IE.au3>
#include <Array.au3> ; only for _ArrayDisplay

; Create a hidden IE instance and load the page
$oIE = _IECreate("http://www.musicbaran.org/modules.php?name=News&file=article&sid=8077", 0, 0)
$text = _IEBodyReadHTML($oIE)
_IEQuit($oIE)
; Capture href values that start with http:// and end with .mp3
$array = StringRegExp($text, 'a href="(http://.*.mp3)"', 3)
_ArrayDisplay($array)

Edited by gruntydatsun

Thanks for the reply. It works in most cases, but in some cases it extracts URLs incorrectly, like this:

[0]|http://yaser3h.persiangig.com/audio/Shahriyar%20Ebrahimi%20-%20Aye%20yaas.mp3">لینک دانلود شهریار ابراهیمی</a>&nbsp; -&nbsp; </font></font></font></font></font><a href="http://yaser3h.persiangig.com/audio/Farzad%20Farzin%20-%20Midoonam.mp3

Any suggestions?

Thanks in advance.

Hi b0x4it, it should capture that one; it fits the pattern.

Which URL doesn't it work on?

The web page is:

http://www.musicbaran.org/modules.php?name=News&file=article&sid=8077

It should find

http://yaser3h.persiangig.com/audio/Shahriyar%20Ebrahimi%20-%20Aye%20yaas.mp3

as well as

http://yaser3h.persiangig.com/audio/Farzad%20Farzin%20-%20Midoonam.mp3

instead of this one:

[0]|http://yaser3h.persiangig.com/audio/Shahriyar%20Ebrahimi%20-%20Aye%20yaas.mp3">لینک دانلود شهریار ابراهیمی</a>&nbsp; -&nbsp; </font></font></font></font></font><a href="

Any ideas?
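A likely cause of the merged result is that .* is greedy: when two links sit on the same line, it runs on to the last .mp3" it can find. The sketch below swaps .* for the negated character class [^"]*, which cannot cross a closing quote, so each match should stop at the end of its own link. It assumes the links are written as double-quoted href attributes; the sample HTML and the example.com URLs are made up for illustration.

```autoit
#include <Array.au3> ; only for _ArrayDisplay

; Sample HTML with two links on one line, modelled on the output reported above
Local $sHtml = '<a href="http://example.com/audio/First%20Song.mp3">one</a>&nbsp;' & _
        '<a href="http://example.com/audio/Second%20Song.mp3">two</a>'
; To work from the clipboard instead, as in the original question:
; $sHtml = ClipGet()

; [^"]* matches any run of characters except a double quote,
; (?i) makes the match case-insensitive, and \. matches a literal dot
Local $aUrls = StringRegExp($sHtml, '(?i)href="(http://[^"]*\.mp3)"', 3)
If @error Then
    MsgBox(0, "MP3 URLs", "No MP3 links found")
Else
    _ArrayDisplay($aUrls) ; one array element per link
EndIf
```

Because the match has to end in .mp3" inside a single quoted value, a link such as index.php?icon=mp3 is skipped as well, since it has no literal .mp3 before the closing quote.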
