toxicdav3

StringRegExp help!

11 posts in this topic

#1 ·  Posted (edited)

I tried to use StringRegExp from the help file to split the items in an RSS feed, but it didn't return anything.

Didn't work:
$array = StringRegExp($rss, '<(?i)item>(.*?)</(?i)item>', 1, 1)

Did work:
$array = _StringBetween($rss, '<item>', '</item>')
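One plausible reason the StringRegExp version returned nothing, assuming the feed puts each <item> on several lines: by default `.` does not match a newline in PCRE (which StringRegExp uses), so `(.*?)` cannot reach a `</item>` on a later line. The `s` option lifts that restriction. The sketch below shows the effect with Python's `re` module, which treats `.` and the `(?is)` inline flags the same way for this pattern; the RSS text is made up for illustration. In AutoIt the corresponding pattern would be `'(?is)<item>(.*?)</item>'`, used with flag 3 to get every item.

```python
import re

# Made-up RSS fragment: each <item> spans several lines.
rss = """<rss><channel>
<item>
  <title>Show Name s01e01</title>
</item>
<item>
  <title>Show Name s01e02</title>
</item>
</channel></rss>"""

# (?i) only: '.' stops at newlines, so no <item>...</item> pair can match.
no_dotall = re.findall(r"(?i)<item>(.*?)</item>", rss)

# (?is): case-insensitive AND '.' also matches newlines.
with_dotall = re.findall(r"(?is)<item>(.*?)</item>", rss)

print(len(no_dotall))    # 0
print(len(with_dotall))  # 2
```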

I'm also looking to split a file name into 3 parts and extract the season and episode numbers, if anyone can help me.

Show Name *split* s01e01 *split* WS PDTV XviD RLSGROUP

Show Name goes to $show

s01e01 can be any of the following:

s01e01
1x01
2009 03 01

Odd ones:

Season 1
Season 1 Ep 1
Season 1 Episode 1
01x01
101
2008

These should go into separate variables, if the info is available:

$season
$episode
$year
$day
$month

WS PDTV XviD RLSGROUP goes to $tags

Please help! StringRegExp confuses me completely, and I can't even get the simple item thing working. Cheers

Edited by toxicdav3




Meep... the HTML code to be parsed would make it much easier to help.


#4 ·  Posted (edited)

toxicdav3, you are looking for text between <item> and </item>, correct? Your examples do not contain either of those strings; that's the problem. We need to see the text you are pulling from.

Also, using option 1 in StringRegExp will return only one result per pair of parentheses, so in your example it will only return a 1 element array (if something matches). Option 3 will do a repeated search for those items until it reaches the end of the string.
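To illustrate the difference between flag 1 and flag 3, here is a rough Python analogue (not AutoIt): `re.search(...).groups()` returns the groups of the first match only, much like flag 1, while `re.findall` keeps searching to the end of the string, much like flag 3. The sample string is made up.

```python
import re

text = "<item>one</item><item>two</item><item>three</item>"
pattern = r"(?i)<item>(.*?)</item>"

# Like flag 1: only the capture groups of the FIRST match.
first = re.search(pattern, text).groups()   # ('one',)

# Like flag 3: keep searching until the end of the string,
# collecting the captured text of every match.
every = re.findall(pattern, text)           # ['one', 'two', 'three']

print(first, every)
```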

*edit: oops, I just realized you are asking about two different things. For the second one, we'd also need to know what specifically separates the season/episode from the rest of the text.

For example, is it 'ATVShow.s1e01.dvd.info.txt' or 'A TV Show 1x01.info.txt' or 'A-TV-Show on NBC Season 1 Ep 1 info.txt'?

If it can be any of the above, you're not going to end up with a single regular expression that does all the work for you. There has to be some consistent, logical order. And what if the TV show name itself contains any of those strings? For example, if there were a TV series based on the movie: 'Open Season Season 1 Ep 2'. How would you tell them apart without a fixed order or special separators? It's one thing to find consistently formatted tokens like s1e12 or 3x04, but the other random possibilities require a lot of string interpretation.

Edited by ascendant
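As a rough illustration of the "consistent" case, here is a Python sketch (the function name `season_episode` is hypothetical) that recognises only the s1e12 and 3x04 styles. The odd-format example 'Open Season Season 1 Ep 2' is simply not recognised, which is the point: every extra format needs its own handling.

```python
import re

# Only the two consistent formats: s1e12 and 3x04.
# \b keeps the pattern from firing inside longer words or numbers.
EP_TOKEN = re.compile(r"(?i)\bs(\d{1,2})e(\d{1,2})\b|\b(\d{1,2})x(\d{1,2})\b")

def season_episode(name):
    """Return (season, episode) as ints, or None if no token is found."""
    m = EP_TOKEN.search(name)
    if m is None:
        return None
    # Groups 1/2 come from the sNNeNN branch, groups 3/4 from the NNxNN branch.
    season = m.group(1) or m.group(3)
    episode = m.group(2) or m.group(4)
    return int(season), int(episode)

print(season_episode("ATVShow s1e01 dvd info"))     # (1, 1)
print(season_episode("A TV Show 3x04 info"))        # (3, 4)
print(season_episode("Open Season Season 1 Ep 2"))  # None
```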

Share this post


Link to post
Share on other sites

#5 ·  Posted (edited)

I'll just leave it to _StringBetween for now. How would I go about splitting this and using StringRegExp to get the season and episode? All dots, dashes, etc. are replaced by spaces, so there is no need to worry about those.

Show Name ~ s01e01 ~ WS PDTV XviD RLSGROUP

Edited by toxicdav3

Okay, the simplest way, if you are *always* getting s##e## and there are no other matches in the string:

StringRegExp($Str,"(?i)s(\d{1,2})e(\d{1,2})",1)

(Note that I let it take 1 or 2 digits. If it's *always* 2, just change it to {2}.)

Now, if it's always between tildes and spaces, you can just add the tildes and spaces to the pattern above:

"(?i)~ s(\d{1,2})e(\d{1,2}) ~"

Does that help?

Results:

[0] = 01

[1] = 01
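The two patterns above can be checked quickly with Python's `re` module; `re.search(...).groups()` returns the captured groups of the first match, roughly what flag 1 gives in StringRegExp. This is a sketch for testing the patterns, not AutoIt code.

```python
import re

s = "Show Name ~ s01e01 ~ WS PDTV XviD RLSGROUP"

# Same pattern as the StringRegExp call: s, 1-2 digits, e, 1-2 digits.
plain = re.search(r"(?i)s(\d{1,2})e(\d{1,2})", s).groups()

# Variant that also requires the " ~ " separators on both sides.
with_tildes = re.search(r"(?i)~ s(\d{1,2})e(\d{1,2}) ~", s).groups()

print(plain)        # ('01', '01')
print(with_tildes)  # ('01', '01')
```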


These regular expressions are not fully tested on all input string variations. Hope they help you along.

$sStr = "Show Name ~ s01e01 ~ WS PDTV XviD RLSGROUP"
 
; Show name: delete the first " ~ " and everything after it
 $Show = StringRegExpReplace($sStr, " ~ .*$", "")
 MsgBox(0, "$Show", $Show)
 
; Season/episode: delete "Show Name ~ " and " ~ <tags>", keeping the middle part
 $s01e01 = StringRegExpReplace($sStr, $Show & " ~ | ~ .*$", "")
 MsgBox(0, "Season Episode", $s01e01)
 
; Tags: delete everything up to and including "s01e01 ~ "
 $tags = StringRegExpReplace($sStr, "^.*" & $s01e01 & " ~ ", "")
 MsgBox(0, "$tags", $tags)
 
 #cs
    s01e01
    1x01
    2009 03 01
    
    odd:
    Season 1
    Season 1 Ep 1
    Season 1 Episode 1
    01x01
    101
    2008
 #ce
 
 $season = StringRegExpReplace($s01e01, "(?i)(s\d{1,2}|\d{1,2}|200\d|Season \d{1,2})(?:.*)", "\1")
 MsgBox(0, "$season", $season)
 
 $episode = StringRegExpReplace($s01e01, "(?i)(?:" & $season & ")(e\d{1,2}|x\d{1,2}|Ep \d{1,2}|Episode \d{1,2}|\d{1,2})", "\1")
 MsgBox(0, "$episode", $episode)
 
 
 If StringRegExp($sStr, "(20\d{2}|19\d{2})", 0) = 1 Then
    $year = StringRegExpReplace($sStr, "(?:.*)(20\d{2}|19\d{2})(?:.*)", "\1")
 Else
    $year = "Not Found"
 EndIf
 MsgBox(0, "$year", $year)


Malkey, you're one serious PCRE freak! hahaha.. how many PCREs is that for one function?! yikes.. talk about excessive :P


#10 ·  Posted (edited)

Thanks for your help, but the "~" characters aren't included in the name. That's why I wrote *split* in the original post.

Edited by toxicdav3
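Without the tildes, one approach is to look for the season/episode (or date) token first, then treat everything before it as the show name and everything after it as the tags. Below is a Python sketch of that idea; `split_release` and `PATTERNS` are hypothetical names, only the three regular formats from the first post (s01e01, 1x01, 2009 03 01) are handled, and the patterns are tried in list order rather than by position in the name. The odd formats (Season 1 Ep 1, 101, ...) would need more alternatives and are much more ambiguous.

```python
import re

# Hypothetical patterns for the three "regular" formats from the first post.
PATTERNS = [
    ("se", re.compile(r"(?i)\bs(\d{1,2})e(\d{1,2})\b")),               # s01e01
    ("x", re.compile(r"\b(\d{1,2})x(\d{1,2})\b")),                     # 1x01
    ("date", re.compile(r"\b((?:19|20)\d{2}) (\d{2}) (\d{2})\b")),     # 2009 03 01
]

def split_release(name):
    """Split around the token of the first pattern (in list order) that matches.

    Returns (show, info, tags), or None if no pattern matches.
    """
    for kind, pat in PATTERNS:
        m = pat.search(name)
        if m is None:
            continue
        if kind == "date":
            info = {"year": m.group(1), "month": m.group(2), "day": m.group(3)}
        else:
            info = {"season": int(m.group(1)), "episode": int(m.group(2))}
        show = name[:m.start()].strip()   # text before the token
        tags = name[m.end():].strip()     # text after the token
        return show, info, tags
    return None

print(split_release("Show Name s01e01 WS PDTV XviD RLSGROUP"))
# ('Show Name', {'season': 1, 'episode': 1}, 'WS PDTV XviD RLSGROUP')
```

For example, `split_release("Show Name 2009 03 01 WS PDTV")` returns `('Show Name', {'year': '2009', 'month': '03', 'day': '01'}, 'WS PDTV')`.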


As Paulie said, seeing the HTML code to be parsed would help in recognizing which patterns to match and which not to match, and whether a regular expression is needed at all.
