hawkair

Regular Expression Disjunctions (|) return empty matches


hawkair

I use regexps without problems, but I haven't figured out why the following happens

the regexp:

\[([^\]]+)\]\r\nS2, Ep1|Season 2, Episode 1: {([^}]+)}

applied on this text (with flag 3):

Season 2, Episode 1: {Peter, Peter, Caviar Eater}

[What Lies Ahead]

S2, Ep1

will return

0 =>

1 => Peter, Peter, Caviar Eater

2 => What Lies Ahead

Similar things happen every time the second part is matched; this is just one simple example

I have to manually remove empty elements from the results every time

Where does the empty string at 0 => come from?

How can I reformulate my regular expression so that I don't get empty elements?

Thank you in advance for any help you can give

hawkair

Hello again

I am afraid my example was not clear enough

I would like to present a simpler example

#include <Array.au3>
$text="test123,try456"
$results = StringRegExp ($Text, "test(\d+)|try(\d+)", 3)
MsgBox (0, "Results", _ArrayToString ($results))
Exit

MsgBox gives:

123||456

There is an empty match before "456"

Could it be a bug?

jchd

It comes from the first capturing parenthesis, which captures nothing. You get an empty string as a result.

Also I'm unsure this is the actual pattern you use, as literal {} aren't escaped ???

Post your code and actual sample subject.

For your second example, change it like this:

#include <Array.au3>
$text="test123,try456"
$results = StringRegExp ($Text, "(?:test|try)(\d+)", 3)
MsgBox (0, "Results", _ArrayToString ($results))
Edited by jchd

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

hawkair

@jchd

The first example is my actual problem

I have a folder with episodes of some series in the form

Lost.S01E01.avi

I get the episode list from IMDB and it may be in two formats (older & newer)

older:

Season 2, Episode 1: {Peter, Peter, Caviar Eater}
Original Air Date-23 September 1999
Peter changes for the worse after he and Lois inherit a mansion in Newport.

Next US airings:
{Mon. Mar. 14}  {10:30 PM}  {TBS }
{Tue. Mar. 22}  {6:30 PM}  {CW }

------------------------------------


Season 2, Episode 2: {Holy Crap}
Original Air Date-30 September 1999
Peter's ultra-religious father causes trouble when he comes to live with the family.

Next US airings:
{Thur. Mar. 24}  {6:30 PM}  {CW }

newer:
Season 2
{[What Lies Ahead]
S2, Ep1
}
Oct. 16, 2011
{What Lies Ahead}
The group heads for Fort Benning. But along the way, they run into a herd of zombies and Sophia runs off in fear. Now Rick and the others try to track her down.

{[Bloodletting]
S2, Ep2
}
Oct. 23, 2011
{Bloodletting}
After Carl gets shot, Rick and Shane find Dr. Hershel Greene and his family in a nearby house. But Carl has lost too much blood and Hershel doesn't have everything he needs. Could Carl die before the right equipment is found?

Then I get the episode title for each episode and create a command file "ren.bat" that I check and execute

to rename my files as for example

Lost.S01E01.Pilot.avi

etc

I didn't think the actual code was important, and it is part of a larger program, so it may not work on its own, but I include it here

without checking and with minimum editing (sorry for my messy programming)

$cmd = ""
            $cmd1 = ""
            $Eps = ""
            $text = ClipGet()
            $Path = IniRead($Inifile, $Section, "AviFolder", "H:_SeriesTorchwood")
            If StringInStr($text, "\") Then
                $tmp = StringReplace(StringLeft($text, StringInStr($text, "\", 0, -1)), """", "")
                If FileExists($tmp) Then $Path = $tmp
            EndIf
            $Path = InputBox("Enter", "Full Path", $Path)
            IniWrite($Inifile, $Section, "AviFolder", $Path)
            If FileExists($Path & "Episodes.txt") Then $Eps = FileRead($Path & "Episodes.txt")
            $patrn = StringSplit("*.S*E*.avi,*.?x??.*.avi,* ?x?? *.avi", ",")
            ;Analyze
            $Ext = ".avi"
            $avisearch = FileFindFirstFile($Path & "*" & $Ext)
            If $avisearch = -1 Then
                $Ext = ".mkv"
                $avisearch = FileFindFirstFile($Path & "*" & $Ext)
                If $avisearch = -1 Then
                    MsgBox(0, $Path & "*.avi", "No avis/mkvs found. Exiting...")
                    Exit
                EndIf
            EndIf
            $patrn = StringSplit("S([0-9]{1,2})E([0-9]{1,2})\.|s([0-9]{1,2})e([0-9]{1,2})\.|[-. \[]([0-9]{1,2})x([0-9]{1,2})[-. \]]|[. -]([0-9])([0-9][0-9])[. -]", "|")
            $subpatrn = StringSplit("*$S$E*.srt|*$Sx$E*.srt|*S$SE$E*.srt", "|")
            While 1
                $avifile = FileFindNextFile($avisearch)
                ;       MsgBox(0,$Path & "*.avi", $avifile)
                $Avitype = 0
                For $i = 1 To $patrn[0]
                    ;   MsgBox(0,$Path & "*.avi", $patrn[$i])
                    If StringRegExp($avifile, $patrn[$i], 0) Then
                        $Avitype = $i
                        ExitLoop
                    EndIf
                Next
                If $Avitype <> 0 Then
                    ExitLoop
                Else
                    MsgBox(0, $Path & "*.avi", "No matching avis found. Exiting...")
                    Exit
                EndIf
            WEnd
            Do
                $S = StringRegExp($avifile, StringRegExpReplace($patrn[$Avitype], "[()]", ""), 1)
                If @error <> 0 Then MsgBox(0, "Error", $avifile & " doesn't have the same format as the rest")
                $L = StringInStr($avifile, $S[0]) + StringLen($S[0]) - 1
                $S = StringRegExp($avifile, $patrn[$Avitype], 3)
                If StringLeft($S[0], 1) = "0" Then $S[0] = StringTrimLeft($S[0], 1)
                If StringLeft($S[1], 1) = "0" Then
                    $E = StringTrimLeft($S[1], 1)
                Else
                    $E = $S[1]
                EndIf
                If $Eps <> "" Then
; Old line               $Title = StringRegExp($Eps, 'Season ' & $S[0] & ', Episode ' & $E & ': {([^}]+)}', 3);new format :S2, Ep1
; New lines
                 $EpsPattern='\[(.*)\] \r\nS$s, Ep$e|Season $s, Episode $e: {([^}]+)}'
                 $EpsPattern=StringReplace(StringReplace($EpsPattern, "$s", $S[0]), "$e", $E) ; placeholders are quoted literals; $E has the leading zero removed
                 $Title = StringRegExp($Eps,$EpsPattern, 3)
                    If @error <> 0 Then MsgBox(0, "Error", 'Season ' & $S[0] & ', Episode ' & $E & " not in list")
                    $oldfile = $avifile
                    If $Avitype > 2 Then
                        $avifile = StringLeft($avifile, $L - 1) & " - " & StringRegExpReplace($Title[0], "[?:/]", "") & '.' & StringTrimLeft($avifile, $L)
                    Else
                        $avifile = StringLeft($avifile, $L) & StringRegExpReplace($Title[0], "[?:/]", "") & '.' & StringTrimLeft($avifile, $L)
                    EndIf
                    $cmd = $cmd & "ren """ & $oldfile & """ """ & $avifile & @CRLF
                EndIf
                $Found = 0
                For $i = 1 To $subpatrn[0]
                    $newpatrn = StringReplace(StringReplace($subpatrn[$i], "$S", $S[0]), "$E", $S[1])
                    $newpatrn = StringReplace($newpatrn, "S" & $S[0], "S0" & $S[0])
                    ;                           MsgBox(0,$avifile, $newpatrn)
                    $subsearch = FileFindFirstFile($Path & $newpatrn)
                    ;MsgBox(0,$avifile, $Path & $newpatrn)
                    If $subsearch <> -1 Then
                        While 1
                            $SubFile = FileFindNextFile($subsearch)

                            If @error Then ExitLoop
                            ;MsgBox(0,$newpatrn, StringTrimLeft (StringLeft ($newpatrn,StringInstr ( $newpatrn, "*", 0, 2)-1),1 ))
                            If StringInStr($SubFile, StringTrimLeft(StringLeft($newpatrn, StringInStr($newpatrn, "*", 0, 2) - 1), 1)) = 0 Then ExitLoop
                            ;MsgBox(0,$avifile, $newpatrn & @LF & $subfile)
                            $Found = 1
                            $Greek = StringRegExp(FileRead($Path & $SubFile), "[αβγΔεζηθικλμνξοπρστυφχψω]", 3)
                            If UBound($Greek) < 50 Then
                                $lng = "eng."
                            Else
                                $lng = ""
                            EndIf
                            $cmd = $cmd & "ren """ & $SubFile & """ """ & StringReplace($avifile, $Ext, "." & $lng & "srt") & """" & @CRLF
                            $cmd1 = $cmd1 & "ren """ & $avifile & """ """ & StringReplace($SubFile, "." & $lng & "srt", $Ext) & """" & @CRLF
                            ;           MsgBox(0," Process files", $cmd)
                        WEnd
                    EndIf
                    FileClose($subsearch)
                    ;   MsgBox(0,"$Found S1E01", $Found)
                Next
                If $Found = 0 Then $cmd = $cmd & "#" & $avifile & " has no corresponding Subtitle" & @CRLF
                $avifile = FileFindNextFile($avisearch)
            Until @error
            FileClose($avisearch)
            $cmdfile = $Path & "ren.bat"
            FileWrite($cmdfile, $cmd)
            FileWrite($Path & "revren.bat", $cmd1)
            Run("notepad.exe " & $cmdfile)

You are right about the "{": I omitted to escape them, but it seems that at this point in the regular expression it makes no difference

and the second example will not work if the regexp is more complicated

ie

$text="test123,try4a56"
$results = StringRegExp ($Text, "test(\d+)|try([\w\d]+)", 3)
MsgBox (0, "Results", _ArrayToString ($results))

You say the empty match comes from the first parenthesis that does not match anything

I thought that when regexps don't match anything they don't return anything, rather than returning nulls

Otherwise the results would be full of nulls

[Edit]

The more I think of it the more I am inclined to believe it is a bug

StringRegExp is not supposed to return empty strings because it is supposed to return matches.

And if something is a match it cannot be empty

I know it is not very important; it is nothing that cannot be corrected with a few additional lines of code

(ie a function that removes empty elements from arrays) but I work a lot with Regular expressions and

this seems to be a nuisance.

If there is no reasonable explanation why it should behave like this then I humbly believe that a member of the Development Team should give us his opinion

I also thought of a way to remove those empty strings in one-line-code as follows:

#include <Array.au3>
$text="test123,try4a56,test789,try98bcd7"
$results = StringRegExp ($Text, "test(\d+)|try([\w\d]+)", 3) ;results with empty elements
$ne_results = StringRegExp (_ArrayToString ($results), "\|?([^|]+)\|?", 3) ;non empty results
MsgBox (0, "Results", _ArrayToString ($results) & @CRLF & _ArrayToString ($ne_results))
Exit
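
The "function that removes empty elements from arrays" mentioned above could look roughly like the sketch below. `_DropEmptyElements` is a hypothetical helper name, not part of the standard UDFs, and the sketch assumes a 1D array such as the one StringRegExp returns with flag 3. Unlike the one-line trick, it does not depend on the "|" delimiter, so captured text that itself contains "|" is not split.

```autoit
#include <Array.au3>

; Hypothetical helper: returns a copy of a 1D array without its empty strings.
; Sets @error to 1 if the input is not a non-empty array, 2 if nothing is left.
Func _DropEmptyElements(ByRef $aIn)
    Local $iCount = UBound($aIn)
    If $iCount = 0 Then Return SetError(1, 0, 0)
    Local $aOut[$iCount], $iKept = 0
    For $i = 0 To $iCount - 1
        If StringLen($aIn[$i]) > 0 Then
            $aOut[$iKept] = $aIn[$i]
            $iKept += 1
        EndIf
    Next
    If $iKept = 0 Then Return SetError(2, 0, 0)
    ReDim $aOut[$iKept] ; shrink to the number of elements actually kept
    Return $aOut
EndFunc

Local $aResults = StringRegExp("test123,try4a56,test789,try98bcd7", "test(\d+)|try([\w\d]+)", 3)
Local $aClean = _DropEmptyElements($aResults)
If Not @error Then ConsoleWrite(_ArrayToString($aClean) & @CRLF)
```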
Edited by hawkair

GEOSoft

i only tested the last example but this works on it

#include <Array.au3>
$text="test123,try4a56,test789,try98bcd7"
$results = StringRegExp ($Text, "(?i)(?:test|try)(\w+)", 3)
_arraydisplay($results)
If you give us a few page links and the expected results then i can open the pages in the toolkit and test them for you. it looks like you may also need stringformat if you want the filenames written as i suspect. may i ask why you are using a batch file to do the rename?


George

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.

Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.***

The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.

Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. it is soon to be closed and replaced with something else.

"Old age and treachery will always overcome youth and skill!"

jchd

When there are capturing parentheses in (say) both sides of an A|B alternation, the nonmatching part and its capturing parentheses _do_ count.

Unless you specifically reset the parenthesis numbering, all capturing parentheses have their own fixed number (counted from the left) and will return an empty match whenever the pattern matches as a whole without them taking part. This is exactly what you experience and is not a bug in PCRE. All decent regexp engines I know of behave this way.

I warmly recommend you read more closely the PCRE documentation which is available at the PCRE site as part of the source package: ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/
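
As a minimal sketch of the fixed numbering described above, the second example from this thread can be printed element by element. The variable names are illustrative only. Group 1 belongs to the "test" branch and group 2 to the "try" branch, so when "try456" matches, group 1 is reported as an empty string ahead of group 2.

```autoit
; "test(\d+)" owns capture group 1, "try(\d+)" owns capture group 2.
Local $sText = "test123,try456"
Local $aResults = StringRegExp($sText, "test(\d+)|try(\d+)", 3)

; Print every returned element with its index so the empty slot is visible.
For $i = 0 To UBound($aResults) - 1
    ConsoleWrite($i & " => [" & $aResults[$i] & "]" & @CRLF)
Next
```

This corresponds to the "123||456" result reported earlier in the thread: the empty element is group 1 of the second match.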


hawkair

@jchd

First thank you so much for your time and effort to help

I am going to do some more research on the subject (although I hoped for some quick explanation)

I am creating a batch file so as to be able to do a final check and edit because file naming is not standard and its easy to mess up

example text was given on previous post but I repeat it here:

Season 2, Episode 1: {Peter, Peter, Caviar Eater}

Original Air Date-23 September 1999

Peter changes for the worse after he and Lois inherit a mansion in Newport.

Next US airings:

{Mon. Mar. 14} {10:30 PM} {TBS }

{Tue. Mar. 22} {6:30 PM} {CW }

------------------------------------

Season 2, Episode 2: {Holy Crap}

Original Air Date-30 September 1999

Peter's ultra-religious father causes trouble when he comes to live with the family.

Next US airings:

{Thur. Mar. 24} {6:30 PM} {CW }

newer:

Season 3

{[What Lies Ahead]

S3, Ep1

}

Oct. 16, 2011

{What Lies Ahead}

The group heads for Fort Benning. But along the way, they run into a herd of zombies and Sophia runs off in fear. Now Rick and the others try to track her down.

{[bloodletting]

S3, Ep2

}

Oct. 23, 2011

{Bloodletting}

After Carl gets shot, Rick and Shane find Dr. Hershel Greene and his family in a nearby house. But Carl has lost too much blood and Hershel doesn't have everything he needs. Could Carl die before the right equipment is found?

Given the previous text, a season number and an episode number, I would like

to get the episode title with a regular expression in order to make a filename like

Lost.S02E01.Peter, Peter, Caviar Eater.avi

Lost.S02E02.Holy Crap.avi

Lost.S03E01.What Lies Ahead.avi

Lost.S03E02.Bloodletting.avi

As you see the format of the episodes is entirely different and cannot be in the form

(?:a.*|b.*)([\w\d]*)

that is have only one capturing parenthesis

And I have encountered many occasions in the past where this is the case

I am an absolute beginner in Java but I will try to setup this example and see how regexp works there..

Edit

I found the solution exactly where you pointed

I use this (?|test(\d+)|try([\w\d]+)) which tells the engine that the capturing parentheses in each branch share the same capture number

(normally matching parentheses get assigned a number 1 2 etc used eg in the replacement string and returning an empty match is a way to say that the 2nd part of the (|) matched and not the first)

so the example becomes

#include <Array.au3>
$text="test123,try4a56,test789,try98bcd7"
$results = StringRegExp ($Text, "(?|test(\d+)|try([\w\d]+))", 3)
MsgBox (0, "Results", _ArrayToString ($results))
Exit

And there is a TON of info there on regular expressions!!

Thank you, thank you, thank you!!!
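
For completeness, here is a sketch of the same branch-reset idea applied to the pattern from the first post. The sample text is the one quoted there, joined with @CRLF as an assumption about the line endings, and the braces are escaped as suggested earlier in the thread. The variable names are illustrative.

```autoit
#include <Array.au3>

; Sample subject from the first post; CRLF line endings are assumed.
Local $sEps = "Season 2, Episode 1: {Peter, Peter, Caviar Eater}" & @CRLF & _
        "[What Lies Ahead]" & @CRLF & _
        "S2, Ep1"

; (?| ... ) makes both branches write their title into capture group 1,
; so no empty placeholder is returned for the branch that did not match.
Local $sPattern = "(?|\[([^\]]+)\]\r\nS2, Ep1|Season 2, Episode 1: \{([^}]+)\})"
Local $aTitles = StringRegExp($sEps, $sPattern, 3)
If Not @error Then ConsoleWrite(_ArrayToString($aTitles, @CRLF) & @CRLF)
```

With this subject the array should hold just the two titles, "Peter, Peter, Caviar Eater" and "What Lies Ahead", in that order.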

Edited by hawkair

jchd

(?| is a nice way to have all alternation branches use the same capture number(s) but you may also have to use \K, or switch to more advanced features in some more chaotic contexts.

Here is another link for a good tutorial about regexps you will benefit from: http://www.asiteaboutnothing.net/regexp/ but keep in mind that this last one isn't dedicated to PCRE (the flavor used by AutoIt) and tries to cover other common regexp engines.
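
A minimal sketch of the \K alternative, assuming the PCRE build used by your AutoIt version supports it: \K drops everything matched so far from the reported match, so no capturing group is needed at all, and with flag 3 the array holds the (shortened) whole matches. The variable names are illustrative.

```autoit
#include <Array.au3>

Local $sText = "test123,try4a56,test789,try98bcd7"

; No capturing groups: each branch discards its literal prefix with \K,
; leaving only the digits or word characters as the reported match.
Local $aResults = StringRegExp($sText, "test\K\d+|try\K\w+", 3)
If Not @error Then ConsoleWrite(_ArrayToString($aResults) & @CRLF)
```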

Edited by jchd

hawkair

Hi guys

I hope I will be excused for resurrecting an old post.

But there was something eating me

The code I posted before, for renaming episode files, was in fact more than ten years old. I wrote it, used it, then forgot it.

Recently I tried to use it again and it didn't work as expected. I thought it was something that could be fixed with regular expressions so I tried.

And I was too lazy to improve it, that is more lazy than I felt shame :oops: so I posted it unchanged.

And when I read it trying to remember what I had done I realized that it won't be of much use to anyone here

But there were some other things that had to be corrected in the code: new movie formats, new rip groups.

So armed with the tricks, tactics and practices I have picked up along the years and my knowledge of regexps I rewrote the code

Now it is slim, elegant, readable, presentable, extendable, has increased error checking and even has ...comments(!).

It is not complete, has only been tested on one set of files, so there may be some mistakes in the code but

I believe it can be easily made to handle new sets just by updating the variables $Keywords and $ReplacementPairs

I am not a regular coder but I am almost proud of the result.

It is a true hymn to laziness.

I would happily accept comments, suggestions or improvements

The reasons for the existence of this program are:

1. movie players require the subtitle file to have the same name as the movie

2. episodes with different naming formats appear unsorted, ie the 11th may come before the 1st etc

3. it's nice to see the title of the episode you are viewing

The following code takes a directory of files like this

The Walking Dead S02E11 PROPER 720p HDTV x264-IMMERSE [R3b3loi Team].srt

The.Walking.Dead - 2x01 - What.Lies.Ahead.720p.WEB-DL.AAC2.0.H.264-CtrlHD.mkv

The.Walking.Dead - 2x02 - Bloodletting.720p.WEB-DL.AAC2.0.H.264-CtrlHD.mkv

The.Walking.Dead - 2x03 - Save.the.Last.One.720p.WEB-DL.AAC2.0.H.264-CtrlHD.mkv

The.Walking.Dead - 2x04 - Cherokee.Rose.720p.WEB-DL.AAC2.0.H.264-CtrlHD.mkv

The.Walking.Dead - 2x05 - Chupacabra.720p.WEB-DL.AAC2.0.H.264-CtrlHD.mkv

The.Walking.Dead - 2x07 - Pretty.Much.Dead.Already.720p.WEB-DL.AAC2.0.H.264-CtrlHD.mkv

The.Walking.Dead.S02E01.720p.HDTV.x264-IMMERSE(720P).mkv

The.Walking.Dead.S02E01.720p.HDTV.x264-IMMERSE(720P).srt

the.walking.dead.s02e06.720p.hdtv.x264-orenji(2).mkv

the.walking.dead.s02e08.720p.hdtv.x264-orenji.mkv

The.Walking.Dead.S02E09.720p.HDTV.x264-IMMERSE(720p).mkv

The.Walking.Dead.S02E09.HDTV.XviD-ASAP.srt

The.Walking.Dead.S02E10.720p.HDTV.x264-IMMERSE.mkv

The.Walking.Dead.S02E11.PROPER.720p.HDTV.x264-IMMERSE.mkv

Walking Dead (The) 2x02 - Bloodletting.srt

Walking Dead (The) 2x03 - Save the Last One.srt

Walking Dead (The) 2x04 - Cherokee Rose.srt

Walking Dead (The) 2x05 - Chupacabra.srt

Walking Dead (The) 2x06 - Secrets.srt

Walking Dead (The) 2x07 - Pretty Much Dead Already.srt

Walking Dead (The) 2x08 - Nebraska.srt

Walking Dead (The) 2x09 - Triggerfinger.srt

Walking Dead (The) 2x10 - 18 Miles Out.srt

Walking Dead (The) 2x11 - Judge, Jury, Executioner.srt

and creates a batch file like this

ren "The.Walking.Dead - 2x01 - What.Lies.Ahead.720p.WEB-DL.AAC2.0.H.264-CtrlHD.mkv" "The.Walking.Dead.S02E01.What Lies Ahead.720p.WEB-DL.AAC2.0.H.264-CtrlHD.mkv"

ren "The.Walking.Dead.S02E01.720p.HDTV.x264-IMMERSE(720P).srt" "The.Walking.Dead.S02E01.What Lies Ahead.720p.WEB-DL.AAC2.0.H.264-CtrlHD.srt"

ren "The.Walking.Dead - 2x02 - Bloodletting.720p.WEB-DL.AAC2.0.H.264-CtrlHD.mkv" "The.Walking.Dead.S02E02.Bloodletting.720p.WEB-DL.AAC2.0.H.264-CtrlHD.mkv"

ren "Walking Dead (The) 2x02 - Bloodletting.srt" "The.Walking.Dead.S02E02.Bloodletting.720p.WEB-DL.AAC2.0.H.264-CtrlHD.srt"

ren "The.Walking.Dead - 2x03 - Save.the.Last.One.720p.WEB-DL.AAC2.0.H.264-CtrlHD.mkv" "The.Walking.Dead.S02E03.Save the Last One.720p.WEB-DL.AAC2.0.H.264-CtrlHD.mkv"

ren "Walking Dead (The) 2x03 - Save the Last One.srt" "The.Walking.Dead.S02E03.Save the Last One.720p.WEB-DL.AAC2.0.H.264-CtrlHD.srt"

ren "The.Walking.Dead - 2x04 - Cherokee.Rose.720p.WEB-DL.AAC2.0.H.264-CtrlHD.mkv" "The.Walking.Dead.S02E04.Cherokee Rose.720p.WEB-DL.AAC2.0.H.264-CtrlHD.mkv"

ren "Walking Dead (The) 2x04 - Cherokee Rose.srt" "The.Walking.Dead.S02E04.Cherokee Rose.720p.WEB-DL.AAC2.0.H.264-CtrlHD.srt"

ren "The.Walking.Dead - 2x05 - Chupacabra.720p.WEB-DL.AAC2.0.H.264-CtrlHD.mkv" "The.Walking.Dead.S02E05.Chupacabra.720p.WEB-DL.AAC2.0.H.264-CtrlHD.mkv"

ren "Walking Dead (The) 2x05 - Chupacabra.srt" "The.Walking.Dead.S02E05.Chupacabra.720p.WEB-DL.AAC2.0.H.264-CtrlHD.srt"

ren "The.Walking.Dead - 2x07 - Pretty.Much.Dead.Already.720p.WEB-DL.AAC2.0.H.264-CtrlHD.mkv" "The.Walking.Dead.S02E07.Pretty Much Dead Already.720p.WEB-DL.AAC2.0.H.264-CtrlHD.mkv"

ren "Walking Dead (The) 2x07 - Pretty Much Dead Already.srt" "The.Walking.Dead.S02E07.Pretty Much Dead Already.720p.WEB-DL.AAC2.0.H.264-CtrlHD.srt"

# Duplicate Episode file: The.Walking.Dead.S02E01.720p.HDTV.x264-IMMERSE(720P).mkv

ren "the.walking.dead.s02e06.720p.hdtv.x264-orenji(2).mkv" "The.Walking.Dead.S02E06.Secrets.720p.HDTV.x264-Orenji.mkv"

ren "Walking Dead (The) 2x06 - Secrets.srt" "The.Walking.Dead.S02E06.Secrets.720p.HDTV.x264-Orenji.srt"

ren "the.walking.dead.s02e08.720p.hdtv.x264-orenji.mkv" "The.Walking.Dead.S02E08.Nebraska.720p.HDTV.x264-Orenji.mkv"

ren "Walking Dead (The) 2x08 - Nebraska.srt" "The.Walking.Dead.S02E08.Nebraska.720p.HDTV.x264-Orenji.srt"

ren "The.Walking.Dead.S02E09.720p.HDTV.x264-IMMERSE(720p).mkv" "The.Walking.Dead.S02E09.Triggerfinger.720p.HDTV.x264-IMMERSE.mkv"

ren "Walking Dead (The) 2x09 - Triggerfinger.srt" "The.Walking.Dead.S02E09.Triggerfinger.720p.HDTV.x264-IMMERSE.srt"

# Duplicate Subtitle file: The.Walking.Dead.S02E09.HDTV.XviD-ASAP.srt

ren "The.Walking.Dead.S02E10.720p.HDTV.x264-IMMERSE.mkv" "The.Walking.Dead.S02E10.18 Miles Out.720p.HDTV.x264-IMMERSE.mkv"

ren "Walking Dead (The) 2x10 - 18 Miles Out.srt" "The.Walking.Dead.S02E10.18 Miles Out.720p.HDTV.x264-IMMERSE.srt"

ren "The.Walking.Dead.S02E11.PROPER.720p.HDTV.x264-IMMERSE.mkv" "The.Walking.Dead.S02E11.Judge, Jury, Executioner.PROPER.720p.HDTV.x264-IMMERSE.mkv"

ren "Walking Dead (The) 2x11 - Judge, Jury, Executioner.srt" "The.Walking.Dead.S02E11.Judge, Jury, Executioner.PROPER.720p.HDTV.x264-IMMERSE.srt"

# Duplicate Subtitle file: The Walking Dead S02E11 PROPER 720p HDTV x264-IMMERSE [R3b3loi Team].srt

The Files after renaming will become:

"The.Walking.Dead.S02E01.What Lies Ahead.720p.WEB-DL.AAC2.0.H.264-CtrlHD.mkv"

"The.Walking.Dead.S02E01.What Lies Ahead.720p.WEB-DL.AAC2.0.H.264-CtrlHD.srt"

"The.Walking.Dead.S02E02.Bloodletting.720p.WEB-DL.AAC2.0.H.264-CtrlHD.mkv"

"The.Walking.Dead.S02E02.Bloodletting.720p.WEB-DL.AAC2.0.H.264-CtrlHD.srt"

"The.Walking.Dead.S02E03.Save the Last One.720p.WEB-DL.AAC2.0.H.264-CtrlHD.mkv"

"The.Walking.Dead.S02E03.Save the Last One.720p.WEB-DL.AAC2.0.H.264-CtrlHD.srt"

"The.Walking.Dead.S02E04.Cherokee Rose.720p.WEB-DL.AAC2.0.H.264-CtrlHD.mkv"

"The.Walking.Dead.S02E04.Cherokee Rose.720p.WEB-DL.AAC2.0.H.264-CtrlHD.srt"

"The.Walking.Dead.S02E05.Chupacabra.720p.WEB-DL.AAC2.0.H.264-CtrlHD.mkv"

"The.Walking.Dead.S02E05.Chupacabra.720p.WEB-DL.AAC2.0.H.264-CtrlHD.srt"

"The.Walking.Dead.S02E06.Secrets.720p.HDTV.x264-Orenji.mkv"

"The.Walking.Dead.S02E06.Secrets.720p.HDTV.x264-Orenji.srt"

"The.Walking.Dead.S02E07.Pretty Much Dead Already.720p.WEB-DL.AAC2.0.H.264-CtrlHD.mkv"

"The.Walking.Dead.S02E07.Pretty Much Dead Already.720p.WEB-DL.AAC2.0.H.264-CtrlHD.srt"

"The.Walking.Dead.S02E08.Nebraska.720p.HDTV.x264-Orenji.mkv"

"The.Walking.Dead.S02E08.Nebraska.720p.HDTV.x264-Orenji.srt"

"The.Walking.Dead.S02E09.Triggerfinger.720p.HDTV.x264-IMMERSE.mkv"

"The.Walking.Dead.S02E09.Triggerfinger.720p.HDTV.x264-IMMERSE.srt"

"The.Walking.Dead.S02E10.18 Miles Out.720p.HDTV.x264-IMMERSE.mkv"

"The.Walking.Dead.S02E10.18 Miles Out.720p.HDTV.x264-IMMERSE.srt"

"The.Walking.Dead.S02E11.Judge, Jury, Executioner.PROPER.720p.HDTV.x264-IMMERSE.mkv"

"The.Walking.Dead.S02E11.Judge, Jury, Executioner.PROPER.720p.HDTV.x264-IMMERSE.srt"
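
The step that turns "2x01" or "s02e06" in the names above into a uniform "SxxEyy" tag can be sketched on its own as follows. This is a simplified illustration, not the program itself: `$aNames` and `$sPattern` are illustrative names, the pattern is a reduced variant of the episode-format pattern, and StringFormat is used for the zero padding instead of regular expressions.

```autoit
; Three of the file names from the listing above.
Local $aNames[3] = [ _
        "The.Walking.Dead - 2x01 - What.Lies.Ahead.720p.WEB-DL.AAC2.0.H.264-CtrlHD.mkv", _
        "the.walking.dead.s02e06.720p.hdtv.x264-orenji(2).mkv", _
        "Walking Dead (The) 2x11 - Judge, Jury, Executioner.srt"]

; Branch reset: both naming styles deliver season in group 1 and episode in group 2.
Local $sPattern = "(?i)[-. \[](?|(\d{1,2})x(\d\d)|S(\d{1,2})E(\d{1,2}))[-. \]]"

For $i = 0 To UBound($aNames) - 1
    ; Flag 1 returns the capture groups of the first match only.
    Local $aSE = StringRegExp($aNames[$i], $sPattern, 1)
    If @error Then ContinueLoop ; name does not contain a recognised episode tag
    ; Number() drops leading zeros, %02d pads back to two digits.
    ConsoleWrite(StringFormat("S%02dE%02d", Number($aSE[0]), Number($aSE[1])) & @CRLF)
Next
```

For these three names the loop should print S02E01, S02E06 and S02E11.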

Here is the program

#include <IE.au3>
AutoItSetOption("WinTitleMatchMode", 2)
$Inifile = @ScriptDir & "\settings.ini"
$Section = @ComputerName ;so I can have different settings on different PCs
$cmd = ""
$cmdundo = ""
$EpsDone = "";Keeps the episodes "SxxEyy" already processed to check for duplicates (with slightly different names)
;~ $EpsFormats = "(?i)[-. \[](?|(\d)x(\d\d)|S(\d{1,2})E(\d{1,2})|(\d)(\d\d))[-. \]]"
;~ Episodes may appear in these formats name.1x01, name-1x01,  name [1x01], name.S01E01, name.s1e1, name.s1e01, name.101
$EpsFormats = '(?i)[-. \[](?|(\d)x(\d\d)|S(\d{1,2})E(\d{1,2}))[-. \]]'
$EpsList = ""
;~ The next pattern extracts the Episodes List from IMDB page, removing ads, irrelevant info etc
$IMDbEpsListpatn = "(?s).*?Your Watchlist . (.*?) ad feedback.*"
;~ The next pattern finds the episode SxxEyy from Episode List and returns its title
;~ The second part of "|" is for Episodes.txt I have already downloaded before IMDB changed the format
$EpsPattern ='(?|S$s, Ep$e.*?, \d\d\d\d (.*?) *\R|Season $s, Episode $e: \{([^}]+)\})'
;~ The Url pointing at the IMDB Episodes page
$IMDbEpsPage = "http://www.imdb.com/title/tt1520211/episodes?season=2"
;~ Valid keywords that may appear in the name or words that contain a keyword and must be removed(handled with $ReplacementPairs)
$Keywords = "(?i)(PROPER|REPACK|720p|\(720P\)|HDTV|x264|WEB-DL.AAC2.0.H.264|IMMERSE|CtrlHD|Orenji)"
$MovieExtensions = "avi,mkv"
;~ A list of replacements for keywords - the left part is replaced by the right part,leave the right part empty for deletion
;~ The last comma is required as it marks the end of a field
$ReplacementPairs = "orenji=Orenji,hdtv=HDTV,(720P)=,(720p)=" & ","
$SeriesName = "The.Walking.Dead"
;~ Subtitles may appear in these formats name.101, name-1x01,  name.S01E01, Update this as needed:
;~ $S1/$E1 stand for Season/Episode number with length = 1(ie 1,2,3)
;~ $S2/$E2 stand for Season/Episode number with length = 2(ie 01,02,03)
$subpatrn = StringSplit("*$S1$E2*.srt|*$S1x$E2*.srt|*S$S2E$E2*.srt", "|")
$TotalCmd = "Total Commander"
$visible = false ;if you want to see IE getting the Episode list set to true

;~ If you have Total commander then go to the desired directory with the movies and it will copy the path to the clipboard
If WinExists ($TotalCmd) Then
    WinActivate ($TotalCmd)
    WinWaitActive ($TotalCmd)
    Send ("{ESC}^+{ENTER}{RIGHT}^c") ;get the path from Total Commander to clipboard
EndIf

$Path = IniRead($Inifile, $Section, "MovieFolder", "");Check ini file if a previous pathname is stored
$text = ClipGet()                                    ;Check clipboard if it contains a valid pathname, override ini
$tmp = StringRegExp ($text, '"?(.*\\).*"?', 3)
If @error = 0 Then
    If FileExists ($tmp [0]) Then $Path = $tmp[0]
EndIf
$Path = InputBox("Enter", "Movies directory:", $Path);if the above found a path then suggest it else ask for input
If @error = 1 Then Exit 1 ; Cancel was pressed
If Not FileExists ($Path) Then
    MsgBox (0, "Error", $Path & " is not a valid path")
    Exit 2
EndIf
IniWrite($Inifile, $Section, "MovieFolder", $Path)    ;Update ini file with pathname

;~ If the Episodes file does not exist open IE and download it from Imdb. Variable $IMDbEpsPage must be set correctly
If Not FileExists($Path & "Episodes.txt") Then
    $oIE = _IECreate($IMDbEpsPage, 0, $visible)
    $EpsList = StringRegExpReplace (_IEBodyReadText($oIE), $IMDbEpsListpatn, "$1")
    _IEQuit($oIE)
    FileWrite($Path & "Episodes.txt", $EpsList)
Else
    $EpsList = FileRead($Path & "Episodes.txt")
EndIf

$MovieExtension = StringSplit ($MovieExtensions, ",", 2)
$Ext = ""
For $i = 0 To UBound ($MovieExtension) - 1
    $Moviesearch = FileFindFirstFile($Path & "*." & $MovieExtension[$i])
    If $Moviesearch = -1 Then ContinueLoop
    $Ext = $MovieExtension[$i]
    ExitLoop
Next
If $Ext = "" Then
    MsgBox(0, $Path , "No known movies found. Exiting...")
    Exit
EndIf
While 1
    $Moviefile = FileFindNextFile($Moviesearch)
    If @error Then ExitLoop
    $tmp = StringRegExp ($Moviefile, $EpsFormats, 2)
    If @error = 0 Then
        $SE = $tmp[0]
        $S = $tmp[1]
;~         The following could be achieved with if-then-elses, I just like to play with regexps and it is shorter
        $S1 = StringRegExpReplace ($S, "0*(\d+)", "$1"); S1 has leading zeros removed (1,2,...,10)
        $S2 = StringRegExpReplace ("0" & $S, "\d*(\d{2})$", "$1"); S2 has 2 digits, zero padded if needed (01,02,...,11,12)
        $E = $tmp[2]
        $E1 = StringRegExpReplace ($E, "0*(\d+)", "$1"); E1 has leading zeros removed (1,2,...,10)
        $E2 = StringRegExpReplace ("0" & $E, "\d*(\d{2})$", "$1"); E2 has 2 digits, zero padded if needed (01,02,...,11,12)
        $S2E2 = ".S" & $S2 & "E" & $E2
;~         Check if this is a duplicate file for an episode already done. Make a comment in the batch file
        If StringInStr ($EpsDone, $S2E2) > 0 Then
            $cmd = $cmd & "REM Duplicate Episode file: " & $Moviefile & @CRLF
            ContinueLoop
        EndIf
        $EpsDone = $EpsDone & $S2E2
    Else
        MsgBox (0, "Error", "File name is not standard format: " & $Moviefile)
        Exit
    EndIf
    $EpsTitle = ""
    If $EpsList <> "" Then
        $tmp = StringRegExp ($EpsList, StringReplace (StringReplace ($EpsPattern, "$s", $S1), "$e", $E1), 3)
        If @error = 0 Then
            $EpsTitle = "." & StringRegExpReplace($tmp[0], "[?:/]", "") ;Remove chars not proper for filenames
        Else
            MsgBox (0, $Moviefile, "Title for episode not found: " & $SE)
        EndIf
    EndIf

    $EpsInfo = ""
    $tmp = StringRegExp ($Moviefile, $SE & "(.*)", 3)
    If @error = 0 Then $EpsInfo = $tmp[0]
    $tmp = ""    ;Collects the renamed keyword parts, each prefixed with "."
    If $EpsInfo <> "" Then
        $Part = StringRegExp ($EpsInfo, $Keywords, 3)
        For $i = 0 To UBound ($Part) - 1
;~             Escape special characters for regexps that may be present in the substitution pair,
;~             then get the text on the right hand side of the equals sign
            $Replacement = StringRegExp ($ReplacementPairs, StringRegExpReplace ($Part[$i], "([()\[\]|.])", "\\$0") & "=([^,]*),", 3)
            If @error = 0 Then $Part[$i] = $Replacement[0]
;~             For parts that must be deleted just enter an empty replacement
            If $Part[$i] <> "" Then $tmp = $tmp & "." & $Part[$i]
        Next
    EndIf
;~     The last (nonempty) part is usually the ripper group and is preceded by "-"
    $NewMovieName = $SeriesName & $S2E2 & $EpsTitle & StringRegExpReplace ($tmp, "\.([^.]+)$", "-$1") & "." & $Ext
    $cmd = $cmd & "ren """ & $Moviefile & """ """ & $NewMovieName & """" & @CRLF
    $cmdundo = $cmdundo & "ren """ & $NewMovieName & """ """ & $Moviefile & """" & @CRLF
    $Found = 0
    For $i = 1 To $subpatrn[0]
        $newpatrn = StringReplace(StringReplace(StringReplace(StringReplace($subpatrn[$i], "$S1", $S1), "$E1", $E1), "$S2", $S2), "$E2", $E2)
        $subsearch = FileFindFirstFile($Path & $newpatrn)
        If $subsearch <> -1 Then
            While 1
                $SubFile = FileFindNextFile($subsearch)
                If @error Then ExitLoop
                $Greek = StringRegExp(FileRead($Path & $SubFile), "[αβγδεζηθικλμνξοπρστυφχψω]", 3)
                If UBound($Greek) > 50 Then      ;If there are many greek chars in subtitle then it is in greek
                    $lng = ""                    ;my default is greek so no label
                    $lngBit = 1
                Else
                    $lng = "eng."                ;declare this subtitle as english
                    $lngBit = 2
                EndIf
                $NewSubtitleName = StringReplace($NewMovieName, $Ext, $lng & "srt", -1)    ;-1 = replace only the last occurrence, i.e. the extension
                If FileExists ($Path & $NewSubtitleName) Then $Found = BitOR ($Found , $lngBit); if a duplicate subtitle already exists with the correct name then do nothing
                If BitAND ($Found , $lngBit) = 0 Then
                    $cmd = $cmd & "ren """ & $SubFile & """ """ & $NewSubtitleName & """" & @CRLF
                    $cmdundo = $cmdundo & "ren """ & $NewSubtitleName & """ """ & $SubFile & """" & @CRLF
                    $Found = BitOR ($Found , $lngBit)
                Else
;~                     This is a duplicate file for an episode subtitle already done. Make a comment in the batch file
                    $cmd = $cmd & "REM Duplicate Subtitle file: " & $SubFile & @CRLF
                EndIf
            WEnd
            FileClose($subsearch)
        EndIf
    Next
    If $Found = 0 Then $cmd = $cmd & "REM " & $Moviefile & " has no corresponding Subtitle" & @CRLF
WEnd
FileClose($Moviesearch)
$cmdfile = $Path & "ren.bat"
$undofile = $Path & "revren.bat"
FileDelete ($cmdfile)
FileWrite($cmdfile, $cmd)
FileDelete ($undofile)
FileWrite($undofile, $cmdundo)
Run("notepad.exe """ & $cmdfile & """")
;~ check results, select which duplicates to remove if necessary and then run cmdfile
;~ or update the $Keywords, $ReplacementPairs or other variables as needed
Exit

Disclaimer: Even though no changes are made to the movie or subtitle names unless you run the batch file, you should back up your files before you decide to use or test this. I take no responsibility for any undesirable events resulting from the use of, or the inability to use, this program.
