# Regular Expressions

Hi everyone,

Does anybody know how I can extract only the first filename from the text below?

bogus test C:\test\foo\bar\example.txt. random text C:\test\foo\bar\nomatch.txt.

(Where C:\test is fixed and already available from a variable.)

I tried the following expression with StringRegExp, but it doesn't work:

([A-Za-z]:\\.+)\\(.+)\\(.+[.][A-Za-z0-9]{3})([. ]|[ ])

Eventually, I would like to have the relative path (e.g. "foo\bar\") and the filename (e.g. "example.txt") stored in an array (the return value from StringRegExp, of course).

I've been working on this almost all day, so any help would be appreciated!

Thanks in advance!!

JJ

p.s. I couldn't find it in the helpfile, but isn't there a way to skip some groups from capturing? I mean that with this example:

(a)|(?:c)

the second group wouldn't be in the array returned from StringRegExp?

##### Share on other sites

I hope this helps you out.

#include <Array.au3>

$a = "C:\test\foo\bar\nomatch.txt"
$b = StringReplace($a, "nomatch.txt", "")
$c = StringReplace($b, "C:\test\", "")
MsgBox(0, $a, "b = " & $b & @CR & "c = " & $c)
$d = StringSplit($c, "\")
_ArrayDisplay($d, "")

##### Share on other sites

Hi! Thanks for the reply!

I don't know what the second filename will be. It may or may not be there. The only thing I know is this:

$var = "C:\test"
$msg = "This can be anythingThis can be anything C:\test\every\possible\sub\example.txt There COULD be some text here... but not nessecary and there could be another filename here like C:\test\every\possible\sub2\anothersub\foobar.log."

The first filename in the string may or may not include a trailing dot (e.g. bla.txt.). From the first file given I'd like the relative path (e.g. every\possible\sub\) and the filename (e.g. example.txt).

So I'm looking for a regular expression that helps me out... But I can't get it to work (for the expression, see my first post)...

EDIT:

I tested it again with the following example:

#Include <Array.au3>

$exp = "([A-Za-z]:\\.+)\\(.+\\)(.+[.][A-Za-z0-9]{3})"
$string = "test bla C:\foo\bar\test.txt bla test C:\test\bla\foobar.log"

$data = StringRegExp($string, $exp, 1)
If IsArray($data) Then
    _ArrayDisplay($data, "Results")
Else
    MsgBox(64, "Info", "No match found")
EndIf

This gives back an array with the following values:

[0] = C:\foo\bar\test.txt bla test C:\test
[1] = bla\
[2] = foobar.log

But I want it to be:

[0] = C:\foo\bar\test.txt
[1] = bar\
[2] = test.txt

I hope my explanation is not too complex.

Thanks in advance!

Edited by j_stam_84

##### Share on other sites

Can the file names include spaces? e.g. c:\long file\name.txt ?

If not you could use:

([A-Za-z]:\\[^\s]+)

...to capture the first file name.

Edited by DaveF

Yes yes yes, there it was. Youth must go, ah yes. But youth is only being in a way like it might be an animal. No, it is not just being an animal so much as being like one of these malenky toys you viddy being sold in the streets, like little chellovecks made out of tin and with a spring inside and then a winding handle on the outside and you wind it up grrr grrr grrr and off it itties, like walking, O my brothers. But it itties in a straight line and bangs straight into things bang bang and it cannot help what it is doing. Being young is like being like one of these malenky machines.

##### Share on other sites

I really don't know how you can possibly do this reliably with the rules you have to work with. If the text in the middle really is random, it creates an unsolvable condition. I think you need another constraint of some sort, or you need to be satisfied with something that works most of the time and fix problems by inspection (or do a FileExists test on the result and either flag problems or invoke alternate logic until you get a match).

Dale

Free Internet Tools: DebugBar, AutoIt IE Builder, HTTP UDF, MODIV2, IE Developer Toolbar, IEDocMon, Fiddler, HTML Validator, WGet, curl
Automate input type=file (Related)
SciTe Debug mode - it's magic: #AutoIt3Wrapper_run_debug_mode=Y
Doesn't work needs to be ripped out of the troubleshooting lexicon. It means that what you tried did not produce the results you expected. It begs the questions 1) what did you try?, 2) what did you expect? and 3) what happened instead?
Reproducer: a small (the smallest?) piece of stand-alone code that demonstrates your trouble

##### Share on other sites

@Dale

Hi! I think that, since there would be more than one character after the first filename (a dot AND a space, or just a space AND the first character of the next word), it is possible. After the last filename, the only character left in the string is a dot... Besides that, in my situation there will never be a folder that ends with a dot and three characters.

So why wouldn't it be possible :S

Thanks for replying btw

grtz!

##### Share on other sites

My point is that since "C:\test\test.txt" and "C:\test\test.txt foobar.exe" are both valid filenames (however impractical or improbable the second one is), unless you can devise another constraint or you make some assumptions, you can't create a reliable solution.

##### Share on other sites

@Dale

Aha, I see what you mean (and now just hoping I understood you correctly :"> )

There will never be a situation where the two filenames are directly behind each other. So not:

"C:\test\test.txt foobar.exe"

It would be:

"C:\test\test.txt. SOME TEXT foobar.exe." <-- with trailing dot and a dot after .txt

Or

"C:\test\test.txt SOME TEXT foobar.exe." <-- with trailing dot and without a dot after .txt

Or

"C:\test\test.txt."

grtz,

JJ

p.s. I'm sorry if this wasn't what you meant

##### Share on other sites

Actually, I wasn't worried about the second file name complicating the issue, but rather what could be in the "random text" section. If it is truly random, it could start with some text that could potentially be added to the filename you are interested in without making it an invalid file name.

Suggestion: check out RegexBuddy at http://www.regexbuddy.com

It is a $29 piece of software with a 30 day trial... lets you easily play with RegEx and see the results...

Dale

##### Share on other sites

Matching only the first instance of something happens in your script, not in the expression. I leave translating this into AutoIt's homemade implementation as an exercise for the reader.

[a-zA-z]{1}[:\\][\d+|\w+|\s+\\]+[\d+|\w+|\s+]\.\w{3}

Matches: C:\te43 st\fc oo\bar\ex ample.txt. <-- but not the period

Doesn't match: Anything it shouldn't as far as I could tell

Edited by Alterego

##### Share on other sites

The conversion to the AutoIt Regexp would be:

$pattern = "([a-zA-Z]:[\\a-zA-Z 0-9]+?\.\w{1,3})"
$array = StringRegExp($test, $pattern, 1)
If @Extended Then
    ; $array[0] contains the matching text. Do what you will
EndIf

Give that a shot. The {1,3} matches 1 to 3 characters after the dot, not just 3.
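For anyone who wants to try the idea outside AutoIt, here is a minimal sketch in Python of the lazy-quantifier approach, extended to return the relative path and the filename separately. The helper name `first_file` is made up for illustration. It assumes path segments contain only letters, digits and spaces, and that the extension is 1 to 3 word characters, which are the same assumptions the pattern above makes.

```python
import re

def first_file(msg, base):
    """Hypothetical helper: (relative_dir, filename) of the first path under `base`, or None."""
    # Group 1: zero or more "segment\" pieces, matched lazily.
    # Group 2: the file name, ending in a dot plus 1 to 3 word characters.
    pattern = (re.escape(base)
               + r"\\((?:[A-Za-z0-9 ]+\\)*?)([A-Za-z0-9 ]+?\.\w{1,3})")
    m = re.search(pattern, msg)  # re.search stops at the first match
    if m is None:
        return None
    return m.group(1), m.group(2)

msg = r"bogus test C:\test\foo\bar\example.txt. random text C:\test\foo\bar\nomatch.txt."
result = first_file(msg, r"C:\test")
if result is not None:
    rel_dir, name = result
    print(rel_dir, name)
```

As Dale pointed out, a file name that legitimately contains spaces followed by more text can still fool this, so treat it as something that works for the inputs described in this thread rather than a general solution.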

David Nuttall
Nuttall Computer Consulting

An Aquarius born during the Age of Aquarius

AutoIt allows me to re-invent the wheel so much faster.

I'm off to write a wizard, a wonderful wizard of odd...

##### Share on other sites

@Nutster & AlterEgo

Thanks!!! This was EXACTLY what I needed!!! Although I came to realize while analyzing your expression, how STUPID I am... I couldn't find how to find the smallest match instead of the largest (still didn't see it when I read the help for StringRegExp some 15 times) 'cause I figured that was the thing that was wrong with my expression.

Then I started wondering why you had a question mark after the plus sign... so I opened the helpfile... and guess what I found out..........

? (after a repeating character) Find the smallest match instead of the largest.
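To make the difference concrete, here is a small Python sketch run on the test string from earlier in the thread. It is only an illustration of greedy versus lazy matching in a PCRE-style engine; the variable names are mine.

```python
import re

text = r"test bla C:\foo\bar\test.txt bla test C:\test\bla\foobar.log"

# Greedy: ".+" takes as much as it can, so the match runs to the last extension.
greedy = re.search(r"[A-Za-z]:\\.+\.\w{3}", text).group(0)

# Lazy: ".+?" stops at the first position where the rest of the pattern fits.
lazy = re.search(r"[A-Za-z]:\\.+?\.\w{3}", text).group(0)

print("greedy:", greedy)
print("lazy:  ", lazy)
```

The greedy version swallows both paths and everything between them; the lazy version ends at the first extension.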

Thanks a million!!!!

Edited by j_stam_84
##### Share on other sites

I am in need of pattern search strings for determining series and episode info from filenames. They are usually shown as one of the following:

s01e01
01x01
1x01
101

in filenames like:

doctorwho.s01e01.thename.avi
doctorwho.01x01.thename.avi
doctorwho.1x01.thename.avi
doctorwho.101.thename.avi

I am afraid that I am just not wrapping my brain around how the StringRegExp works.

Thanks

##### Share on other sites

For your issue, since I haven't gotten into RegEx either, I will suggest another option. Check out the StringSplit() function.

$strOne = "doctorwho.s01e01.thename.avi"
$strTwo = "doctorwho.01x01.thename.avi"
$strThree = "doctorwho.1x01.thename.avi"
$strFour = "doctorwho.101.thename.avi"

; StringSplit returns the element count in [0], so the pieces start at [1]
$aryString = StringSplit($strOne, ".")
MsgBox(0, "$strOne", "The Name: " & $aryString[3])

;you can even put it in a loop of some sort depending on how you are getting the strings.
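If you also want the season and episode numbers, splitting on the dots and then running a small pattern on the second piece is one way to do it. Below is a rough sketch in Python, not AutoIt. `parse_episode` and `TAG` are names I made up, and it assumes the show name and title contain no dots and that the tag always has one of the four shapes listed in the question.

```python
import re

# Assumed tag shapes, taken from the four examples in the question:
# s01e01, 01x01, 1x01, and 101 (one season digit, then two episode digits).
TAG = re.compile(r"^(?:s(\d{1,2})e(\d{2})|(\d{1,2})x(\d{2})|(\d)(\d{2}))$",
                 re.IGNORECASE)

def parse_episode(filename):
    """Hypothetical helper: (show, season, episode, title), or None if no tag is found."""
    parts = filename.split(".")
    if len(parts) < 4:
        return None
    m = TAG.match(parts[1])
    if m is None:
        return None
    # Exactly one alternative matched, so exactly two groups are filled in.
    season, episode = [int(g) for g in m.groups() if g is not None]
    return parts[0], season, episode, parts[2]

for name in ("doctorwho.s01e01.thename.avi", "doctorwho.01x01.thename.avi",
             "doctorwho.1x01.thename.avi", "doctorwho.101.thename.avi"):
    print(name, "->", parse_episode(name))
```

The bare three-digit form is the weakest of the four, since any three-digit piece of a filename would be read as a season and episode.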

Let me know if you need any more help or have more questions,

JS

File-String Hash Plugin Updated! 04-02-2008 Plugins have been discontinued. I just found out.

ComputerGetInfo UDF's Updated! 11-23-2006

Vortex Revolutions Engineer / Inventor (Web, Desktop, and Mobile Applications, Hardware Gizmos, Consulting, and more)

##### Share on other sites
• 2 weeks later...

I am trying to use StringRegExp for the first time, and I think I have worked out a pattern for my own use. I then thought that I would look at the example above, just to try and learn from it:

$msg = "This can be anythingThis can be anything C:\test\every\possible\sub\example.txt There COULD be some text here... but not nessecary and there could be another filename here like C:\test\every\possible\sub2\anothersub\foobar.log."
$pattern = "(\a:[\\a-zA-Z 0-9]*\.\w{1,3})"
$array = StringRegExp($msg, $pattern, 3)
For $x = 0 To UBound($array) - 1
    MsgBox(0, "", $array[$x])
Next

The above works. But according to the help file:

\A Match any alphanumeric character (a-z, A-Z, 0-9)

If I substitute "\A" for "a-zA-Z 0-9", the pattern does not work, i.e.:

$msg = "This can be anythingThis can be anything C:\test\every\possible\sub\example.txt There COULD be some text here... but not nessecary and there could be another filename here like C:\test\every\possible\sub2\anothersub\foobar.log."
$pattern = "(\a:[\\\A]*\.\w{1,3})"
$array = StringRegExp($msg, $pattern, 3)
For $x = 0 To UBound($array) - 1
    MsgBox(0, "", $array[$x])
Next

This does not work.

I must be interpreting the Help file incorrectly.

Should I be able to do this ?

Thanks for any help.
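For comparison only: in PCRE-style syntax (which is what Python's `re` module follows here), `\A` is a start-of-string anchor, not a character class, and the shorthand that can be used inside square brackets is `\w` (letters, digits and underscore). Whether the AutoIt build discussed in this thread treats shorthands inside brackets the same way is an assumption I can't confirm. The sketch below just shows that, in Python, the explicit ranges and the `\w` shorthand find the same two paths in the sample message.

```python
import re

msg = (r"This can be anythingThis can be anything C:\test\every\possible\sub\example.txt "
       r"There COULD be some text here... but not nessecary and there could be another "
       r"filename here like C:\test\every\possible\sub2\anothersub\foobar.log.")

# Explicit ranges inside the brackets: backslash, letters, space, digits.
explicit = re.findall(r"[A-Za-z]:[\\a-zA-Z 0-9]*\.\w{1,3}", msg)

# Same idea with the \w shorthand inside the brackets (note: \w also allows "_").
shorthand = re.findall(r"[A-Za-z]:[\\\w ]*\.\w{1,3}", msg)

for path in shorthand:
    print(path)
```

Note that the space has to stay in the brackets either way if folder names may contain spaces; replacing the whole "a-zA-Z 0-9" run with a single shorthand silently drops it.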

##### Share on other sites
• 2 months later...

Actually, I wasn't worried about the second file name complicating the issue, but rather what could be in the "random text" section. If it is truly random, it could start with some text that could potentially be added to the filename you are interested in without making it an invalid file name.

Suggestion: check out RegexBuddy at http://www.regexbuddy.com

It is a $29 piece of software with a 30 day trial... lets you easily play with RegEx and see the results...

Dale

<{POST_SNAPBACK}>

http://renschler.net/RegexBuilder/

This signature is computer generated, nothing can go wron#nothing can go wron#nothing can go wron#nothing can go wron#nothing can go wron#nothing can go wron#nothing can go wron#nothing can go wron#nothing can go wron#nothing can go wron#nothing can go wron#.......
