
Regular Expressions



Hi everyone,

Does anybody know how I can extract only the first filename from the text below?

bogus test C:\test\foo\bar\example.txt. random text C:\test\foo\bar\nomatch.txt.

(Where C:\test is fixed and already available from a variable)

I tried following expression with StringRegExp but it doesn't work :

([A-Za-z]:\\.+)\\(.+)\\(.+[.][A-Za-z0-9]{3})([. ]|[ ])

Eventually, I would like to have the relative path (e.g. "foo\bar\") and the filename (e.g. "example.txt") stored in an array (the return value from StringRegExp, of course ;))

I've been working on this almost all day so any help would be appreciated!

Thanks in advance!!

JJ

P.S. I couldn't find it in the help file, but isn't there a way to exclude some groups from capturing? I mean that with this example:

(a|B)(?:c)

the second group wouldn't be in the array returned from StringRegExp?
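
On the P.S.: in regex flavours that support the `(?:...)` syntax, such a group has to match but is not returned with the captures. A minimal sketch in Python's `re` module, used here only for illustration; whether a given AutoIt build's StringRegExp accepts `(?:...)` is an assumption to check against its help file:

```python
import re

# (a|B) is a capturing group; (?:c) must match but is not captured.
match = re.search(r"(a|B)(?:c)", "xxBcxx")

captured = match.groups()  # only the capturing group is returned
whole = match.group(0)     # the full match still includes the "c"
print(captured, whole)
```

The returned tuple holds one element, so the non-capturing group never shows up in the result array.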


Hi! thanks for the reply!

I don't know what the second filename will be. It may or may not be there. The only thing I know is this:

$var = "C:\test"

$msg = "This can be anythingThis can be anything C:\test\every\possible\sub\example.txt There COULD be some text here... but not nessecary and there could be another filename here like C:\test\every\possible\sub2\anothersub\foobar.log."

The first filename in the string may or may not include a trailing dot (e.g. bla.txt.)

From the first file given I'd like the relative path (e.g. every\possible\sub\), and the filename (e.g. example.txt)

So I'm looking for a regular expression that helps me out... But I can't get it to work (for the expression, see my first post)...
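
One way to express this requirement, sketched in Python's `re` module rather than AutoIt. The function name `first_file` is hypothetical, and the sketch assumes that folder and file names contain no spaces and that the extension is 1 to 3 alphanumeric characters:

```python
import re

def first_file(msg, base):
    """Return (relative_path, filename) for the first file under base, or None.

    Assumptions: folder and file names contain no spaces, and the
    extension is 1 to 3 alphanumeric characters.
    """
    pattern = (
        re.escape(base) + r"\\"              # fixed prefix, then a backslash
        + r"((?:[^\\\s]+\\)*)"               # group 1: zero or more "folder\" parts
        + r"([^\\\s]+?\.[A-Za-z0-9]{1,3})"   # group 2: name plus extension
        + r"(?![A-Za-z0-9])"                 # the extension must end here
    )
    m = re.search(pattern, msg)
    return (m.group(1), m.group(2)) if m else None

msg = (r"This can be anything C:\test\every\possible\sub\example.txt. "
       r"There COULD be some text here C:\test\every\possible\sub2\anothersub\foobar.log.")
print(first_file(msg, r"C:\test"))
```

Because the search stops at the first match, the second filename never enters the result, and the trailing dot after the extension is left out of the filename group.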

EDIT:

Tested it again, the following example :

#Include <Array.au3>

$exp = "([A-Za-z]:\\.+)\\(.+\\)(.+[.][A-Za-z0-9]{3})"

$string = "test bla C:\foo\bar\test.txt bla test C:\test\bla\foobar.log"

$data = StringRegExp($string, $exp, 1)

If IsArray($data) Then 
    _ArrayDisplay($data, "Results")
Else
    MsgBox(64, "Info", "No match found")
EndIf

This gives back an array with the following values :

[0] = C:\foo\bar\test.txt bla test C:\test

[1] = bla\

[2] = foobar.log

But I want it to be

[0] = C:\foo\bar\test.txt

[1] = bar\

[2] = test.txt

I hope my explanation is not too complex :(

Thanks in advance!
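
The result above is what greedy quantifiers produce. A small sketch that reproduces it with Python's `re` module (an assumption: the AutoIt build in question treats `.+` the same greedy way, which the reported array suggests):

```python
import re

exp = r"([A-Za-z]:\\.+)\\(.+\\)(.+[.][A-Za-z0-9]{3})"
string = r"test bla C:\foo\bar\test.txt bla test C:\test\bla\foobar.log"

# Each greedy .+ grabs as much as it can, so group 1 runs up to the
# last backslash that still leaves room for groups 2 and 3.
groups = re.search(exp, string).groups()
for i, g in enumerate(groups):
    print(f"[{i}] = {g}")
```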

Edited by j_stam_84

Can the file names include spaces? e.g. c:\long file\name.txt ?

If not you could use:

([A-Za-z]:\\[^\s]+)

...to capture the first file name.
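
If the paths really contain no spaces, the idea of taking everything up to the next whitespace can be sketched like this in Python's `re` module (not AutoIt; `\S` stands for "any non-whitespace character" in this flavour). Note that a sentence-ending dot is swallowed too and has to be trimmed afterwards:

```python
import re

text = r"bogus test C:\test\foo\bar\example.txt. random text C:\test\foo\bar\nomatch.txt."

# \S+ consumes everything up to the next whitespace, including a trailing dot.
raw = re.search(r"[A-Za-z]:\\\S+", text).group(0)
first = raw.rstrip(".")  # drop the sentence-ending dot, if any
print(raw, first)
```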

Edited by DaveF


I really don't know how you can possibly do this reliably with the rules you have to work with. If the text in the middle really is random it creates an unsolvable condition. I think you need another constraint of some sort or you need to be satisfied with something that works most of the time and fix problems by inspection (or do a FileExists test on the result and either flag problems or invoke alternate logic until you get a match).

Dale
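
The "test the result and fall back" idea can be sketched as follows, in Python rather than AutoIt. The function name `first_existing_path` and the injectable `exists` parameter are hypothetical; in a real script the check would be a FileExists-style call against the disk:

```python
import os
import re

def first_existing_path(text, exists=os.path.exists):
    """Return the shortest candidate path in text that exists, or None."""
    start = re.search(r"[A-Za-z]:\\", text)
    if not start:
        return None
    tail = text[start.start():]
    # Every ".ext" (1 to 3 alphanumerics) is a possible end of the filename.
    for end in re.finditer(r"\.[A-Za-z0-9]{1,3}(?![A-Za-z0-9])", tail):
        candidate = tail[:end.end()]
        if exists(candidate):
            return candidate
    return None

# Stand-in for a real disk: pretend only the longer name exists.
fake_disk = {r"C:\test\test.txt foobar.exe"}
found = first_existing_path(r"see C:\test\test.txt foobar.exe. done", fake_disk.__contains__)
print(found)
```

Trying candidates from shortest to longest means the regex only proposes end points, and the file system decides which one is the real filename.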

Free Internet Tools: DebugBar, AutoIt IE Builder, HTTP UDF, MODIV2, IE Developer Toolbar, IEDocMon, Fiddler, HTML Validator, WGet, curl

MSDN docs: InternetExplorer Object, Document Object, Overviews and Tutorials, DHTML Objects, DHTML Events, WinHttpRequest, XmlHttpRequest, Cross-Frame Scripting, Office object model

Automate input type=file (Related)

Alternative to _IECreateEmbedded? better: _IECreatePseudoEmbedded  Better Better?

IE.au3 issues with Vista - Workarounds

SciTe Debug mode - it's magic: #AutoIt3Wrapper_run_debug_mode=Y

Doesn't work needs to be ripped out of the troubleshooting lexicon. It means that what you tried did not produce the results you expected. It begs the questions 1) what did you try?, 2) what did you expect? and 3) what happened instead?

Reproducer: a small (the smallest?) piece of stand-alone code that demonstrates your trouble


@Dale

Hi! I think it is possible, since there will always be more than one character after the first filename (a dot AND a space, or just a space AND the first character of the next word). After the last filename, the only character left in the string is a dot. Besides that, in my situation there will never be a folder name that ends with a dot and three characters.

So why wouldn't it be possible :S

Thanks for replying btw :(

grtz!


My point is that since "C:\test\test.txt" and "C:\test\test.txt foobar.exe" are both valid filenames (however impractical or improbable the second one is), unless you can devise another constraint or make some assumptions, you can't create a reliable solution.


@Dale

Aha I see what you mean :( (and now just hoping I understood you correctly :"> )

There will never be a situation where the two filenames are directly behind each other.

So instead of "C:\test\test.txt foobar.exe"

it would be:

"C:\test\test.txt. SOME TEXT foobar.exe." <-- with trailing dot and a dot after .txt

Or

"C:\test\test.txt SOME TEXT foobar.exe." <-- with trailing dot and without a dot after .txt

Or

"C:\test\test.txt."

grtz,

JJ

p.s. I'm sorry if this wasn't what you meant :(


Actually, I wasn't worried about the second file name complicating the issue, but rather about what could be in the "random text" section. If it is truly random, it could start with some text that could be appended to the filename you are interested in without making it an invalid file name.

Suggestion: check out RegexBuddy at http://www.regexbuddy.com

It is a $29 piece of software with a 30 day trial... lets you easily play with RegEx and see the results...

Dale


Matching only the first instance of something happens in your script, not in the expression. I leave translating this into AutoIt's homemade implementation as an exercise for the reader.

[a-zA-Z]{1}[:\\][\d+|\w+|\s+\\]+[\d+|\w+|\s+]\.\w{3}

Matches:

C:\te43 st\fc oo\bar\ex ample.txt. <-- but not the period

Doesn't match:

Anything it shouldn't as far as I could tell

Edited by Alterego
The conversion to the AutoIt Regexp would be:

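; Example input taken from the first post
$test = "bogus test C:\test\foo\bar\example.txt. random text C:\test\foo\bar\nomatch.txt."
$pattern = "([a-zA-Z]:[\\a-zA-Z 0-9]+?\.\w{1,3})"
$array = StringRegExp($test, $pattern, 1)
If IsArray($array) Then
   ; $array[0] contains the matching text.  Do what you will
EndIf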

Give that a shot. The {1,3} matches 1 to 3 characters after the dot, not just 3.

David Nuttall
Nuttall Computer Consulting
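
To see the difference between "first match only" and "all matches" with this pattern, here is a small sketch in Python's `re` module (not AutoIt; the character class and quantifiers happen to mean the same thing in this flavour, which is an assumption about the AutoIt build):

```python
import re

test = r"bogus test C:\test\foo\bar\example.txt. random text C:\test\foo\bar\nomatch.txt."
pattern = r"([a-zA-Z]:[\\a-zA-Z 0-9]+?\.\w{1,3})"

first = re.search(pattern, test).group(1)  # first match only
every = re.findall(pattern, test)          # all matches, left to right
print(first)
print(every)
```

The character class contains no dot, so the match cannot run past the extension into the text that follows.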


@Nutster & AlterEgo

Thanks!!! This was EXACTLY what I needed!!! Although I came to realize while analyzing your expression, how STUPID I am... I couldn't find how to find the smallest match instead of the largest (still didn't see it when I read the help for StringRegExp some 15 times) 'cause I figured that was the thing that was wrong with my expression.

Then I started wondering why you had a question mark after the plus sign... so I opened the helpfile... and guess what I found out..........

? (after a repeating character) Find the smallest match instead of the largest.

Thanks a million!!!!
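
The effect of that question mark can be sketched with the expression from the earlier test post, here in Python's `re` module (an illustration only, assuming `+?` behaves the same way in the AutoIt build):

```python
import re

string = r"test bla C:\foo\bar\test.txt bla test C:\test\bla\foobar.log"

# Same expression as before, but every .+ is made lazy with a trailing ?
lazy = r"([A-Za-z]:\\.+?)\\(.+?\\)(.+?[.][A-Za-z0-9]{3})"

lazy_groups = re.search(lazy, string).groups()
print(lazy_groups)
```

Each lazy group now stops at the first backslash or extension it can, so the match stays inside the first filename. With deeper folder nesting the last group would still absorb the extra folders, so this is a demonstration of lazy matching and not a complete solution.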

Edited by j_stam_84

I am in need of pattern search strings for determining series and episode info from filenames. They are usually shown as one of the following:

s01e01
01x01
1x01
101

in filenames like:

doctorwho.s01e01.thename.avi
doctorwho.01x01.thename.avi
doctorwho.1x01.thename.avi
doctorwho.101.thename.avi

I am afraid that I am just not wrapping my brain around how the StringRegExp works.

Thanks :(
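
A possible pattern for the four forms listed above, sketched in Python's `re` module rather than AutoIt. The name `parse_episode` is hypothetical, and the sketch assumes that the tag sits between two dots and that a bare three-digit tag means one season digit followed by two episode digits:

```python
import re

# Three alternatives: s01e01, 01x01 / 1x01, and 101, each between two dots.
EPISODE = re.compile(
    r"\.(?:s(\d{1,2})e(\d{2})|(\d{1,2})x(\d{2})|(\d)(\d{2}))\.",
    re.IGNORECASE,
)

def parse_episode(filename):
    """Return (season, episode) as integers, or None if no tag is found."""
    m = EPISODE.search(filename)
    if not m:
        return None
    # Only the alternative that matched has non-empty groups.
    season, episode = [g for g in m.groups() if g is not None]
    return int(season), int(episode)

for name in ("doctorwho.s01e01.thename.avi", "doctorwho.01x01.thename.avi",
             "doctorwho.1x01.thename.avi", "doctorwho.101.thename.avi"):
    print(name, parse_episode(name))
```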


Since I haven't gotten into RegEx myself either, I will suggest another option for your issue: check out the StringSplit() function.

$strOne = "doctorwho.s01e01.thename.avi"
$strTwo = "doctorwho.01x01.thename.avi"
$strThree = "doctorwho.1x01.thename.avi"
$strFour = "doctorwho.101.thename.avi"

; StringSplit returns the element count in [0], so [2] is the episode tag and [3] is the name
$aryString = StringSplit($strOne, ".")
MsgBox(0, "$strOne", "The Name: " & $aryString[3])

; You can even put it in a loop of some sort, depending on how you are getting the strings.

Let me know if you need any more help or have more questions,

JS


  • 2 weeks later...

I am trying to use StringRegExp for the first time, and I think I have worked out a pattern for my own use. I then thought that I would look at the example above, just to try to learn from it:

$msg = "This can be anythingThis can be anything C:\test\every\possible\sub\example.txt There COULD be some text here... but not nessecary and there could be another filename here like C:\test\every\possible\sub2\anothersub\foobar.log."
$pattern = "(\a:[\\a-zA-Z 0-9]*\.\w{1,3})"
$array=StringRegExp($msg, $pattern,3)
For $x = 0 to UBound($array) - 1
msgBox(0,"",$array[$x])
Next

The above works

But according to the help file

\A Match any alphanumeric character (a-z, A-Z, 0-9)

If I substitute "\A" for "a-zA-Z 0-9", the pattern does not work, i.e.:
$msg = "This can be anythingThis can be anything C:\test\every\possible\sub\example.txt There COULD be some text here... but not nessecary and there could be another filename here like C:\test\every\possible\sub2\anothersub\foobar.log."
$pattern = "(\a:[\\\A]*\.\w{1,3})"
$array=StringRegExp($msg, $pattern,3)
For $x = 0 to UBound($array) - 1
msgBox(0,"",$array[$x])
Next

This does not work.

I must be interpreting the Help file incorrectly.

Should I be able to do this ? :(

Thanks for any help.


  • 2 months later...

Regarding Dale's RegexBuddy suggestion above, there is also: http://renschler.net/RegexBuilder/
