Reg Exp to capture specific section of document

zfisherdrums · October 28, 2008

Goal:

Using StringRegExp, obtain the contents of a specific section in a larger document.

Notes:

> The sections are denoted by a section header and ended with two newline sequences ( see examples below ).

> For brevity, I am only trying to capture the Differences section here, but I will be using a similar pattern to obtain the contents of "Missing Baseline Files", "Inside Tolerance", etc.

> The C# regex engine returned the section successfully when this pattern was used: Differences\n-+\n.*?^\n

Problem:

I cannot replicate the equivalent of the C# pattern in PCRE; specifically, I cannot tell PCRE to match the two newline sequences at the end of the section. I have been able to translate up to here: Differences\R-+\R.*?, but that fails to return the contents of the section. What do I fill in the blank with to make this behave as expected?

Differences\R-+\R.*?_____

Context:

Here is the document:

CODE
Missing Baseline Files
----------------------
05\Annual1.txt
06\Annual1.txt

Differences
------
01\Annual1.txt
02\Annual1.txt
03\Annual1.txt
04\Annual1.txt

Inside Tolerance Only
---------------------
07\Annual1.txt
08\Annual1.txt

This is what I want to extract:

Differences
------
01\Annual1.txt
02\Annual1.txt
03\Annual1.txt
04\Annual1.txt

This is what I'm getting using Differences\R-+\R.*?

Differences
------

Edited October 28, 2008 by zfisherdrums
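
A note on why Differences\R-+\R.*? returns only the header and the dashes: a lazy quantifier at the very end of a pattern is satisfied by matching nothing, so something has to follow it to push it forward. The dot also needs single-line mode, (?s), to cross line breaks. The sketch below shows one possible way to fill the blank, a lookahead for the blank line. It is written in Python only so that it can be run and checked; Python's re module has no \R, so \n stands in for it, and the names DOC, lazy_only and with_terminator are made up for illustration. A PCRE version would presumably keep \R where this uses \n.

```python
import re

# Sample report from the post above; sections are separated by a blank line.
LINES = [
    "Missing Baseline Files",
    "----------------------",
    r"05\Annual1.txt",
    r"06\Annual1.txt",
    "",
    "Differences",
    "------",
    r"01\Annual1.txt",
    r"02\Annual1.txt",
    r"03\Annual1.txt",
    r"04\Annual1.txt",
    "",
    "Inside Tolerance Only",
    "---------------------",
    r"07\Annual1.txt",
    r"08\Annual1.txt",
]
DOC = "\n".join(LINES) + "\n"

# Trailing lazy quantifier: it is allowed to match zero characters, so it does.
lazy_only = re.search(r"(?s)Differences\n-+\n.*?", DOC).group(0)

# Same pattern, but the lazy part must now grow until a blank line
# (two consecutive newlines) or the end of the text follows.
with_terminator = re.search(
    r"(?s)Differences\n-+\n.*?(?=\n\n|\Z)", DOC
).group(0)

print(repr(lazy_only))
print(with_terminator)
```

The lookahead keeps the blank line out of the match, so the result ends on the last file name of the section.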

Szhlopp · October 28, 2008

Here ya go:

(Differences\s?\s?(?:\n.*)+?)\s?\n?\s?\n?Inside

I've got a nice SRE tester in my sig if you want to use it

zfisherdrums · October 28, 2008

Thanks for the tool link in your sig; I'm trying it out as I type (?)!

The part of the problem I failed to mention is that I want to capture the contents without specifying the name of the following section. Put another way, how would one obtain the contents without having to mention "Inside"? I'm looking for a generic approach, as I'll be using the pattern to obtain contents from several other sections. Make sense?
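
One way to read "generic" here is a pattern built from the section name alone, ending at the first blank line or at the end of the text rather than at the next header. The following is a minimal sketch of that idea in Python, not a tested AutoIt solution. The helper names section_pattern and extract_section are hypothetical, \n stands in for PCRE's \R, and the sample text assumes sections are separated by exactly one blank line.

```python
import re

# Sample report; each section is header, dashed underline, entries, blank line.
DOC = "\n".join([
    "Missing Baseline Files",
    "----------------------",
    r"05\Annual1.txt",
    r"06\Annual1.txt",
    "",
    "Differences",
    "------",
    r"01\Annual1.txt",
    r"02\Annual1.txt",
    r"03\Annual1.txt",
    r"04\Annual1.txt",
    "",
    "Inside Tolerance Only",
    "---------------------",
    r"07\Annual1.txt",
    r"08\Annual1.txt",
]) + "\n"


def section_pattern(header):
    """Build a pattern for one section; only the header text varies."""
    # (?m) anchors the header at a line start, (?s) lets the dot cross lines.
    # The lookahead stops the lazy body at a blank line or at end of text.
    return re.compile(
        r"(?ms)^" + re.escape(header) + r"\n-+\n(.*?)(?=\n\n|\n?\Z)"
    )


def extract_section(doc, header):
    """Return the section's entries as a list of lines, or None if absent."""
    match = section_pattern(header).search(doc)
    return match.group(1).splitlines() if match else None


print(extract_section(DOC, "Differences"))
print(extract_section(DOC, "Inside Tolerance Only"))
```

Because the terminator is the blank line, the same function works for the last section too, where no following header exists.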

Szhlopp · October 28, 2008

Yes it does. It seems like it might be easier to just get an array of the items?

\d\d\\[[:alnum:]\.\-\_]*

Add a () around the part you want. Right now it grabs "01\Text.extension"

Ex:

\d\d\\([[:alnum:]\.\-\_]*)

returns "Text.extension".

Use StringRegExp with flag 3. Let me know if this solves it for you.

Szh
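
To illustrate the effect of the parentheses, here is a small Python sketch. Python's re module does not support the POSIX class [:alnum:], so the character class is spelled out as A-Za-z0-9, which is an approximation of the pattern above rather than a copy of it. The variable names are made up for the example. re.findall returns every match, which is roughly what a global-match flag gives in StringRegExp.

```python
import re

# Two entries in the same shape as the report lines.
text = "01\\Annual1.txt\n02\\Annual1.txt\n"

# No group: each result is the whole match, folder prefix included.
whole = re.findall(r"\d\d\\[A-Za-z0-9._-]*", text)

# With a group: each result is only the part inside the parentheses.
names = re.findall(r"\d\d\\([A-Za-z0-9._-]*)", text)

print(whole)
print(names)
```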

zfisherdrums · October 28, 2008

Ok...cool...I'm close. Using your example with Flag set to 4, I'm able to procure the elements in the array ( including the parent folder in the string ).

I modified the example document in my original posting to be more representative of what happens in our domain. You see, we can potentially have lines that match this pattern in other sections as well. So how does one grab just the section they want and no more?

I realize now that I've made a few assumptions in my description of the problem. For that, I apologize and thank you for any/all the time you've spent helping me with this.

Zach...

PS: If it means what I think it does, I'm a JF too.

Edited October 28, 2008 by zfisherdrums
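
Since lines of the same shape can appear in every section, one possible way to keep them apart is to split the report on blank lines first and only then look at the entries. The sketch below does that in Python under the assumption that every section is a header line, a line of dashes, and then its entries. parse_sections is a hypothetical helper, not something from the thread or from AutoIt.

```python
import re

DOC = "\n".join([
    "Missing Baseline Files",
    "----------------------",
    r"05\Annual1.txt",
    r"06\Annual1.txt",
    "",
    "Differences",
    "------",
    r"01\Annual1.txt",
    r"02\Annual1.txt",
    r"03\Annual1.txt",
    r"04\Annual1.txt",
    "",
    "Inside Tolerance Only",
    "---------------------",
    r"07\Annual1.txt",
    r"08\Annual1.txt",
]) + "\n"


def parse_sections(doc):
    """Map each section header to the list of entry lines beneath it."""
    sections = {}
    # Two or more consecutive line breaks (LF or CRLF) separate the sections.
    for block in re.split(r"(?:\r?\n){2,}", doc.strip()):
        lines = block.splitlines()
        # Accept a block only if its second line is the dashed underline.
        if len(lines) >= 2 and re.fullmatch(r"-+", lines[1]):
            sections[lines[0]] = lines[2:]
    return sections


sections = parse_sections(DOC)
print(sections["Differences"])
```

Entries from "Missing Baseline Files" and "Inside Tolerance Only" end up under their own keys, so a per-entry pattern can be applied to one section's list without touching the others.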
