rbhkamal

Question about regular expressions

9 posts in this topic

If I had a string like the following:

<id>    -------------------------- #1
  <name>rbhkamal</name>
  <email></email>
</id>   -------------------------- #2
<session>
 <active>1</active>
 <id>rbhkamal</id> ---------- #3
</session>

How can I make the regular expression below match only from #1 to #2? Right now the match runs all the way to #3.

<id>(?s)(.+)</id>

To be more precise: I need to know what I can use instead of "(?s)(.+)" to match anything except a specific word (in my case, </id>).

I made a few failed attempts:

(?:\s?[^<][^\/][^i][^d][^>])+ ---- This doesn't work and I don't know why!

([^(?:</id>)]*) ---- This one treats each character of "</id>" individually, not as a whole word
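
That behaviour of the second attempt is expected: inside square brackets, "(?:" and ")" lose their grouping meaning, so [^(?:</id>)] is simply a negated character class that rejects each of the characters ( ? : < / i d > ) on its own. A small sketch to illustrate (the test string "abide" is made up for this example):

```autoit
; "abide" does not contain "</id>" at all, yet the class still stops early,
; because "i" is one of the individually excluded characters.
Local $aHit = StringRegExp("abide", "^([^(?:</id>)]*)", 1) ; flag 1 = first match only
If Not @error Then ConsoleWrite("[" & $aHit[0] & "]" & @CRLF)
```

The capture should end just before the "i", which shows that the class never looks at "</id>" as a unit.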

Any help is greatly appreciated!

Regards,

RK


"When the power of love overcomes the love of power, the world will know peace"-Jimi Hendrix




So what exactly do you want it to return from that sample?


I want everything starting from "<id>" until you get to the first "</id>".
Share on other sites

#4 ·  Posted (edited)

Your example looks like XML, though the tree looks off.

Look into the XMLDOM object, its methods and properties in particular.

This site has been a good resource for me.

You can start from scratch and build your own, like this:

$xmlFile = @ScriptDir & "\xmlfile.xml"

$xmldoc = ObjCreate('Microsoft.XMLDOM') ; create an instance of the XMLDOM object
If Not IsObj($xmldoc) Then Exit ; stop if the object could not be created
$xmldoc.async = False ; load synchronously; set this before calling load()
If Not $xmldoc.load($xmlFile) Then Exit ; stop if the file is missing or not well-formed

$ROOT = $xmldoc.documentElement ; get the root element

ConsoleWrite($ROOT.tagName & " is the Root node of the xml file and has " & $ROOT.childNodes.length & " childNodes" & @CRLF & @CRLF)

; loop through the child nodes
For $i = 0 To $ROOT.childNodes.length - 1
    With $ROOT.childNodes($i)
        ConsoleWrite("child " & $i & " has the tagName: " & .tagName & " and has the text: " & .text & @CRLF)
    EndWith
Next

and your XML file would be, let's say:

<offspring>
    <youngest>Benjamin</youngest>
    <middleChild>the outkasted one</middleChild>
    <oldest>Big Brother</oldest>
</offspring>

Or you can look into eltorro's XML DOM Wrapper UDF.

Quote: "I want everything starting from "<id>" until you get to the first "</id>"."

If you only want the content of a specific tag name, then I would go for the getElementsByTagName method.
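
A minimal sketch of that approach, assuming the config has first been wrapped in a single root element so that it is well-formed XML (the <config> root and the variable names are made up for this example; loadXML parses a string instead of a file):

```autoit
; Sketch only: the sample from the first post, wrapped in a hypothetical <config> root.
Local $sXml = "<config>" & _
        "<id><name>rbhkamal</name><email></email></id>" & _
        "<session><active>1</active><id>rbhkamal</id></session>" & _
        "</config>"

Local $oXml = ObjCreate("Microsoft.XMLDOM")
If Not IsObj($oXml) Then Exit ; COM object could not be created
$oXml.async = False
If Not $oXml.loadXML($sXml) Then Exit ; string is not well-formed XML

; Collect every <id> element, wherever it sits in the tree.
Local $oNodes = $oXml.getElementsByTagName("id")
For $i = 0 To $oNodes.length - 1
    ConsoleWrite("id #" & $i & " text: " & $oNodes.item($i).text & @CRLF)
Next
```

Note that this returns both <id> elements; to pick only the top-level one you would still have to check each node's parent.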

hope this helps...

cheers

Edited by Marcuzzo18



I think I've got your RegExp, if you are still going that route...

Try this pattern:

(?s)(?i)<id>(.+?)</id>

#include <Array.au3>

$Test = "<id>"& _
"  <name>rbhkamal</name>"& _
"  <email></email>"& _
"</id>"& _
"<session>"& _
"<active>1</active>"& _
"<id>rbhkamal</id>"& _
"</session>"

$Options = "(?s)(?i)" ; "." match newlines / case-insensitive
$Start = "<id>"
$Mid = "(.+?)"
$End = "</id>"

$Expression = $Options&$Start&$Mid&$End

$Result = StringRegExp($Test, $Expression, 3)
_ArrayDisplay($Result)


#6 ·  Posted (edited)

Thanks Marcuzzo18, that really helped. I just need to adjust the config file to a valid XML format. However, I've come across this dilemma multiple times, and every time I end up finding a workaround.

I would still like to know how to match for anything except a specific string using regular expressions.

Regards,

RK

Edit: Thanks Paulie, that actually works. I don't know how I didn't think of using mode 3... it must be the weekend effect. lol

Edited by rbhkamal

#7 ·  Posted (edited)

Quote: "I would still like to know how to match for anything except a specific string using regular expressions."

The regular expression in my previous post matches everything between the "<id>" and "</id>" tags.

Were you expecting a different result?
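
For the more literal reading of "anything except a specific string", one common alternative to the lazy quantifier is a negative lookahead in front of each character. The following is only a sketch under that assumption, not something posted in this thread; the variable names are illustrative:

```autoit
; The sample text from the first post, joined with line breaks.
Local $sText = "<id>" & @CRLF & _
        "  <name>rbhkamal</name>" & @CRLF & _
        "  <email></email>" & @CRLF & _
        "</id>" & @CRLF & _
        "<session>" & @CRLF & _
        " <active>1</active>" & @CRLF & _
        " <id>rbhkamal</id>" & @CRLF & _
        "</session>"

; (?!</id>) is a negative lookahead: it succeeds only where "</id>" does NOT start.
; (?:(?!</id>).)* therefore consumes one character at a time but never steps onto "</id>".
Local $sPattern = "(?s)<id>((?:(?!</id>).)*)</id>"

Local $aMatches = StringRegExp($sText, $sPattern, 3) ; flag 3 = array of all matches
If @error Then
    ConsoleWrite("No match" & @CRLF)
Else
    For $i = 0 To UBound($aMatches) - 1
        ConsoleWrite("Match " & $i & ": [" & $aMatches[$i] & "]" & @CRLF)
    Next
EndIf
```

For this sample it should capture the same text as (.+?); the lookahead form mainly pays off when the excluded word has to be avoided inside a larger pattern, where a lazy quantifier could still be stretched past it by backtracking.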

EDIT: lol, I replied before your edit

Edited by Paulie


It is still kind of "mysterious" to me how this works; I've been testing different options without success (I know I have much to learn).

IMO

<id>(.+?)(?s)</id>
should have returned a match for the first section (mode 1), but it didn't; it matched only "rbhkamal".

(?s)<id>(.+?)</id>
works as intended. I wonder why the (?s) in front of the expression makes such a difference: there is nothing before the first <id>, so it should have worked without it ...

... darn regular expressions ... it's always a matter of trial and error for me




You need to put the "(?s)" flag before the dot in "(.+?)" so that "." will match a newline as well as any other character. An inline flag only affects the part of the pattern that comes after it. The first <id>...</id> contains newlines, which is why (.+?) failed there. The second one contains no newlines, so (.+?) matched "rbhkamal".
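
A small sketch that runs both variants side by side on a shortened version of the sample (the test string and variable names are made up for this example):

```autoit
; First <id> block spans several lines, second one sits on a single line.
Local $sText = "<id>" & @CRLF & "  <name>rbhkamal</name>" & @CRLF & "</id>" & @CRLF & _
        "<session><id>rbhkamal</id></session>"

; Pattern 0: (?s) comes after the dot, so the dot still refuses newlines.
; Pattern 1: (?s) comes first, so the dot may cross line breaks.
Local $aPatterns[2] = ["<id>(.+?)(?s)</id>", "(?s)<id>(.+?)</id>"]
Local $aHit

For $i = 0 To 1
    $aHit = StringRegExp($sText, $aPatterns[$i], 1) ; flag 1 = first match only
    If @error Then
        ConsoleWrite($aPatterns[$i] & " -> no match" & @CRLF)
    Else
        ConsoleWrite($aPatterns[$i] & " -> [" & $aHit[0] & "]" & @CRLF)
    EndIf
Next
```

The first pattern should skip the multi-line block and capture only "rbhkamal" from the single-line one, while the second should capture the multi-line block, which matches what was reported above.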