How to use StringRegExpReplace() to insert CRLF into XML files?

rudi · July 19, 2008

Hi.

I'd like to break lines in a given XML file after each "</some value>", so that it's easy to read with ANY editor, e.g. Notepad.

Can't I use StringRegExpReplace() to do so? StringRegExp() DOES correctly catch the substring where I want to insert a CRLF.

#include <array.au3>

$string='<Value1>"some value"</Value1><Value2>"Another Value goes here"</Value2>'
$SplitThis=StringRegExp($string,"\</.*(\>\<)",3)
_ArrayDisplay($SplitThis)
StringRegExpReplace($string,"\</.*(\>\<)","\>" & @CRLF & "\<")
MsgBox(0,"Error: " & @error & ", Ext: " & @extended,$string) ; it's still "one line" :(

Regards, Rudi.

JFee · July 20, 2008

I think you're making this hard on yourself.

StringRegExpReplace($string, "(</(?:.*?)>)[^{" & @CRLF & "}]", "\1" & @CRLF)

Not tested, but it should work. I'll test when I get to my dev machine.

rudi · July 20, 2008

> I think you're making this hard on yourself
> StringRegExpReplace($string, "(</(?:.*?)>)[^{" & @CRLF & "}]", "\1" & @CRLF)

Hm. Let me ask step by step what you are doing there?

1.)

"(</"

so "<" doesn't need to be escaped, right? (\<)

2.)

(?: ....)

From the help file: "Non-capturing group". If I got it correctly, it has to match, but doesn't show up anywhere in the resulting array?

(?:.*) = any character, from 0 up to any number of occurrences. And what is the trailing "?" in (?:.*?) for?

3.)

[^ ....

Opening a class definition that allows everything BUT the listed items? ("[" opens the class, "^" negates it.)

Is that to prevent another @CRLF from being inserted in case there already is a CRLF?

4.)

[^{" & @CRLF & "}]

From the "{" on I've lost you: all I know about "{...}" is that it defines the number of occurrences, like {1,} or {4,9}.

RegEx is really powerful, but sometimes hard to read.

And, BTW: what is wrong, or too complicated, in the RegEx I've tried? (I don't get it.)

Regards, Rudi.

JFee · July 21, 2008

You are absolutely right on all of your remarks... the {'s are not supposed to be there.

StringRegExpReplace($string, "(</(?:.*?)>)[^" & @CRLF & "]", "\1" & @CRLF)

And in your original one you were replacing just the end of the string, not the entire </tag>
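
One thing to watch in the pattern above: the class [^...] consumes one character after the closing tag, and "\1" does not put that character back. A minimal sketch of the same idea with a negative lookahead instead, which only checks the next character without consuming it. The sample string and the variable names are made up for illustration, and [^>]* is swapped in for .*? to keep the match inside one tag:

```autoit
; Sketch only: $sXml, $sOut and $iCount are hypothetical names.
Local $sXml = '<a>1</a><b>2</b>'

; (</[^>]*>)   captures one closing tag as group 1
; (?![\r\n])   negative lookahead: the next character must not be CR or LF,
;              but it is not consumed, so nothing after the tag gets lost
Local $sOut = StringRegExpReplace($sXml, "(</[^>]*>)(?![\r\n])", "\1" & @CRLF)
Local $iCount = @extended ; number of replacements performed

ConsoleWrite("Replacements: " & $iCount & @CRLF) ; 2
ConsoleWrite($sOut)
```

Because of the lookahead, running the same replacement on $sOut a second time should leave it unchanged, since every closing tag is then already followed by a CRLF.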

rudi · July 23, 2008

Hi.

This is still not working, no replacements happen.

This is what I want to replace:

<section1>SectName<value1>name1</value1><value2>V2</value2></section1><section2>...

The parts marked in red are what I want the regex to look for.

The underlined ">" are the closing ones of "</some name>".

They shall be replaced with ">" & @CRLF.

How to do that?

Regards, Rudi.

smashly · July 23, 2008

Hi, is this what you mean?

$string='<Value1>"some value"</Value1><Value2>"Another Value goes here"</Value2>'
$SRER = StringRegExpReplace($string,"(\><)", ">" & @CRLF & "<")
MsgBox(0, "Replacements: " & @extended, $SRER)

Edited July 23, 2008 by smashly

rudi · July 23, 2008

> Hi, is this what you mean?
>
> $string='<Value1>"some value"</Value1><Value2>"Another Value goes here"</Value2>'
> $SRER = StringRegExpReplace($string,"(\><)", ">" & @CRLF & "<")
> MsgBox(0, "Replacements: " & @extended, $SRER)

StringRegExpReplace($string,"(><)",">" & @CRLF & "<") ; the "\" is not required

This will replace ANY occurrence of "><" with ">" & @CRLF & "<".

But I want to do so ONLY if the closing ">" belongs to an XML closing tag, like "</{name}>".

Regards, Rudi.

rudi · July 23, 2008

Hi.

I understand now that "()" returns groups that can be referenced later in the replacement part with "\1", "\2", ...

So my thought now is to use "(</.*)" to match the start of closing XML tags. Making it "(</.*?)" matches not the largest, but the smallest possible part.

Using the RegEx "(</.*?)(>)" on the string "</closing tag>" results in \1 = "</closing tag" and \2 = ">", and as expected only the closing XML tags match when using StringRegExp(). But I fail to add a @CRLF after the closing ">" with StringRegExpReplace().

; trying to insert a @CRLF after closing XML tags...

#include <array.au3>

$string="<value1>Val1</value1><Sect2>Sect2<value2.1>Text</value2.1><value2.2>text</value2.2></Sect2>"
;                    ^^^^^^^^^ match one                 ^^^^^^^^^ match 2       ^^^^^^^^^^^^^^^^^^^^ 3 + 4
$regEx="(</.+?)(>)" 
$RegExReplace="\1\2" & @CRLF
MsgBox(0,"Original string:",$string & @LF & "Search expression: '" & $regEx & "'" & @LF & "Replace Expression:'" & $RegExReplace & "'")

; stringregexp *DOES* catch what I want:
$Result=StringRegExp($string,$regEx,3)
_ArrayDisplay($Result,"Four matches, split apart, as expected")

; how to replace this "catch" with stringregexpreplace() ??
StringRegExpReplace($string, $regEx, $RegExReplace)

MsgBox(0,"no @CRLFs inserted :(",$string)

What is my mistake?

Regards, Rudi.

Edited July 23, 2008 by rudi

weaponx · July 23, 2008

You probably shouldn't be brute force formatting XML, it could get seriously corrupted. Especially when namespaces and CDATA areas come into play. You should use the XML UDF along with a stylesheet.

Something like this:

http://skew.org/xml/stylesheets/reindent/

-or-

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>
 </xsl:stylesheet>

You might have to search around for an XSL file that works for your needs.

Edited July 23, 2008 by weaponx

rudi · July 23, 2008

> You probably shouldn't be brute force formatting XML, it could get seriously corrupted.

OK. This is just about VIEWING an XML file that is "10 pages in two lines". I know now that I can do so in IE as well.

> Especially when namespaces and CDATA areas come into play. You should use the XML UDF along with a stylesheet.
>
> Something like this:
> http://skew.org/xml/stylesheets/reindent/
>
> -or-
> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
>      <xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>
>  </xsl:stylesheet>
> You might have to search around for an XSL file that works for your needs.

Thanks for your reply, the URL and the sample.

How do I use these XSL files? I honestly have no clue...

Just to improve my knowledge of StringRegExpReplace() (no matter whether it's wise to use it here): do you see why my approach didn't work as I expected?

Regards, Rudi.

Squirrely1 · July 23, 2008

I have an includes file that apparently comes with the latest beta of AutoIt. It has a function for this:

_XMLTransform ( oXMLDoc, Style = "",szNewDoc="" )

I don't think you have to know much about xsl to just apply the transformation to your xml.

I have tried the paste-a-string method and weaponx is right - and this method kind of defeats the (supposed) simplicity of working with xml.

rudi · July 23, 2008

Hi.

> [new function in the current beta]

Thanks, I'll get the latest beta and try that one.

BTW: Maybe somebody can point out the mistake in the StringRegExpReplace() syntax I have used? I'd like to understand what I missed.

Regards, Rudi.

weaponx · July 23, 2008

I would just do this:

$string='<Value1>"some value"</Value1><Value2>"Another Value goes here"</Value2>'
$string = StringReplace($string, ">", ">" & @CRLF)
$string = StringReplace($string, "<", @CRLF & "<")
$string = StringReplace($string, @CRLF & @CRLF, @CRLF)
ConsoleWrite($string)

Squirrely1 · July 23, 2008

Rudi - once you learn, then teach me - I couldn't regExp my way out of a wet paper bag.

rudi · July 24, 2008

@weaponx

> I would just do this:
>
> $string='<Value1>"some value"</Value1><Value2>"Another Value goes here"</Value2>'
> $string = StringReplace($string, ">", ">" & @CRLF)
> $string = StringReplace($string, "<", @CRLF & "<")
> $string = StringReplace($string, @CRLF & @CRLF, @CRLF)
> ConsoleWrite($string)

Yes, that's fine. But the one I'm INTERESTED in, because I want to UNDERSTAND RegEx in general and StringRegExpReplace() especially, is this one:

"Add a @CRLF behind each closing XML tag".

Green = the beginning of a "closing tag"

Blue = the tag name. (don't touch these, just to "catch" the right ">" Chars)

Red = the final ">" of a closing tag.

So I expected this behaviour:

RegEx = "(</)" should catch the beginning of closing tags = "</", right?

RegEx = "(</.*)" should match from the beginning, largest possible match?

RegEx = "(</.*?)" should match the smallest match instead of the largest?

RegEx = "(>)" should match the final ">". That's the one to add a @CRLF behind it.

So finally:

RegEx = "(</.*?)(>)" will return the "green+blue" as "\1" and the "red" as "\2".

When adding a ?: to the first group, then just the 2nd one will be "\1"??

RegEx = "(?:</.*?)(>)"

Where is my mistake? Or how to use this with StringRegExpReplace()?

Regards, Rudi.
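
The expectations listed above can be checked with a short script. This is only a sketch: the sample string and the variable names are made up for illustration. It also stores the return value of StringRegExpReplace(), because the function returns the modified string and leaves the variable passed in unchanged:

```autoit
#include <Array.au3>

; Sketch only: $sXml, $aGreedy, $aLazy, $aGroups and $sOut are hypothetical names.
Local $sXml = '<a>1</a><b>2</b>'

; Greedy: .* runs to the end of the string, so there is one long match.
Local $aGreedy = StringRegExp($sXml, "(</.*)", 3)
ConsoleWrite(_ArrayToString($aGreedy, " | ") & @CRLF) ; </a><b>2</b>

; Lazy with nothing after it: .*? matches as little as possible, i.e. nothing.
Local $aLazy = StringRegExp($sXml, "(</.*?)", 3)
ConsoleWrite(_ArrayToString($aLazy, " | ") & @CRLF)   ; </ | </

; Lazy, but followed by the closing ">": two groups per closing tag.
Local $aGroups = StringRegExp($sXml, "(</.*?)(>)", 3)
ConsoleWrite(_ArrayToString($aGroups, " | ") & @CRLF) ; </a | > | </b | >

; The result must be stored (or passed on); $sXml itself is not modified.
Local $sOut = StringRegExpReplace($sXml, "(</.*?)(>)", "\1\2" & @CRLF)
ConsoleWrite($sOut)
```

With the result assigned to $sOut, "\1\2" & @CRLF puts both groups back and adds the line break after each closing tag.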

weaponx · July 24, 2008

I still think you are trying to over-simplify something that is not simple.

$xml = '<?xml version="1.0" encoding="ISO-8859-1"?><bookstore><book><title lang="eng">Harry Potter</title><price>29.99</price></book><book><title lang="eng">Learning XML</title><price>39.95</price></book></bookstore>'
$new_xml = StringRegExpReplace($xml, '<.+?>',"$0" & @CRLF)
ConsoleWrite($new_xml & @CRLF)

<?xml version="1.0" encoding="ISO-8859-1"?>
 <bookstore>
 <book>
 <title lang="eng">
 Harry Potter
 </title>
 <price>
 29.99
 </price>
 
 </book>
 <book>
 <title lang="eng">
 Learning XML
 </title>
 <price>
 39.95
 </price>
 
 </book>
 
 </bookstore>

This leaves extra whitespace. Next you will replace double carriage returns, this takes 3 passes. I don't think this can be done in a single pass correctly.

Edited July 24, 2008 by weaponx

rudi · July 25, 2008

> I still think you are trying to over-simplify something that is not simple.

This final question was not about really using this, but about seeing what my mistake is.

> $new_xml = StringRegExpReplace($xml, '<.+?>',"$0" & @CRLF)

And there I CAN see it now:

1.) I can catch the closing XML tags in *one* RegEx group.

2.) the first group match is referenced through "$0" (or "\0") and NOT through "\1", as I tried to use.

Thanks, Rudi.

dhaley77 · August 7, 2008

> 2.) the first group match is referenced through "$0" (or "\0") and NOT through "\1", as I tried to use.

I know this is kinda old, but just in case someone is searching.....

$0 doesn't actually refer to the first group; it refers to the entire matched text (like "&" in UNIX/POSIX/etc regex).

In your case, you don't need "()" to capture anything.

Here's how I'd do this with regexp:

$xml = '<?xml version="1.0" encoding="ISO-8859-1"?><bookstore><book><title lang="eng">Harry Potter</title><price>29.99</price></book><book><title lang="eng">Learning XML</title><price>39.95</price></book></bookstore>'

$new_xml = StringRegExpReplace($xml, "</[^>]*>", "\0" & @CRLF)

ConsoleWrite($new_xml)

This will output:

<?xml version="1.0" encoding="ISO-8859-1"?><bookstore><book><title lang="eng">Harry Potter</title>
<price>29.99</price>
</book>
<book><title lang="eng">Learning XML</title>
<price>39.95</price>
</book>
</bookstore>

Also, if you were trying to add a @CRLF before the end tag (like in weaponx's example), you'd only need two passes. The second pass is only needed to strip out the extra @CRLFs. Since this is only two passes and the patterns aren't complex, I went ahead and nested the StringRegExpReplace calls (the output of the inner call is the input to the outer call).

Here's the code modified to add the @CRLF before and after the end tag and then strip the extras:

$xml = '<?xml version="1.0" encoding="ISO-8859-1"?><bookstore><book><title lang="eng">Harry Potter</title><price>29.99</price></book><book><title lang="eng">Learning XML</title><price>39.95</price></book></bookstore>'

$new_xml = StringRegExpReplace(StringRegExpReplace($xml, "</[^>]*>", @CRLF & "\0" & @CRLF), @CRLF & @CRLF, @CRLF)

ConsoleWrite($new_xml)

Output:

<?xml version="1.0" encoding="ISO-8859-1"?><bookstore><book><title lang="eng">Harry Potter
</title>
<price>29.99
</price>
</book>
<book><title lang="eng">Learning XML
</title>
<price>39.95
</price>
</book>
</bookstore>

dan
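
To illustrate the difference between "\0" and "\1" discussed in this post, here is a minimal sketch. The sample string and variable names are made up for illustration. It suggests that both forms give the same result as long as the group wraps the whole pattern and the return value is stored:

```autoit
; Sketch only: $sXml, $sWhole and $sGroup are hypothetical names.
Local $sXml = '<a>1</a><b>2</b>'

; No capturing group: \0 (or $0) stands for the whole matched text.
Local $sWhole = StringRegExpReplace($sXml, "</[^>]*>", "\0" & @CRLF)

; One capturing group around the whole pattern: \1 holds the same text.
Local $sGroup = StringRegExpReplace($sXml, "(</[^>]*>)", "\1" & @CRLF)

ConsoleWrite(($sWhole == $sGroup) & @CRLF) ; True
ConsoleWrite($sWhole)
```

So "\1" is not wrong as such; it simply needs a matching "()" in the pattern, while "\0" works without any group.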
