rudi

How to use StringRegExpReplace() to insert CRLF into XML files?


rudi

Hi.

I'd like to break lines in a given XML file after each "</some value>", so that it's easy to read with ANY editor, e.g. Notepad.

Can't I use StringRegExpReplace() to do so? StringRegExp() DOES correctly catch the substring where I want to insert a CRLF.

#include <array.au3>

$string='<Value1>"some value"</Value1><Value2>"Another Value goes here"</Value2>'
$SplitThis=StringRegExp($string,"\</.*(\>\<)",3)
_ArrayDisplay($SplitThis)
StringRegExpReplace($string,"\</.*(\>\<)","\>" & @CRLF & "\<")
MsgBox(0,"Error: " & @error & ", Ext: " & @extended,$string) ; it's still "one line" :(
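Running the same pattern through Python's `re` module makes it visible what actually gets replaced (Python is only a stand-in here; its syntax for these constructs is close to the PCRE engine behind AutoIt's StringRegExp functions). Two things seem to be going on: the whole match `</Value1><` is overwritten, not just the captured `><`, so the tag name disappears; and, like `StringRegExpReplace()`, `re.sub()` returns a new string instead of changing the one passed in, so the result has to be assigned (in AutoIt: `$string = StringRegExpReplace(...)`).

```python
import re

# Same sample string and (greedy) pattern as the AutoIt snippet above.
s = '<Value1>"some value"</Value1><Value2>"Another Value goes here"</Value2>'
pattern = r"</.*(><)"

m = re.search(pattern, s)
print(m.group(0))  # </Value1><   (the whole match)
print(m.group(1))  # ><           (only the captured group)

# The replacement overwrites the WHOLE match, so "</Value1" is lost,
# and the returned string has to be kept in a variable.
result = re.sub(pattern, ">\r\n<", s)
print(repr(result))
```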

Regards, Rudi.


Earth is flat, pigs can fly, and Nuclear Power is SAFE!

JFee

I think you're making this hard on yourself

StringRegExpReplace($string, "(</(?:.*?)>)[^{" & @CRLF & "}]", "\1" & @CRLF)

Not tested but should work. I'll test when I get to my dev machine


Regards, Josh

rudi

> I think you're making this hard on yourself
>
> StringRegExpReplace($string, "(</(?:.*?)>)[^{" & @CRLF & "}]", "\1" & @CRLF)
Hm. Let me go through what you are doing there step by step.

1.)

"(</"

so "<" doesn't need to be escaped, right? (\<)

2.)

(?: ....)

From the help file: "Non-capturing group". If I got it right, it has to match but "doesn't show up" anywhere in the resulting array?

(?:.*) = any character, any number of occurrences. And what is the trailing "?" in (?:.*?) for?

3.)

[^ ....

Opening a class definition that allows everything BUT the listed items? ("[" opens the class, "^" negates it.)

Is that to prevent another insertion of @CRLF in case it's already a CRLF in there?

4.)

[^{" & @CRLF & "}]

From the "{" on, you've lost me: all I know "{...}" for is defining the "number of occurrences", like {1,} or {4,9}.
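Items 2 to 4 can be tried out quickly with Python's `re` (a stand-in for AutoIt's engine; these constructs behave the same way there). The sketch uses a made-up two-tag string: the trailing `?` makes `.*` lazy, so it stops at the first `>` that lets the rest of the pattern match; a negated class such as `[^>]*` gets the same result by never crossing a `>`; and inside `[...]`, braces are ordinary characters rather than a repetition count.

```python
import re

s = "<a>1</a><b>2</b>"  # made-up two-tag sample

greedy = re.findall(r"</.*>", s)      # .* grabs as much as it can
lazy = re.findall(r"</.*?>", s)       # .*? stops at the first ">" that works
negated = re.findall(r"</[^>]*>", s)  # [^>]* can never run past a ">"

print(greedy)   # ['</a><b>2</b>']
print(lazy)     # ['</a>', '</b>']
print(negated)  # ['</a>', '</b>']

# Inside [...] braces are ordinary characters, so [^{\r\n}] also rejects "{" and "}".
braces = re.findall(r"[^{\r\n}]", "x{y}\r\n")
print(braces)   # ['x', 'y']
```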

RegEx is really powerful, but sometimes hard to read.

And, BTW: what is wrong or too complicated in the RegEx I tried? (I don't get it.)

Regards, Rudi.


JFee

You are absolutely right on all of your remarks... the {'s are not supposed to be there.

StringRegExpReplace($string, "(</(?:.*?)>)[^" & @CRLF & "]", "\1" & @CRLF)

And in your original one, the replacement only put back the end of the match ("><"), not the entire </tag>.
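A Python sketch of this version (`re.sub()` standing in for `StringRegExpReplace()`, same sample string as the first post) suggests one more problem: `[^\r\n]` is part of the match, so the character after each end tag, usually the `<` of the next tag, is swallowed and `\1` does not put it back; an end tag at the very end of the string never matches at all. A negative lookahead `(?!\r\n)`, swapped in here as an alternative and not what the post above uses, only checks what follows without consuming it.

```python
import re

s = '<Value1>"some value"</Value1><Value2>"Another Value goes here"</Value2>'

# Class version: [^\r\n] is part of the match, so the "<" after "</Value1>" is dropped,
# and "</Value2>" at the end of the string has no following character to match.
consumed = re.sub(r"(</.*?>)[^\r\n]", r"\1" + "\r\n", s)

# Lookahead version: (?!\r\n) only checks that no CRLF follows; nothing extra is consumed.
lookahead = re.sub(r"(</[^>]*>)(?!\r\n)", r"\1" + "\r\n", s)

print(repr(consumed))
print(repr(lookahead))

# Running the lookahead version a second time changes nothing,
# because every end tag is already followed by CRLF.
again = re.sub(r"(</[^>]*>)(?!\r\n)", r"\1" + "\r\n", lookahead)
print(again == lookahead)  # True
```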


Regards, Josh

rudi

Hi.

This is still not working; no replacements happen.

That's what I want to replace:

<section1>SectName<value1>name1</value1><value2>V2</value2></section1><section2>...

What I want to "reg-ex-look-for" are the closing tags ("</value1>", "</value2>", "</section1>").

More precisely: the final ">" of each "</some name>".

That ">" shall be replaced with ">" & @CRLF.

How do I do that?

Regards, Rudi.


smashly

Hi, is this what you mean?
$string='<Value1>"some value"</Value1><Value2>"Another Value goes here"</Value2>'
$SRER = StringRegExpReplace($string,"(\><)", ">" & @CRLF & "<")
MsgBox(0, "Replacements: " & @extended, $SRER)
Edited by smashly

rudi


This will replace ANY occurrence of "><" with ">" & @CRLF & "<".

But I want to do so ONLY if the closing ">" belongs to an XML closing tag, like "</{name}>".
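One way to express exactly that condition, sketched with Python's `re` in place of `StringRegExpReplace()`: capture a complete end tag and require, via the lookahead `(?=<)`, that another tag follows. Because a lookahead is not part of the match, the next tag's `<` stays where it is. The sample string `<a><b>x</b><c>y</c></a>` is made up for illustration.

```python
import re

s = "<a><b>x</b><c>y</c></a>"  # made-up sample

# Splits at every "><", including the one right after the opening tag <a>.
every_gap = re.sub(r"><", ">\r\n<", s)

# Splits only where the ">" ends an end tag (</name>) and another tag follows.
after_end_tags = re.sub(r"(</[^>]*>)(?=<)", r"\1" + "\r\n", s)

print(repr(every_gap))       # '<a>\r\n<b>x</b>\r\n<c>y</c>\r\n</a>'
print(repr(after_end_tags))  # '<a><b>x</b>\r\n<c>y</c>\r\n</a>'
```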

Regards, Rudi.


rudi

Hi.

I've understood now that with "()" I get groups that can be referenced later in the replacement with "\1", "\2", ...

So my idea now is to use "(</.*)" to match the start of closing XML tags. Making it "(</.*?)" matches not the longest but the shortest possible part.

Using the regex "(</.*?)(>)" on the string "</closing tag>" results in \1 = "</closing tag" and \2 = ">", and, as expected, only the closing XML tags match when using StringRegExp(). But I fail to add a @CRLF after the closing ">" with StringRegExpReplace().

; trying to insert a @CRLF after closing XML tags...

#include <array.au3>

$string="<value1>Val1</value1><Sect2>Sect2<value2.1>Text</value2.1><value2.2>text</value2.2></Sect2>"
;                    ^^^^^^^^^ match one                 ^^^^^^^^^ match 2       ^^^^^^^^^^^^^^^^^^^^ 3 + 4
$regEx="(</.+?)(>)" 
$RegExReplace="\1\2" & @CRLF
MsgBox(0,"Original string:",$string & @LF & "Search expression: '" & $regEx & "'" & @LF & "Replace Expression:'" & $RegExReplace & "'")

; stringregexp *DOES* catch what I want:
$Result=StringRegExp($string,$regEx,3)
_ArrayDisplay($Result,"Four matches, split apart, as expected")

; how to replace this "catch" with stringregexpreplace() ??
StringRegExpReplace($string, $regEx, $RegExReplace)

MsgBox(0,"no @CRLFs inserted :(",$string)

What is my mistake?
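A likely answer, checked below with Python's `re` as a stand-in: the pattern and the `\1\2` back-references look fine, but `StringRegExpReplace()` returns the modified string and leaves `$string` untouched, so the call would need to read `$string = StringRegExpReplace($string, $regEx, $RegExReplace)`. Python's `re.sub()` has the same return-a-copy behaviour, and `re.subn()` additionally returns the replacement count, roughly what AutoIt puts in `@extended`.

```python
import re

original = ("<value1>Val1</value1><Sect2>Sect2<value2.1>Text</value2.1>"
            "<value2.2>text</value2.2></Sect2>")
reg_ex = r"(</.+?)(>)"
replacement = r"\1\2" + "\r\n"

string = original
re.sub(reg_ex, replacement, string)  # return value discarded: string is unchanged
print(string == original)            # True

# Keep the returned string; re.subn() also returns the number of replacements.
string, count = re.subn(reg_ex, replacement, string)
print(count)  # 4
print(string.split("\r\n"))
```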

Regards, Rudi.

Edited by rudi

weaponx

You probably shouldn't brute-force format XML; it could get seriously corrupted, especially when namespaces and CDATA sections come into play. You should use the XML UDF along with a stylesheet.

Something like this:

http://skew.org/xml/stylesheets/reindent/

-or-

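<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>
     <xsl:template match="@*|node()">
          <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
     </xsl:template>
 </xsl:stylesheet>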

You might have to search around for an XSL file that works for your needs.
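For comparison, the parse-first idea can be sketched with Python's standard-library `xml.dom.minidom`, whose `toprettyxml()` re-serializes a parsed document with indentation. This is a swapped-in tool, not the XML UDF or an XSL stylesheet, and the sample string is made up; since the input is parsed as XML first, malformed markup raises an error instead of being silently rearranged.

```python
from xml.dom import minidom

xml = ('<bookstore><book><title lang="eng">Harry Potter</title>'
       '<price>29.99</price></book></bookstore>')

# Parse first, then re-serialize with two-space indentation.
pretty = minidom.parseString(xml).toprettyxml(indent="  ")
print(pretty)
```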

Edited by weaponx

rudi

> You probably shouldn't be brute force formatting XML, it could get seriously corrupted.

OK. This is just about VIEWING an XML file that is "10 pages in two lines". I know now that I can do that in IE as well.

Thanks for your reply, the URL, and the sample.

How do I use these XSL files? I honestly have no clue...

Just to improve my knowledge of StringRegExpReplace() (regardless of whether it's wise to use it here): do you see why my approach didn't work as I expected?

Regards, Rudi.


Squirrely1

I have an includes file that apparently comes with the latest beta of AutoIt. It has a function for this:

_XMLTransform ( oXMLDoc, Style = "",szNewDoc="" )

I don't think you have to know much about xsl to just apply the transformation to your xml.

I have tried the paste-a-string method, and weaponx is right: that method kind of defeats the (supposed) simplicity of working with XML.


The bunny uses radar.

rudi

Hi.

> [new function in the current beta]

Thanks, I'll get the latest beta and try that one.

BTW: maybe somebody can point out the mistake in the StringRegExpReplace() syntax I used? I'd like to understand what I missed.

Regards, Rudi.


Squirrely1

Rudi - once you learn, then teach me - I couldn't regExp my way out of a wet paper bag.


rudi

@weaponx

> I would just do this:
>
> $string='<Value1>"some value"</Value1><Value2>"Another Value goes here"</Value2>'
> $string = StringReplace($string, ">", ">" & @CRLF)
> $string = StringReplace($string, "<", @CRLF & "<")
> $string = StringReplace($string, @CRLF & @CRLF, @CRLF)
> ConsoleWrite($string)
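The three plain replacements map one-to-one onto Python's `str.replace()`, which, like `StringReplace()` with its default settings, replaces every occurrence. A quick sketch on the sample string shows one side effect: the result also starts with a CRLF, because the very first `<` gets one in front of it too.

```python
s = '<Value1>"some value"</Value1><Value2>"Another Value goes here"</Value2>'

s = s.replace(">", ">\r\n")        # line break after every tag
s = s.replace("<", "\r\n<")        # line break before every tag
s = s.replace("\r\n\r\n", "\r\n")  # collapse the doubled breaks between adjacent tags

print(repr(s))
```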
Yes, that's fine. But what I'm INTERESTED in, in order to UNDERSTAND RegEx in general and StringRegExpReplace() in particular, is this one:

"Add a @CRLF behind each closing XML tag".

Green = the beginning of a closing tag ("</").

Blue = the tag name (don't touch it; it's only there to "catch" the right ">" chars).

Red = the final ">" of a closing tag.

So I expected this behaviour:

RegEx = "(</)" should catch the beginning of closing tags = "</", right?

RegEx = "(</.*)" should match from the beginning, largest possible match?

RegEx = "(</.*?)" should match the smallest match instead of the largest?

RegEx = "(>)" should match the final ">". That's the one to add a @CRLF behind it.

So finally:

RegEx = "(</.*?)(>)" will return the "green+blue" part as "\1" and the "red" part as "\2".

When adding ?: to the first group (making it non-capturing), will just the 2nd one become "\1"?

RegEx = "(?:</.*?)(>)"
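A Python sketch (its group numbering follows the same rules as AutoIt's engine) on a single end tag shows how the groups are numbered. Note that the non-capturing syntax is `(?:...)`, question mark first; `(:?...)` is an ordinary capturing group that begins with an optional colon. In either form the replacement still overwrites the whole match.

```python
import re

tag = "</closing tag>"

m1 = re.match(r"(</.*?)(>)", tag)    # two capturing groups
m2 = re.match(r"(?:</.*?)(>)", tag)  # (?:...) does not capture
m3 = re.match(r"(:?</.*?)(>)", tag)  # (:?...) DOES capture; ":?" is an optional colon

print(m1.groups())  # ('</closing tag', '>')
print(m2.groups())  # ('>',)
print(m3.groups())  # ('</closing tag', '>')

# The whole match is replaced, so with the non-capturing version
# "\1" alone drops the tag name.
dropped = re.sub(r"(?:</.*?)(>)", r"\1" + "\r\n", tag)
print(repr(dropped))  # '>\r\n'
```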

Where is my mistake? Or how to use this with StringRegExpReplace()?

Regards, Rudi.


weaponx

I still think you are trying to over-simplify something that is not simple.

$xml = '<?xml version="1.0" encoding="ISO-8859-1"?><bookstore><book><title lang="eng">Harry Potter</title><price>29.99</price></book><book><title lang="eng">Learning XML</title><price>39.95</price></book></bookstore>'
$new_xml = StringRegExpReplace($xml, '<.+?>',"$0" & @CRLF)
ConsoleWrite($new_xml & @CRLF)

<?xml version="1.0" encoding="ISO-8859-1"?>
 <bookstore>
 <book>
 <title lang="eng">
 Harry Potter
 </title>
 <price>
 29.99
 </price>
 
 </book>
 <book>
 <title lang="eng">
 Learning XML
 </title>
 <price>
 39.95
 </price>
 
 </book>
 
 </bookstore>

This leaves extra whitespace. Next you would have to replace the double carriage returns, so this takes 3 passes. I don't think this can be done correctly in a single pass.

Edited by weaponx

rudi

> I still think you are trying to over-simplify something that is not simple.

This final question was not about actually using this, but about seeing what my mistake was.

> $new_xml = StringRegExpReplace($xml, '<.+?>',"$0" & @CRLF)

And there I CAN see it now:

1.) I can catch the closing XML tags in *one* RegEx group.

2.) the first group match is referenced through "$0" (or "\0") and NOT through "\1", as I tried to use.

Thanks, Rudi.


dhaley77

> 2.) the first group match is referenced through "$0" (or "\0") and NOT through "\1", as I tried to use.

I know this is kinda old, but just in case someone is searching...

$0 doesn't actually refer to the first group; it refers to the entire match (like "&" in UNIX/POSIX regex replacements).

In your case, you don't need "()" to capture anything.
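A quick Python check of the difference (`re` as a stand-in; AutoIt's `$0`/`\0` is spelled `\g<0>` there, because a bare `\0` in a Python replacement string means something else). Group 0 is the entire match, and a pattern without parentheses has no group 1 at all.

```python
import re

s = "<title>Harry Potter</title><price>29.99</price>"

m = re.search(r"</[^>]*>", s)
print(m.group(0))   # </title>  (group 0 is the entire match)
print(m.re.groups)  # 0         (the pattern has no capturing groups)

# Whole-match reference in the replacement: AutoIt's "\0" / "$0" is r"\g<0>" here.
out = re.sub(r"</[^>]*>", r"\g<0>" + "\r\n", s)
print(repr(out))  # '<title>Harry Potter</title>\r\n<price>29.99</price>\r\n'
```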

Here's how I'd do this with regexp:

$xml = '<?xml version="1.0" encoding="ISO-8859-1"?><bookstore><book><title lang="eng">Harry Potter</title><price>29.99</price></book><book><title lang="eng">Learning XML</title><price>39.95</price></book></bookstore>'

$new_xml = StringRegExpReplace($xml, "</[^>]*>", "\0" & @CRLF)

ConsoleWrite($new_xml)

This will output:

<?xml version="1.0" encoding="ISO-8859-1"?><bookstore><book><title lang="eng">Harry Potter</title>
<price>29.99</price>
</book>
<book><title lang="eng">Learning XML</title>
<price>39.95</price>
</book>
</bookstore>

Also, if you were trying to add a @CRLF before the end tag (like in weaponx's example), you'd only need two passes. The second pass would just be needed to strip out the extra @CRLFs. Since this is only two passes and the patterns aren't complex, I went ahead and nested the StringRegExpReplace calls (the output of the inner call is the input to the outer call).

Here's the code modified to add a @CRLF before and after each end tag and then strip the extra ones:

$xml = '<?xml version="1.0" encoding="ISO-8859-1"?><bookstore><book><title lang="eng">Harry Potter</title><price>29.99</price></book><book><title lang="eng">Learning XML</title><price>39.95</price></book></bookstore>'

$new_xml = StringRegExpReplace(StringRegExpReplace($xml, "</[^>]*>", @CRLF & "\0" & @CRLF), @CRLF & @CRLF, @CRLF)

ConsoleWrite($new_xml)

Output:

<?xml version="1.0" encoding="ISO-8859-1"?><bookstore><book><title lang="eng">Harry Potter
</title>
<price>29.99
</price>
</book>
<book><title lang="eng">Learning XML
</title>
<price>39.95
</price>
</book>
</bookstore>
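The nested two-pass version translates directly to Python's `re` (again only a stand-in for `StringRegExpReplace()`, with `\g<0>` as the whole-match reference). Run on the same bookstore string, it reproduces the output shown above line for line.

```python
import re

xml = ('<?xml version="1.0" encoding="ISO-8859-1"?><bookstore><book>'
       '<title lang="eng">Harry Potter</title><price>29.99</price></book>'
       '<book><title lang="eng">Learning XML</title><price>39.95</price>'
       '</book></bookstore>')

# Pass 1 (inner call): CRLF before and after every end tag </...>.
pass1 = re.sub(r"</[^>]*>", "\r\n" + r"\g<0>" + "\r\n", xml)
# Pass 2 (outer call): collapse the doubled CRLFs between adjacent end tags.
new_xml = re.sub("\r\n\r\n", "\r\n", pass1)

print(new_xml)
```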

:P

dan
