rudi Posted July 19, 2008

Hi.

I'd like to break lines in a given XML file after each "</some value>", so that it's easy to read with ANY editor, e.g. Notepad.

Can't I use StringRegExpReplace() to do so? StringRegExp() DOES correctly catch the substring where I want to insert a CRLF.

#include <array.au3>
$string='<Value1>"some value"</Value1><Value2>"Another Value goes here"</Value2>'
$SplitThis=StringRegExp($string,"\</.*(\>\<)",3)
_ArrayDisplay($SplitThis)
StringRegExpReplace($string,"\</.*(\>\<)","\>" & @CRLF & "\<")
MsgBox(0,"Error: " & @error & ", Ext: " & @extended,$string) ; it's still "one line" :(

Regards, Rudi.

Earth is flat, pigs can fly, and Nuclear Power is SAFE!
JFee Posted July 20, 2008

I think you're making this hard on yourself.

StringRegExpReplace($string, "(</(?:.*?)>)[^{" & @CRLF & "}]", "\1" & @CRLF)

Not tested, but it should work. I'll test when I get to my dev machine.

Regards, Josh
rudi Posted July 20, 2008

> I think you're making this hard on yourself.
> StringRegExpReplace($string, "(</(?:.*?)>)[^{" & @CRLF & "}]", "\1" & @CRLF)

Hm. Let me ask step by step what you are doing there.

1.) "(</": so "<" doesn't need to be escaped (\<), right?
2.) (?: ....): from the help file, "non-capturing group". If I got it correctly, it has to match, but doesn't show up anywhere in the resulting array? (?:.*) = any character, zero or more occurrences. And what is the trailing "?" in (?:.*?) for?
3.) [^ ....: opening a class definition that allows everything BUT the listed items ("[" opens the class, "^" negates it)? Is that to prevent inserting another @CRLF in case there is already a CRLF in there?
4.) [^{" & @CRLF & "}]: from the "{" on I've lost you. All I know about "{...}" is that it defines the number of occurrences, like {1,} or {4,9}.

RegEx is really powerful, but sometimes hard to read.

And, BTW: what is wrong, or too complicated, in the RegEx I've tried? (I don't get it.)

Regards, Rudi.
JFee Posted July 21, 2008

You are absolutely right on all of your remarks. The {'s are not supposed to be there.

StringRegExpReplace($string, "(</(?:.*?)>)[^" & @CRLF & "]", "\1" & @CRLF)

And in your original one you were replacing just the end of the match, not the entire </tag>.

Regards, Josh
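One thing to watch with the pattern above: a negated character class such as [^...] has to match a real character, so the character after the closing tag becomes part of the match and is dropped by the replacement, and a closing tag at the very end of the string is not matched at all. A minimal sketch of the same idea using a negative lookahead instead, assuming AutoIt's PCRE-based regex engine; the variable names are only illustrative:

```autoit
; Sketch only: $sXml and $sResult are illustrative names, not from the thread.
$sXml = '<Value1>"some value"</Value1><Value2>"Another Value goes here"</Value2>'

; (?!\r\n) is a negative lookahead: it only checks that no CRLF follows the
; closing tag and does not consume any character, so nothing after the tag is lost.
; The return value has to be stored; the input string itself is not changed.
$sResult = StringRegExpReplace($sXml, "(</[^>]*>)(?!\r\n)", "\1" & @CRLF)

ConsoleWrite($sResult)
```

Running the replacement a second time on its own output should add no further line breaks, because every closing tag is then already followed by a CRLF.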
rudi Posted July 23, 2008

Hi.

This is still not working, no replacements happen.

This is what I want to replace:

<section1>SectName<value1>name1</value1><value2>V2</value2></section1><section2>...

The parts marked in red are what I want the regex to look for. The underlined ">" characters are the closing ones of "</some name>". They shall be replaced with ">" & @CRLF.

How to?

Regards, Rudi.
smashly Posted July 23, 2008

> This is still not working, no replacements happen.
> This is what I want to replace:
> <section1>SectName<value1>name1</value1><value2>V2</value2></section1><section2>...
> The parts marked in red are what I want the regex to look for. The underlined ">" characters are the closing ones of "</some name>". They shall be replaced with ">" & @CRLF.
> How to?

Hi, is this what you mean?

$string='<Value1>"some value"</Value1><Value2>"Another Value goes here"</Value2>'
$SRER = StringRegExpReplace($string,"(\><)", ">" & @CRLF & "<")
MsgBox(0, "Replacements: " & @extended, $SRER)

Edited July 23, 2008 by smashly
rudi Posted July 23, 2008

> Hi, is this what you mean?
> $string='<Value1>"some value"</Value1><Value2>"Another Value goes here"</Value2>'
> $SRER = StringRegExpReplace($string,"(\><)", ">" & @CRLF & "<")
> MsgBox(0, "Replacements: " & @extended, $SRER)

This will replace ANY occurrence of "><" with ">" & @CRLF & "<". But I want to do so ONLY if the closing ">" belongs to an XML closing tag, like "</{name}>".

Regards, Rudi.
rudi Posted July 23, 2008

Hi.

I've now understood that "()" gives me groups that can be referenced later in the replace part with "\1", "\2", ...

So my idea now is to use "(</.*)" to match the start of closing XML tags. Making it "(</.*?)" matches not the largest but the smallest possible part.

Using the regex "(</.*?)(>)" on the string "</closing tag>" results in \1 = "</closing tag" and \2 = ">", and as expected only the closing XML tags match when using StringRegExp(). But I fail to add a @CRLF after the closing ">" with StringRegExpReplace().

; trying to insert a @CRLF after closing XML tags...
#include <array.au3>
$string="<value1>Val1</value1><Sect2>Sect2<value2.1>Text</value2.1><value2.2>text</value2.2></Sect2>"
; expected matches: </value1> (one), </value2.1> (two), </value2.2> and </Sect2> (three and four)
$regEx="(</.+?)(>)"
$RegExReplace="\1\2" & @CRLF
MsgBox(0,"Original string:",$string & @LF & "Search expression: '" & $regEx & "'" & @LF & "Replace Expression:'" & $RegExReplace & "'")
; stringregexp *DOES* catch what I want:
$Result=StringRegExp($string,$regEx,3)
_ArrayDisplay($Result,"Four matches, split apart, as expected")
; how to replace this "catch" with stringregexpreplace() ??
StringRegExpReplace($string, $regEx, $RegExReplace)
MsgBox(0,"no @CRLFs inserted :(",$string)

What is my mistake?

Regards, Rudi.

Edited July 23, 2008 by rudi
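A detail in the script above that is easy to miss: StringRegExpReplace() returns the modified string and leaves its input untouched, so a call whose return value is discarded cannot change what the later MsgBox shows. A minimal sketch of the same replacement with the result stored; the variable names are illustrative, and the pattern and replacement are the ones from the post:

```autoit
; Sketch only: $sInput, $sOutput and $iCount are illustrative names.
$sInput = "<value1>Val1</value1><Sect2>Sect2<value2.1>Text</value2.1><value2.2>text</value2.2></Sect2>"

; Store the return value; $sInput itself stays exactly as it was.
$sOutput = StringRegExpReplace($sInput, "(</.+?)(>)", "\1\2" & @CRLF)

; @extended holds the number of replacements; read it right after the call,
; before another function call resets it.
$iCount = @extended

ConsoleWrite("Replacements: " & $iCount & @CRLF)
ConsoleWrite($sOutput)
```

With the result assigned, the "\1\2" back-references can stay as they are; writing the output with ConsoleWrite() instead of MsgBox() is only a convenience for viewing long strings.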
weaponx Posted July 23, 2008

You probably shouldn't be brute-force formatting XML; it could get seriously corrupted, especially when namespaces and CDATA areas come into play. You should use the XML UDF along with a stylesheet.

Something like this:

http://skew.org/xml/stylesheets/reindent/

-or-

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>
</xsl:stylesheet>

You might have to search around for an XSL file that works for your needs.

Edited July 23, 2008 by weaponx
rudi Posted July 23, 2008

> You probably shouldn't be brute force formatting XML, it could get seriously corrupted.

OK. This is just about VIEWING an XML file that is "10 pages in two lines". I know now that I can do so in IE as well.

> Especially when namespaces and CDATA areas come into play. You should use the XML UDF along with a stylesheet.
> Something like this: http://skew.org/xml/stylesheets/reindent/
> -or-
> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
> <xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>
> </xsl:stylesheet>
> You might have to search around for an XSL file that works for your needs.

Thanks for your reply, the URL and the sample. How do I use these XSL files? I honestly have no clue...

Just to improve my knowledge of StringRegExpReplace() (no matter if it's wise to use it here): do you see why my approach didn't work as I expected?

Regards, Rudi.
Squirrely1 Posted July 23, 2008

I have an include file that apparently comes with the latest beta of AutoIt. It has a function for this:

_XMLTransform ( oXMLDoc, Style = "",szNewDoc="" )

I don't think you have to know much about XSL to just apply the transformation to your XML. I have tried the paste-a-string method and weaponx is right; that method kind of defeats the (supposed) simplicity of working with XML.

The bunny uses radar
rudi Posted July 23, 2008

Hi.

> [new function in the current beta]

Thanks, I'll get the latest beta and try that one.

BTW: Maybe somebody can point out the mistake in the StringRegExpReplace() syntax I have used? I'd like to understand what I missed.

Regards, Rudi.
weaponx Posted July 23, 2008

I would just do this:

$string='<Value1>"some value"</Value1><Value2>"Another Value goes here"</Value2>'
$string = StringReplace($string, ">", ">" & @CRLF)
$string = StringReplace($string, "<", @CRLF & "<")
$string = StringReplace($string, @CRLF & @CRLF, @CRLF)
ConsoleWrite($string)
Squirrely1 Posted July 23, 2008

Rudi, once you learn, then teach me. I couldn't regExp my way out of a wet paper bag.
rudi Posted July 24, 2008

@weaponx

> I would just do this:
> $string='<Value1>"some value"</Value1><Value2>"Another Value goes here"</Value2>'
> $string = StringReplace($string, ">", ">" & @CRLF)
> $string = StringReplace($string, "<", @CRLF & "<")
> $string = StringReplace($string, @CRLF & @CRLF, @CRLF)
> ConsoleWrite($string)

Yes, that's fine. What I'm INTERESTED in, to UNDERSTAND RegEx in general and StringRegExpReplace() in particular, is this one: "Add a @CRLF behind each closing XML tag".

Green = the beginning of a closing tag.
Blue = the tag name (don't touch these, they are only there to "catch" the right ">" chars).
Red = the final ">" of a closing tag.

So I expected this behaviour:

RegEx = "(</)" should catch the beginning of closing tags = "</", right?
RegEx = "(</.*)" should match from the beginning, largest possible match?
RegEx = "(</.*?)" should match the smallest match instead of the largest?
RegEx = "(>)" should match the final ">". That's the one to add a @CRLF behind.

So finally: RegEx = "(</.*?)(>)" will return the "green+blue" part as "\1" and the "red" part as "\2".

When adding a ?: to the first group, will just the second one be "\1"? RegEx = "(?:</.*?)(>)"

Where is my mistake? Or how do I use this with StringRegExpReplace()?

Regards, Rudi.
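The expectations listed above can be checked directly with StringRegExp() and flag 3 (return all matches). The following is only a sketch on a shortened, made-up input string; the variable names are illustrative, and the trailing comments show what each line is expected to print under standard PCRE greedy and lazy matching:

```autoit
#include <Array.au3>

; Sketch only: $sXml and the $a... arrays are illustrative names.
$sXml = "<a>1</a><b>2</b>"

; Greedy: .* runs to the end of the string, so there is a single long match.
$aGreedy = StringRegExp($sXml, "(</.*)", 3)
ConsoleWrite(_ArrayToString($aGreedy, "|") & @CRLF) ; expected: </a><b>2</b>

; Lazy with nothing after it: .*? matches as little as possible, here nothing.
$aLazy = StringRegExp($sXml, "(</.*?)", 3)
ConsoleWrite(_ArrayToString($aLazy, "|") & @CRLF) ; expected: </|</

; Two capturing groups: group 1 and group 2 are returned for every match.
$aTwo = StringRegExp($sXml, "(</.*?)(>)", 3)
ConsoleWrite(_ArrayToString($aTwo, "|") & @CRLF) ; expected: </a|>|</b|>

; (?:...) still has to match but is not captured, so ">" becomes group 1.
$aOne = StringRegExp($sXml, "(?:</.*?)(>)", 3)
ConsoleWrite(_ArrayToString($aOne, "|") & @CRLF) ; expected: >|>
```

Note that the lazy quantifier only stops "at the right place" when something follows it in the pattern, here the "(>)" group; on its own it matches the empty string.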
weaponx Posted July 24, 2008

I still think you are trying to over-simplify something that is not simple.

$xml = '<?xml version="1.0" encoding="ISO-8859-1"?><bookstore><book><title lang="eng">Harry Potter</title><price>29.99</price></book><book><title lang="eng">Learning XML</title><price>39.95</price></book></bookstore>'
$new_xml = StringRegExpReplace($xml, '<.+?>',"$0" & @CRLF)
ConsoleWrite($new_xml & @CRLF)

<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
<book>
<title lang="eng">
Harry Potter
</title>
<price>
29.99
</price>
</book>
<book>
<title lang="eng">
Learning XML
</title>
<price>
39.95
</price>
</book>
</bookstore>

This leaves extra whitespace. Next you will have to replace double carriage returns, which takes 3 passes. I don't think this can be done correctly in a single pass.

Edited July 24, 2008 by weaponx
rudi Posted July 25, 2008

> I still think you are trying to over-simplify something that is not simple.

This final question was not about really using this, but about seeing what my mistake is.

> $new_xml = StringRegExpReplace($xml, '<.+?>',"$0" & @CRLF)

And there I CAN see it now:

1.) I can catch the closing XML tags in *one* RegEx group.
2.) The first group match is referenced through "$0" (or "\0") and NOT through "\1", as I tried to use.

Thanks, Rudi.
dhaley77 Posted August 7, 2008

> 2.) the first group match is referenced through "$0" (or "\0") and NOT through "\1", as I tried to use.

I know this is kinda old, but just in case someone is searching...

$0 doesn't actually refer to the first group; it refers to the entire matched text (like "&" in UNIX/POSIX/etc. regex). In your case, you don't need "()" to capture anything. Here's how I'd do this with regexp:

$xml = '<?xml version="1.0" encoding="ISO-8859-1"?><bookstore><book><title lang="eng">Harry Potter</title><price>29.99</price></book><book><title lang="eng">Learning XML</title><price>39.95</price></book></bookstore>'
$new_xml = StringRegExpReplace($xml, "</[^>]*>", "\0" & @CRLF)
ConsoleWrite($new_xml)

This will output:

<?xml version="1.0" encoding="ISO-8859-1"?><bookstore><book><title lang="eng">Harry Potter</title>
<price>29.99</price>
</book>
<book><title lang="eng">Learning XML</title>
<price>39.95</price>
</book>
</bookstore>

Also, if you were trying to add a @CRLF before the end tag (like in weaponx's example), you'd only need two passes. The second pass is only needed to strip out the extra @CRLFs. Since this is only two passes and the patterns aren't complex, I went ahead and nested the StringRegExpReplace commands (the output from the inner command is the input to the outer command).

Here's the code modified to add the @CRLF before and after the end tag and then strip the extras:

$xml = '<?xml version="1.0" encoding="ISO-8859-1"?><bookstore><book><title lang="eng">Harry Potter</title><price>29.99</price></book><book><title lang="eng">Learning XML</title><price>39.95</price></book></bookstore>'
$new_xml = StringRegExpReplace(StringRegExpReplace($xml, "</[^>]*>", @CRLF & "\0" & @CRLF), @CRLF & @CRLF, @CRLF)
ConsoleWrite($new_xml)

Output:

<?xml version="1.0" encoding="ISO-8859-1"?><bookstore><book><title lang="eng">Harry Potter
</title>
<price>29.99
</price>
</book>
<book><title lang="eng">Learning XML
</title>
<price>39.95
</price>
</book>
</bookstore>

dan
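To make the difference between the whole match and the numbered groups visible, the back-references can be printed side by side. A minimal sketch with a made-up one-tag input; the variable name and the square-bracket markers are only illustrative:

```autoit
; Sketch only: $sTag is an illustrative name.
$sTag = "</title>"

; \0 (or $0) is the entire match; \1 and \2 are the first and second
; capturing groups, counted by their opening parentheses.
ConsoleWrite(StringRegExpReplace($sTag, "(</[^>]*)(>)", "[\0] [\1] [\2]") & @CRLF)
; expected: [</title>] [</title] [>]
```

So "\1\2" & @CRLF and "\0" & @CRLF should produce the same replacement for this pattern; "\0" is just shorter and needs no parentheses in the pattern.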