leuce Posted May 31, 2011

G'day everyone

I need to check a string to ensure that it does not contain any characters that are illegal in XML. I'm not talking about entity or tag errors (although I will want to check those as well, in a separate process), but about the fact that some characters are not allowed in XML at all. Is there an existing function that will check a string for such characters?

Thanks
Samuel
Mat Posted May 31, 2011

The easiest way would be to make a list of valid characters and use StringRegExp. Here is an example where I only allow 'printing' characters (including spaces). Chr(15) is not valid, so StringRegExp returns 0 for the second test.

```autoit
; The second test string contains Chr(15), a control character that XML does not allow
Local $asTests[2] = ["<test>Valid xml</test>", "<test2>Invalid xml" & Chr(15) & "</test2>"]
Local $sValid = "[:print:]" ; allowed characters (note: [:print:] excludes TAB, CR and LF)
For $i = 0 To UBound($asTests) - 1
    ; \A...\z anchor the whole string, so any character outside the class makes StringRegExp return 0
    MsgBox(0, $asTests[$i], StringRegExp($asTests[$i], "\A[" & $sValid & "]*\z"))
Next
```
leuce Posted May 31, 2011

"The easiest way would be to make a list of valid characters and use StringRegExp."

Almost every Unicode character is legal in XML. The illegal ones form a small set (most C0 control characters, the surrogate code points U+D800 to U+DFFF, and U+FFFE/U+FFFF), and only a handful of them commonly occur in the types of files that I want to check. So I suspected it would be simpler and faster to check for those few than to list all the legal characters. I found a list of the illegal characters in someone else's Perl/Python script:

http://www.proz.com/forum/cat_tools_technical_help/200111-tmx_fixer.html#1747147

```perl
s/[\x00-\x08]|\x0B|\x0C|[\x0E-\x1F]//g;
```

...so I'll see if I can figure out how to convert that to AutoIt regex syntax and use that.

Thanks for the code snippet.
Samuel
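A minimal AutoIt sketch of that conversion, assuming AutoIt's PCRE-based StringRegExp and StringRegExpReplace: the four alternatives in the Perl pattern collapse into one character class covering every C0 control character except TAB, LF and CR. The names $sBadXml, $sSample and $sCleaned are illustrative only. Like the Perl original, this covers only the ASCII control range; it does not flag U+FFFE, U+FFFF or unpaired surrogates.

```autoit
; Sketch: the Perl pattern from the TMX fixer script, rewritten as a single PCRE character class.
; It matches 0x00-0x08, 0x0B, 0x0C and 0x0E-0x1F, i.e. every C0 control except TAB, LF and CR.
; $sBadXml, $sSample and $sCleaned are illustrative names, not part of any UDF.
Local Const $sBadXml = "[\x00-\x08\x0B\x0C\x0E-\x1F]"

Local $sSample = "<test2>Invalid xml" & Chr(15) & "</test2>"

; With the default flag, StringRegExp returns 1 if the pattern matches anywhere in the string, else 0
If StringRegExp($sSample, $sBadXml) Then ConsoleWrite("Illegal XML character found" & @CRLF)

; StringRegExpReplace with an empty replacement strips every match, like Perl's s///g
Local $sCleaned = StringRegExpReplace($sSample, $sBadXml, "")
ConsoleWrite($sCleaned & @CRLF) ; <test2>Invalid xml</test2>
```

Detecting and stripping are kept separate here so a script can report which files needed cleaning before it rewrites them.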
PsaltyDS Posted June 8, 2011

Which characters can appear in a given XML file depends on the encoding specified in its XML declaration. That will make it very complicated if you want to cover all encoding cases. How about just loading the XML string with _XMLLoadXML() and checking whether the function reports an error?
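If the _XMLDomWrapper UDF is not at hand, the same idea can be sketched by calling MSXML directly through COM. This assumes Windows with MSXML 6 installed (ProgID Msxml2.DOMDocument.6.0); it is also an assumption that _XMLLoadXML() relies on MSXML in a similar way. _IsWellFormedXml is a hypothetical helper name. Because loadXML checks well-formedness as a whole, a failure can also come from the tag or entity errors mentioned in the first post, which is why the sketch can report parseError.reason.

```autoit
; Sketch only: parse the string with MSXML through COM (Windows, MSXML 6 assumed).
; _IsWellFormedXml is a hypothetical helper, not part of any UDF.
Func _IsWellFormedXml($sXml, $bReport = False)
    Local $oDoc = ObjCreate("Msxml2.DOMDocument.6.0")
    If Not IsObj($oDoc) Then Return SetError(1, 0, False) ; MSXML 6 not available
    $oDoc.async = False
    ; loadXML returns False when the text is not well-formed (illegal characters,
    ; bad tags, undefined entities, ...); parseError then describes the first problem
    If Not $oDoc.loadXML($sXml) Then
        If $bReport Then ConsoleWrite("Line " & $oDoc.parseError.line & ", col " & $oDoc.parseError.linepos & ": " & $oDoc.parseError.reason & @CRLF)
        Return False
    EndIf
    Return True
EndFunc

; Usage with the test strings from earlier in the thread
Local $asTests[2] = ["<test>Valid xml</test>", "<test2>Invalid xml" & Chr(15) & "</test2>"]
For $i = 0 To UBound($asTests) - 1
    ConsoleWrite($asTests[$i] & " -> " & _IsWellFormedXml($asTests[$i], True) & @CRLF)
Next
```

The trade-off is that a parser reports only the first error it meets, whereas the regex approach finds every illegal character in one pass.") Then Exit 1
If _IsWellFormedXml("<test2>Invalid xml" & Chr(15) & "</test2>") Then Exit 1
If _IsWellFormedXml("<test>unclosed") Then Exit 1
</test>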