FTPMonster

Special characters in _XMLUpdateField

6 posts in this topic

I have a weird issue going on with _XMLUpdateField. I'm trying to update a field with a very specific regex, but it keeps adding &amp; everywhere I want just a &. Here are the specifics of what I've tried so far:

This:

"(?" & chr(38) & "<=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})"

results in this:

(?&amp;lt;=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})

This one:

"(?&lt;=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})"

results in:

(?&amp;lt;=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})

And this one:

"(?&&lt;=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})"

results in:

(?&amp;&amp;lt;=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})

All of these are the end bits of

_XMLUpdateField("//TransactionDate/RegexForYear","(?&&lt;=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})")

this example being the last attempt.

It does find the field correctly, and non-special character fields update without issue. Is there something I'm missing?

#2 ·  Posted (edited)

For valid HTML/XML, certain characters (&, >, <) must be encoded. When you read the actual values back from the DOM objects, they are converted back to the literal characters.

Give me a small reproducer, and I'll see if I can help.

; Wrap a small fragment in a root element so it parses as a document
$sXMLBeginWrap = '<?xml version="1.0"?><items>'
$sXML = '<img><a>&amp;SomeValue</a></img>'
$sXMLEndWrap = "</items>"
$oXML = ObjCreate("Microsoft.XMLDOM")
$oXML.loadxml($sXMLBeginWrap & $sXML & $sXMLEndWrap)
ConsoleWrite($oXML.xml) ; the serialized XML keeps the &amp; entity
$oImgCol = $oXML.SelectNodes("//img")
For $oItem In $oImgCol
    ConsoleWrite($oItem.SelectSingleNode("./a").text & @CRLF) ; .text returns the decoded value
Next

The output is converted back to the plain &:

&SomeValue

So you can run the regexp search against the .text of the node.

If you set the node's .text to the regexp, it is automatically escaped into valid XML (the &lt;, &gt; and &amp; codes) when the document is serialized, so just pass it straight in with the literal <, > and &.

Full demonstration:

$sXMLBeginWrap = '<?xml version="1.0"?><items>'
$sXML = '<img><a>&amp;SomeValue</a></img>'
$sXMLEndWrap = "</items>"

$test1 = "(?" & chr(38) & "<=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})"
$oXML = ObjCreate("Microsoft.XMLDOM")
$oXML.loadxml($sXMLBeginWrap & $sXML & $sXMLEndWrap)
ConsoleWrite($oXML.xml)
$oImgCol = $oXML.SelectNodes("//img")
For $oItem In $oImgCol
 ConsoleWrite("original value: " & $oItem.SelectSingleNode("./a").text & @CRLF)
 ConsoleWrite("update to $test1= " & $test1 & @CRLF )
 $oItem.SelectSingleNode("./a").text = $test1
 ConsoleWrite("$oItem.SelectSingleNode('./a').text= " & $oItem.SelectSingleNode("./a").text & @CRLF )
 ConsoleWrite("$oItem.SelectSingleNode('./a').outerxml= " & $oItem.SelectSingleNode("./a").xml & @CRLF )
Next

output:

original value: &SomeValue

update to $test1= (?&<=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})

$oItem.SelectSingleNode('./a').text= (?&<=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})

$oItem.SelectSingleNode('./a').outerxml= <a>(?&amp;&lt;=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})</a>

Edited by jdelaney

IEbyXPATH-Grab IE DOM objects by XPATH IEscriptRecord-Makings of an IE script recorder ExcelFromXML-Create Excel docs without excel installed GetAllWindowControls-Output all control data on a given window.

#3 ·  Posted (edited)

First off, thank you SO MUCH for the quick reply. Bosses are breathing down my neck on this one. :)

What I need it to say is specifically

(?&lt;=^\s*\d+\s+\w+\s+\w+\s+\w+\s+)([0-9]{4})

Now, with your help, here's the code I have:

$XMLImportFile = _XMLFileOpen('c:\FTPMonster\data\sourcefile.xml', 'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"')
$id3 = _XMLGetValue("//TransactionDate/RegexForYear")
If $id3[1] <> "(?&lt;=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})" Then

    $sXMLBeginWrap = '<?xml version="1.0"?><items>'
    $test1 = "<img><a>(?&lt;=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})</a></img>"
    $sXMLEndWrap = "</items>"

    $oXML = ObjCreate("Microsoft.XMLDOM")
    $oXML.loadxml($sXMLBeginWrap & $test1 & $sXMLEndWrap)
    $oImgCol = $oXML.SelectNodes("//img")
    For $oItem In $oImgCol
        _XMLUpdateField("//TransactionDate/RegexForYear", $oItem.SelectSingleNode("./a").text)
    Next
EndIf

Now the result is

<RegexForYear>(?lt;=^\s*\d+\s+\w+\s+\w+\s+\w+\s+)([0-9]{4})</RegexForYear>

Note how it stripped the & from before "lt". I tested with both .text and .xml versions of $oItem.SelectSingleNode("./a") to no avail.

Edited by FTPMonster

#4 ·  Posted (edited)

Haha, I only provided my XML as an example. Don't actually load it in, or it will override your XML. It was just meant to demonstrate the difference between viewing the data through .xml and through node .text.

_XMLGetValue should (probably) return the .text, but you will need to look inside the function to verify. If it does return .text, then &lt; will come back as <.

Please provide the name of the UDF you are using; I'm unfamiliar with those function calls (_XMLUpdateField).

This should work (if _XMLGetValue returns the text of the node):

$id3 = _XMLGetValue ("//TransactionDate/RegexForYear")
if $id3[1]<>"(?<=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})" then
ConsoleWrite("success: " & $id3[1] & @CRLF)
Else
ConsoleWrite("failure: " & $id3[1] & @CRLF)
endif

Send back the output if it doesn't.

Also, copy the node directly out of your XML and post it. Maybe your XML is missing the "&" on &lt; (which would mean your XML data itself needs to be modified).

example: <RegexForYear>blahblablbah</RegexForYear>
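
To see what is actually stored, a minimal sketch along these lines may help. It assumes the file path and XPath from the earlier post and uses plain MSXML rather than the UDF; $sFile, $oDoc and $oNode are just illustrative names. The .xml property shows the node as it is written in the file, while .text shows the decoded value.

```autoit
; Sketch only: the path and XPath are copied from the posts above
Local $sFile = 'c:\FTPMonster\data\sourcefile.xml'
Local $oDoc = ObjCreate("Microsoft.XMLDOM")
$oDoc.async = False ; load synchronously so the document is ready on return

If Not $oDoc.load($sFile) Then
    ConsoleWrite("load failed: " & $oDoc.parseError.reason & @CRLF)
    Exit 1
EndIf

Local $oNode = $oDoc.selectSingleNode("//TransactionDate/RegexForYear")
If IsObj($oNode) Then
    ConsoleWrite("as stored (.xml):   " & $oNode.xml & @CRLF)
    ConsoleWrite("as decoded (.text): " & $oNode.text & @CRLF)
Else
    ConsoleWrite("node not found" & @CRLF)
EndIf
```

If the .xml line shows lt; without the leading &, the file itself holds the wrong value and needs to be corrected there.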

Edited by jdelaney

I'll test this right now, but in the meantime the UDF is <_XMLDomWrapper.au3>.

It worked! :)

The interesting thing is, it always goes into the "success" branch, even if the value is exactly the same as what I want to change it to, because of the &lt; to < conversion BS. IMO, that's acceptable behavior. :)

It is working, and being deployed into production right now. :)
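
For anyone who wants to skip the write when nothing has changed, here is a rough sketch of a compare-then-update step. It assumes (as suggested above, not verified against the UDF source) that _XMLGetValue returns the decoded node text in element [1] and that the value passed to _XMLUpdateField is escaped when the document is written. $sXPath, $sWanted and $bNeedsUpdate are hypothetical names; the pattern is the one from post #3, written with a literal <.

```autoit
#include <_XMLDomWrapper.au3>

Local Const $sXPath = "//TransactionDate/RegexForYear"
; Literal "<" here: the DOM is expected to store it as &lt; on its own
Local Const $sWanted = "(?<=^\s*\d+\s+\w+\s+\w+\s+\w+\s+)([0-9]{4})"

_XMLFileOpen('c:\FTPMonster\data\sourcefile.xml', 'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"')
Local $aCurrent = _XMLGetValue($sXPath)

; Assume an update is needed unless the current value can be read and matches
Local $bNeedsUpdate = True
If IsArray($aCurrent) And UBound($aCurrent) > 1 Then
    ; == is the case-sensitive string comparison; <> would ignore case
    $bNeedsUpdate = Not ($aCurrent[1] == $sWanted)
EndIf

If $bNeedsUpdate Then
    _XMLUpdateField($sXPath, $sWanted)
    ConsoleWrite("updated" & @CRLF)
Else
    ConsoleWrite("already up to date" & @CRLF)
EndIf
```

The comparison is against the decoded text, so the expected string contains < rather than &lt;.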
