Special characters in _XMLUpdateField


I have a weird issue going on with _XMLUpdateField. I'm trying to update a field with a very specific regex, but it keeps writing &amp; everywhere I want just a plain &. Here are the specifics of what I've tried so far:

This:

"(?" & chr(38) & "<=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})"

results in this:

(?&amp;lt;=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})

This one:

"(?&lt;=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})"

results in:

(?&amp;lt;=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})

And this one:

"(?&&lt;=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})"

results in:

(?&amp;&amp;lt;=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})

All of these are the end bits of

_XMLUpdateField("//TransactionDate/RegexForYear","(?&&lt;=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})")

this example being the last attempt.

It does find the field correctly, and non-special character fields update without issue. Is there something I'm missing?


If you want valid HTML/XML, you need to encode certain characters (&, >, <). When you read the actual values back from the DOM objects, they are converted back.

Give me a small reproducer, and I'll see if I can help.

$sXMLBeginWrap = '<?xml version="1.0"?><items>'
$sXML = '<img><a>&amp;SomeValue</a></img>'
$sXMLEndWrap = "</items>"
$oXML = ObjCreate("Microsoft.XMLDOM")
$oXML.loadxml($sXMLBeginWrap & $sXML & $sXMLEndWrap)
ConsoleWrite($oXML.xml)
$oImgCol = $oXML.SelectNodes("//img")
For $oItem In $oImgCol
ConsoleWrite($oItem.SelectSingleNode("./a").text & @CRLF)
Next

The output is converted back to the plain &:

&SomeValue

So you can do the regexp search against the .text of the node.

If you assign the regexp to .text, the special characters are converted automatically to valid entities (&lt;, &gt;, &amp;) when the XML is written out, so just put it straight in with the literal <, > and &.

Full demonstration:

$sXMLBeginWrap = '<?xml version="1.0"?><items>'
$sXML = '<img><a>&amp;SomeValue</a></img>'
$sXMLEndWrap = "</items>"

$test1 = "(?" & chr(38) & "<=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})"
$oXML = ObjCreate("Microsoft.XMLDOM")
$oXML.loadxml($sXMLBeginWrap & $sXML & $sXMLEndWrap)
ConsoleWrite($oXML.xml)
$oImgCol = $oXML.SelectNodes("//img")
For $oItem In $oImgCol
 ConsoleWrite("original value: " & $oItem.SelectSingleNode("./a").text & @CRLF)
 ConsoleWrite("update to $test1= " & $test1 & @CRLF )
 $oItem.SelectSingleNode("./a").text = $test1
 ConsoleWrite("$oItem.SelectSingleNode('./a').text= " & $oItem.SelectSingleNode("./a").text & @CRLF )
 ConsoleWrite("$oItem.SelectSingleNode('./a').outerxml= " & $oItem.SelectSingleNode("./a").xml & @CRLF )
Next

output:

original value: &SomeValue
update to $test1= (?&<=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})
$oItem.SelectSingleNode('./a').text= (?&<=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})
$oItem.SelectSingleNode('./a').outerxml= <a>(?&amp;&lt;=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})</a>

Edited by jdelaney

First off, thank you SO MUCH for the quick reply. Bosses are breathing down my neck on this one. :)

What I need it to say is specifically

(?&lt;=^\s*\d+\s+\w+\s+\w+\s+\w+\s+)([0-9]{4})

Now, with your help, here's the code I have:

$XMLImportFile = _XMLFileOpen('c:\FTPMonster\data\sourcefile.xml', 'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"')
$id3 = _XMLGetValue ("//TransactionDate/RegexForYear")
if $id3[1]<>"(?&lt;=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})" then

$sXMLBeginWrap = '<?xml version="1.0"?><items>'
$test1 = "<img><a>(?&lt;=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})</a></img>"
$sXMLEndWrap = "</items>"

$oXML = ObjCreate("Microsoft.XMLDOM")
$oXML.loadxml($sXMLBeginWrap & $test1 & $sXMLEndWrap)
$oImgCol = $oXML.SelectNodes("//img")
For $oItem In $oImgCol
_XMLUpdateField("//TransactionDate/RegexForYear",$oItem.SelectSingleNode("./a").text)
Next
endif

Now the result is

<RegexForYear>(?lt;=^\s*\d+\s+\w+\s+\w+\s+\w+\s+)([0-9]{4})</RegexForYear>

Note how it stripped the & from before "lt". I tested with both .text and .xml versions of $oItem.SelectSingleNode("./a") to no avail.

Edited by FTPMonster

haha, I just provided my xml as examples...don't actually load it in, or it will override your xml...just meant to demonstrate the diff between viewing the data through xml vs node.text

The _XMLGetValue should (probably) return the .text, but you will need to jump into the function to verify...if it returns .text, then it will return &lt; as <

Please provide the name of the UDF you are using; I'm unfamiliar with those function calls (_XMLUpdateField).

This should work (if _XMLGetValue returns the text of the node):

$id3 = _XMLGetValue ("//TransactionDate/RegexForYear")
if $id3[1]<>"(?<=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})" then
ConsoleWrite("success: " & $id3[1] & @CRLF)
Else
ConsoleWrite("failure: " & $id3[1] & @CRLF)
endif

If it doesn't, send back the output.

Also, copy the node directly out of your XML and post it. Maybe your XML is missing the "&" on &lt; (which means your XML data needs to be modified).

example: <RegexForYear>blahblablbah</RegexForYear>
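
One way to see whether the file itself is missing the "&" is to print the node both as raw XML and as decoded text. This is only a rough sketch that bypasses the UDF and talks to Microsoft.XMLDOM directly; it assumes the file path and XPath from the earlier post, and that the elements are not in a default namespace (if they are, the XPath will not match as written).

```autoit
; Sketch only: path and XPath are copied from the earlier post, nothing here is verified against the real file.
Local $oXML = ObjCreate("Microsoft.XMLDOM")
$oXML.async = False ; load synchronously so the DOM is ready on return

If Not $oXML.load("c:\FTPMonster\data\sourcefile.xml") Then
    ; parseError.reason explains why the file was rejected (e.g. a bare & or <)
    ConsoleWrite("load failed: " & $oXML.parseError.reason & @CRLF)
    Exit 1
EndIf

Local $oNode = $oXML.selectSingleNode("//TransactionDate/RegexForYear")
If IsObj($oNode) Then
    ConsoleWrite("xml : " & $oNode.xml & @CRLF)  ; serialized form, entities as stored
    ConsoleWrite("text: " & $oNode.text & @CRLF) ; decoded form, what a regex engine should see
Else
    ConsoleWrite("node not found" & @CRLF)
EndIf
```

If the "xml" line shows lt; without a leading &, the stored data is wrong rather than the code that reads it.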

Edited by jdelaney

It worked! :)

The interesting thing is, it always goes into the "success" branch, even if the value is exactly the same as what I want to change it to, because of the &lt; to < conversion BS. IMO, that's acceptable behavior. :)

It is working, and being deployed into production right now. :)
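
For anyone who wants the update to be skipped when the value is already correct, here is a minimal sketch of a compare-then-update done directly on the DOM. It is not what the UDF does internally; the inline XML is a made-up stand-in for the real file, and the comparison uses == because AutoIt's <> ignores case on strings, which matters for a regex (\d versus \D).

```autoit
; Sketch only: the XML string below is a hypothetical stand-in for sourcefile.xml.
Local $sPattern = "(?<=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})" ; literal <, no entity

Local $oXML = ObjCreate("Microsoft.XMLDOM")
$oXML.loadXML("<TransactionDate><RegexForYear>old</RegexForYear></TransactionDate>")

Local $oNode = $oXML.selectSingleNode("//TransactionDate/RegexForYear")
If IsObj($oNode) Then
    ; .text is already decoded, so compare it with the literal pattern.
    ; == is case sensitive; <> would treat \d and \D as equal.
    If Not ($oNode.text == $sPattern) Then
        $oNode.text = $sPattern ; the DOM escapes < as &lt; when it serializes
        ConsoleWrite("updated: " & $oNode.xml & @CRLF)
    Else
        ConsoleWrite("already current: " & $oNode.text & @CRLF)
    EndIf
EndIf
```

The point of the design is that both sides of the comparison are in decoded form, so the entity conversion never enters into it.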
