FTPMonster Posted April 19, 2013

I have a weird issue going on with _XMLUpdateField. I'm trying to update a field with a very specific regex, but it keeps adding &amp; everywhere I want just a &. Here are the specifics of what I've tried so far.

This:
    "(?" & chr(38) & "<=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})"
results in this:
    (?&lt;=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})

This one:
    "(?<=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})"
results in:
    (?&lt;=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})

And this one:
    "(?&<=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})"
results in:
    (?&&lt;=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})

All of these are the last argument of a call like
    _XMLUpdateField("//TransactionDate/RegexForYear","(?&<=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})")
this example being the last attempt. It does find the field correctly, and fields without special characters update without issue. Is there something I'm missing?
jdelaney Posted April 19, 2013 (edited)

If you want valid html/xml, you need to encode certain chars (&, >, <). When you return the actual values from the DOM objects, they will be converted back. Give me a small reproducer, and I'll see if I can help.

    $sXMLBeginWrap = '<?xml version="1.0"?><items>'
    $sXML = '<img><a>&amp;SomeValue</a></img>'
    $sXMLEndWrap = "</items>"
    $oXML = ObjCreate("Microsoft.XMLDOM")
    $oXML.loadxml($sXMLBeginWrap & $sXML & $sXMLEndWrap)
    ConsoleWrite($oXML.xml)
    $oImgCol = $oXML.SelectNodes("//img")
    For $oItem In $oImgCol
        ; .text returns the decoded value of the node
        ConsoleWrite($oItem.SelectSingleNode("./a").text & @CRLF)
    Next

The output is converted back to the &:

    &SomeValue

So you can do the regexp search against the .text of the node.

If you set the .text to the regexp, it will be auto-converted to valid values (the &[lt|gt|etc]; codes), so just add it straight in with the literal <, >, &.

Full demonstration:

    $sXMLBeginWrap = '<?xml version="1.0"?><items>'
    $sXML = '<img><a>&amp;SomeValue</a></img>'
    $sXMLEndWrap = "</items>"
    $test1 = "(?" & chr(38) & "<=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})"
    $oXML = ObjCreate("Microsoft.XMLDOM")
    $oXML.loadxml($sXMLBeginWrap & $sXML & $sXMLEndWrap)
    ConsoleWrite($oXML.xml)
    $oImgCol = $oXML.SelectNodes("//img")
    For $oItem In $oImgCol
        ConsoleWrite("original value: " & $oItem.SelectSingleNode("./a").text & @CRLF)
        ConsoleWrite("update to $test1= " & $test1 & @CRLF)
        ; assign the raw string; the DOM encodes it when it serializes
        $oItem.SelectSingleNode("./a").text = $test1
        ConsoleWrite("$oItem.SelectSingleNode('./a').text= " & $oItem.SelectSingleNode("./a").text & @CRLF)
        ConsoleWrite("$oItem.SelectSingleNode('./a').outerxml= " & $oItem.SelectSingleNode("./a").xml & @CRLF)
    Next

output:

    original value: &SomeValue
    update to $test1= (?&<=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})
    $oItem.SelectSingleNode('./a').text= (?&<=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})
    $oItem.SelectSingleNode('./a').outerxml= <a>(?&amp;&lt;=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})</a>
FTPMonster Posted April 19, 2013 (edited)

First off, thank you SO MUCH for the quick reply. Bosses are breathing down my neck on this one.

What I need the file to say is specifically

    (?&lt;=^\s*\d+\s+\w+\s+\w+\s+\w+\s+)([0-9]{4})

Now, with your help, here's the code I have:

    $XMLImportFile = _XMLFileOpen('c:\FTPMonster\data\sourcefile.xml', 'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"')
    $id3 = _XMLGetValue ("//TransactionDate/RegexForYear")
    if $id3[1]<>"(?<=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})" then
        $sXMLBeginWrap = '<?xml version="1.0"?><items>'
        $test1 = "<img><a>(?<=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})</a></img>"
        $sXMLEndWrap = "</items>"
        $oXML = ObjCreate("Microsoft.XMLDOM")
        $oXML.loadxml($sXMLBeginWrap & $test1 & $sXMLEndWrap)
        $oImgCol = $oXML.SelectNodes("//img")
        For $oItem In $oImgCol
            _XMLUpdateField("//TransactionDate/RegexForYear",$oItem.SelectSingleNode("./a").text)
        Next
    endif

Now the result is

    <RegexForYear>(?lt;=^\s*\d+\s+\w+\s+\w+\s+\w+\s+)([0-9]{4})</RegexForYear>

Note how it stripped the & from before "lt". I tested with both the .text and .xml versions of $oItem.SelectSingleNode("./a"), to no avail.
jdelaney Posted April 19, 2013 (edited)

haha, I just provided my xml as an example. Don't actually load it in, or it will override your xml. It was just meant to demonstrate the difference between viewing the data through .xml vs node.text.

_XMLGetValue should (probably) return the .text, but you will need to jump into the function to verify. If it returns .text, then it will return &lt; as <.

Please provide the name of the UDF you are using; I'm unfamiliar with those function calls (_XMLUpdateField).

This should work (if _XMLGetValue returns the text of the node):

    $id3 = _XMLGetValue ("//TransactionDate/RegexForYear")
    if $id3[1]<>"(?<=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})" then
        ConsoleWrite("success: " & $id3[1] & @CRLF)
    Else
        ConsoleWrite("failure: " & $id3[1] & @CRLF)
    endif

Send back the output if it doesn't.

Also, copy the node directly out of your xml and send it. Maybe your xml is missing the "&" in front of lt; (which means your xml data needs to be modified).

example:

    <RegexForYear>blahblablbah</RegexForYear>
FTPMonster Posted April 19, 2013

I'll test this right now, but in the meantime the UDF is <_XMLDomWrapper.au3>.
FTPMonster Posted April 19, 2013

It worked! The interesting thing is, it always goes into the "success" branch, even if the value is exactly the same as what I want to change it to, because of the &lt; to < conversion BS. IMO, that's acceptable behavior. It is working, and being deployed into production right now.
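For anyone landing here later: the "always goes into the success branch" behavior can probably be avoided by comparing against the decoded form of the regex (with a literal <) instead of the escaped form. Below is a minimal sketch of that idea, not the code that went into production. It talks to Microsoft.XMLDOM directly, as in jdelaney's demo, because the internals of _XMLDomWrapper.au3 are not shown in this thread. The sample XML string and the names $sWanted and $oNode are made up for illustration.

```autoit
; Sketch only: compare against the decoded node text, not the escaped markup.
; $sWanted (hypothetical name) is the regex as it reads after XML parsing.
Local $sWanted = "(?<=^\s*\d+\s+\w+\s+\w+\s+)([0-9]{4})"

; Made-up sample document standing in for the real source file.
Local $oXML = ObjCreate("Microsoft.XMLDOM")
$oXML.loadXML("<TransactionDate><RegexForYear>old</RegexForYear></TransactionDate>")
Local $oNode = $oXML.selectSingleNode("//TransactionDate/RegexForYear")

; "==" is the case-sensitive string comparison in AutoIt ("=" and "<>" ignore
; case), which matters for a regex where \w and \W mean different things.
If Not ($oNode.text == $sWanted) Then
    ; Assign the raw string; the DOM escapes "<" as "&lt;" when it serializes.
    $oNode.text = $sWanted
    ConsoleWrite("updated: " & $oNode.xml & @CRLF)
Else
    ConsoleWrite("already current" & @CRLF)
EndIf

; On a second check .text still holds the literal "<", so nothing needs doing.
If $oNode.text == $sWanted Then ConsoleWrite("no change needed" & @CRLF)
```

If _XMLGetValue does return the node's .text, as jdelaney suspected, the same comparison should carry over to $id3[1], but that is an assumption to verify inside the UDF.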