MarcusD Posted July 8, 2015 Posted July 8, 2015 Hi,this is my first post I Need your help regarding XML . I have a huge XML file and would like to extract some values and export to a txt file. my first try :#include <Array.au3> #Region ;**** Directives created by AutoIt3Wrapper_GUI **** #AutoIt3Wrapper_Change2CUI=y #EndRegion ;**** Directives created by AutoIt3Wrapper_GUI **** $hXML = FileOpen("VERK1405.xml") If $hXML = -1 Then MsgBox(0, "Error", "Unable to open file.") Exit EndIf $sXML = FileRead($hXML) FileClose($hXML) $a_AcctNbr = StringRegExp($sXML, '<AcctNbr>(.*?)</AcctNbr>', 3) $a_AmexNbr = StringRegExp($sXML, '<AmexOfcNbr>(.*?)</AmexOfcNbr>', 3) $a_InvNbr = StringRegExp($sXML, '<InvNbr>(.*?)</InvNbr>', 3) $a_CreNbr = StringRegExp($sXML, '<CrNoteNbr>(.*?)</CrNoteNbr>',3) For $i = 0 to UBound($a_AcctNbr) - 1 ConsoleWrite ($i &" - "&$a_AcctNbr[$i] &";"&$a_AmexNbr[$i] & ";" &$a_InvNbr[$i]& ";"&$a_CreNbr[$i]&@CRLF) NextIt works but in some cases the <CrNoteNbr> is available and in some not - If there is a Credit Note Number i would like to use it instead of the InvNbr - but how ?How can make the Array unique with the InvNvr ? Thanks for your help CheersMarcus
jguinch Posted July 8, 2015 Posted July 8, 2015 can you provide a piece of this XML file ? Spoiler Network configuration UDF, _DirGetSizeByExtension, _UninstallList Firefox ConfigurationArray multi-dimensions, Printer Management UDF
drquochoai Posted July 8, 2015 Posted July 8, 2015 why don't you try to search a XML reader of #include files, i'm trying for you, and find _XMLDomWrapper.au3 , he use DOM$objDoc = ObjCreate("Msxml2.DOMDocument." & $iVer & ".0")He's really better than read a file as tring like your code.You can try it by download the _XMLDomWrapper.au3 at the link below, then copy downloaded file to your Autoit Include folder (at your install directory of autoit)Read the code online at: https://github.com/Silvernine0S/FolderMenu3EX/blob/master/Include/_XMLDomWrapper.au3Download at: https://github.com/Silvernine0S/FolderMenu3EX/raw/master/Include/_XMLDomWrapper.au3To use it, at begining of your autoit file, add:#include <Array.au3> ;... or anything you like, then #include <_XMLDomWrapper.au3> ; --> include this file, besure he's at your Include foler, mine is: E:\Portable programs\AutoIT\IncludeBelow is all function of this file:expandcollapse popup_XMLCreateFile Creates an XML file with the given name and root. _XMLFileOpen Creates an instance of an XML file. _XMLSaveDoc Save the xml doc, use $iForce = 1 to force save if AutoSave is off. ; ========================================================================== _XMLGetChildNodes Selects XML child Node(s) of an element based on XPath input from root node. _XMLGetNodeCount Get node count for specified path and type. _XMLGetPath Returns a nodes full path based on XPath input from root node. ; ========================================================================== _XMLSelectNodes Selects XML Node(s) based on XPath input from root node. _XMLGetField Get XML Field(s) based on XPath input from root node. _XMLGetValue Get XML Field based on XPath input from root node. _XMLGetChildText Selects XML child Node(s) of an element based on XPath input from root node. _XMLUpdateField Update existing node(s) based on XPath specs. _XMLReplaceChild Replaces a node with a new node. ; ========================================================================== _XMLDeleteNode Delete specified XPath node. _XMLDeleteAttr Delete attribute for specified XPath _XMLDeleteAttrNode Delete attribute node for specified XPath ; ========================================================================== _XMLGetAttrib Get XML attribute based on XPath input from root node. _XMLGetAllAttrib Get all XML Field(s) attributes based on XPath input from root node. _XMLGetAllAttribIndex Get all XML Field(s) attributes based on Xpathn and specific index. _XMLSetAttrib Set XML Field(s) attributes based on XPath input from root node. ; ========================================================================== _XMLCreateCDATA Create a CDATA SECTION node directly under root. _XMLCreateComment Create a COMMENT node at specified path. _XMLCreateAttrib Creates an attribute for the specified node. ; ========================================================================== _XMLCreateRootChild Create node directly under root. _XMLCreateRootNodeWAttr Create a child node under root node with attributes. _XMLCreateChildNode Create a child node under the specified XPath Node. _XMLCreateChildWAttr Create a child node under the specified XPath Node with Attributes. ; ========================================================================== _XMLSchemaValidate Validate a document against a DTD. _XMLGetDomVersion Returns the XSXML version currently in use. _XMLError Sets or Gets XML error message generated by XML functions. _XMLUDFVersion Returns the UDF Version number. _XMLTransform Transfroms the document using built-in sheet or xsl file passed to function. _XMLNodeExists Checks for the existence of the specified path. ; ========================================================================== _XMLSetAutoFormat Turn auto indenting on or off. _XMLSetAutoSave Set the automatic save to on or off.To use the function , just search in the _XMLDomWrapper.au3 file, or read it online by the link abow, find the function you want to use, read his Parameters .Example read your file by:#include <_XMLDomWrapper.au3> $myxml = _XMLFileOpen("VERK1405.xml")
SadBunny Posted July 8, 2015 Posted July 8, 2015 Indeed, you should use a "smart" xml parser. Because in your example, the problem is that if the "CrNoteNbr" is not available in all records, the regex result array for that optional field will just have fewer elements and your values will not match up with the other fields any more.Take this xml:<record> <fieldA>content1</field1> <fieldB>content2</field2> </record> <record> <fieldA>contentBLAH</fieldA> </record> <record> <fieldA>contentFOO</fieldA> <fieldB>contentBAZ</fieldB> </record>Parsing fieldA and fieldB with your method will yield an array with three elements for fieldA and an array with two elements for fieldB, but the second element in the fieldB array will have the content from the third record (because it's just the second occurrence of "fieldB". So this breaks your code's assumption that all array indexes represent the record count.XML with optional fields cannot be parsed in the way you are doing it. What you would need to do, given the example above, is first get an array of record (<record>(.*)</record>), then for each element in that array, do a new regex match to get the fields. Then simply take the field that you want and assign it to the variable you want.But apart from all that, your approach comes with tons of headaches. What about if no results are returned? What if there are commented sections in the XML? What if there are special characters in escaped XML sequences? That kind of stuff. You are much, MUCH better off handling this through a well-thought-through xml dom parser. Especially if you are going to parse actual real life financial data. Don't know what you're going to use it for but this kind of quick&dirty approach is almost certainly going to bite you in the ass. Roses are FF0000, violets are 0000FF... All my base are belong to you.
MarcusD Posted July 8, 2015 Author Posted July 8, 2015 Indeed, you should use a "smart" xml parser. Because in your example, the problem is that if the "CrNoteNbr" is not available in all records, the regex result array for that optional field will just have fewer elements and your values will not match up with the other fields any more.Take this xml:<record> <fieldA>content1</field1> <fieldB>content2</field2> </record> <record> <fieldA>contentBLAH</fieldA> </record> <record> <fieldA>contentFOO</fieldA> <fieldB>contentBAZ</fieldB> </record>Parsing fieldA and fieldB with your method will yield an array with three elements for fieldA and an array with two elements for fieldB, but the second element in the fieldB array will have the content from the third record (because it's just the second occurrence of "fieldB". So this breaks your code's assumption that all array indexes represent the record count.XML with optional fields cannot be parsed in the way you are doing it. What you would need to do, given the example above, is first get an array of record (<record>(.*)</record>), then for each element in that array, do a new regex match to get the fields. Then simply take the field that you want and assign it to the variable you want.But apart from all that, your approach comes with tons of headaches. What about if no results are returned? What if there are commented sections in the XML? What if there are special characters in escaped XML sequences? That kind of stuff. You are much, MUCH better off handling this through a well-thought-through xml dom parser. Especially if you are going to parse actual real life financial data. Don't know what you're going to use it for but this kind of quick&dirty approach is almost certainly going to bite you in the ass.
MarcusD Posted July 8, 2015 Author Posted July 8, 2015 Hi all,thanks for your replies. here my 2nd try with the XMLDOMWrapper . expandcollapse popup#include <Array.au3> #include <_XMLDomWrapper.au3> #include <String.au3> #include<Date.au3> #include <File.au3> $heute = @YEAR & "" & @MON & "" &@MDAY $file = FileOpen("./log/"&$heute&"_log.txt", 1) $VBBT = 0 $AXTM = 0 ;$XML = FileOpen("", @ProgramFilesDir & "\autoit", "XML (*.xml)", 1) $XMLopen = _XMLFileOpen("XMLVERK1405.xml") ConsoleWrite("Debug: $iRET = " & $XMLopen & @LF) $XMLcount = _XmlGetNodeCount("//Message/TravelBatch/TravelTran") For $i = 0 To $XMLcount $AcctNbr = _XMLGetValue("TravelTran[" & $i & "]/Account/AcctNbr") $Amex = _XMLGetValue("TravelTran[" & $i & "]/Provider/AmexOfcNbr") $InvNr = _XMLGetValue("TravelTran[" & $i & "]/Provider/InvNbr") FileWrite($file, $Amex[1]&";"&$InvNr[1]&";"&$AcctNbr[1] & @CRLF) Next FileWrite($file,"008585018;VBBT TOTAL INVOICES;"&$VBBT & @CRLF) FileWrite($file,"008584961;AXTM TOTAL INVOICES;"&$AXTM & @CRLF) FileClose($file)I can´t get any values and i have the Problem with the "optional" CrNoteNbr field as well.Hopyfully one of your can help me again.ThanksMarcus XMLVERK1405.xml
SadBunny Posted July 8, 2015 Posted July 8, 2015 (edited) You're having trouble with your XPath-fu, and you didn't read the function doc correctly:;=============================================================================== ; Function Name: _XMLGetValue ; Description: Get XML values based on XPath input from root node. ; Parameter(s): $strXPath - xml tree path from root node (root/child/child..) ; Syntax: _XMLGetValue($strXPath) ; Author(s): Stephen Podhajecki <gehossafats@netmdc.com> ; Return Value(s): On Success An array of fields text values(count is in first element) ; On Failure -1 set ; @Error = 1 ; @Extended to: ; 0 = No matching node ; 1 = No object passed ;===============================================================================Firstly the XPath. If you read the returned value, the @error and the @extended with your code, you'll see that the result is set to -1, the error is set and the extended is set to 0. So there are no nodes found. As you are certain that the nodes are in there (and assuming no bugs in the UDF), the only conclusion is that you mis-specified the XPath. Try something like: "/Message/TravelBatch/TravelTran[" & $i & "]/Account/AcctNbr".Secondly the return value: the function does not return a single field, but an array of fields (because who knows, maybe there are multiple fields called AmexOfcNbr on that xpath). So you'll have to:Do the function with a valid xpathWith the returned value:if it is -1 (or alternatively if it is not an array)then no value was foundelse apparently it is not -1, in which case it must be an array (assuming bug-free UDF), so: loop through the returned array. If you know there is always one element, just take the first element of the array and assign that to your variable.If you know that the element is optional, do extra work, like:get value of AmexOfcNbrif value is an arraythen remember the first element of that arrayelse get the value of InvNbr and remember the first element of that array insteadyou may want to do some error checking in case both values return no results (your xml may be broken or your file missing or... or...)Good luck! Edited July 8, 2015 by SadBunny Roses are FF0000, violets are 0000FF... All my base are belong to you.
jguinch Posted July 8, 2015 Posted July 8, 2015 One way :$oXML = ObjCreate("Microsoft.XMLDOM") $oXML.load("XMLVERK1405.xml") $objNodeList = $oXML.getElementsByTagName("TravelTran"); $i = 0 For $oNode In $objNodeList $sTransSeqNbr = $oNode.selectSingleNode ("TransSeqNbr").text $sAmexOfcNbr = $oNode.selectSingleNode ("Provider/AmexOfcNbr").text $sInvNbr = $oNode.selectSingleNode ("Provider/InvNbr").text $sCrNoteNbr = $oNode.selectSingleNode ("Provider/CrNoteNbr").text If $sCrNoteNbr = "" Then $sCrNoteNbr = $sInvNbr ConsoleWrite("[TravelTran #" & $i & "]" & @CRLF & "TransSeqNbr " & @TAB & $sTransSeqNbr & @CRLF & "AmexOfcNbr" & @TAB & $sAmexOfcNbr & @CRLF & "InvNbr" & @TAB & $sInvNbr & @CRLF & "CrNoteNbr" & @TAB & $sCrNoteNbr & @CRLF & @CRLF) $i += 1 Next Spoiler Network configuration UDF, _DirGetSizeByExtension, _UninstallList Firefox ConfigurationArray multi-dimensions, Printer Management UDF
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now