Jump to content

Read Huge XML File


Recommended Posts

Hi,

this is my first post :) I Need your help regarding XML . I have a huge XML file and would like to extract some values and export to a txt file.

 

my first try :

#include <Array.au3>

#Region ;**** Directives created by AutoIt3Wrapper_GUI ****
#AutoIt3Wrapper_Change2CUI=y
#EndRegion ;**** Directives created by AutoIt3Wrapper_GUI ****

$hXML = FileOpen("VERK1405.xml")
If $hXML = -1 Then

    MsgBox(0, "Error", "Unable to open file.")
    Exit
EndIf

$sXML = FileRead($hXML)
FileClose($hXML)
$a_AcctNbr = StringRegExp($sXML, '<AcctNbr>(.*?)</AcctNbr>', 3)
$a_AmexNbr = StringRegExp($sXML, '<AmexOfcNbr>(.*?)</AmexOfcNbr>', 3)
$a_InvNbr = StringRegExp($sXML, '<InvNbr>(.*?)</InvNbr>', 3)
$a_CreNbr = StringRegExp($sXML, '<CrNoteNbr>(.*?)</CrNoteNbr>',3)

    For $i = 0 to UBound($a_AcctNbr) - 1


ConsoleWrite ($i &" - "&$a_AcctNbr[$i] &";"&$a_AmexNbr[$i] & ";" &$a_InvNbr[$i]& ";"&$a_CreNbr[$i]&@CRLF)





    Next

It works :) but in some cases the <CrNoteNbr> is available and in some not - If there is a Credit Note Number i would like to use it instead of the InvNbr - but how ?

How can make the Array unique with the InvNvr ?

 

Thanks for your help

 

Cheers

Marcus

 

 

 

Link to comment
Share on other sites

why don't you try to search a XML reader of #include files, i'm trying for you, and find _XMLDomWrapper.au3 , he use DOM

$objDoc = ObjCreate("Msxml2.DOMDocument." & $iVer & ".0")

He's really better than read a file as tring like your code.

You can try it by download the _XMLDomWrapper.au3 at the link below, then copy downloaded file to your Autoit Include folder (at your install directory of autoit)

Read the code online at: https://github.com/Silvernine0S/FolderMenu3EX/blob/master/Include/_XMLDomWrapper.au3

Download at: https://github.com/Silvernine0S/FolderMenu3EX/raw/master/Include/_XMLDomWrapper.au3

To use it, at begining of your autoit file, add:

#include <Array.au3>
;... or anything you like, then
#include <_XMLDomWrapper.au3> ; --> include this file, besure he's at your Include foler, mine is: E:\Portable programs\AutoIT\Include

Below is all function of this file:

_XMLCreateFile           Creates an XML file with the given name and root.
    _XMLFileOpen             Creates an instance of an XML file.
    _XMLSaveDoc              Save the xml doc,  use $iForce = 1 to force save if AutoSave is off.  
    ; ==========================================================================
    _XMLGetChildNodes        Selects XML child Node(s) of an element based on XPath input from root node. 
    _XMLGetNodeCount         Get node count for specified path and type. 
    _XMLGetPath              Returns a nodes full path based on XPath input from root node. 
    ; ==========================================================================
    _XMLSelectNodes          Selects XML Node(s) based on XPath input from root node. 
    _XMLGetField             Get XML Field(s) based on XPath input from root node.
    _XMLGetValue             Get XML Field based on XPath input from root node. 
    _XMLGetChildText         Selects XML child Node(s) of an element based on XPath input from root node. 
    _XMLUpdateField          Update existing node(s) based on XPath specs.
    _XMLReplaceChild         Replaces a node with a new node. 
    ; ==========================================================================
    _XMLDeleteNode           Delete specified XPath node.
    _XMLDeleteAttr           Delete attribute for specified XPath
    _XMLDeleteAttrNode       Delete attribute node for specified XPath
    ; ==========================================================================
    _XMLGetAttrib            Get XML attribute based on XPath input from root node.
    _XMLGetAllAttrib         Get all XML Field(s) attributes based on XPath input from root node.
    _XMLGetAllAttribIndex    Get all XML Field(s) attributes based on Xpathn and specific index.
    _XMLSetAttrib            Set XML Field(s) attributes based on XPath input from root node.
    ; ==========================================================================
    _XMLCreateCDATA          Create a CDATA SECTION node directly under root. 
    _XMLCreateComment        Create a COMMENT node at specified path.
    _XMLCreateAttrib         Creates an attribute for the specified node. 
    ; ==========================================================================
    _XMLCreateRootChild      Create node directly under root.
    _XMLCreateRootNodeWAttr  Create a child node under root node with attributes.
    _XMLCreateChildNode      Create a child node under the specified XPath Node.
    _XMLCreateChildWAttr     Create a child node under the specified XPath Node with Attributes. 
    ; ==========================================================================
    _XMLSchemaValidate       Validate a document against a DTD. 
    _XMLGetDomVersion        Returns the XSXML version currently in use. 
    _XMLError                Sets or Gets XML error message generated by XML functions.
    _XMLUDFVersion           Returns the UDF Version number. 
    _XMLTransform            Transfroms the document using built-in sheet or xsl file passed to function. 
    _XMLNodeExists           Checks for the existence of the specified path. 
    ; ==========================================================================
    _XMLSetAutoFormat        Turn auto indenting on or off.
    _XMLSetAutoSave          Set the automatic save to on or off.

To use the function , just search in the _XMLDomWrapper.au3 file, or read it online by the link abow, find the function you want to use, read his Parameters  .

Example read your file by:

#include <_XMLDomWrapper.au3>
$myxml = _XMLFileOpen("VERK1405.xml")

 

Link to comment
Share on other sites

Indeed, you should use a "smart" xml parser. Because in your example, the problem is that if the "CrNoteNbr" is not available in all records, the regex result array for that optional field will just have fewer elements and your values will not match up with the other fields any more.

Take this xml:

<record>
 <fieldA>content1</field1>
 <fieldB>content2</field2>
</record>

<record>
 <fieldA>contentBLAH</fieldA>
</record>

<record>
 <fieldA>contentFOO</fieldA>
 <fieldB>contentBAZ</fieldB>
</record>

Parsing fieldA and fieldB with your method will yield an array with three elements for fieldA and an array with two elements for fieldB, but the second element in the fieldB array will have the content from the third record (because it's just the second occurrence of "fieldB". So this breaks your code's assumption that all array indexes represent the record count.

XML with optional fields cannot be parsed in the way you are doing it. What you would need to do, given the example above, is first get an array of record (<record>(.*)</record>), then for each element in that array, do a new regex match to get the fields. Then simply take the field that you want and assign it to the variable you want.

But apart from all that, your approach comes with tons of headaches. What about if no results are returned? What if there are commented sections in the XML? What if there are special characters in escaped XML sequences? That kind of stuff. You are much, MUCH better off handling this through a well-thought-through xml dom parser. Especially if you are going to parse actual real life financial data. Don't know what you're going to use it for but this kind of quick&dirty approach is almost certainly going to bite you in the ass.

Roses are FF0000, violets are 0000FF... All my base are belong to you.

Link to comment
Share on other sites

Indeed, you should use a "smart" xml parser. Because in your example, the problem is that if the "CrNoteNbr" is not available in all records, the regex result array for that optional field will just have fewer elements and your values will not match up with the other fields any more.

Take this xml:

<record>
 <fieldA>content1</field1>
 <fieldB>content2</field2>
</record>

<record>
 <fieldA>contentBLAH</fieldA>
</record>

<record>
 <fieldA>contentFOO</fieldA>
 <fieldB>contentBAZ</fieldB>
</record>

Parsing fieldA and fieldB with your method will yield an array with three elements for fieldA and an array with two elements for fieldB, but the second element in the fieldB array will have the content from the third record (because it's just the second occurrence of "fieldB". So this breaks your code's assumption that all array indexes represent the record count.

XML with optional fields cannot be parsed in the way you are doing it. What you would need to do, given the example above, is first get an array of record (<record>(.*)</record>), then for each element in that array, do a new regex match to get the fields. Then simply take the field that you want and assign it to the variable you want.

But apart from all that, your approach comes with tons of headaches. What about if no results are returned? What if there are commented sections in the XML? What if there are special characters in escaped XML sequences? That kind of stuff. You are much, MUCH better off handling this through a well-thought-through xml dom parser. Especially if you are going to parse actual real life financial data. Don't know what you're going to use it for but this kind of quick&dirty approach is almost certainly going to bite you in the ass.

Link to comment
Share on other sites

Hi all,

thanks for your replies. here my 2nd try with the XMLDOMWrapper .

 

#include <Array.au3>
#include <_XMLDomWrapper.au3>
#include <String.au3>
#include<Date.au3>
#include <File.au3>



$heute = @YEAR & "" & @MON & "" &@MDAY
$file = FileOpen("./log/"&$heute&"_log.txt", 1)

$VBBT = 0
$AXTM = 0

;$XML = FileOpen("", @ProgramFilesDir & "\autoit", "XML (*.xml)", 1)
$XMLopen = _XMLFileOpen("XMLVERK1405.xml")
ConsoleWrite("Debug: $iRET = " & $XMLopen & @LF)
$XMLcount = _XmlGetNodeCount("//Message/TravelBatch/TravelTran")

For $i = 0 To $XMLcount



    $AcctNbr = _XMLGetValue("TravelTran[" & $i & "]/Account/AcctNbr")
    $Amex = _XMLGetValue("TravelTran[" & $i & "]/Provider/AmexOfcNbr")
    $InvNr = _XMLGetValue("TravelTran[" & $i & "]/Provider/InvNbr")


    FileWrite($file,  $Amex[1]&";"&$InvNr[1]&";"&$AcctNbr[1]  & @CRLF)



Next

    FileWrite($file,"008585018;VBBT TOTAL INVOICES;"&$VBBT & @CRLF)
    FileWrite($file,"008584961;AXTM TOTAL INVOICES;"&$AXTM & @CRLF)

   FileClose($file)

I can´t get any values and i have the Problem with the "optional" CrNoteNbr field as well.

Hopyfully one of your can help me again.

Thanks

Marcus

 

 

XMLVERK1405.xml

Link to comment
Share on other sites

You're having trouble with your XPath-fu, and you didn't read the function doc correctly:

;===============================================================================
; Function Name:     _XMLGetValue
; Description:     Get XML values based on XPath input from root node.
; Parameter(s):     $strXPath - xml tree path from root node (root/child/child..)
; Syntax:             _XMLGetValue($strXPath)
; Author(s):         Stephen Podhajecki <gehossafats@netmdc.com>
; Return Value(s):    On Success  An array of fields text values(count is in first element)
;                            On Failure -1 set
;                                @Error = 1
;                                @Extended to:
;                                        0 = No matching node
;                                        1 = No object passed
;===============================================================================

Firstly the XPath. If you read the returned value, the @error and the @extended with your code, you'll see that the result is set to -1, the error is set and the extended is set to 0. So there are no nodes found. As you are certain that the nodes are in there (and assuming no bugs in the UDF), the only conclusion is that you mis-specified the XPath. Try something like: "/Message/TravelBatch/TravelTran[" & $i & "]/Account/AcctNbr".

Secondly the return value: the function does not return a single field, but an array of fields (because who knows, maybe there are multiple fields called AmexOfcNbr on that xpath). So you'll have to:

  1. Do the function with a valid xpath
  2. With the returned value:
    1. if it is -1 (or alternatively if it is not an array)
      then no value was found
      else apparently it is not -1, in which case it must be an array (assuming bug-free UDF), so: loop through the returned array. If you know there is always one element, just take the first element of the array and assign that to your variable.
  3. If you know that the element is optional, do extra work, like:
    1. get value of AmexOfcNbr
    2. if value is an array
      then remember the first element of that array
      else get the value of InvNbr and remember the first element of that array instead
    3. you may want to do some error checking in case both values return no results (your xml may be broken or your file missing or... or...)

Good luck!

Edited by SadBunny

Roses are FF0000, violets are 0000FF... All my base are belong to you.

Link to comment
Share on other sites

One way :

$oXML = ObjCreate("Microsoft.XMLDOM")
$oXML.load("XMLVERK1405.xml")

$objNodeList = $oXML.getElementsByTagName("TravelTran");


$i = 0
For $oNode In $objNodeList
    $sTransSeqNbr = $oNode.selectSingleNode ("TransSeqNbr").text
    $sAmexOfcNbr = $oNode.selectSingleNode ("Provider/AmexOfcNbr").text
    $sInvNbr = $oNode.selectSingleNode ("Provider/InvNbr").text
    $sCrNoteNbr = $oNode.selectSingleNode ("Provider/CrNoteNbr").text
    
    If $sCrNoteNbr = "" Then $sCrNoteNbr = $sInvNbr
    
    ConsoleWrite("[TravelTran #" & $i & "]" & @CRLF & "TransSeqNbr " & @TAB & $sTransSeqNbr & @CRLF & "AmexOfcNbr" & @TAB & $sAmexOfcNbr & @CRLF & "InvNbr" & @TAB & $sInvNbr & @CRLF & "CrNoteNbr" & @TAB & $sCrNoteNbr & @CRLF & @CRLF)
    
    $i += 1
Next

 

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...