ijourneaux

Intermittent problem processing XML file


I am having an intermittent issue processing XML files. Each XML file contains one <Entities> node and multiple <Entity> nodes.

<?xml version="1.0"?>
<Entities>
  <Entity RecordType="Emerson.CSI.DataImport.MHM.WaveFormData">
    <Property Name="LastWrite_User" IsReadOnly="False" ValueType="System.Int32">42</Property>
    <Property Name="LastWrite_Time_as_String" IsReadOnly="True" ValueType="System.String">4/4/2019 5:21:20 AM</Property>
    <Property Name="LastWrite_Time_as_UInt" IsReadOnly="False" ValueType="System.UInt32">1554373280</Property>
    <Property Name="LastWrite_Time" IsReadOnly="False" ValueType="System.DateTime">4/4/2019 5:21:20 AM</Property>
    <Property Name="LastWrite_ProgramID" IsReadOnly="False" ValueType="System.Int16">37</Property>
    <Property Name="ExpertCode" IsReadOnly="False" ValueType="System.SByte">0</Property>
    <Property Name="ActualDate_as_UInt" IsReadOnly="False" ValueType="System.UInt32">1554373251</Property>
    <Property Name="ActualDate" IsReadOnly="False" ValueType="System.DateTime">4/4/2019 5:20:51 AM</Property>
    <Property Name="UnitsString" IsReadOnly="False" ValueType="System.String">
    </Property>
  </Entity>
  <Entity RecordType="Emerson.CSI.DataImport.MHM.SpectraData">
    <Property Name="LastWrite_User" IsReadOnly="False" ValueType="System.Int32">42</Property>
    <Property Name="LastWrite_Time_as_String" IsReadOnly="True" ValueType="System.String">4/4/2019 5:21:20 AM</Property>
    <Property Name="LastWrite_Time_as_UInt" IsReadOnly="False" ValueType="System.UInt32">1554373280</Property>
    <Property Name="LastWrite_Time" IsReadOnly="False" ValueType="System.DateTime">4/4/2019 5:21:20 AM</Property>
    <Property Name="LastWrite_ProgramID" IsReadOnly="False" ValueType="System.Int16">37</Property>
    <Property Name="IsTruePeak" IsReadOnly="False" ValueType="System.Boolean">False</Property>
    <Property Name="IsDigitalOverall" IsReadOnly="False" ValueType="System.Boolean">False</Property>
    <Property Name="IsZoom" IsReadOnly="False" ValueType="System.Boolean">False</Property>
    <Property Name="IsDigitallyIntegrated" IsReadOnly="False" ValueType="System.Boolean">False</Property>
    <Property Name="IsAWeighted" IsReadOnly="False" ValueType="System.Boolean">False</Property>
    <Property Name="Is3rdOctave" IsReadOnly="False" ValueType="System.Boolean">False</Property>
    <Property Name="CollectedInAnalyzeMode" IsReadOnly="False" ValueType="System.Boolean">False</Property>
    <Property Name="AnalysisFlag" IsReadOnly="False" ValueType="System.Int16">0</Property>
    <Property Name="OwningDCS" IsReadOnly="False" ValueType="System.Int32">4203</Property>
    <Property Name="ContinuationRecord" IsReadOnly="False" ValueType="System.Int32">1365089</Property>
  </Entity>
</Entities>

I would like to break up the XML file so that each output XML file contains only one <Entity> node. I used recommendations from another user to break the files apart. Conceptually it is easy:

  1. Open the XML file
  2. Verify that the minimum bits are there
  3. Select the <Entity> nodes
  4. Create a new XML file for each <Entity> node

Func BreakXMLFileApart($sFileXML)
    Local $oXmlDoc
    Local $aAttributes
    Local $oProperties
    Local $oProperty
    Local $oNode
    Local $iNodeCount
    Local $sString
    Local $i
    Local $cnt
    Local $sDrive, $sDir, $sFilename, $sExtension

    Local $aPathSplit = _PathSplit($sFileXML, $sDrive, $sDir, $sFilename, $sExtension)

    ;Create XML Document object and load XML file

    $oXmlDoc = _XML_CreateDOMDocument(Default)
    _XML_Load($oXmlDoc, $sFileXML) ;<== ENTER XML FILE PATH HERE
    If @error Then
        _WriteErrorLog(StringFormat("_XML_load error - @error = %s", @error) & @CRLF)
        _WriteErrorLog("-" & $sFileXML & " - Moved to Problem Folder")
         MoveFileToSubFolder($sFileXML, $ProblemFolder)
        return 1
    EndIf

    ;If no specified nodes exist, log error and exit
    If Not _XML_NodeExists($oXmlDoc, "//Entity") Then
        ;        ConsoleWrite("No specified nodes exist" & @CRLF)
        return 1
    EndIf
    ;Get number of Entity nodes
    $iNodeCount = _XML_GetNodesCount($oXmlDoc, "//Entity")

    If ($iNodeCount > 1) Then
        ;Get number of Property nodes
        $oEntities = _XML_SelectNodes($oXmlDoc, "//Entity")
        $iNodeCount = @extended
        $cnt = 0
        For $oEntity In $oEntities
            $cnt = $cnt + 1
            $sString = ""
            FileWrite($sDrive & $sDir & $sFilename & "-" & $cnt & $sExtension, '<?xml version="1.0"?>' & @CRLF & "<Entities>" & @CRLF & $oEntity.xml & @CRLF & "</Entities>" & @CRLF)
         Next
         consolewrite($SaveOriginalXMLFiles & @crlf)
        If ($SaveOriginalXMLFiles = "True") Then
         consolewrite($SaveOriginalXMLFiles & @crlf)

            MoveFileToSubFolder($sFileXML, $OriginalXMLFolder)
        EndIf
    EndIf
    Return 0
EndFunc   ;==>BreakXMLFileApart

This works 99% of the time. Unfortunately, in the remaining 1% of cases, the new XML file ends up with two identical sections:

<?xml version="1.0"?>
<Entities>
  <Entity RecordType="Emerson.CSI.DataImport.MHM.WaveFormData">
      <Property Name="LastWrite_User" IsReadOnly="False" ValueType="System.Int32">42</Property>
        ...
  </Entity>
</Entities>
<?xml version="1.0"?>
<Entities>
  <Entity RecordType="Emerson.CSI.DataImport.MHM.WaveFormData">
      <Property Name="LastWrite_User" IsReadOnly="False" ValueType="System.Int32">42</Property>
        ...
  </Entity>
</Entities>

It is as if the FileWrite statement is being executed twice.

I know it would be better if I uploaded a working sample, but the problem is intermittent. To make things even worse, I may see the issue the first time I process a file, and when I try a second time, it works.

I would appreciate any comments that can help me figure out what is going on.


Hey @ijourneaux

I think the main problem here is that you're using object management functions, which aren't easy to debug, so you don't have a way to see exactly what your script is doing step by step.

If I were you, I would instead read the XML file using _FileReadToArray(), locate the lines containing "<Entity" and "</Entity>", and put what's between them into a new text file with an ".xml" extension.

This way you can add debug output at every step of the processing and know exactly what's happening.

That being said, you're calling FileWrite with a file name as the first argument. This means that if a file with the same name already exists, FileWrite APPENDS to it: the new content is written at the end of the existing file instead of replacing it.
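
A rough sketch of that text-based approach, in case it helps. `_SplitEntitiesAsText` and `$sOutPrefix` are names made up for illustration, and the sketch assumes every `<Entity ...>` opening tag and every `</Entity>` closing tag sits on its own line, as in the sample file above. It is not a general XML parser.

```autoit
#include <File.au3>
#include <FileConstants.au3>

; Hypothetical helper: copies each <Entity>...</Entity> block of $sFileXML
; into its own file named $sOutPrefix & "-" & N & ".xml".
; Assumes the opening and closing Entity tags are each on their own line.
Func _SplitEntitiesAsText($sFileXML, $sOutPrefix)
    Local $aLines, $hFile, $sOutFile
    If Not _FileReadToArray($sFileXML, $aLines) Then Return SetError(1, 0, 0)

    Local $iCount = 0, $sBlock = "", $bInside = False
    For $i = 1 To $aLines[0] ; element 0 holds the line count
        ; "<Entity " (with trailing space) does not match "<Entities>"
        If StringInStr($aLines[$i], "<Entity ") Then
            $bInside = True
            $sBlock = ""
        EndIf
        If $bInside Then $sBlock &= $aLines[$i] & @CRLF
        If $bInside And StringInStr($aLines[$i], "</Entity>") Then
            $bInside = False
            $iCount += 1
            $sOutFile = $sOutPrefix & "-" & $iCount & ".xml"
            ; Open in overwrite mode so a stale file is replaced, not appended to
            $hFile = FileOpen($sOutFile, $FO_OVERWRITE)
            If $hFile = -1 Then Return SetError(2, 0, $iCount - 1)
            FileWrite($hFile, '<?xml version="1.0"?>' & @CRLF & "<Entities>" & @CRLF & $sBlock & "</Entities>" & @CRLF)
            FileClose($hFile)
            ; Debug output at each step, which is the point of doing it this way
            ConsoleWrite("Wrote " & $sOutFile & @CRLF)
        EndIf
    Next
    Return $iCount ; number of Entity files written
EndFunc   ;==>_SplitEntitiesAsText
```

The return value is the number of files written, so a caller can compare it with the expected number of Entity nodes and log a mismatch.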

Try to run this code multiple times and you'll understand what I mean:

FileWrite(@ScriptDir & "\test.txt", "this is a test " & @HOUR & "H" & @MIN & "M" & @SEC & @CRLF)

This means that if, for any reason, your program calls your XML processing function on the same file twice by mistake, FileWrite will append to the files that already exist, and they will contain duplicated content exactly as you described.

So instead of calling FileWrite with a file name as the first argument, open the file with FileOpen using mode 10 as the second argument (2 = overwrite, plus 8 = create the directory path if it doesn't exist), write through the returned handle, and then close it with FileClose. That prevents the appending from happening.

This is the best explanation I can think of. Maybe it's the cause, maybe it's not. If not, then as I said earlier, you'd be better off dropping the object processing and handling the file as text yourself. It's fast and easy :)
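
A minimal sketch of that change, assuming a small wrapper is acceptable. `_WriteFreshFile` is a hypothetical helper name, not part of AutoIt or of the XML UDF. The `$FO_OVERWRITE` and `$FO_CREATEPATH` constants come from FileConstants.au3 and add up to 10.

```autoit
#include <FileConstants.au3>

; Hypothetical helper: writes $sContent to $sPath, replacing any existing file
; instead of appending to it.
Func _WriteFreshFile($sPath, $sContent)
    ; $FO_OVERWRITE (2) + $FO_CREATEPATH (8) = 10
    Local $hFile = FileOpen($sPath, $FO_OVERWRITE + $FO_CREATEPATH)
    If $hFile = -1 Then Return SetError(1, 0, False) ; could not open the file
    FileWrite($hFile, $sContent)
    FileClose($hFile)
    Return True
EndFunc   ;==>_WriteFreshFile
```

In the loop of BreakXMLFileApart, the existing FileWrite call could then become `_WriteFreshFile($sDrive & $sDir & $sFilename & "-" & $cnt & $sExtension, ...)` with the same content string. Running the function twice on the same input would then regenerate the output files rather than doubling them.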

Edited by Neutro


You have given me a couple of great ideas. I did not appreciate that FileWrite appends. I am not sure how I would end up with the same filename twice, but it looks pretty suspicious to me.

Thanks for taking the time to comment on a code snippet. I know that it isn't the best way to ask for help.


I would do something like this...much easier:

 

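#include <File.au3>
$sMyXMLFile = "test.xml"

; Load the document with MSXML
Local $oXML = ObjCreate("Microsoft.XMLDOM")
$oXML.Load($sMyXMLFile)

; Select every <Entity> directly under the root <Entities> node
$oEntity_Nodes = $oXML.selectnodes("/Entities/Entity")

$i = 1
For $oEntity in $oEntity_Nodes
    ; _FileCreate creates the file, or empties it if it already exists,
    ; so the FileWrite below never appends to old content
    _FileCreate("XMLOutput." & $i & ".xml")
    FileWrite("XMLOutput." & $i & ".xml",$oEntity.xml)
    $i += 1
Next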

Of course, update the file name as you see fit. I just used a simple loop counter to make each name unique.

Either way, I would recommend continuing with the XML UDF or the XML DOM rather than regular expressions.

 

Just a note: no error handling is required for the way I set this up. If the XML is malformed, $oXML will not be populated with data, but .selectnodes will not blow up. It will just return an empty collection, and the loop will never be entered. All of the XMLDOM behaves like that, as long as you don't chain object references. Chained references WOULD require checking that the parent node is an object. An example would be something like this:

$oXML.selectSingleNode("/Entities").selectnodes("./Entity")

If no Entities node is found, this would blow up without an error handler.
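
One way to guard the chained case, sketched under the assumption that the file is called "test.xml" as in the example above: store the parent node in a variable and test it with IsObj before calling a method on it. The parseError check is optional and only reports why loading failed.

```autoit
Local $oXML = ObjCreate("Microsoft.XMLDOM")
If Not IsObj($oXML) Then Exit ConsoleWrite("Could not create the XML DOM object" & @CRLF)

$oXML.load("test.xml")
If $oXML.parseError.errorCode <> 0 Then
    ; Malformed or missing file: the document stays empty
    ConsoleWrite("Load failed: " & $oXML.parseError.reason & @CRLF)
EndIf

; Fetch the parent first instead of chaining the two calls
Local $oRoot = $oXML.selectSingleNode("/Entities")
If IsObj($oRoot) Then
    Local $oEntities = $oRoot.selectNodes("./Entity")
    ConsoleWrite("Entity nodes found: " & $oEntities.length & @CRLF)
Else
    ; selectSingleNode found nothing, so there is no object to call into
    ConsoleWrite("No <Entities> node found" & @CRLF)
EndIf
```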

Edited by jdelaney

