Editing a big xml file


Hi all,

I've got a huge XML file that I want to edit automatically: over 10 million rows. First I used a For...Next loop with FileReadLine, which took ages, of course. Then I tried reading the file into an array, editing the array by replacing certain values, and then writing the array back to an XML file. That worked, and it was much faster than editing the XML file itself.

#include <File.au3>
#include <Array.au3>

$filename = "OUTPUT_OPDRACHTEN_20120411_221538.xml"
Local $aArray

ConsoleWrite("Reading file..." & @LF)
; _FileReadToArray stores the line count in $aArray[0]; the lines start at index 1.
If Not _FileReadToArray($filename, $aArray) Then Exit 1
ConsoleWrite("Reading file... Ready!" & @LF)

$lines = $aArray[0]
$count = 0

ConsoleWrite("Changing..." & @LF)

For $i = 1 To $lines

    ; Replace the whole line with an empty (nil) element and count the replacements.
    If StringInStr($aArray[$i], "<BC_INCASSO>") > 0 Then
        $aArray[$i] = "<BC_INCASSO xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance"" xsi:nil=""true""/>"
        $count = $count + 1
        ;ConsoleWrite($i & @LF)
    EndIf

    If StringInStr($aArray[$i], "<OS_OMA_ONDW_OMA>Wijziging</OS_OMA_ONDW_OMA>") > 0 Then
        $aArray[$i] = "<OS_OMA_ONDW_OMA>Verlenging</OS_OMA_ONDW_OMA>"
    EndIf

Next

ConsoleWrite("Changing... Ready!" & @LF)
ConsoleWrite("Writing file" & @LF)
; 130 = 2 (overwrite) + 128 (UTF-8 with BOM). Mode 129 would append to an existing new.xml on every run.
$new = FileOpen("new.xml", 130)

For $i = 1 To $lines
    FileWriteLine($new, $aArray[$i])
Next

FileClose($new)

ConsoleWrite("Writing file... Ready!" & @LF)
ConsoleWrite($count & @LF)

But now, the next step: I would like to insert a few rows in certain places in the array, but that takes ages again. I guess that's because when I add a new value (_ArrayInsert) at, let's say, index 5 of an array with 10,000,000 values, every value after the new one has to be shifted.

If StringInStr($aArray[$i], "") > 0 Then
    $aArray[$i] = "ID1"
    _ArrayInsert($aArray, $i + 2, "")
    _ArrayInsert($aArray, $i + 3, "ID2")
    _ArrayInsert($aArray, $i + 4, "")
    ConsoleWrite($i & @LF)
EndIf

Does anybody have an idea how I can do this reasonably fast?
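To make the shifting cost described above concrete, here is a minimal sketch of the alternative it suggests: leave the array alone and emit the extra rows while writing the output, so each row is handled once. The sample array, the `<MARKER/>` line and the `<EXTRA>` row are hypothetical placeholders, not values from the real file.

```autoit
; Sketch only: the sample data, marker and inserted row are hypothetical placeholders.
; Same layout as _FileReadToArray produces: the count at [0], the lines from [1].
Local $aLines[4] = [3, "<A>1</A>", "<MARKER/>", "<B>2</B>"]

Local $hOut = FileOpen("new.xml", 2 + 128) ; 2 = overwrite, 128 = UTF-8 with BOM
If $hOut = -1 Then Exit 1

For $i = 1 To $aLines[0]
    FileWriteLine($hOut, $aLines[$i])
    ; Write the extra row straight to the output instead of inserting it into the array,
    ; so nothing in the array ever has to be shifted.
    If StringInStr($aLines[$i], "<MARKER/>") > 0 Then
        FileWriteLine($hOut, "<EXTRA>ID2</EXTRA>")
    EndIf
Next

FileClose($hOut)
```

With this layout the total work grows linearly with the number of rows, whereas every _ArrayInsert call has to move all the elements behind the insertion point.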

Greetings.


Use the XMLDOM object

http://www.w3schools.com/dom/default.asp

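Example of usage:

$oXML = ObjCreate("Microsoft.XMLDOM")
$oXML.async = False ; wait until the document is fully loaded
$oXML.setProperty("SelectionLanguage", "XPath")
$stest = @DesktopDir & "\xml1.xml" ; @DesktopDir has no trailing backslash
$oXML.load($stest) ; load document
; Assumed intent: select the first element that has a child <b> with the text "Today"
$result = $oXML.selectSingleNode('//*[b="Today"]')
ConsoleWrite($result.xml & @CRLF)
ConsoleWrite($result.childNodes.item(1).text & @CRLF)
ConsoleWrite($result.childNodes.item(0).text & @CRLF)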

There is also selectNodes, so if you have a generic structure and want to add a node, you can use that one and then loop through the returned collection of objects.
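As a rough sketch of that loop, applied to the "Wijziging" to "Verlenging" replacement from the first post: the inline sample document and its ROOT/REC element names are assumptions made only to keep the example self-contained.

```autoit
; Sketch only: the sample document and its ROOT/REC element names are assumptions.
Local $oXML = ObjCreate("Microsoft.XMLDOM")
If Not IsObj($oXML) Then Exit 1
$oXML.async = False
$oXML.setProperty("SelectionLanguage", "XPath") ; use XPath rather than the older XSLPattern default
$oXML.loadXML("<ROOT><REC><OS_OMA_ONDW_OMA>Wijziging</OS_OMA_ONDW_OMA></REC>" & _
        "<REC><OS_OMA_ONDW_OMA>Anders</OS_OMA_ONDW_OMA></REC></ROOT>")

; selectNodes returns a collection with every node that matches the expression.
Local $oNodes = $oXML.selectNodes('//OS_OMA_ONDW_OMA[.="Wijziging"]')
For $oNode In $oNodes
    $oNode.text = "Verlenging"
Next

ConsoleWrite($oXML.xml & @CRLF)
```

For a real file you would call load with the file path instead of loadXML, and save afterwards to write the result back.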

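Here is an example of adding an <edition> child node to ALL instances of ANY <book>:

; Create a new element per <book>; appending the same element twice would only move it.
$oBooks = $oXML.getElementsByTagName("book")
For $oBook In $oBooks
    $oBook.appendChild($oXML.createElement("edition"))
Next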
Edited by jdelaney

With such a large XML document, there is no way to make your script both fast and error-free by editing it like a string/text file. You really need the standards made for XML (XPath/XQuery).

I once needed to transform an XML document with 5 million rows using XSLT. I found XMLDOM too slow and ended up calling Saxon (http://saxon.sourceforge.net/) from the command line. Another option is to use an XML database: for BaseX (http://basex.org), 10 million rows is nothing. To combine it with AutoIt, you either need to call it from the command line or use the Java UDF.
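Calling Saxon from AutoIt could look roughly like the sketch below. It assumes Java is on the PATH and that a Saxon jar sits next to the script. The jar name, stylesheet and file names are all hypothetical, so adjust them to your setup.

```autoit
; Sketch only: the jar name, stylesheet and file names are hypothetical placeholders.
; -s: is the source document, -xsl: the stylesheet, -o: the output file.
Local $sCmd = "java -jar saxon-he.jar -s:input.xml -xsl:transform.xsl -o:output.xml"

; RunWait blocks until the process ends and returns its exit code.
Local $iExit = RunWait($sCmd, @ScriptDir, @SW_HIDE)
If @error Then
    ConsoleWrite("Could not start java" & @CRLF)
ElseIf $iExit <> 0 Then
    ConsoleWrite("Saxon exited with code " & $iExit & @CRLF)
EndIf
```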

So I would first try jdelaney's suggestions, and if they don't work for you, then try some of mine.
