
Editing a big xml file


Hi all,

I've got a huge XML file that I want to edit automatically: over 10 million rows. First I used a For...Next loop with FileReadLine, which took ages, of course. Then I tried reading the file into an array, editing the array by replacing certain values, and writing the array back to an XML file. That worked, and it was much faster than editing the XML file itself.

#include <File.au3>
#include <Array.au3>

Local $filename = "OUTPUT_OPDRACHTEN_20120411_221538.xml"
Local $aArray

ConsoleWrite("Reading file..." & @LF)
If Not _FileReadToArray($filename, $aArray) Then
    ConsoleWrite("Could not read " & $filename & @LF)
    Exit 1
EndIf
ConsoleWrite("Reading file... Ready!" & @LF)

Local $lines = $aArray[0] ; _FileReadToArray stores the line count in element [0]
Local $count = 0

ConsoleWrite("Changing..." & @LF)

For $i = 1 To $lines
    ; Replace every <BC_INCASSO> line with an explicit nil element
    If StringInStr($aArray[$i], "<BC_INCASSO>") > 0 Then
        $aArray[$i] = "<BC_INCASSO xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance"" xsi:nil=""true""/>"
        $count += 1
        ;ConsoleWrite($i & @LF)
    EndIf

    ; Change Wijziging into Verlenging
    If StringInStr($aArray[$i], "<OS_OMA_ONDW_OMA>Wijziging</OS_OMA_ONDW_OMA>") > 0 Then
        $aArray[$i] = "<OS_OMA_ONDW_OMA>Verlenging</OS_OMA_ONDW_OMA>"
    EndIf
Next

ConsoleWrite("Changing... Ready!" & @LF)
ConsoleWrite("Writing file..." & @LF)
Local $new = FileOpen("new.xml", 2 + 128) ; 2 = overwrite, 128 = UTF-8 (129 would append to an existing new.xml)

For $i = 1 To $lines
    FileWriteLine($new, $aArray[$i])
Next
FileClose($new)

ConsoleWrite("Writing file... Ready!" & @LF)
ConsoleWrite("BC_INCASSO lines replaced: " & $count & @LF)

But now the next step: I would like to insert a few rows of code at certain places in the array, but that takes ages again. I guess that's because when I add a new value with _ArrayInsert at, let's say, index 5 of an array with 10,000,000 elements, every element after it has to be shifted down by one.

If StringInStr($aArray[$i], "") > 0 Then
    $aArray[$i] = "ID1"
    ConsoleWrite($i & @LF)
EndIf

Does anybody have an idea how I can do this reasonably fast?
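One common way around the _ArrayInsert slowdown is to never insert into the array at all. Instead, write the output file line by line and emit the extra row at the moment the matching line is written, so the whole job is a single pass. The following is a minimal sketch under that assumption. The function name _CopyWithInserts and its marker and extra-row arguments are hypothetical placeholders, not something from the thread.

```autoit
#include <File.au3>

; Hypothetical helper: copies $sIn to $sOut line by line and writes $sExtra
; directly after every line that contains $sMarker (case-sensitive match).
; Nothing is inserted into the array, so no elements need to be shifted.
; Returns the number of inserted rows, or -1 on failure.
Func _CopyWithInserts($sIn, $sOut, $sMarker, $sExtra)
    Local $aLines
    If Not _FileReadToArray($sIn, $aLines) Then Return SetError(1, 0, -1)

    Local $hOut = FileOpen($sOut, 2 + 128) ; 2 = overwrite, 128 = UTF-8
    If $hOut = -1 Then Return SetError(2, 0, -1)

    Local $iInserted = 0
    For $i = 1 To $aLines[0]              ; element [0] holds the line count
        FileWriteLine($hOut, $aLines[$i]) ; copy the original line
        If StringInStr($aLines[$i], $sMarker, 1) Then
            FileWriteLine($hOut, $sExtra) ; the "inserted" row
            $iInserted += 1
        EndIf
    Next
    FileClose($hOut)
    Return $iInserted
EndFunc
```

Usage would look like _CopyWithInserts("OUTPUT_OPDRACHTEN_20120411_221538.xml", "new.xml", "<SOME_TAG>", "<NEW_ROW>ID1</NEW_ROW>"), where both tags are placeholders. The replacements from the first script can be done in the same loop before each line is written.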



Use the XMLDOM object.

Example of usage:

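$oXML = ObjCreate("Microsoft.XMLDOM") ; create the DOM object
$oXML.async = False ; load synchronously
$oXML.setProperty("SelectionLanguage", "XPath")
$stest = @DesktopDir & "\xml1.xml"
$oXML.load($stest) ; load document
$result = $oXML.selectSingleNode('//*[b="Today"]') ; first element that has a child <b> with the text "Today"
ConsoleWrite ( $result.xml & @CRLF)
ConsoleWrite ( $result.childNodes.item(1).text & @CRLF)
ConsoleWrite ( $result.childNodes.item(0).text & @CRLF)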

There is also selectNodes, so if you have a generic structure and want to add a node, you can use that one and then loop through the returned collection of nodes.

Here is an example of adding an <edition> child node to ALL instances of ANY <Book>:
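A minimal sketch of that idea in AutoIt, which may differ from how jdelaney wrote it. It assumes MSXML 6 is available (Msxml2.DOMDocument.6.0). The element names Book and edition come from the sentence above; the function name _AddEditionToBooks and the text value are hypothetical.

```autoit
; Hypothetical helper: adds an <edition> child with the text $sValue to every
; <Book> element in $sPath and saves the file in place.
; Returns the number of <Book> elements changed, or -1 on failure.
Func _AddEditionToBooks($sPath, $sValue)
    Local $oXML = ObjCreate("Msxml2.DOMDocument.6.0")
    If Not IsObj($oXML) Then Return SetError(1, 0, -1)
    $oXML.async = False                           ; load synchronously
    If Not $oXML.load($sPath) Then Return SetError(2, 0, -1)

    Local $colBooks = $oXML.selectNodes("//Book") ; every <Book>, at any depth
    Local $oEdition
    For $oBook In $colBooks
        $oEdition = $oXML.createElement("edition")
        $oEdition.text = $sValue
        $oBook.appendChild($oEdition)             ; appended as the last child
    Next

    $oXML.save($sPath)
    Return $colBooks.length
EndFunc
```

Keep in mind that a DOM holds the whole document in memory, so with 10 million rows memory use and load time can become the bottleneck.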

Edited by jdelaney


If you have such a large XML document, there is no way to make your script fast and error-free by editing it like a string/text file. You really need the standards made for XML (XPath/XQuery).

I once needed to transform an XML document with 5 million rows with XSLT. I found XMLDOM too slow and ended up calling Saxon (http://saxon.sourceforge.net/) from the command line. Another option is an XML database: for BaseX, 10 million rows is nothing (http://basex.org). To combine it with AutoIt, you either need to call it from the command line or use the Java UDF.
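A sketch of calling Saxon from AutoIt with RunWait, assuming Java is on the PATH and Saxon-HE's -s:, -xsl: and -o: command-line options (the jar file name differs between Saxon versions). The function names, jar path and file names are hypothetical; the stylesheet holding the actual edits would be your own.

```autoit
; Hypothetical helper: builds the Saxon command line for source -> stylesheet -> output.
Func _SaxonCommand($sJar, $sSource, $sXsl, $sOutput)
    Return 'java -jar "' & $sJar & '" -s:"' & $sSource & '" -xsl:"' & $sXsl & '" -o:"' & $sOutput & '"'
EndFunc

; Hypothetical helper: runs the transform hidden and waits for it to finish.
; Returns the process exit code (0 normally means success),
; or -1 if java could not be started.
Func _RunSaxon($sJar, $sSource, $sXsl, $sOutput)
    Local $iExit = RunWait(_SaxonCommand($sJar, $sSource, $sXsl, $sOutput), @ScriptDir, @SW_HIDE)
    If @error Then Return SetError(1, 0, -1)
    Return $iExit
EndFunc
```

For example, _RunSaxon("C:\saxon\saxon9he.jar", "OUTPUT_OPDRACHTEN_20120411_221538.xml", "fix.xsl", "new.xml"), where fix.xsl is a placeholder for your stylesheet. Saxon then streams the result to disk, and AutoIt only waits for the exit code.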

So I would first try jdelaney's suggestions, and if they don't work for you, then try some of mine.
