wvzuilen

Editing a big xml file


Hi all,

I've got a huge XML file I want to edit automatically: over 10 million rows. First I used a For...Next loop with FileReadLine, which took ages... of course. Then I tried reading the file into an array, editing the array by replacing certain values, and then writing the array back to an XML file. That worked, and it was much faster than editing the XML file itself.

#include <File.au3>
#include <FileConstants.au3>

Local $filename = "OUTPUT_OPDRACHTEN_20120411_221538.xml"
Local $aArray

ConsoleWrite("Reading file..." & @LF)
; _FileReadToArray stores the line count in $aArray[0]; the lines are in [1]..[n]
If Not _FileReadToArray($filename, $aArray) Then
    ConsoleWrite("Could not read " & $filename & " (@error = " & @error & ")" & @LF)
    Exit 1
EndIf
ConsoleWrite("Reading file... Ready!" & @LF)

Local $lines = UBound($aArray)
Local $count = 0

ConsoleWrite("Changing..." & @LF)

; Start at 1 to skip the count stored in element 0
For $i = 1 To $lines - 1
    If StringInStr($aArray[$i], "<BC_INCASSO>") > 0 Then
        ; Replace the whole line with an explicit nil element
        $aArray[$i] = "<BC_INCASSO xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance"" xsi:nil=""true""/>"
        $count += 1
    EndIf

    If StringInStr($aArray[$i], "<OS_OMA_ONDW_OMA>Wijziging</OS_OMA_ONDW_OMA>") > 0 Then
        $aArray[$i] = "<OS_OMA_ONDW_OMA>Verlenging</OS_OMA_ONDW_OMA>"
    EndIf
Next

ConsoleWrite("Changing... Ready!" & @LF)
ConsoleWrite("Writing file" & @LF)
; Overwrite (2) + UTF-8 (128); mode 129 (append + UTF-8) would add to an existing new.xml on every run
Local $new = FileOpen("new.xml", $FO_OVERWRITE + $FO_UTF8)
If $new = -1 Then Exit 2

For $i = 1 To $lines - 1
    FileWriteLine($new, $aArray[$i])
Next

FileClose($new)

ConsoleWrite("Writing file... Ready!" & @LF)
ConsoleWrite($count & @LF)

But now the next step: I would like to insert a few rows of code at certain places in the array, but that takes ages again. I guess that's because when I add a new value with _ArrayInsert at, say, index 5 of an array with 10,000,000 values, it has to resize the array and shift every value after that index down by one, and it does that again for every insert.

If StringInStr($aArray[$i], "") > 0 Then
    $aArray[$i] = "ID1"
    _ArrayInsert($aArray, $i + 2, "")
    _ArrayInsert($aArray, $i + 3, "ID2")
    _ArrayInsert($aArray, $i + 4, "")
    ConsoleWrite($i & @LF)
EndIf

Does anybody have an idea how I can do this reasonably fast?

Greetings.
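Because each _ArrayInsert call shifts everything after the insertion point, one common way around it is to never insert into the array at all: walk the array once and write each line, plus any extra lines, straight to the output file. The sketch below assumes that approach; the marker string `<MARKER/>` and the lines `<ID1/>`/`<ID2/>` are hypothetical placeholders (the real tags are missing from the fragment above), and it writes its own tiny sample input so it runs on its own.

```autoit
#include <File.au3>
#include <FileConstants.au3>

; Hypothetical sample input so the sketch is self-contained
Local $sIn = @ScriptDir & "\sample.xml", $sOut = @ScriptDir & "\sample_new.xml"
Local $hIn = FileOpen($sIn, $FO_OVERWRITE)
FileWrite($hIn, "<ROOT>" & @CRLF & "<MARKER/>" & @CRLF & "<A>x</A>" & @CRLF & "</ROOT>")
FileClose($hIn)

; Hypothetical marker standing in for the real search string
Local $sMarker = "<MARKER/>"

Local $aLines
If Not _FileReadToArray($sIn, $aLines) Then Exit 1 ; $aLines[0] holds the line count

Local $hOut = FileOpen($sOut, $FO_OVERWRITE)
If $hOut = -1 Then Exit 2

For $i = 1 To $aLines[0]
    If StringInStr($aLines[$i], $sMarker) Then
        ; Write the replacement and the extra lines directly to the file,
        ; instead of growing the array with _ArrayInsert
        FileWriteLine($hOut, "<ID1/>")
        FileWriteLine($hOut, "<ID2/>")
    Else
        FileWriteLine($hOut, $aLines[$i])
    EndIf
Next
FileClose($hOut)
```

This keeps the work to a single pass over the array, so the cost no longer grows with the number of inserts.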




#2 ·  Posted (edited)

Use the XMLDOM object

http://www.w3schools.com/dom/default.asp

Example of usage:

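$oXML = ObjCreate("Microsoft.XMLDOM")
$oXML.async = False ; load synchronously, so the document is ready when load() returns
$oXML.setProperty("SelectionLanguage", "XPath") ; this MSXML 3 object defaults to XSL Patterns
$stest = @DesktopDir & "\xml1.xml" ; @DesktopDir has no trailing backslash
If Not $oXML.load($stest) Then Exit 1 ; load document
; Select the element whose <b> child is "Today" (assumed intent: '//b="Today"' returns a boolean, not a node)
$result = $oXML.selectSingleNode('//*[b="Today"]')
If IsObj($result) Then
    ConsoleWrite($result.xml & @CRLF)
    ConsoleWrite($result.childNodes.item(1).text & @CRLF)
    ConsoleWrite($result.childNodes.item(0).text & @CRLF)
EndIf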

There is also selectNodes: if you have a generic structure and want to change or add nodes, you can use that one and then loop through the returned collection of nodes.
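As a rough illustration, here is the Wijziging → Verlenging replacement from the first post done with selectNodes instead of string matching. The sample document and its element nesting are made up, and the output file name dom_out.xml is hypothetical. Keep in mind that a DOM holds the whole document in memory, which matters for a file of 10 million rows.

```autoit
Local $oXML = ObjCreate("Microsoft.XMLDOM")
If Not IsObj($oXML) Then Exit 1
$oXML.async = False
$oXML.setProperty("SelectionLanguage", "XPath")

; Hypothetical sample document; in practice use $oXML.load("file.xml")
$oXML.loadXML("<ROOT><OPDRACHT><OS_OMA_ONDW_OMA>Wijziging</OS_OMA_ONDW_OMA></OPDRACHT>" & _
        "<OPDRACHT><OS_OMA_ONDW_OMA>Nieuw</OS_OMA_ONDW_OMA></OPDRACHT></ROOT>")

; selectNodes returns a collection; loop through it and edit each match in place
Local $oNodes = $oXML.selectNodes("//OS_OMA_ONDW_OMA[.='Wijziging']")
Local $iChanged = 0
For $oNode In $oNodes
    $oNode.text = "Verlenging"
    $iChanged += 1
Next

ConsoleWrite("Changed: " & $iChanged & @CRLF)
$oXML.save(@ScriptDir & "\dom_out.xml")
```

Because the match is on the element and its value rather than on a whole text line, it does not depend on how the file happens to be indented or split across lines.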

Here is an example of adding an <edition> child node to ALL <book> elements:

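x=xmlDoc.getElementsByTagName("book");
for (i=0; i<x.length; i++) {
    newel=xmlDoc.createElement("edition"); // a new node for each <book>
    x.item(i).appendChild(newel);          // getElementsByTagName returns a list, so append per item
}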
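The same idea in AutoIt, as a minimal sketch using the Microsoft.XMLDOM object and a made-up bookstore document. A fresh element is created for every book, because appendChild moves an existing node rather than copying it.

```autoit
Local $oDoc = ObjCreate("Microsoft.XMLDOM")
If Not IsObj($oDoc) Then Exit 1
$oDoc.async = False

; Hypothetical sample document
$oDoc.loadXML("<bookstore><book><title>A</title></book><book><title>B</title></book></bookstore>")

Local $oBooks = $oDoc.getElementsByTagName("book")
For $oBook In $oBooks
    ; One new <edition> element per <book>
    $oBook.appendChild($oDoc.createElement("edition"))
Next

ConsoleWrite($oDoc.xml & @CRLF)
```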
Edited by jdelaney



With an XML document that large, there is no way to make your script both fast and error-free by editing it as a plain string/text file. You really need tools built on the XML standards (XPath/XQuery).

I once needed to transform an XML document with 5 million rows with XSLT. I found XMLDOM too slow and ended up calling Saxon (http://saxon.sourceforge.net/) from the command line. Another option is an XML database: for BaseX, 10 million rows is nothing (http://basex.org). To combine it with AutoIt, you either need to call it from the command line or use the Java UDF.
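A sketch of the command-line route with Saxon, assuming Java is on the PATH and a Saxon-HE jar sits next to the script as saxon9he.jar (a hypothetical, version-dependent file name). The script writes an XSLT identity transform that copies everything and only rewrites Wijziging to Verlenging, then runs Saxon with its -s: (source), -xsl: (stylesheet) and -o: (output) options. The match pattern assumes the elements are in no namespace; with a default namespace it would need a prefix.

```autoit
#include <FileConstants.au3>

; Assumed locations: Saxon jar name and file paths are illustrative
Local $sSaxonJar = @ScriptDir & "\saxon9he.jar"
Local $sIn = @ScriptDir & "\OUTPUT_OPDRACHTEN_20120411_221538.xml"
Local $sXsl = @ScriptDir & "\fix.xsl"
Local $sOut = @ScriptDir & "\new.xml"

; Identity transform: copy every node, except rewrite
; <OS_OMA_ONDW_OMA>Wijziging</OS_OMA_ONDW_OMA> to Verlenging
Local $sXslt = '<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">' & @CRLF & _
        '  <xsl:template match="@*|node()">' & @CRLF & _
        '    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>' & @CRLF & _
        '  </xsl:template>' & @CRLF & _
        '  <xsl:template match="OS_OMA_ONDW_OMA[. = ''Wijziging'']">' & @CRLF & _
        '    <xsl:copy>Verlenging</xsl:copy>' & @CRLF & _
        '  </xsl:template>' & @CRLF & _
        '</xsl:stylesheet>' & @CRLF

Local $hXsl = FileOpen($sXsl, $FO_OVERWRITE)
FileWrite($hXsl, $sXslt)
FileClose($hXsl)

; Run Saxon and wait for it to finish
Local $sCmd = 'java -jar "' & $sSaxonJar & '" "-s:' & $sIn & '" "-xsl:' & $sXsl & '" "-o:' & $sOut & '"'
Local $iExit = RunWait($sCmd, @ScriptDir, @SW_HIDE)
If @error Then
    ConsoleWrite("Could not start java: is it installed and on the PATH?" & @CRLF)
Else
    ConsoleWrite("Saxon exit code: " & $iExit & @CRLF)
EndIf
```

For very large inputs the JVM may need a bigger heap, for example by adding -Xmx2g before -jar.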

So I would first try jdelaney's suggestions, and if that doesn't work for you, try some of mine.


Thanks for your suggestions. I'll try the XML DOM first.


skin27, can I ask why you ended up choosing Saxon over BaseX as a command-line XML parser?

