SPEED, baby!


Hello everyone!

This is a project I have been working on to convert a rather large XML file to a .csv spreadsheet.

My goal is to gather the ID, JOINED DATE, ACTIVITY, and ACTIVITY DATE for each entry and write them to a Microsoft Excel friendly .csv file, with one column per category of information.

(Example below; I apologize for how ugly it is.)

+------+-------------+----------+---------------+
| ID   | JOINED DATE | ACTIVITY | ACTIVITY DATE |
+------+-------------+----------+---------------+
| data | data        | data     | data          |
+------+-------------+----------+---------------+

I want all of this to run as FAST as possible without reaching 100% CPU, preferably staying below 75%. Right now it takes my laptop (with many applications running) 7.5 minutes to process the sample .atom file. I've spent a very long time tweaking things to make them faster, and I think I've done an alright job so far, but I wanted to submit this to the community to see if any of you more experienced coders have tweaks or suggestions of your own to add.

You need the XML DOM Wrapper (an EXCELLENT UDF) found here:

http://www.autoitscript.com/forum/index.ph...l=XMLDomWrapper

The sample .atom file (email addresses and company name/links have been taken out for privacy reasons as it is an extremely large company and an unreleased application):

2008_11_8.zip

And my script, of course:

SB_Scribe_1.3.au3

And I would like to thank WeaponX for helping me with nearly all things related to MSXML found in my script. Thanks man =)

So go for it! All suggestions are greatly appreciated.

Thanks everyone

Edit 11/13 at 11:15 P.M.:

Updated script, zipped the sample .atom file and uploaded (thank GOD)

Edit 11/14 at 8:15 A.M.:

Minor fix in the file writing loop for cleaner output.

Edited by t8inevergreen

First, please use codebox tags for something so large. Second, concerning CPU, I didn't see a Sleep in your While 1 loop. Also, for speed, try moving the FileWrite from

For $i = 0 To $sLoop
    If $xEntry[1][$i] <> "" Then $xEntry[1][$i] = '"' & $xEntry[1][$i] & '"'
    FileWrite($fPath, '"' & $xEntry[0][$i] & '", ' & $xEntry[1][$i] & ', ' & $xEntry[2][$i] & ', ' & $xEntry[3][$i] & @CRLF)
Next

to after the loop, and changing it to something like

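$var = "" ; buffer that collects every row before anything is written
For $i = 0 To $sLoop
    If $xEntry[1][$i] <> "" Then $xEntry[1][$i] = '"' & $xEntry[1][$i] & '"'
    ; append the row to the buffer instead of writing it to disk
    $var &= '"' & $xEntry[0][$i] & '", ' & $xEntry[1][$i] & ', ' & $xEntry[2][$i] & ', ' & $xEntry[3][$i] & @CRLF
Next
FileWrite($fPath, $var) ; one write after the loop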
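
A minimal sketch of the same idea taken one step further with a file handle. When FileWrite is given a filename rather than a handle, AutoIt opens and closes the file on every call, so opening it once with FileOpen avoids that overhead. The sample array $aEntry and the output path are made up for illustration; they stand in for $xEntry and $fPath.

```autoit
; Hypothetical sample data laid out like $xEntry: [field][row]
Local $aEntry[4][2] = [["1001", "1002"], ["2008-01-15", ""], ["login", "post"], ["2008-11-08", "2008-11-09"]]
Local $sPath = @ScriptDir & "\sample_output.csv" ; assumed output location

; Open once in overwrite mode (2 = erase previous contents) and keep the handle
Local $hFile = FileOpen($sPath, 2)
If $hFile = -1 Then
    ConsoleWrite("Could not open " & $sPath & @CRLF)
    Exit 1
EndIf

Local $sBuffer = ""
For $i = 0 To UBound($aEntry, 2) - 1
    ; Build one CSV row per entry and append it to the in-memory buffer
    $sBuffer &= '"' & $aEntry[0][$i] & '","' & $aEntry[1][$i] & '",' & $aEntry[2][$i] & ',' & $aEntry[3][$i] & @CRLF
Next

FileWrite($hFile, $sBuffer) ; a single write through the open handle
FileClose($hFile)
```

For very large inputs the buffer could be flushed every few thousand rows to keep memory use bounded; whether that helps with this particular file is untested.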

Yeah, I did use codebox tags, but for some reason they aren't working. Right now they're just sitting in plaintext right above and right below the XML data. Very sorry, I would have included it as an attachment, but the forum disallows the .atom format for uploads. The file certainly has a lot of lines, so maybe the codebox can only handle a certain amount of data?

As for your suggestion, awesome! Thank you very much, I will certainly try that out and upload the updated version of the script after I've given it a quick run.


Sorry, I didn't see the tags. Hope my suggestions help your script. Could you change the extension to .txt, attach the file, and tell users to change the extension back to .atom? It's a bit of a pain in the @$$ to scroll that much to see 3 replies.

Edit: Typo

Edited by dbzfanatic

Fixed, and the script is updated with your revision. The first try cut about 60 seconds off the run time =) My comp isn't a very reliable benchmarking environment right now, but that is certainly a very large improvement, thank you!

Scratch that, I'm actually down to 3 minutes now. Impressive! Thank you even more for the tip.
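
Since these timings come from a busy laptop, one rough way to make runs comparable is to time the conversion inside the script itself. A minimal sketch using TimerInit and TimerDiff; _ConvertSample is a hypothetical placeholder, not a function from the actual script.

```autoit
Local $hTimer = TimerInit() ; start timestamp
_ConvertSample()
Local $fElapsed = TimerDiff($hTimer) / 1000 ; TimerDiff returns milliseconds
ConsoleWrite("Conversion took " & Round($fElapsed, 1) & " seconds" & @CRLF)

; Hypothetical placeholder: the real XML-to-CSV work would go here
Func _ConvertSample()
    Sleep(250)
EndFunc
```

Running the same input a few times and comparing the reported values should give a steadier picture than a single stopwatch reading.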

Edited by t8inevergreen

I'm also considering ditching MSXML completely and writing my own XML parsing script in Python using LibXML. Would anyone recommend this? I'm not sure how the two compare in speed, as Microsoft tucked a nice little clause into their EULA saying that you cannot post benchmark statistics about MSXML without their written permission.

Would it be worth it for speed reasons to do this? Is MSXML considered "slow" in the XML world?
