t8inevergreen Posted November 14, 2008 (edited)

Hello everyone!

This is a project I have been working on to convert a rather large XML file to a .csv spreadsheet. My goal is to gather the ID, JOINED DATE, ACTIVITY, and ACTIVITY DATE for each entry and write them to a Microsoft Excel friendly file, with one column per category of information.

(Example; I apologize for how ugly it is.)

+------+-------------+----------+---------------+
| ID   | JOINED DATE | ACTIVITY | ACTIVITY DATE |
+------+-------------+----------+---------------+
| data | data        | data     | data          |
+------+-------------+----------+---------------+

All as FAST as I can without reaching 100% CPU, preferably staying below 75%. Right now it takes my laptop (with many applications running) 7.5 minutes to process the sample .atom file. I've spent a very long time tweaking things to try to make them faster, and I think I've done an alright job so far, but I wanted to submit this to the community to see if any of you more experienced coders had tweaks or suggestions of your own to add.

You need the XML DOM Wrapper (an EXCELLENT UDF) found here: http://www.autoitscript.com/forum/index.ph...l=XMLDomWrapper

The sample .atom file (email addresses and company name/links have been taken out for privacy reasons, as it is an extremely large company and an unreleased application): 2008_11_8.zip

And my script, of course: SB_Scribe_1.3.au3

I would also like to thank WeaponX for helping me with nearly everything related to MSXML in my script. Thanks man =)

So go for it! All suggestions are greatly appreciated. Thanks everyone.

Edit 11/13 at 11:15 P.M.: Updated script, zipped the sample .atom file and uploaded (thank GOD).
Edit 11/14 at 8:15 A.M.: Minor fix in the file writing loop for cleaner output.

Edited November 14, 2008 by t8inevergreen
dbzfanatic Posted November 14, 2008

First, please use codebox tags for something so large. Second, concerning CPU, I didn't see a Sleep in your While 1 loop. Also, for speed, try moving the FileWrite out of this loop:

    For $i = 0 To $sLoop
        If $xEntry[1][$i] <> "" Then $xEntry[1][$i] = '"' & $xEntry[1][$i] & '"'
        FileWrite($fPath, '"' & $xEntry[0][$i] & '", ' & $xEntry[1][$i] & ', ' & $xEntry[2][$i] & ', ' & $xEntry[3][$i] & @CRLF)
    Next

to after it, and changing the loop to something like this, so the rows are collected in a variable and written to the file once:

    For $i = 0 To $sLoop
        If $xEntry[1][$i] <> "" Then $xEntry[1][$i] = '"' & $xEntry[1][$i] & '"'
        $var &= '"' & $xEntry[0][$i] & '", ' & $xEntry[1][$i] & ', ' & $xEntry[2][$i] & ', ' & $xEntry[3][$i] & @CRLF
    Next
    FileWrite($fPath, $var)
t8inevergreen (Author) Posted November 14, 2008

Yeah, I did use codebox tags, but for some reason they aren't working. Right now they're just in plain text right above and right below the XML data. Very sorry; I would have included it as an attachment, but the forum disallows the .atom format for uploads. The file certainly has a lot of lines, so maybe the codebox can only handle a certain amount of data?

As for your suggestion, awesome! Thank you very much. I will certainly try that out and upload the updated version of the script after I've given it a quick run.
dbzfanatic Posted November 14, 2008 (edited)

Sorry, I didn't see the tags. Hope my suggestions assist your script. Could you change the extension to .txt, attach it, and tell users to change the extension back to .atom? It's a bit of a pain in the @$$ to scroll that much to see 3 replies.

Edit: Typo

Edited November 14, 2008 by dbzfanatic
t8inevergreen (Author) Posted November 14, 2008 (edited)

Fixed, and the script is updated with your revision. The first try cut about 60 seconds =) My computer isn't a very reliable benchmarking environment right now, but that is certainly a very large improvement, thank you!

Scratch that, I'm actually down to 3 minutes now. Impressive! Thank you even more for the tip.

Edited November 14, 2008 by t8inevergreen
t8inevergreen (Author) Posted November 14, 2008

I'm also considering ditching MSXML completely and writing my own XML parsing script in Python using LibXML. Would anyone recommend this? I'm not sure how the two compare in speed, as Microsoft tucked a nice little clause into their EULA saying that you cannot post benchmark statistics about MSXML without their written permission. Would it be worth doing this for speed reasons? Is MSXML considered "slow" in the XML world?
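For anyone weighing the Python route, here is a minimal sketch of what such a script could look like. It is not the script from this thread, and it uses the standard library's `xml.etree.ElementTree` rather than LibXML, purely so that it runs without extra installs. The element names mapped to the four columns (`id`, `published`, `title`, `updated`) are assumptions for illustration, since the real feed's layout is not shown here; only the Atom namespace URI is standard. It reads the feed one entry at a time instead of loading the whole document, and lets the `csv` module handle quoting.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Standard Atom namespace, in ElementTree's "{uri}tag" notation.
ATOM = "{http://www.w3.org/2005/Atom}"

# ASSUMPTION: hypothetical mapping from CSV column to Atom child element.
# The real feed's element names are not shown in the thread; adjust to match.
COLUMNS = [
    ("ID", "id"),
    ("JOINED DATE", "published"),
    ("ACTIVITY", "title"),
    ("ACTIVITY DATE", "updated"),
]


def atom_to_csv(source, dest):
    """Stream <entry> elements from source, write one CSV row per entry.

    source: a filename or a binary file object containing the feed.
    dest:   a text file object (open real files with newline="").
    Returns the number of entries written.
    """
    writer = csv.writer(dest)
    writer.writerow([name for name, _ in COLUMNS])
    rows = 0
    # iterparse yields each element once its closing tag has been read.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != ATOM + "entry":
            continue
        # Missing children become empty cells instead of raising.
        writer.writerow(
            [elem.findtext(ATOM + tag, default="") for _, tag in COLUMNS]
        )
        rows += 1
        elem.clear()  # drop the entry's children once the row is written
    return rows


if __name__ == "__main__":
    # Tiny made-up feed, only to show the call; not data from the thread.
    sample = io.BytesIO(
        b'<feed xmlns="http://www.w3.org/2005/Atom">'
        b"<entry><id>1</id><published>2008-01-02</published>"
        b"<title>login</title><updated>2008-11-08</updated></entry>"
        b"</feed>"
    )
    out = io.StringIO()
    count = atom_to_csv(sample, out)
    print(count, "entries written")
    print(out.getvalue())
```

With a real file, the call would be along the lines of `atom_to_csv("feed.atom", open("out.csv", "w", newline=""))`, where both file names are placeholders. Whether this ends up faster than the MSXML version would have to be measured on the actual data.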
dbzfanatic Posted November 14, 2008

Glad my ideas helped you out. As for the XML, I wouldn't know, as I don't deal with it often.