Jump to content
Alodar

XML Conversion

Recommended Posts

Alodar

So I've been dealing with a lot of xml files lately, and I end up converting them to csvs and the like, and then putting them into a database.  Which is fine, but honestly, the process bothers me.  I'm not quite up to writing my own xml parser, so obviously I'd want to use MSXML or something like that, but I'd like to get some advice on structure.

Obviously, if this is something I should be picking up a book on, please tell me and if you're feeling generous, even tell me which book I should be reading, but what I'd like to get to is how to parse out an xml file into separate structures; i.e. the parent items and child items into different arrays (or variables?).  This is the part I'm not sure about; how I should be parsing it out.  I feel like the obvious answer is to parse it into a set of arrays with one being the 'parent' and the other(s?) being the child elements, which I would know ahead of time from the schema, but is there a better method that I'm just not aware of?

Share this post


Link to post
Share on other sites
mLipok

Did you check this following thread:

 

?

 


Signature beginning:   Wondering who uses AutoIT and what it can be used for ?
* GHAPI UDF - modest begining - comunication with GitHub REST API Forum Rules *
ADO.au3 UDF     POP3.au3 UDF     XML.au3 UDF    How to use IE.au3  UDF with  AutoIt v3.3.14.x  for other useful stuff click the following button

Spoiler

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind. 

My contribution (my own projects): * Debenu Quick PDF Library - UDF * Debenu PDF Viewer SDK - UDF * Acrobat Reader - ActiveX Viewer * UDF for PDFCreator v1.x.x * XZip - UDF * AppCompatFlags UDF * CrowdinAPI UDF * _WinMergeCompare2Files() * _JavaExceptionAdd() * _IsBeta() * Writing DPI Awareness App - workaround * _AutoIt_RequiredVersion() * Chilkatsoft.au3 UDF * TeamViewer.au3 UDF * JavaManagement UDF * VIES over SOAP * WinSCP UDF * GHAPI UDF - modest begining - comunication with GitHub REST APIErrorLog.au3 UDF - A logging Library
 

My contribution to others projects or UDF based on  others projects: * _sql.au3 UDF  * POP3.au3 UDF *  RTF Printer - UDF * XML.au3 UDF * ADO.au3 UDF SMTP Mailer UDF * Dual Monitor resolution detection * * 2GUI on Dual Monitor System * _SciLexer.au3 UDF *

Useful links: * Forum Rules * Forum etiquette *  Forum Information and FAQs * How to post code on the forum * AutoIt Online Documentation * AutoIt Online Beta Documentation * SciTE4AutoIt3 getting started * Convert text blocks to AutoIt code * Games made in Autoit * Programming related sites * Polish AutoIt Tutorial * DllCall Code Generator * 

Wiki: Expand your knowledge - AutoIt Wiki * Collection of User Defined Functions * How to use HelpFile * Good coding practices in AutoIt * 

IE Related:  * How to use IE.au3  UDF with  AutoIt v3.3.14.x * Why isn't Autoit able to click a Javascript Dialog? * Clicking javascript button with no ID * IE document >> save as MHT file * IETab Switcher (by LarsJ ) * HTML Entities * _IEquerySelectorAll() (by uncommon) * IE in TaskScheduler

I encourage you to read: * Global Vars * Best Coding Practices * Please explain code used in Help file for several File functions * OOP-like approach in AutoIt * UDF-Spec Questions *  EXAMPLE: How To Catch ConsoleWrite() output to a file or to CMD *

"Homo sum; humani nil a me alienum puto" - Publius Terentius Afer
"Program are meant to be read by humans and only incidentally for computers and execute" - Donald Knuth, "The Art of Computer Programming"
:naughty:  :ranting:, be  :) and       \\//_.

Anticipating Errors :  "Any program that accepts data from a user must include code to validate that data before sending it to the data store. You cannot rely on the data store, ...., or even your programming language to notify you of problems. You must check every byte entered by your users, making sure that data is the correct type for its field and that required fields are not empty."

Signature last update: 2018-10-31

Share this post


Link to post
Share on other sites
Alodar

Yeah.  I was more asking for what 'should' I do as a concept, rather than a 'how'.

Dealing with an XML file in and of itself is ok, and the UDF listed would work great for it, I was just wondering if there was a better method than my original thought of creating different arrays for 'parent' and 'child' items.

Eventually, the xml data I deal with will be going in to tables, so the most effective way to deal with it is kind of what I'm after.

Share this post


Link to post
Share on other sites
TheSaint

Is the XML a source you have no control over?

Personally, I dislike XML, and have recently come to grips with SQL which is far superior ... including great helper programs.

Any XML I come across these days,I just read into an array and deal with it much faster that way ... but of course you need to know your source, and your needs be simplistic.

 


AutoIt.4.Life Clubrooms - Life is like a Donut (secret key)

Make sure brain is in gear before opening mouth!
Remember, what is not said, can be just as important as what is said.

Spoiler

What is the Secret Key? Life is like a Donut

If I put effort into communication, I expect you to read properly & fully, or just not comment.
Ignoring those who try to divert conversation with irrelevancies.
If I'm intent on insulting you or being rude, I will be obvious, not ambiguous about it.
I'm only big and bad, to those who have an over-active imagination.

I may have the Artistic Liesense ;) to disagree with you. TheSaint's Toolbox (be advised many downloads are not working due to ISP screwup with my storage)

userbar.png

Share this post


Link to post
Share on other sites
Alodar

Mostly it's that.  I have control over it once I download it etc, but as for creating it from the source data or anything else, no, I have no control.  It's a file sent to me that I then have to manipulate.

Share this post


Link to post
Share on other sites
mLipok

What a specyfic format you have ?

 


Signature beginning:   Wondering who uses AutoIT and what it can be used for ?
* GHAPI UDF - modest begining - comunication with GitHub REST API Forum Rules *
ADO.au3 UDF     POP3.au3 UDF     XML.au3 UDF    How to use IE.au3  UDF with  AutoIt v3.3.14.x  for other useful stuff click the following button

Spoiler

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind. 

My contribution (my own projects): * Debenu Quick PDF Library - UDF * Debenu PDF Viewer SDK - UDF * Acrobat Reader - ActiveX Viewer * UDF for PDFCreator v1.x.x * XZip - UDF * AppCompatFlags UDF * CrowdinAPI UDF * _WinMergeCompare2Files() * _JavaExceptionAdd() * _IsBeta() * Writing DPI Awareness App - workaround * _AutoIt_RequiredVersion() * Chilkatsoft.au3 UDF * TeamViewer.au3 UDF * JavaManagement UDF * VIES over SOAP * WinSCP UDF * GHAPI UDF - modest begining - comunication with GitHub REST APIErrorLog.au3 UDF - A logging Library
 

My contribution to others projects or UDF based on  others projects: * _sql.au3 UDF  * POP3.au3 UDF *  RTF Printer - UDF * XML.au3 UDF * ADO.au3 UDF SMTP Mailer UDF * Dual Monitor resolution detection * * 2GUI on Dual Monitor System * _SciLexer.au3 UDF *

Useful links: * Forum Rules * Forum etiquette *  Forum Information and FAQs * How to post code on the forum * AutoIt Online Documentation * AutoIt Online Beta Documentation * SciTE4AutoIt3 getting started * Convert text blocks to AutoIt code * Games made in Autoit * Programming related sites * Polish AutoIt Tutorial * DllCall Code Generator * 

Wiki: Expand your knowledge - AutoIt Wiki * Collection of User Defined Functions * How to use HelpFile * Good coding practices in AutoIt * 

IE Related:  * How to use IE.au3  UDF with  AutoIt v3.3.14.x * Why isn't Autoit able to click a Javascript Dialog? * Clicking javascript button with no ID * IE document >> save as MHT file * IETab Switcher (by LarsJ ) * HTML Entities * _IEquerySelectorAll() (by uncommon) * IE in TaskScheduler

I encourage you to read: * Global Vars * Best Coding Practices * Please explain code used in Help file for several File functions * OOP-like approach in AutoIt * UDF-Spec Questions *  EXAMPLE: How To Catch ConsoleWrite() output to a file or to CMD *

"Homo sum; humani nil a me alienum puto" - Publius Terentius Afer
"Program are meant to be read by humans and only incidentally for computers and execute" - Donald Knuth, "The Art of Computer Programming"
:naughty:  :ranting:, be  :) and       \\//_.

Anticipating Errors :  "Any program that accepts data from a user must include code to validate that data before sending it to the data store. You cannot rely on the data store, ...., or even your programming language to notify you of problems. You must check every byte entered by your users, making sure that data is the correct type for its field and that required fields are not empty."

Signature last update: 2018-10-31

Share this post


Link to post
Share on other sites
jchd

Since you convert XML into CVS I understand your format is fixed and doesn't rely on variable structure XML allows.

Maybe you could just parse the XML source using a regexp then directly populate your DB from the resulting array, all within AutoIt.
Another avenue: eqsily convert fixed-format XML inJSON then directly import it into DB; SQLite handles that for instance. The drawback is then that the data is still encapsulated in JSON, but specific SQL[ite]functions permit selecting stuff in there as well, albeit not as easily compared to full-fledged SQL tables.


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Share this post


Link to post
Share on other sites
TheDcoder
4 hours ago, jchd said:

eqsily convert fixed-format XML inJSON then directly import it into DB; SQLite handles that for instance.

Does SQLite handle the converstion from XML to JSON as well? If not, how would we convert fixed-format XML to JSON?


AutoIt.4.Life Clubrooms - Life is like a Donut (secret key)

Spoiler

My contributions to the AutoIt Community

If I have hurt or offended you in anyway, Please accept my apologies, I never (regardless of the situation) mean to do that to anybody!!!

3fHNZJ.gif

PLEASE JOIN ##AutoIt AND HELP THE IRC AUTOIT COMMUNITY!

Share this post


Link to post
Share on other sites
jchd

"No" and "StringRegExp" resp.


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Share this post


Link to post
Share on other sites
TheDcoder

@jchd I'd not classify writing RegEx for converting XML to JSON "easy", I recently tried to do something similar (stripping elements from XML) using RegEx and even it was not too easy with some trivial bug :lol:. Unless someone has already written RegEx for converting XML -> JSON? :blink:


AutoIt.4.Life Clubrooms - Life is like a Donut (secret key)

Spoiler

My contributions to the AutoIt Community

If I have hurt or offended you in anyway, Please accept my apologies, I never (regardless of the situation) mean to do that to anybody!!!

3fHNZJ.gif

PLEASE JOIN ##AutoIt AND HELP THE IRC AUTOIT COMMUNITY!

Share this post


Link to post
Share on other sites
jchd

Remember we're talking of a (presumably) fixed-format XML source, convertible to pure CSV as a routine task. That means an array of series of similar entities.

I'd wouldn't call "difficult" the task to convert from the first form to any of the next:

XML
<people>
    <firstname>Jimmy</firstname><lastname>Hendrix</lastname>
    <firstname>Jim</firstname><lastname>Morrison</lastname>
    <firstname>Frank</firstname><lastname>Zappa</lastname>
    <firstname>Robert</firstname><lastname>Wyatt</lastname>
    <firstname>John</firstname><lastname>Coltrane</lastname>
</people>

JSON
{
    "people": [
        { "firstname":"Jimmy", "lastname":"Hendrix" },
        { "firstname":"Jim", "lastname":"Morrison" },
        { "firstname":"Frank", "lastname":"Zappa" },
        { "firstname":"Robert", "lastname":"Wyatt" },
        { "firstname":"John", "lastname":"Coltrane" }
    ]
}

CSV
"firstname","lastname"
"Jimmy","Hendrix"
"Jim","Morrison"
"Frank","Zappa"
"Robert","Wyatt"
"John","Coltrane"

SQL
insert into People ("firstname", "lastname") values
  ('Jimmy', 'Hendrix'),
  ('Jim', 'Morrison'),
  ('Frank', 'Zappa'),
  ('Robert', 'Wyatt'),
  ('John', 'Coltrane')
;

Very basic string manipulation indeed.

  • Like 1
  • Thanks 1

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Share this post


Link to post
Share on other sites
TheDcoder

@jchd I see, it would be easy enough to process fixed-format XML I guess... Thanks for example :)


AutoIt.4.Life Clubrooms - Life is like a Donut (secret key)

Spoiler

My contributions to the AutoIt Community

If I have hurt or offended you in anyway, Please accept my apologies, I never (regardless of the situation) mean to do that to anybody!!!

3fHNZJ.gif

PLEASE JOIN ##AutoIt AND HELP THE IRC AUTOIT COMMUNITY!

Share this post


Link to post
Share on other sites
jchd
Posted (edited)

It's still plain that arbitrarily nested XML or JSON isn't a suitable source for CVS nor SQL table (both being equivalent to 2D arrays), but as far as I correctly understood OP's context, this isn't the situation he has to handle.

Edited by jchd

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Share this post


Link to post
Share on other sites
Alodar

Indded.  What I have is a context with mutiple (variable number) of subset items within it that i need to split.

e.g.

<projects>
    <projectlist>
        <projectid>Project1</projectid>
        <tasks>
          <task>Code1</task>
              <taskinfo>blah blah</taskinfo>
              <taskinfo2>blah blah</taskinfo2>
              <taskinfo3>blah blah</taskinfo3>
          <task>Code2</task>
              <taskinfo>blah blah</taskinfo>
              <taskinfo2>blah blah</taskinfo2>
              <taskinfo3>blah blah</taskinfo3>
        </tasks>
    </projectlist>
    <projectlist>
        <projectid>Project2</projectid>
        <tasks>
          <task>Code1</task>
              <taskinfo>blah blah</taskinfo>
              <taskinfo2>blah blah</taskinfo2>
              <taskinfo3>blah blah</taskinfo3>
          <task>Code2</task>
              <taskinfo>blah blah</taskinfo>
              <taskinfo2>blah blah</taskinfo2>
              <taskinfo3>blah blah</taskinfo3>
        </tasks>
    </projectlist>
</projects>

 

And it's a huge list; 1000 items+.  My current plan of using XML.au3 and then looping through each 'set' and placing it in arrays and then writing it to separate csvs (one for tasks and their info, and one for project id's and it's info - it has info the same way the tasks do) works, but it's a bit slow.  Perhaps clunky?  Dunno, but i can't think of a faster / smoother way to do it?

Share this post


Link to post
Share on other sites
jchd

I see; the structure you get isn't as "flat" as you would want for a simple conversion.

If your RDBMS offers support for JSON with functions to search the JSON tree, you could import the source convertedto JSON. Then use DB functions to populate regular tables from there.

Alternatively, searching the XML by regexp to first get an array of projects, then for each project an array of tasks then for each task an array of steps. From there you can directly populate the SQL DB tables with their foreign keys to link data between entities. That would be my own choice.


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Share this post


Link to post
Share on other sites
Alodar

I'd love to do that, but unfortunately I'm only allowed to submit csv files to it.

Unless there's a faster way to read the xml (300mb), then I don't know what to do.  I like the idea of using regex to read it line by line, but will that be qualitatively faster than using the XML dom?

Share this post


Link to post
Share on other sites
jchd

Only real-world tests will say.


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×