
import XML into SQLite - what's the most efficient method?



i have a set of XML files of this format:

<record>
    <col1>data</col1>
    <col2>data</col2>
    ...
    <colN>data</colN>
</record>
<record>
    <col1>data</col1>
    <col2>data</col2>
    ...
    <colN>data</colN>
</record>
...

i wrote a simple function, based on _StringBetween(), to read the data and rewrite it into a TSV file for post-processing.

i'm going to be processing a LOT of those files on a regular basis. i'm considering importing them into an SQLite database for more efficient storage and post-processing. everything will be accumulated in a single table, where each row is a record and each column is named after the corresponding XML tag.

this question is about the overall speed and simplicity of the import process:

is there an existing feature of SQLite for importing such XML, i.e. without using AutoIt to read and parse the text? something like an XML parser that is native to SQLite?

thanks for any hints!

Signature - my forum contributions:

Spoiler

UDF:

LFN - support for long file names (over 260 characters)

InputImpose - impose valid characters in an input control

TimeConvert - convert UTC to/from local time and/or reformat the string representation

AMF - accept multiple files from Windows Explorer context menu

DateDuration -  literal description of the difference between given dates

Apps:

Touch - set the "modified" timestamp of a file to current time

Show For Files - tray menu to show/hide files extensions, hidden & system files, and selection checkboxes

SPDiff - Single-Pane Text Diff

 


If your XML files always have such a simple 2D array-like structure, you can adopt one of two approaches: use a faster way to read the XML into a 2D array (a one-line regex would do), or transform the XML into JSON form, which then fits the JSON1 extension well.

Personally I'd use the first way. Post a sample real input file and DB.
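To make the first approach concrete, here is a minimal sketch in Python (not AutoIt, simply because it is easy to run and check). It assumes the flat `<record>` structure posted above with no nested tags, attributes or CDATA; the sample data and the names `parse_records` and `to_table` are made up for illustration. The patterns are PCRE-compatible, so they should carry over to StringRegExp(), though that is untested here.

```python
import re

# Hypothetical sample mirroring the structure posted above
# (col1..col3 stand in for the real, more descriptive tag names).
SAMPLE = """
<record>
    <col1>a1</col1>
    <col2>b1</col2>
    <col3>c1</col3>
</record>
<record>
    <col1>a2</col1>
    <col2>b2</col2>
    <col3>c2</col3>
</record>
"""

# (?s) lets "." match newlines; the lazy .*? stops at the first closing tag.
RECORD_RE = re.compile(r"(?s)<record>(.*?)</record>")
# \1 requires the closing tag to match the opening tag name.
FIELD_RE = re.compile(r"(?s)<(\w+)>(.*?)</\1>")


def parse_records(xml_text):
    """Return one {tag: value} dict per <record> block."""
    records = []
    for body in RECORD_RE.findall(xml_text):
        # findall yields (tag, value) pairs, which dict() accepts directly.
        records.append(dict(FIELD_RE.findall(body)))
    return records


def to_table(records, columns):
    """Flatten the dicts into a 2D list; missing columns become ''."""
    return [[rec.get(col, "") for col in columns] for rec in records]


records = parse_records(SAMPLE)
table = to_table(records, ["col1", "col2", "col3"])
```

Note that this does not decode XML entities such as `&amp;`; if the data can contain them, the values would need an extra unescape step.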

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


yes, the XML structure is very simple. the posted structure is pretty much the way it actually is (col1..colN are reworded; the original titles are much more descriptive).

a regex seems a daunting task. all results of googling "regex parse XML" lead nowhere or worse. besides, looking at _StringBetween() i see it already utilizes StringRegExp(), which is very useful because i provide it with <col1> and </col1> as the start and end substrings, in a loop over col1..colN. i'm not sure a single regexp can successfully parse the entire XML directly into a 2D array. i see StringRegExp() can "Return an array of arrays containing global matches including the full match (Perl / PHP style)." which may be what i'm looking for, but i haven't the slightest idea how to approach this.

converting to JSON is something i'm about to consider, although at first glance it seems like extra work for the same purpose, and i doubt it's going to be more efficient.
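For comparison, this is roughly what the JSON route involves once the records are already in JSON form. It is a sketch in Python under a few assumptions: the SQLite library in use was built with the JSON1 functions (`json_each`, `json_extract`), and the table name `records` and columns `col1`, `col2` are placeholders. The conversion from XML to JSON still has to happen beforehand, which is the extra work mentioned above.

```python
import json
import sqlite3

# Hypothetical records, as they might look after an XML -> JSON conversion.
records = [
    {"col1": "a1", "col2": "b1"},
    {"col1": "a2", "col2": "b2"},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (col1 TEXT, col2 TEXT)")

# Pass the whole JSON array as one parameter; json_each() yields one row
# per array element, and json_extract() pulls the fields out of each object.
payload = json.dumps(records)
with conn:  # commits on success, rolls back on error
    conn.execute(
        """
        INSERT INTO records (col1, col2)
        SELECT json_extract(value, '$.col1'),
               json_extract(value, '$.col2')
        FROM json_each(?)
        """,
        (payload,),
    )

loaded = conn.execute(
    "SELECT col1, col2 FROM records ORDER BY col1"
).fetchall()
```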

so for now i'll stick to using _StringBetween().
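Whichever way the rows are extracted, the import step itself tends to dominate when there are many records. A minimal sketch of a bulk insert, written in Python for convenience, is below. It assumes the rows are already available as a 2D list; `load_rows` and the table and column names are hypothetical. The main point is that all inserts go through one prepared statement inside a single transaction rather than one commit per row.

```python
import sqlite3


def load_rows(conn, table, columns, rows):
    """Insert all rows in one transaction and return the table's row count.

    Table and column names are interpolated into the SQL, so they must come
    from trusted input (here: the known XML tag names), never from the data.
    """
    col_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    # Columns are declared without types, which SQLite permits.
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_list})')
    with conn:  # one transaction for the whole batch
        conn.executemany(
            f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders})',
            rows,
        )
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


conn = sqlite3.connect(":memory:")
count = load_rows(
    conn,
    "records",
    ["col1", "col2"],
    [["a1", "b1"], ["a2", "b2"]],
)
```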

 

