orbs Posted July 7, 2017

i have a set of XML files of this format:

    <record>
        <col1>data</col1>
        <col2>data</col2>
        ...
        <colN>data</colN>
    </record>
    <record>
        <col1>data</col1>
        <col2>data</col2>
        ...
        <colN>data</colN>
    </record>
    ...

i wrote a simple function, based on _StringBetween(), to read the data and rewrite it into a TSV file for post-processing.

i'm going to be processing a LOT of those files on a regular basis. i'm considering importing them into an SQLite database for more efficient storage and post-processing. everything is going to be accumulated into a single table, where each row is a record and each column is named after the corresponding XML tag.

this question is about the overall speed and simplicity of the import process: is there an existing feature of SQLite for importing such XML, i.e. without using AutoIt to read and parse the text? something like an XML parser that is native to SQLite?

thanks for any hints!

Signature - my forum contributions:
UDF:
    LFN - support for long file names (over 260 characters)
    InputImpose - impose valid characters in an input control
    TimeConvert - convert UTC to/from local time and/or reformat the string representation
    AMF - accept multiple files from Windows Explorer context menu
    DateDuration - literal description of the difference between given dates
Apps:
    Touch - set the "modified" timestamp of a file to current time
    Show For Files - tray menu to show/hide files extensions, hidden & system files, and selection checkboxes
    SPDiff - Single-Pane Text Diff
jchd Posted July 7, 2017

If your XML files always have such a simple 2D array-like structure, you can adopt one of two approaches: use a faster way to read the XML into a 2D array (a one-line regex would do), or transform the XML into JSON form, which then fits well with the JSON1 extension.

Personally I'd use the first way. Post a real sample input file and DB.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.
SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)
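
The first approach (a regex that reads the flat XML into a 2D array) could look roughly like the sketch below. It is written in Python rather than AutoIt so that it can be run as-is; the patterns use ordinary Perl-style syntax, so the same idea should carry over to StringRegExp(), but that port is not shown here. The names xml_to_rows, RECORD_RE and FIELD_RE are made up for illustration, and the sketch assumes the tags carry no attributes, are not nested, and that the data contains no XML entities that need decoding.

```python
import re

# One <record>...</record> block; DOTALL lets '.' span line breaks.
RECORD_RE = re.compile(r"<record>(.*?)</record>", re.DOTALL)
# One <name>value</name> pair; \1 forces the closing tag to match the opening tag.
FIELD_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def xml_to_rows(text):
    """Return (column_names, rows) for a flat sequence of <record> blocks."""
    columns = []   # column names in order of first appearance
    records = []   # one dict per record
    for block in RECORD_RE.findall(text):
        fields = dict(FIELD_RE.findall(block))
        for name in fields:
            if name not in columns:
                columns.append(name)
        records.append(fields)
    # Build the 2D array; a column missing from a record becomes an empty string.
    rows = [[rec.get(name, "") for name in columns] for rec in records]
    return columns, rows


sample = """
<record> <col1>a1</col1> <col2>a2</col2> </record>
<record> <col1>b1</col1> <col2>b2</col2> </record>
"""
columns, rows = xml_to_rows(sample)
print(columns)
print(rows)
```

Two small patterns are used instead of one large one because the number of columns per record is not fixed in the pattern: the outer match isolates a record, the inner match collects whatever name/value pairs it contains.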
orbs Posted July 9, 2017

yes, the XML structure is very simple. the posted structure is pretty much the way it actually is (col1..colN are reworded; the original tags have much more descriptive names).

a regex seems a daunting task. all results of googling "regex parse XML" lead nowhere or worse. besides, looking at _StringBetween() i see it already utilizes StringRegExp(), which is very useful because i provide it with <col1> and </col1> as the start and end substrings, in a loop over col1..colN. i'm not sure a single regexp can successfully handle the entire XML parsing directly into a 2D array. i see StringRegExp() can "Return an array of arrays containing global matches including the full match (Perl / PHP style)." which may be what i'm looking for, but i haven't the slightest idea how to approach this.

converting to JSON is something i'm about to consider, although at first glance it seems like extra work for the same purpose, and i doubt it's going to be more efficient.

so for now i'll stick to using _StringBetween().
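
Whichever way the records are extracted, the import step the thread is aiming for (one table, one row per record, columns named after the XML tags) might be sketched as follows. This uses Python's standard sqlite3 module rather than AutoIt's SQLite UDF, purely so the example is runnable; import_rows and the table name "records" are hypothetical, and storing every column as TEXT is an assumption. All rows go in inside a single transaction, which is usually what matters most for bulk-insert speed in SQLite.

```python
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier; identifiers cannot be bound as '?' parameters."""
    return '"' + name.replace('"', '""') + '"'


def import_rows(conn, table, columns, rows):
    """Create the table if needed, then append all rows in one transaction."""
    cols = [quote_ident(c) for c in columns]
    tbl = quote_ident(table)
    col_defs = ", ".join(c + " TEXT" for c in cols)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {tbl} ({col_defs})")
    placeholders = ", ".join("?" for _ in columns)
    insert_sql = f"INSERT INTO {tbl} ({', '.join(cols)}) VALUES ({placeholders})"
    with conn:  # commits once on success, rolls back on error
        conn.executemany(insert_sql, rows)


conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
import_rows(conn, "records", ["col1", "col2"], [["a1", "a2"], ["b1", "b2"]])
print(conn.execute("SELECT col1, col2 FROM records ORDER BY col1").fetchall())
```

Calling import_rows again with rows from another file appends to the same table, which matches the "accumulated into a single table" requirement as long as the column set stays the same.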