benners

Deciding between using SQLite or an array

9 posts in this topic

#1 ·  Posted

Info

I am working on a program and have reached a point where it may be prudent to use a database rather than an array. A user selects multiple files or folders that contain updates for different Office versions. These are then extracted and information gleaned from the msp file. There maybe as many as 300 - 350+ exe files than can extract into 1000+ msp files and these will be then loaded into a treeview.

The treeview is sorted, and I thought it would be better for the treeview and array order to match, then when processing I could just loop through the array. When a file is selected, it's details are retrieved from the array and displayed on a GUI. The 2D array has 23 columns at present and starts initially with 500 rows, and grows as required. One problem I had was sorting the array by two columns, I have tried a few solutions posted on the forum but none worked, so  I thought I would use SQL power. 

With the SQL calls, a temp 1D array is filled by different functions and the array rows correlate to the number of table columns. The insert string is created from the array, and contains the correct number of values as I thought It would save me specifying the columns every time.

When using an array the time for a set number of files (65) was about 18 secs, this includes extraction and GUI updates etc. Using a DB file on a SSD it takes around 24 secs, using a DB in memory it is on par with the array, around 18 secs. I will use a memory DD finally, but to check the data, I currently write one to disk. After the program closes, there is no need to retain the file info at present, but in the future, I may check a file against a table entry and save the time retrieving the info again if they are exact, depends on how much time is actually saved.

The question

Does anyone have any better ideas to accomplish the above. any thoughts on the DB setup, I have attached a sample that the program created, I know diddly about DB structure whether it would be better to use multiple tables for x86/x64 or different Office versions 2007 - 2016 etc, or just stick with the array implementation. There are a lot of long strings in some of the columns so this may break an array?

I have attached the DB file. This has all the Office updates for 2010 - 2016 so it should provide a wide enough sample.

 

Updates.7z

Share this post


Link to post
Share on other sites



#2 ·  Posted

I can probably help, but some points are a bit unclear to me.
For instance, will you have to search and dissect TARGETPRODUCTCODES values and/or LANGUAGE?

What do your typical and pathalogical queries look like? 18 s looks like way too much. SQLite DBs with billion rows may lead to such delays, but not a 65-row table!

It seems at first that you'll have to build more tables to streamline access. I can see the use for a language table, a targetproductcode table and their cross tables to achieve NtoN relationship (using foreign keys).

I'd risk to bet that you can store many million rows yet obtain quasi-instant access to the actual data you need, of course given a sensible schema including the correct indices for the queries you're gonna use. My point is that it would be countrer-productive to throw away the DB instead of enriching it with new data.


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Share this post


Link to post
Share on other sites

#3 ·  Posted (edited)

Firstly, apologies for the long post and maybe mansplaining.

The 18s is, I would expect, mainly due to the extraction of the msp files from the exe. When creating the attached DB the recored time was 200 seconds, this extracted 358 files and returned 1450+ msp files that needed parsing. This could also be improved by code changes\methods of analysis.

The program is an updated version of one I did here a while ago. I didn't have time to maintain it and the website  looks to be spammed by unrelated content. 

On program startup a DB is opened and table created (if needed)

Func __Database_CreateTable($s_TableName = 'Updates')
    _Log_Header('__Database_CreateTable')
    _Log_Write('Table Name: ' & $s_TableName)

    Local _
            $s_Function = '_SQLite_Exec', _
            $s_Fields = _
            '"KBARTICLENUMBER" INTEGER NOT NULL,' & _
            '"CLASSIFICATION" TEXT NOT NULL,' & _
            '"DESCRIPTION" TEXT NOT NULL,' & _
            '"STDPACKAGENAME" TEXT NOT NULL,' & _
            '"TARGETPRODUCTNAME" TEXT NOT NULL,' & _
            '"BUILDNUMBER" TEXT NOT NULL,' & _ ; hope this is the correct type?
            '"CREATIONTIMEUTC" TEXT NOT NULL,' & _
            '"MOREINFOURL" TEXT NOT NULL,' & _
            '"PLATFORM" TEXT NOT NULL,' & _
            '"FILESIZE" TEXT NOT NULL,' & _
            '"SUPERSEDEDBY" TEXT,' & _
            '"SUPERSEDES" TEXT,' & _
            '"UPDATECATALOGURLS" TEXT,' & _
            '"TARGETPRODUCTCODES" TEXT NOT NULL,' & _
            '"LANGUAGE" TEXT NOT NULL,' & _
            '"MAJORVERSION" INTEGER NOT NULL,' & _
            '"MSPSOURCEDIRECTORY" TEXT NOT NULL,' & _
            '"MSPFILENAME" TEXT NOT NULL,' & _
            '"MSPFILEHASH" TEXT,' & _
            '"MSPMODIFIEDTIME" INTEGER NOT NULL,' & _
            '"ORIGINALPATH" TEXT NOT NULL,' & _
            '"CHILDHANDLE" TEXT NOT NULL,' & _
            '"ISCHECKED" INTEGER', _
            $s_SQL = "CREATE TABLE IF NOT EXISTS " & $s_TableName & " (" & $s_Fields & ");"

    Local $v_Parameters = -1 & '#' & $s_SQL
    Local $i_lineNumber = 413
    Local $i_Ret = _SQLite_Exec(-1, $s_SQL)
    If Not @error Then Return SetError(@error, _Log_Write('Table Created', 1), _Log_Header())

    #Region #### Fatal error processing ###################
    _Log_WriteFatalInfo($s_Function, 'Database', $i_lineNumber, $v_Parameters, @error, $i_Ret, _SQLite_ErrMsg())
    #EndRegion #### Fatal error processing ###################

    Return SetError(1, 0, _Log_Header())
EndFunc   ;==>__Database_CreateTable

Basically a user selects a folder containing office setup file(s), would normally be one but I am coding to allow multiple Office versions\languages etc to be selected at once. An array is created with properties relating to the selected Office products, including language and the ProductID of the application. This will be used later when deciding which updates are relevant to which Office. 

The user then selects the updates, again could be multiple folders and these are extracted, dir scanned for msp, an array is produced and then looped through,  _SQLite_Exec(-1, "BEGIN") and  _SQLite_Exec(-1, "COMMIT") are executed before start ,and after the loop finishes. During each loop functions fill an array for the DB then this array is formatted to produce part of the INSERT command. 

Func _Database_InsertRow($s_Values = _DatabaseArray_FormatToSQLValues(), $s_TableName = 'Updates')
    _Log_Header('_Database_InsertRow')

    Local _
            $s_SQL = "INSERT INTO " & $s_TableName & " VALUES (" & $s_Values & ");", _ ; sql string
            $s_Function = '_SQLite_Exec' ; running function

    _Log_Write('Inserting Row..')
    _Log_Write('SQL Query : ' & $s_SQL)
    _Log_Write('Row Number: ' & _Database_GetRowCount())


    Local $v_Parameters = -1 & '#' & $s_SQL
    Local $i_lineNumber = 167
    Local $i_Ret = _SQLite_Exec(-1, $s_SQL)
    If Not @error Then Return _Log_Header()

    #Region #### Fatal error processing ###################
    _Log_WriteFatalInfo($s_Function, 'Database', $i_lineNumber, $v_Parameters, @error, $i_Ret, _SQLite_ErrMsg())
    #EndRegion #### Fatal error processing ###################

    Return SetError(1, @extended, _Log_Header())
EndFunc   ;==>_Database_InsertRow

When using an array I create the full array then loop through it to add items to a treeview. I'll have to change if using a DB this so I can add the child handle to the DB when running the INSERT command. When a treeview item is selected, the properties are read from the DB to fill specific labels, as shown in the pic. I will use the treeviews child item handle to get the information. I suppose I could make this DB column unique?. I plan on adding the option to select updates based on a specific language, so to answer your question, yes the LANGUAGE column will need to be searched. I plan on something like this

SELECT * FROM Updates WHERE Instr(LANGUAGE, '0000') > 0 OR Instr(LANGUAGE, '0804') > 0

Most of the usage of the ProductCodes will at present be done when collating the information for the INSERT command. It will only be when the main function is run that these will be queried. Idea is to copy Office setup files to a destination then run through the treeviews selected updates and copy these to the Office's Updates folder. I will check the current Office ProductID code against the the current update and if the ProductID is not in the TARGETPRODUCTCODES column mark it as not required.

check if a product code is in a column
SELECT * FROM Updates WHERE Instr(TARGETPRODUCTCODES, '90140000-0016-0401-1000-0000000FF1CE') > 0

I am new to the DB setup so learning as I go. I think to start it is vital to get the DB structure correct and efficient then alter the code I have to fit with the DB layout.

 

Cheers

treeview.png

Edited by benners

Share this post


Link to post
Share on other sites

#4 ·  Posted (edited)

Updates.sq3I've spent a little time on your question. I've changed the schema to apply what I said in previous post.

The Updates table is more or less your original, but I added an explicit ID (alias for rowid). There are 4 new tables:
Productcodes: tables of all known product codes, each given an Id (alias for rowid)
Languages: table of all known languages, each given an Id (alias for rowid)

Two other tables link Updates IDs to productcodeIDs and languagesIDs (N to N relationships).

Using this schema, a sample query to find applicable updates to a given productcode for a given language might look like this (I interpreted language '0000' as "language-independant and I may miss something else):

select distinct U.Id,
       KBARTICLENUMBER,
       DESCRIPTION,              
       PLATFORM,
       TARGETPRODUCTNAME,
       MSPFILENAME,       
       ISCHECKED

from updates U
     join updprod up on u.id = up.updid
     join productcodes pc on up.prodid = pc.id      
     join updlang ul on u.id = ul.updid     
     join languages l on ul.langid = l.id

where
     prod = '90140000-003D-0000-1000-0000000FF1CE' and     
     lang in ('0000', '0422')

It returns 52 results (distinct updates) in 20ms on my very slow PC. Of course that is just a simple example off the top of my head.

To play with the joined DB (I changed the extension to avoid collision) without having to write code, download and install SQLite Expert (home edition is free) from http://www.sqliteexpert.com/ and then you can issue SQL queries/statements and adapt the schema if needed.

You'll probably have to add suitable indices for the actual queries you need, and possibly make the schema more powerful using triggers and much more, depending on your actual use.

Updates.sq3

 

Updates.sq3

Edited by jchd
DB reloaded after vacuum. IDK why two D/L links, the first doesn't work!
2 people like this

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Share this post


Link to post
Share on other sites

#5 ·  Posted (edited)

Firstly, Thank you so much for your help and time on this thread

11 hours ago, jchd said:

Of course that is just a simple example off the top of my head

Well, please don't post a complex example :P . For someone like myself who is hard of thinking and has the patience of a firework, it will take a while for me to get to grips with the information you posted. I've been on multiple SQL websites and still haven't fully grasped the join portion.

I have been using SQLite Expert Personal for a while, I have seen you mention it in other threads. Do you have the Professional version?. I was looking at the features list and think the  Visual SQL Query Builder may help speed up my learning and was wondering if you knew what, if any, differences it had over the free offerings like SQLeo.

11 hours ago, jchd said:

You'll probably have to add suitable indices for the actual queries you need, and possibly make the schema more powerful using triggers and much more, depending on your actual use.

This is way above my experience level, I will try to rewrite the database part of my script to build the DB as you have laid out, but if it gets too complicated I'll just have to stick to the one table until I'm more experienced.  I like the Language table as it would make it easier to load a combo for specific language selection than the delimited string setup I had, that was a throw back from the array usage.

Oh, treat yourself to a faster PC :D j/k

 

Edited by benners

Share this post


Link to post
Share on other sites

#6 ·  Posted

I recommend going thru https://www.w3schools.com/sql/default.asp at your own pace. It doesn't list SQLite explicitely but the great majority of stuff equally applies. OTOH SQLite has its own set of singularities but this won't stop you from learning much.

Yes I own a Pro version of Expert. Bogdan (the author) is like a remote friend.

Arrays and lists in SQL columns are no-nos. Not only you almost always end up with a lot of duplication (product codes and languages are repeated a large number of time in the original table, making the DB larger than it should be, thus slowing things down), but it's also a pain in the CPU to access elements.

Instead, typically, a "parent" table of all distinct elements is built with unique IDs and another "child" table will hold every "link" from, say, an Update row to a productcode. That's the UpdProd table. Similarly for the UpdLang table, establishing relationships between a given update and the languages it targets. Those "links" are using IDs, unique 64-bit identifiers assigned to every row of every table. A "child" tables refering to ID of another table, we call that a foreign key, declared with the keyword REFERENCES. There is magic involved in foreign keys, more correctly: automagic, in that if you update a parent ID (a parent ID) the optional ON UPDATE CASCADE will update child keys for you in child tables. ON DELETE CASCADE will similarly delete child rows refering to the deleted parent ID. For instance if you delete some update entriy in Updates table, it will automagically delete every entry in UpdProd and UpdLang refering to this update. This is because we've declared really strong relationships between these tables.

If you look at UpdProd (24,626 rows) and UpdLang (6313 rows) tables you see that large numbers of entries: one for each individual relationship. You may think its big, huge and will be slow, but indices and cache both come into play to speed up access tremendously. Playing with Expert you can see the timings (lower left corner) of every request issued in the SQL tab. See for yourself that seemingly "complex" SQL statements yield correct results in dozens of ms, this including the rendering in the display grid.

You shouldn't be afraid of joins. They are the glue which allows you to stick together pieces of information, based on established relationships and selection criterions. SQL can seem verbose at times but using its syntax to describe the results you want is in fact way shorter and most of the times much more efficient than describing in some conventional programming language the steps and algorithms in full detail required to actually deliver those results. Like in a restaurant, it's more comfortable to order a "fricandeau de veau sauce noisette, please" than actually going to the kitchen and cooking yourself the said meal.

You're welcome to chime again at any time.


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Share this post


Link to post
Share on other sites

#7 ·  Posted

@jchd

Well bought some SQL books and the Pro version of SQLite Expert. So I have some reading and playing to do :) . v4.xx is slower than v3.xx and v5.xx Alpha on queries and loading for me, so don't know what that's about.

I have gotten the database to a similar layout as you created previously. I made a few name changes, dropped unneeded columns etc. I have attached the UDF and ask if you could give it the once over, if you have time, and offer any suggestions that would improve it (on the SQL side). It won't run as there are other files it relies on, but I'm sure you'll spot some improvements.

Now it's in Db format, I can add extra selection options for the updates that appear in the treeview list. I have another array that I could convert to a table as well. Never ending :blink:

Cheers

OI_Database.au3

Share this post


Link to post
Share on other sites

#8 ·  Posted

I can see you're serious about using SQLite and learn pretty fast. Good point. Implementing indirection (e.g. thru Enums) is solid, albeit a little bit heavy on coding, but preserves solid code when things evolve in the future.

I'd like to open some possibilities but this will go beyond what is sensible in this thread. I'll PM you shortly (even faster than that if you could provide me with 48h days).


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Share this post


Link to post
Share on other sites

#9 ·  Posted

Thanks for the offer. I'll have some free time this w/end or next week. I work 14 hr shifts so not much time for other stuff. There's no need to rush, PM  me when you have the time. I can also supply you with the full program source to make it easier.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!


Register a new account

Sign in

Already have an account? Sign in here.


Sign In Now