Jump to content

Loading a database from disk to memory {and back again}


Recommended Posts

Quick question - is it possible to take a SQlite database from the disk, load it into memory, work with it in-memory, then flush it back to disk? I have a script that basically runs tens of thousands of database operations and I'd like to see if removing the constant disk read/write operations from the equation would speed things up a little.

Link to comment
Share on other sites

Quinch,

Besides loading the DB in memory (which will certainly work well until its size blasts RAM out), maybe (I insist on "maybe") there are valuable SQLite optimizations that would apply to your use case. Not knowing the particular of it makes it fairly difficult to give you any advice, lest wild guesses. If you can inform us better on what your bottleneck(s) really are, things would clear up.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

Oh, it's simple enough - it opens a file on the disk, runs a MD5 and SHA1 hash, then queries the database to see if an entry with those two values exists. If no, then add it {plus a couple of other values}. The potential bottleneck is in the sheer volume of those queries - the files themselves range between a handful of bytes to a hundred or two kilobytes, but there are routinely tens of thousands of them so I'd like to see if taking those numerous queries to memory {the database itself is about ten to twenty megabytes big} as opposed to disk would speed it up.

I also realize that it would be a fairly simple job of reading all the contents of the disk database, table by table, and inserting them into the memory one, but I'm not sure whether the time spent doing that would outweigh possible time saved.

Edited by Quinch
Link to comment
Share on other sites

The backup (works for any source, any target, RAM or disk) will always be faster than doing anything at the SQL level.

I'm not sure you need both MD5 and SHA1. If I were you, I'd stick to MD5 only unless you're working in a potentially unfriendly context (where determined attacker have enough incentive to foil you in taking a file content for another one). I wild guess that your actual bottleneck is in computing the hashes, not much at the lookup level, but I agree it depends on the frequency of new files found vs registered files.

Now for the DB part: after a few queries and/or updates/inserts it's likely that your DB will all reside in OS and/or SQLite cache. so making a memory copy of the DB won't spped up things significantly. Make SQLite cache big enough: 10..20Mb is close to nothing today.

Another route to spped lookup is to first hash only the first (say) 1Kb of every file and lookup this. If it's found, then only hash the whole baby and lookup that. Of course the efficiency of that tactics only works if you're likely to find the "head hash" often enough to offset the (supposedly rare) double hash.

Only trial will tell you what works best in your case.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

Depending on the number of queries and your implementation, using transactions could also speed up the SQL portion of the process, but as @jchd pointed out the hashing algorithms are going to take largest amount of computational time.

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...