Sign in to follow this  
Followers 0
mlowery

Convert text file to SQLite quickly?

6 posts in this topic

#1 ·  Posted (edited)

I need to load log files from an Apache webserver into an SQLite database for analysis. What's the fastest way to read an ASCII text file (CRLF delimited) into SQLite records? Currently, I'm:

  • Reading the file into an array
  • Parsing fields from each record/line using StringRegExp
  • Creating individual INSERT statements from these fields using StringFormat
  • Executing INSERTS into a memory SQLite database
  • Dumping the memory database to a file
I feel like I should be able to insert records or parse fields more efficiently. How can I improve performance?

Local $db = @ScriptDir & "\test.db3"
Local $testlog_src = "logfile.log"
Local $template_logentry = '(\d+\.\d+\.\d+\.\d+) (.+) (.+) \[(.+)\] "(.+)" (.+) (.+) "(.+)" "(.+)"' ; separates record into fields
Local $template_sqlinsert = "INSERT INTO aTest(ip,unk,user,datetime,request,status,bytes,referrer) " & _
        'VALUES("%s","%s","%s","%s","%s","%s","%s","%s");'
Enum $_ip, $_unk, $_user, $_datetime, $_request, $_status, $_bytes, $_referrer
;===============================================================================
; MAIN
;===============================================================================
; INITIALIZE DATABASE
_SQLite_Startup()

_SQLite_Open() ; open :memory: Database
_SQLite_Exec(-1, "CREATE TABLE aTest(ip,unk,user,datetime,request,status,bytes,referrer);") ; CREATE a Table

; read and parse log entries
$log = ""

SplashTextOn("", "reading file...")
_FileReadToArray($testlog_src, $log)


SplashTextOn("", "parsing file to db...")
For $i = 1 To $log[0]
    $bar = StringRegExp($log[$i], $template_logentry, 1)
    _SQLite_Exec(-1, StringFormat($template_sqlinsert, $bar[$_ip], $bar[$_unk], $bar[$_user], $bar[$_datetime], $bar[$_request], $bar[$_status], $bar[$_bytes], $bar[$_referrer])) ; INSERT Data
Next

_SQLite_Dump($db)

Edited by mlowery

Share this post


Link to post
Share on other sites



Please test if

Local $db = @ScriptDir & "\test.db3"
Local $testlog_src = "logfile.log"
Local $template_logentry = '(\d+\.\d+\.\d+\.\d+) (.+) (.+) \[(.+)\] "(.+)" (.+) (.+) "(.+)" "(.+)"' ; separates record into fields
Local $template_sqlinsert = "INSERT INTO aTest(ip,unk,user,datetime,request,status,bytes,referrer) " & _
        'VALUES("%s","%s","%s","%s","%s","%s","%s","%s");'
Enum $_ip, $_unk, $_user, $_datetime, $_request, $_status, $_bytes, $_referrer
;===============================================================================
; MAIN
;===============================================================================
; INITIALIZE DATABASE
_SQLite_Startup()

_SQLite_Open() ; open :memory: Database
_SQLite_Exec(-1, "PRAGMA synchronous = OFF;")
_SQLite_Exec(-1, "CREATE TABLE aTest(ip,unk,user,datetime,request,status,bytes,referrer);") ; CREATE a Table

; read and parse log entries
$log = ""

SplashTextOn("", "reading file...")
_FileReadToArray($testlog_src, $log)


SplashTextOn("", "parsing file to db...")
_SQLite_Exec(-1, "BEGIN TRANSACTION")
For $i = 1 To $log[0]
    $bar = StringRegExp($log[$i], $template_logentry, 1)
    _SQLite_Exec(-1, StringFormat($template_sqlinsert, $bar[$_ip], $bar[$_unk], $bar[$_user], $bar[$_datetime], $bar[$_request], $bar[$_status], $bar[$_bytes], $bar[$_referrer])) ; INSERT Data
Next
_SQLite_Exec(-1, "COMMIT TRANSACTION")
_SQLite_Dump($db)

gives you any speed improvements... I'm curious myself :)

Share this post


Link to post
Share on other sites

There is a slight gain. Measured from the start of "parsing," timer ticks went from 24563 to 22591 with the TRANSACTION additions. (I knew there had to be a more efficient way to send INSERTS in bulk form -- thanks for the tip!)

The log file seems to load quickly into the array. I think it's the parsing of fields that's the slow part of the process. I'm sure there's a better way to do that.

Share this post


Link to post
Share on other sites

#4 ·  Posted (edited)

The log file seems to load quickly into the array. I think it's the parsing of fields that's the slow part of the process. I'm sure there's a better way to do that.

Maybe StringRegExp() is not the fastest way to split the strings... can imagine that it needs a lot of computing power compared to stringright(), stringleft(), stringmid(), stringinstr() and stringlen() combos... post an (anonymized :) ) log-entry and we'll try it that way...

Best Regards

Edited by KaFu

Share this post


Link to post
Share on other sites

Maybe StringRegExp() is not the fastest way to split the strings... can imagine that it needs a lot of computing power compared to stringright(), stringleft(), stringmid(), stringinstr() and stringlen() combos... post an (anonymized :) ) log-entry and we'll try it that way...

Best Regards

My guess too is that RegExp is probably the slowpoke. The stinker is that the log file fields aren't fixed width, and don't have absolute delimiters. The first field is an IP address, which can be as short as 1.1.1.1 or as long as 255.255.255.255. Some of the fields are enclosed in quotes, but most aren't.

I considered a slightlier messy combination of StringSplits. I'll give that a shot and see if it's better. (I don't have any problem uploading anonymized data; it's just that the sample log from one day is about 4 MB.) If helpful, a typical line in the logfile looks like this (all on one line):

CODE
xx.xx.xx.xxx - - [07/Dec/2008:03:50:32 -0700] "GET www.domain.com/ HTTP/1.1" 200 7452 "http://www.domain.de/" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Orange 8.0; InfoPath.1)"

Share this post


Link to post
Share on other sites

#6 ·  Posted (edited)

Sorry , for my poor English. lol

I think the problem was that keyword "INSERT".

You can test the following code.

#include <sqlite.au3>
#include <sqlite.dll.au3>

_SQLite_Startup()
_SQLite_Open("abc.db3")
_SQLite_Exec(-1,"Create table abc(id)")

For $i=1 To 100 Step 1
    _SQLite_Exec(-1,"Insert into abc values ('1')")
    ConsoleWrite($i & @CRLF)
Next

_SQLite_Close ("abc.db3")
_SQLite_Shutdown ()oÝ÷ Ù8^­ë.ÛºÚ"µÍÉÝÌMNL
]]ÒUË^H[YÎÉÝÌMNL
H]]Ò]ÕÜ[ÚYÝÑ^]ÛÙNU[YNM
N

It spended 14 seconds to insert only 100 data.

Edited by Xwolf

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!


Register a new account

Sign in

Already have an account? Sign in here.


Sign In Now
Sign in to follow this  
Followers 0