mlowery Posted December 10, 2008 Share Posted December 10, 2008 (edited) I need to load log files from an Apache webserver into an SQLite database for analysis. What's the fastest way to read an ASCII text file (CRLF delimited) into SQLite records? Currently, I'm:Reading the file into an arrayParsing fields from each record/line using StringRegExpCreating individual INSERT statements from these fields using StringFormatExecuting INSERTS into a memory SQLite databaseDumping the memory database to a fileI feel like I should be able to insert records or parse fields more efficiently. How can I improve performance?Local $db = @ScriptDir & "\test.db3" Local $testlog_src = "logfile.log" Local $template_logentry = '(\d+\.\d+\.\d+\.\d+) (.+) (.+) \[(.+)\] "(.+)" (.+) (.+) "(.+)" "(.+)"' ; separates record into fields Local $template_sqlinsert = "INSERT INTO aTest(ip,unk,user,datetime,request,status,bytes,referrer) " & _ 'VALUES("%s","%s","%s","%s","%s","%s","%s","%s");' Enum $_ip, $_unk, $_user, $_datetime, $_request, $_status, $_bytes, $_referrer ;=============================================================================== ; MAIN ;=============================================================================== ; INITIALIZE DATABASE _SQLite_Startup() _SQLite_Open() ; open :memory: Database _SQLite_Exec(-1, "CREATE TABLE aTest(ip,unk,user,datetime,request,status,bytes,referrer);") ; CREATE a Table ; read and parse log entries $log = "" SplashTextOn("", "reading file...") _FileReadToArray($testlog_src, $log) SplashTextOn("", "parsing file to db...") For $i = 1 To $log[0] $bar = StringRegExp($log[$i], $template_logentry, 1) _SQLite_Exec(-1, StringFormat($template_sqlinsert, $bar[$_ip], $bar[$_unk], $bar[$_user], $bar[$_datetime], $bar[$_request], $bar[$_status], $bar[$_bytes], $bar[$_referrer])) ; INSERT Data Next _SQLite_Dump($db) Edited December 10, 2008 by mlowery Link to comment Share on other sites More sharing options...
KaFu Posted December 10, 2008 Share Posted December 10, 2008 Please test if Local $db = @ScriptDir & "\test.db3" Local $testlog_src = "logfile.log" Local $template_logentry = '(\d+\.\d+\.\d+\.\d+) (.+) (.+) \[(.+)\] "(.+)" (.+) (.+) "(.+)" "(.+)"' ; separates record into fields Local $template_sqlinsert = "INSERT INTO aTest(ip,unk,user,datetime,request,status,bytes,referrer) " & _ 'VALUES("%s","%s","%s","%s","%s","%s","%s","%s");' Enum $_ip, $_unk, $_user, $_datetime, $_request, $_status, $_bytes, $_referrer ;=============================================================================== ; MAIN ;=============================================================================== ; INITIALIZE DATABASE _SQLite_Startup() _SQLite_Open() ; open :memory: Database _SQLite_Exec(-1, "PRAGMA synchronous = OFF;") _SQLite_Exec(-1, "CREATE TABLE aTest(ip,unk,user,datetime,request,status,bytes,referrer);") ; CREATE a Table ; read and parse log entries $log = "" SplashTextOn("", "reading file...") _FileReadToArray($testlog_src, $log) SplashTextOn("", "parsing file to db...") _SQLite_Exec(-1, "BEGIN TRANSACTION") For $i = 1 To $log[0] $bar = StringRegExp($log[$i], $template_logentry, 1) _SQLite_Exec(-1, StringFormat($template_sqlinsert, $bar[$_ip], $bar[$_unk], $bar[$_user], $bar[$_datetime], $bar[$_request], $bar[$_status], $bar[$_bytes], $bar[$_referrer])) ; INSERT Data Next _SQLite_Exec(-1, "COMMIT TRANSACTION") _SQLite_Dump($db) gives you any speed improvements... I'm curious myself OS: Win10-22H2 - 64bit - German, AutoIt Version: 3.3.16.1, AutoIt Editor: SciTE, Website: https://funk.eu AMT - Auto-Movie-Thumbnailer (2022-Nov-26) BIC - Batch-Image-Cropper (2023-Apr-01) COP - Color Picker (2009-May-21) DCS - Dynamic Cursor Selector (2024-Feb-16) HMW - Hide my Windows (2018-Sep-16) HRC - HotKey Resolution Changer (2012-May-16) ICU - Icon Configuration Utility (2018-Sep-16) SMF - Search my Files (2023-Jun-03) - THE file info and duplicates search tool SSD - Set Sound Device (2017-Sep-16) Link to comment Share on other sites More sharing options...
mlowery Posted December 10, 2008 Author Share Posted December 10, 2008 There is a slight gain. Measured from the start of "parsing," timer ticks went from 24563 to 22591 with the TRANSACTION additions. (I knew there had to be a more efficient way to send INSERTS in bulk form -- thanks for the tip!) The log file seems to load quickly into the array. I think it's the parsing of fields that's the slow part of the process. I'm sure there's a better way to do that. Link to comment Share on other sites More sharing options...
KaFu Posted December 10, 2008 Share Posted December 10, 2008 (edited) The log file seems to load quickly into the array. I think it's the parsing of fields that's the slow part of the process. I'm sure there's a better way to do that.Maybe StringRegExp() is not the fastest way to split the strings... can imagine that it needs a lot of computing power compared to stringright(), stringleft(), stringmid(), stringinstr() and stringlen() combos... post an (anonymized ) log-entry and we'll try it that way...Best Regards Edited December 10, 2008 by KaFu OS: Win10-22H2 - 64bit - German, AutoIt Version: 3.3.16.1, AutoIt Editor: SciTE, Website: https://funk.eu AMT - Auto-Movie-Thumbnailer (2022-Nov-26) BIC - Batch-Image-Cropper (2023-Apr-01) COP - Color Picker (2009-May-21) DCS - Dynamic Cursor Selector (2024-Feb-16) HMW - Hide my Windows (2018-Sep-16) HRC - HotKey Resolution Changer (2012-May-16) ICU - Icon Configuration Utility (2018-Sep-16) SMF - Search my Files (2023-Jun-03) - THE file info and duplicates search tool SSD - Set Sound Device (2017-Sep-16) Link to comment Share on other sites More sharing options...
mlowery Posted December 11, 2008 Author Share Posted December 11, 2008 Maybe StringRegExp() is not the fastest way to split the strings... can imagine that it needs a lot of computing power compared to stringright(), stringleft(), stringmid(), stringinstr() and stringlen() combos... post an (anonymized ) log-entry and we'll try it that way... Best RegardsMy guess too is that RegExp is probably the slowpoke. The stinker is that the log file fields aren't fixed width, and don't have absolute delimiters. The first field is an IP address, which can be as short as 1.1.1.1 or as long as 255.255.255.255. Some of the fields are enclosed in quotes, but most aren't. I considered a slightlier messy combination of StringSplits. I'll give that a shot and see if it's better. (I don't have any problem uploading anonymized data; it's just that the sample log from one day is about 4 MB.) If helpful, a typical line in the logfile looks like this (all on one line): CODExx.xx.xx.xxx - - [07/Dec/2008:03:50:32 -0700] "GET www.domain.com/ HTTP/1.1" 200 7452 "http://www.domain.de/" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Orange 8.0; InfoPath.1)" Link to comment Share on other sites More sharing options...
Xwolf Posted January 12, 2009 Share Posted January 12, 2009 (edited) Sorry , for my poor English. lol I think the problem was that keyword "INSERT". You can test the following code. #include <sqlite.au3> #include <sqlite.dll.au3> _SQLite_Startup() _SQLite_Open("abc.db3") _SQLite_Exec(-1,"Create table abc(id)") For $i=1 To 100 Step 1 _SQLite_Exec(-1,"Insert into abc values ('1')") ConsoleWrite($i & @CRLF) Next _SQLite_Close ("abc.db3") _SQLite_Shutdown ()oÝ÷ Ù8^ë.ÛºÚ"µÍÉÝÌMNL ]]ÒUË^H[YÎÉÝÌMNL H]]Ò]ÕÜ[ÚYÝÑ^]ÛÙNU[YNM N It spended 14 seconds to insert only 100 data. Edited January 12, 2009 by Xwolf Link to comment Share on other sites More sharing options...
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now