buymeapc Posted June 25, 2013 Share Posted June 25, 2013 (edited) Hi all! I have two files. One of which, I read into a SQLite database and the other I read line by line and search the SQLite database for the lines from the file. It works, but it's really slow, so I'm wondering if it's my process or just the nature of the beast. File2 (FSTEST.dat) is 8mb and File1 (ESTEST.dat) is 1.5mb. These are the smaller of the sets of files (the other sets can be upwards of 50mb or so). Is there a way to make this run faster? Thanks for the help as always! expandcollapse popup#include <File.au3> #include <SQLite.au3> #include <SQLite.dll.au3> Global $hQuery, $aRow, $fl_array2 Global $sTempDB = @ScriptDir & "\TempDB.db" Global $iTotalCount = 0, $iTotalSoFar = 0 Global $iKey1_a = 1 Global $iKey1_b = 12 Global $iKey2_a = 18 Global $iKey2_b = 25 Global $st = TimerInit() Global $iOrphans, $iDupes, $iMatches ; --------------------------------------------- ; Start SQLite ; --------------------------------------------- _SQLite_Startup() If @error Then ConsoleWrite('error loading sqlite.dll' & @LF) Exit EndIf If FileExists($sTempDB) Then FileDelete($sTempDB) _SQLite_Open($sTempDB) If @error Then ConsoleWrite('Unable to open DB' & @LF) _Exit() EndIf ; --------------------------------------------- ; Get the files to compare ; --------------------------------------------- $sFile1 = FileOpenDialog("Select EXTEST.DAT", @ScriptDir, "Dat files (*.dat)", 1 + 2, "") If @error Then _Exit() $sFile2 = FileOpenDialog("Select FSTEST.DAT", @ScriptDir, "Dat files (*.dat)", 1 + 2, "") If @error Then _Exit() ; --------------------------------------------- ; Load the data from File2 into the db ; --------------------------------------------- _ReLoad() ; --------------------------------------------- ; Compare the fields from File1 to those in File2 ; --------------------------------------------- CompareItems($sFile1) ; --------------------------------------------- Func CompareItems($sFileContents) Dim $aSplit _FileReadToArray($sFileContents, $aSplit) If IsArray($aSplit) Then If $aSplit[1] = "" Then _ArrayDelete($aSplit, 1) $aSplit[0] -= 1 EndIf $iTotalCount += $aSplit[0] For $i = 1 To $aSplit[0] $sPart1 = StringMid($aSplit[$i], $iKey1_a, ($iKey1_b - $iKey1_a) + 1) $sPart2 = StringMid($aSplit[$i], $iKey2_a, ($iKey2_b - $iKey2_a) + 1) If $sPart1 <> "" Then $iCount = 0 _SQLite_Query(-1, 'select * from compare where PART1 = "' & $sPart1 & '" and PART2 = "' & $sPart2 & '";', $hQuery) While _SQLite_FetchData($hQuery, $aRow) = $SQLite_OK If $aRow[0] <> "" Then $iCount += 1 WEnd ConsoleWrite($sPart1 & " $iCount = " & $iCount & @CRLF) If $iCount = 0 And $i <> $aSplit[0] Then $iOrphans += 1 EndIf If $iCount = 1 Then $iMatches += 1 EndIf If $iCount > 1 Then $iDupes += 1 EndIf EndIf $iTotalSoFar += 1 Next ConsoleWrite("---------------------------" & @CRLF) ConsoleWrite("Orphans: " & $iOrphans & @CRLF) ConsoleWrite("Matches: " & $iMatches & @CRLF) ConsoleWrite("Dupes: " & $iDupes & @CRLF) ConsoleWrite("Total: " & $iTotalCount & @CRLF) EndIf EndFunc ;==>CompareItems Func _ReLoad() ; drop "parts" table if it exists, re-define and reload it If _SQLite_Exec(-1, 'drop table if exists compare;') <> $SQLite_OK Then ConsoleWrite('Drop table failed' & @LF) _Exit() Else ConsoleWrite('Compare table dropped for refresh' & @LF) EndIf If _SQLite_Exec(-1, 'create table compare (PART1, PART2);') <> $SQLite_OK Then ConsoleWrite('Create Table Failed' & @LF) _Exit() EndIf _FileReadToArray($sFile2, $fl_array2) Switch @error Case 1 ConsoleWrite('Input file failed to open' & @LF) _Exit() Case 2 ConsoleWrite('Unable to split file' & @LF) _Exit() EndSwitch Local $aLine, $sql ProgressOn('Loading Compare Table', 'Please Wait', '', Default, Default, 2 + 16); 2 - without always on top / 16 - Window can be moved _SQLite_Exec(-1, "begin immediate;") For $1 = 1 To $fl_array2[0] ProgressSet(($1 / $fl_array2[0]) * 100) $sql = 'insert into compare values ("' & _ StringMid($fl_array2[$1], $iKey1_a, ($iKey1_b - $iKey1_a) + 1) & '", "' & _ StringMid($fl_array2[$1], $iKey2_a, ($iKey2_b - $iKey2_a) + 1) & '");' If _SQLite_Exec(-1, $sql) <> $SQLite_OK Then ConsoleWrite('Table insert failed STMT = ' & $sql & @LF) _Exit() EndIf Next _SQLite_Exec(-1, "commit;") ProgressOff() ConsoleWrite('Table loaded with ' & UBound($fl_array2) - 1 & ' records in ' & Round(TimerDiff($st) / 1000, 3) & ' seconds' & @LF) EndFunc ;==>_ReLoad Func _Exit() _SQLite_Close() _SQLite_Shutdown() Exit EndFunc ;==>_Exit (I don't know why the code box always loses my tab structure...weird) Test Files.zip Edited June 25, 2013 by buymeapc Link to comment Share on other sites More sharing options...
Zedna Posted June 25, 2013 Share Posted June 25, 2013 I recommend to create index in SQLite database table for columns part1 and part2. Resources UDF Â ResourcesEx UDF Â AutoIt Forum Search Link to comment Share on other sites More sharing options...
buymeapc Posted June 26, 2013 Author Share Posted June 26, 2013 Hi Zedna, thanks for the response. Since SQLite is very new to me, is there an example you could point me to get me started, please? Link to comment Share on other sites More sharing options...
Geir1983 Posted June 26, 2013 Share Posted June 26, 2013 Is it possible to create CRC of the two files and just compare this? Link to comment Share on other sites More sharing options...
buymeapc Posted June 26, 2013 Author Share Posted June 26, 2013 Is that possible with AutoIt? I wouldn't even know how to begin to create a CRC. My ultimate goal of this script is to read each line of File1 and see if characters 1 - 12 exist in File2. Pretty simple, but the amount of data is huge which is making this challenging. Link to comment Share on other sites More sharing options...
jchd Posted June 26, 2013 Share Posted June 26, 2013 Well 50 MiB is far from huge by today's standards. First, you should escape (using _SQLite_FastEscape()) string in SQLite commands. Then a quick search will turn out very similar topics, like: Some examples will do what you want directly, some after minor changes. Store only the head of strings in the DB if you need to compare of chars 1..12. This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe hereRegExp tutorial: enough to get startedPCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta. SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt) Link to comment Share on other sites More sharing options...
buymeapc Posted June 28, 2013 Author Share Posted June 28, 2013 Ok, here's an odd thing I noticed. I changed my script to use an autoincrementing primary key, which is great and all. But why would these two 'create' function make a huge speed difference in querying files?? When using the lower one, querying is blazing fast, while the top one runs incredibly slow. Why is that? If _SQLite_Exec(-1, 'CREATE TABLE Strings (id integer primary key autoincrement, String1 CHAR CONSTRAINT ksString1 NOT NULL, String2 CHAR CONSTRAINT ksString2 NOT NULL);') <> $SQLite_OK Then ; SLOW (about 2 queries/second) If _SQLite_Exec(-1, 'CREATE TABLE Strings (id integer primary key autoincrement, String1 CHAR CONSTRAINT ksString1 UNIQUE ON CONFLICT IGNORE, String2);') <> $SQLite_OK Then ; VERY FAST (about 100 queries/second) Link to comment Share on other sites More sharing options...
jchd Posted June 28, 2013 Share Posted June 28, 2013 The second schema uses a unique index on ksString1 while the first doesn't have any index on ksString1. That will make a very significant difference in speed in queries involving this column in some way. I bet this is the effect you notice. This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe hereRegExp tutorial: enough to get startedPCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta. SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt) Link to comment Share on other sites More sharing options...
buymeapc Posted June 28, 2013 Author Share Posted June 28, 2013 Ah, ok. That makes sense. Is there any way to keep the speed without using the unique constraint since it's possible to have an item more than once? Well, to be technical, the first string could use the unique constraint, but the second string might have multiple values associated with the first. How would I be able to accomodate that? Link to comment Share on other sites More sharing options...
Solution jchd Posted June 29, 2013 Solution Share Posted June 29, 2013 Either REPLACE (alias for INSERT OR REPLACE) or INSERT OR IGNORE could do the trick. Anyway be sure to enclose your insert loop(s) within a transaction (BEGIN; ... COMMIT;) to speed up bulk inserts. This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe hereRegExp tutorial: enough to get startedPCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta. SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt) Link to comment Share on other sites More sharing options...
buymeapc Posted July 1, 2013 Author Share Posted July 1, 2013 Thanks for the help! With your suggestions, I used the following line and I got my speed back. It works much better now. If _SQLite_Exec(-1, "CREATE TABLE Strings (String1 TEXT, String2 TEXT, PRIMARY KEY (String1, String2) ON CONFLICT IGNORE);") <> $SQLite_OK Then Link to comment Share on other sites More sharing options...
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now