Jump to content
gcue

best practice to store/query/show duplicates using sqlite

Recommended Posts

gcue

hello

i am thinking of storing information on several thousands of files in a sqlite file.  to help with duplicates, is there a way to sort the actual database file by (not results) but actual data in a specific column? this way i can search for a something and assume the next item is a duplicate and if its not then exitloop. (hope that makes sense)

to show all results i was thinking of using a treeview and if there are duplicates then show them as subitems.

please let me know any of your thoughts on the matter

thanks!

Share this post


Link to post
Share on other sites
jchd

Are you talking about text files or files with an homogenous structure? Because "sort by actual data" and "search for something" are pretty vague concepts. And how would you decide that two files are "duplicate"?


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Share this post


Link to post
Share on other sites
gcue

files  would be media files - so id store metadata for each file.  i'd know each file was duplicate by its sha1 hash value - so id want to sort by the column storing those values

Share this post


Link to post
Share on other sites
Chimp

Hi @gcue,

in this thread, I've posted a possible way to go.


small minds discuss people average minds discuss events great minds discuss ideas.... and use AutoIt....

Share this post


Link to post
Share on other sites
jchd

To use @Chimp query, you'll need something to build the database. This is a skeleton you could use. Hasher.au3 is by trancexx in the example forum.

#include <String.au3>
#include <File.au3>
#include <SQLite.au3>
#include "..\Include\Autres\hashes.au3"

Local $dll = _SQLite_Startup("c:\bin\sqlite3.dll")
Local $hDB

Local $DBname = "MyFiles.sq3"
FileDelete($DBname)
$hDB = _SQLite_Open($DBname)
_SQLite_Exec($hDB, "pragma page_size = 64536;")
_SQLite_Exec($hDB, "pragma cache_size = 10000;")
_SQLite_Exec($hDB, "pragma journal_mode = WAL;")

ConsoleWrite(_SQLite_LibVersion() & @LF)
Local $aRows, $iCols, $iRows
_SQLite_GetTable2d($hDB, "pragma compile_options", $aRows, $iRows, $iCols)
_SQLite_Display2DResult($aRows)

_SQLite_Exec($hDB, _
    "create table if not exists Files (" & _
        "Id integer primary key, " & _
        "Title text collate nocase, " & _
        "Album text collate nocase, " & _
        "Hash text" & _
    ");" _
)

Local $Time = TimerInit()
Local $sDir = "C:\Users\jc\Music\CDs"
Local $aFiles = _FileListToArrayRec($sDir, "*.flac", $FLTAR_FILES, $FLTAR_RECUR, $FLTAR_FASTSORT, $FLTAR_FULLPATH)
If @error Then
    ConsoleWrite(@extended & @LF)
Else
    $Time = TimerDiff($Time)
    Local $iFiles = $aFiles[0]
    ConsoleWrite("Enumerating " & $iFiles & " files in array took " & Round($Time / 1000, 2) & "s" & " (" & Round($Time * 1000 / $iFiles, 1) & "µs per file)" & @LF)
    $Time = TimerInit()
    Local $sDrive, $sFolder, $sName, $sExt, $sParent
    _SQLite_Exec($hDB, "begin immediate")
    For $i = 1 To $iFiles
        _PathSplit($aFiles[$i], $sDrive, $sFolder, $sName, $sExt)
        _SQLite_Exec($hDB, "insert into " & _
                                "Files (" & _
                                    "Title, " & _
                                    "Album, " & _
                                    "Hash" & _
                                ") values (" & _
                                    X($sName) & _
                                    XX($sFolder) & _
                                    XX(_SHA1ForFile($aFiles[$i])) & _
                                ")")
    Next
    _SQLite_Exec($hDB, "end")
    $Time = TimerDiff($Time)
    ConsoleWrite("Inserting " & $iFiles & " files in database took " & Round($Time / 1000, 2) & "s (" & Round($Time * 1000 / $iFiles, 1) & "µs per file)" & @LF)
    _SQLite_Exec($hDB, "create index if not exists ixAlbumTitle on Files (Album, Title)")
EndIf
_SQLite_Close($hDB)
_SQLite_Shutdown()

Func X($s)
    Return "'" & StringReplace($s, "'", "''") & "'"
EndFunc

Func XX($s)
    Return ", '" & StringReplace($s, "'", "''") & "'"
EndFunc

 


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Share this post


Link to post
Share on other sites
gcue

thank you for the help and feedback =)

Share this post


Link to post
Share on other sites
jchd

Of course you'll have to adapt the code to your setup and needs. Once loaded, you can simply use SQLite Expert to query. With the DB created by the above script on my own PC I can now query: select * from files where title like '%love%' order by album, title  and get fast answer.

Adapt Chimp's query to list duplicates hashes. This is only the simplest setup, the schema can be extended to support much, much more.

Edited by jchd

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.