Jump to content

best practice to store/query/show duplicates using sqlite


Recommended Posts

hello

i am thinking of storing information on several thousands of files in a sqlite file.  to help with duplicates, is there a way to sort the actual database file by (not results) but actual data in a specific column? this way i can search for a something and assume the next item is a duplicate and if its not then exitloop. (hope that makes sense)

to show all results i was thinking of using a treeview and if there are duplicates then show them as subitems.

please let me know any of your thoughts on the matter

thanks!

Link to comment
Share on other sites

Are you talking about text files or files with an homogenous structure? Because "sort by actual data" and "search for something" are pretty vague concepts. And how would you decide that two files are "duplicate"?

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

files  would be media files - so id store metadata for each file.  i'd know each file was duplicate by its sha1 hash value - so id want to sort by the column storing those values

Link to comment
Share on other sites

To use @Chimp query, you'll need something to build the database. This is a skeleton you could use. Hasher.au3 is by trancexx in the example forum.

#include <String.au3>
#include <File.au3>
#include <SQLite.au3>
#include "..\Include\Autres\hashes.au3"

Local $dll = _SQLite_Startup("c:\bin\sqlite3.dll")
Local $hDB

Local $DBname = "MyFiles.sq3"
FileDelete($DBname)
$hDB = _SQLite_Open($DBname)
_SQLite_Exec($hDB, "pragma page_size = 64536;")
_SQLite_Exec($hDB, "pragma cache_size = 10000;")
_SQLite_Exec($hDB, "pragma journal_mode = WAL;")

ConsoleWrite(_SQLite_LibVersion() & @LF)
Local $aRows, $iCols, $iRows
_SQLite_GetTable2d($hDB, "pragma compile_options", $aRows, $iRows, $iCols)
_SQLite_Display2DResult($aRows)

_SQLite_Exec($hDB, _
    "create table if not exists Files (" & _
        "Id integer primary key, " & _
        "Title text collate nocase, " & _
        "Album text collate nocase, " & _
        "Hash text" & _
    ");" _
)

Local $Time = TimerInit()
Local $sDir = "C:\Users\jc\Music\CDs"
Local $aFiles = _FileListToArrayRec($sDir, "*.flac", $FLTAR_FILES, $FLTAR_RECUR, $FLTAR_FASTSORT, $FLTAR_FULLPATH)
If @error Then
    ConsoleWrite(@extended & @LF)
Else
    $Time = TimerDiff($Time)
    Local $iFiles = $aFiles[0]
    ConsoleWrite("Enumerating " & $iFiles & " files in array took " & Round($Time / 1000, 2) & "s" & " (" & Round($Time * 1000 / $iFiles, 1) & "µs per file)" & @LF)
    $Time = TimerInit()
    Local $sDrive, $sFolder, $sName, $sExt, $sParent
    _SQLite_Exec($hDB, "begin immediate")
    For $i = 1 To $iFiles
        _PathSplit($aFiles[$i], $sDrive, $sFolder, $sName, $sExt)
        _SQLite_Exec($hDB, "insert into " & _
                                "Files (" & _
                                    "Title, " & _
                                    "Album, " & _
                                    "Hash" & _
                                ") values (" & _
                                    X($sName) & _
                                    XX($sFolder) & _
                                    XX(_SHA1ForFile($aFiles[$i])) & _
                                ")")
    Next
    _SQLite_Exec($hDB, "end")
    $Time = TimerDiff($Time)
    ConsoleWrite("Inserting " & $iFiles & " files in database took " & Round($Time / 1000, 2) & "s (" & Round($Time * 1000 / $iFiles, 1) & "µs per file)" & @LF)
    _SQLite_Exec($hDB, "create index if not exists ixAlbumTitle on Files (Album, Title)")
EndIf
_SQLite_Close($hDB)
_SQLite_Shutdown()

Func X($s)
    Return "'" & StringReplace($s, "'", "''") & "'"
EndFunc

Func XX($s)
    Return ", '" & StringReplace($s, "'", "''") & "'"
EndFunc

 

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

Of course you'll have to adapt the code to your setup and needs. Once loaded, you can simply use SQLite Expert to query. With the DB created by the above script on my own PC I can now query: select * from files where title like '%love%' order by album, title  and get fast answer.

Adapt Chimp's query to list duplicates hashes. This is only the simplest setup, the schema can be extended to support much, much more.

Edited by jchd

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...