Trying to Find True Duplicate Files Using the Dictionary Object

I want to create an AutoIt script that can find true duplicate files (same SIZE + same MD5 HASH/CHECKSUM).

Getting each file's size is relatively easy, unlike getting its MD5 hash... However, I found an external ActiveX/COM component that makes getting the MD5 hash easy.

I use the "Dictionary Object" (Scripting.Dictionary), since I have "Windows Script Host" installed on my PC and AutoIt lacks a simple equivalent of its own...

My strategy is: use, as the dictionary's keys, the combination of [$size & $md5].

As the dictionary's items, I use the full path (which, of course, includes the name of each file).

I tried to use the dictionary's "Exists" method, which reports whether a key is already present (a dictionary cannot hold two equal keys). Thus, if a true duplicate file exists, its key (the combination of [$size & $md5]) will already be in the dictionary... This is how I intend to "hunt"/"catch" the true duplicate files...

In theory, I think I am correct; however, in practice I am doing something wrong, because my script never catches true duplicate files (and I deliberately copied the same file many times, just to test/debug my script)...

Here it is:

#Include <File.au3>
#Include <Array.au3>

$path = "L:\mais eBooks\0000\TESTES" 
$files_list_array = _FileListToArray($path, "*.*")

For $n = 1 To $files_list_array[0]
    
    $file_size_bytes = FileGetSize($path & "\" & $files_list_array[$n])

    $md5_object = ObjCreate("XStandard.MD5") ; ActiveX/COM component to easily get the MD5 hash/checksum
    ; It is FreeWare!!!... If interested, look at:
    ; http://www.xstandard.com/en/documentation/xmd5/

    $md5_hash = $md5_object.GetCheckSumFromFile ($path & "\" & $files_list_array[$n])

    $dict = ObjCreate("Scripting.Dictionary")
    
    $dict.CompareMode = 1 ; "Text Mode"

    $dict_key = $file_size_bytes & $md5_hash

    $dict_item = $path & "\" & $files_list_array[$n]

    If Not $dict.Exists ($dict_key) Then
        $dict.Add ($dict_key, $dict_item)
        
        ; MsgBox (0, "", $dict_key & Chr(13) & $dict_item)
        
    ElseIf $dict.Exists ($dict_key) Then
        FileWrite(@DesktopDir & "\Dupes.csv", $dict_item & "," & $dict_key & Chr(13))
    EndIf
    
Next

Can you please help me to correct/debug this script?

Thanks.

Regards.


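Create your objects for the dictionary and the MD5 function only once, outside of the loop. As posted, every pass through the loop creates a fresh, empty dictionary, so Exists() can never find a key added for an earlier file. The file size adds little to the key: two differing files may have the same size, but they are extremely unlikely to share an MD5 hash unless someone built a collision on purpose. Just work with the hash, as it is the property that identifies the content: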

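#include <File.au3>

; Create the COM objects once, before the loop, so the dictionary keeps its keys
$md5_object = ObjCreate("XStandard.MD5") ; ActiveX MD5 hash/checksum component

$dict = ObjCreate("Scripting.Dictionary")
$dict.CompareMode = 1 ; text mode (case-insensitive key comparison)

$path = "L:\mais eBooks\0000\TESTES"

; Flag 1 = return files only, so sub-folders are never passed to the hash function
$files_list_array = _FileListToArray($path, "*.*", 1)
If @error Then Exit MsgBox(16, "Error", "No files found in " & $path)

For $n = 1 To $files_list_array[0]
    $sFile = $path & "\" & $files_list_array[$n]
    $dict_key = $md5_object.GetCheckSumFromFile($sFile)
    If $dict.Exists($dict_key) Then
        ; Hash already seen: this file duplicates an earlier one, so log it
        FileWrite(@DesktopDir & "\Dupes.csv", $sFile & "," & $dict_key & @CRLF)
    Else
        ; First file with this hash: remember it
        $dict.Add($dict_key, $sFile)
    EndIf
Next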

Not tested, of course...
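
If you would rather not depend on the external XStandard component, here is a rough sketch of the same idea using the Crypt.au3 UDF. It assumes your AutoIt version ships Crypt.au3 with _Crypt_HashFile and $CALG_MD5; the folder name and the three-column CSV layout (duplicate, first file seen, hash) are only placeholders of mine, not something from the original script. Also untested on your data, so treat it as a starting point:

```autoit
#include <Crypt.au3>
#include <File.au3>

; Hypothetical test folder: point this at a folder of your own
Local $sPath = @ScriptDir & "\TESTES"

; Flag 1 = files only; @error is set when the folder is missing or empty
Local $aFiles = _FileListToArray($sPath, "*", 1)
If @error Then Exit MsgBox(16, "Dupes", "No files found in " & $sPath)

; One dictionary for the whole run: key = MD5 hash, item = first path seen
Local $oDict = ObjCreate("Scripting.Dictionary")

_Crypt_Startup()
For $n = 1 To $aFiles[0]
    Local $sFile = $sPath & "\" & $aFiles[$n]

    ; _Crypt_HashFile returns the hash as binary data
    Local $dHash = _Crypt_HashFile($sFile, $CALG_MD5)
    If @error Then ContinueLoop ; skip files that could not be hashed

    ; String() turns the binary hash into "0x..." hex text for use as a key
    Local $sKey = String($dHash)

    If $oDict.Exists($sKey) Then
        ; Same hash as an earlier file: log duplicate, first file seen, and hash
        FileWriteLine(@DesktopDir & "\Dupes.csv", '"' & $sFile & '","' & $oDict.Item($sKey) & '",' & $sKey)
    Else
        $oDict.Add($sKey, $sFile)
    EndIf
Next
_Crypt_Shutdown()
```

Keeping the first path as the dictionary item means each line in Dupes.csv shows which file the duplicate matches, and the quotes around the paths keep a comma in a file name from splitting the CSV columns.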

:P

Edited by PsaltyDS