asgarcymed

Trying to Find True Duplicate Files Using the Dictionary Object

3 posts in this topic

I want to create an AutoIt script that finds true duplicate files (same size and same MD5 hash/checksum).

Getting each file's size is easy; getting the MD5 hash is not. However, I found an external ActiveX (COM) component that makes it easy to get the MD5 hash.

I use the "Dictionary Object" (Scripting.Dictionary), which is available because "Windows Script Host" is installed on my PC; AutoIt lacks an easy built-in equivalent.

My strategy is to use the combination [$size & $md5] as the dictionary's keys.

As the dictionary's items, I use the full path (which, of course, includes the name of each file).

I tried to use the dictionary's "Exists" method to check whether a key is already present, since a dictionary does not allow two equal keys. If a true duplicate file exists, its key (the combination [$size & $md5]) would equal a key that is already stored. This is how I intend to catch the true duplicate files.
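To make the intended behavior concrete, here is a minimal sketch of how "Exists" and "Add" interact on a Scripting.Dictionary. The keys and paths are made-up placeholders (not real sizes or hashes), and the variable names are only illustrative:

```autoit
; Minimal sketch: how Scripting.Dictionary reports a key it has already seen.
; The keys and paths below are hypothetical placeholders, not real hashes.
Local $oDict = ObjCreate("Scripting.Dictionary")
If Not IsObj($oDict) Then Exit 1 ; the COM object could not be created

Local $aKeys[3] = ["1024-abc", "2048-def", "1024-abc"] ; third key repeats the first
Local $aPaths[3] = ["C:\a.txt", "C:\b.txt", "C:\copy of a.txt"]

For $i = 0 To UBound($aKeys) - 1
    If $oDict.Exists($aKeys[$i]) Then
        ; Key was stored earlier: Item() returns the path that was added first
        ConsoleWrite($aPaths[$i] & " duplicates " & $oDict.Item($aKeys[$i]) & @CRLF)
    Else
        $oDict.Add($aKeys[$i], $aPaths[$i])
    EndIf
Next
```

Note that "Exists" only reports whether a key is present; it is "Add" that raises a COM error if the key is already in the dictionary, which is why the check comes first.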

In theory, I think this is correct; in practice, however, I am doing something wrong, because my script never catches true duplicate files (and I deliberately copied the same file many times, just to test/debug my script).

Here it is:

#Include <File.au3>
#Include <Array.au3>

$path = "L:\mais eBooks\0000\TESTES" 
$files_list_array = _FileListToArray($path, "*.*")

For $n = 1 To $files_list_array[0]
    
    $file_size_bytes = FileGetSize($path & "\" & $files_list_array[$n])

    $md5_object = ObjCreate("XStandard.MD5") ; "ActiveX=COM Component to easily get MD5 Hash/CheckSum
    ; It is FreeWare!!!... If interested, look at:
    ; http://www.xstandard.com/en/documentation/xmd5/

    $md5_hash = $md5_object.GetCheckSumFromFile ($path & "\" & $files_list_array[$n])

    $dict = ObjCreate("Scripting.Dictionary")
    
    $dict.CompareMode = 1 ; "Text Mode"

    $dict_key = $file_size_bytes & $md5_hash

    $dict_item = $path & "\" & $files_list_array[$n]

    If Not $dict.Exists ($dict_key) Then
        $dict.Add ($dict_key, $dict_item)
        
        ; MsgBox (0, "", $dict_key & Chr(13) & $dict_item)
        
    ElseIf $dict.Exists ($dict_key) Then
        FileWrite(@DesktopDir & "\Dupes.csv", $dict_item & "," & $dict_key & Chr(13))
    EndIf
    
Next

Can you please help me to correct/debug this script?

Thanks.

Regards.


#2 ·  Posted (edited)

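Create your objects for the dictionary and the MD5 function only once, outside of the loop. In your script the dictionary is recreated on every iteration, so it is always empty when Exists is called and no key can ever match. The file size is also unnecessary in the key: two different files may have the same size, but identical files always have the same MD5 hash, and different files practically never do (deliberately crafted collisions aside). Just work with the hash: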

#Include <File.au3>
#Include <Array.au3>

; Create both COM objects once, before the loop
$md5_object = ObjCreate("XStandard.MD5") ; ActiveX MD5 hash/checksum component

$dict = ObjCreate("Scripting.Dictionary")
$dict.CompareMode = 1 ; text mode (case-insensitive key comparison)

$path = "L:\mais eBooks\0000\TESTES"

$files_list_array = _FileListToArray($path, "*.*")

For $n = 1 To $files_list_array[0]
    $sFile = $path & "\" & $files_list_array[$n]
    $dict_key = $md5_object.GetCheckSumFromFile($sFile)
    If $dict.Exists($dict_key) Then
        ; Hash already seen: log this file as a duplicate (@CRLF ends the CSV line)
        FileWrite(@DesktopDir & "\Dupes.csv", $sFile & "," & $dict_key & @CRLF)
    Else
        ; First file with this hash: remember it
        $dict.Add($dict_key, $sFile)
    EndIf
Next

Not tested, of course...

:P
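For anyone who wants a slightly more defensive variant of the loop above, here is an untested sketch. It assumes the same XStandard.MD5 component and folder from this thread; the `$o...`/`$s...` variable names and the extra CSV column holding the first-seen file are my own additions, not part of the original script:

```autoit
#include <File.au3>

; Untested sketch, assuming the XStandard.MD5 component from this thread is registered.
Local $sPath = "L:\mais eBooks\0000\TESTES" ; folder from the original post
Local $sReport = @DesktopDir & "\Dupes.csv"

Local $oMD5 = ObjCreate("XStandard.MD5")
Local $oDict = ObjCreate("Scripting.Dictionary")
If Not IsObj($oMD5) Or Not IsObj($oDict) Then
    MsgBox(16, "Error", "Could not create the MD5 or Dictionary object.")
    Exit 1
EndIf

; Flag 1 = return files only, so sub-folders are never passed to the hash function
Local $aFiles = _FileListToArray($sPath, "*", 1)
If @error Then
    MsgBox(16, "Error", "Folder not found, or no files in: " & $sPath)
    Exit 1
EndIf

For $n = 1 To $aFiles[0]
    Local $sFile = $sPath & "\" & $aFiles[$n]
    Local $sHash = $oMD5.GetCheckSumFromFile($sFile)
    If $oDict.Exists($sHash) Then
        ; Report the duplicate next to the first file that produced the same hash
        FileWrite($sReport, $sFile & "," & $oDict.Item($sHash) & "," & $sHash & @CRLF)
    Else
        $oDict.Add($sHash, $sFile)
    EndIf
Next
```

Each line of Dupes.csv then holds the duplicate, the first file seen with that hash, and the hash itself. Paths that contain commas would need quoting before this is treated as real CSV.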

Edited by PsaltyDS


You are 100% correct!!! Your script works 100% perfectly!!!

Thank you very much!!

Regards.

