cookiemonster

MD5 a folder

15 posts in this topic

I already know how to MD5-check a folder's contents using _Crypt_HashFile, but what I would like to do is MD5 the folder as a whole rather than each file in it. Has anyone got any suggestions to point me in the right direction? Is it even possible?

Example:

I have this folder:

c:\test

with 5 files in it.

At the moment I get the MD5 for all 5 files. I want to MD5 the folder 'test' so I have one MD5 for the folder rather than 5 (one for each file).




You can't MD5 a whole folder. However, you can enumerate the files with _FileListToArray and MD5 each one. After that, you can join all the hash strings into one string. For example:

7d57a619199f55893319bcaf78ddbb94

0d84a154b2c0520b0c9db876b2108e9d

148d2bee8401710e51588117bd5769ff

61f052ffe4903319e89648a0a8881e78

Into:

 

7d57a619199f55893319bcaf78ddbb940d84a154b2c0520b0c9db876b2108e9d148d2bee8401710e51588117bd5769ff61f052ffe4903319e89648a0a8881e78

After that, you compute the MD5 of that string, and you will get:

053e630c05a34677d9c13734c6bad6ab

So you can get a single MD5 hash that depends on the folder's contents.
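
A minimal sketch of the steps above, assuming the standard File.au3 and Crypt.au3 UDFs and an example folder C:\test (the path and variable names are only illustrative). Note that Hex() returns upper-case digits, so the final value will not match a hash computed from lower-case strings, and the result depends on the order in which the files are listed:

```autoit
#include <Crypt.au3>
#include <File.au3>

Local $sDir = "C:\test" ; example folder, adjust as needed

; 1 = files only, True = return full paths; element [0] holds the count
Local $aFiles = _FileListToArray($sDir, "*", 1, True)
If @error Then
    ConsoleWrite("Folder not found or no files in: " & $sDir & @CRLF)
    Exit 1
EndIf

_Crypt_Startup()
Local $sAll = ""
For $i = 1 To $aFiles[0]
    ; Append the hex MD5 of each file to one long string
    $sAll &= Hex(_Crypt_HashFile($aFiles[$i], $CALG_MD5))
Next
; Hash the concatenated string to get a single value for the folder
Local $sFolderHash = Hex(_Crypt_HashData($sAll, $CALG_MD5))
_Crypt_Shutdown()

ConsoleWrite("Combined MD5 for " & $sDir & ": " & $sFolderHash & @CRLF)
```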


Unc3nZureD, there is a problem with that method: if the order of the files changes (a file is renamed, for example), you will get a different string and, of course, a different hash. Well, it all depends on what the OP really needs ;-).

cookiemonster, I've just put this together as an example. It only looks at the MD5 hashes of the files, so it doesn't matter whether a file name changes or a file has been moved to another dir, but again, I don't know whether you want this or not. If you care about the file names, use Unc3nZureD's suggestion or a mix of both ;-).

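#include <File.au3>
#include <Crypt.au3>

Local $sPath = @ScriptDir & "\test\"
Local $aList = _FileListToArray($sPath, "*", 1, True) ; 1 = files only, True = return full paths
;$aList = _FileListToArrayRec($sPath, "*", 1, 1, 0, 2) ;use this instead if you want to be recursive
If @error Then Exit ; folder not found or no files in it

_Crypt_Startup()
Local $iSum = 0, $aTemp, $sumTemp
For $i = 1 To $aList[0]
    ; Replace each path with the hex MD5 of that file
    $aList[$i] = Hex(_Crypt_HashFile($aList[$i], $CALG_MD5))
    ; Split the 32 hex digits into 4-digit chunks and add them up as numbers
    $aTemp = StringRegExp($aList[$i], "(.{4})", 3)
    $sumTemp = 0
    For $j = 0 To UBound($aTemp) - 1
        $sumTemp += Dec($aTemp[$j])
    Next
    ; Addition is order-independent, so renaming or reordering files does not change the total
    $iSum += $sumTemp
Next
; Hash the total as a decimal string so the result does not depend on the variable's internal numeric type
Local $MD5 = _Crypt_HashData(String($iSum), $CALG_MD5)
_Crypt_Shutdown()

ConsoleWrite("MD5 (well, it isn't but...) for directory " & $sPath & " is: " & Hex($MD5) & @CRLF)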

Cheers,

sahsanu


Instead of doing a hash of the whole file, how about reading a small percentage e.g.

PS That actually answers the OP's question as well.




 

"there is a problem with that method, in case the order of the files changes (a file renamed for example)"

You're right.

But his solution seems like a good approach to me, with some changes.

As Unc3nZureD suggests, you can generate an array with all the MD5 checksums and then simply sort the array (so the file names no longer matter).

But wait, what about an empty folder? Should it be part of the checksum calculation? You could convert the folder name with StringToBinary (just an idea).



Well, currently I'm arguing against myself :D Let's look at the following example:

Folder with the following files (sort position in brackets):

- apple      (1)

- banana   (2)

- cake       (3)

renamed:

- apple       -> qwerty            (3)

- banana    -> still banana    (2)

- cake        -> apple              (1)

This way, ordering by name isn't good :) However, I've got an idea. You could order by FILE SIZE. That has a really, really low chance of conflicting (nearly impossible without deliberate manual manipulation).

If the folder is empty, the function could probably return 0 or the MD5 of the folder name.


Unc3nZureD, my suggestion was: you can generate an array with all the MD5 checksums, and then simply sort the array.

I meant that the array will contain only the MD5 of each file/folder-name, so there will not be any problem with sorting (I think)...

What do you think about this?
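
A rough sketch of this sort-the-hashes idea, assuming the standard Array.au3, Crypt.au3 and File.au3 UDFs. The function name _SortedHashOfFolder and the folder C:\test are hypothetical. Because only the file hashes are sorted and combined, renaming a file leaves the result unchanged:

```autoit
#include <Array.au3>
#include <Crypt.au3>
#include <File.au3>

ConsoleWrite("Sorted-hash checksum: " & _SortedHashOfFolder("C:\test") & @CRLF)

; Hypothetical helper: returns a hex MD5 built from the sorted MD5s of the files in $sDir
Func _SortedHashOfFolder($sDir)
    ; 1 = files only, True = return full paths; element [0] holds the count
    Local $aFiles = _FileListToArray($sDir, "*", 1, True)
    If @error Then Return SetError(1, 0, "") ; folder missing or no files

    _Crypt_Startup()
    For $i = 1 To $aFiles[0]
        ; Overwrite each path with the hex MD5 of that file
        $aFiles[$i] = Hex(_Crypt_HashFile($aFiles[$i], $CALG_MD5))
    Next

    ; Sort from index 1 so the count in element [0] stays in place
    _ArraySort($aFiles, 0, 1)

    Local $sAll = ""
    For $i = 1 To $aFiles[0]
        $sAll &= $aFiles[$i]
    Next
    Local $sHash = Hex(_Crypt_HashData($sAll, $CALG_MD5))
    _Crypt_Shutdown()
    Return $sHash
EndFunc
```

An empty folder makes _FileListToArray fail, so this sketch returns an empty string with @error set rather than deciding what an empty folder should hash to.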


Oh, really :D Yeah, that's true. Agree, that's a good idea :)


Good idea, yes and no...

Because if we use the checksum of folder names, what about a renamed folder?

cookiemonster should explain the goal and decide whether or not the file and folder names are important.


well, md5 of a folder could be simply 0 :D


"Good idea, yes and no...

Because if we use the checksum of folder names, what about a renamed folder?

cookiemonster should explain the goal and decide whether or not the file and folder names are important."

Agreed, you'd expect the MD5 to change if a folder or file is renamed.

I'd guess that the OP wanted to check whether any contents had been altered.

List to array, array sort, loop and hash, hash all hashes.
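
One possible reading of that recipe, as a sketch only: list the files recursively in sorted order, hash each one, and feed both the relative path and the file hash into the final hash, so that a rename changes the result. The function name _HashFolderWithNames and the folder C:\test are hypothetical, and the numeric arguments to _FileListToArrayRec are explained in the comments:

```autoit
#include <Crypt.au3>
#include <File.au3>

ConsoleWrite("Name-sensitive checksum: " & _HashFolderWithNames("C:\test") & @CRLF)

; Hypothetical helper: the result changes when a file's content, name or location changes
Func _HashFolderWithNames($sDir)
    $sDir = StringRegExpReplace($sDir, "\\+$", "") ; drop any trailing backslash

    ; "*" = all files, 1 = files only, 1 = recurse into subfolders,
    ; 1 = sort the list, 1 = return paths relative to $sDir
    Local $aFiles = _FileListToArrayRec($sDir, "*", 1, 1, 1, 1)
    If @error Then Return SetError(1, 0, "") ; folder missing or no files

    _Crypt_Startup()
    Local $sAll = ""
    For $i = 1 To $aFiles[0]
        ; One line per file: lower-cased relative path, separator, hex MD5 of the full file
        $sAll &= StringLower($aFiles[$i]) & "|" & _
                Hex(_Crypt_HashFile($sDir & "\" & $aFiles[$i], $CALG_MD5)) & @LF
    Next
    Local $sHash = Hex(_Crypt_HashData($sAll, $CALG_MD5))
    _Crypt_Shutdown()
    Return $sHash
EndFunc
```

Lower-casing the paths is a design choice for case-insensitive Windows file systems; drop StringLower if a change in letter case should count as a rename.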




#12 ·  Posted (edited)

As I found the subject interesting and fun, I tried to create a function, _FolderCheckSum :shifty:

It offers options for recursion, the algorithm to use, and the percentage of each file to read (suggested by guinness in #4).

#include <Array.au3>
#include <Crypt.au3>


Local $checksum = _FolderCheckSum(@DesktopDir, 1, 25, $CALG_MD5)
MsgBox(0, "", "Checksum for the folder " & @DesktopDir & " : " & @CRLF & $checksum)



; #FUNCTION# ====================================================================================================================
; Name ..........: _FolderCheckSum
; Syntax ........: _FolderCheckSum($sDir[, $iRecur = 0[, $iPercent = Default[, $iALG_ID = $CALG_MD5]]])
; Parameters ....: $sDir                - Folder full path.
;                  $iRecur              - [optional] 1 - Search in all subfolders (unlimited recursion)
;                                                    0 - Do not search in subfolders (Default)
;                  $iPercent            - [optional] Percentage of the file size to read for the hash. Default is 25
;                  $iALG_ID             - [optional] Hash ID to use (see Crypt.au3). Default is $CALG_MD5
; Return values .: Success - A hash for the whole folder (built from the sorted hashes of all files)
;                  Failure - -1 and @error = 1 if $sDir does not exist
;                             0 and @error = 2 if no file was found
; ===============================================================================================================================
Func _FolderCheckSum($sDir, $iRecur = 0, $iPercent = Default, $iALG_ID = $CALG_MD5)
    If Not FileExists($sDir) Then Return SetError(1, 0, -1)
    If ($iPercent > 100) Or ($iPercent < 0) Or $iPercent = Default Then $iPercent = 25

    ; $aDirs is a queue of folders still to scan; $aFiles collects one hash per file (slot 0 is unused)
    Local $aDirs[1] = [StringRegExpReplace($sDir, "\\$", "")], $aFiles[1] = [0]
    Local $iCountDir = 0, $iCountFile = 0, $n = 0
    Local $hSearch, $hFile, $sFileName, $sFullPath
    Local $iRead
    Local $sResult = ""

    While 1
        $hSearch = FileFindFirstFile($aDirs[$n] & "\*.*")
        If $hSearch <> -1 Then
            While 1
                $sFileName = FileFindNextFile($hSearch)
                If @error Then ExitLoop

                If @extended Then
                    ; Directory: queue it for a later pass when recursion is requested
                    If $iRecur Then
                        $iCountDir += 1
                        If $iCountDir >= UBound($aDirs) Then ReDim $aDirs[UBound($aDirs) * 2]
                        $aDirs[$iCountDir] = StringRegExpReplace($aDirs[$n], "\\$", "") & "\" & $sFileName
                    EndIf
                Else
                    ; File: hash the first $iPercent percent of its bytes
                    $iCountFile += 1
                    If $iCountFile >= UBound($aFiles) Then ReDim $aFiles[UBound($aFiles) * 2]
                    $sFullPath = $aDirs[$n] & "\" & $sFileName
                    $iRead = Ceiling(($iPercent / 100) * FileGetSize($sFullPath))
                    $hFile = FileOpen($sFullPath, 16) ; 16 = binary mode, so the raw bytes are hashed
                    $aFiles[$iCountFile] = _Crypt_HashData(FileRead($hFile, $iRead), $iALG_ID)
                    FileClose($hFile)
                EndIf

            WEnd
            FileClose($hSearch) ; only close a handle that was actually opened
        EndIf

        If $n = $iCountDir Then ExitLoop
        $n += 1
    WEnd

    If $iCountFile = 0 Then Return SetError(2, 0, 0)

    ; Sort only the filled slots 1..$iCountFile; ReDim doubling can leave empty slots at the end
    _ArraySort($aFiles, 0, 1, $iCountFile)
    For $i = 1 To $iCountFile
        $sResult &= @CRLF & StringReplace($aFiles[$i], "0x", "")
    Next

    $sResult = StringRegExpReplace($sResult, "^\R", "")
    Return _Crypt_HashData($sResult, $iALG_ID)

EndFunc

(I'm happy to show you my own non-recursive function for subfolders) :sweating:

Edited by jguinch


As I found the subject interesting and fun, I tried to create a function, _FolderCheckSum :shifty:

It offers options for recursion, the algorithm to use, and the percentage of each file to read (suggested by guinness in #4).

#include <Array.au3>
#include <crypt.au3>

Local $checksum = _FolderCheckSum(@Desktopdir, 1, 25, $CALG_MD5)
MsgBox(0, "", "Checksum for the folder " & @Desktopdir & " : " & @CRLF & $checksum)



; #FUNCTION# ====================================================================================================================
; Name ..........: _FolderCheckSum
; Syntax ........: _FolderCheckSum($sDir[, $iRecur = 0[, $iPercent = Default[, $iALG_ID = $CALG_MD5]]])
; Parameters ....: $sDir                - Folder full path.
;                  $iRecur              - [optional] 1 - Search in all subfolders (unlimited recursion)
;                                                    0 - Do not search in subfolders (Default)
;                  $iPercent            - [optional] Percentage of the file size to read for the hash. Default is 25
;                  $iALG_ID             - [optional] Hash ID to use (see crypt.au3). Default is $CALG_MD5
; Return values .: A hash of the whole files (from a combination of hash of all files)
; ===============================================================================================================================
Func _FolderCheckSum($sDir, $iRecur = 0, $iPercent = Default, $iALG_ID = $CALG_MD5)
    If NOT FileExists($sDir) Then Return SetError(1, 0, -1)
    If ($iPercent > 100) Or ($iPercent < 0) Or $iPercent = Default Then $iPercent = 25

    Local $aDirs[1] = [ StringRegExpReplace($sDir, "\\$", "") ], $aFiles[1] = [0]
    Local $iCountDir = 0, $iCountFile = 0, $n = 0
    Local $hSearch, $sFileName
    Local $iRead
    Local $sResult

    While 1
        $hSearch = FileFindFirstFile( $aDirs[$n]  & "\*.*"  )
        If $hSearch <> -1 Then

            While 1
                $sFileName = FileFindNextFile($hSearch)
                If @error Then ExitLoop

                If @Extended Then
                    If $iRecur Then
                        $iCountDir += 1
                        If $iCountDir >= UBound($aDirs) Then Redim $aDirs[ UBound($aDirs) * 2]
                        $aDirs[$iCountDir] = StringRegExpReplace($aDirs[$n], "\\$", "") & "\" & $sFileName
                    EndIf
                Else
                    $iCountFile += 1
                    If $iCountFile >= UBound($aFiles) Then Redim $aFiles[ UBound($aFiles) * 2]
                    $iRead = ($iPercent / 100) * FileGetSize($aDirs[$n] & "\" & $sFileName)
                    $aFiles[$iCountFile] =  _Crypt_HashData(FileRead($aDirs[$n] & "\" & $sFileName, $iRead), $iALG_ID)
                EndIf

            WEnd
        Else
            Return SetError(2, 0, -1)
        EndIf

        FileClose($hSearch)

        If $n = $iCountDir Then ExitLoop
        $n += 1

    WEnd

    _ArraySort($aFiles)
    For $i = 1 To $iCountFile
        $sResult &= @CRLF & StringReplace($aFiles[$i], "0x", "")
    Next

    $sResult = StringRegExpReplace($sResult, "^\R", "")
    Return _Crypt_HashData($sResult, $iALG_ID)

EndFunc

(I'm happy to show you my own non-recursive function for subfolders) :sweating:

Hey, so I've been away this weekend, hence no responses. On Friday night I started getting somewhere with doing an MD5 on all the files in the folder, then MD5ing the result. I have just now tried the code I've quoted, but I can't seem to specify the directory to it. I've put:

Local $checksum = _FolderCheckSum("c:\Test", 1, 25, $CALG_MD5)
MsgBox(0, "", "Checksum for the folder " & "c:\Test" & " : " & @CRLF & $checksum)

But this gives me a -1 response


No matter what I change @DesktopDir to, e.g. "c:\newfolder", it still gives me a response of -1.


#15 ·  Posted (edited)

Hey, so I thought I'd post a bit more information on what I'm trying to achieve, seeing as it was requested.

I have four folders:

C:\Folder1, C:\Folder2, C:\Folder3, C:\Folder4

Each folder has a range of subfolders and files.

What I am aiming for is, one folder at a time: get the MD5 for all files, then get the MD5 of that combined string, and check it against the MD5 hard-coded in the script. If they are identical, we move on to the next folder check. If they are not identical, the script calls a function to delete the folder and then downloads the original from a centralized server. Once downloaded, it needs to run the check again before continuing to the next folder check, to ensure it has downloaded all the necessary folders and files.

Does that make sense? Does anyone have any questions for further clarification?

Until recently I only had to check one folder, which had around 200 files, so each file's MD5 was hard-coded into the .au3. But now I have four folders totaling around 1000 files, so I don't really want to carry on the way I was going, as it would add a lot of bulk to the code and is messy.

No matter which way I do it, I need to do 100% checks on the files, so if I get the MD5, I think I need to do a 100% MD5 on the files.
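
The workflow described above could be sketched roughly as follows. Everything here is an assumption for illustration: the folder paths, the placeholder hash strings, the retry limit, and the helper names _FolderHashFull and _RestoreFolder are hypothetical, and the delete-and-download step is left as a stub because the thread does not show how the server download works. Each file is hashed in full with _Crypt_HashFile, which matches the 100% requirement:

```autoit
#include <Crypt.au3>
#include <File.au3>

; Placeholder folders and hashes: replace with real values
Local $aChecks[2][2] = [["C:\Folder1", "<expected hash 1>"], _
                        ["C:\Folder2", "<expected hash 2>"]]
Local $iMaxTries = 3 ; assumed retry limit so a failing download cannot loop forever

For $iFolder = 0 To UBound($aChecks) - 1
    For $iTry = 1 To $iMaxTries
        ; StringCompare is case-insensitive by default, so hex case does not matter
        If StringCompare(_FolderHashFull($aChecks[$iFolder][0]), $aChecks[$iFolder][1]) = 0 Then
            ConsoleWrite($aChecks[$iFolder][0] & ": OK" & @CRLF)
            ExitLoop
        EndIf
        ConsoleWrite($aChecks[$iFolder][0] & ": mismatch (attempt " & $iTry & ")" & @CRLF)
        If Not _RestoreFolder($aChecks[$iFolder][0]) Then ExitLoop
    Next
Next

; Hypothetical helper: hex MD5 over the full-file MD5s of every file under $sDir
Func _FolderHashFull($sDir)
    $sDir = StringRegExpReplace($sDir, "\\+$", "") ; drop any trailing backslash
    ; 1 = files only, 1 = recurse, 1 = sort the list, 1 = relative paths
    Local $aFiles = _FileListToArrayRec($sDir, "*", 1, 1, 1, 1)
    If @error Then Return SetError(1, 0, "")

    _Crypt_Startup()
    Local $sAll = ""
    For $i = 1 To $aFiles[0]
        $sAll &= Hex(_Crypt_HashFile($sDir & "\" & $aFiles[$i], $CALG_MD5))
    Next
    Local $sHash = Hex(_Crypt_HashData($sAll, $CALG_MD5))
    _Crypt_Shutdown()
    Return $sHash
EndFunc

; Hypothetical stub: the real version would delete $sDir and download a fresh copy
Func _RestoreFolder($sDir)
    ConsoleWrite("Would delete and re-download: " & $sDir & @CRLF)
    Return False ; report failure until the real download is implemented
EndFunc
```

With this layout the script carries one hard-coded hash per folder instead of one per file, and a folder is only accepted once its recomputed hash matches.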

Edited by cookiemonster

