Script to recursively get the most frequent file extension


Inside a main folder (aka directory) I have many subfolders, which in turn contain sub-subfolders and files. For each top-level subfolder (depth = 1) I want to know which file extension is the most frequent, counting everything below it.

Example:

{
C:\Main\Subfolder1\Sub-Subfolder1\01.pdf
C:\Main\Subfolder1\Sub-Subfolder2\02.pdf
C:\Main\Subfolder1\03.txt
}

The most frequent file extension inside Subfolder1 is pdf.

How can I write a script that recursively gets the most frequent file extension?

Thanks.

Regards.

MLMK - my blogging craziness...

Here's a glimpse of what you could do if you have one dir:

#include <file.au3>
#include <array.au3>
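Dim $szDrive, $szDir, $szFName, $szExt
Dim $Ini=@ScriptDir&'\Test.ini'
FileDelete($Ini) ; start from empty counts so repeated runs do not accumulate

; Count the extensions of all files in the script directory
; (search @ScriptDir explicitly, since the working directory may differ).
$search = FileFindFirstFile(@ScriptDir & "\*.*")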

; Check if the search was successful
If $search = -1 Then
    MsgBox(0, "Error", "No files/directories matched the search pattern")
    Exit
EndIf

While 1
    $file = FileFindNextFile($search)
    If @error Then ExitLoop
    If StringInStr(FileGetAttrib (@ScriptDir&'\'&$file), 'D') Then ContinueLoop
   $Splited = _PathSplit(@ScriptDir&'\'&$file, $szDrive, $szDir, $szFName, $szExt)
   If $szExt='' Then ContinueLoop
   $Count=IniRead($Ini, 'Extensions', $szExt, 0)
   IniWrite($Ini, 'Extensions', $szExt, $Count+1)
WEnd

; Close the search handle
FileClose($search)

$Section=IniReadSection($Ini, 'Extensions')

Now do an _ArraySort on the numbers and check what's the corresponding extension...

But IniRead/Write is kinda ugly and you may use arrays instead. ;)
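As a rough sketch of the array idea (not tested against the script above), the counts could live in a 2D array instead of an ini file. The helper names _CountExt and _MostFrequent are made up for this example and are not part of any UDF; row 0 is assumed to hold the number of distinct extensions.

```autoit
; [0][0] = number of distinct extensions; rows 1..n = extension, count
Global $aExt[1][2] = [[0, ""]]

_CountExt($aExt, ".pdf")
_CountExt($aExt, ".PDF")
_CountExt($aExt, ".txt")
ConsoleWrite(_MostFrequent($aExt) & @LF) ; prints .pdf

; Hypothetical helper: add one occurrence of $sExt to the table
Func _CountExt(ByRef $aTable, $sExt)
    $sExt = StringLower($sExt)
    For $i = 1 To $aTable[0][0]
        If $aTable[$i][0] = $sExt Then
            $aTable[$i][1] += 1
            Return
        EndIf
    Next
    ; Not seen yet: append a new row with a count of 1
    $aTable[0][0] += 1
    ReDim $aTable[$aTable[0][0] + 1][2]
    $aTable[$aTable[0][0]][0] = $sExt
    $aTable[$aTable[0][0]][1] = 1
EndFunc

; Hypothetical helper: return the extension with the highest count ("" if none)
Func _MostFrequent(ByRef $aTable)
    Local $iBest = 0
    For $i = 1 To $aTable[0][0]
        If $iBest = 0 Or $aTable[$i][1] > $aTable[$iBest][1] Then $iBest = $i
    Next
    If $iBest = 0 Then Return ""
    Return $aTable[$iBest][0]
EndFunc
```

A single pass for the maximum avoids sorting altogether; on a tie this sketch simply keeps the extension it saw first.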

You should search for a UDF that returns all files in an array, recursively (SmOke_N wrote a nice one), because the script above only covers one directory and not the whole tree you asked for.

Happy coding. :P

Edited by dabus

I have a friend that knows a lot about VBScript, although he knows nothing about AutoIt :P

He made this VBS:

CODE
'================
' Count_Most_Frequent_File_Extensions_In_Each_Subfolder_Of_Specific_Folder_With_Recursion.vbs

Const FOLDER_PATH = "C:\Temp"
Const OUTPUT_FILE = "FileExtensionCount.csv"

Set objFileSystem = CreateObject("Scripting.FileSystemObject")
'Set objFolder = objFileSystem.GetFolder(FOLDER_PATH)

strResults = """Folder"",""Most Frequent"",""Amount"""

For Each objFirstSubFolder In objFileSystem.GetFolder(FOLDER_PATH).SubFolders
    Set objFileTypes = CreateObject("Scripting.Dictionary")

    For Each objFile in objFirstSubFolder.Files
        strExtension = LCase(objFileSystem.GetExtensionName(objFile.Path))
        If objFileTypes.Exists(strExtension) Then
            objFileTypes(strExtension) = objFileTypes(strExtension) + 1
        Else
            objFileTypes.Add strExtension, 1
        End If
    Next

    Recurse_SubFolder objFirstSubFolder

    intMostFrequent = 0
    strMostFrequent = ""
    For Each strExtension in objFileTypes
        If objFileTypes(strExtension) > intMostFrequent Then
            intMostFrequent = objFileTypes(strExtension)
            strMostFrequent = strExtension
        ElseIf objFileTypes(strExtension) = intMostFrequent Then
            intMostFrequent = objFileTypes(strExtension)
            If strMostFrequent = "" Then
                strMostFrequent = strExtension
            Else
                strMostFrequent = strMostFrequent & " and " & strExtension
            End If
        End If
    Next

    strResults = strResults & VbCrLf & """" & objFirstSubFolder.Path & """,""" & strMostFrequent & """,""" & intMostFrequent & """"
    Set objFileTypes = Nothing
Next

Set objOutputFile = objFileSystem.CreateTextFile(OUTPUT_FILE, True)
objOutputFile.Write strResults
objOutputFile.Close

Set objOutputFile = Nothing
Set objFolder = Nothing
Set objFileSystem = Nothing

MsgBox "Done. Please see " & OUTPUT_FILE

' RECURSE SUB ROUTINE
Sub Recurse_SubFolder(objSubFolder)
    For Each objFolder In objFileSystem.GetFolder(objSubFolder.Path).SubFolders
        'Set objFileTypes = CreateObject("Scripting.Dictionary")
        For Each objFile in objFolder.Files
            strExtension = LCase(objFileSystem.GetExtensionName(objFile.Path))
            If objFileTypes.Exists(strExtension) Then
                objFileTypes(strExtension) = objFileTypes(strExtension) + 1
            Else
                objFileTypes.Add strExtension, 1
            End If
        Next
        Recurse_SubFolder objFolder
    Next
End Sub
'================

Can anyone help me translate it from VBS to AU3?

Thanks!

Regards ;)


Hi,

That may be a better approach, and it would not be too difficult.

You could also let me know whether my script achieves your goal (I use a modified version of it in "SearchMiner").

Best, randall (see my sig for the _FileListToArrayNew UDF too)

; _MaxExtensionRegExp3.au3
#include <file.au3>
#include<_FileListToArrayNew2h.au3>; _FileListToArray3($sPath, $sFilter = "*", $iFlag = 0, $iRecurse = 0, $iBaseDir = 1, $sExclude = "")
$s_FileExt = @ScriptDir & "\FindLines.txt" 
local $ar_Array = _FileListToArray3 (@ScriptDir, "*", 1, 1),$c= _ArrayDisplay($ar_Array, "File  List")
FileDelete($s_FileExt)
FileWrite($s_FileExt,_ArrayToString($ar_Array,@CRLF))
$s_Extnlocal = _MaxExtensionRegExp3($s_FileExt)
ConsoleWrite("$s_Extnlocal=" & $s_Extnlocal & @LF)
Func _MaxExtensionRegExp3($s_file)
    Local $j, $s_AnswerFile4, $k, $i_TotalFound = 0;,$ar_Icons[1],$szIconFile, $nIcon=0
    $s_ExtString = "au3" 
    $i_TotalLines = _FileCountLines ($s_file)
    $s_AnswerFile3 = @ScriptDir & "\SearchMiner" & "\AnswerFindLines3.txt" 
    FileCopy($s_file, $s_AnswerFile3, 9)
    $s_AnswerFile4 = @ScriptDir & "\SearchMiner" & "\Answer.txt" 
    $s_ExtMaxName = $s_ExtString
    $s_ExtStrList = $s_ExtString
    $s_ExtMaxNum = 0
    $i_Find = 0
    Local $ar_AnsReg = _FindExtRegExp($s_file, StringRight($s_ExtString, 3))
    $i_Find = UBound($ar_AnsReg) - 1
    If $i_Find > $s_ExtMaxNum Then
        $s_ExtMaxNum = $i_Find
        $s_ExtMaxName = $s_ExtString
    EndIf
    $i_TotalFound += $i_Find
    $h_file = FileOpen($s_AnswerFile3, 0)
    While ($s_ExtMaxNum < ($i_TotalLines - $i_TotalFound))
        $s_Line = FileReadLine($h_file)
        If @error Then ExitLoop
        $s_FileExt = "." & StringRight($s_Line, 3)
        If Not StringInStr($s_ExtStrList, $s_FileExt) Then
            $s_ExtStrList &= "|" & $s_FileExt
            $s_ExtString = $s_FileExt
            $i_Find = 0
            Local $ar_AnsReg = _FindExtRegExp($s_file, StringRight($s_ExtString, 3))
            $i_Find = UBound($ar_AnsReg) - 1
            If $i_Find > $s_ExtMaxNum Then
                $s_ExtMaxNum = $i_Find
                $s_ExtMaxName = $s_ExtString
            EndIf
            $i_TotalFound += $i_Find
        EndIf
    WEnd
    $s_ExtMaxName = StringRight($s_ExtMaxName, 3)
    ConsoleWrite("Max;" & $s_ExtMaxName & @LF)
    FileClose($h_file)
    If StringInStr("exe|ico|cur|bmp|", $s_ExtMaxName) Then $s_ExtMaxName = "fld" 
;~  If StringInStr("exe|ico|cur|dll|bmp|", $s_ExtMaxName) Then $s_ExtMaxName = "fld"
    Return $s_ExtMaxName
EndFunc   ;==>_MaxExtensionRegExp3
Func _FindExtRegExp(ByRef $s_file, $s_Searches, $i_Case = 0)
    ; Returns an array of the lines in $s_file that end in ".<$s_Searches>" (case-insensitive)
    If Not FileExists($s_file) Then Return SetError(1, 0, 0)
    Local $h_file = FileOpen($s_file, 0), $s_FileRead = FileRead($h_file), $a = FileClose($h_file)
    $s_Searches = StringReplace($s_Searches, ".", "\.")
    ; Anchor the extension at the end of the line so the same letters elsewhere in a path are not counted
    Local $pattern = '(?m)(?i)^(.*\.' & $s_Searches & ')\r?$' ;, $a = ConsoleWrite("_FindExtRegExp $pattern=" & $pattern & @LF)
    Local $asList = StringRegExp($s_FileRead, $pattern, 3)
    Return $asList
EndFunc   ;==>_FindExtRegExp

PS

Edited by randallc

Here is my alternative version (recursive):

$extensionsObj = ObjCreate("Scripting.Dictionary")
recursiveExtensionStats("D:\temp")

;Combine file extension and count into string
$string = ""
For $key In $extensionsObj.Keys()
    $string &= $key & " " & $extensionsObj.Item($key) & @CRLF
Next

MsgBox(0,"File Extension Occurrences",$string)

Func recursiveExtensionStats($startDir)
    Local $search = FileFindFirstFile($startDir & "\*.*")
    If $search = -1 Then Return
    
    ;Search through all files and folders in directory
    While 1
        Local $next = FileFindNextFile($search)
        If @error Then ExitLoop ;ExitLoop (not Return) so the handle is closed below
        
        ;If folder, recurse (the attribute string may hold other flags too, e.g. "RD")
        If StringInStr(FileGetAttrib($startDir & "\" & $next), "D") Then
            recursiveExtensionStats($startDir & "\" & $next)
        Else    
            Local $tempSplit = StringSplit($next,".")
            If $tempSplit[0] < 2 Then ContinueLoop ;no dot in the name, so no extension to count
            Local $ext = StringLower($tempSplit[$tempSplit[0]])
            
            ;If the extension already exists in the dictionary, increment
            If $extensionsObj.Exists($ext) Then
                $extensionsObj.Item($ext) = $extensionsObj.Item($ext) + 1
            Else
                $extensionsObj.Add($ext, 1)
            EndIf
        EndIf
    WEnd
    FileClose($search)
EndFunc

EDIT: Added FileClose()

EDIT: $extensionsObj.Add($ext, 0) changed to $extensionsObj.Add($ext, 1)
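The original question asked for the winner per top-level subfolder rather than one total, so here is a rough sketch of how the same idea might be adapted: one dictionary per depth-1 folder, then a pass over its keys for the highest count. The names _CountExtensions and _MostFrequentExt are invented for this example, and "C:\Main" is just the path from the question. Treat it as a starting point rather than a tested solution.

```autoit
Local $sRoot = "C:\Main" ; example path from the question
Local $hRoot = FileFindFirstFile($sRoot & "\*.*")
If $hRoot = -1 Then Exit

While 1
    Local $sEntry = FileFindNextFile($hRoot)
    If @error Then ExitLoop
    ; Only depth-1 folders get their own tally
    If Not StringInStr(FileGetAttrib($sRoot & "\" & $sEntry), "D") Then ContinueLoop
    Local $oCounts = ObjCreate("Scripting.Dictionary")
    _CountExtensions($sRoot & "\" & $sEntry, $oCounts)
    ConsoleWrite($sEntry & ": " & _MostFrequentExt($oCounts) & @LF)
WEnd
FileClose($hRoot)

; Hypothetical helper: recursively tally extensions below $sDir into $oCounts
Func _CountExtensions($sDir, $oCounts)
    Local $hSearch = FileFindFirstFile($sDir & "\*.*")
    If $hSearch = -1 Then Return
    While 1
        Local $sName = FileFindNextFile($hSearch)
        If @error Then ExitLoop
        Local $sPath = $sDir & "\" & $sName
        If StringInStr(FileGetAttrib($sPath), "D") Then
            _CountExtensions($sPath, $oCounts)
        Else
            Local $iDot = StringInStr($sName, ".", 0, -1) ; last dot in the name
            If $iDot = 0 Then ContinueLoop ; no extension
            Local $sExt = StringLower(StringMid($sName, $iDot + 1))
            If $oCounts.Exists($sExt) Then
                $oCounts.Item($sExt) = $oCounts.Item($sExt) + 1
            Else
                $oCounts.Add($sExt, 1)
            EndIf
        EndIf
    WEnd
    FileClose($hSearch)
EndFunc

; Hypothetical helper: return the key with the highest count ("" if the dictionary is empty)
Func _MostFrequentExt($oCounts)
    If $oCounts.Count = 0 Then Return ""
    Local $sBest = "", $iBest = 0
    For $sKey In $oCounts.Keys()
        If $oCounts.Item($sKey) > $iBest Then
            $iBest = $oCounts.Item($sKey)
            $sBest = $sKey
        EndIf
    Next
    Return $sBest
EndFunc
```

For the example tree in the first post this should report pdf for Subfolder1. Ties are not handled; the first extension to reach the top count wins.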

Edited by weaponx

Good solution -- I would use $next.Type to get the extension instead of using StringSplit

Dale

Free Internet Tools: DebugBar, AutoIt IE Builder, HTTP UDF, MODIV2, IE Developer Toolbar, IEDocMon, Fiddler, HTML Validator, WGet, curl

MSDN docs: InternetExplorer Object, Document Object, Overviews and Tutorials, DHTML Objects, DHTML Events, WinHttpRequest, XmlHttpRequest, Cross-Frame Scripting, Office object model

Automate input type=file (Related)

Alternative to _IECreateEmbedded? better: _IECreatePseudoEmbedded  Better Better?

IE.au3 issues with Vista - Workarounds

SciTe Debug mode - it's magic: #AutoIt3Wrapper_run_debug_mode=Y

Doesn't work needs to be ripped out of the troubleshooting lexicon. It means that what you tried did not produce the results you expected. It begs the questions 1) what did you try?, 2) what did you expect? and 3) what happened instead?

Reproducer: a small (the smallest?) piece of stand-alone code that demonstrates your trouble


Maybe I'm missing something, Dale: $next is returned from FileFindNextFile(), which is just a string. I'm using 3.2.8.1; is this an object in the beta?

Nope, you didn't miss it -- I did. I was looking at your code and thinking what I would have written. Also, it is not .Type (which returns Text Document instead of TXT).

Your code was great -- I should have stayed out of it :P

Here is the logic for recursively getting the file extensions using the FileSystemObject. It does not include your Scripting.Dictionary code...

Global $fso = ObjCreate("Scripting.FileSystemObject")
$oRootFolder = $fso.GetFolder("D:\Temp")
_ParseFiles($oRootFolder)

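Func _ParseFiles($oFolder)
    ; Recurse into subfolders first; a separate loop variable keeps $oFolder intact
    Local $oFolders = $oFolder.SubFolders
    If $oFolders.Count Then
        For $oSubFolder In $oFolders
            _ParseFiles($oSubFolder)
        Next
    EndIf
    ; Then print the extension of each file directly inside this folder
    Local $oFiles = $oFolder.Files
    For $oFile In $oFiles
        ConsoleWrite($fso.GetExtensionName($oFile.Name) & @LF)
    Next
EndFunc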
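For what it's worth, the FileSystemObject traversal and the Scripting.Dictionary counting could probably be combined along these lines. This is only a sketch: _TallyFolder is a made-up name, and "D:\Temp" is the same example path used above.

```autoit
Global $oFSO = ObjCreate("Scripting.FileSystemObject")
Global $oCounts = ObjCreate("Scripting.Dictionary")

If Not $oFSO.FolderExists("D:\Temp") Then Exit
_TallyFolder($oFSO.GetFolder("D:\Temp"))

; Print each extension with its count
If $oCounts.Count Then
    For $sExt In $oCounts.Keys()
        ConsoleWrite($sExt & " " & $oCounts.Item($sExt) & @LF)
    Next
EndIf

; Hypothetical helper: count the extensions in $oFolder and everything below it
Func _TallyFolder($oFolder)
    For $oFile In $oFolder.Files
        Local $sExt = StringLower($oFSO.GetExtensionName($oFile.Name))
        If $oCounts.Exists($sExt) Then
            $oCounts.Item($sExt) = $oCounts.Item($sExt) + 1
        Else
            $oCounts.Add($sExt, 1)
        EndIf
    Next
    For $oSub In $oFolder.SubFolders
        _TallyFolder($oSub)
    Next
EndFunc
```

GetExtensionName returns an empty string for files without an extension, so in this sketch those end up under an empty key, as they do in the VBS version.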

Dale


I fixed the first version I posted (counts were off by 1).

And here is a graphical version:

#include <GUIConstants.au3>

GUICreate("File Extension Occurrences", 200)  ; will create a dialog box that when displayed is centered
GUISetState (@SW_SHOW)       ; will display an empty dialog box

$extensionsObj = ObjCreate("Scripting.Dictionary")
recursiveExtensionStats("D:\temp")

$listview = GUICtrlCreateListView ("Extension | Occurrences",10,10,180,380,$LVS_SORTASCENDING)
For $key In $extensionsObj.Keys()
    GUICtrlCreateListViewItem($key & "|" & $extensionsObj.Item($key),$listview)
Next

; Run the GUI until the dialog is closed
While 1
    $msg = GUIGetMsg()
    If $msg = $GUI_EVENT_CLOSE Then ExitLoop
Wend

Func recursiveExtensionStats($startDir)
    Local $search = FileFindFirstFile($startDir & "\*.*")
    If $search = -1 Then Return
    
    ;Search through all files and folders in directory
    While 1
        Local $next = FileFindNextFile($search)
        If @error Then ExitLoop ;ExitLoop (not Return) so the handle is closed below
        
        ;If folder, recurse (the attribute string may hold other flags too, e.g. "RD")
        If StringInStr(FileGetAttrib($startDir & "\" & $next), "D") Then
            recursiveExtensionStats($startDir & "\" & $next)
        Else    
            Local $tempSplit = StringSplit($next,".")
            If $tempSplit[0] < 2 Then ContinueLoop ;no dot in the name, so no extension to count
            Local $ext = StringLower($tempSplit[$tempSplit[0]])
            
            ;If the extension already exists in the dictionary, increment
            If $extensionsObj.Exists($ext) Then
                $extensionsObj.Item($ext) = $extensionsObj.Item($ext) + 1
            Else
                $extensionsObj.Add($ext, 1)
            EndIf
        EndIf
    WEnd
    FileClose($search)
EndFunc

NOTE: I tested this on a drive containing 200GB of mp3s and whatnot and it took about 45 seconds.

Edited by weaponx
