SlackerAl Posted August 22, 2019

Hi All,

Overview

I have a network shared file system of reasonable size (25 TB) which is mounted as a mapped NTFS drive on the PCs of about 30 users. The storage is forever full due to a variety of poor practices beyond my control. One of the main problems is that the users have their data distributed over a wide and deep directory structure, and they struggle to find their old data to archive. The ownership of the files within the directory structure is mixed.

I wanted to write a tool which would help individual users find their data-heavy directories, preferably without thrashing the file system to death. I've written a tree viewer that works well as long as I do not need to restrict the summary to a specific owner name (I employ some filters / selective starting positions to restrict the search range within the directory structure).

Problem

I need to code up something to replace DirGetSize that returns the size for a specific username (files are owned by domain users, e.g. EUROPE\slacker) and cascades through the subdirectories with the same username requirement. I'm trying to avoid an ugly looping function of unknown size that checks each file, etc. I was wondering if there are any better approaches / suggestions, e.g. calls to an existing API?

Thanks
Al

Problem solving step 1: Write a simple, self-contained, running, replicator of your problem.

water Posted August 22, 2019

Why not use something like TreeSize?

SlackerAl Posted August 22, 2019

TreeSize needs a commercial license for each user to scan network drives. It requires admin privileges to install (not always available). It also automatically starts scanning at the drive root (the level of load this causes with multiple users on the large drive is controversial).

Nine Posted August 22, 2019

I would suggest this basic recursive algorithm:

```autoit
#include <Constants.au3>

Opt("MustDeclareVars", 1)

Global $oShellApplication = ObjCreate("Shell.Application")

MsgBox($MB_SYSTEMMODAL, "", GetOwnerSize("C:\Apps\AutoIt", "EUROPE\slacker"))

; Returns the total size in bytes of all files under $sFolder owned by $sOwner.
Func GetOwnerSize($sFolder, $sOwner)
    ;ConsoleWrite("Folder Name = " & $sFolder & @CRLF)
    Local $oShellFolder = $oShellApplication.NameSpace($sFolder)
    If Not IsObj($oShellFolder) Then Return 0 ; folder missing or not accessible
    Local $oShellFolderItems = $oShellFolder.Items()
    Local $nBytes = 0

    ; 0x40 = SHCONTF_NONFOLDERS: enumerate files only
    $oShellFolderItems.Filter(0x40, "*")
    ;ConsoleWrite("File count = " & $oShellFolderItems.count & @CRLF)
    For $oShellFolderItem In $oShellFolderItems
        ; Detail column 10 is "Owner" here; the index can vary between Windows versions
        If $oShellFolder.GetDetailsOf($oShellFolderItem, 10) <> $sOwner Then ContinueLoop
        ; Use .Size (bytes); GetDetailsOf(..., 1) returns a display string such as "12 KB"
        $nBytes += $oShellFolderItem.Size
    Next

    ; 0x20 = SHCONTF_FOLDERS: enumerate subfolders only, then recurse into each
    $oShellFolderItems.Filter(0x20, "*")
    ;ConsoleWrite("Folder count = " & $oShellFolderItems.count & @CRLF)
    For $oShellFolderItem In $oShellFolderItems
        $nBytes += GetOwnerSize($sFolder & "\" & $oShellFolderItem.Name, $sOwner)
    Next
    Return $nBytes
EndFunc   ;==>GetOwnerSize
```

SlackerAl Posted August 22, 2019

Thanks for that, I've now got something working.

JLogan3o13 (Moderator) Posted August 22, 2019

The issue is always going to be speed. You have to, in essence, grab every file/folder object and look at the ACLs to determine the owner and size, keeping a running total. You could use BrewManNH's excellent _FileGetProperty UDF; just change the _FileListToArray call to _FileListToArrayRec. There is also an older function called _GetExtProperty that still works pretty well:

```autoit
; Requires the _GetExtProperty UDF to be included.
$aProps = _GetExtProperty(FileOpenDialog("Choose File", @UserProfileDir, "ALL (*.*)"), -1)
If IsArray($aProps) Then ConsoleWrite($aProps[1] & ", " & $aProps[10] & @CRLF)
```

In either case, however, I think you're going to run into a speed issue if you're parsing a large number of files. Running _GetExtProperty against a directory with ~45,000 files and pulling only the files that match a specific user took more than 10 minutes. You may have to resort to PowerShell; the same query took only 3 minutes.

Edit: Dang page refresh. Glad you found a solution.

Edited August 22, 2019 by JLogan3o13
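
For anyone who wants to try the recursive-listing route without an extra UDF, here is a minimal sketch of that idea. It is not the _FileGetProperty approach itself: it swaps in `_FileListToArrayRec` from the standard File.au3 for the listing and the Shell `GetDetailsOf` call for the owner lookup. The function name `_OwnerSizeRec` and the `$iOwnerCol` parameter are hypothetical, and the sketch assumes detail column 10 is "Owner", which may not hold on every Windows version. The speed caveat still applies, since every file is visited once.

```autoit
#include <File.au3>

; Hypothetical helper, not part of any UDF mentioned in this thread.
; Sums the size in bytes of every file under $sRoot whose Shell "Owner"
; detail equals $sOwner. Assumes the owner is detail column 10.
Func _OwnerSizeRec($sRoot, $sOwner, $iOwnerCol = 10)
    ; Files only, recurse into subfolders, no sorting, full paths returned
    Local $aFiles = _FileListToArrayRec($sRoot, "*", $FLTAR_FILES, $FLTAR_RECUR, $FLTAR_NOSORT, $FLTAR_FULLPATH)
    If @error Then Return SetError(1, 0, 0) ; invalid path or no files found

    Local $oShell = ObjCreate("Shell.Application")
    If Not IsObj($oShell) Then Return SetError(2, 0, 0)

    Local $oFolder = 0, $oItem, $sDir, $sLastDir = "", $iPos, $nBytes = 0
    For $i = 1 To $aFiles[0]
        ; Split the full path into parent directory and file name
        $iPos = StringInStr($aFiles[$i], "\", 0, -1)
        $sDir = StringLeft($aFiles[$i], $iPos - 1)
        If StringLen($sDir) = 2 Then $sDir &= "\" ; keep "X:\" for files in a drive root

        ; Only ask the Shell for a new folder object when the directory changes
        If $sDir <> $sLastDir Then
            $oFolder = $oShell.NameSpace($sDir)
            $sLastDir = $sDir
        EndIf
        If Not IsObj($oFolder) Then ContinueLoop

        $oItem = $oFolder.ParseName(StringMid($aFiles[$i], $iPos + 1))
        If Not IsObj($oItem) Then ContinueLoop

        ; "=" compares strings case-insensitively in AutoIt
        If $oFolder.GetDetailsOf($oItem, $iOwnerCol) = $sOwner Then $nBytes += FileGetSize($aFiles[$i])
    Next
    Return $nBytes
EndFunc   ;==>_OwnerSizeRec
```

A call such as `ConsoleWrite(_OwnerSizeRec("C:\Apps\AutoIt", "EUROPE\slacker") & @CRLF)` would print the total in bytes. Pointing it at a subdirectory rather than the drive root keeps the scan, and the load on the file server, bounded.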

SlackerAl Posted August 22, 2019

Thanks JLogan3o13. I did indeed use _FileGetProperty. I took advantage of the fact that below a certain directory level ownership stops swapping between users, so by forcing users to check one chunk of the disk at a time, performance was OK.

Neutro Posted August 22, 2019

This is typically the kind of problem you get when a company uses a file server as a file sharing platform across all users with no clear plan of action or management. This results in files and folders being spread everywhere with no logic, no maintenance and horrible permissions everywhere.

To solve it, you can either enforce strict and specific rules on how to use the file server and correct the existing file system to match, which can be a lot of work, or properly set up a new file server from the ground up (either a traditional file server or something more sophisticated like Nextcloud) and then warn the users that they have until date x to move their files to the new server, following the new server's rules.

Given your situation, the second choice would probably be better if it is possible. Otherwise, good luck.

SlackerAl Posted August 23, 2019

Hi Neutro,

Fair comments for many situations. Here there is a clear use plan, generally with quite a good structure - there is some trade-off between freedom of working methods (for various complex problems) and a completely rigid structure. There are two main causes of the problem. The first is that there is no quota system, as this is unwanted by those ultimately in charge (there is an expectation of self-management, which works for 90% of the users). The second is a lack of supervision of live project spaces, because those running the projects want to spend their time on other, more productive, things.

The file system is the transient data store for live HPC projects. The cluster is able to rapidly generate large volumes of data, so a cloud solution is not ideal. Best of all - I'm a user, not an administrator 🙂

Now that my fellow users and I can find our occasional chunks of forgotten data, we are back to a working space.

Neutro Posted August 23, 2019

Hey,

What you are dealing with right now is something that the people managing your IT system should have anticipated and dealt with before it became a problem for you. Sorry to be a bit blunt, but even if they are nice people to you, they're not doing their job right.

It's OK to have no quota system, but the admins should have a server monitoring interface that alerts them when there is a space problem, and they should be able to see immediately where it is coming from without having to re-scan the whole server.

Also, the volume of data available has nothing to do with the software running it. This is linked to the hardware layer, not the software. "Cloud solution" just means data hosted remotely, which is technically what you already have right now. But if you had a Nextcloud server to manage your data instead of a simple file server, the file management would be more granular and you probably wouldn't have to deal with your data problems.

A cloud solution can be hosted on the LAN only; it doesn't have to be available through the internet as well. Internet access is more convenient, but it requires a very fast symmetrical internet line, which is not always possible to get.

Edited August 23, 2019 by Neutro