james3mg

Detecting file changes in a directory


james3mg

Hello all...

I've been trying to come up with a way to have my script check for file changes to a directory without the time-consuming task of actually READING the entire directory (it's a network share). Please bear with me as I prove I've done some homework :)

I've tried the methods documented in these two threads: (ptrex's method) (arker's method), but both of these require the change to happen while the script is running to grab it.

I considered using some kind of hash string to determine if any files had changed, but that would still require reading all the files.

I'd hoped that robocopy might be able to do this, comparing the files against a log of some kind...but it doesn't seem like it's capable of this.

Finally, this morning, I had an inspiration...but I don't know how to get into it. I'm hoping someone can point me in the right direction. I remembered that DFS reads the NTFS Change Journal to determine if any files have been changed, even if the server was powered off when the change happened. That way, the servers remain in sync.

So I started researching a bit on technet and found that FSUTIL gives basic access to the journal. Running "fsutil fsinfo statistics c:" gives me lots of fun information. Most notably, if I run it again after creating/changing/deleting a file on my desktop, I find that the MftWrites number has increased. So it seemed at first like I was well on my way. I tried running it on a share, no luck (error: FSUTIL utility requires a local NTFS volume). OK, maybe it'll be something that runs on the server instead. But I don't want to check the whole drive, just the shared directory. I tried running it on a directory instead of the whole C:...no surprise that doesn't work, since the log is drive-wide. Some of the fsutil functions mention that they include subst-ed directories, so I tried subst-ing the share to a different drive letter. For some reason, it saw the subst-ed drive as not a local NTFS volume.

A bit more research led me to this article. It looks like what I want to do could be done by examining the log directly and parsing any "newer" data than was last examined by my script for files meeting the criteria I want (located within a specific directory). However, I'm quite over my head at this point. He's quickly making pointers to files in C, and I just don't know how this translates, nor am I interested in screwing up my system with experimenting with this...

Does anyone know how this might translate into AutoIt-type code, or could you at least give me pointers (no pun intended) on how to go about translating it, or tell me whether it's even possible?

Thanks

"There are 10 types of people in this world - those who can read binary, and those who can't."
"We've heard that a million monkeys at a million keyboards could produce the complete works of Shakespeare; now, thanks to the Internet, we know that is not true." ~Robert Wilensky
0101101 1001010 1100001 1101101 1100101 1110011 0110011 1001101 1000111
0000101 0000111 0001000 0001110 0001101 0010010 1010110 0100001 1101110

DavidKarner

I believe that the modification date/time stamp of a directory (at least on Windows filesystems) is updated whenever a file is added to or removed from that directory. You might simply be able to record the last modification time of the directory in a log file and check its current modification time against the log to see if it has changed. This can help identify whether files have been added or removed, but it will not detect a change to the contents of an existing file.

However, I am not sure if that is sufficient for your needs.
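The time-stamp check described above could be sketched roughly as follows. This is only an illustration in Python, not AutoIt; the function name `directory_changed` and the JSON log format are assumptions made for the example. Note that the very first call reports a change, because nothing has been logged yet.

```python
import json
import os


def directory_changed(dir_path, log_path):
    """Return True if dir_path's modification time differs from the logged one.

    The current time stamp is written back to the log on every call.
    (Function name and JSON log layout are hypothetical.)
    """
    current = os.stat(dir_path).st_mtime_ns

    # Load previously logged time stamps; start empty if there is no log yet.
    try:
        with open(log_path, "r", encoding="utf-8") as fh:
            saved = json.load(fh)
    except FileNotFoundError:
        saved = {}

    changed = saved.get(dir_path) != current

    # Remember the time stamp we just saw for the next run.
    saved[dir_path] = current
    with open(log_path, "w", encoding="utf-8") as fh:
        json.dump(saved, fh)
    return changed
```

Keep the log file outside the watched directory, otherwise writing the log would itself change the directory's time stamp.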

james3mg


thanks for the reply :)

Actually, good thought. Unfortunately, it doesn't work for sub-directories, which I need to check (that is, c:\folder1\folder2\file.txt is created, the timestamp changes on folder 2, but not folder 1).

But I hadn't thought of this, so thanks! :(

Anything else?
TheCuz

Would you be able to run a program on the computer that you want to keep track of changes locally?

If so, I have done some work like this. I made a program (in C#) that monitors changes in a couple of different folders, and logs those changes in a SQLite database. It is only looking for files that have been created or deleted, but could be extended to include file modifications as well.

Granted this will only work on a M$ machine with .NET 2, but it was pretty easy to create the code to monitor files in a folder. If this sounds like something you would be interested in, I can get you some more info on this.


People who say it cannot be done should not interrupt those who are doing it. - George Bernard Shaw

james3mg


Yes, I can run programs on the server (2003 SP1...the machine hosting the files locally) if necessary. I don't much care about file modifications, so it sounds like your work would be a perfect fit...I was even already going to store the data in a SQL database! Does it work in any number of subdirectories, and even if the program isn't running when the file is added/deleted, as long as it's been run at least once before?

Thanks for ANY help you can give about this! :)

TheCuz

What is it that you want to accomplish? I'd like to get an idea of the overall goal.

Just a fair warning: I only know how to use an SQLite database with C#. Would that be a limitation for what you want to do? There are some good UDFs created in AutoIt which work really well with SQLite, and I have used them quite a bit in some programs that I have made.

In the program that I mentioned in a previous post, the C# program is the one that does the monitoring of the folders, and also writes the needed information into the SQLite database. I then use a program made in AutoIt that uses the database to get the file information and use it for some operational functions. It works well for my scenario because I have to work with 16,000 plus files, and trying to sift through the files over a network connection was taking way too long, and sometimes just trying to open the folder would crash the computer (which is really busy creating ASF files).

Let me know, and I'll see what I can help with.


james3mg


Here's my situation:

Our office has two servers (2003 SP1) in different cities, connected via a VPN and in the same domain. I have a synchronizing Domain DFS root on them, which all of our domain users map as their home folder. I've written an extensive database program (in AutoIt) that's hosted within the DFS share, but is run on each client computer (every instance of the program shares access to the same network db file). It allows the users to find lots of information about our jobs, past and present. One feature that I'm trying to include in this program is an AutoCAD Block finder for our block library (which is thousands of files stored in two folders in the DFS root...each of those folders has more than 20 subfolders, and some of those also have subfolders).

It's easy enough to get a list of all the .dwg files in those two folders and store the list in a SQLite table, and also easy to provide the functionality to search/view/edit a description associated with each .dwg file. The issue is that if a file is added or deleted from our block library directories, I need the database to update itself to reflect that change (if a file is added, add a new entry for that file with a blank description...if a file is deleted, delete the entry for that file including the description). Ideally, it would be intelligent enough that if a file is just renamed, the old description would follow to the new filename. If this is just something that happens once an hour via a scheduled task on the server, that would be fine.

So I guess that C# would be fine, but the issue is that I don't know C#, so if anything in our infrastructure changed (i.e. they decided we need a third block library directory that needs to be included in the same way in the database, or I have to upgrade the physical HDDs that the shares are on, so the path on the server changes drive letters), I'd be up the creek without a paddle. It's for that reason I was hoping for an AutoIt solution, so that I could change it as necessary in the future. I guess if it read the directories it needed to check from a text or ini file, that would work for anything I can foresee, but I'm still afraid a change that I can't foresee would come down and require a full rewrite of the way this function worked, and again, I'd be up the creek.

Anyway, thanks for any help you can give! :)

TheCuz

I think I have a pretty good idea of what you are looking to do.

The .NET FileSystemWatcher can recurse through directories, so that wouldn't be an issue.

As for the folders that you want to watch, are they located in the same directory, or in different directories?

Example:

same parent folder -> x:\MainFolder\watchfolder1; x:\MainFolder\watchfolder2

different parents -> x:\Parent1\watchfolder1; x:\ParentFolder2\watchfolder2

If the folders that you want to watch have the same parent folder, then it would be pretty easy to implement. If you needed to watch an additional folder, you could put it in the same parent folder as the other two (or three) and it wouldn't require a change within the program as you specify which parent directory you want to monitor, and the FileSystemWatcher pretty much takes care of the rest.

With the program I have running, I specified which parent folder to watch in an INI file, so that if it needed changing for some reason, all it would require is a change to the INI file and a restart of the program and it was good to go.

The only real tricky thing that I could see causing an issue would be if the watch folders are in different parents, and you wanted to be able to dynamically change how many watch folders there are in different locations. I haven't messed with the FileSystemWatcher enough to know if I can dynamically create watchers based on how many entries are in an INI file.

As for C#, it isn't that bad to learn. If you have experience with Java or PHP, the syntax is pretty similar (minus the start and end code tags for PHP). I picked up on it pretty quickly, and I just got done with a Java class a couple of months ago.

If you are worried about using C#, I can understand. If you want to pursue some other avenue, then by all means use what you are comfortable with.

But, if you do want to pursue this option further, I have limited time to work on this as I am in school and working full time, and wouldn't be able to pump this out very quickly. If this isn't something that you need in short order, I would be willing to help.

Just let me know :)
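To make the INI idea above concrete, here is a minimal sketch of reading a list of watch folders from an INI file. It is written in Python rather than C# or AutoIt, and the section name `WatchFolders`, the key names, and the function `read_watch_folders` are all hypothetical; the actual program's INI layout is not shown in this thread.

```python
import configparser

# Hypothetical INI layout: one key per folder to watch.
EXAMPLE_INI = r"""
[WatchFolders]
folder1 = x:\Parent1\watchfolder1
folder2 = x:\ParentFolder2\watchfolder2
"""


def read_watch_folders(ini_text):
    """Return the folder paths listed in the [WatchFolders] section, in order."""
    # interpolation=None so a '%' in a path is not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section("WatchFolders"):
        return []
    return [value for _key, value in parser.items("WatchFolders")]


if __name__ == "__main__":
    for folder in read_watch_folders(EXAMPLE_INI):
        print(folder)
```

With a layout like this, adding a third block library folder would only mean adding one more line to the INI file; the program would create one watcher per returned path.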


weaponx

Sounds like a case of over-complication. I would use one of the many recursive _FileListToArray functions to update a database like every 5 minutes or as often as possible (performance allowing) with all of the available files.
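A rough sketch of that rescan-and-update approach, in Python instead of AutoIt's _FileListToArray: walk the folders, compare the result with what the database already holds, and apply only the differences. The table name `files`, its `path` column, and the function names are assumptions for illustration only. Note that this still reads every directory on each run, which is the cost the original post wanted to avoid.

```python
import os
import sqlite3


def list_files(root, extension=".dwg"):
    """Recursively collect full paths under root that end with extension."""
    found = set()
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            if name.lower().endswith(extension):
                found.add(os.path.join(folder, name))
    return found


def sync_table(conn, roots):
    """Make the (hypothetical) files table match the disk; return (added, removed)."""
    on_disk = set()
    for root in roots:
        on_disk |= list_files(root)
    in_db = {row[0] for row in conn.execute("SELECT path FROM files")}

    added = on_disk - in_db      # on disk but not yet in the table
    removed = in_db - on_disk    # in the table but gone from disk
    with conn:  # one transaction for the whole update
        conn.executemany("INSERT INTO files (path) VALUES (?)",
                         [(p,) for p in sorted(added)])
        conn.executemany("DELETE FROM files WHERE path = ?",
                         [(p,) for p in sorted(removed)])
    return added, removed
```

A renamed file shows up here as one removal plus one addition, so anything attached to the old row would be lost unless renames are matched up separately.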

TheCuz

It's not really over-complicated, just more efficient at doing the job.

Not knowing how many files are in the folders that james3mg wants to monitor, that would mean that each file would have to be checked against the database to see if it has been changed or not, and if so, the database record would have to be modified/updated/deleted. Whereas with what I had proposed, only files that have been deleted, changed, or created would trigger a transaction, using less CPU time and less disk access.

But like I had told james3mg, it is whatever he is comfortable with using.


james3mg


Yeah, I like this method better than the _FileList functions for that reason.

@TheCuz, it's not really time-critical; I've been looking for a solution for months. So whatever you could work up would help. I'd love it if I could learn something new out of this, too! :)

Thanks for whatever you can do.

Edit: oops, I forgot to answer your question. They're in different parent directories. That is, they're each a DFS share directly under the root. Sorry if that makes it more complicated.

TheCuz

I'll have to do some testing on creating a dynamic directory watcher based on an INI file, so that if folders change or you need to add a folder, you can modify the INI file and have the changes made.

What were you looking at storing in the database? In the project that I did, I had the file name (with extension), modified date, and full path of the file.


TheCuz

Good news: after some testing, I was able to get the program to dynamically create directory watchers based on the number of entries in an INI file.

I am working on handling the different events to make sure they respond correctly, and on getting it set up to interact with an SQLite database.


james3mg


That's awesome...thanks so much!

I'm just storing the file path and a "notes" field- I only need it to add new files to the list, delete rows from the list when a file is deleted and (ideally) update just the file path when a file is renamed.
TheCuz

Is the "notes" field derived from information about the file, or is that something that is manually populated?

Also, do you want all of the information stored in one database table, or have a table for each folder that is being watched?

Just to make sure I understand what needs to be stored in the database: you need fields for the full file path (not a relative path), and a notes field. Correct?


james3mg


The contents of the notes column are user-provided in the program I've already written, not related to the file information. And actually, I forgot there's a third column too; it's named "preferred" and defaults to 0. Again, this is user-switched, not based on anything you could find by scanning the file.

All the files should be aggregated with the FULL path into the 'path' column in a single table (called Blocks).
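As a sketch of what that table and the three updates discussed in this thread (file added, file deleted, file renamed) might look like in SQLite, here is a short Python example. The table name Blocks and the columns path, notes and preferred come from the posts above; the column types, the primary key on path, the blank default for notes, and the function names are assumptions.

```python
import sqlite3

# Column types and constraints are assumed; only the names come from the thread.
SCHEMA = """
CREATE TABLE IF NOT EXISTS Blocks (
    path      TEXT PRIMARY KEY,
    notes     TEXT NOT NULL DEFAULT '',
    preferred INTEGER NOT NULL DEFAULT 0
)
"""


def file_created(conn, path):
    """Add a row with blank notes; do nothing if the path is already listed."""
    with conn:
        conn.execute("INSERT OR IGNORE INTO Blocks (path) VALUES (?)", (path,))


def file_deleted(conn, path):
    """Remove the row (and with it the notes) for a deleted file."""
    with conn:
        conn.execute("DELETE FROM Blocks WHERE path = ?", (path,))


def file_renamed(conn, old_path, new_path):
    """Update only the path so notes and preferred follow the renamed file."""
    with conn:
        conn.execute("UPDATE Blocks SET path = ? WHERE path = ?",
                     (new_path, old_path))
```

Because a rename is a single UPDATE on the path column, the user-entered notes and the preferred flag stay attached to the file.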

Thanks :)
TheCuz

Alright, I'll get the database functionality built in once I know that the directory watcher part of the program works well. What I'll try to do is to get a test build made and send it to you to run on the server to make sure that part of it is working. After that has been verified as working, implementing the database part will be next.


TheCuz

One question that I have is: what did you want to log in the database when a file was changed? I just realized that there isn't anything in the database which would show whether or not a file was changed.

Are you only looking to have triggers based on files that have been created and deleted? What did you have in mind dealing with files that have been changed?


james3mg


I just want the table to be a constant, accurate list of the *.dwg files within those two parent directories- I don't need notification when something changes, simply the change to be made (row added/deleted/updated) in the table. If the file has only been modified, no database change is necessary.

Does that answer your question? Thanks for your work on this! :)
TheCuz

Is there going to be an SQLite database already established, or is it going to be created for the directory watcher? I am wondering if I should have the directory watcher create the database on first run, or if it can't find the database to just create a new one.
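One way to sidestep that question is to treat both cases the same: open the database file, and create the table only if it is not already there. The following Python sketch assumes the Blocks table with path, notes and preferred columns described earlier in the thread; the column types and the function name `open_database` are hypothetical.

```python
import os
import sqlite3


def open_database(db_path):
    """Open db_path, creating the file and the Blocks table if they are missing.

    Returns (connection, created), where created is True if the file was new.
    """
    created = not os.path.exists(db_path)
    conn = sqlite3.connect(db_path)  # SQLite creates the file if it is absent
    with conn:
        # Harmless when the table already exists, so existing data is kept.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS Blocks ("
            "path TEXT PRIMARY KEY, "
            "notes TEXT NOT NULL DEFAULT '', "
            "preferred INTEGER NOT NULL DEFAULT 0)"
        )
    return conn, created
```

The `created` flag lets the caller decide whether a full initial scan of the watched folders is needed to populate an empty table.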

