John117

Looking for a way to find duplicate pictures.

14 posts in this topic

#1 ·  Posted (edited)

Hey, I have about 5k pictures in one folder. About 2/3 of them are duplicates with different names. Picasa is now exhausted.

I need a way to identify duplicates and then move them.

Here is what I am thinking:

Match based on dimensions, DPI, and file size.

They have different names, but may be the same picture.

I was also considering a pixel match based on 4 points (like a square) in from the corners.

So, maybe 10% in, then 20% in, and so forth. Store these values, then compare them to the next picture's...
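The corner-sampling idea can be sketched roughly as follows. This is only an illustration under some assumptions: the names `sample_points` and `fingerprint` are made up for this example, insets are given as whole percentages, and `get_pixel` is a hypothetical callback that returns the pixel value at `(x, y)` from whatever image library is in use.

```python
from typing import Callable, List, Tuple

Point = Tuple[int, int]


def sample_points(width: int, height: int,
                  insets_pct=(10, 20, 30, 40)) -> List[Point]:
    """Corners of a rectangle inset by each percentage of the image size."""
    points: List[Point] = []
    for pct in insets_pct:
        left = width * pct // 100
        top = height * pct // 100
        right = width - 1 - left
        bottom = height - 1 - top
        points += [(left, top), (right, top), (left, bottom), (right, bottom)]
    return points


def fingerprint(width: int, height: int,
                get_pixel: Callable[[int, int], object],
                insets_pct=(10, 20, 30, 40)) -> tuple:
    """Dimensions plus the pixel values at the sample points.

    Two pictures with equal fingerprints are duplicate candidates,
    not proven duplicates: only a handful of pixels are compared.
    """
    samples = tuple(get_pixel(x, y)
                    for x, y in sample_points(width, height, insets_pct))
    return (width, height) + samples
```

Storing each fingerprint as a dictionary key would let pictures that share one be grouped in a single pass. Exact equality only catches pictures whose sampled pixels are bit-for-bit the same; a re-saved JPG would probably need a per-channel tolerance instead.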

Any better ideas?

Will post in examples when complete, with credit for anyone who helps! Could make for a nice piece of software!

Edit: eye can spell!

Edited by John117

What format are the pictures in? JPG, BMP, or...?

I would say 95% are JPG.

How about using an MD5 hash or a CRC? Identical files will have the same MD5/CRC value.

http://www.autoitscript.com/forum/index.php?showtopic=76976
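As a rough sketch of the hash suggestion (in Python rather than AutoIt, using only the standard library), the files in a folder can be grouped by their MD5 digest. The function names `file_md5` and `find_duplicates` are made up for this example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """MD5 of the file's bytes, read in chunks so large files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(folder):
    """Return lists of paths whose contents are byte-for-byte identical."""
    groups = defaultdict(list)
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            groups[file_md5(path)].append(path)
    # Only digests shared by two or more files are duplicates.
    return [paths for paths in groups.values() if len(paths) > 1]
```

This only finds files whose bytes match exactly. Two copies of the same picture that were re-saved, or that carry different metadata, will hash differently.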

Yeah, I was thinking about that, using CRC32. I downloaded duplicate finder, but it found only 18 dupes. I can spot that many in the first 30-40 files alone. Maybe a limitation of the trial?

In either case, looking up that post now...

If I remember right, if you change the filename then the MD5 / SHA1 hash will be different... It's been a while since I did this in college.


Oddly enough, I found that some of the files have different names and are 1-5 KB different in size.

Not sure how; too much copy-paste, I guess...

Does the timestamp do anything for you?

#8 ·  Posted (edited)

For truly identical files give my program "SMF" a try :)...

If the files are not identical but similar, the only program I know of that can identify those is d'peg (http://www.gotdupes.com/index.cfm?page=3495&pagename=d%60peg!). I guess you could do something similar by calculating color checksums with the ImageMagick suite.
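One way to picture a "color checksum" for similar-but-not-identical pictures is a coarse color histogram. The sketch below is a swapped-in illustration, not ImageMagick's algorithm and not what d'peg does. It assumes the pixels are already decoded into `(r, g, b)` tuples with 0-255 channels, and the names `color_signature` and `signature_distance` are made up for this example.

```python
from collections import Counter


def color_signature(pixels, levels=4):
    """Fraction of pixels falling into each coarse RGB bucket.

    Each channel is quantized to `levels` steps, so small differences
    (for example from re-saving a JPG) usually land in the same bucket.
    """
    step = 256 // levels
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no pixels given")
    return {bucket: n / total for bucket, n in counts.items()}


def signature_distance(a, b):
    """0.0 for identical color distributions, 1.0 for no overlap at all."""
    buckets = set(a) | set(b)
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in buckets) / 2
```

Pictures whose distance falls under some small threshold (the value would have to be found by experiment) could be flagged for a manual look. A histogram ignores where the colors sit in the picture, so two different pictures with similar colors can also score as close.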

Edited by KaFu

No, the file name has no effect on the hashes. As long as the file type and file content are the same, they will match regardless of the name.
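This is easy to check, because a file hash is computed over the file's bytes only and the name never enters into it. A small Python sketch (the function names and the file names are made up for this example):

```python
import hashlib
import os
import tempfile


def md5_of(path):
    """MD5 of the file's contents; the path is only used to open the file."""
    with open(path, "rb") as fh:
        return hashlib.md5(fh.read()).hexdigest()


def hash_survives_rename(data: bytes) -> bool:
    """Write `data` to a file, rename the file, and compare the two hashes."""
    with tempfile.TemporaryDirectory() as folder:
        original = os.path.join(folder, "IMG_0001.jpg")
        renamed = os.path.join(folder, "holiday copy.jpg")
        with open(original, "wb") as fh:
            fh.write(data)
        before = md5_of(original)
        os.rename(original, renamed)
        return md5_of(renamed) == before
```

Calling `hash_survives_rename` with any byte string should return `True`. A change to the contents, even a single byte, gives a different hash.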

Couldn't remember for sure... I went to college for computer forensics but ended up doing programming once I got a job, lol.


does the time stamp do anything for you?

Not sure. Which timestamp? The created date is different, but the modified date is the same. Sometimes a file was modified before it was created :-)

I am currently running Dupdetector; it may do the trick! :-) It has cleaned about a thousand so far... we will see.

#13 ·  Posted (edited)

Hash (_Crypt_HashFile) all files and compare the hashes.
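Since the original question also asked to move the duplicates, here is a rough Python sketch of the whole pass. It is not AutoIt: `hashlib.sha256` stands in for the `_Crypt_HashFile` call, and `move_duplicates` and its parameters are made-up names. As a shortcut it only hashes files that share their size with another file, because a file with a unique size cannot have a byte-identical twin.

```python
import hashlib
import shutil
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 of the file's bytes, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def move_duplicates(folder, dup_folder):
    """Keep the first file of each identical set, move the rest to dup_folder."""
    folder, dup_folder = Path(folder), Path(dup_folder)
    by_size = defaultdict(list)
    for path in sorted(folder.iterdir()):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    moved = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # unique size, so no need to hash this file
        seen = set()
        for path in same_size:
            digest = sha256_of(path)
            if digest in seen:
                dup_folder.mkdir(parents=True, exist_ok=True)
                target = dup_folder / path.name
                shutil.move(str(path), str(target))
                moved.append(target)
            else:
                seen.add(digest)
    return moved
```

The sketch assumes `dup_folder` does not already hold files with the same names. It moves files rather than deleting them, so the result can be checked by eye before anything is thrown away.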

Edited by AutoBert

Hopefully OP comes back to look at his 7 year old topic to see the solution.
