John117 Posted May 18, 2009 (edited)
Hey, I have about 5k pictures in one folder. About 2/3 of them are duplicates with different names. Picasa is now exhausted. I need a way to identify the duplicates and then move them. Here is what I am thinking: match based on dimensions, DPI, and size. They have different names, but may be the same picture. I was also considering a pixel match based on 4 points (like a square) in from the corners. So, maybe 10% in, then 20% in, and so forth. Store these, then compare to the next . . . Any better ideas? Will post examples when complete, with credit for anyone who helps! Could make for a nice piece of software! Edit: eye can spell! Edited May 18, 2009 by John117
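The corner-sampling idea in the post above could be sketched roughly as follows. This is only an illustration in Python, not code from the thread: the function names (`sample_points`, `fingerprint`, `group_candidates`) and the 10%/20%/30% fractions are assumptions, and each image is assumed to be already decoded into a list of rows of pixel values (loading the files would need an imaging library and is left out).

```python
def sample_points(width, height, fractions=(0.10, 0.20, 0.30)):
    """Return the corner coordinates of squares inset from the image edges."""
    points = []
    for fraction in fractions:
        left = int(width * fraction)
        top = int(height * fraction)
        right = width - 1 - left
        bottom = height - 1 - top
        points.extend([(left, top), (right, top), (left, bottom), (right, bottom)])
    return points


def fingerprint(pixels):
    """Build a comparable fingerprint from a row-major grid of pixel values.

    `pixels` is a list of rows; each entry can be any hashable value,
    for example an (r, g, b) tuple.
    """
    height = len(pixels)
    width = len(pixels[0])
    samples = tuple(pixels[y][x] for x, y in sample_points(width, height))
    # Dimensions are part of the fingerprint, so images of different
    # sizes never compare equal.
    return (width, height, samples)


def group_candidates(images):
    """Map each shared fingerprint to the names of the images that have it.

    `images` is a dict of name -> pixel grid. Only fingerprints shared by
    two or more images are returned.
    """
    groups = {}
    for name, pixels in images.items():
        groups.setdefault(fingerprint(pixels), []).append(name)
    return {fp: names for fp, names in groups.items() if len(names) > 1}
```

A match here only means the sampled points and the dimensions agree, so the groups are candidates to check further, not confirmed duplicates. Because the comparison is exact, a re-saved JPEG whose decoded pixels differ slightly at a sample point would not match.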
AdmiralAlkex Posted May 18, 2009
What format are the pictures in? JPG, BMP or...?
John117 (Author) Posted May 18, 2009
AdmiralAlkex said: What format are the pictures in? JPG, BMP or...?
I would say 95% are JPG.
John117 (Author) Posted May 18, 2009
Thanubis said: How about using an MD5 hash or CRC? Identical files will have the same MD5/CRC value. http://www.autoitscript.com/forum/index.php?showtopic=76976
Yeah, I was thinking about that, using CRC32. I downloaded duplicate finder, but it found only 18 dupes. I can spot that many in the first 30-40 files alone. Maybe a limitation of the trial? In either case, looking up the post now . . .
LurchMan Posted May 18, 2009
If I remember right, if you change the filename then the MD5 / SHA1 hash will be different... been a while since I did this in college.
John117 (Author) Posted May 18, 2009
LurchMan said: If I remember right, if you change the filename then the MD5 / SHA1 hash will be different... been a while since I did this in college.
Oddly enough, I found that some of the files have different names and are 1-5 KB different in size. Not sure how; lots of copy-paste, I guess . . .
nitekram Posted May 18, 2009
John117 said: Oddly enough, I found that some of the files have different names and are 1-5 KB different in size. Not sure how; lots of copy-paste, I guess . . .
Does the time stamp do anything for you?
KaFu Posted May 18, 2009 (edited)
For truly identical files, give my program "SMF" (Search my Files) a try ... If the files are not identical but similar, the only program I know of that can identify those is d'peg (http://www.gotdupes.com/index.cfm?page=3495&pagename=d%60peg!). I guess you could do something similar by calculating color checksums with the ImageMagick suite. Edited May 18, 2009 by KaFu
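One way to read "color checksums" for similar-but-not-identical pictures is a coarse color histogram compared with a tolerance. The sketch below is an assumption for illustration only: it is not what d'peg or ImageMagick actually compute, the names (`color_signature`, `signature_distance`, `looks_similar`) are made up, the threshold of 0.1 is an arbitrary placeholder, and the pixels are assumed to be already decoded into (r, g, b) tuples with values 0-255.

```python
def color_signature(pixels, bins_per_channel=4):
    """Return a normalized coarse RGB histogram for an iterable of (r, g, b) pixels."""
    counts = [0] * (bins_per_channel ** 3)
    total = 0
    for r, g, b in pixels:
        # Map each 0-255 channel value to a bin index in 0..bins_per_channel-1.
        r_bin = r * bins_per_channel // 256
        g_bin = g * bins_per_channel // 256
        b_bin = b * bins_per_channel // 256
        counts[(r_bin * bins_per_channel + g_bin) * bins_per_channel + b_bin] += 1
        total += 1
    if total == 0:
        raise ValueError("image has no pixels")
    # Normalizing by pixel count lets images of different sizes be compared.
    return [count / total for count in counts]


def signature_distance(sig_a, sig_b):
    """L1 distance between two signatures: 0.0 is identical, 2.0 is the maximum."""
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b))


def looks_similar(pixels_a, pixels_b, threshold=0.1):
    """True when the two color signatures differ by at most `threshold`."""
    distance = signature_distance(color_signature(pixels_a), color_signature(pixels_b))
    return distance <= threshold
```

A histogram ignores where the colors sit in the picture, so two different images with a similar color mix can look alike to this check. It is best treated as a filter for candidates that then get a closer look.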
LurchMan Posted May 18, 2009
Thanubis said: No, the file name has no effect on the hashes. As long as the file type and file content are the same, they will match regardless of the name.
Couldn't remember for sure... I went to college for computer forensics but ended up doing programming once I got a job lol....
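The point quoted above, that a hash depends only on file content and not on the name, can be checked with a small sketch. It is written in Python with the standard `hashlib` module rather than AutoIt, and the helper names (`md5_of_file`, `demo`) and file names are made up for the illustration.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=65536):
    """Return the MD5 hex digest of a file's bytes, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def demo():
    """Hash two same-content files with different names, plus one changed file."""
    with tempfile.TemporaryDirectory() as folder:
        original = os.path.join(folder, "holiday.jpg")
        renamed_copy = os.path.join(folder, "copy_of_holiday.jpg")
        altered = os.path.join(folder, "holiday_resaved.jpg")
        with open(original, "wb") as handle:
            handle.write(b"same picture bytes")
        with open(renamed_copy, "wb") as handle:
            handle.write(b"same picture bytes")
        with open(altered, "wb") as handle:
            # One extra byte is enough to change the digest.
            handle.write(b"same picture bytes!")
        return md5_of_file(original), md5_of_file(renamed_copy), md5_of_file(altered)
```

The first two digests returned by `demo()` are equal and the third differs. The same property cuts the other way: copies that differ in size by a few KB, as John117 reports for some of his files, cannot have matching hashes even if they show the same picture.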
John117 (Author) Posted May 18, 2009
nitekram said: Does the time stamp do anything for you?
Not sure, which timestamp? The created date is different, but the modified date is the same. Sometimes it was modified before it was created :-)
John117 (Author) Posted May 18, 2009
Am currently running Dupdetector - it may do the trick! :-) Cleaned about a thousand so far . . . we will see.
junkew Posted May 19, 2009
With the logic in http://www.autoitscript.com/forum/index.php?showtopic=66545 you could write a picture comparison in AutoIt.
AutoBert Posted March 23, 2016 (edited)
Hash (_Crypt_HashFile) all files and compare the hashes. Edited March 23, 2016 by AutoBert
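The "hash all files and compare" suggestion, combined with the original request to move the duplicates, might look something like the sketch below. It uses Python's `hashlib` as a stand-in for AutoIt's `_Crypt_HashFile`; the function names and the `duplicates` subfolder name are assumptions for the illustration, and it only catches byte-identical files.

```python
import hashlib
import os
import shutil


def file_digest(path, chunk_size=65536):
    """Return the MD5 hex digest of a file's bytes, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def move_duplicates(folder, duplicates_dirname="duplicates"):
    """Keep the first file seen for each digest and move later matches.

    Files are visited in sorted name order. Matches are moved into a
    subfolder of `folder`; the names of the moved files are returned.
    """
    target = os.path.join(folder, duplicates_dirname)
    seen = {}
    moved = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue  # skip subfolders, including the duplicates folder itself
        digest = file_digest(path)
        if digest in seen:
            os.makedirs(target, exist_ok=True)
            shutil.move(path, os.path.join(target, name))
            moved.append(name)
        else:
            seen[digest] = name
    return moved
```

Moving rather than deleting keeps the run reversible, which seems prudent for a folder of 5k pictures. Which copy counts as the "original" is decided here purely by name order, so that rule would likely need adjusting for real use.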
InunoTaishou Posted March 23, 2016
Hopefully OP comes back to look at his 7-year-old topic to see the solution.