Feature Request: image checksum

Started by sgbotsford, December 16, 2016, 11:44:10 PM


sgbotsford

I've been thinking about the duplicate-files problem, because lately Aperture imports every image off my phone regardless of whether I have 'do not import dups' checked.

It turns out that THAT is due to my renaming the files on import.  Libraries also get moved, split, and merged, so a way to find at least the exact dups would be handy.

Alas, more and more programs write keywords into the file itself.  This of course changes the checksum of the whole file, so you cannot count on it staying constant.

But wait:  The developers of this program know where the metadata is kept.   So two methods occur to me:

A: Compute the checksum only over the main image data, ignoring thumbnails and embedded metadata.
B: Strip out the changeable parts, compute the MD5 sum, and put them back in.
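
A rough sketch of method A for JPEG files only, in Python. It assumes that the "changeable parts" are the APPn segments (where EXIF, XMP, IPTC and the embedded thumbnail normally live) and the COM comment segment; everything else is hashed. That choice is an assumption, not how any particular program does it, and it also skips things like ICC profiles that can affect how the image renders. The name `jpeg_image_md5` is made up for illustration.

```python
import hashlib


def jpeg_image_md5(data: bytes) -> str:
    """MD5 of a JPEG stream, skipping APPn and COM segments (assumed metadata)."""
    if data[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        raise ValueError("not a JPEG stream")
    digest = hashlib.md5()
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xDA:
            # Start of scan: the compressed image follows, hash it to the end.
            digest.update(data[pos:])
            return digest.hexdigest()
        # The segment length counts its own two bytes but not the marker.
        length = int.from_bytes(data[pos + 2:pos + 4], "big")
        is_metadata = 0xE0 <= marker <= 0xEF or marker == 0xFE
        if not is_metadata:
            digest.update(data[pos:pos + 2 + length])
        pos += 2 + length
    raise ValueError("no start-of-scan marker found")
```

Usage would be something like `jpeg_image_md5(open("IMG_0001.jpg", "rb").read())`. Two copies of the same picture that differ only in keywords should then give the same sum, as long as the editing program did not re-encode the image data.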

Now if the checksum is embedded in the file, you have an easy field to pull and check for duplicates.  Slurp the checksums, sort them, and any that match are perfect dups.
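
The slurp-and-match step could look roughly like this in Python. It assumes the checksums have already been pulled out into a mapping of file name to checksum string; the function name `find_exact_dups` is just for illustration.

```python
from collections import defaultdict


def find_exact_dups(checksums):
    """Group file names by checksum and keep only groups with more than one file.

    `checksums` maps file name -> checksum string.
    """
    by_sum = defaultdict(list)
    for name, digest in sorted(checksums.items()):
        by_sum[digest].append(name)
    return {digest: names for digest, names in by_sum.items() if len(names) > 1}
```

Grouping in a dictionary gives the same answer as sorting and comparing neighbours, and it keeps the file names together for each matching sum.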

This does NOT find the near dups.

It does, however, in principle create a way to track derivative files:  Co-opt another field for "Original file checksum".  By default it's set to all zeros, but editing software could copy the image checksum into the Original file checksum when exporting a file to an editor that will change it.  Then, when the file comes back, the new image checksum can be calculated.
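
A small Python model of that bookkeeping, to make the idea concrete. The record type and the field names `image_checksum` and `original_checksum` are hypothetical, not tags that exist in any standard, and the 32-character zero string simply assumes an MD5-sized field.

```python
from dataclasses import dataclass, replace

NO_ORIGINAL = "0" * 32  # the "all zeros" default for a file that is not a derivative


@dataclass(frozen=True)
class ImageRecord:
    name: str
    image_checksum: str
    original_checksum: str = NO_ORIGINAL


def record_edit(record, new_name, new_checksum):
    """Model an export-edit-return round trip.

    The current image checksum is copied into the original-checksum field,
    and the image checksum is replaced by the one computed on return.
    """
    return replace(
        record,
        name=new_name,
        image_checksum=new_checksum,
        original_checksum=record.image_checksum,
    )


def derivatives_of(original, records):
    """Names of records whose original-checksum field points at `original`."""
    return [r.name for r in records
            if r.original_checksum == original.image_checksum]
```

With this scheme each file points at its immediate parent, so a chain of edits can be walked back one step at a time.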

This won't help with programs like Aperture that can keep several versions, each with a different set of adjustments, based on the same file.  That, however, is Aperture's problem.

StarGeek

A couple of previous threads on this:
Link 1
Link 2

And this one creates a user-defined tag that will call MD5 to get the checksum, though it doesn't strip the metadata.  It might be possible to combine the two by putting the `exiftool FILE -all= -o - | md5` from link 2 into the user-defined tag.  It would be very slow, though, as it would have to re-run exiftool for every file as well as calling md5.
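
For anyone who wants to script it outside of a user-defined tag, here is a rough Python equivalent of the `exiftool FILE -all= -o - | md5` pipeline. It assumes exiftool is installed and on the PATH; the function name `md5_without_metadata` is made up. It has the same speed problem, since it launches exiftool once per file.

```python
import hashlib
import subprocess


def md5_without_metadata(path, exiftool="exiftool"):
    """MD5 of the file as exiftool writes it to stdout with all metadata removed."""
    # Same arguments as the shell pipeline: strip all tags, write to stdout.
    result = subprocess.run(
        [exiftool, path, "-all=", "-o", "-"],
        capture_output=True,
        check=True,  # raise if exiftool reports an error
    )
    return hashlib.md5(result.stdout).hexdigest()
```

The original file is left untouched because the stripped copy only ever goes to stdout.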