Unique images: strip all varieties of metadata

Started by Jeff, September 27, 2016, 03:25:41 PM


Jeff

I'd like to search my old hard drives for the relatively few photos that are not on my current backup drive.  Thus I need to find the unique images.

I'm only interested in the uniqueness of the image itself, not in uniqueness due to differences in, say, the values of EXIF tags, the presence or absence of a given EXIF tag, embedded thumbnails, etc.

Formats include JPG, PNG, TIF, etc., as well as various raw formats (from different camera models and manufacturers).

Even though I don't expect to find any corruption/bit-rot between different copies of otherwise identical images, I'd like to detect it, as well as differences due to resizing and color changes.

My plan is to assess uniqueness based on MD5 sums calculated after stripping any and all metadata. This will be extremely computationally intensive, so I'd like to get it right the first time.
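A minimal Python sketch of the search described above, assuming a pluggable `digest` function. By default it hashes the whole file, so metadata-only differences still count as "unique" until a metadata-stripping digest is swapped in. `IMAGE_SUFFIXES`, `file_md5`, `image_files` and `find_unique` are hypothetical names chosen for illustration. Note that an MD5 match only identifies byte-identical data: corruption, resizing and color changes all show up as differences, but the hash cannot tell them apart from genuinely different photos.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Callable, Dict, Iterable, List

# Hypothetical extension list; add the raw extensions you actually have.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def file_md5(path: Path) -> str:
    """MD5 of the whole file, read in 1 MiB chunks (metadata NOT stripped)."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def image_files(root: Path) -> Iterable[Path]:
    """Recursively yield files whose extension looks like an image."""
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            yield path


def find_unique(old_roots: Iterable[Path], backup_root: Path,
                digest: Callable[[Path], str] = file_md5) -> Dict[str, List[Path]]:
    """Group files under old_roots whose digest never occurs under backup_root.

    Each file is hashed exactly once; copies that share a digest on the old
    drives end up in the same list.
    """
    backup = {digest(p) for p in image_files(backup_root)}
    unique: Dict[str, List[Path]] = defaultdict(list)
    for root in old_roots:
        for path in image_files(root):
            d = digest(path)
            if d not in backup:
                unique[d].append(path)
    return {d: sorted(paths) for d, paths in unique.items()}
```

For example, `find_unique([Path("/mnt/old1"), Path("/mnt/old2")], Path("/mnt/backup"))` (hypothetical mount points) returns one entry per distinct image that is missing from the backup, listing every old-drive copy of it.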

How can I strip the metadata?

Will running

exiftool -all= <filename>

strip all varieties of metadata, including embedded thumbnails and non-EXIF metadata?

Phil Harvey

This command will strip all metadata from JPEG images. Here is one thread where someone else was doing the same thing.  Here is another.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).
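One way to hash the stripped image without rewriting the originals is to have exiftool write the stripped copy to stdout (`-o -`) and hash that stream. A minimal Python sketch, assuming `exiftool` is on PATH; `strip_command` and `stripped_md5` are hypothetical names. Per the FAQ quoted later in this thread, the output is only fully metadata-free for JPEG.

```python
import hashlib
import subprocess
from pathlib import Path
from typing import List


def strip_command(path: Path) -> List[str]:
    """exiftool command that deletes all writable metadata and writes the
    result to stdout ("-o -"), leaving the original file untouched."""
    return ["exiftool", "-all=", "-o", "-", str(path)]


def stripped_md5(path: Path) -> str:
    """MD5 of the stripped output. Requires exiftool on PATH;
    raises subprocess.CalledProcessError if exiftool fails."""
    result = subprocess.run(strip_command(path), stdout=subprocess.PIPE, check=True)
    return hashlib.md5(result.stdout).hexdigest()
```

Running one exiftool process per file is slow for large collections, so this is best treated as a correctness reference rather than a tuned implementation.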

Jeff

Great. Is there now a simple way to strip the IFD0 tags, as mentioned in a 2014 thread by you and Dave?

Jeff

Just found pertinent information in the FAQ:

Quote
Writer Limitations
    ExifTool is not guaranteed to remove metadata completely from a file when attempting to delete all metadata. For JPEG images, all APP segments (except Adobe APP14, which is not removed by default) and trailers are removed which effectively removes all metadata, but for other formats the results are less complete:
        JPEG - APP segments (except Adobe APP14) and trailers are removed.
        TIFF - XMP, IPTC and the ExifIFD are removed, but some EXIF may remain in IFD0.
        PNG - Only iTXt, tEXt and zTXt chunks (including XMP) are removed.
        PDF - The original metadata is never actually removed.
        PS - Only some PostScript and XMP may be deleted.
        MOV/MP4 - Only XMP is deleted.
        RAW formats - It is not recommended to remove all metadata from RAW images because this will likely remove some proprietary information that is necessary for proper rendering of the image.
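To make the JPEG line concrete, here is a rough Python sketch of what "APP segments (except Adobe APP14) and trailers are removed" means at the byte level, followed by an MD5 of the result. It illustrates the FAQ's description and is not ExifTool's implementation. It assumes a well-formed baseline or progressive JPEG, ignores rare standalone markers, and leaves COM (comment) segments in place; the function names are hypothetical.

```python
import hashlib

APP14 = 0xEE  # Adobe APP14: the FAQ says this one is kept by default


def strip_jpeg_app_segments(data: bytes) -> bytes:
    """Return the JPEG with APP0-APP15 (except APP14) and any trailer removed."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    out = bytearray(b"\xff\xd8")
    i = 2
    while i + 1 < len(data):
        if data[i] != 0xFF:
            raise ValueError(f"expected a marker at offset {i}")
        marker = data[i + 1]
        if marker == 0xFF:  # optional fill byte before a marker
            i += 1
            continue
        if marker == 0xD9:  # EOI: keep it and drop everything after (trailers)
            out += b"\xff\xd9"
            return bytes(out)
        length = int.from_bytes(data[i + 2:i + 4], "big")  # includes the 2 length bytes
        segment = data[i:i + 2 + length]
        i += 2 + length
        if 0xE0 <= marker <= 0xEF and marker != APP14:
            continue  # APPn metadata segment (EXIF, XMP, ICC, ...): skip it
        out += segment
        if marker == 0xDA:  # SOS: copy entropy-coded data up to the next real marker
            j = i
            while j + 1 < len(data):
                nxt = data[j + 1]
                # FF00 is a stuffed byte and FFD0-FFD7 are restart markers;
                # both belong to the scan data
                if data[j] == 0xFF and nxt != 0x00 and not 0xD0 <= nxt <= 0xD7:
                    break
                j += 1
            out += data[i:j]
            i = j
    raise ValueError("truncated JPEG: no EOI marker found")


def jpeg_image_md5(data: bytes) -> str:
    """MD5 of the JPEG after removing APP segments and trailers."""
    return hashlib.md5(strip_jpeg_app_segments(data)).hexdigest()
```

Two copies of the same JPEG that differ only in their EXIF/XMP blocks or in appended trailer data should then produce the same `jpeg_image_md5`.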

and, referring to -exif:all=

Quote
With a JPEG image, this command removes IFD0 (the main Image File Directory) as well as any subdirectories, thus removing all EXIF information. But with the TIFF format, ...

Phil Harvey

Quote from: Jeff on September 27, 2016, 04:11:55 PM
Great. Is there now a simple way to strip the IFD0 tags, as mentioned in a 2014 thread by you and Dave?

There is a CommonIFD0 Shortcut tag you can use for this.

- Phil
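For TIFF files, the CommonIFD0 shortcut can be added to the deletion so that the IFD0 tags left behind by `-all=` are removed as well. A minimal Python sketch, assuming `exiftool` is on PATH and that it accepts `-CommonIFD0=` alongside `-all=` in a single command; `TIFF_SUFFIXES`, `strip_command` and `stripped_md5` are hypothetical names. Raw formats are deliberately left out, following the FAQ's advice against stripping them, and any IFD0 entries not covered by the shortcut can still make otherwise identical TIFFs hash differently.

```python
import hashlib
import subprocess
from pathlib import Path
from typing import List

# Extensions treated as TIFF here (an assumption). Raw formats are not
# included: the FAQ advises against stripping all metadata from them.
TIFF_SUFFIXES = {".tif", ".tiff"}


def strip_command(path: Path) -> List[str]:
    """exiftool command writing a metadata-stripped copy of `path` to stdout."""
    cmd = ["exiftool", "-all="]
    if path.suffix.lower() in TIFF_SUFFIXES:
        # -all= can leave some EXIF tags in IFD0 of a TIFF; also delete the
        # tags grouped under the CommonIFD0 shortcut.
        cmd.append("-CommonIFD0=")
    return cmd + ["-o", "-", str(path)]


def stripped_md5(path: Path) -> str:
    """MD5 of the stripped output. Requires exiftool on PATH."""
    result = subprocess.run(strip_command(path), stdout=subprocess.PIPE, check=True)
    return hashlib.md5(result.stdout).hexdigest()
```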