Calculating MD5 for RawImageDigest

Started by HBox, July 14, 2010, 05:56:16 AM

HBox

Hi, I am wondering whether there is a way to calculate an MD5 checksum and write it into the
field RawImageDigest.

The MD5 should correspond to the pure raw image data of a JPEG (without any EXIF).
Aside from that, is there a way to verify that the checksum still matches the actual file?

This would make it possible to take a file from a camera, calculate the MD5, and then edit keywords and other EXIF data
while still being able to verify that the image itself did not change.

Does something like that exist in ExifTool?


Phil Harvey

Sorry, ExifTool doesn't have a feature which will do this.  In fact, ExifTool doesn't have any features which process the image data in any way.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

HBox

Okay, then on to the question(s) of how I could do that myself:

1. Do you think RawImageDigest would be the right spot for such an MD5 hash (of just the raw image data)?
2. How could I extract the raw image data to the console (something like exiftool -exif:all= image.jpg)?
I do not want the file on disk to be altered. I would like the result written to standard output, so my own application
can read it directly into memory.

Thanks

HBox

Phil Harvey

Quote from: HBox on July 14, 2010, 07:43:56 AM
1. Do you think RawImageDigest would be the right spot for such an MD5 hash (of just the raw image data)?

This is a DNG tag used to store the digest of the original RAW image file, so what you intend is not the standard usage for this tag, but it makes sense to "borrow" the tag for this purpose.

Quote
2. How could I extract the raw image data to the console (something like exiftool -exif:all= image.jpg)?
I do not want the file on disk to be altered. I would like the result written to standard output, so my own application
can read it directly into memory.

exiftool FILE -all= -o -

Good point.  So you can do what you want with this command line in Linux or OS X:

exiftool FILE -rawimagedigest=`exiftool FILE -all= -o - | md5`

Cool.  I didn't think of this.

- Phil
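
The backtick one-liner above can also be written as a small shell function. This is only a sketch: the function names are made up for illustration, it assumes exiftool is on the PATH, and it uses md5sum (common on Linux) or falls back to md5 (macOS/BSD) depending on what is installed.

```shell
# Hypothetical helper names; only the exiftool options come from this thread.

# Print the MD5 of whatever arrives on stdin as a bare hex string.
md5_hex() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum | cut -d' ' -f1      # md5sum prints "<hash>  -"
    else
        md5                         # macOS/BSD md5 prints only the hash for stdin
    fi
}

# Print the digest of FILE with all metadata stripped. The file on disk is
# untouched because "-o -" sends the rewritten copy to stdout.
stripped_md5() {
    exiftool "$1" -all= -o - | md5_hex
}

# Store that digest in RawImageDigest. Refuse to write the MD5 of empty input,
# which would mean exiftool produced no output at all.
write_digest() {
    digest=$(stripped_md5 "$1")
    [ "$digest" != "d41d8cd98f00b204e9800998ecf8427e" ] || return 1
    exiftool "$1" "-rawimagedigest=$digest"
}
```

Calling write_digest image.jpg should behave like the one-liner; as with any write, exiftool keeps the original as image.jpg_original unless told otherwise.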

HBox


xipmix

I had a related question - is there anything in exiftool that burps out the main raw image as a binary blob, in the way that you can do with Jpeg preview images etc? I've scoured the manuals and come up with nothing. I suppose this steps outside exiftool's target area a bit, but not too terribly far.

In my case I am interested in processing (Nikon) NEF RAW images. I can easily do calculations like MD5 externally and add a tag to the original file (with exiftool) afterward.

Phil Harvey

There is nothing that writes the raw image, but you may be able to get an MD5 that is relatively independent of metadata changes with something like this:

exiftool a.nef -m -o - -all= | md5

This works great for JPEG images, but there will be a bit of metadata left over in IFD0 with TIFF images.  Be very careful not to do this to an actual NEF image: without "-o -", this will effectively destroy the image (but of course exiftool would back up the original).

- Phil

imthenachoman

Quote from: Phil Harvey on July 14, 2010, 08:35:55 AM

exiftool FILE -all= -o -


I just wanted to confirm, this command would dump the image data without exif or anything else, right?

I am using this to compare two images -- the EXIF data in them may be different because I updated the EXIF data of some.

exiftool FILE -all= -o - | md5sum

Would this be the best way or is there a better way to compare two images to see if they are the same, excluding the EXIF data?


Phil Harvey

If you wrote any metadata to IFD0 you would have to remove that separately.  You can use the CommonIFD0 tag to catch some of these:

exiftool FILE -all= -commonifd0= -o - | md5sum

But you should take a look at the remaining tags to see if there is anything left over that you may have written.

- Phil
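
For the two-file comparison discussed here, a minimal sketch might look like the following. The function names are illustrative, and md5sum is assumed to be available as in the commands above. A match only says that the bytes left after stripping are identical. Per the caveat above, leftover IFD0 tags can still make otherwise identical images hash differently.

```shell
# Hypothetical helpers; the exiftool options are the ones quoted in this thread.

# Hash FILE with metadata (and the common IFD0 tags) stripped.
# "-o -" writes the stripped copy to stdout, so FILE itself is not modified.
image_hash() {
    exiftool "$1" -all= -commonifd0= -o - | md5sum | cut -d' ' -f1
}

# Succeed (exit status 0) when both files hash the same after stripping.
same_image() {
    [ "$(image_hash "$1")" = "$(image_hash "$2")" ]
}

# Usage: same_image a.jpg b.jpg && echo "same image data"
```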

imthenachoman

Quote from: Phil Harvey on February 11, 2020, 07:05:42 AM
If you wrote any metadata to IFD0 you would have to remove that separately.  You can use the CommonIFD0 tag to catch some of these:

exiftool FILE -all= -commonifd0= -o - | md5sum

But you should take a look at the remaining tags to see if there is anything left over that you may have written.

- Phil

Got it. I was only writing to comments. This will help me de-duplicate photos between my wife's and my HD from before marriage with a lot of common photos. Thank you so much!

Keith

I just stumbled across this post, and I must say it amuses me how seamlessly the 10-year gap between the 7th and 8th posts glides by! It speaks well to the persistence of ExifTool as well as your support, Phil.


sidneyd

A new MD5 image checksum feature has been introduced with 12.58, thanks to great work by Phil.

As an example, this writes an image MD5 to a CSV file for all NEF files in the current folder:
exiftool -p "$filename, $imagedatamd5" -ext nef . > checksum.csv
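
A file like checksum.csv can then be scanned for repeated hashes, for example to spot duplicate images. This is an illustrative sketch, not an ExifTool feature. It assumes the two-column "filename, hash" layout produced by the command above and filenames that contain no commas; list_duplicates is a made-up name.

```shell
# Print every line of a "filename, hash" CSV whose hash occurs more than once.
# The file is read twice: the first pass counts each hash, the second pass
# prints the lines whose hash was counted more than once.
list_duplicates() {
    awk -F', ' 'NR == FNR { count[$2]++; next } count[$2] > 1' "$1" "$1"
}

# Usage: list_duplicates checksum.csv
```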

simonmcnair

So to write the checksum into the file, the command would be, for example:

exiftool -P -overwrite_original "-RawImageDigest<$imagedatamd5" "C:\Users\Simon\Desktop\xh0o9gxe8soa1.jpg"

or recursively

exiftool -P -r -overwrite_original "-RawImageDigest<$imagedatamd5" "./processthisfolder"


My question, please: how do I get it to update RawImageDigest only when it needs updating?
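
One possible approach to this question, sketched here rather than taken from the thread, is to compare the stored and freshly computed values in the shell and write only on a mismatch. It assumes ExifTool 12.58, where the tag is named ImageDataMD5 as in the commands above, and that both tags print as plain hex strings with -s3. The function names are hypothetical, and the behaviour has not been checked against real files.

```shell
# Hypothetical helpers, assuming the ImageDataMD5 tag name from ExifTool 12.58.

# Print a single tag value (-s3 prints the value only, without the tag name).
tag_value() {
    exiftool -s3 "-$2" "$1"
}

# Rewrite RawImageDigest only when it is missing or differs from ImageDataMD5.
update_digest_if_needed() {
    current=$(tag_value "$1" ImageDataMD5)
    stored=$(tag_value "$1" RawImageDigest)
    [ -n "$current" ] || return 1      # no hash available, leave the file alone
    if [ "$stored" = "$current" ]; then
        echo "unchanged: $1"
    else
        exiftool -P -overwrite_original "-RawImageDigest=$current" "$1"
    fi
}

# Usage: for f in ./processthisfolder/*.jpg; do update_digest_if_needed "$f"; done
```

Doing the comparison in the shell costs two extra exiftool reads per file, but it avoids depending on how a condition inside exiftool treats a missing tag.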

simonmcnair

I also wonder (sorry for resurrecting this thread, but it seems relevant) whether a feature request would be possible to add SHA256 or a BLAKE algorithm, since MD5 is considered pretty poor today.