In search of some type of unique id

Started by pedzsan, September 02, 2022, 02:50:15 PM


pedzsan

Over the years, I would either save the hard drive from my laptop or desktop when I sold it or make a backup of it.  Now that I'm retired, I've put the contents of all of these 100+ old hard drives on a NAS and I'm going through it saving important stuff to the cloud and deleting duplicate stuff or stuff I don't want.  My major concern is all my photographs.

The problem is I was not consistent in the past.  I would sometimes rename the images and sometimes not.  And I would often format a batch into jpg, dng, or tif -- and of course, this may be after a crop.  Generally I don't want to keep these.  My logic is I can crop them again using better tech such as Lightroom's virtual copies or snapshots. 

But I've yet to figure out how to algorithmically determine that a.jpg is a crop of b.tif or b.cr2 or ...  Back in 2003, the raw images from Canon were TIF files.  So... basically I'm trying to figure out if a.jpg is a crop of a raw image in my catalog.

I thought I found a solution with DateTimeOriginal that appears to come in various forms but my jpg files (which I'm sure came from Photoshop) did not keep the DateTimeOriginal of the original raw file.  It appears to be roughly the timestamp of when it was created.  I'm still looking at various examples trying to figure out the pattern.

I thought, long ago, I read that Canon put a unique ID inside each image.  And... I thought also that at some point, the raw image had a cryptographic signature so it could be proven to be the original raw image.  But I can't find anything like that in the exif data so I'm coming here for help.

I'm doing:

exiftool -G -a -s2 /path/to/image
to look at the exif data.  Is there a better approach?  Is there a known way to do what I'm trying to do?

StarGeek

Quote from: pedzsan on September 02, 2022, 02:50:15 PM
I thought I found a solution with DateTimeOriginal that appears to come in various forms but my jpg files (which I'm sure came from Photoshop) did not keep the DateTimeOriginal of the original raw file.  It appears to be roughly the timestamp of when it was created.  I'm still looking at various examples trying to figure out the pattern.

That is extremely odd, especially if it came from an Adobe program.  You might take a look at all the time-related tags in the file with this command to be sure:
exiftool -time:all -G1 -a -s /path/to/files/

Quote: I thought, long ago, I read that Canon put a unique ID inside each image.

I've never heard that, nor have I seen it in the Canon images I have.  Usually, there would be a serial number embedded in the file's Canon MakerNotes which is unique to each camera (but not image) and there's usually some sort of file count or shutter count, but those can be notoriously inaccurate.

Quote: I thought also that at some point, the raw image had a cryptographic signature so it could be proven to be the original raw image.

I've never heard of that either.

You might look over the various Canon MakerNotes pages: Canon, CanonCustom, CanonVRD (I think this only appears if you used Canon Digital Photo Professional), and CanonRaw.

There would also be the problem of any program you might have used to edit the files.  While programs are better at preserving data these days, they weren't always.  Google's Picasa, for example, was very popular, but it would often drop any MakerNotes if you used it to write any data into the file.

In theory, if the file was edited by Lightroom (and Photoshop, I assume), it would have put a unique identifier for that image in the XMP data, specifically as part of the XMP-xmpMM group.  This isn't an area I know much about, but I believe that this data is passed on to files derived from the original.  You'll see this in the XMP-xmpMM:DerivedFrom* and XMP-xmpMM:History* tags.  I assume Lightroom has a way to find these related tags, but I'm not sure, as I don't use LR.
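
If those tags are present, one rough way to use them is to dump them as JSON (for example with exiftool -j -G1 -XMP-xmpMM:all -r /path/to/files/) and pair each file's DerivedFromDocumentID with another file's DocumentID.  The Python below is only a sketch of that idea.  It assumes the derived file's DerivedFromDocumentID equals the source file's DocumentID, which may not hold for every Adobe workflow (raw files often keep their XMP in a sidecar), and the function name and sample records are made up for illustration.

```python
def link_derived_files(records):
    """Map each derived file to the file it appears to be derived from.

    `records` is a list of dicts shaped like the parsed output of
    `exiftool -j -G1 -XMP-xmpMM:all ...` (one dict per file).
    Assumption: a derived file's DerivedFromDocumentID matches the
    DocumentID stored in (or alongside) its source file.
    """
    # Index every file that carries a DocumentID.
    by_document_id = {}
    for rec in records:
        doc_id = rec.get("XMP-xmpMM:DocumentID")
        if doc_id:
            # Keep the first file seen for a given ID.
            by_document_id.setdefault(doc_id, rec["SourceFile"])

    # Look up the parent of every file that names one.
    links = {}
    for rec in records:
        parent_id = rec.get("XMP-xmpMM:DerivedFromDocumentID")
        parent = by_document_id.get(parent_id)
        if parent and parent != rec["SourceFile"]:
            links[rec["SourceFile"]] = parent
    return links


# Hypothetical records; the file names and IDs are invented.
sample_records = [
    {"SourceFile": "b.cr2", "XMP-xmpMM:DocumentID": "xmp.did:AAAA"},
    {"SourceFile": "a.jpg",
     "XMP-xmpMM:DocumentID": "xmp.did:BBBB",
     "XMP-xmpMM:DerivedFromDocumentID": "xmp.did:AAAA"},
    {"SourceFile": "c.jpg"},  # no xmpMM data at all
]
print(link_derived_files(sample_records))
```

Files whose XMP was stripped by an older editor will simply not show up in the result, so this can only confirm a relationship, not rule one out.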

Quote: I'm doing:

exiftool -G -a -s2 /path/to/image
to look at the exif data.  Is there a better approach?  Is there a known way to do what I'm trying to do?

A good Digital Asset Management (DAM) program should be able to find similar images.  Lightroom should have an option for this.  I believe DigiKam does.

What these programs do is create a visual hash of each image, usually as part of the process of loading the file into the library.  A visual hash is different than the type of hash that is used to verify files, such as MD5.  Similar images will produce similar visual hashes, which allows the program to find duplicate files, even when they are cropped a bit.
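
To make the visual hash idea concrete, here is a minimal Python sketch of one simple variant, usually called an average hash.  It is not necessarily what Lightroom, DigiKam, or DupeGuru use internally, and to stay self-contained it works on a plain 2D list of grayscale values instead of decoding a real image file.  The function names are my own.

```python
def average_hash(pixels, hash_size=8):
    """Return an integer hash of hash_size * hash_size bits.

    `pixels` is a 2D list of grayscale values (rows of equal length).
    The image is shrunk to a hash_size x hash_size grid by averaging
    blocks, then each cell becomes 1 if it is brighter than the mean.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for cy in range(hash_size):
        y0 = cy * height // hash_size
        y1 = max((cy + 1) * height // hash_size, y0 + 1)
        for cx in range(hash_size):
            x0 = cx * width // hash_size
            x1 = max((cx + 1) * width // hash_size, x0 + 1)
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))

    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bits that differ between two hashes."""
    return bin(hash_a ^ hash_b).count("1")


# Synthetic 16x16 test images: a left-to-right gradient, a slightly
# brightened copy, and an inverted copy.
gradient = [[x * 16 for x in range(16)] for _ in range(16)]
brighter = [[v + 5 for v in row] for row in gradient]
inverted = [[240 - v for v in row] for row in gradient]

print(hamming_distance(average_hash(gradient), average_hash(brighter)))  # small
print(hamming_distance(average_hash(gradient), average_hash(inverted)))  # large
```

A small distance suggests the two images look alike, and a large one suggests they don't.  A hash this simple copes with resizing and mild tonal changes; a heavy crop shifts the grid enough that it will likely need a more robust method or a looser threshold.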

Going outside of a DAM program, DupeGuru has the ability to find similar images, though I've never actually tested it out for images.