Identifying the core image in JPEG

Started by panicnow, February 07, 2013, 07:17:08 PM


panicnow

I've noticed that all the information on the use of the "binary" switch refers to raw images, or to thumbnails and previews in JPEGs. Is there any way to access the core image of a JPEG?

My motivation is simply to be able to checksum the core image ignoring all other data.

I'm in the process of consolidating a number of collections of files with significant overlaps: photos that have been stored on the original SD and CF cards, old machines and external drives. Some have been recovered from failed or failing hard drives, and whilst I know that they are all valid files, I don't know how much I can trust the data. I have already de-duped identical files, and I am now trying to find files which have identical images but may have had the metadata (or embedded thumbnails) changed.

Phil Harvey

Sorry, ExifTool doesn't deal with image data at all.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Phil Harvey

But I had a thought.  You can strip the metadata and do something like this...

exiftool -all= -o - image.jpg | md5 -
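The one-liner above can be extended to a whole folder. What follows is a minimal sketch, assuming exiftool and GNU coreutils (`md5sum`, and `uniq` with its `-w`/`-D` options) are on the PATH; the macOS tools differ. The helper names `image_hash` and `dup_groups` are hypothetical.

```shell
#!/bin/sh
# Sketch only: find JPEGs whose content is identical once metadata is stripped.
# Assumes exiftool and GNU coreutils (md5sum, sort, uniq with -w/-D) are on PATH.

# Print "<md5>  <path>" for one file. exiftool -all= removes the metadata it can
# write, and -o - sends the result to stdout, so the original file is not modified.
image_hash() {
    h=$(exiftool -all= -o - "$1" | md5sum | cut -d' ' -f1)
    printf '%s  %s\n' "$h" "$1"
}

# Read "<md5>  <path>" lines on stdin and keep only lines whose 32-character hash
# appears more than once; sorting first puts each group of duplicates together.
dup_groups() {
    sort | uniq -w32 -D
}
```

For example, `for f in DIR/*.jpg; do image_hash "$f"; done | dup_groups` prints each set of matching files together. Anything ExifTool leaves in place after `-all=` still contributes to the hash, so a match means the remaining bytes are identical, not merely that the pictures look alike.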

- Phil

panicnow

Thanks Phil, that's a great help.

I've managed to identify about 3000 duplicate images in my list of 16000 apparent duplicates; finding how the metadata differs won't be difficult.
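For a pair that does match, one way to see how the metadata differs is to diff ExifTool's tag listing of each file. This is a minimal sketch, assuming exiftool, `diff` and `mktemp` are available; `meta_diff` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: show the metadata differences between two files.
# -a keeps duplicate tags, -G1 prefixes each tag with its specific group,
# and -s prints tag names rather than descriptions.
meta_diff() {
    tmp1=$(mktemp) || return 2
    tmp2=$(mktemp) || { rm -f "$tmp1"; return 2; }
    exiftool -a -G1 -s "$1" > "$tmp1"
    exiftool -a -G1 -s "$2" > "$tmp2"
    diff "$tmp1" "$tmp2"
    status=$?        # 0 = identical listings, 1 = differences found
    rm -f "$tmp1" "$tmp2"
    return "$status"
}
```

Running `meta_diff copy1.jpg copy2.jpg` marks lines from the first file with `<` and from the second with `>`. File-system tags such as FileName and FileModifyDate will usually differ as well.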

Now I just need to find a tool for the next step, as I seem to have a much larger set of mismatches than I'd expected. Only one file generated an error, and I estimate that fewer than a quarter differ due to rotation. I realise ExifTool can't help with the remaining 13000.