Sorting out photos with content verification (md5 / sha256)

Started by lumiere, July 13, 2020, 02:35:41 PM

lumiere

Hello all,

I have several family pictures scattered across many, many folders, often with duplicate copies.
Some of them were taken in burst mode (many pictures within the same second), so YYYY/MM/DD/HH/MM/SS alone is not enough to give each picture a unique name.
I would like to copy them to a final destination folder, sorted by:
YYYY/MM/DD/HH-MM-SS.[ext]
But if a picture with that name already exists in the target folder, I would like to compare the md5/sha256 of their contents to make really sure they are the same.
If they are the same, skip it. If they are not, create a new file named [name]-1.jpg, [name]-2.jpg, etc.
Is there any way I can achieve this with ExifTool?
Also, is there any way I can compute the md5/sha256 checksums in parallel? I have quite a good CPU with many threads and can burn it to speed up the check.
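The copy-and-verify rule described above can be sketched in Python, assuming the target path (for example `2020/07/13/14-35-41.jpg`) has already been derived from the picture's date by some other step. `file_digest` and `place_file` are hypothetical helper names, not part of ExifTool:

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path, algo: str = "sha256") -> str:
    """Hash the whole file in 1 MiB chunks so large files are never fully loaded."""
    h = hashlib.new(algo)  # e.g. "sha256" or "md5"
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def place_file(src: Path, target: Path, algo: str = "sha256") -> Path | None:
    """Copy src to target, or to target-1, target-2, ... if the name is taken.

    Returns the path written, or None if a file with identical content
    already occupies one of those names (the copy is skipped).
    """
    src_digest = file_digest(src, algo)
    candidate, n = target, 0
    while candidate.exists():
        if file_digest(candidate, algo) == src_digest:
            return None  # same bytes already there: skip
        n += 1
        candidate = target.with_name(f"{target.stem}-{n}{target.suffix}")
    candidate.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, candidate)  # copy2 also preserves file timestamps
    return candidate
```

The loop stops at the first identical file or the first free `-N` name, so running it again over the same sources copies nothing new.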

Phil Harvey

ExifTool doesn't have the ability to do a checksum of the entire file. I would suggest using an md5 utility to generate checksums for all of your files, then removing from the list any file whose checksum duplicates one already listed, then sending the remaining file names to ExifTool with a command like this:

exiftool "-filename<createdate" -d %Y/%m/%d/%H-%M-%S%%-c.%%e -ext jpg -r DIR

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).
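Phil's suggested pipeline (checksum everything, drop files whose checksum has already been seen, pass the rest to ExifTool) might look like this in Python. This sketch hashes with a thread pool, which can keep several cores busy because CPython's `hashlib` releases the GIL while hashing large buffers. `unique_files`, `write_argfile` and the `keep.txt` file name are illustrative assumptions:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def md5_of(path: Path) -> str:
    """MD5 of the whole file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def unique_files(root, exts=(".jpg", ".jpeg"), workers=None):
    """Return one path per distinct checksum under root (first in sorted order)."""
    files = sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in exts
    )
    # Hash in parallel; pool.map returns results in the same order as files.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = list(pool.map(md5_of, files))
    seen, keep = set(), []
    for path, digest in zip(files, digests):
        if digest not in seen:  # first file with this checksum wins
            seen.add(digest)
            keep.append(path)
    return keep


def write_argfile(paths, argfile):
    """Write one file name per line, the format read by exiftool -@."""
    Path(argfile).write_text("".join(f"{p}\n" for p in paths), encoding="utf-8")
```

The resulting list can then be handed to ExifTool with its `-@ ARGFILE` option, which reads arguments from a file, one per line, e.g. `exiftool "-filename<createdate" -d %Y/%m/%d/%H-%M-%S%%-c.%%e -@ keep.txt`.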

lumiere

Thanks @Phil!
Could it be a more generic command that works not only with jpg but with jpg/png/gif all at once?
Also, can it sort based on the EXIF data stored inside each file?

Phil Harvey

Yes. Just add more -ext options for whatever file types you want to move. Or leave out the -ext option entirely to move anything in the exiftool -listwf extension list (provided it has a CreateDate tag).

- Phil
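When the command is launched from a script rather than typed into a shell, each argument can be passed as its own list item, so no shell quoting is needed (on the command line, the `<` in `-filename<createdate` must be quoted because shells treat it as input redirection). A hedged Python sketch using the standard `subprocess` module; `sort_by_date_cmd` and `sort_by_date` are hypothetical helpers, and an empty `exts` drops the `-ext` options entirely, which per Phil's reply processes every writable file type that has a CreateDate tag:

```python
import subprocess


def sort_by_date_cmd(directory, exts=("jpg", "png", "gif")):
    """Build the argument list for Phil's command with one -ext per type."""
    args = ["exiftool", "-filename<createdate",
            "-d", "%Y/%m/%d/%H-%M-%S%%-c.%%e"]
    for ext in exts:  # empty exts: no -ext options at all
        args += ["-ext", ext]
    args += ["-r", str(directory)]
    return args


def sort_by_date(directory, exts=("jpg", "png", "gif")):
    # No shell is involved, so "<" and "%" reach exiftool exactly as written.
    return subprocess.run(sort_by_date_cmd(directory, exts), check=True)
```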

lumiere

Thanks again, Phil :-)
Will it sort based on the EXIF data stored inside those files, or based on ... the filename?
I am not sure I understand what the syntax "-filename<createdate" means...
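As we read the ExifTool documentation, `-TAG<SRCTAG` copies the value of one tag into another, so `"-filename<createdate"` renames each file using its CreateDate tag, a date read from the metadata inside the file rather than from its name. `-d` formats that date with strftime codes, and each `%%` survives as a literal `%` for ExifTool's own codes: `%-c` (copy number, omitted for the first file with a given name) and `%e` (original extension). The Python function below is only a rough emulation of that naming for illustration, not ExifTool's actual code; `exiftool_style_name` and the `taken` set are hypothetical:

```python
from datetime import datetime
from pathlib import Path


def exiftool_style_name(create_date: datetime, src: Path, taken: set) -> str:
    """Rough emulation of -d %Y/%m/%d/%H-%M-%S%%-c.%%e with -filename<createdate."""
    # Stage 1 (-d): strftime fills in the date; "%%" becomes a literal "%".
    pattern = create_date.strftime("%Y/%m/%d/%H-%M-%S%%-c.%%e")
    # pattern is now like "2020/07/13/14-35-41%-c.%e"
    ext = src.suffix.lstrip(".")  # %e: extension of the original file
    copy = 0
    while True:
        # %-c: empty for the first copy, then "-1", "-2", ...
        name = pattern.replace("%-c", f"-{copy}" if copy else "").replace("%e", ext)
        if name not in taken:  # the "/" parts become subdirectories
            return name
        copy += 1
```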

Phil Harvey


TaToToMeK

Hi,
following up on lumiere's question:

I am looking for a method to calculate the MD5 hash of the image data only, not of the whole file including metadata.
I sometimes correct EXIF tags manually (e.g. date, GPS location), and I need a method to find all copies of the same image whether or not their metadata have been modified.
What is your recommendation?

Phil Harvey

For JPEG files, an MD5 for the image only may be calculated like this:

exiftool -all= -o - FILE | md5

Other formats are more problematic because metadata may not be completely removed.

- Phil
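The same idea can be driven from Python to find every copy of an image regardless of edited tags. This sketch runs the `exiftool -all= -o - FILE` command from Phil's reply and hashes its output with `hashlib`, which avoids depending on the `md5` utility (that name is the macOS tool; Linux systems usually provide `md5sum` instead). Per Phil's caveat it is only reliable for JPEG files; `image_only_md5` and `group_copies` are hypothetical helpers, and `exiftool` is assumed to be on the PATH:

```python
import hashlib
import subprocess
from collections import defaultdict


def strip_metadata_cmd(path):
    # Same arguments as `exiftool -all= -o - FILE`: delete all metadata
    # and write the stripped file to stdout instead of to disk.
    return ["exiftool", "-all=", "-o", "-", str(path)]


def image_only_md5(path):
    """MD5 of the file after ExifTool has stripped its metadata."""
    out = subprocess.run(strip_metadata_cmd(path), capture_output=True, check=True).stdout
    return hashlib.md5(out).hexdigest()


def group_copies(paths, digest=image_only_md5):
    """Return groups of paths whose image-only digests match."""
    groups = defaultdict(list)
    for p in paths:
        groups[digest(p)].append(p)
    return [g for g in groups.values() if len(g) > 1]
```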

Phil Harvey

As of ExifTool 12.58 there is a new Extra ImageDataMD5 tag which returns an MD5 digest of the image data only, for JPEG and TIFF-based files.

- Phil
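With ExifTool 12.58 or later, the metadata-stripping step is no longer needed. A minimal Python sketch, assuming `exiftool` is on the PATH: it requests the Extra tag explicitly by name (as we understand it, this tag is not extracted unless asked for) with JSON output (`-j`), then groups files whose image-data digests match. `image_md5_json` and `duplicates_from_json` are hypothetical names, and files for which no ImageDataMD5 value is returned are simply ignored:

```python
import json
import subprocess
from collections import defaultdict


def image_md5_json(directory):
    """Run exiftool over a directory tree and return its JSON output as text."""
    cmd = ["exiftool", "-j", "-ImageDataMD5", "-r", str(directory)]
    # A nonzero exit status is not treated as fatal here, so one
    # unreadable file does not discard the results for the others.
    return subprocess.run(cmd, capture_output=True, encoding="utf-8").stdout


def duplicates_from_json(text):
    """Group SourceFile names that share the same ImageDataMD5 value."""
    records = json.loads(text) if text.strip() else []
    groups = defaultdict(list)
    for rec in records:
        digest = rec.get("ImageDataMD5")
        if digest:  # absent for formats the tag does not support
            groups[digest].append(rec["SourceFile"])
    return [files for files in groups.values() if len(files) > 1]
```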