Create reproducible filename unique to file?

Started by lnjustin, January 29, 2023, 08:44:32 AM

lnjustin

To take advantage of Apple shared libraries, I need to prevent duplicates from being backed up. A shared photo downloads to both my wife's account and mine with different filenames but the same metadata. So I need to process those two files in a way that allows one of them to be copied to my backup but prevents the other from being copied. I assume the best way to do that is to create a reproducible filename from the metadata, so that the second copy operation fails?

If that's the case, then what's the best way to do that? I already create a filename from the timestamp, but is there any other metadata I can add to it to safeguard against two non-duplicate photos being taken at the same time?

wywh

> To take advantage of Apple shared libraries

YMMV so I do not quite follow what "Apple shared libraries" means (I do not use iCloud Photos).

Anyways, I occasionally put my Photos.app library at /Users/Shared so it is shared between all users (it is not recommended to access that library at the same time from different logged-in users...). ...that way I can more easily set my Mac to no sleep so "faces" might be updated after a week or so... but I digress ...

Currently I have the Photos.app library on an external APFS (SSD) or Mac OS Extended (HDD) volume (Finder > Get Info > "Ignore ownership on this volume" should be ON by default). Also, Final Cut Pro.app and maybe Music.app libraries should be saved on such Mac volumes (i.e. NOT FAT, exFAT, NTFS, a case-sensitive Time Machine volume, network volumes, etc.).

So: a simple way might be to use Carbon Copy Cloner to clone both local Photos libraries to an external disk as a disk image. The initial backup takes some time. Then update both backups as needed which is quite fast via CCC's rsync.

https://bombich.com

StarGeek

If the cameras are different, you might check the make and model.  Also check if there is a serial number.

As long as the image data hasn't been edited, you might try an md5 hash as in this post.
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

lnjustin

#3
Interesting. Might try the MD5 hash. How would I modify that command in the linked post to:
(1) write a certain number of characters of the MD5 hash into the filename?

Or

(2) write the md5 hash to a tag. Then on a different pic compute the hash and copy the pic to a backup only if the computed hash isn't in any of the tags of pics already backed up?

EDIT: well, I think I know how to do (2). But it only seems to work for images, not videos (even after using ffmpeg to strip metadata). And I get a warning that the ICC profile of heic images is removed. I think I would prefer an approach that could be used for both images and videos, so I might be reverting to the original question about what other metadata I could use.

Starting to think maybe the file size would suffice. It seems unlikely that two images/videos would be created at the same time with the same file size, but then again maybe it's more likely if someone takes a few rapid shots of the same subject. Perhaps I could use the image hash for pictures and the file size for videos, since it'd be highly unlikely for two videos to have the same timestamp and file size.
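
For what it's worth, options (1) and (2) can be sketched in a few lines of Python. This is only a rough illustration, not the command from the linked post: it hashes the whole file, metadata included, rather than just the image data, so it only catches byte-identical copies, although it treats pictures and videos the same way. The names `file_md5` and `backup_if_new`, the 8-character prefix, and the in-memory `seen` set are all assumptions made up for the example.

```python
import hashlib
import shutil
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """MD5 of the entire file, read in chunks so large videos are not loaded at once."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_if_new(src, backup_dir, seen, prefix_len=8):
    """Copy src into backup_dir unless its digest is already in `seen`.

    The first `prefix_len` hex characters of the digest are appended to the
    filename (option 1); the full digest is tracked in `seen` (option 2).
    Returns the destination path, or None when the file was skipped.
    """
    src = Path(src)
    digest = file_md5(src)
    if digest in seen:
        return None  # same bytes already backed up
    seen.add(digest)
    dest = Path(backup_dir) / f"{src.stem}_{digest[:prefix_len]}{src.suffix}"
    shutil.copy2(src, dest)  # copy2 also preserves timestamps
    return dest
```

In a real run the `seen` set would have to be saved between sessions (or rebuilt from the backup), and a hash of the image data alone would be needed if the two shared copies differ in their metadata bytes.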

lnjustin

#4
I think I've convinced myself that my problem is solved if I just name files based on their timestamp and file size in bytes. That should prevent exact duplicates that happen due to the shared nature of the iCloud library, while at the same time not being over-inclusive.

Final command:
/usr/local/bin/exiftool -d /Volumes/Test/%Y/%m/%Y-%m-%d_at_%Hh%Mm%Ss -overwrite_original '-filename<${filemodifydate}_${filesize#}.%e' '-filename<${DateCreated}_${filesize#}.%e' '-filename<${CreateDate}_${filesize#}.%e' '-filename<${DateTimeOriginal}_${filesize#}.%e' '-filemodifydate<DateCreated#' '-filemodifydate<CreateDate#'  '-filemodifydate<DateTimeOriginal#' /Volumes/Test/020125CC-5DF2-4E38-B4A5-789F5BF54E3D.heic

Hopefully that'll work well! Open to any other suggestions if there are any, but otherwise we'll see!
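
The idea behind that command (name each file from its timestamp and byte size, then let the second copy fail because the name is taken) can be sketched in Python as below. This is a minimal illustration under assumptions, not a replacement for the exiftool command: the standard library cannot read DateTimeOriginal or CreateDate, so the file modification time stands in for the metadata date, and `reproducible_name` and `copy_once` are hypothetical helper names.

```python
import os
import shutil
from datetime import datetime


def reproducible_name(path):
    """Build 'YYYY-MM-DD_at_HHhMMmSSs_<bytes><ext>' from the file's mtime and size.

    Assumption: mtime stands in for the metadata date used in the thread.
    """
    stat = os.stat(path)
    stamp = datetime.fromtimestamp(stat.st_mtime).strftime("%Y-%m-%d_at_%Hh%Mm%Ss")
    ext = os.path.splitext(path)[1]
    return f"{stamp}_{stat.st_size}{ext}"


def copy_once(src, backup_dir):
    """Copy src under its reproducible name; return None if that name exists."""
    dest = os.path.join(backup_dir, reproducible_name(src))
    try:
        # Mode "xb" raises FileExistsError instead of overwriting an existing file.
        with open(src, "rb") as fin, open(dest, "xb") as fout:
            shutil.copyfileobj(fin, fout)
    except FileExistsError:
        return None
    shutil.copystat(src, dest)  # carry the timestamps over to the copy
    return dest
```

Two files with different original names but the same timestamp and size map to the same destination, so only the first one is copied.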

StarGeek

File size is pretty good, though editing some tags won't change the size. The most common edit would be to the date, though, and that alone would give a different filename. FAQ #13 is relevant here.

Overall, I feel that Digital Asset Management (DAM) programs are effective. Programs such as Lightroom (paid), IMatch (Windows, paid) and digiKam (free, open source) have a built-in ability to find duplicates.

If you're able to use Python, then there are libraries that perform perceptual hashing. Unfortunately, I haven't found a simple command line solution for this and I don't know enough about Python to script one.
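
To give a rough idea of what a perceptual hash does, here is a minimal pure-Python sketch of an average hash, the simplest member of that family (it is not the DCT-based pHash that dedicated libraries offer). It assumes the image has already been decoded to a grid of grayscale values at least 8x8 in size; decoding HEIC or JPEG files would need a third-party library and is left out. The function names are made up for the example.

```python
def downscale(pixels, size=8):
    """Shrink a grayscale grid (list of rows) to size x size by block averaging.

    Assumes the grid is at least `size` pixels in each dimension.
    """
    height, width = len(pixels), len(pixels[0])
    small = []
    for r in range(size):
        r0, r1 = r * height // size, (r + 1) * height // size
        row = []
        for c in range(size):
            c0, c1 = c * width // size, (c + 1) * width // size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(pixels, size=8):
    """Return a size*size-bit integer: one bit per cell, set if brighter than the mean."""
    flat = [v for row in downscale(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | int(v > mean)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes; small means visually similar."""
    return bin(a ^ b).count("1")
```

Two pictures would then be treated as likely duplicates when `hamming()` of their hashes falls below some threshold; the threshold itself has to be chosen by experiment.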

lnjustin

#6
Yeah, file size actually is a few bytes off. Using mediadatasize works though:

/usr/local/bin/exiftool -d /Volumes/Test/%Y/%m/%Y-%m-%d_at_%Hh%Mm%Ss -overwrite_original '-filename<${filemodifydate}_${mediadatasize#}.%e' '-filename<${DateCreated}_${mediadatasize#}.%e' '-filename<${CreateDate}_${mediadatasize#}.%e' '-filename<${DateTimeOriginal}_${mediadatasize#}.%e' '-filemodifydate<DateCreated#' '-filemodifydate<CreateDate#'  '-filemodifydate<DateTimeOriginal#' /Volumes/Test/020125CC-5DF2-4E38-B4A5-789F5BF54E3D.heic
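
A plausible reason mediadatasize stays the same while the file size drifts by a few bytes: in HEIC and MOV/MP4 files the picture or video data sits in an 'mdat' box, separate from the boxes that hold metadata. The sketch below walks the top-level boxes of such a file and returns the payload size of the first 'mdat' box. That this matches what ExifTool reports as MediaDataSize is an assumption, not something confirmed in this thread, and `media_data_size` is a hypothetical helper name.

```python
import struct


def media_data_size(path):
    """Payload size in bytes of the first top-level 'mdat' box, or None if absent."""
    with open(path, "rb") as fh:
        fh.seek(0, 2)  # jump to the end to learn the file size
        file_size = fh.tell()
        offset = 0
        while offset + 8 <= file_size:
            fh.seek(offset)
            # Each box starts with a 32-bit big-endian size and a 4-byte type.
            size, box_type = struct.unpack(">I4s", fh.read(8))
            header = 8
            if size == 1:
                # A size of 1 means a 64-bit size follows the type field.
                extended = fh.read(8)
                if len(extended) < 8:
                    return None  # truncated file
                size = struct.unpack(">Q", extended)[0]
                header = 16
            elif size == 0:
                # A size of 0 means the box runs to the end of the file.
                size = file_size - offset
            if size < header:
                return None  # malformed box, stop rather than loop forever
            if box_type == b"mdat":
                return size - header
            offset += size
    return None
```

Two copies whose metadata boxes differ slightly in length would give different file sizes but the same value here, as long as the media data itself is untouched.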