Managing duplicates, organization and normalization

Started by joesmoe, March 27, 2023, 04:21:53 PM

joesmoe

-- Use rmlint to remove identical files (i.e. same hash), keep oldest.

        rmlint -S m

-- Handle google takeout .json's
       
        exiftool -tagsfromfile "%d/%F.json" "-DateTimeOriginal<PhotoTakenTimeTimestamp" -d %s -overwrite_original

-- Normalize exif data to the oldest date contained.
       
        exiftool -r '-datetimeoriginal<createdate' -P -if '($createdate le $modifydate) and ($createdate le $datetimeoriginal)' -overwrite_original_in_place .
        exiftool -r '-datetimeoriginal<modifydate' -P -if '($modifydate le $datetimeoriginal) and ($modifydate le $createdate)' -overwrite_original_in_place .
        exiftool -r '-alldates<filemodifydate' -P -if 'not $datetimeoriginal' -overwrite_original_in_place .
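
The `le` comparisons in the three commands above work because timestamps in the `YYYY:MM:DD HH:MM:SS` layout sort chronologically when compared as plain strings. Below is a minimal Python sketch of the same "pick the oldest" logic, useful for sanity-checking what the result should be for a given file. `oldest_date` is a hypothetical helper, not part of exiftool, and it assumes all values share the same layout with no time zone suffix.

```python
from typing import Iterable, Optional


def oldest_date(dates: Iterable[Optional[str]]) -> Optional[str]:
    """Return the earliest timestamp, ignoring missing values.

    Assumption: every value uses the 'YYYY:MM:DD HH:MM:SS' layout, so
    string order and chronological order agree.
    """
    present = [d for d in dates if d]
    return min(present) if present else None


if __name__ == "__main__":
    # CreateDate, ModifyDate, DateTimeOriginal (the last one missing here)
    print(oldest_date(["2014:05:01 10:00:00", "2012:03:04 09:30:00", None]))
```

If the tags mix layouts (for example, one carries a `+02:00` suffix), string comparison is no longer reliable and the values would need to be parsed first.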
       
        exiftool -r -ext cr2 -ext jpg -ext jpeg -ext tiff -ext tif -ext nef -tagsfromfile %d%f.xmp -xmp ##Import XMP to EXIF
or
        exiftool -r --ext xmp -tagsfromfile %d%f.xmp -xmp -overwrite_original
       
        find . -name '*.XMP' -type f -delete; find . -name '*.xmp' -type f -delete ##DELETE XMP files.

-- Use exiftool to update the file modification date to that of the exif (if it exists)
       
        exiftool '-DateTimeOriginal>FileModifyDate' directory
 
or
       
        jhead -ft file.jpg (only supports jpeg so won't work)
 
or
       
        exiv2 -T rename image.jpg (supports many formats)
 
or
       
        exifdate2fs /home/username/myphotos (only supports jpeg?)
 
or
       
        https://www.mattcrampton.com/blog/updating-file-create-date-from-exif-data/

-- Use fclones to remove identical (by hash of file without tag) (is larger tag better? - https://github.com/DRRDietrich/DeDup-Image/blob/master/dedup-img)
       
        fclones group --cache .fclones.cache . --name '*.jpg' -i --transform 'exiv2 -d a $IN' --in-place
         
-- Move video files to videos directory.

        find /. -type f | grep -iE "\.webm$|\.flv$|\.vob$|\.ogg$|\.ogv$|\.drc$|\.gifv$|\.mng$|\.avi$|\.mov$|\.qt$|\.wmv$|\.yuv$|\.rm$|\.rmvb$|\.asf$|\.amv$|\.mp4$|\.m4v$|\.$
        exiftool '-filename<filemodifydate' '-filename<DateTimeOriginal' '-filename<CreateDate' -d /Volumes/Bilder3/video/%Y/%Y%m%d_%H%M%S%%-c.%%le -r -ext avi -ext mov -ex$
        find /Volumes/Bilder3 \( -name "*.avi" -or -name "*.mpg" -or -name "*.mov" \) -type f -exec mv {} /Media/Video/ \;
       
-- Move Screenshots out of the way.

        exiftool '-filename<filemodifydate' '-filename<DateTimeOriginal' '-filename<CreateDate' -d /Volumes/Bilder3/new/%Y/%Y%m%d_%H%M%S%%-c_screenshot.%%le -ext png -r /Me$
       
-- Use exiftool to move any jpeg that has a matching raw file to a trash directory.

        exiftool -directory=trash/%d -srcfile %d%f.jpg -ext nef -ext cr2 -ext arw DIR
       
        or
         
        https://photo.stackexchange.com/questions/16401/how-to-delete-jpg-files-but-only-if-the-matching-raw-file-exists
       
-- Move all photos with exif data to organized folder structure. (Order: DateTimeOriginal, CreateDate)

        exiftool '-filename<DateTimeOriginal' '-filename<CreateDate' -d %Y%m%d_%H%M%S%%-c.%%le -r -ext dng -ext psd -ext cr2 -ext jpg -ext jpeg -ext tiff -ext tif -ext nef $
       
-- Figure out how to handle removing the lowest quality of 'similar' but the same images (so different exif data, but different sizes)

        find . -type f -name '*.[jJ][pP][gG]' | parallel --progress 'fingerprint=`gm identify -size 8x8 -format "%k" {}`; mv {} {//}/$fingerprint.jpg'
       
        https://github.com/jhnc/findimagedupes
       
        czkawka?
       
To Do: 
        -- Figure out how to set datestamps on files without exif by folder structure/filename (for -very- old photos that have no date in exif and incorrect filedate).
        -- Change workflow to MOVE files not delete them (just in case).
        -- Integrate subseconds as sometimes more than one photo is taken per second that isn't a duplicate.
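
For the "MOVE files not delete them" item, one possible approach is to relocate each unwanted file into a trash directory that mirrors the original folder layout, so anything removed by mistake can be put back. The following is only a sketch: `move_to_trash` and its parameters are hypothetical names, and it refuses to overwrite a file that is already in the trash.

```python
import shutil
from pathlib import Path


def move_to_trash(file_path: Path, root: Path, trash: Path) -> Path:
    """Move file_path under trash, keeping its path relative to root.

    Raises FileExistsError rather than overwriting an earlier trashed file.
    """
    target = trash / file_path.relative_to(root)
    target.parent.mkdir(parents=True, exist_ok=True)
    if target.exists():
        raise FileExistsError(target)
    shutil.move(str(file_path), str(target))
    return target
```

Called as `move_to_trash(Path("photos/2012/dup.jpg"), Path("photos"), Path("trash"))`, this would leave the file at `trash/2012/dup.jpg`.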
       
Sources:
- https://www.linux.com/training-tutorials/how-sort-and-remove-duplicate-photos-linux/
- https://askubuntu.com/questions/1308613/how-to-remove-slightly-modified-duplicate-images
- https://exiftool.org/forum/index.php?topic=10000.0
- https://exiftool.org/forum/index.php?topic=13143.0
- https://exiftool.org/forum/index.php?topic=3788.0
- https://exiftool.org/filename.html
- https://crates.io/crates/fclones 
- https://github.com/DRRDietrich/DeDup-Image/blob/master/dedup-img
- https://exiftool.org/faq.html#Q5
- https://askubuntu.com/questions/404567/how-to-organize-sort-images-by-exif-image-data
- https://exiv2.org/manpage.html
- https://manpages.ubuntu.com/manpages/xenial/man1/exiftime.1.html

StarGeek

I was about to post on your reddit thread, but I'll do so here instead.

Google Photos does not remove any metadata from your files. Additionally, the json that you receive from Google Takeout is set to UTC.  So when you write the date from the json file into the image, you are overwriting the original time stamp with an incorrect time stamp.

Additionally, that recorded UTC time may be inaccurate. It has been some years since I tested, but I was never able to figure out what Google was using as the time zone for the uploaded image. Sometimes it based the UTC off of Eastern Time and sometimes off of Pacific Time, regardless of what the actual time zone for the image was.
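
To make the UTC issue concrete, here is a small Python sketch that reads the epoch value from a Takeout sidecar and shows how the wall-clock time changes with the offset applied. It assumes the sidecar stores the value under `photoTakenTime.timestamp` as a string of epoch seconds; the function names and the sample JSON are made up for illustration. The correct offset has to come from somewhere else, since the sidecar does not record it.

```python
import json
from datetime import datetime, timedelta, timezone


def taken_time_utc(sidecar_text: str) -> datetime:
    """Parse the epoch seconds from a Takeout-style sidecar as a UTC datetime.

    Assumption: the value lives at photoTakenTime.timestamp.
    """
    data = json.loads(sidecar_text)
    seconds = int(data["photoTakenTime"]["timestamp"])
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def as_local_exif(moment: datetime, utc_offset_hours: float) -> str:
    """Format a UTC moment as an EXIF-style string in the given fixed offset."""
    local = moment.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    return local.strftime("%Y:%m:%d %H:%M:%S")


if __name__ == "__main__":
    sample = '{"photoTakenTime": {"timestamp": "1330857000"}}'
    moment = taken_time_utc(sample)
    print(as_local_exif(moment, 0))   # written as-is: UTC wall-clock time
    print(as_local_exif(moment, -5))  # same moment at UTC-5
```

Writing the first value straight into DateTimeOriginal is what shifts the time stamp by a few hours.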

The only metadata that is lost when an image is uploaded to Google Photos is the file system ones, and you already appear to have that covered in your other commands.

As for duplicate checking, you might also look into exiftool's new ImageDataMD5, which will compute an MD5 of the raw image data for jpegs and tiffs.  You can save that into a tag and might be able to script something up to do comparisons.  Though you might wait until 12.59 as the code is changing in that version.
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype
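
The "script something up to do comparisons" idea from the post above could look roughly like this once each file has a digest, however it was obtained (an ImageDataMD5 value, a whole-file hash, or anything else). This is a sketch only: `duplicate_groups` is a hypothetical helper and it does no metadata reading of its own.

```python
from collections import defaultdict
from typing import Dict, List


def duplicate_groups(digests: Dict[str, str]) -> List[List[str]]:
    """Group file paths that share a digest; return only groups of 2+ files."""
    groups = defaultdict(list)
    for path, digest in digests.items():
        groups[digest].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


if __name__ == "__main__":
    # Made-up digests for illustration.
    example = {"a.jpg": "d41d", "copy/a.jpg": "d41d", "b.jpg": "9e10"}
    for group in duplicate_groups(example):
        print(group)
```

Deciding which file in each group to keep (oldest, largest tag, and so on) is left to the caller.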

joesmoe

Thanks for your feedback!

So essentially the .json files with google can just be discarded then? There's nothing of value in them?

My Google Photos are all messed up. For example, the exif says that a photo is from 2012, the Google Photos web app says 2014 (probably when I uploaded it), but the filename (likely the correct date) says 2001.

I'll look into ImageDataMD5, but if it only supports jpeg and tiff it's not going to work for me; a lot of my files from Canon are .CR2 and a lot of my Apple photos are .HEIC.

Thanks again for the reply!

<edit> Also, I'm currently trying to use this - https://github.com/DRRDietrich/DeDup-Image/blob/master/dedup-img instead of fclones, as I think it may be best to remove the largest file including the tag rather than the way that fclones is picking which to delete. Ideally I need to find a way to cache the hashes so that every time I import more photos I don't have to rehash everything.
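
One way to avoid rehashing on every import is to store each digest alongside the file's size and modification time, and only recompute when either changes. The sketch below is an assumption about how such a cache could work, not how fclones or dedup-img do it; all function names are hypothetical, and it hashes the whole file (metadata included) with SHA-256.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the full file contents, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def cached_digest(path: Path, cache: dict) -> str:
    """Return the digest, reusing the cached value if size and mtime match."""
    stat = path.stat()
    key = str(path.resolve())
    entry = cache.get(key)
    if entry and entry["size"] == stat.st_size and entry["mtime_ns"] == stat.st_mtime_ns:
        return entry["sha256"]
    digest = file_digest(path)
    cache[key] = {"size": stat.st_size, "mtime_ns": stat.st_mtime_ns, "sha256": digest}
    return digest


def load_cache(cache_file: Path) -> dict:
    """Load the cache from JSON, or start empty if the file does not exist."""
    return json.loads(cache_file.read_text()) if cache_file.exists() else {}


def save_cache(cache_file: Path, cache: dict) -> None:
    """Write the cache back to disk as JSON."""
    cache_file.write_text(json.dumps(cache))
```

Note that commands run with `-P` preserve the modification time while changing the file, so a cache keyed this way would miss those edits unless the size also changed.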

joesmoe

What are your thoughts on this guy's solution - https://exiftool.org/forum/index.php?msg=17464 ?

StarGeek

Quote from: joesmoe on March 27, 2023, 06:26:21 PM
So essentially the .json files with google can just be discarded then? There's nothing of value in them?

If you made changes to the data on the web site, then that data will be held in the json files.

If the original files did not have actual EXIF data, for example, only file system time stamps, then you might want to use the time stamps from the json files.  But they are probably off by a few hours, as I mentioned above.  PNGs and screenshots are the files you are most likely to want to copy data to from the json files, as they will rarely have any embedded metadata to start with.

Quote
My Google Photos are all messed up. For example, the exif says that a photo is from 2012, the Google Photos web app says 2014 (probably when I uploaded it), but the filename (likely the correct date) says 2001.

If at some point you took the time to name them according to the date like that, then that is most likely the accurate time.  You can usually use a command like this as long as all 14 numbers are there and in order from year->second (see FAQ #5)
exiftool "-AllDates<Filename" /path/to/files/
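
To check ahead of time which filenames carry all 14 digits in year->second order, something like the following Python sketch could be run over the names. It is a simplified approximation and not exiftool's actual parser: `date_from_filename` is a hypothetical helper that drops the extension, keeps only the digits, and rejects impossible dates.

```python
import re
from datetime import datetime
from typing import Optional


def date_from_filename(name: str) -> Optional[str]:
    """Return 'YYYY:MM:DD HH:MM:SS' built from the first 14 digits, or None.

    Simplification: the extension is stripped first so digits in it
    (e.g. 'mp4') are not counted.
    """
    stem = name.rsplit(".", 1)[0]
    digits = re.sub(r"\D", "", stem)
    if len(digits) < 14:
        return None
    try:
        parsed = datetime.strptime(digits[:14], "%Y%m%d%H%M%S")
    except ValueError:
        return None  # e.g. month 13 or day 99
    return parsed.strftime("%Y:%m:%d %H:%M:%S")


if __name__ == "__main__":
    for candidate in ["IMG_20010203_040506.jpg", "holiday.jpg"]:
        print(candidate, "->", date_from_filename(candidate))
```

Names that come back as `None` are the ones that would need a different source for their date.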

One thing you might do is to use the -wm (-writeMode) option with -wm cg.  Adding this to any command means that exiftool will only create new data, not overwrite existing data.  I'd suggest that for anything that you're not sure if it already has useful data.