ExifTool Forum

ExifTool => Newbies => Topic started by: maxandersen on October 30, 2016, 12:23:48 PM

Title: detect if "already exists" are true copies and delete or rename source file ?
Post by: maxandersen on October 30, 2016, 12:23:48 PM
Hi,

I'm loving how exiftool helps me sort my far too many photos into year/month/day folders.

The command I'm currently running in my various folders is:

sudo exiftool -v -r -d '/volume1/photo/sorted/big_camera/%Y/%m-%b/%d'  "-directory<CreateDate" "-directory<DateTimeOriginal" "-directory<FileModifyDate" .

I have a challenge though - over the years I ended up with copies of the same photo, so I get a lot of "x already exists - y" errors.

Is there a flag or some awesome trick to have exiftool check whether the two files are actually identical and then rename or even remove the source file?

I'm trying to avoid running additional scripts to verify that the files are true duplicates.

Any tips appreciated!
/max
Title: Re: detect if "already exists" are true copies and delete or rename source file ?
Post by: Phil Harvey on October 30, 2016, 03:55:01 PM
Hi Max.

ExifTool deals with one file at a time, so you need to find some other way to compare two files to see if they are the same.

You can also move the duplicates by doing this:

exiftool -v -r -d '/volume1/photo/sorted/big_camera/%Y/%m-%b/%d/%%f%%-c.%%e'  "-filename<CreateDate" "-filename<DateTimeOriginal" "-filename<FileModifyDate" .

This will add "-1", "-2", etc. to the names of duplicate files.

- Phil
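
Since ExifTool itself does not compare file contents, the comparison has to happen in a separate step after the rename. The following is a minimal sketch of one way to do that in the shell, not an ExifTool feature: remove_identical_copies is a hypothetical helper that assumes copies are named NAME-<digits>.EXT next to an unnumbered NAME.EXT, and it deletes a copy only when cmp reports it as byte-for-byte identical. Two copies of the same photo with different metadata are not identical by this test and are left alone. File names containing newlines are not handled.

```sh
# Hypothetical helper: delete "-N" copies that are byte-identical to the
# unnumbered file of the same name (NAME-1.EXT vs NAME.EXT).
remove_identical_copies() (
  dir=$1
  find "$dir" -type f -name '*-[0-9]*.*' | while IFS= read -r copy; do
    # Strip the "-<digits>" copy number to get the unnumbered name.
    orig=$(printf '%s\n' "$copy" | sed -E 's/-[0-9]+(\.[^./]*)$/\1/')
    [ "$orig" != "$copy" ] || continue   # name is not of the form NAME-N.EXT
    [ -f "$orig" ] || continue           # no unnumbered file to compare with
    # cmp -s is silent and exits 0 only if the files are byte-identical.
    if cmp -s "$orig" "$copy"; then
      rm -- "$copy"
    fi
  done
)
```

It would be run on the destination tree after the rename, for example remove_identical_copies /volume1/photo/sorted/big_camera. Replacing the rm with an echo first gives a dry run.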
Title: Re: detect if "already exists" are true copies and delete or rename source file ?
Post by: maxandersen on November 06, 2016, 05:05:00 PM
I tried playing with -c but no luck.

With your command it seems to create nested folders...

Setting new values from IMG_9050.JPG
'IMG_9050.JPG' --> '/Volumes/photo/sorted/lisbeth_phone/2016/03-Mar/29/IMG_9050.JPG/IMG_9050.JPG'
Error creating directory /Volumes/photo/sorted/lisbeth_phone/2016/03-Mar/29/IMG_9050.JPG
Warning = Error creating directory for '/Volumes/photo/sorted/lisbeth_phone/2016/03-Mar[snip]
Warning: Error creating directory for '/Volumes/photo/sorted/lisbeth_phone/2016/03-Mar/29/IMG_9050.JPG/IMG_9050.JPG' - IMG_9050.JPG

any idea what might be wrong ?
Title: Re: detect if "already exists" are true copies and delete or rename source file ?
Post by: Hayo Baan on November 08, 2016, 06:32:34 AM
I think Phil made a typo. Try it with %%-c replaced with -%%c.
Also make sure the directory base /Volumes/photo... is correct and is writeable by you.
Title: Re: detect if "already exists" are true copies and delete or rename source file ?
Post by: Phil Harvey on November 08, 2016, 07:08:58 AM
Quote from: Hayo Baan on November 08, 2016, 06:32:34 AM
I think Phil made a typo. Try it with %%-c replaced with -%%c.

No typo this time:

            For %c, these modifiers have a different effects.  If a field
            width is given, the copy number is padded with zeros to the
            specified width.  A leading '-' adds a dash before the copy
            number, and a '+' adds an underline.  By default, the copy number
            is omitted from the first file of a given name, but this can be
            changed by adding a decimal point to the modifier.  For example:

                -w A%-cZ.txt      # AZ.txt, A-1Z.txt, A-2Z.txt ...

- Phil
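
To make the naming sequence from the quoted documentation concrete, here is a small shell emulation of it. This is only an illustration of the AZ.txt, A-1Z.txt, A-2Z.txt pattern applied to BASE.EXT, not ExifTool's own code; next_free_name is a hypothetical helper name.

```sh
# Hypothetical helper: print the first unused name in the sequence
# DIR/BASE.EXT, DIR/BASE-1.EXT, DIR/BASE-2.EXT, ...
# (the sequence that a %%f%%-c.%%e style pattern describes).
next_free_name() (
  dir=$1 base=$2 ext=$3
  candidate="$dir/$base.$ext"   # the first file of a given name gets no copy number
  n=0
  while [ -e "$candidate" ]; do
    n=$((n + 1))
    candidate="$dir/$base-$n.$ext"
  done
  printf '%s\n' "$candidate"
)
```

For example, next_free_name . IMG_9050 JPG prints ./IMG_9050.JPG when that name is free, and ./IMG_9050-1.JPG once it is taken.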
Title: Re: detect if "already exists" are true copies and delete or rename source file ?
Post by: Hayo Baan on November 08, 2016, 05:16:13 PM
Thanks Phil for enlightening me, there are just so many features. Always good to learn even more ;D

So that was definitely not it then; perhaps it is still something with the paths and/or permissions.
Title: Re: detect if "already exists" are true copies and delete or rename source file ?
Post by: BangkokPhoto on November 19, 2016, 05:55:21 PM
I had a similar problem. DoubleKiller from bigbangenterprises worked well.

http://www.bigbangenterprises.de/en/doublekiller/
Title: Re: detect if "already exists" are true copies and delete or rename source file ?
Post by: wywh on February 13, 2022, 05:05:03 AM
Quote from: Phil Harvey on November 08, 2016, 07:08:58 AM
By default, the copy number is omitted from the first file of a given name, but this can be changed by adding a decimal point to the modifier.

I am glad I stumbled on this old message because I prefer to rename also the first duplicate so it is easier to spot.

So the decimal point is added like %%+.c just before c, right?

exiftool '-FileName<DateTimeOriginal' -d '%Y-%m%d-%H%M-%S%%+.c.%%e' .

It seems to work (I have used this pattern for all dates from the beginning because filenames with no spaces are easier in the Terminal):

2000-0101-1200-00_0.jpg
2000-0101-1200-00_1.jpg
2000-0101-1200-00_2.jpg
2000-0101-1200-00_3.jpg


How can I set a fixed number of digits for the copy number? For example:

2000-0101-1200-00_9.jpg
2000-0101-1200-00_10.jpg

to:

2000-0101-1200-00_09.jpg
2000-0101-1200-00_10.jpg


- Matti
Title: Re: detect if "already exists" are true copies and delete or rename source file ?
Post by: StarGeek on February 13, 2022, 10:49:36 AM
See the Advanced features section of the -w (-TextOut) option (https://exiftool.org/exiftool_pod.html#w-EXT-or-FMT--textOut).  Multiple examples under that section.
    For %c, these modifiers have a different effects. If a field width is given, the copy number is padded with zeros to the specified width.
Title: Re: detect if "already exists" are true copies and delete or rename source file ?
Post by: wywh on February 13, 2022, 12:04:32 PM
Quote from: StarGeek on February 13, 2022, 10:49:36 AM
See the Advanced features section of the -w (-TextOut) option (https://exiftool.org/exiftool_pod.html#w-EXT-or-FMT--textOut).  Multiple examples under that section.

Thanks.

exiftool '-FileName<DateTimeOriginal' -d '%Y-%m%d-%H%M-%S%%+.2c.%%e' .

2000-0101-1200-00_00.jpg
2000-0101-1200-00_01.jpg
2000-0101-1200-00_02.jpg
2000-0101-1200-00_03.jpg


- Matti
Title: Re: detect if "already exists" are true copies and delete or rename source file ?
Post by: wywh on February 23, 2022, 07:44:07 AM
An additional question. How should I modify:

exiftool '-FileName<DateTimeOriginal' -d '%Y-%m%d-%H%M-%S%%+.2nc.%%e' .

2000-0101-1100-00_01.jpg
2000-0101-1200-00_01.jpg
2000-0101-1200-00_02.jpg
2000-0101-1200-00_03.jpg
2000-0101-1200-00_04.jpg
2000-0101-1300-00_01.jpg


...to number only the duplicates so they are easier to spot in lists:

2000-0101-1100-00.jpg
2000-0101-1200-00_01.jpg
2000-0101-1200-00_02.jpg
2000-0101-1200-00_03.jpg
2000-0101-1200-00_04.jpg
2000-0101-1300-00.jpg


- Matti
Title: Re: detect if "already exists" are true copies and delete or rename source file ?
Post by: Phil Harvey on February 23, 2022, 08:31:20 AM
Hi Matti,

Change %+.2nc to %+2c

- Phil
Title: Re: detect if "already exists" are true copies and delete or rename source file ?
Post by: wywh on February 23, 2022, 08:46:18 AM
Quote from: Phil Harvey on February 23, 2022, 08:31:20 AM
Change %+.2nc to %+2c

Thanks, but I already tried it (among other variations), but it does not rename the very first duplicate (in this example to 2000-0101-1200-00_01.jpg):

exiftool '-FileName<DateTimeOriginal' -d '%Y-%m%d-%H%M-%S%%+2c.%%e' .

2000-0101-1100-00.jpg
2000-0101-1200-00_02.jpg
2000-0101-1200-00_03.jpg
2000-0101-1200-00_04.jpg
2000-0101-1200-00.jpg
2000-0101-1300-00.jpg


- Matti
Title: Re: detect if "already exists" are true copies and delete or rename source file ?
Post by: Phil Harvey on February 23, 2022, 08:57:57 AM
Oh, I see.  Can't be done.

- Phil
Title: Re: detect if "already exists" are true copies and delete or rename source file ?
Post by: wywh on February 23, 2022, 09:04:29 AM
Quote from: Phil Harvey on February 23, 2022, 08:57:57 AM
Oh, I see.  Can't be done.

Bummer. Somehow the latest GraphicConverter can rename via EXIF date like that and I wanted to do the same via exiftool.

But this is not a big deal because it is only used when going through hundreds of raw images, some of them taken in the same second. Usually I pick just one of those, trash the rest, and do the rename again. Another option would be to rename with subsecond precision in the first run.

- Matti
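
One possible way around this limitation is a second pass in the shell after the exiftool rename. The sketch below is an assumption-laden illustration, not an ExifTool feature: number_first_duplicate is a hypothetical helper that expects the BASE_NN.EXT naming shown above with two-digit copy numbers. Wherever an unnumbered BASE.EXT sits next to numbered copies, it moves BASE.EXT to the first free BASE_NN.EXT, so every member of a duplicate group carries a number while unique files stay unnumbered. The moved file gets the next free number rather than _01.

```sh
# Hypothetical helper: give the unnumbered member of each duplicate group
# a copy number too, leaving files without duplicates untouched.
number_first_duplicate() (
  dir=$1
  for copy in "$dir"/*_[0-9][0-9].*; do
    [ -e "$copy" ] || continue            # glob matched nothing
    name=${copy##*/}
    base=${name%_[0-9][0-9].*}            # part before the "_NN" copy number
    ext=${name#"$base"_[0-9][0-9].}       # part after "_NN."
    first="$dir/$base.$ext"
    [ -e "$first" ] || continue           # group has no unnumbered member
    n=1
    target=$(printf '%s/%s_%02d.%s' "$dir" "$base" "$n" "$ext")
    while [ -e "$target" ]; do            # find the first free copy number
      n=$((n + 1))
      target=$(printf '%s/%s_%02d.%s' "$dir" "$base" "$n" "$ext")
    done
    mv -- "$first" "$target"
  done
)
```

It would be run as number_first_duplicate . in the folder that was just renamed with the %%+2c pattern.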
Title: Re: detect if "already exists" are true copies and delete or rename source file ?
Post by: chuck lee on April 09, 2022, 11:50:46 AM
I have had the same problem for a while and just found a workaround: I wrote a function that lists the duplicate files and keeps the newer or larger one compared to the original, combining the find, sed and sort commands.  "%-3uc" is the copy number format I used to identify the duplicate files, since I have a lot of file names ending with xxx-01.jpg or xxx_01.jpg, so upper-case letters are a better choice.  You can change it to fit your needs.


file_rm_dup(){
  # ((++func_counter))
  # sec_title "" "$FUNCNAME() --> `basename ${BASH_SOURCE[0]}` --> `basename $0`"
 
  [[ "$file_ulb" == "u" ]] && tier_2=3 || tier_2=4
 
  find .  -iname '*-AA[A-Z].*' -type f -printf "%p %T@ %s\n" | sed -rn 's/(.*)(-AA[A-Z])(\..*) (.*)(\..*) (.*)/\1\3 \1\2\3 \4 \6/p' | sort -b -s -S 10M -k 1,1 -k "$tier_2","$tier_2"nr > dup_out

  if [[ -s dup_out ]] ; then
    f_dest_tmp=
    while read -r line;  do # read a line each loop
      read -r f_dest f_src f_src_sec f_src_size <<< "$line"
       
        # f_dest: filename order and also destination filename(without copy number -AA[A-Z])
        # f_src  : duplicate filename
        # f_src_sec  : file seconds since epoch
        # f_src_size : file size
         
        # echo "dest_tmp: $f_dest_tmp
         
      if [[ "$f_dest" != "$f_dest_tmp" ]]; then
        f_dest_sec=$(stat -c%Y "$f_dest")
        f_dest_size=$(stat -c%s "$f_dest")
         
          #  %Y: time of last data modification, seconds since Epoch;
          #  %s: total size, in bytes
           
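        # Keep the duplicate if it is newer (mode u) or larger (mode l), otherwise delete it.
        # An explicit if/else avoids deleting the duplicate when mv fails.
        if [[ "$file_ulb" = u && "$f_src_sec" -gt "$f_dest_sec" || "$file_ulb" = l && "$f_src_size" -gt "$f_dest_size" ]]; then
          mv -f -T "$f_src" "$f_dest"
        else
          rm "$f_src"
        fi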
       
        f_dest_tmp="$f_dest"
      else
        rm "$f_src"
      fi
    done < dup_out
  fi
  rm dup_out
}



In the 'find' command, %p is the file name, %T@ is the time of last data modification in seconds since the Epoch (I keep only the integer part after the 'sed' command), and %s is the file size.  The output goes to a file called dup_out.  The first field is the original file name, the 2nd field is the duplicate file name, the 3rd field is the modification time in seconds and the last field is the file size in bytes.  The first sort key is the original file name.  The second sort key (the 3rd field in my case) is the modification time; a larger number means newer.  In the 'sort' command, $tier_2 is 3 here, and that field is treated as a number and sorted in reverse order.

Here is an example output:

./20210306_154312_00_90_8.jpg ./20210306_154312_00_90_8-AAB.jpg 1649512702 6562827
./8.jpg ./8-AAC.jpg 1649512702 6562827
./8.jpg ./8-AAB.jpg 1649512702 6562827
./t/202103xx_8.jpg ./t/202103xx_8-AAC.jpg 1649512702 6560771
./t/202103xx_8.jpg ./t/202103xx_8-AAB.jpg 1649512702 6560771
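
To see how the sed expression produces a line like these, it can be tried on a single record by hand. The sketch below wraps the same expression in a hypothetical to_dup_record function and feeds it one made-up line in find's "%p %T@ %s" format; the fractional seconds value is invented for the example. -E is the portable spelling of the -r used above.

```sh
# Hypothetical wrapper around the sed expression from file_rm_dup():
# "dup-name mtime.fraction size" -> "orig-name dup-name mtime size"
to_dup_record() {
  sed -En 's/(.*)(-AA[A-Z])(\..*) (.*)(\..*) (.*)/\1\3 \1\2\3 \4 \6/p'
}

# Sample input line; prints: ./8.jpg ./8-AAB.jpg 1649512702 6562827
printf '%s\n' './8-AAB.jpg 1649512702.1234567890 6562827' | to_dup_record
```

Lines without a -AA[A-Z] copy suffix produce no output, because of the -n option together with the p flag.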

Then a while loop checks which file to keep.  Since the files are sorted, the only comparison needed is between the original and the first duplicate file.  In this example,

./8.jpg ./8-AAB.jpg 1649512702 6562827
./t/202103xx_8.jpg ./t/202103xx_8-AAB.jpg 1649512702 6560771

These two files are removed directly without comparison.  Look through the find man page to choose the -printf directives you need and sort accordingly.  I hope this helps.

BTW, instead of comparing the modification time or size of the files, we can use md5sum to check whether two files are identical.  For example:

md5sum 201801xx_8.jpg 201801Xx_8.jpg


1db75b25856087ac0e45a9a891e3e97c  201801xx_8.jpg
1db75b25856087ac0e45a9a891e3e97c  201801Xx_8.jpg

Then keep the one with the filename you want.
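
The same md5sum check can be applied to a whole folder instead of one pair. This is a rough sketch assuming GNU md5sum and file names without newlines or backslashes; list_md5_duplicates is a hypothetical helper name. It only prints the redundant copies, so the list can be reviewed before anything is deleted.

```sh
# Hypothetical helper: print every file whose MD5 digest already appeared
# on an earlier file (in sorted path order), i.e. the redundant copies.
list_md5_duplicates() (
  dir=$1
  find "$dir" -type f -exec md5sum {} + |
    sort -k 2 |
    # seen[$1]++ is true from the second file with the same digest onwards;
    # sub() strips the digest and separator so only the path is printed.
    awk 'seen[$1]++ { sub(/^[^ ]+ [ *]/, ""); print }'
)
```

For example, list_md5_duplicates . lists the files that could be removed while keeping the first file of each identical group.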
Title: Re: detect if "already exists" are true copies and delete or rename source file ?
Post by: Phil Harvey on October 20, 2024, 01:46:16 PM
We have a work-around using a 2-step process mentioned here (https://exiftool.org/forum/index.php?msg=89152)

- Phil