Eliminate dup images and transpose metadata info onto highest rez images

Started by dwsprouse, April 13, 2011, 03:18:59 PM

dwsprouse

Hey all,

GOAL:

I'm trying to write an AppleScript program that uses ExifTool to eliminate duplicate images in a directory and transfer metadata from the lower-resolution duplicates to the higher-resolution copy.


PLAN:

In order to identify duplicates and delete them, I want to start by pulling the DateTimeOriginal and SubSecTimeOriginal tags recursively from a specified directory. Then I want to parse the returned information into a list and pull the file size of each image as well. Then I want to sort this information first by DateTimeOriginal, then by SubSecTimeOriginal, and then by file size. Finally, the script will walk down the list, determine whether a file is a duplicate (DateTimeOriginal and SubSecTimeOriginal are the same for two files), and copy the metadata to the higher-resolution file (the one with the larger file size).


SCRIPT:

on run
   --establish imageData variable
   set imageData to {}
   set AppleScript's text item delimiters to {""}
   
   --choose directory with images
   set folderPath to (choose folder with prompt "Please choose folder with images:") as text
   
   --pull Date/Time and Sub Sec metadata information from directory
   set exiftoolData to paragraphs of (do shell script "exiftool -r -DateTimeOriginal -SubSecTimeOriginal " & (quoted form of POSIX path of folderPath))
   
   --parse data from exiftool
   repeat with i from 1 to ((count of exiftoolData) - 2)
      set dateCheck to (characters 1 thru 18 of item (i + 1) of exiftoolData) as text
      set subsecCheck to (characters 1 thru 21 of item (i + 2) of exiftoolData) as text
      if ((dateCheck is equal to "Date/Time Original") and (subsecCheck is equal to "Sub Sec Time Original")) then
         set imageDate to (((characters 35 thru -1) of (item (i + 1) of exiftoolData)) & ":" & ((characters 35 thru -1) of (item (i + 2) of exiftoolData))) as text
         set imagePath to ((characters 10 thru -1) of (item i of exiftoolData)) as text
         set imageSize to size of (info for imagePath) as text
         repeat until (count of items of imageSize) is 12
            set imageSize to ("0" & imageSize)
         end repeat
         --store parsed data into imageData variable
         set end of imageData to (imageDate & " " & imageSize & " " & imagePath)
         set i to i + 2
      end if
   end repeat
   
   if imageData is equal to {} then
      display dialog "Error: There are no files with appropriate metadata in this directory." buttons {"OK"} default button 1
      return
   end if
   
   set AppleScript's text item delimiters to {ASCII character 10}
   set imageDataString to (imageData as string)
   --sort images by Date/Time, Sub Sec, and then file size
   set sortedImageData to do shell script "echo " & quoted form of imageDataString & " | sort"
   
   return sortedImageData
end run

Note: This script goes as far as creating a string that is sorted properly.


PROBLEM:

Basically the script works perfectly except with a large number of files, which is the whole purpose of building this tool. The script errors out on this line:

set sortedImageData to do shell script "echo " & quoted form of imageDataString & " | sort"

I believe this is because there are too many files in the list.

QUESTION:

Does anyone know how to keep the script from failing when there are too many files? Or can I use ExifTool more effectively to solve my problem more easily?

Thanks for any help!!
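
One way around a length limit like this is to keep the data off the command line entirely. The sketch below assumes the failure comes from the operating system's limit on argument size (getconf ARG_MAX reports it), which the thread does not confirm. It writes the lines to a temporary file and lets sort read that file. The sample lines and variable names are made up for illustration.

```shell
# Sketch: sort data through a file instead of passing it as an argument.
# Assumption: the original failure is an argument-length limit (see: getconf ARG_MAX).
tmp=$(mktemp)                       # temporary file for the unsorted lines

# Hypothetical sample lines in the "date:subsec size path" layout built by the script.
printf '%s\n' \
  '2011:04:13 15:18:59:20 000000870400 /pics/b.jpg' \
  '2011:04:13 15:18:59:20 000010485760 /pics/a.jpg' \
  '2011:04:12 09:00:00:05 000000123456 /pics/c.jpg' > "$tmp"

# sort reads the file itself, so the data never appears on a command line.
sorted=$(LC_ALL=C sort "$tmp")
rm -f "$tmp"
printf '%s\n' "$sorted"
```

In the AppleScript, the equivalent would be to write imageDataString to a file first and then run sort on the quoted path of that file, rather than echoing the whole string.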

Phil Harvey

You can get most of the way with this command:

exiftool -fileOrder SubSecDateTimeOriginal -fileOrder filesize -filename -directory -subsecdatetimeoriginal -filesize -T -r DIR

This will give you a tab-delimited list of files in the desired order at the expense of an additional processing pass to do the sorting (filesize is sorted numerically as needed), but at least it will allow you to avoid the problematic sort step.

- Phil

...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).
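
Once the listing is ordered by timestamp, duplicates sit on adjacent lines, so a single pass can pick them out. The sketch below is one possible way to do that with awk. It assumes the four tab-separated columns come in the order requested above (FileName, Directory, SubSecDateTimeOriginal, FileSize) and that a missing tag is printed as "-". The helper name find_dups and the sample listing are hypothetical.

```shell
# Sketch: list files that share a SubSecDateTimeOriginal value.
# Input: tab-delimited lines (FileName, Directory, SubSecDateTimeOriginal, FileSize),
# already ordered by timestamp. find_dups is a hypothetical helper name.
find_dups() {
  awk -F '\t' '
    $3 == "-" { next }                      # assumed marker for a missing timestamp
    { path = $2 "/" $1 }
    $3 == prev {                            # same timestamp as the previous file
      if (!shown) print prev "\t" first     # print the first member of the group once
      print $3 "\t" path
      shown = 1
      next
    }
    { prev = $3; first = path; shown = 0 }  # start of a new timestamp
  '
}

# Hypothetical sample listing standing in for real exiftool output.
listing=$(printf '%s\t%s\t%s\t%s\n' \
  'c.jpg' '/pics' '2011:04:12 09:00:00.05' '121 kB' \
  'b.jpg' '/pics/small' '2011:04:13 15:18:59.20' '850 kB' \
  'a.jpg' '/pics' '2011:04:13 15:18:59.20' '10 MB' \
  'd.png' '/pics' '-' '2.0 MB')

dups=$(printf '%s\n' "$listing" | find_dups)
printf '%s\n' "$dups"
```

With real data, the output of the exiftool command would be piped straight into find_dups in place of the sample listing.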

dwsprouse

Wow thanks for the help!! This is exactly what I needed! Thanks.

dwsprouse

Hey Phil,

I just realized that when I use the code you provided:

exiftool -fileOrder SubSecDateTimeOriginal -fileOrder filesize -filename -directory -subsecdatetimeoriginal -filesize -T -r DIR

it only sorts by "SubSecDateTimeOriginal" and not by "filesize". Ideally it would sort by "SubSecDateTimeOriginal" first and then sort those results by "filesize". Any ideas?

Phil Harvey

Ah.  My bad.  I forgot about the FileSize print conversion, which gives values like "9.8 MB".  String values like this are sorted alphabetically.  We need to sort on the numerical value:

... -fileorder SubSecDateTimeOriginal -fileorder filesize# ...

- Phil
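
The difference between the two orderings can be seen with the standard sort utility. This is only an analogy for what happens inside ExifTool, not its actual code, and the three sizes and their byte counts are invented for illustration.

```shell
# Sketch: the same three hypothetical files, sorted two ways.
# Print-converted sizes are strings, so they sort character by character.
as_text=$(printf '%s\n' '9.8 MB' '10 MB' '850 kB' | LC_ALL=C sort)

# Raw byte counts (what the "#" suffix requests) sort correctly as numbers.
as_bytes=$(printf '%s\n' '10276045' '10485760' '870400' | sort -n)

printf 'string order:\n%s\n\nnumeric order:\n%s\n' "$as_text" "$as_bytes"
```

In the string ordering, "10 MB" comes before "850 kB" and "9.8 MB" because only the leading characters are compared. The numeric ordering puts the smallest file first.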