Size of Tag?

Started by Stephen Marsh, April 20, 2017, 05:48:44 AM

Stephen Marsh

Some files may be bloated with over 50 MB of errant XMP-photoshop:DocumentAncestors data.

When processing, ExifTool reports "Warning: [Minor] Extracted only 1000 photoshop:DocumentAncestors items. Ignore minor errors to extract all".

I was wondering if there is a way to report on files that contain this tag, and perhaps to list, sort, filter, or otherwise weed out files that have "excessive" amounts of this data. So rather than simply removing this entry from all files, I would only remove it from files with more than 100 items or more than 1 MB of data, or output a CSV file listing the size of the tags, etc.

Phil Harvey

Hi Stephen,

You could try this to delete DocumentAncestors if it has more than 100 items:

exiftool -if "$documentancestors and (()=$documentancestors =~ /, /g) > 100" -documentancestors= DIR

Here I have used a little Perl trick to count the number of items in the string.

- Phil

Edit: This command actually works for > 101 items, because it counts the number of separators and the number of items is one greater.
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).
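
To make the separator-versus-item distinction concrete, here is a minimal Python sketch that mirrors what the -if expression counts. It is an illustration only, not part of ExifTool: the function names are hypothetical, and it assumes the list items are joined with ", " and that no item contains ", " itself.

```python
import re

# Assumption: list items are joined with ", " when the tag is printed as a string.
SEPARATOR = ", "


def count_separators(value: str) -> int:
    """Count separator occurrences, like (()=$documentancestors =~ /, /g) in Perl."""
    return len(re.findall(re.escape(SEPARATOR), value))


def count_items(value: str) -> int:
    """Items in the joined string: separators + 1, or 0 for an empty string."""
    if not value:
        return 0
    return count_separators(value) + 1


def condition_passes(value: str, threshold: int) -> bool:
    """Mirror of: $documentancestors and (<separator count>) > threshold.

    Perl treats the empty string and "0" as false, so both are excluded here.
    """
    return value not in ("", "0") and count_separators(value) > threshold


# Hypothetical sample data: 102 made-up IDs joined the same way.
sample = ", ".join(f"id{i}" for i in range(102))
print(count_separators(sample), count_items(sample), condition_passes(sample, 100))
```

With 102 items there are 101 separators, so the "> 100" test passes; with exactly 101 items it would not.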

Stephen Marsh

Fantastic, thanks Phil!

There is no point "throwing the baby out with the bathwater", so rather than indiscriminately deleting this data I thought that it would be wise to do it for "excessive" items... Now all I need to do is figure out what "excessive" means!
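
One way to get a feel for what "excessive" means might be to tabulate the item count per file first. The following is a rough Python sketch under several assumptions: exiftool is on the PATH, the -j (JSON) output reports list-type tags as arrays under the key "DocumentAncestors", and -m lifts the 1000-item limit mentioned in the warning. The helper names and the "ApproxBytes" column are hypothetical; the byte figure is only the summed length of the item strings, not the real size of the XMP packet.

```python
import csv
import json
import subprocess
import sys

TAG = "DocumentAncestors"  # key name assumed from exiftool's JSON output


def summarize(records):
    """Turn exiftool JSON records into (file, item count, approx bytes) rows."""
    rows = []
    for rec in records:
        value = rec.get(TAG)
        if value is None:
            continue  # file does not contain the tag
        # A single item may come back as a plain value rather than a list.
        items = value if isinstance(value, list) else [value]
        size = sum(len(str(item).encode("utf-8")) for item in items)
        rows.append((rec.get("SourceFile", ""), len(items), size))
    rows.sort(key=lambda row: row[1], reverse=True)  # largest counts first
    return rows


def run_exiftool(directory):
    """Run exiftool recursively on a directory and return the parsed JSON."""
    result = subprocess.run(
        ["exiftool", "-j", "-m", "-r", "-XMP-photoshop:DocumentAncestors", directory],
        capture_output=True, text=True, check=False,
    )
    return json.loads(result.stdout) if result.stdout.strip() else []


def write_csv(rows, stream):
    """Write the summary rows as CSV with a header line."""
    writer = csv.writer(stream)
    writer.writerow(["SourceFile", "Items", "ApproxBytes"])
    writer.writerows(rows)


# Example use (requires exiftool; replace DIR with your image folder):
#   write_csv(summarize(run_exiftool("DIR")), sys.stdout)
```

Sorting by item count puts the worst offenders at the top, which may help in picking a threshold before deleting anything.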

elmimmo

Quote from: Phil Harvey on April 20, 2017, 07:47:50 AM
You could try this to delete DocumentAncestors if it has more than 100 items:

exiftool -if "$documentancestors and (()=$documentancestors =~ /, /g) > 100" -documentancestors= DIR

Shouldn't, then, the following command return a list of what those images are?

exiftool -filename -r -if "$documentancestors and (()=$documentancestors =~ /, /g) > 100" DIR

exiftool is not returning any:

    1 directories scanned
   67 files failed condition
    0 image files read


even though I have confirmed there is a PNG file in the DIR folder matching that condition by extracting the XMP metadata with

exiftool -b -XMP image.png >out.xmp

which returns:

Warning: [Minor] Extracted only 1000 photoshop:DocumentAncestors items. Ignore minor errors to extract all - image.png

Phil Harvey

You need to use single quotes on Mac/Linux.

- Phil

elmimmo

Ouch! Thanks!

Now, how come extracting the XMP metadata reports "Extracted only 1000 photoshop:DocumentAncestors items [...]", yet checking whether there are more than 999 items returns false?

$ exiftool -filename -r -if '$documentancestors and (()=$documentancestors =~ /, /g) > 999' image.png
    1 files failed condition


Note that checking for "only" more than 100 does return true.

Phil Harvey

You're counting the number of ", " in the string, which is one less than the number of items.  So "> 998" should be true.

- Phil
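
Since the off-by-one is easy to trip over, a small helper could build the condition from the intended item count. This is a hypothetical Python sketch, not an ExifTool feature: build_if_condition is a made-up name, and it simply subtracts one from the desired item threshold to get the separator threshold used in the expression from this thread.

```python
def build_if_condition(more_than_items: int, tag: str = "documentancestors") -> str:
    """Return an -if expression that is true when the tag has more than
    `more_than_items` items, by comparing the ", " separator count
    against (more_than_items - 1)."""
    if more_than_items < 0:
        raise ValueError("more_than_items must be zero or greater")
    separators = more_than_items - 1  # items = separators + 1
    return f"${tag} and (()=${tag} =~ /, /g) > {separators}"


# "More than 999 items" becomes a test for more than 998 separators.
print(build_if_condition(999))
```

When the resulting string is pasted into a shell on Mac/Linux it still needs single quotes around it; passing it as one element of an argument list (for example via Python's subprocess module) avoids shell quoting altogether.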