Alphabetize list-type tags

Started by wywh, January 11, 2024, 04:07:22 AM

wywh

I am trying to add XMP-dc:Subject items so that the existing ones are preserved, duplicates are omitted, and the list is written in alphabetical order.

This command sequence otherwise works, but is it possible to alphabetize the list so that long lists are easier to read?

exiftool -a -G1 -s -sep '//' -XMP:All image.jpg

exiftool -m -P -overwrite_original -sep ';' -XMP-dc:Title='Marley; Bob' -XMP-photoshop:CaptionWriter='Harvey; Phil' -XMP-dc:Description='Doe; John and Doe; Jane in the 1976 concert.' -XMP-xmp:Rating=3 -tagsFromFile @ -XMP-dc:Subject -XMP-dc:Subject='Keyword 2;Keyword 1' -api NoDups=1 image.jpg
Warning: No writable tags set from image.jpg
    1 image files updated

exiftool -a -G1 -s -sep '//' -XMP:All image.jpg
[XMP-x]         XMPToolkit                      : Image::ExifTool 12.71
[XMP-dc]        Description                     : Doe; John and Doe; Jane in the 1976 concert.
[XMP-dc]        Subject                         : Keyword 2//Keyword 1
[XMP-dc]        Title                           : Marley; Bob
[XMP-photoshop] CaptionWriter                   : Harvey; Phil
[XMP-xmp]       Rating                          : 3

exiftool -m -P -overwrite_original -sep ';' -XMP-dc:Title='Marley, Bob' -XMP-photoshop:CaptionWriter='Harvey, Phil' -XMP-dc:Description='Doe, John and Doe, Jane in the 1976 concert.' -XMP-xmp:Rating=5 -tagsFromFile @ -XMP-dc:Subject -XMP-dc:Subject='Keyword 4;Keyword 3;Keyword 2' -api NoDups=1 image.jpg
    1 image files updated

exiftool -a -G1 -s -sep '//' -XMP:All image.jpg
[XMP-x]         XMPToolkit                      : Image::ExifTool 12.71
[XMP-dc]        Description                     : Doe, John and Doe, Jane in the 1976 concert.
[XMP-dc]        Subject                         : Keyword 2//Keyword 1//Keyword 4//Keyword 3
[XMP-dc]        Title                           : Marley, Bob
[XMP-photoshop] CaptionWriter                   : Harvey, Phil
[XMP-xmp]       Rating                          : 5


Any other comments on how to fine-tune the command? (In those commands I also tested how non-list tags behave when they contain semicolons or commas.)

- Matti
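
Editor's note: the behaviour visible in the output above (existing keywords kept, new ones appended, duplicates dropped) can be modelled outside ExifTool. The following is a minimal Python sketch of that observable result, not ExifTool's implementation. `merge_keywords` is a hypothetical helper, and exact, case-sensitive duplicate matching is an assumption.

```python
def merge_keywords(existing, new):
    """Append new keywords to existing ones, skipping exact duplicates."""
    merged = []
    seen = set()
    for keyword in existing + new:
        if keyword not in seen:  # assumption: duplicates are matched exactly
            seen.add(keyword)
            merged.append(keyword)
    return merged


# First write: the file has no keywords yet.
first = merge_keywords([], "Keyword 2;Keyword 1".split(";"))
# Second write: "Keyword 2" is already present, so it is not added again.
second = merge_keywords(first, "Keyword 4;Keyword 3;Keyword 2".split(";"))

print("//".join(second))          # insertion order, as in the listing above
print("//".join(sorted(second)))  # the alphabetical order being asked for
```

The merged list keeps insertion order, which is why a separate sorting step is needed.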

Phil Harvey

Hi Matti,

Your commands look good.  I had forgotten about the NoDups option.  Well done.

You can sort list items in a separate command:

exiftool -sep // '-subject<${subject;$_=join "//", sort split m(//)}' FILE

- Phil

StarGeek

I believe that would be a case-sensitive sort.

This post adds cmp and lc to do a case-insensitive sort.

C:\>exiftool -P -overwrite_original -all= -subject="Test 3" -Subject="test 2" -subject="Test 1" y:\!temp\Test4.jpg
    1 image files updated

C:\>exiftool -G1 -a -s -subject y:\!temp\Test4.jpg
[XMP-dc]        Subject                         : Test 3, test 2, Test 1

C:\>exiftool -P -overwrite_original -sep // "-subject<${subject;$_=join '//', sort split m(//)}" y:\!temp\Test4.jpg 
    1 image files updated

C:\>exiftool -G1 -a -s -subject y:\!temp\Test4.jpg
[XMP-dc]        Subject                         : Test 1, Test 3, test 2

C:\>exiftool -P -overwrite_original -sep "//" "-subject<${subject;$_=join '//', sort{lc($a) cmp lc($b)} split '//'}" y:\!temp\Test4.jpg
    1 image files updated

C:\>exiftool -G1 -a -s -subject y:\!temp\Test4.jpg
[XMP-dc]        Subject                         : Test 1, test 2, Test 3

Unicode characters probably need something else, though.  I see some references to a fc function on StackExchange for this case.
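
Editor's note: the difference between the two sorts above is easy to reproduce outside ExifTool. This is a small Python sketch of the same idea, not the Perl that ExifTool runs. `str.lower` stands in for Perl's `lc`, and `str.casefold` is shown as a rough counterpart to the `fc` function mentioned above; treating them as equivalent is an assumption made for illustration.

```python
keywords = ["Test 3", "test 2", "Test 1"]

# A plain sort compares code points, so every uppercase "T" sorts before "t".
case_sensitive = sorted(keywords)

# Lowercase each item for comparison only; the stored values are unchanged.
case_insensitive = sorted(keywords, key=str.lower)

# casefold() goes further than lower() for some non-ASCII text.
folded_equal = "straße".casefold() == "STRASSE".casefold()
lowered_equal = "straße".lower() == "STRASSE".lower()

print(case_sensitive)
print(case_insensitive)
print(folded_equal, lowered_equal)
```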

wywh

Thanks, that works. I prefer the case-insensitive option, as below, because it makes it easier to spot duplicates that differ only in case.

A remaining issue: is there an option to sort numbers as 8, 9, 10, 11, 98, 99, 100, 101 instead of 10, 100, 101, 11, 7, 8, 9, 98, 99, etc.?

exiftool -XMP-dc:Title='Marley, Bob' -XMP-photoshop:CaptionWriter='Harvey, Phil' -XMP-dc:Description='Doe, John and Doe, Jane in the 1976 concert.' -XMP-xmp:Rating=5 -tagsFromFile @ -XMP-dc:Subject -XMP-dc:Subject='Keyword 101;Keyword 100' -api NoDups=1 -execute '-Subject<${Subject;$_=join ";", sort {lc($a) cmp lc($b)} split ";"}' -common_args -m -P -overwrite_original_in_place -sep ';' image.jpg
    1 image files updated
    1 image files updated

exiftool -a -G1 -s -sep '//' -XMP:All image.jpg
[XMP-x]         XMPToolkit                      : Image::ExifTool 12.71
[XMP-dc]        Description                     : Doe, John and Doe, Jane in the 1976 concert.
[XMP-dc]        Subject                         : Keyword 10//Keyword 100//Keyword 101//Keyword 11//Keyword 7//Keyword 8//keyword 8//Keyword 9//Keyword 98//Keyword 99
[XMP-dc]        Title                           : Marley, Bob
[XMP-photoshop] CaptionWriter                   : Harvey, Phil
[XMP-xmp]       Rating                          : 5

- Matti
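
Editor's note: the ordering above follows from comparing the keywords as text, one character at a time, so "Keyword 10" sorts before "Keyword 9" because "1" precedes "9". Here is a short Python sketch of the effect. `trailing_number` is a hypothetical helper that only handles a single number at the end of the keyword, so it is an illustration rather than a general solution.

```python
keywords = ["Keyword 9", "Keyword 10", "Keyword 100", "Keyword 8"]

# Text comparison: digits are compared as characters, left to right.
as_text = sorted(keywords, key=str.lower)


def trailing_number(keyword):
    """Return the number after the last space (assumes there is one)."""
    return int(keyword.rsplit(" ", 1)[1])


# Numeric comparison of the trailing number only.
as_numbers = sorted(keywords, key=trailing_number)

print(as_text)
print(as_numbers)
```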

StarGeek

Quote from: wywh on January 11, 2024, 09:47:32 AM
A remaining issue: is there an option to sort numbers as 8, 9, 10, 11, 98, 99, 100, 101 instead of 10, 100, 101, 11, 7, 8, 9, 98, 99 etc:

Not easily as far as I can tell.  I found a routine that would sort them but it doesn't appear to be case-insensitive, even though it's supposed to be.

C:\>exiftool -P -overwrite_original -all= -subject="Test 3" -Subject="test 2" -subject="Test 1" -Subject="tESt 011"  y:\!temp\Test4.jpg
    1 image files updated

C:\>exiftool -G1 -a -s -subject y:\!temp\Test4.jpg
[XMP-dc]        Subject                         : Test 3, test 2, Test 1, tESt 011

C:\>exiftool -P -overwrite_original -sep "//" "-subject<${subject;$_=join '//', sort{natcomp} split '//'}" y:\!temp\Test4.jpg
    1 image files updated

C:\>exiftool -G1 -a -s -subject y:\!temp\Test4.jpg
[XMP-dc]        Subject                         : Test 1, Test 3, tESt 011, test 2

The natcomp helper function can be added to your .ExifTool_config file with this code, before any %Image::ExifTool block.  The source is linked in the code comment.

# Natural Sorting
# https://www.perlmonks.org/?node_id=540890
use List::Util qw(min);
sub natcomp {
    my @a = split /(\d+)/, $a;
    my @b = split /(\d+)/, $b;
    my $last = min(scalar @a, scalar @b)-1;
    my $cmp;
    for my $i (0 .. $last) {
        unless($i & 1) {  # even
            $cmp = lc $a[$i] cmp lc $b[$i] || $a[$i] cmp $b[$i] and return $cmp;
        }else {  # odd
            $cmp = $a[$i] <=> $b[$i] and return $cmp;
        }
    }
    return scalar @a <=> scalar @b;  # shortest array comes first
}

I like this one because it can sort a string with multiple numbers.  In the example on that page, it will properly sort "Chapter 1 Section 10" and "Chapter 1 Section 3".  Just have to get the case insensitive part working.

C:\>exiftool -P -overwrite_original -all= -subject="Chapter 1 Section 10" -subject="Test 3" -Subject="test 2" -subject="Test 1" -subject="Chapter 1 Section 3" -Subject="tESt 011"  y:\!temp\Test4.jpg
    1 image files updated

C:\>exiftool -P -overwrite_original -sep "//" "-subject<${subject;$_=join '//', sort{natcomp} split '//'}" y:\!temp\Test4.jpg
    1 image files updated

C:\>exiftool -G1 -a -s -subject y:\!temp\Test4.jpg
[XMP-dc]        Subject                         : Chapter 1 Section 3, Chapter 1 Section 10, Test 1, Test 3, tESt 011, test 2

StarGeek

Hmmm...  It looks like the or || was making it case sensitive.

Using this code for the helper function seems to work:
# Natural Sorting
# https://www.perlmonks.org/?node_id=540890
use List::Util qw(min);
sub natcomp {
    my @a = split /(\d+)/, $a;
    my @b = split /(\d+)/, $b;
    my $last = min(scalar @a, scalar @b)-1;
    my $cmp;
    for my $i (0 .. $last) {
        unless($i & 1) {  # even
            $cmp = lc $a[$i] cmp lc $b[$i] and return $cmp;
        }else {  # odd
            $cmp = $a[$i] <=> $b[$i] and return $cmp;
        }
    }
    return scalar @a <=> scalar @b;  # shortest array comes first
}

C:\>exiftool -P -overwrite_original -all= -subject="Chapter 1 Section 10" -subject="Test 3" -Subject="test 2" -subject="Test 1" -subject="Chapter 1 section 3" -Subject="tESt 011"  y:\!temp\Test4.jpg
    1 image files updated

C:\>exiftool -G1 -a -s -subject y:\!temp\Test4.jpg
[XMP-dc]        Subject                         : Chapter 1 Section 10, Test 3, test 2, Test 1, Chapter 1 section 3, tESt 011

C:\>exiftool -P -overwrite_original -sep "//" "-subject<${subject;$_=join '//', sort{natcomp} split '//'}" y:\!temp\Test4.jpg
    1 image files updated

C:\>exiftool -G1 -a -s -subject y:\!temp\Test4.jpg
[XMP-dc]        Subject                         : Chapter 1 section 3, Chapter 1 Section 10, Test 1, test 2, Test 3, tESt 011

Yeah, I'm adding this to my collection of helper functions.  I like this a lot.
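
Editor's note: for readers who want to experiment with the comparison outside ExifTool, here is a rough Python port of the natcomp logic. It is a sketch, not the code ExifTool runs. The `tie_break_per_chunk` flag is a hypothetical switch that mimics the first posted version. As I read that version, the `||` fallback settles a case-only tie on a text chunk immediately, before the following number is ever compared, which would explain the case-sensitive result.

```python
import re
from functools import cmp_to_key


def _cmp(x, y):
    """Three-way compare, like Perl's cmp and <=>."""
    return (x > y) - (x < y)


def _chunks(text):
    """Split into alternating text (even index) and digit (odd index) parts."""
    parts = re.split(r"(\d+)", text)
    # Perl's split drops trailing empty fields; Python keeps them.
    while parts and parts[-1] == "":
        parts.pop()
    return parts


def natcomp(a, b, tie_break_per_chunk=False):
    xs, ys = _chunks(a), _chunks(b)
    for i, (x, y) in enumerate(zip(xs, ys)):
        if i % 2 == 0:  # text chunk: compare case-insensitively
            result = _cmp(x.lower(), y.lower())
            if result == 0 and tie_break_per_chunk:
                result = _cmp(x, y)  # behaviour of the first posted version
        else:  # digit chunk: compare as numbers
            result = _cmp(int(x), int(y))
        if result:
            return result
    return _cmp(len(xs), len(ys))  # shorter list of chunks comes first


data = ["Chapter 1 Section 10", "Test 3", "test 2", "Test 1",
        "Chapter 1 section 3", "tESt 011"]
fixed = sorted(data, key=cmp_to_key(natcomp))

earlier = ["Test 3", "test 2", "Test 1", "tESt 011"]
first_version = sorted(
    earlier, key=cmp_to_key(lambda a, b: natcomp(a, b, tie_break_per_chunk=True))
)

print(fixed)
print(first_version)
```

With the flag off the result matches the corrected output above; with it on, the order matches the output of the first version.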

wywh

Thanks for the effort!

I have not used config files before, so I first stubbornly tried the command without one and of course it did not work. I guess it is not possible to include that code in the command itself? So this did not work yet:

exiftool -XMP-dc:Title='Marley, Bob' -XMP-photoshop:CaptionWriter='Harvey, Phil' -XMP-dc:Description='Doe, John and Doe, Jane in the 1976 concert.' -XMP-xmp:Rating=5 -tagsFromFile @ -XMP-dc:Subject -XMP-dc:Subject='Keyword 101;Keyword 100' -api NoDups=1 -execute '-Subject<${Subject;$_=join ";", sort{natcomp} split ";"}' -common_args -m -P -overwrite_original_in_place -sep ';' image.jpg
    1 image files updated
Warning: Bareword "natcomp" not allowed while "strict subs" in use for 'Subject' - image.jpg
    1 image files updated
   
exiftool -a -G1 -s -sep '//' -XMP:All image.jpg
[XMP-x]         XMPToolkit                      : Image::ExifTool 12.71
[XMP-dc]        Description                     : Doe, John and Doe, Jane in the 1976 concert.
[XMP-dc]        Subject                         : Keyword 10//Keyword 11//Keyword 7//Keyword 8//keyword 8//Keyword 9//Keyword 98//Keyword 99//Keyword 101//Keyword 100
[XMP-dc]        Title                           : Marley, Bob
[XMP-photoshop] CaptionWriter                   : Harvey, Phil
[XMP-xmp]       Rating                          : 5

I then downloaded ExifTool_config, renamed it to .ExifTool_config in my home folder, and used BBEdit to paste the '# Natural Sorting ...' block you posted below the last %Image::ExifTool block (after a line ending with ');').

There are some differences in these files -- I picked the first one:

https://raw.githubusercontent.com/exiftool/exiftool/master/config_files/example.config
https://exiftool.org/config.html

Then this works OK:

exiftool -XMP-dc:Title='Marley, Bob' -XMP-photoshop:CaptionWriter='Harvey, Phil' -XMP-dc:Description='Doe, John and Doe, Jane in the 1976 concert.' -XMP-xmp:Rating=5 -tagsFromFile @ -XMP-dc:Subject -XMP-dc:Subject='Keyword 101;Keyword 100' -api NoDups=1 -execute '-Subject<${Subject;$_=join ";", sort{natcomp} split ";"}' -common_args -m -P -overwrite_original_in_place -sep ';' image.jpg
    1 image files updated
    1 image files updated

exiftool -a -G1 -s -sep '//' -XMP:All image.jpg
[XMP-x]         XMPToolkit                      : Image::ExifTool 12.71
[XMP-dc]        Description                     : Doe, John and Doe, Jane in the 1976 concert.
[XMP-dc]        Subject                         : Keyword 7//Keyword 8//keyword 8//Keyword 9//Keyword 10//Keyword 11//Keyword 98//Keyword 99//Keyword 100//Keyword 101
[XMP-dc]        Title                           : Marley, Bob
[XMP-photoshop] CaptionWriter                   : Harvey, Phil
[XMP-xmp]       Rating                          : 5

- Matti

StarGeek

If you want an absolutely minimal config file, just save the code block as .ExifTool_config without anything else.

It is probably possible to put that code in-line, but that would be terribly messy. Using it as a helper function allows it to be used with any List-Type tag.

Phil Harvey

That split /(\d+)/ is very smart.  I didn't realize you could capture the matching expressions in a split argument like this.

- Phil
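
Editor's note: the capturing behaviour of split is not specific to Perl. As a small illustration, Python's `re.split` does the same thing when the pattern contains a capture group. One difference worth knowing is that Python keeps a trailing empty string, whereas Perl's split drops trailing empty fields by default.

```python
import re

title = "Chapter 1 Section 10"

# Without a capture group the digits are consumed as separators.
without_group = re.split(r"\d+", title)

# With a capture group the matched digits are returned between the text parts.
with_group = re.split(r"(\d+)", title)

print(without_group)
print(with_group)
```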

wywh

Quote from: StarGeek on January 11, 2024, 01:04:38 PM
bare minimal config file, then just save the code block as .ExifTool_config without anything else

Thanks for the info. If I use just that block in the bare minimal ~/.ExifTool_config, then I get the alert below. Anyways, the command works as intended.

If I paste the block before or after any %Image::ExifTool block in a larger downloaded example.config, then there is no alert.

Non-ASCII characters such as umlauts do not seem to be sorted case-insensitively.

BTW, to clear old keywords, delete the '-tagsFromFile @ -XMP-dc:Subject' part from the command, then start filling the keyword list again with the command below.

exiftool -XMP-dc:Title='Title' -XMP-photoshop:CaptionWriter='Author' -XMP-dc:Description='Description' -XMP-xmp:Rating=3 -tagsFromFile @ -XMP-dc:Subject -XMP-dc:Subject='Marley, Bob;Harvey, Phil;Ä11;Ä10;Ä9;Ä8;Z8' -api NoDups=1 -execute '-Subject<${Subject;$_=join ";", sort{natcomp} split ";"}' -common_args -m -P -overwrite_original_in_place -sep ';' image.jpg
/Users/matti/.ExifTool_config did not return a true value at /usr/local/bin/lib/Image/ExifTool.pm line 9377.
    1 image files updated
    1 image files updated

exiftool -a -G1 -s -sep '//' -XMP:All image.jpg
/Users/matti/.ExifTool_config did not return a true value at /usr/local/bin/lib/Image/ExifTool.pm line 9377.
[XMP-x]         XMPToolkit                      : Image::ExifTool 12.71
[XMP-dc]        Description                     : Description
[XMP-dc]        Subject                         : //Harvey, Phil//Marley, Bob//z8//Z8//Ä8//Ä9//Ä10//Ä11//ä8
[XMP-dc]        Title                           : Title
[XMP-photoshop] CaptionWriter                   : Author
[XMP-xmp]       Rating                          : 3
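
Editor's note: one possible explanation for the umlaut ordering above, which is an assumption and not confirmed in this thread, is that the comparison runs on undecoded UTF-8 bytes, where lowercasing only affects ASCII letters. The Python sketch below shows how byte-level and text-level lowercasing can produce different orders. It illustrates the general effect and is not a diagnosis of ExifTool itself.

```python
raw = "Ä8".encode("utf-8")

# Byte-level lowercasing leaves the two-byte UTF-8 sequence for "Ä" unchanged.
bytes_unchanged = raw.lower() == raw

# Text-level lowercasing maps "Ä" to "ä".
text_lowered = "Ä8".lower()

keywords = ["ä8", "Ä9"]

# Sorting on lowercased bytes keeps "Ä" and "ä" apart, so "Ä9" comes first.
byte_order = sorted(keywords, key=lambda k: k.encode("utf-8").lower())

# Sorting on lowercased text treats them as the same letter, so "ä8" comes first.
text_order = sorted(keywords, key=str.lower)

print(bytes_unchanged, text_lowered)
print(byte_order)
print(text_order)
```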

- Matti

StarGeek

Quote from: wywh on January 12, 2024, 03:46:45 AM
Thanks for the info. If I use just that block in the bare minimal ~/.ExifTool_config, then I get the alert below.

Oh yeah.  Add 1; (OneSemicolon) as the very last line in the file. It should look like
    return scalar @a <=> scalar @b;  # shortest array comes first
}
1;

When I get a chance, I'm going to try and improve it so the -sep option isn't needed and simplify it so all that is needed is something like
${Subject;natcomp}

wywh

Quote from: StarGeek on January 12, 2024, 11:01:23 AM
Oh yeah.  Add 1;

Oh thanks, that took care of it, little things mean a lot. I guess that is just a closing mark or something like that (1;  #end) that I did not notice.

BTW2, referring to my previous post, another way to clear old keywords in that command would be to use a blank -XMP-dc:Subject='', but that would leave an ugly (although in practice invisible) blank keyword hanging around:

exiftool -XMP-dc:Title='Title' -XMP-photoshop:CaptionWriter='Author' -XMP-dc:Description='Description' -XMP-xmp:Rating=3 -tagsFromFile @ -XMP-dc:Subject -XMP-dc:Subject='Marley, Bob;Harvey, Phil' -api NoDups=1 -execute '-Subject<${Subject;$_=join ";", sort{natcomp} split ";"}' -common_args -m -P -overwrite_original_in_place -sep ';' image.jpg
    1 image files updated
    1 image files updated

[XMP-dc]        Subject                        : //Harvey, Phil//Marley, Bob

instead of just:

[XMP-dc]        Subject                        : Harvey, Phil//Marley, Bob

- Matti


StarGeek

Quote from: wywh on January 12, 2024, 11:54:51 AM
BTW2 to refer my previous post, another way to clear old Keywords in that command would be to use just blank -XMP-dc:Subject='' but that would leave an ugly (although in practice invisible) blank Keyword hanging around:

Using -XMP-dc:Subject='' doesn't clear the Subject, it adds the 0-length keyword. That is why you have it in your output. You would use -XMP-dc:Subject= to clear the tag.

Afterwards, you can use
exiftool -Subject-= /path/to/files/
to clear 0-length entries.

Phil Harvey

-XMP-dc:Subject=''

on the command line is equivalent to:

-XMP-dc:Subject=

because the quotes are stripped by the command shell.

To write an empty subject you need to do this:

-XMP-dc:Subject^=

But I don't know how you got the empty element without trying unless natcomp generated it.

> exiftool a.jpg -subject=x -subject=y
    1 image files updated
> exiftool a.jpg -subject
Subject                         : x, y
> exiftool a.jpg -subject=x -subject="" -subject=y
    1 image files updated
> exiftool a.jpg -subject
Subject                         : y
> exiftool a.jpg -subject=x -subject^= -subject=y
    1 image files updated
> exiftool a.jpg -subject
Subject                         : x, , y

- Phil
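
Editor's note: the quote stripping described above can be observed without running ExifTool. Python's `shlex.split` tokenizes a command line using POSIX-style shell rules, so it serves as a rough stand-in for what a Mac/Linux shell passes to the program. Treating it as equivalent to any particular shell is an assumption, and Windows cmd handles quotes differently.

```python
import shlex

# An empty pair of quotes adds nothing to the argument it is attached to.
with_quotes = shlex.split("exiftool a.jpg -subject=x -subject='' -subject=y")
without_quotes = shlex.split("exiftool a.jpg -subject=x -subject= -subject=y")

print(with_quotes)
print(with_quotes == without_quotes)
```

Both command lines reach the program as the same argument list, so the program cannot tell whether the quotes were typed.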