Sort comma separated Keywords, default separator format

Started by wywh, May 14, 2023, 06:28:34 AM


wywh

- GraphicConverter 12 can sort image and movie keywords on the fly in its IPTC/XMP dialog, and it also warns about duplicate keywords. There is a minor cosmetic difference in the keyword separator format between images and movies; see (*) below.

But sometimes I want to sort keywords in BBEdit, or before pasting them into an exiftool command, because typos and duplicates are easier to spot in a sorted list.

BBEdit has Sort Lines built in, but it does not seem to sort the items within a comma-separated (.csv) row, so that has to be added, for example as a Perl "Text Filter".

I found the Perl script below, which seems to work quite well in BBEdit on a single row.

-> Do perl wizards have any comments or improvements for this task?

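#!/usr/bin/perl
use strict;
use warnings;

# Sort the comma-separated items of each input line.
# Exact duplicates are merged (they become one hash key), and leading
# digits are ignored when comparing (e.g. "2Kino" sorts as "Kino").
while ( my $line = <> ) {    # unlike chomp() in the test, also handles a last line without "\n"
    chomp $line;
    my %words;               # reset for every line so rows don't mix
    for my $word ( split /, */, $line ) {
        ( $words{$word} = $word ) =~ s/^\d*//;    # value = sort key
    }
    print join( ', ', sort { $words{$a} cmp $words{$b} } keys %words ), "\n";
}

__END__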

Save that text as a .pl file (attached) at ~/Library/Application Support/BBEdit/Text Filters/sort_row_comma.pl
Use it via BBEdit > Text > Apply Text Filter > sort_row_comma
Origin: Matt Martini https://groups.google.com/g/bbedit/c/WMXg19OzpY0/m/vB-1vbiCVmIJ
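Since the goal is spotting typos and duplicates, note that the filter above stores the keywords as hash keys, so exact duplicates are silently merged, and cmp sorts all uppercase letters before lowercase ones. A possible variant, sketched below, keeps duplicates in the row, lists them separately, and sorts case-insensitively. The helper name sort_row is made up for illustration, the original's leading-digit handling is deliberately left out, and case folding only covers ASCII letters because the input is not decoded from UTF-8.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the filter above): sort one comma-separated
# row case-insensitively and return the sorted row plus the keywords that occur
# more than once. Duplicates are kept in the output instead of being merged.
sub sort_row {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;                              # trim the whole row
    my @words = grep { length } split /\s*,\s*/, $line;   # drop empty items
    my %count;
    $count{ lc $_ }++ for @words;                         # ASCII-only case folding
    # Case-insensitive order; ties broken by plain cmp so output is deterministic.
    my @sorted = sort { lc($a) cmp lc($b) || $a cmp $b } @words;
    my @dups   = grep { $count{$_} > 1 } sort keys %count;
    return ( join( ', ', @sorted ), \@dups );
}

my ( $row, $dups ) = sort_row('Uppa, isla,Kino, Isla, Uppa');
print "$row\n";                                               # Isla, isla, Kino, Uppa, Uppa
print 'Duplicates: ', join( ', ', @$dups ), "\n" if @$dups;   # Duplicates: isla, uppa
```

For use as a BBEdit text filter, sort_row could be called from a while (my $line = <>) loop that prints the sorted row; where to put the duplicate report is left open.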

- (*) In GC, image IPTC/XMP Keywords are separated by a comma and a space (", "), but movie Keys:Keywords are separated by a comma only (",").

exiftool -a -G1 -s .
[Keys]          Keywords                        : Isla,Kino,Näsi,Uppa
[IPTC]          Keywords                        : Isla, Kino, Näsi, Uppa
[XMP-dc]        Subject                         : Isla, Kino, Näsi, Uppa

exiftool readily inserts either format. I guess ", " is the default?

I asked the GC author whether they could be written in the same ", " format for easier reading. AFAIK GC uses exiftool for movie metadata because Apple's tools had some issues. He replied that the space is added by the exiftool iteration of the single keywords so he has no control over that. This is not a big issue, but I might pester him about it later in case there was some misunderstanding in our communication.

regards,

- Matti

StarGeek

Quote from: wywh on May 14, 2023, 06:28:34 AM
-> Do perl wizards have any comments or improvements for this task?

Phil would know better, but that seems nice and compact to me. But I'm not the expert.

Quote
- (*) In GC in images there is a space between IPTC/XMP Keywords ", ". But in movie Keys there is only a comma ",".
...
exiftool readily inserts either format. I guess ", " is the default?

From the docs on the -sep option:
    When reading, the default is to join list items with ", "

Because IPTC:Keywords and XMP:Subject are list type tags (see FAQ #17), they will display with ", " as the separator and that can be changed with the -sep option.

But Keys:Keywords is not a list type tag. It is a simple string. So there isn't a separator except what various programs decide to use as one. A comma is probably the de facto standard, but you can't necessarily rule out something else. It is completely up to the program that reads/writes the data what to use.
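To make the difference concrete, here is a toy Perl model (an illustration only, not how ExifTool is implemented): a list-type tag holds separate items that are joined with a separator only when displayed, defaulting to ", " as in the -sep documentation quoted above, while a string-type tag is one value whose commas are ordinary characters. The tag values mirror the exiftool output earlier in the thread; the display helper is hypothetical.

```perl
use strict;
use warnings;

# Toy model, not ExifTool code: a list-type tag (IPTC:Keywords, XMP:Subject)
# stores separate items; a string-type tag (Keys:Keywords) stores one string.
my %tags = (
    'IPTC:Keywords' => [ 'Isla', 'Kino', 'Näsi', 'Uppa' ],   # four items
    'Keys:Keywords' => 'Isla,Kino,Näsi,Uppa',                # one string
);

# Display a tag value: list items are joined with $sep (", " mirrors the
# documented default for reading); a plain string is returned unchanged.
sub display {
    my ( $value, $sep ) = @_;
    $sep //= ', ';
    return ref $value eq 'ARRAY' ? join( $sep, @$value ) : $value;
}

for my $tag ( sort keys %tags ) {
    print "$tag (default): ",  display( $tags{$tag} ),      "\n";
    print "$tag (-sep ','): ", display( $tags{$tag}, ',' ), "\n";
}
```

Running it prints the IPTC items as "Isla, Kino, Näsi, Uppa" by default and "Isla,Kino,Näsi,Uppa" with ',', while the Keys string comes out identical both times, because there is nothing to join.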

Quote
He replied that the space is added by the exiftool iteration of the single keywords so he has no control over that

To clarify further, there is no actual space in the data. Entries in list-type tags are completely separate. See this post, which shows the raw XMP of the XMP:Subject tag.

ETA: You could always use -sep , to make exiftool show the list-type tags the same way Keys:Keywords is listed, and adjust your script to use that instead.

ETA2: Actually, it looks like your script already accounts for either one. You're splitting on ', *', which matches one comma followed by zero or more spaces.
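A quick check of that point, as a small sketch: the filter's pattern (', *', written /, */ in Perl) yields the same items for both output styles. The split_keywords name is just for illustration. One case the pattern does not cover is whitespace before a comma, which stays attached to the preceding item.

```perl
use strict;
use warnings;

# The split used by the filter: one comma followed by zero or more spaces.
sub split_keywords { return split /, */, $_[0] }

my @default_sep = split_keywords('Isla, Kino, Näsi, Uppa');   # exiftool's default ", "
my @comma_only  = split_keywords('Isla,Kino,Näsi,Uppa');      # -sep , or Keys:Keywords

print "same items\n"
    if join( "\n", @default_sep ) eq join( "\n", @comma_only );

# Caveat: spaces BEFORE a comma are not consumed.
print join( '|', split_keywords('Isla ,Kino') ), "\n";         # Isla |Kino
```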
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

wywh

Quote from: StarGeek on May 14, 2023, 11:27:12 AM
IPTC:Keywords and XMP:Subject are list type tags (see FAQ #17), they will display with ", " as the separator and that can be changed with the -sep option.

But Keys:Keywords is not a list type tag.  It is a simple string. So there isn't a separator

Thanks for the explanation!

So to make the list-type tag output match the string tag, this lines the columns up:

exiftool -a -G1 -s -Keywords -Subject -sep ',' .
[Keys]          Keywords                        : Isla,Kino,Näsi,Uppa
[IPTC]          Keywords                        : Isla,Kino,Näsi,Uppa
[XMP-dc]        Subject                         : Isla,Kino,Näsi,Uppa
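If the goal is to confirm that Keys, IPTC and XMP carry the same keywords, a script can do the comparison instead of the eye. A minimal sketch, assuming the input is the -a -G1 -s output for a single file (with several files, the ======== headers would need handling): each keyword field is split on commas with optional spaces, sorted and re-joined, so ", " and "," output compare equal. The keyword_sets helper is hypothetical, and the sample XMP line is deliberately edited to drop one keyword so the mismatch branch shows.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "exiftool -a -G1 -s" lines for ONE file and
# map "Group:Tag" to its keywords, sorted and joined with a bare comma,
# so ", " and "," output compare equal. Non-matching lines are skipped.
sub keyword_sets {
    my %set;
    for my $line (@_) {
        my ( $group, $tag, $value ) =
            $line =~ /^\[([^\]]+)\]\s+(\S+)\s+: (.*)$/ or next;
        my @kw = split /\s*,\s*/, $value;
        $set{"$group:$tag"} = join ',', sort @kw;
    }
    return \%set;
}

# Sample input: the lines from the -sep ',' example above, except that the
# XMP line was deliberately edited to drop "Näsi" to show a mismatch.
my $sets = keyword_sets(
    '[Keys]          Keywords                        : Isla,Kino,Näsi,Uppa',
    '[IPTC]          Keywords                        : Isla, Kino, Näsi, Uppa',
    '[XMP-dc]        Subject                         : Isla, Kino, Uppa',
);

my %distinct;
$distinct{$_} = 1 for values %$sets;
my $n_distinct = scalar keys %distinct;
print( $n_distinct == 1 ? "all fields match\n" : "fields differ:\n" );
print "  $_ = $sets->{$_}\n" for sort keys %$sets;
```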

After digging deeper I now see the difference between string and list tags:

exiftool -a -G1 -s -Keywords -Subject -v1 movie.mp4
  | | Keys (SubDirectory) -->
  | | + [Keys directory]
  | | | Added ItemList Tag 1.1 = (mdta) keywords
  | | | Added ItemList Tag 1.2 = (mdta) rating.user
  | | ItemList (SubDirectory) -->
  | | + [ItemList directory]
  | | | Keywords = Isla,Kino,N..si,Uppa
[Keys]          Keywords                        : Isla,Kino,Näsi,Uppa

exiftool -a -G1 -s -Keywords -Subject -v1 image.jpg
  | IPTCData (SubDirectory) -->
  | + [IPTC directory, 52 bytes]
  | | CurrentIPTCDigest = .1...../..y....D
  | | -- IPTCEnvelope record --
  | | CodedCharacterSet = .%G
  | | -- IPTCApplication record --
  | | ApplicationRecordVersion = 2
  | | Keywords = Isla
  | | Keywords = Kino
  | | Keywords = N..si
  | | Keywords = Uppa
JPEG APP1 (2665 bytes):
  + [XMP directory, 2636 bytes]
  | XMPToolkit = XMP Core 6.0.0
  | Subject = Isla
  | Subject = Kino
  | Subject = N..si
  | Subject = Uppa
[IPTC]          Keywords                        : Isla, Kino, Näsi, Uppa
[XMP-dc]        Subject                         : Isla, Kino, Näsi, Uppa

...I am learning by osmosis...

- Matti

wywh

It seems impossible to have a comma inside a string-type keyword, at least as far as the latest GraphicConverter is concerned.

I tried writing keywords that contain commas, using '; ' as the separator:

exiftool -overwrite_original -sep "; " -Keys:Keywords='Smith, John; Doe, Jane; Williams, Mark A., III' -IPTC:Keywords='Smith, John; Doe, Jane; Williams, Mark A., III' .
...so far it seems to work:

exiftool -a -G1 -s -sep '; ' -Keywords -Subject .
======== ./image.jpg
[IPTC]          Keywords                        : Smith, John; Doe, Jane; Williams, Mark A., III
======== ./movie.mp4
[Keys]          Keywords                        : Smith, John; Doe, Jane; Williams, Mark A., III

GraphicConverter correctly displays those three IPTC:Keywords.

But for Keys:Keywords it lists seven keywords (GC displays ; as the separator in its GUI and also shows the number of keywords). I also tried -sep ',, ' and '.' (leaving the '.' off the last keyword), without success: GC did not use them as separators in Keys:Keywords.

So maybe it is best to avoid commas in such movie Keywords.
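The count of seven can be reproduced by splitting the stored string. As the readback shows, -sep "; " left Keys:Keywords as one plain string (the -sep documentation describes splitting on write only for list-type tags), so any division into keywords happens in the reading program. The assumption here, inferred only from the count and not from GraphicConverter documentation, is that GC treats both "," and ";" as separators.

```perl
use strict;
use warnings;

# The Keys:Keywords string exactly as written and read back above.
my $keys_keywords = 'Smith, John; Doe, Jane; Williams, Mark A., III';

# Splitting on "; " only: the three intended names.
my @by_semicolon = split /;\s*/, $keys_keywords;

# Assumption: a reader that treats both "," and ";" as separators
# (which would explain GraphicConverter's count) gets seven pieces.
my @by_both = split /\s*[,;]\s*/, $keys_keywords;

printf "%d names: %s\n",  scalar @by_semicolon, join( ' | ', @by_semicolon );
printf "%d pieces: %s\n", scalar @by_both,      join( ' | ', @by_both );
```

This prints "3 names: Smith, John | Doe, Jane | Williams, Mark A., III" and "7 pieces: Smith | John | Doe | Jane | Williams | Mark A. | III". Splitting on commas alone would give five pieces, not seven.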

- Matti

StarGeek

Quote from: wywh on May 14, 2023, 01:12:58 PM
After digging deeper I now see the difference between string and list tags:

Ahh, using the -v (-verbose) option to show the difference.  I hadn't thought of that before.