Delete Subject/Keywords After N entries

Started by Stephen Marsh, February 24, 2020, 06:58:47 AM

Stephen Marsh

I have been trying to adapt some of the commands from this topic without any success:

truncate exceeding keywords 
https://exiftool.org/forum/index.php?topic=8674.msg44527#msg44527

The goal is to retain the first N keywords, where N could be 6 or 33 or 50 etc. To keep it easy, let's say that the goal is to retain the first 6 keywords and delete everything else.

[XMP-dc] Subject : 1, 2, 3, 4, 5, six, 7, 8, nine, 10

The result should be:

1, 2, 3, 4, 5, six

I was going to try to use a regex, but that was my first mistake! It appears that the split command is better; however, the code is not intuitive to me.

Thank you!

greybeard

From the other topic - this should work for your example:

exiftool -sep qqq '-Subject<${subject;my @a=split /qqq/,$val;$#a>5 and $#a=5;my @b=splice(@a,0,8);push @b,splice(@a,rand(@a),1) while @a;$_=join "qqq",@b}' FILE

(You may have to switch single and double quotes depending on your platform)

Phil Harvey

I think greybeard's command can be simplified:

exiftool -sep qqq '-Subject<${subject;my @a=split /qqq/,$val;$#a>5 and $#a=5;return join "qqq",@a' FILE

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

greybeard

Hmm - that didn't work for me

I assume you missed the closing curly bracket so I tried

exiftool -sep qqq '-Subject<${subject;my @a=split /qqq/,$val;$#a>5 and $#a=5;return join "qqq",@a}'  FILE

but that didn't truncate the Subject

Phil Harvey

You're right, sorry.  Two mistakes in my command.  This one should work:

exiftool -sep qqq '-Subject<${subject;my @a=split /qqq/,$val;$#a>5 and $#a=5;$_ = join "qqq",@a}' FILE
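
For anyone who finds the Perl hard to read, here is a rough Python sketch of the same steps. It is only an illustration of the logic, not ExifTool code: `truncate_joined` and its parameters are made-up names, and it assumes that with -sep the list is handed to the expression joined by the separator string and split on the same string again when written.

```python
def truncate_joined(val, sep="qqq", last_index=5):
    """Mimic the expression applied to the joined Subject string."""
    items = val.split(sep)            # my @a = split /qqq/, $val
    if len(items) - 1 > last_index:   # $#a > 5 (last index is length - 1)
        del items[last_index + 1:]    # $#a = 5 drops everything after index 5
    return sep.join(items)            # $_ = join "qqq", @a


subject = ["1", "2", "3", "4", "5", "six", "7", "8", "nine", "10"]
joined = "qqq".join(subject)          # the list joined with the -sep string
result = truncate_joined(joined)
print(result.split("qqq"))            # ['1', '2', '3', '4', '5', 'six']
```

The key point is that the last statement assigns the joined string back to `$_`, which is the value that gets written.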

- Phil

greybeard

Thanks - I continue to learn.

Just to point out to the OP that this won't work properly if any of the keywords that should be retained end in a "q".

That may have been one of the problems in attempting to adapt the command from the original topic.

Phil Harvey

Right.  I didn't think about what happens if a keyword ends in 'q' (i.e. "wordq, word2" would join to "wordqqqqword2" and then be split back to "word, qword2").  You could replace "qqq" with something else (in 3 places in the command) to avoid this.  Usually I use something like "##".  But if you are going to use letters, 3 different uncommon letters would be better, something like "qxw" for example.
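
The boundary shift is easy to reproduce outside ExifTool. A minimal Python sketch follows; `round_trip` is a hypothetical helper that only imitates the join-then-split behaviour described above.

```python
def round_trip(keywords, sep):
    """Join a keyword list with sep, then split the result on sep again."""
    return sep.join(keywords).split(sep)


keywords = ["wordq", "word2"]
print(round_trip(keywords, "qqq"))  # ['word', 'qword2']  (boundary shifted)
print(round_trip(keywords, "##"))   # ['wordq', 'word2']  (list survives)
```

A separator is safe for a given set of keywords when the round trip returns the original list unchanged.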

- Phil

Stephen Marsh

Thank you Phil and greybeard...

I would have thought that I would need to have enclosed the first sep in quotes, however I didn't need to... Testing it appeared to work both ways. I like the quotes for clarity though.

As the input delimiter is verified to be a comma, I don't understand why the suggestion is to use another delimiter such as qqq or ### etc? I tested using the comma separator and all apparently worked as intended.

Rather than combining the two separate commands for Subject and Keywords, I tested with the MWG option and it also appeared to work as intended:

exiftool -sep ',' '-MWG:Keywords<${MWG:Keywords;my @a=split /,/,$val;$#a>7 and $#a=7;$_ = join ",",@a}' FILE

It appears that if one wishes to retain the first 8 keywords, then one has to use 7 as the value for the split, so always enter in a value of one less than what is required appears to be how the split works?

Thank you for your time!

StarGeek

Quote from: Stephen Marsh on February 24, 2020, 07:47:25 PM
I would have thought that I would need to have enclosed the first sep in quotes, however I didn't need to...

It depends upon what you use.  Some characters will have special meaning in the various types of command line and would act oddly without quotes.

Quote:
As the input delimiter is verified to be a comma, I don't understand why the suggestion is to use another delimiter such as qqq or ### etc?

Tags such as Keywords/Subject are not delimited by commas. They are lists.  Each entry is completely separate from the others.  Exiftool, as well as many other programs, will display them as comma separated to make it easier to understand.

One reason not to use a comma would be if a comma was included in one of the keywords.  For example, if the keywords included names of people in the image in the LastName, FirstName format, using -sep "," would split the single keyword, say "Smith, John", into two separate keywords, "Smith" and "John".  The advantage of using an extremely unlikely sequence of characters means you don't have to worry about what data is embedded in the file.

Basically, using a comma is fine, as long as you are sure that there are no keywords that would be affected by using it.

Quote:
It appears that if one wishes to retain the first 8 keywords, then one has to use 7 as the value for the split, so always enter in a value of one less than what is required appears to be how the split works?

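What split is doing is creating an array.  In most programming languages, arrays start at 0, so indexes 0-7 cover the first 8 keywords.  In the command, $#a is the index of the last element of @a, which is why setting $#a=7 keeps 8 items.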
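
The same count-versus-index relationship, sketched in Python for illustration. `keep_first` is a made-up helper, not part of ExifTool; the number it computes as `last_index` corresponds to the value used in the command.

```python
def keep_first(keywords, count):
    """Keep the first `count` items; the last index kept is count - 1."""
    last_index = count - 1             # the value used for $#a in the command
    return keywords[: last_index + 1]  # indexes 0 through last_index


keywords = ["1", "2", "3", "4", "5", "six", "7", "eight", "nine", "10"]
kept = keep_first(keywords, 8)
print(len(kept), kept[0], kept[-1])    # 8 1 eight
```

So to keep the first N keywords, use N-1 in both places in the command.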

* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

Stephen Marsh

Thank you for the reply StarGeek, it all makes sense!

I was under the impression that we had to check for the delimiter in use, however I can now see that in that other topic Phil was just trying to verify the data first.

<dc:subject>
   <rdf:Bag>
      <rdf:li>1</rdf:li>
      <rdf:li>2</rdf:li>
      <rdf:li>3</rdf:li>
      <rdf:li>4</rdf:li>
      <rdf:li>5</rdf:li>
      <rdf:li>six</rdf:li>
      <rdf:li>7</rdf:li>
      <rdf:li>eight</rdf:li>
   </rdf:Bag>
</dc:subject>

Which ExifTool returns as:

[XMP-dc] Subject : 1, 2, 3, 4, 5, six, 7, eight

Which is just a "visual" - got it!

This now makes sense why any delimiter can be used in the code, as it is not "matching up" with anything in the file data.