ExifTool Forum

ExifTool => The "exiftool" Application => Topic started by: Stephen Marsh on February 24, 2020, 06:58:47 AM

Title: Delete Subject/Keywords After N entries
Post by: Stephen Marsh on February 24, 2020, 06:58:47 AM
I have been trying to adapt some of the commands from this topic without any success:

truncate exceeding keywords 
https://exiftool.org/forum/index.php?topic=8674.msg44527#msg44527

The goal is to retain the first N keywords, where N could be 6 or 33 or 50 etc. To keep it easy, let's say that the goal is to retain the first 6 keywords and delete everything else.

[XMP-dc] Subject : 1, 2, 3, 4, 5, six, 7, 8, nine, 10

The result should be:

1, 2, 3, 4, 5, six

I was going to try a regex, but that was my first mistake! It appears that split is the better tool; however, the code is not intuitive to me.

Thank you!
Title: Re: Delete Subject/Keywords After N entries
Post by: greybeard on February 24, 2020, 07:41:32 AM
From the other topic - this should work for your example:

exiftool -sep qqq '-Subject<${subject;my @a=split /qqq/,$val;$#a>5 and $#a=5;my @b=splice(@a,0,8);push @b,splice(@a,rand(@a),1) while @a;$_=join "qqq",@b}' FILE

(You may have to switch single and double quotes depending on your platform)
Title: Re: Delete Subject/Keywords After N entries
Post by: Phil Harvey on February 24, 2020, 07:46:24 AM
I think greybeard's command can be simplified:

exiftool -sep qqq '-Subject<${subject;my @a=split /qqq/,$val;$#a>5 and $#a=5;return join "qqq",@a' FILE

- Phil
Title: Re: Delete Subject/Keywords After N entries
Post by: greybeard on February 24, 2020, 08:01:23 AM
Hmm - that didn't work for me

I assume you missed the closing curly bracket so I tried

exiftool -sep qqq '-Subject<${subject;my @a=split /qqq/,$val;$#a>5 and $#a=5;return join "qqq",@a}'  FILE

but that didn't truncate the Subject
Title: Re: Delete Subject/Keywords After N entries
Post by: Phil Harvey on February 24, 2020, 08:32:45 AM
You're right, sorry.  Two mistakes in my command.  This one should work:

exiftool -sep qqq '-Subject<${subject;my @a=split /qqq/,$val;$#a>5 and $#a=5;$_ = join "qqq",@a}' FILE

- Phil
Title: Re: Delete Subject/Keywords After N entries
Post by: greybeard on February 24, 2020, 09:00:50 AM
Thanks - I continue to learn.

Just to point out to the OP that this won't work properly if any of the keywords that should be retained end in a "q".

That may have been one of the problems in attempting to adapt the command from the original topic.
Title: Re: Delete Subject/Keywords After N entries
Post by: Phil Harvey on February 24, 2020, 09:09:20 AM
Right.  I didn't think about what happens if a keyword ends in 'q'  (i.e. "wordq, word2" would join to "wordqqqqword2", then be split back to "word, qword2").  You could replace "qqq" with something else (in 3 places in the command) to avoid this.  Usually I use something like "##".  But if you are going to use letters, 3 different uncommon letters would be better, something like "qxw" for example.

- Phil
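
The collision Phil describes can be reproduced outside ExifTool. The following is only a sketch in Python (not the Perl that ExifTool actually runs), and round_trip is a hypothetical helper name. It joins a keyword list with a separator and splits it again, which is roughly what happens to the value inside the command:

```python
def round_trip(keywords, sep):
    """Join a keyword list with sep, then split the result on sep again."""
    joined = sep.join(keywords)
    return joined, joined.split(sep)


keywords = ["wordq", "word2"]

# "qqq" collides with the trailing "q" of the first keyword.
joined_q, back_q = round_trip(keywords, "qqq")
print(joined_q, back_q)

# "##" does not occur in or around any keyword, so the list survives.
joined_hash, back_hash = round_trip(keywords, "##")
print(joined_hash, back_hash)
```

With "qqq" the split lands one character early and the first keyword loses its final "q". With "##" the original two keywords come back unchanged.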
Title: Re: Delete Subject/Keywords After N entries
Post by: Stephen Marsh on February 24, 2020, 07:47:25 PM
Thank you Phil and greybeard...

I would have thought that I would need to have enclosed the first sep in quotes, however I didn't need to... Testing it appeared to work both ways. I like the quotes for clarity though.

As the input delimiter is verified to be a comma, I don't understand why the suggestion is to use another delimiter such as qqq or ### etc? I tested using the comma separator and all apparently worked as intended.

Rather than combining the two separate commands for Subject and Keywords, I tested with the MWG option and it also appeared to work as intended:

exiftool -sep ',' '-MWG:Keywords<${MWG:Keywords;my @a=split /,/,$val;$#a>7 and $#a=7;$_ = join ",",@a}' FILE

It appears that if one wishes to retain the first 8 keywords, then one has to use 7 as the value for the split, so always enter in a value of one less than what is required appears to be how the split works?

Thank you for your time!
Title: Re: Delete Subject/Keywords After N entries
Post by: StarGeek on February 24, 2020, 09:11:45 PM
Quote from: Stephen Marsh on February 24, 2020, 07:47:25 PM
I would have thought that I would need to have enclosed the first sep in quotes, however I didn't need to...

It depends upon what you use.  Some characters will have special meaning in the various types of command line and would act oddly without quotes.

Quote:
As the input delimiter is verified to be a comma, I don't understand why the suggestion is to use another delimiter such as qqq or ### etc?

Tags such as Keywords/Subject are not delimited by commas. They are lists.  Each entry is completely separate from the others.  Exiftool, as well as many other programs, will display them as comma separated to make it easier to understand.

One reason not to use a comma would be if a comma was included in one of the keywords.  For example, if the keywords included names of people in the image in the LastName, FirstName format, using -sep "," would split the single keyword, say "Smith, John", into two separate keywords, "Smith" and "John".  The advantage of using an extremely unlikely sequence of characters means you don't have to worry about what data is embedded in the file.

Basically, using a comma is fine, as long as you are sure that there are no keywords that would be affected by using it.

Quote:
It appears that if one wishes to retain the first 8 keywords, then one has to use 7 as the value for the split, so always enter in a value of one less than what is required appears to be how the split works?

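What split is doing is creating an array.  In most programming languages, array indexes start at 0.  The $#a in the command is the index of the last element of that array, so setting it to 7 keeps indexes 0-7, which are the first 8 keywords.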
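
The "one less" rule can be sketched in Python, used here only as a stand-in for the Perl in the command. keep_through_index is a hypothetical helper that mimics what assigning to $#a does, namely keeping everything up to and including a given last index:

```python
def keep_through_index(items, last_index):
    """Keep items[0] through items[last_index], dropping the rest.

    A list shorter than last_index + 1 is returned unchanged, which matches
    the "$#a>5 and" guard in the command.
    """
    return items[: last_index + 1]


keywords = ["1", "2", "3", "4", "5", "six", "7", "8", "nine", "10"]

first_six = keep_through_index(keywords, 5)    # last index 5 -> 6 items
first_eight = keep_through_index(keywords, 7)  # last index 7 -> 8 items

print(len(first_six), first_six)
print(len(first_eight), first_eight)
```

So to keep the first N keywords, the number in the command is N-1 because it is a last index, not a count.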
Title: Re: Delete Subject/Keywords After N entries
Post by: Stephen Marsh on February 24, 2020, 09:51:18 PM
Thank you for the reply StarGeek, it all makes sense!

I was under the impression that we had to check for the delimiter in use, however I can now see that in that other topic Phil was just trying to verify the data first.

<dc:subject>
   <rdf:Bag>
      <rdf:li>1</rdf:li>
      <rdf:li>2</rdf:li>
      <rdf:li>3</rdf:li>
      <rdf:li>4</rdf:li>
      <rdf:li>5</rdf:li>
      <rdf:li>six</rdf:li>
      <rdf:li>7</rdf:li>
      <rdf:li>eight</rdf:li>
   </rdf:Bag>
</dc:subject>

Which ExifTool returns as:

[XMP-dc] Subject : 1, 2, 3, 4, 5, six, 7, eight

Which is just a "visual" - got it!

It now makes sense why any delimiter can be used in the code, as it is not "matching up" with anything in the file data.
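
To illustrate that last point, here is a rough Python model of the whole command as it is described in this thread. It is an assumption-laden sketch, not ExifTool's implementation, and truncate_keywords is a hypothetical name. The stored list is joined with the separator, split, truncated, re-joined, and finally split back into separate list items when written:

```python
def truncate_keywords(keywords, keep, sep):
    """Model the join / split / truncate / join / split-on-write sequence."""
    joined = sep.join(keywords)      # the joined string the expression receives
    parts = joined.split(sep)        # my @a = split /sep/, $val
    if parts != list(keywords):
        # The separator collided with keyword text (e.g. "wordq" with "qqq").
        raise ValueError(f"separator {sep!r} is not safe for these keywords")
    parts = parts[:keep]             # $#a > keep-1 and $#a = keep-1
    rejoined = sep.join(parts)       # $_ = join "sep", @a
    return rejoined.split(sep)       # -sep splits the value again on write


keywords = ["1", "2", "3", "4", "5", "six", "7", "eight"]

# Any separator gives the same result, provided it does not clash with the data.
results = [truncate_keywords(keywords, 6, sep) for sep in ("qqq", "##", ",")]
print(results[0])
```

The separator only exists for the lifetime of the command, which is why the choice does not matter as long as it cannot be confused with the keyword text.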