Conditional deletion from dc:subject

Started by brunos, April 03, 2020, 12:27:04 PM

brunos

Hi all,

I have the following situation: many JPGs contain, in the Subject field, one or more of the following items in the syntax shown:

aN0M Name Surname String1 String2
aT0F IdString3 IdString4
sT0F IdString5
...

and so on. In XMP it looks like:
         <dc:subject>
            <rdf:Bag>
               <rdf:li>aN0F Ivana Giganti</rdf:li>
               <rdf:li>aN0M Bruno Stivicevic</rdf:li>
               <rdf:li>at0F LAV CN01</rdf:li>
               <rdf:li>ritrattiattivisti</rdf:li>
            </rdf:Bag>
         </dc:subject>

The number of items in <dc:subject> varies from one to ten or more. Some items have fixed prefixes and others do not. The prefixes (aN0M, at0M, etc.), where they exist, vary, but there is a rule: every prefix has 4 chars followed by a space, and the discriminating char is the 2nd one, which can be "N", "n", "T" or "t".

Now, I need to process all JPGs, and the desired outcome is that:
- all Subject items with a prefix whose 2nd char is "N" or "n" and whose 5th char is a space, regardless of the 1st, 3rd and 4th chars, are kept;
- all items without a prefix, i.e. whose 5th char is not a space, are kept;
- all other Subject items are removed.

In the example above, the original 3rd line (at0F LAV CN01) would be removed because its 2nd char is neither "N" nor "n" and its 5th char is a space, while the other lines would be kept. The new XMP would be:
         <dc:subject>
            <rdf:Bag>
               <rdf:li>aN0F Ivana Giganti</rdf:li>
               <rdf:li>aN0M Bruno Stivicevic</rdf:li>
               <rdf:li>ritrattiattivisti</rdf:li>
            </rdf:Bag>
         </dc:subject>
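
As a cross-check of the rule described above, here is a minimal Python sketch that applies it to the four example items. The function name `keep_item` is made up for illustration; it is not part of ExifTool, and it only models the keep/remove decision, not the writing of metadata.

```python
def keep_item(item):
    """Return True if a Subject item should be kept under the stated rule."""
    # An item counts as "prefixed" when its 5th character is a space.
    has_prefix = len(item) >= 5 and item[4] == " "
    if not has_prefix:
        return True  # no 4-char prefix: always kept
    # Prefixed items are kept only when the 2nd character is "N" or "n".
    return item[1] in "nN"


subjects = [
    "aN0F Ivana Giganti",
    "aN0M Bruno Stivicevic",
    "at0F LAV CN01",
    "ritrattiattivisti",
]
kept = [s for s in subjects if keep_item(s)]
print(kept)
# ['aN0F Ivana Giganti', 'aN0M Bruno Stivicevic', 'ritrattiattivisti']
```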

Thanks to your help I have so far learned how to do simple replacements (comma with period, space with underscore) and additions, but this conditional one seems much more complicated to me...  Please help when you can!

Kindest regards
Bruno

Phil Harvey

Hi Bruno,

Maybe something like this:

exiftool "-subject<${subject@;$_=undef if /^\w[^nN]\w\w /}" -sep ", " DIR

This will delete any Subject items that start with a "word" character followed by anything but "n" or "N", followed by 2 more word characters and a space.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).
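
To see which items the pattern in the command above would drop, the same regular expression can be tried on a plain list. This is only a sketch: Python's `re` module stands in for the Perl expression, the helper name `filter_subjects` is hypothetical, and the keyword "luna park" is an invented example. It is included because any item whose first word has exactly four word characters with a 2nd character other than "n"/"N" also matches, which follows from the stated rule but is easy to overlook.

```python
import re

# Same pattern as in the exiftool command, written for Python's re module.
DELETE_RE = re.compile(r"^\w[^nN]\w\w ")


def filter_subjects(items):
    """Return the items that do NOT match the delete pattern."""
    return [item for item in items if not DELETE_RE.match(item)]


subjects = [
    "aN0F Ivana Giganti",
    "aN0M Bruno Stivicevic",
    "at0F LAV CN01",
    "ritrattiattivisti",
    "luna park",  # invented keyword: "luna " also fits the pattern
]
remaining = filter_subjects(subjects)
print(remaining)
# ['aN0F Ivana Giganti', 'aN0M Bruno Stivicevic', 'ritrattiattivisti']
```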

brunos

wow!!! Thanks Phil, I will try it!!!

Kindest regards
Bruno

brunos

The only word that describes ExifTool and its creator is: Miraculous!

Thanks a million,
Bruno

brunos

Hmmm... it worked for about 23 thousand JPGs, but I found some on which it didn't work: the at0F and at0M tags remained. I don't understand why. I have attached one of them. The version is 11.93.

When I check the subject, I get:
C:\Users\Bruno\Desktop\tmpfolder>exiftool -filename -subject img_1980.jpg
File Name                       : img_1980.jpg
Subject                         : at0F Correzzana 2012 17, at0F Correzzana 2012 18

When I perform the removal command, it returns undef:
C:\Users\Bruno\Desktop\tmpfolder>exiftool "-subject<${subject@;$_=undef if /^\w[^nN]\w\w /}" -sep ", " -ext jpg .
Warning: [minor] Advanced formatting expression returned undef for 'subject' - ./IMG_1980.JPG
Warning: No writable tags set from ./IMG_1980.JPG

When I re-check the tags, they are still untouched:
C:\Users\Bruno\Desktop\tmpfolder>exiftool -filename -subject img_1980.jpg
File Name                       : img_1980.jpg
Subject                         : at0F Correzzana 2012 17, at0F Correzzana 2012 18

There are about 120 such photos and I can easily remove those tags manually, but it's weird, isn't it? Is it because they are the only tags, and without them the Subject would be removed entirely?

Kindest regards
Bruno

Phil Harvey

Right.  This is the case where all items are being deleted.  Adding the -m option will ignore this minor warning and cause the list to be cleared, but the Subject tag won't be deleted (it will remain with a single, empty entry).

I hope this is good enough.

- Phil

Phil Harvey

Ah, no.  I have a better solution.  Instead of using the -m option, try this:

exiftool -subject= "-subject<${subject@;$_=undef if /^\w[^nN]\w\w /}" -sep ", " DIR

This will delete the Subject tag if all items are removed.

- Phil
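
If the command is launched from a script instead of a shell, the quoting differences between Windows and Mac/Linux can be avoided by passing the arguments as a list. The following is a rough sketch under that assumption: the helper names `build_args` and `run_filter` are hypothetical, and `exiftool` is assumed to be on the PATH. The only thing taken from the command above is the argument order, with `-subject=` placed before the copy expression.

```python
import subprocess

# The copy expression from the command above, as a raw string so that the
# backslashes and the dollar signs reach exiftool unchanged.
FILTER = r"-subject<${subject@;$_=undef if /^\w[^nN]\w\w /}"


def build_args(directory):
    """Build the argument list; -subject= must come before the copy expression."""
    return ["exiftool", "-subject=", FILTER, "-sep", ", ", directory]


def run_filter(directory):
    """Run exiftool on a directory (no shell involved, so no shell quoting)."""
    return subprocess.run(build_args(directory), check=False)


# Example (left commented out so nothing is modified by accident):
# run_filter("DIR")
```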

brunos

thanks again, will try it ASAP!

Kindest regards
Bruno

brunos

nope, the tags still survive! But it doesn't matter, I'm almost done with removing the remainders in the main archive by hand... thanks again!!!

Phil Harvey

Quote from: brunos on April 05, 2020, 08:35:05 AM
nope, the tags still survive!

I respectfully disagree.  Did you put -subject= before the other arguments in my last command?

I tested this with the file you posted and it works.

But doing the rest by hand is fine.  However, if you have to reprocess thousands again you can use this new command.

- Phil

brunos

I did the test again and you are right: the tags are gone!

I have no idea why it looked to me as if the tags were still there. I must have messed something up with the test set.

I apologize!

Kindest regards
Bruno