selective removal of keywords?

Started by stonecherub, December 15, 2012, 07:08:17 PM


Phil Harvey

You're right, my expression doesn't work as intended.  It is complicated by the fact that the list items are joined into a single string.  But you are in luck:  I have added a new (as-yet undocumented) feature to ExifTool 10.87 which allows the expression to work on individual list items by adding a "@" after the tag name.  So with ExifTool 10.87 the command may be simplified to this:

exiftool "-Hierarchicalsubject<${Hierarchicalsubject@;/^gens\|.*/ ? $_=undef : s/[^\|]+\|//}" -sep ", " FILE

The regular expression syntax is explained in a number of places (here is one), but it is very powerful, so the documentation is very lengthy.

StarGeek often recommends Regular-Expressions.info as a site to learn about regular expressions.  And Regex101.com is a great site where you can test out your regex. 

You can test your regular expressions in ExifTool using the -p option before actually rewriting the file:

exiftool -p "${Hierarchicalsubject@;/^gens\|.*/ ? $_=undef : s/[^\|]+\|//}" FILE

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).
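
For readers who want to see what the per-item expression does without touching any files, here is a minimal Python sketch that emulates it. It is not ExifTool code: the function names are hypothetical, and the sample list is the HierarchicalSubject value shown later in this thread. It assumes each list item is passed through the expression separately, that an item set to undef is dropped, and that the substitution (which has no /g flag) removes only the first level.

```python
import re

def filter_item(item):
    """Emulate: /^gens\\|.*/ ? $_=undef : s/[^\\|]+\\|//  for one list item."""
    if re.match(r"gens\|", item):
        return None  # stands in for $_=undef: the item is dropped
    # No /g flag in the original, so only the first "level|" prefix is removed
    return re.sub(r"[^|]+\|", "", item, count=1)

def filter_items(items):
    """Apply the expression to every item and keep the ones that survive."""
    results = (filter_item(item) for item in items)
    return [r for r in results if r is not None]

items = [
    "construction|bâtiment|fenêtre",
    "gens|famille|famille Beatrice|Mike",
    "places|Brasil|Minas Gerais|Ipoema",
    "style|portrait",
]
print(", ".join(filter_items(items)))
```

An item without any "|" would pass through unchanged, because the substitution simply finds nothing to remove.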

phweyland

Your formula works great, even with version 10.86!
I now get the desired result:
Hierarchical Subject            : bâtiment|fenêtre, Brasil|Minas Gerais|Ipoema, portrait

I'll now try to do the same for Subject.

---- XMP-dc ----
Subject                         : Brasil, Mike, Ipoema, Minas Gerais, bâtiment, construction, famille, famille Beatrice, fenêtre, gens, places, portrait, style


Do you think it is possible to reuse the output of Hierarchical Subject?
That would be the most generic way.

I can't be the only one who feels lucky to be able to use your tool! Impressive tool indeed!
Thanks

Phil Harvey

You're right.  This feature was actually added in version 10.53, but it didn't work with the -p option until 10.87.

After you have edited HierarchicalSubject, you can write the components back to Subject like this:

exiftool "-subject<hierarchicalsubject" -sep "|" FILE

But this will have to be done in a separate command.

- Phil
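
A rough Python sketch of why -sep "|" turns hierarchical entries into flat Subject keywords. This is an assumption about the behaviour described in FAQ 17 (list items are joined with the separator when copied, then split on it when written to a list-type tag), not ExifTool's actual code, and the function name is hypothetical.

```python
def copy_with_sep(items, sep):
    """Join the source list with sep, then split the result on sep again."""
    joined = sep.join(items)   # the copied value as one string
    return joined.split(sep)   # the items written to the list-type tag

hierarchical = ["bâtiment|fenêtre", "Brasil|Minas Gerais|Ipoema", "portrait"]
print(copy_with_sep(hierarchical, "|"))
```

Because "|" is also the level separator inside each entry, every level ends up as its own keyword. Duplicates are not removed by this step.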

phweyland

Sorry, I hadn't checked everything.
After cleaning, ExifView shows this (which seems perfect):
Hierarchical Subject            : bâtiment|fenêtre, Brasil|Minas Gerais|Ipoema, portrait
(I don't see any difference with or without -sep ", ")
But if I look at the XMP data in XnView, I see separate lines before but only one line after (see attached files).
So I guess something is not quite right yet.
I'm using an argument file (also attached).


Phil Harvey

There is a difference if you don't add the -sep option.  Read FAQ 17 for details.  You need the -sep option to write it properly.

- Phil

phweyland

I understand that -sep is needed but I would like to see the effect  :)
Directly on command line:
"C:\Program Files (x86)\Exiftool\exiftool.exe" -P -overwrite_original "-Hierarchicalsubject<${Hierarchicalsubject@;/^gens\|.*/ ? $_=undef : s/[^\|]+\|//}" -sep ", "
That works: I can see (as in the attached file) 3 records instead of a single one.
But the same command in the argument file (attached to the previous post) produces a single line, as if -sep ", " were ignored (a single record, as shown in the previous post's snapshot).
I've tried putting -sep ", " before or after the command, but that doesn't change anything.
-sep ", "
-XMP:Hierarchicalsubject<${Hierarchicalsubject@;/^gens\|.*/ ? $_=undef : s/[^\|]+\|//}

or
-XMP:Hierarchicalsubject<${Hierarchicalsubject@;/^gens\|.*/ ? $_=undef : s/[^\|]+\|//}
-sep ", "

The corresponding command line with argument file is:
"C:\Program Files (x86)\Exiftool\exiftool.exe" -k -@ "c:\Documents\Darktable\ExifWeb.txt"
Where is my mistake?



StarGeek

From the docs on arg files

"The file contains one argument per line (NOT one option per line -- some options require additional arguments, and all arguments must be placed on separate lines). "

Move the CommaSpace to a separate line and remove the double quotes.
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).
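
To make the "one argument per line" rule concrete, here is a small Python sketch that writes an argument file in that layout. The file name and helper function are hypothetical; the expression is the one from the posts above. The point it illustrates is that -sep and its value (a comma followed by a space, without quotes) go on two separate lines.

```python
from pathlib import Path
import tempfile

# The tag-copy expression needs no surrounding quotes inside an argument file
expression = r"-XMP:Hierarchicalsubject<${Hierarchicalsubject@;/^gens\|.*/ ? $_=undef : s/[^\|]+\|//}"

args = [
    "-sep",      # the option name on its own line
    ", ",        # its argument on the next line: comma + space, no quotes
    expression,  # one complete argument per line
]

def write_argfile(path, arguments):
    """Write each argument on its own line."""
    Path(path).write_text("\n".join(arguments) + "\n", encoding="utf-8")

# Hypothetical location, used only for this illustration
argfile = Path(tempfile.gettempdir()) / "ExifWeb_example.txt"
write_argfile(argfile, args)
print(argfile.read_text(encoding="utf-8"))
```

Such a file would then be passed with -@ as in the command line shown earlier in the thread.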

phweyland

Thank you StarGeek, I'd seen that but not understood properly  :-[
Works great now.
Philippe

phweyland

Just a quick update. Thanks to the interesting links you've shared, I've understood that I can't iterate to get Subject from HierarchicalSubject in one pass.
As I wanted to use ExifTool integrated with darktable export, I found it simpler to use Lua to prepare the data. ExifTool then writes it to the exported images.
Works great!
Thank you again

phweyland

New update
Based on the first formula you gave me, I've succeeded in getting not only the HierarchicalSubject but also the Subject.
Here are the formulas:
-XMP:Subject<${HierarchicalSubject;s/((gens|piwigo)\|[^,]+, |, (gens|piwigo)\|[^,]+|$)//g;s/(^|, )[^\|]+\|/$1/g;s/\|/, /g;NoDups}
-XMP:HierarchicalSubject<${HierarchicalSubject;s/((gens|piwigo)\|[^,]+, |, (gens|piwigo)\|[^,]+|$)//g;s/(^|, )[^\|]+\|/$1/g}

First, this removes the HierarchicalSubject entries starting with gens or piwigo.
Then it removes the first level of each remaining entry (places, ...).
Then it converts the HierarchicalSubject into Subject.
And last, it removes the duplicates.
What a show!
Thanks again for the tool and your quick support.
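
The chain of substitutions in the Subject formula can be traced step by step with a Python sketch. This only emulates the regular expressions on the joined string; the function names are hypothetical, and the NoDups stand-in assumes that duplicates are removed from the ", "-separated items while keeping the first occurrence. The sample value is the HierarchicalSubject shown elsewhere in this thread.

```python
import re

def strip_entries(value):
    """Step 1: remove whole entries that start with gens| or piwigo|."""
    return re.sub(r"((gens|piwigo)\|[^,]+, |, (gens|piwigo)\|[^,]+|$)", "", value)

def strip_first_level(value):
    """Step 2: remove the first level of each remaining entry."""
    return re.sub(r"(^|, )[^|]+\|", r"\1", value)

def flatten(value):
    """Step 3: turn the remaining level separators into list separators."""
    return value.replace("|", ", ")

def no_dups(value, sep=", "):
    """Step 4 (assumed NoDups behaviour): drop repeated items, keep order."""
    return sep.join(dict.fromkeys(value.split(sep)))

value = ("construction|bâtiment|fenêtre, gens|famille|famille Beatrice|Mike, "
         "places|Brasil|Minas Gerais|Ipoema, style|portrait")

hierarchical = strip_first_level(strip_entries(value))
subject = no_dups(flatten(hierarchical))
print(hierarchical)
print(subject)
```

Note that these expressions work on the whole joined string, so they rely on ", " as the separator and on keywords that contain no commas.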

phweyland

Hi Phil,
Going forward I've a new question.
I'm setting some IPTC this way:
-IPTC:Country-PrimaryLocationName < ${HierarchicalSubject;s/^.*places\|([^\|,]*).*/$1/ or $_=undef}
-IPTC:Province-State < ${HierarchicalSubject;s/^.*places\|([^\|,]*)\|([^\|,]*).*/$2/ or $_=undef}
-IPTC:City < ${HierarchicalSubject;s/^.*places\|([^\|,]*)\|([^\|,]*)\|([^\|,]*).*/$3/ or $_=undef}
-IPTC:Sub-location < ${HierarchicalSubject;s/^.*places\|([^\|,]*)\|([^\|,]*)\|([^\|,]*)\|([^\|,]*).*/$4/ or $_=undef}

But the 4th level is rarely present and I get an empty tag:
City                            : Ipoema
Sub-location                    :
Province-State                  : Minas Gerais
Country-Primary Location Name   : Brasil

Is there a way not to create the tag when the value is undef?
Thanks
Philippe

Phil Harvey

Hi Philippe,

Setting $_=undef will cause the tag not to be written.  Somehow your Sub-location pattern must be matching with $4 being an empty string for this to get set to an empty string.  This works for me on MacOS:

> exiftool a.jpg -hierarchicalsubject
Hierarchical Subject            : construction|bâtiment|fenêtre, gens|famille|famille Beatrice|Mike, places|Brasil|Minas Gerais|Ipoema, style|portrait
> exiftool a.jpg '-iptc:sub-location<${HierarchicalSubject;s/^.*places\|([^\|,]*)\|([^\|,]*)\|([^\|,]*)\|([^\|,]*).*/$4/ or $_=undef}'
Warning: [minor] Advanced formatting expression returned undef for 'HierarchicalSubject' - a.jpg
Warning: No writable tags set from a.jpg
    0 image files updated
    1 image files unchanged


- Phil
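
The difference between "no match" and "match with an empty $4" can be reproduced with a Python sketch of the same patterns. This emulates only the regular expression and the "or $_=undef" fallback, with None standing in for undef; the function name is hypothetical, and the second sample value (with a trailing "|") is an invented input used to show how an empty string could arise.

```python
import re

def level_after_places(value, level):
    """Emulate: s/^.*places\\|(..)\\|(..).../$N/ or $_=undef  for N = level."""
    pattern = r"^.*places" + r"\|([^|,]*)" * level + r".*"
    new_value, count = re.subn(pattern, lambda m: m.group(level), value, count=1)
    # s/// returns false when nothing was substituted, so "or $_=undef" fires
    return new_value if count else None

value = ("construction|bâtiment|fenêtre, gens|famille|famille Beatrice|Mike, "
         "places|Brasil|Minas Gerais|Ipoema, style|portrait")

for level in range(1, 5):
    print(level, repr(level_after_places(value, level)))

# Invented input: a trailing "|" lets the 4th group match an empty string
print(repr(level_after_places("places|Brasil|Minas Gerais|Ipoema|", 4)))
```

With only three levels below "places", the fourth pattern does not match and the result is None (the tag would not be written). An empty tag is only produced when the pattern does match and the fourth group captures nothing.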

phweyland

Thank you, Phil, for the answer.
The simulator (the site you referred me to) doesn't return anything, but I'll continue to investigate.
I'm using Windows 10 64-bit.

I tried this:
-IPTC:Sub-location < ${HierarchicalSubject;s/^.*places\|([^\|,]*)\|([^\|,]*)\|([^\|,]*)\|([^\|,]*).*/$4/}
The result, instead of being empty, is:
Sub-location                    : construction|bâtiment|fenêtre, gens|famille|famille Beatrice|Mike, piwigo|2010s|2018|03-10 Ipoema, places|Brasil|Minas Gerais|Ipoema
If I understand properly that means the match hasn't been found...

Phil Harvey

Correct.  So when you add " or $_=undef" when setting IPTC:Sub-location, then it shouldn't get written.

So I don't understand the problem.  Everything seems to be working properly.

- Phil

phweyland

Hi Phil,
I'm trying to find a way to add the IPTC Location tag as the 4th level of HierarchicalSubject below "places".
The following code returns a syntax error. I haven't found any example of such nested {}, but it was worth a try :)

exiftool -p "${HierarchicalSubject;s/(places\|[^\|,]*\|[^\|,]*\|[^\|,]*)/$1\|${XMP-iptcCore:Location}/}" D:\Documents\Images\Photos\2010\2017\20171118_Tiradentes\20171119_Tiradentes_004.xmp

Warning: syntax error for 'HierarchicalSubject' - D:/Documents/Images/Photos/2010/2017/20171118_Tiradentes/20171119_Tiradentes_004.xmp
Is there a way to achieve this?
Thank you