Unnecessary duplication in MWG HierarchicalKeywords?

Started by Joanna Carter, May 22, 2021, 10:24:56 AM


Joanna Carter

Hi.

I'm trying to ensure that my app writes hierarchical keywords correctly, using the MWG standard.

Here are the arguments that I am passing to the OS process...

  - 0 : "-preserve"
  - 1 : "-ignoreMinorErrors"
  - 2 : "-overwrite_original_in_place"
  - 3 : "-mwg:keywords+=Didier"
  - 4 : "-mwg:keywords+=Gilbert"
  - 5 : "-mwg:keywords+=Joanna"
  - 6 : "-mwg:keywords+=Nounours"
  - 7 : "-mwg:keywords+=Surfers"
  - 8 : "-xmp-mwg-kw:hierarchicalkeywords+={keyword=Nounours,children={keyword=Didier}}"
  - 9 : "-xmp-mwg-kw:hierarchicalkeywords+={keyword=Nounours,children={keyword=Didier,children={keyword=Joanna}}}"
  - 10 : "-xmp-mwg-kw:hierarchicalkeywords+={keyword=Nounours,children={keyword=Gilbert}}"
  - 11 : "/Users/joannacarter/Pictures/_HLN0032.NEF"
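Since the app builds these arguments itself, here is a minimal Python sketch of generating them from pipe-delimited keyword paths. The helper names (`escape`, `path_to_struct`, `build_args`) are hypothetical, and the escaping assumes ExifTool's structure rule that `,`, `[`, `]`, `{`, `}` and `|` inside a value are preceded by a `|`.

```python
SPECIAL = set(",[]{}|")


def escape(value: str) -> str:
    """Escape ExifTool structure special characters with a leading '|' (assumed rule)."""
    return "".join("|" + ch if ch in SPECIAL else ch for ch in value)


def path_to_struct(path: str) -> str:
    """Turn 'A|B|C' into '{keyword=A,children={keyword=B,children={keyword=C}}}'."""
    nodes = path.split("|")
    result = "{keyword=" + escape(nodes[-1]) + "}"
    # Wrap from the leaf upwards so each ancestor holds the previous level as children.
    for node in reversed(nodes[:-1]):
        result = "{keyword=" + escape(node) + ",children=" + result + "}"
    return result


def build_args(keywords, paths, image):
    """Argument vector in the same shape as the list above."""
    args = ["-preserve", "-ignoreMinorErrors", "-overwrite_original_in_place"]
    args += [f"-mwg:keywords+={k}" for k in keywords]
    args += [f"-xmp-mwg-kw:hierarchicalkeywords+={path_to_struct(p)}" for p in paths]
    args.append(image)
    return args


print(path_to_struct("Nounours|Didier|Joanna"))
# {keyword=Nounours,children={keyword=Didier,children={keyword=Joanna}}}
```

The resulting list can be handed straight to a process launcher, e.g. `subprocess.run(["exiftool", *args], check=True)`; passing an argument vector avoids any shell quoting of the braces.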

... and here is the output from ExifTool...


joannacarter@MacBookPro Pictures % exiftool -xmp:all _HLN0032.NEF                           
XMP Toolkit                     : Image::ExifTool 12.11
Subject                         : Surfers, Joanna, Nounours, Gilbert, Didier
Hierarchical Subject            : Nounours|Didier, Nounours|Didier|Joanna, Nounours|Gilbert
Creator Tool                    : NIKON D810 Ver.1.12
joannacarter@MacBookPro Pictures % exiftool -xmp -b _HLN0032.NEF > _HLN0032.xmp
joannacarter@MacBookPro Pictures % exiftool -xmp:all _HLN0032.NEF             
XMP Toolkit                     : Image::ExifTool 12.11
Subject                         : Didier, Gilbert, Joanna, Nounours, Surfers
Hierarchical Subject            : Nounours|Didier, Nounours|Didier|Joanna, Nounours|Gilbert
Hierarchical Keywords 2         : Didier, Didier, Gilbert
Hierarchical Keywords 1         : Nounours, Nounours, Nounours
Hierarchical Keywords 3         : Joanna
Creator Tool                    : NIKON D810 Ver.1.12


Notice the repeated keywords in levels 1 and 2. Is this correct, or should I be trying to avoid this?
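One way to see where the repeats come from: the flattened Hierarchical Keywords 1/2/3 values appear to be collected level by level across every entry of the Hierarchy bag, so three separate entries that each start with "Nounours" yield three level-1 values. The Python sketch below models that assumption (it is not ExifTool's actual code) and reproduces the output above.

```python
from collections import defaultdict


def flatten_levels(hierarchy):
    """Collect each node's Keyword by depth, walking entries in document order."""
    levels = defaultdict(list)

    def walk(nodes, depth):
        for node in nodes:
            levels[depth].append(node["Keyword"])
            walk(node.get("Children", []), depth + 1)

    walk(hierarchy, 1)
    return dict(levels)


# The three Hierarchy entries written by the arguments above.
entries = [
    {"Keyword": "Nounours", "Children": [{"Keyword": "Didier"}]},
    {"Keyword": "Nounours", "Children": [
        {"Keyword": "Didier", "Children": [{"Keyword": "Joanna"}]}]},
    {"Keyword": "Nounours", "Children": [{"Keyword": "Gilbert"}]},
]
print(flatten_levels(entries))
# {1: ['Nounours', 'Nounours', 'Nounours'], 2: ['Didier', 'Didier', 'Gilbert'], 3: ['Joanna']}
```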

If I add in the lr:hierarchicalSubject, I get similar repetition for that, as can be seen in this XMP file, output by ExifTool...


<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>
<x:xmpmeta xmlns:x='adobe:ns:meta/' x:xmptk='Image::ExifTool 12.11'>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>

<rdf:Description rdf:about=''
  xmlns:dc='http://purl.org/dc/elements/1.1/'>
  <dc:subject>
   <rdf:Bag>
    <rdf:li>Didier</rdf:li>
    <rdf:li>Gilbert</rdf:li>
    <rdf:li>Joanna</rdf:li>
    <rdf:li>Nounours</rdf:li>
    <rdf:li>Surfers</rdf:li>
   </rdf:Bag>
  </dc:subject>
</rdf:Description>

<rdf:Description rdf:about=''
  xmlns:lr='http://ns.adobe.com/lightroom/1.0/'>
  <lr:hierarchicalSubject>
   <rdf:Bag>
    <rdf:li>Nounours|Didier</rdf:li>
    <rdf:li>Nounours|Didier|Joanna</rdf:li>
    <rdf:li>Nounours|Gilbert</rdf:li>
   </rdf:Bag>
  </lr:hierarchicalSubject>
</rdf:Description>

<rdf:Description rdf:about=''
  xmlns:mwg-kw='http://www.metadataworkinggroup.com/schemas/keywords/'>
  <mwg-kw:Keywords rdf:parseType='Resource'>
   <mwg-kw:Hierarchy>
    <rdf:Bag>
     <rdf:li rdf:parseType='Resource'>
      <mwg-kw:Children>
       <rdf:Bag>
        <rdf:li rdf:parseType='Resource'>
         <mwg-kw:Keyword>Didier</mwg-kw:Keyword>
        </rdf:li>
       </rdf:Bag>
      </mwg-kw:Children>
      <mwg-kw:Keyword>Nounours</mwg-kw:Keyword>
     </rdf:li>
     <rdf:li rdf:parseType='Resource'>
      <mwg-kw:Children>
       <rdf:Bag>
        <rdf:li rdf:parseType='Resource'>
         <mwg-kw:Children>
          <rdf:Bag>
           <rdf:li rdf:parseType='Resource'>
            <mwg-kw:Keyword>Joanna</mwg-kw:Keyword>
           </rdf:li>
          </rdf:Bag>
         </mwg-kw:Children>
         <mwg-kw:Keyword>Didier</mwg-kw:Keyword>
        </rdf:li>
       </rdf:Bag>
      </mwg-kw:Children>
      <mwg-kw:Keyword>Nounours</mwg-kw:Keyword>
     </rdf:li>
     <rdf:li rdf:parseType='Resource'>
      <mwg-kw:Children>
       <rdf:Bag>
        <rdf:li rdf:parseType='Resource'>
         <mwg-kw:Keyword>Gilbert</mwg-kw:Keyword>
        </rdf:li>
       </rdf:Bag>
      </mwg-kw:Children>
      <mwg-kw:Keyword>Nounours</mwg-kw:Keyword>
     </rdf:li>
    </rdf:Bag>
   </mwg-kw:Hierarchy>
  </mwg-kw:Keywords>
</rdf:Description>

<rdf:Description rdf:about=''
  xmlns:xmp='http://ns.adobe.com/xap/1.0/'>
  <xmp:CreateDate>2021-02-26T12:08:40.79</xmp:CreateDate>
  <xmp:CreatorTool>NIKON D810 Ver.1.12     </xmp:CreatorTool>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>

StarGeek

The big problem is figuring out how the data is supposed to be interpreted.  I have yet to find a program that reads the KeywordInfo structure, so it's hard to know if that is the correct way to interpret the structure.  If you know of one, please let me know.

I previously made a config file to convert HierarchicalSubject into MWG KeywordInfo, but now, after reading your post, I'm not sure if I did it properly.

I created a hierarchy in Adobe Bridge based upon what I think you're setting up.  Let me know if it's wrong.

Bridge writes the following:
Nounours
Nounours|Didier
Nounours|Didier|Joanna
Nounours|Gilbert
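Bridge's list is what you get by adding every ancestor of each leaf path. A small Python sketch of that expansion (the function name is hypothetical, and sorting is only assumed here to match the listing):

```python
def with_ancestors(paths):
    """Expand each pipe-delimited path into itself plus all of its ancestors."""
    expanded = set()
    for path in paths:
        parts = path.split("|")
        for depth in range(1, len(parts) + 1):
            expanded.add("|".join(parts[:depth]))
    return sorted(expanded)


print(with_ancestors(["Nounours|Didier|Joanna", "Nounours|Gilbert"]))
# ['Nounours', 'Nounours|Didier', 'Nounours|Didier|Joanna', 'Nounours|Gilbert']
```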


Since KeywordInfo is a structure, it's best to look at the full structure and not the flattened keywords, as they can be really misleading as to the hierarchy.  Here's the formatted result of your KeywordInfo (tabs are much wider here than on my computer)
{
  Hierarchy=[
    {
      Keyword=Nounours,
      Children=[
        {
          Keyword=Didier
        }
      ]
    },
    {
      Keyword=Nounours,
      Children=[
        {
          Keyword=Didier,
          Children=[
            {
              Keyword=Joanna
            }
          ]
        }
      ]
    },
    {
      Keyword=Nounours,
      Children=[
        {
          Keyword=Gilbert
        }
      ]
    }
  ]
}


I then took the HierarchicalSubject I created in Bridge, and ran my conversion config on it.  This is the result
{
  Hierarchy=[
    {
      Keyword=Nounours,
      Children=[
        {
          Keyword=Gilbert
        },
        {
          Keyword=Didier,
          Children=[
            {
              Keyword=Joanna
            }
          ]
        }
      ]
    }
  ]
}


When I wrote my code, I went out of my way to condense the structure before writing it.  But since Bridge writes every level of the hierarchy as a separate entry, I'm thinking my condensed result is equivalent to only two entries, "Nounours|Gilbert" and "Nounours|Didier|Joanna".
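The condensing step can be sketched in Python by merging paths that share a prefix into one tree. This is not the actual config file (which isn't shown here); the names are hypothetical and keywords are assumed to contain no structure special characters. It also shows the ambiguity described above: the full Bridge list and the two leaf paths collapse to the same tree.

```python
def build_tree(paths):
    """Merge pipe-delimited paths into nested dicts, sharing common prefixes."""
    tree = {}
    for path in paths:
        node = tree
        for part in path.split("|"):
            node = node.setdefault(part, {})
    return tree


def to_hierarchy(tree):
    """Serialize a merged tree as an ExifTool-style Hierarchy list."""
    items = []
    for keyword, children in tree.items():
        if children:
            items.append(f"{{Keyword={keyword},Children={to_hierarchy(children)}}}")
        else:
            items.append(f"{{Keyword={keyword}}}")
    return "[" + ",".join(items) + "]"


bridge = ["Nounours", "Nounours|Didier", "Nounours|Didier|Joanna", "Nounours|Gilbert"]
leaves_only = ["Nounours|Didier|Joanna", "Nounours|Gilbert"]

# Both inputs give the same tree, so the ancestor-only entries leave no trace.
assert build_tree(bridge) == build_tree(leaves_only)
print("{Hierarchy=" + to_hierarchy(build_tree(bridge)) + "}")
# {Hierarchy=[{Keyword=Nounours,Children=[{Keyword=Didier,Children=[{Keyword=Joanna}]},{Keyword=Gilbert}]}]}
```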

To summarize, I really don't know at this point :D  I'm starting to think that your setup might be the proper way, with one change: I think that you need a separate entry for "Nounours" by itself, with no children.
-xmp-mwg-kw:hierarchicalkeywords+={keyword=Nounours}
which would give you
HierarchicalKeywords1           : Nounours, Nounours, Nounours, Nounours
at the end.

I'll dig through the MWG specs in a bit (have some errands to run), so I'll get back to you later today or tomorrow.  Unless someone beats me to it.
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

StarGeek

I'm still trying to figure out the specs, which start on page 60.  The PDF can be found on Archive.org.  But I'm latching on to one particular paragraph:

Quote
A Changer...
MUST write the XMP dc:subject property to store the individual keywords. Hierarchical
path elements MUST be flattened, which means that each hierarchy node needs to be
stored as a separate keyword entry to XMP dc:subject.

So each node is supposed to be written out to Subject, which suggests to me that one hierarchy is supposed to cover it.
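Read that way, the Subject values follow mechanically from the hierarchy paths: every node, each stored once. A Python sketch (hypothetical helper; flat keywords such as "Surfers" that sit outside any hierarchy would be added separately):

```python
def subject_from_paths(paths):
    """Store every hierarchy node once, as the flattening rule asks for dc:subject."""
    # dict.fromkeys drops duplicates while keeping first-seen order;
    # the order is cosmetic, since dc:subject is an unordered rdf:Bag.
    return list(dict.fromkeys(node for path in paths for node in path.split("|")))


print(subject_from_paths(["Nounours|Didier|Joanna", "Nounours|Gilbert"]))
# ['Nounours', 'Didier', 'Joanna', 'Gilbert']
```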

Additionally,
Quote
mwg-kw:Applied
True if this keyword has been applied, False otherwise. If
missing, mwg-kw:Applied is presumed True for leaf nodes
and False for ancestor nodes.

So if that keyword in the hierarchy is selected, then the HierarchicalKeywords#Applied that matches that node should be True.
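The default rule can be expressed as a small Python function: use an explicit Applied value when present, otherwise treat leaf nodes as applied and ancestor nodes as not applied. This is a sketch of my reading of the quoted text; the node dictionaries and function names are hypothetical, and Applied is modeled as a Python bool rather than the "True"/"False" strings stored in XMP.

```python
def effective_applied(node):
    """mwg-kw:Applied if present; otherwise True for leaves, False for ancestors."""
    if "Applied" in node:
        return node["Applied"]
    return not node.get("Children")


def applied_paths(nodes, prefix=()):
    """Yield the pipe-joined path of every node that counts as applied."""
    for node in nodes:
        path = prefix + (node["Keyword"],)
        if effective_applied(node):
            yield "|".join(path)
        yield from applied_paths(node.get("Children", []), path)


tree = [{"Keyword": "Nounours", "Children": [
    {"Keyword": "Gilbert"},
    {"Keyword": "Didier", "Children": [{"Keyword": "Joanna"}]},
]}]
print(list(applied_paths(tree)))
# ['Nounours|Gilbert', 'Nounours|Didier|Joanna']
```

With no Applied fields at all, only the leaves count, so an ancestor such as "Nounours" has to carry Applied=True explicitly if it should be selected too.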

A revised formatted hierarchy with all nodes selected would look like this, I think
{
  Hierarchy=[
    {
      Keyword=Nounours,
      Applied=True,
      Children=[
        {
          Keyword=Gilbert,
          Applied=True
        },
        {
          Keyword=Didier,
          Applied=True,
          Children=[
            {
              Keyword=Joanna,
              Applied=True
            }
          ]
        }
      ]
    }
  ]
}

And writing the whole thing would be
exiftool -KeywordInfo="{Hierarchy=[{Keyword=Nounours,Applied=True,Children=[{Keyword=Gilbert,Applied=True},{Keyword=Didier,Applied=True,Children=[{Keyword=Joanna,Applied=True}]}]}]}" file.jpg
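If the app builds that string itself, a Python sketch like the following reproduces the command above from a nested node list. It assumes every node should be marked Applied=True (the "all nodes selected" case), and the helper names are hypothetical.

```python
def serialize(nodes):
    """Serialize nodes as an ExifTool structure list, marking every node Applied=True."""
    parts = []
    for node in nodes:
        fields = [f"Keyword={node['Keyword']}", "Applied=True"]
        if node.get("Children"):
            fields.append(f"Children={serialize(node['Children'])}")
        parts.append("{" + ",".join(fields) + "}")
    return "[" + ",".join(parts) + "]"


def keyword_info_args(nodes, image):
    """Argument vector equivalent to the exiftool command above."""
    return ["exiftool", f"-KeywordInfo={{Hierarchy={serialize(nodes)}}}", image]


tree = [{"Keyword": "Nounours", "Children": [
    {"Keyword": "Gilbert"},
    {"Keyword": "Didier", "Children": [{"Keyword": "Joanna"}]},
]}]
print(keyword_info_args(tree, "file.jpg")[1])
# -KeywordInfo={Hierarchy=[{Keyword=Nounours,Applied=True,Children=[{Keyword=Gilbert,Applied=True},{Keyword=Didier,Applied=True,Children=[{Keyword=Joanna,Applied=True}]}]}]}
```

Passing the list directly, e.g. `subprocess.run(keyword_info_args(tree, "file.jpg"), check=True)`, avoids the shell quoting needed when typing the command by hand.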

If anyone else reads over the specs and interprets it differently, let me know.

If this interpretation is correct, it does mean I'll have to revisit my config file to add in the Applied tags.

Joanna Carter

Thank you for your insights, they have helped me understand more of what goes on.

I now add the top level keywords into the hierarchies, so I now end up with...


joannacarter@MacBookPro Pictures % exiftool -xmp:all _HLN0032.NEF
XMP Toolkit                     : Image::ExifTool 12.11
Description                     : Electricity
Subject                         : Nounours, Didier, Joanna, Barbecue, Claudine, Surfers
Hierarchical Subject            : Barbecue, Nounours, Barbecue|Claudine, Nounours|Didier, Nounours|Didier|Joanna
Hierarchical Keywords 1         : Barbecue, Nounours, Barbecue, Nounours, Nounours
Hierarchical Keywords 2         : Claudine, Didier, Didier
Hierarchical Keywords 3         : Joanna
Creator Tool                    : NIKON D810 Ver.1.12


... which, despite the repeated top-level words, works better with software that reads the XMP files I am generating.

Quote
Additionally,
Quote
mwg-kw:Applied
True if this keyword has been applied, False otherwise. If
missing, mwg-kw:Applied is presumed True for leaf nodes
and False for ancestor nodes.
So if that keyword in the hierarchy is selected, then the HierarchicalKeywords#Applied that matches that node should be True.


And if I read that right, that means that, unless you want mwg-kw:Applied to be explicitly False, you don't have to add it.

On a related note, if I want to reassign the hierarchy with a shorter one, would I be right in assuming I have to delete the old one first?

If I simply assign a new, shorter hierarchy, it leaves the last node from the longer one in place. In theory, sending a delete is a lot easier than trying to find the offending detritus and zapping it.

StarGeek

Quote from: Joanna Carter on May 23, 2021, 02:21:20 AM
I now add the top level keywords into the hierarchies, so I now end up with...
<snip>
... which, despite the repeated top-level words, works better with software that reads the XMP files I am generating.

While I think the more compact way of doing it is what the specs say, the only thing that matters is that your software can read it correctly.  As long as that happens, who cares if there's extra stuff.

May I ask what the software is that is reading the MWG keyword structure? Is it commercial or an in-house program?

Also, yay ellipses!  ...  ...  ... I think that's the first time I've ever seen someone else use the single character ellipses.  I have a text replacement program that does it for me.

Quote
And if I read that right, that means that, unless you want mwg-kw:Applied to be explicitly False, you don't have to add it.

Any ancestor node that you want checked would need Applied set to True, though the way you are doing it might work; you would have to check with your software.  Conversely, any leaf node that, for some reason, you want included but not checked would need Applied set to False.  That case should hopefully be very uncommon.

Quote
On a related note, if I want to reassign the hierarchy with a shorter one, would I be right in assuming I have to delete the old one first?

If I simply assign a new, shorter hierarchy, it leaves the last node from the longer one in place. In theory, sending a delete is a lot easier than trying to find the offending detritus and zapping it.

As with any list-type tag, exiftool won't automatically consolidate or remove data unless explicitly told to do so.  KeywordInfo is very complex, and personally I would probably rebuild the entire thing for any change.
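A sketch of the rebuild approach in Python, assuming ExifTool's documented behavior for list-type tags: several `-TAG=VALUE` assignments in one command replace the existing list, whereas `+=` appends to it. Under that assumption no separate delete pass should be needed, but it is worth confirming against your ExifTool version. The helper names are hypothetical, and keywords are assumed to contain no structure special characters.

```python
def path_to_struct(path):
    """'A|B' -> '{keyword=A,children={keyword=B}}'."""
    struct = ""
    for node in reversed(path.split("|")):
        children = f",children={struct}" if struct else ""
        struct = f"{{keyword={node}{children}}}"
    return struct


def rebuild_args(paths, image):
    """Replace the whole MWG hierarchy: plain '=' for every entry, all in one command."""
    return (["-overwrite_original_in_place"]
            + [f"-xmp-mwg-kw:hierarchicalkeywords={path_to_struct(p)}" for p in paths]
            + [image])


print(rebuild_args(["Nounours", "Nounours|Gilbert"], "_HLN0032.NEF"))
```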

Joanna Carter

Quote
May I ask what the software is that is reading the MWG keyword structure? Is it commercial or an in-house program?

It will be a commercial app for Mac.

Quote
I would probably rebuild the entire thing for any changes.

That's what I ended up doing

Many thanks for your thoughts