Keywords and hierarchies

Started by Joanna Carter, April 29, 2022, 12:24:55 PM

Joanna Carter

Over the last couple of years I have been writing a keywording app for Mac, using ExifTool as the underlying engine. I have adhered to what I believe the MWG Guidance document defines and my app consumes and produces metadata that is 100% compatible with all other apps I have been able to test.

Nonetheless, can I please start a discussion on the nature of keywords and their relationship to hierarchies?

My current view is that every keyword should be available for use in a standalone context.

If they are used in a hierarchical context, then the hierarchy contains references to existing keywords.

However, there are some who feel that the same keyword, appearing in multiple hierarchies, constitutes a "unique" keyword, in that it can't exist on its own.

e.g.

Standalone keywords - Fruit, Colour, Enterprise, Telecommunications, Orange, Satsuma

Hierarchy 1 - Fruit > Orange > Satsuma

Hierarchy 2 - Colour > Orange

Hierarchy 3 - Enterprise > Telecommunications > Orange

Should the keyword Orange be a reference to the standalone keyword, or should it only exist in the context of each of the defined hierarchies?

Should all keywords referred to in hierarchies also be valid as standalone?

If I take the approach that all mentions of Orange, in hierarchies, are purely references to the standalone keyword, this then ensures that there can be no spelling variations and allows for searches where I can find all images that reference Orange, no matter where in any hierarchies they are found.

If I only define Orange in each of its containing hierarchies, it makes both "spell-checking" and searching more difficult.
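
A minimal sketch of the "hierarchies hold references" approach, using the example keywords above. The data layout and the function names are hypothetical, chosen only for illustration; this is not how any particular app stores its keywords.

```python
# Hypothetical model: every keyword is defined once as a standalone keyword,
# and each hierarchy is just an ordered tuple of references to those names.
keywords = {"Fruit", "Colour", "Enterprise", "Telecommunications", "Orange", "Satsuma"}

hierarchies = [
    ("Fruit", "Orange", "Satsuma"),
    ("Colour", "Orange"),
    ("Enterprise", "Telecommunications", "Orange"),
]


def unknown_keywords(hierarchies, keywords):
    """Return names used in a hierarchy that are not defined standalone keywords.

    This is the "spell-check": a misspelt node has nothing to refer to.
    """
    return {name for path in hierarchies for name in path if name not in keywords}


def hierarchies_containing(name, hierarchies):
    """Return every hierarchy that references the keyword at any level."""
    return [path for path in hierarchies if name in path]


print(unknown_keywords(hierarchies, keywords))
print(hierarchies_containing("Orange", hierarchies))
```

With this layout, a single lookup for Orange reaches all three hierarchies, and a variant spelling in any hierarchy shows up as an unknown name.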

What are your thoughts?

Alan Clifford

Orange, orange and orange are three completely different words.  They look the same because they are spelt the same but they have different meanings.

Phil Harvey

For search purposes I would put Orange in the individual keywords.

- Phil

Joanna Carter

Quote from: Alan Clifford on April 29, 2022, 02:24:56 PM
Orange, orange and orange are three completely different words.  They look the same because they are spelt the same but they have different meanings.

Not necessarily, if you read the MWG Guidance document...

Quote
Hierarchical keywords simply provide the syntactic mechanism for relating one keyword to another
Quote
The assumption is that the hierarchies are structured from higher to lower level whereas each hierarchy node - keyword within themselves - can be assigned to an image individually

Skids

Hi,

Fair warning: I am just a non-professional user of image software and the following is just my view that will remain valid for a short while ;-)

If I have understood Joanna's question correctly, she is asking what should be written to the "standalone keywords" when "orange" is selected by the user from the hierarchical lists.  In the next post Alan argues that "orange", "orange", and "orange" are "three completely different words".  Sorry, no they are not.  They are the same word being used in different contexts, and the context is important.  Just applying "orange" as a keyword without other keywords is of little use, as the context will be lost.

In the three examples given above, I would expect software to write "orange" and all of its ancestor words to dc:subject when one of the nodes is selected.  How the hierarchy is captured appears to be up to Joanna, as there does not appear to be a standard.  However, applying the "rule of gross tonnage", perhaps adopting the XMP tag format that Adobe Lightroom uses is a safe way to proceed.
<dc:subject>
  <rdf:Bag>
    <rdf:li>pictorial</rdf:li>
    <rdf:li>StandardLens</rdf:li>
    <rdf:li>River</rdf:li>
    <rdf:li>Places</rdf:li>
    <rdf:li>Bournemouth</rdf:li>
    <rdf:li>The Run</rdf:li>
  </rdf:Bag>
</dc:subject>


snip

<lr:hierarchicalSubject>
  <rdf:Seq>
    <rdf:li>Places|Bournemouth|The Run</rdf:li>
  </rdf:Seq>
</lr:hierarchicalSubject>


The dc:subject keywords were applied using ExifTool.  The hierarchicalSubject entries were applied by NeoFinder, which I strongly suspect uses ExifTool.
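
As a rough sketch of the "write the node and all its ancestors to dc:subject" idea, assuming the hierarchy is held as pipe-separated paths like the lr:hierarchicalSubject entry above. The function name is made up for illustration, and reading or writing the actual tags is left out.

```python
def flat_keywords(hierarchical_subjects, separator="|"):
    """Collect every node and ancestor from the given paths.

    Each keyword is listed once, in the order it is first seen.
    """
    seen = []
    for path in hierarchical_subjects:
        for name in path.split(separator):
            if name not in seen:
                seen.append(name)
    return seen


print(flat_keywords(["Places|Bournemouth|The Run"]))
print(flat_keywords(["Fruit|Orange", "Colour|Orange"]))
```

Note that the second call lists Orange only once: the flat list keeps the word, while the context stays in the hierarchical paths.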

One plea : please provide a method of replacing one poorly chosen keyword with a better word or phrase in a set of images.

Lastly, the end user has a responsibility to choose sensible keywords or phrases. I suggest that "orange" is a poor choice.  Better phrases would be "An Orange", "Cadmium Orange", "Orange plc".

Expanding my somewhat chaotic thoughts: why are we so concerned about terse keywords or phrases?  For example, consider this description/caption:
Quote
"Jane Doe sitting at an orange table drinking a coffee at the cafe on the Christchurch side of The Run at Mudeford near Bournemouth"
Any modern application can find specific words in text in the blink of an eye. For example, I searched the web for the text Joanna quoted above and found the document in a third of a second. So perhaps keywords belong to the age when databases could not afford to store the century in date fields.

What do I know ?

best wishes
Simon

Joanna Carter

Quote
How the hierarchy is captured appears to be up to Joanna as there does not appear to be a standard

Well, it is actually very clear (to me) in the MWG Guidance document. The only problem is that hierarchical storage is defined in terms of the xmp-mwg-kw:hierarchicalkeywords tag instead of the lr:hierarchicalSubject tag, like this...


<rdf:Description xmlns:mwg-kw="http://www.metadataworkinggroup.com/schemas/keywords/">
  <mwg-kw:Keywords rdf:parseType="Resource">
    <mwg-kw:Hierarchy>
      <rdf:Bag>
        <rdf:li rdf:parseType="Resource">
          <mwg-kw:Keyword>Places</mwg-kw:Keyword>
          <mwg-kw:Applied>False</mwg-kw:Applied>
          <mwg-kw:Children>
            <rdf:Bag>
              <rdf:li rdf:parseType="Resource">
                <mwg-kw:Keyword>Bournemouth</mwg-kw:Keyword>
                <mwg-kw:Applied>True</mwg-kw:Applied>
                <mwg-kw:Children>
                  <rdf:Bag>
                    <rdf:li rdf:parseType="Resource">
                      <mwg-kw:Keyword>The Run</mwg-kw:Keyword>
                      <mwg-kw:Applied>True</mwg-kw:Applied>
                    </rdf:li>
                  </rdf:Bag>
                </mwg-kw:Children>
              </rdf:li>
            </rdf:Bag>
          </mwg-kw:Children>
        </rdf:li>
      </rdf:Bag>
    </mwg-kw:Hierarchy>
  </mwg-kw:Keywords>
</rdf:Description>


Which gets written to the XMP in the file as...


[XMP]           Hierarchical Keywords 1         : Places, Places, Places
[XMP]           Hierarchical Keywords 2         : Bournemouth, Bournemouth
[XMP]           Hierarchical Keywords 3         : The Run


In order to "comply" with this definition of all possible combinations, along with other research, I felt it was most appropriate to do the same thing with the lr:hierarchicalSubject tag...


[XMP]           Hierarchical Subject            : Places, Places|Bournemouth, Places|Bournemouth|TheRun


Some software does use this "full" version, but others prefer to simply state the hierarchy in one single "phrase"...


[XMP]           Hierarchical Subject            : Places|Bournemouth|TheRun


After extensive testing, I haven't found any significant proof that one is better than the other, but my software writes the full version, on the basis that I'd rather have everything that other software might want to work with, rather than discovering that something, somewhere can't make enough sense of the shorter version.
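
For illustration, the two forms can be converted into each other with a few lines of code. This is only a sketch, assuming pipe-separated paths as in the listings above; the function names are hypothetical and are not taken from my app or from ExifTool.

```python
def expand_path(path, separator="|"):
    """Return the "full" version: the path itself plus every ancestor prefix."""
    parts = path.split(separator)
    return [separator.join(parts[:i]) for i in range(1, len(parts) + 1)]


def leaf_paths(paths, separator="|"):
    """Return the short version: drop any path that is an ancestor of another."""
    return [
        p for p in paths
        if not any(q.startswith(p + separator) for q in paths)
    ]


full = expand_path("Places|Bournemouth|TheRun")
print(full)
print(leaf_paths(full))
```

Since the short version can always be recovered from the full one, writing the full version loses nothing for software that only wants the single "phrase".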

Quote
One plea : please provide a method of replacing one poorly chosen keyword with a better word or phrase in a set of images.

This is something that ExifTool can already do; you would just need to work out the exact command line. I do it all in my app, which then simply overwrites both the subject and hierarchy tags with the amended full list.
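
The list-amending step might look something like the sketch below. It is a simplified illustration rather than the code from my app: the function name and arguments are made up, it works on plain Python lists, and reading the tags from the file and writing them back (for example via ExifTool) is deliberately left out.

```python
def replace_keyword(subjects, hierarchical_subjects, old, new, separator="|"):
    """Return amended (subjects, hierarchical_subjects) with old renamed to new.

    Only whole keywords are matched, never substrings, and any duplicates
    created by the rename are dropped.
    """
    new_subjects = []
    for name in subjects:
        name = new if name == old else name
        if name not in new_subjects:
            new_subjects.append(name)

    new_paths = []
    for path in hierarchical_subjects:
        renamed = separator.join(
            new if part == old else part for part in path.split(separator)
        )
        if renamed not in new_paths:
            new_paths.append(renamed)

    return new_subjects, new_paths


subjects, paths = replace_keyword(
    ["Places", "Bournemouth", "The Run"],
    ["Places|Bournemouth|The Run"],
    old="The Run",
    new="Mudeford Run",
)
print(subjects)
print(paths)
```

Both amended lists would then be written over the existing subject and hierarchy tags for each image in the set.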