Policy for reading/rewriting badly formed keywords

Started by Joanna Carter, August 14, 2021, 01:50:23 PM


Joanna Carter

I have noticed that Adobe Bridge (and possibly Lightroom) has an option to write hierarchical keywords to the XMP dc:subject tag.

Checking one of the options in Bridge produces the following in an XMP file...


    ...
   <dc:subject>
    <rdf:Bag>
     <rdf:li>Nounours|Didier|Joanna</rdf:li>
     <rdf:li>Joanna</rdf:li>
    </rdf:Bag>
   </dc:subject>
    ...
   <lr:hierarchicalSubject>
    <rdf:Bag>
     <rdf:li>Nounours|Didier|Joanna</rdf:li>
     <rdf:li>Joanna</rdf:li>
    </rdf:Bag>
   </lr:hierarchicalSubject>
    ...


...which, to the best of my knowledge, flies in the face of the MWG guidelines and most other advice.
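To make the problem concrete, here is a minimal Python sketch that parses a packet like the one above and flags dc:subject entries that look like hierarchy paths. It assumes "|" is the hierarchy delimiter (as in the Bridge output) and uses only the standard library; the sample packet wrapper and the function names `bag_items` and `suspicious_subjects` are hypothetical, for illustration only.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for the prefixes used in the snippet above.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "lr": "http://ns.adobe.com/lightroom/1.0/",
}

# Hypothetical sample packet reproducing the Bridge output shown above.
SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:lr="http://ns.adobe.com/lightroom/1.0/">
   <dc:subject>
    <rdf:Bag>
     <rdf:li>Nounours|Didier|Joanna</rdf:li>
     <rdf:li>Joanna</rdf:li>
    </rdf:Bag>
   </dc:subject>
   <lr:hierarchicalSubject>
    <rdf:Bag>
     <rdf:li>Nounours|Didier|Joanna</rdf:li>
     <rdf:li>Joanna</rdf:li>
    </rdf:Bag>
   </lr:hierarchicalSubject>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""


def bag_items(root, prop):
    """Return the text of every rdf:li inside the rdf:Bag of the given property."""
    return [li.text or "" for li in root.findall(f".//{prop}/rdf:Bag/rdf:li", NS)]


def suspicious_subjects(xmp, delimiter="|"):
    """List dc:subject entries that contain the hierarchy delimiter (assumed '|')."""
    root = ET.fromstring(xmp)
    return [s for s in bag_items(root, "dc:subject") if delimiter in s]


print(suspicious_subjects(SAMPLE))  # ['Nounours|Didier|Joanna']
```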

My question is: what do you do when you come across this kind of thing?

Do you "migrate" it to correctly formed metadata, or do you leave it to the poor chump who has to use the file after it has been through your app, and who then blames you because another application can't cope with it?

StarGeek

I would say don't change it unless the user explicitly asks for the change. Since it's an option that can be turned on and off, you have to assume the output is what the user wanted. It comes down to the line from the MWG guidance PDF: "Deletion of metadata MUST only be done with specific intent." You shouldn't "fix" something that the user may not want "fixed".

Joanna Carter

OK. Decision made.

I read subject tags whose entries are separated by any of the delimiters , | / \ : ; > < and split them into individual keywords for display, but I don't (yet) rewrite them.
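A minimal Python sketch of that read-only splitting step, assuming any of the listed characters acts as a separator and that duplicates should be shown once, in first-seen order. The names `DELIMITERS` and `display_keywords` are hypothetical.

```python
import re

# The delimiters listed above: , | / \ : ; > <
DELIMITERS = ",|/\\:;><"
# One character class matching any single delimiter; re.escape protects '|' and '\'.
_SPLIT = re.compile("[" + re.escape(DELIMITERS) + "]")


def display_keywords(subjects):
    """Split each subject entry on any delimiter and return unique, trimmed words in first-seen order."""
    seen = {}  # dict keeps insertion order, so it doubles as an ordered set
    for entry in subjects:
        for word in _SPLIT.split(entry):
            word = word.strip()
            if word:
                seen.setdefault(word, None)
    return list(seen)


print(display_keywords(["Nounours|Didier|Joanna", "Joanna"]))
print(display_keywords(["Places > France > Paris"]))
```

One design caveat: splitting on "," or "/" will also break a legitimate flat keyword such as "AC/DC", which is one more reason to keep this step display-only and leave the stored tag untouched until the user edits it.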

Usually, in these cases, the hierarchical subject isn't worth much because it just mirrors the subject and can be incomplete, so I tend to ignore it.
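One way to express the "just mirrors the subject" test, as a hedged sketch: treat the hierarchical subject as adding nothing when every one of its entries already appears verbatim in dc:subject. That also covers the incomplete case, where it holds only a subset. The function name is hypothetical.

```python
def hierarchy_adds_nothing(subject, hierarchical_subject):
    """True when every lr:hierarchicalSubject entry already appears verbatim in dc:subject."""
    return set(hierarchical_subject) <= set(subject)


# The Bridge example above: both bags are identical, so the hierarchy can be ignored.
print(hierarchy_adds_nothing(["Nounours|Didier|Joanna", "Joanna"],
                             ["Nounours|Didier|Joanna", "Joanna"]))
```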

If the user changes the keywords, I then write properly formed, separate keywords to the subject tag and an equally well-formed hierarchy to the hierarchical subject tag.
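A sketch of that rewrite, under two stated assumptions: every level of each path should also appear as its own flat keyword in dc:subject, and lr:hierarchicalSubject should hold only the full "|"-joined paths (whether to also list each ancestor path is a separate policy choice). `rewrite_keywords` and `rdf_bag` are hypothetical helpers that produce XMP text shaped like the snippet in the first post.

```python
from xml.sax.saxutils import escape


def rewrite_keywords(paths, sep="|"):
    """
    Given keyword paths (each a list of levels, root first), return
    (subject, hierarchical_subject):
      subject              - every distinct level as a separate flat keyword
      hierarchical_subject - each distinct path joined with sep
    """
    subject, hierarchy = [], []
    for path in paths:
        for level in path:
            if level not in subject:
                subject.append(level)
        joined = sep.join(path)
        if joined not in hierarchy:
            hierarchy.append(joined)
    return subject, hierarchy


def rdf_bag(prop, items, indent="   "):
    """Render a property as an rdf:Bag, escaping &, < and > in each item."""
    lines = [f"{indent}<{prop}>", f"{indent} <rdf:Bag>"]
    lines += [f"{indent}  <rdf:li>{escape(item)}</rdf:li>" for item in items]
    lines += [f"{indent} </rdf:Bag>", f"{indent}</{prop}>"]
    return "\n".join(lines)


subject, hierarchy = rewrite_keywords([["Nounours", "Didier", "Joanna"]])
print(rdf_bag("dc:subject", subject))
print(rdf_bag("lr:hierarchicalSubject", hierarchy))
```

With the example from the first post, dc:subject ends up holding Nounours, Didier and Joanna as separate entries, and the "|" path appears only in lr:hierarchicalSubject.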

Apart from now (correctly) displaying the separate keywords and the correct hierarchy, apps like Lightroom and Bridge have no problem reading the replacement format, and they display the same context that was there originally.