ExifTool Forum

ExifTool => Developers => Topic started by: Joanna Carter on September 02, 2022, 02:44:16 AM

Title: Apostrophes being encoded into XMP file
Post by: Joanna Carter on September 02, 2022, 02:44:16 AM
I'm working on Mac and trying to finalise the testing of my app.

I've just noticed that words with apostrophes in the -xmp:subject tag are being encoded differently in XMP sidecars than when written directly to image files.

If I use L'art de l'eau as an example, reading back an image file gives me...

Subject                         : L'art de l'eau

And reading back the XMP sidecar using ExifTool from the command line gives me...

Subject                         : L'art de l'eau

But reading the XMP sidecar, using a text editor, gives me...

    <rdf:li>L&#39;art de l&#39;eau</rdf:li>

This then means that Spotlight searches are unable to find L'art de l'eau and, since my app uses the Spotlight metadata engine, my app cannot search for words with apostrophes in XMP files.

Do I need to specify an encoding or something?
Title: Re: Apostrophes being encoded into XMP file
Post by: StarGeek on September 02, 2022, 05:50:07 PM
That is the way they're supposed to be encoded.  If you look at the raw XMP in the image file, you'll see it's encoded the same way:
C:\>exiftool -P -overwrite_original -all= -Subject="L'art de l'eau" y:\!temp\Test4.jpg
    1 image files updated

C:\>exiftool -G1 -a -s -xmp -b y:\!temp\Test4.jpg
<?xpacket begin='�' id='W5M0MpCehiHzreSzNTczkc9d'?>
<x:xmpmeta xmlns:x='adobe:ns:meta/' x:xmptk='Image::ExifTool 12.44'>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>

 <rdf:Description rdf:about=''
  xmlns:dc='http://purl.org/dc/elements/1.1/'>
  <dc:subject>
   <rdf:Bag>
    <rdf:li>L&#39;art de l&#39;eau</rdf:li>
   </rdf:Bag>
  </dc:subject>
 </rdf:Description>
</rdf:RDF>
</x:xmpmeta><?xpacket end='w'?>

XMP is based upon XML and there are some characters that need to be encoded.  See this StackOverflow answer (https://stackoverflow.com/a/1091953/3525475).
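
As a rough illustration of why the two forms are equivalent to an XML-aware reader, here is a minimal Python sketch (standard library only; the `li_text` helper is a made-up name for this example, not part of ExifTool). It parses one `rdf:li` element written with `&#39;` and one written with a literal apostrophe, then compares the decoded text:

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def li_text(inner):
    """Parse a single rdf:li element and return its decoded text."""
    xml = '<rdf:li xmlns:rdf="{}">{}</rdf:li>'.format(RDF_NS, inner)
    return ET.fromstring(xml).text

escaped_text = li_text("L&#39;art de l&#39;eau")  # numeric character reference
literal_text = li_text("L'art de l'eau")          # literal apostrophe

print(escaped_text)                  # L'art de l'eau
print(escaped_text == literal_text)  # True
```

Any reader that parses the XML sees the same value either way; only a reader that looks at the raw bytes sees a difference.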
Title: Re: Apostrophes being encoded into XMP file
Post by: Joanna Carter on September 03, 2022, 08:39:17 AM
Quote from: StarGeek on September 02, 2022, 05:50:07 PM
XMP is based upon XML and there are some characters that need to be encoded
Thanks. I appreciate and understand that.

Unfortunately, the problem comes when you try to use Spotlight in Finder to look for RAW files whose keywording is stored in an XMP sidecar.

If I have written a keyword with apostrophes to an image file (even to a RAW file), Spotlight finds all those files. But if I write apostrophes to an XMP sidecar file, they get encoded by ExifTool and Spotlight no longer finds them.

However, Adobe Bridge and Lightroom both write such words to XMP sidecars without escaping them:

   <dc:subject>
    <rdf:Bag>
     <rdf:li>L'art de l'eau</rdf:li>
    </rdf:Bag>
   </dc:subject>

As does DxO PhotoLab:

         <dc:subject>
            <rdf:Bag>
               <rdf:li>L'art de l'eau</rdf:li>
            </rdf:Bag>
         </dc:subject>

I agree that < and > need encoding, but my software, along with many others, uses them as hierarchy separation markers when inputting keywords, so they never get written to XMP sidecars.

The ampersand (&) is a bit of a weird one because it is accepted in image files as is, but not in sidecars without being encoded.

I would like to think that ExifTool could "agree" with Adobe and other authors in, at least, not encoding the apostrophe.
Title: Re: Apostrophes being encoded into XMP file
Post by: Phil Harvey on September 04, 2022, 09:40:26 PM
I've got a few questions:

1. How do Adobe Bridge/Lightroom write a single quote in a simple XMP property?  (i.e. not a Bag or Alt list.)  And do they use XMP shorthand format?  For example, writing -xmp-dc:identifier="it's a test" with the ExifTool -api compact=shorthand option results in this XMP code:

  dc:identifier='it&#39;s a test'
2. I think Adobe software uses shorthand format as above, but it is possible they use double quotes around the value.  If so, how is a value containing a double quote written, and can Spotlight search for this?

The bottom line is that there are times a quote must be encoded as an XML character entity, and it is a bug in Spotlight if it doesn't handle this properly.
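
To make the "must be encoded" case concrete, here is a small Python sketch using only the standard library. The `is_well_formed` helper and the `<d>` element are made up for illustration, and namespace prefixes are left out to keep it short. It suggests that an apostrophe is harmless in element content but breaks a single-quoted attribute value unless it is escaped:

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the standard-library parser accepts the fragment."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Apostrophe escaped inside a single-quoted attribute (shorthand style)
print(is_well_formed("<d identifier='it&#39;s a test'/>"))  # True

# Same value with a raw apostrophe: the attribute ends after "it"
print(is_well_formed("<d identifier='it's a test'/>"))      # False

# Raw apostrophe in element content is acceptable to the parser
print(is_well_formed("<d>it's a test</d>"))                 # True
```

So whether the apostrophe can be left alone depends on where the value ends up and which quote character delimits it.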

- Phil
Title: Re: Apostrophes being encoded into XMP file
Post by: Joanna Carter on September 06, 2022, 02:43:43 AM
Quote from: Phil Harvey on September 04, 2022, 09:40:26 PM
1. How do Adobe Bridge/Lightroom write a single quote in a simple XMP property?
I only have Bridge but it writes `xmp-dc:description` as...

   <dc:description>
    <rdf:Alt>
     <rdf:li xml:lang="x-default">It's a matter of "preference", Black &amp; White or Colour?</rdf:li>
    </rdf:Alt>
   </dc:description>

... and `xmp-dc:subject` as...

   <dc:subject>
    <rdf:Bag>
     <rdf:li>L'art de l'eau</rdf:li>
     <rdf:li>This is a "quote"</rdf:li>
     <rdf:li>Black &amp; White</rdf:li>
    </rdf:Bag>
   </dc:subject>

Allowing the single and double quotes but encoding the ampersand.

Quote from: Phil Harvey on September 04, 2022, 09:40:26 PM
and can Spotlight search for this?

Spotlight regards an XMP file as a "simple" text file and searches its contents "literally" without any regard for encoded characters.

On the other hand, it searches for metadata in image files according to its own, extensive, list of metadata keys.

Building a search predicate, in Swift code, for keywords in any image file looks like this...

let rawKeywordsMetadataPredicate = NSPredicate(fromMetadataQueryString: "((_kMDItemGroupId = 13) && (kMDItemKeywords = \"\(rawKeyword)\"cd))")!

The `_kMDItemGroupId = 13` constant indicates any type of image file and `kMDItemKeywords` indicates the `xmp-dc:subject` tag.

Whereas, for the same thing in an XMP file...

let xmpKeywordsMetadataPredicate = NSPredicate(fromMetadataQueryString: "((kMDItemContentTypeTree = 'public.xml'cd) && (kMDItemTextContent = \"\(xmpKeyword)\"cd))")

Note the `kMDItemTextContent` key, which indicates the predicate is based purely on "plain" text.

Quote from: Phil Harvey on September 04, 2022, 09:40:26 PM
and it is a bug in Spotlight if it doesn't handle this properly

Not so much a bug, simply that it has never treated XMP files as anything other than a simple text file. One can but hope that could change.
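
The difference between a literal text search and a search on decoded values can be sketched in Python. This is only a stand-in for the behaviour described above, not Spotlight's actual implementation, and `literal_search` and `decoded_search` are hypothetical names:

```python
import html

# One line of a sidecar as ExifTool writes it, and the keyword a user types
sidecar_line = "<rdf:li>L&#39;art de l&#39;eau</rdf:li>"
keyword = "L'art de l'eau"

def literal_search(text, term):
    """Plain substring match on the raw file text (no decoding)."""
    return term in text

def decoded_search(text, term):
    """Decode character references first, then do the substring match."""
    # html.unescape is used here as a convenient decoder for &#39; and similar
    return term in html.unescape(text)

print(literal_search(sidecar_line, keyword))  # False
print(decoded_search(sidecar_line, keyword))  # True
```

A search engine that indexes the sidecar as plain text behaves like the first function, which is why the encoded apostrophe is never matched.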
Title: Re: Apostrophes being encoded into XMP file
Post by: Joanna Carter on September 07, 2022, 06:30:37 PM
I stumbled across this interesting thread (https://exiftool.org/forum/index.php?topic=10101.msg52599)

I tried out using the `sed` command and got perfectly usable XMP files by substituting the apostrophe symbol for `&apos;`, which means that Spotlight is perfectly happy searching for words with apostrophes in XMP files.
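
For anyone who would rather not shell out to `sed`, the same kind of post-processing can be sketched in Python. This is a sketch under assumptions, not the command from the linked thread: the function name is made up, it treats `&#39;`, `&#x27;` and `&apos;` as the apostrophe forms to replace, and it only touches text between tags so that single-quoted attribute values stay escaped. It also assumes attribute values never contain a raw `<` or `>`.

```python
import re

# Apostrophe written as a numeric or named character reference
APOSTROPHE_REF = re.compile(r"&(?:#39|#x27|apos);")

# Runs of text that sit between the end of one tag and the start of the next
TEXT_NODE = re.compile(r"(?<=>)[^<>]*(?=<)")

def unescape_apostrophes_in_text(xmp):
    """Replace encoded apostrophes in element content only.

    Attribute values are left untouched, because a raw apostrophe inside
    a single-quoted attribute would make the XML ill-formed.
    """
    def fix(match):
        return APOSTROPHE_REF.sub("'", match.group(0))
    return TEXT_NODE.sub(fix, xmp)

print(unescape_apostrophes_in_text("<rdf:li>L&#39;art de l&#39;eau</rdf:li>"))
```

Other references such as `&amp;` are deliberately left alone, since decoding those in place would break the file.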

In my experiments, it became apparent that it is also possible to substitute the double quote (") as well, but this then causes problems with Spotlight, even though apps like Bridge and ExifTool can cope with reading an XMP file like this...

   <dc:subject>
    <rdf:Bag>
     <rdf:li>L'art de l'eau</rdf:li>
     <rdf:li>Here's a "quote"</rdf:li>
    </rdf:Bag>
   </dc:subject>

After testing, I agree that ", <, > and & need to be escaped, and avoided in values to be written, but is there any chance that just the apostrophe can be written unescaped (maybe with an option switch)?

This is quite important for non-English languages like French. I have done some fairly exhaustive testing and have not found any problem with using it unescaped.

And, as you can see from the thread I quoted, it does seem to be a requirement for some folks.
Title: Re: Apostrophes being encoded into XMP file
Post by: StarGeek on September 07, 2022, 09:21:06 PM
Have you tested with the -api Compact option (https://exiftool.org/ExifTool.html#Compact), as Phil pointed out?  As an example, you can see here that Location has its data between single quotes inside the </> symbols:
C:\>exiftool -P -overwrite_original -api compact=all -all= -location="Provence-Alpes-Côte d'Azur (plus double quote \")" y:\!temp\Test4.jpg
    1 image files updated

C:\>exiftool -G1 -a -s -b -xmp y:\!temp\Test4.jpg
<?xpacket begin='�' id='W5M0MpCehiHzreSzNTczkc9d'?>
<x:xmpmeta xmlns:x='adobe:ns:meta/' x:xmptk='Image::ExifTool 12.44'>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
<rdf:Description rdf:about=''
xmlns:Iptc4xmpCore='http://iptc.org/std/Iptc4xmpCore/1.0/xmlns/'
Iptc4xmpCore:Location='Provence-Alpes-Côte d&#39;Azur (plus double quote &quot;)'/></rdf:RDF>
</x:xmpmeta>
<?xpacket end='w'?>
Title: Re: Apostrophes being encoded into XMP file
Post by: Joanna Carter on September 08, 2022, 05:49:26 AM
Quote from: StarGeek on September 07, 2022, 09:21:06 PM
Iptc4xmpCore:Location='Provence-Alpes-Côte d&#39;Azur (plus double quote &quot;)'/>

And here is the problem. Type Provence-Alpes-Côte d'Azur into Spotlight and it will not find the file because it is essentially just a textual search.