Issues writing a config that can read custom XMP data

Started by hsummrs, September 05, 2023, 02:33:24 PM


hsummrs

Hi, I'm trying to use exiftool to read/write Calibre metadata in PDF files. I wrote a mostly functional config for this: it writes all of the data fine, but I can't get it to read the data back properly.

Here is the XMP data from a PDF that I exported from Calibre (I removed any non-Calibre data):
<?xpacket begin="﻿" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description xmlns:calibreCC="http://calibre-ebook.com/xmp-namespace-custom-columns" xmlns:calibre="http://calibre-ebook.com/xmp-namespace" rdf:about="">
      <calibre:rating>8</calibre:rating>
      <calibre:timestamp>0100-12-31T19:00:00-05:00</calibre:timestamp>
      <calibre:title_sort>Stalin: History and Critique of a Black Legend</calibre:title_sort>
      <calibre:author_sort>Losurdo, Domenico &amp; Hakamäki, Henry &amp; Mauro, Salvatore Engel-Di &amp; Losurdo, Domenico &amp; Hakamäki, Henry &amp; Mauro, Salvatore Engel-Di</calibre:author_sort>
      <calibre:custom_metadata>
        <rdf:Bag>
          <rdf:li rdf:parseType="Resource">
            <calibreCC:name>#subtitle</calibreCC:name>
            <rdf:value>{"table": "custom_column_2", "column": "value", "datatype": "text", "is_multiple": null, "kind": "field", "name": "Subtitle", "search_terms": ["#subtitle"], "label": "subtitle", "colnum": 2, "display": {"use_decorations": false, "description": ""}, "is_custom": true, "is_category": true, "link_column": "value", "category_sort": "value", "is_csp": false, "is_editable": true, "rec_index": 22, "#value#": null, "#extra#": null, "is_multiple2": {}}</rdf:value>
          </rdf:li>
          <rdf:li rdf:parseType="Resource">
            <calibreCC:name>#translators</calibreCC:name>
            <rdf:value>{"table": "custom_column_1", "column": "value", "datatype": "text", "is_multiple": "|", "kind": "field", "name": "Translators", "search_terms": ["#translators"], "label": "translators", "colnum": 1, "display": {"is_names": true, "description": ""}, "is_custom": true, "is_category": true, "link_column": "value", "category_sort": "value", "is_csp": false, "is_editable": true, "rec_index": 23, "#value#": [], "#extra#": null, "is_multiple2": {"cache_to_list": "|", "ui_to_list": "&amp;", "list_to_ui": " &amp; "}}</rdf:value>
          </rdf:li>
        </rdf:Bag>
      </calibre:custom_metadata>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>

And here's the config I wrote:
%Image::ExifTool::UserDefined = (
  'Image::ExifTool::XMP::Main' => {
    calibre => {
      SubDirectory => {
        TagTable => 'Image::ExifTool::UserDefined::calibre',
      },
    },
  },
);

%Image::ExifTool::UserDefined::calibre = (
  GROUPS => { 0 => 'XMP', 1 => 'XMP-calibre', 2 => 'Document' },
  WRITABLE => 'string',
  NAMESPACE => {
    'calibre' => 'http://calibre-ebook.com/xmp-namespace',
    'calibreCC' => 'http://calibre-ebook.com/xmp-namespace-custom-columns',
    'calibreSI' => 'http://calibre-ebook.com/xmp-namespace-series-index'
  },

  rating => {},
  timestamp => {},
  title_sort => {},
  author_sort => {},
  series => {},
  series_index => {},
  link_maps => {},
  user_categories => {},
  author_link_map => {},
  custom_metadata => {
    List => 'Bag',
    Struct => {
      name => { Namespace => 'calibreCC' },
      value => { Namespace => 'rdf' }
    }
  }
);


%Image::ExifTool::UserDefined::Options = (
  Duplicates => 1, # make -a default for the exiftool app
  RequestAll => 3, # request additional tags not normally generated
  Struct => 2,
  StructFormat => 'JSON'
);

1;  #end

With this config I can write everything normally, including the custom metadata fields:

exiftool \
  '-XMP-calibre:custom_metadata'+="{name=#subtitle, value=|{\"key\": \"value\"|}}" \
  '-XMP-calibre:custom_metadata'+="{name=#translator, value=|{\"key\": \"value\"|}}" \
  '-XMP-calibre:rating'='8' \
  '-XMP-calibre:timestamp'='0100-12-31T19:00:00-05:00' \
  '-XMP-calibre:title_sort'='Stalin: History and Critique of a Black Legend' \
  '-XMP-calibre:author_sort'='Losurdo, Domenico & Hakamäki, Henry & Mauro, Salvatore Engel-Di & Losurdo, Domenico & Hakamäki, Henry & Mauro, Salvatore Engel-Di' \
  ./src-calibre.pdf


Writes the following XMP data (again trimmed):
<?xpacket begin='﻿' id='W5M0MpCehiHzreSzNTczkc9d'?>
<x:xmpmeta xmlns:x='adobe:ns:meta/' x:xmptk='Image::ExifTool 12.60'>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
 <rdf:Description rdf:about=''
  xmlns:calibre='http://calibre-ebook.com/xmp-namespace'
  xmlns:calibreCC='http://calibre-ebook.com/xmp-namespace-custom-columns'>
  <calibre:author_sort>Losurdo, Domenico &amp; Hakamäki, Henry &amp; Mauro, Salvatore Engel-Di &amp; Losurdo, Domenico &amp; Hakamäki, Henry &amp; Mauro, Salvatore Engel-Di</calibre:author_sort>
  <calibre:custom_metadata>
   <rdf:Bag>
    <rdf:li rdf:parseType='Resource'>
     <calibreCC:name>#subtitle</calibreCC:name>
     <rdf:value>{&quot;key&quot;: &quot;value&quot;}</rdf:value>
    </rdf:li>
    <rdf:li rdf:parseType='Resource'>
     <calibreCC:name>#translator</calibreCC:name>
     <rdf:value>{&quot;key&quot;: &quot;value&quot;}</rdf:value>
    </rdf:li>
   </rdf:Bag>
  </calibre:custom_metadata>
  <calibre:rating>8</calibre:rating>
  <calibre:timestamp>0100-12-31T19:00:00-05:00</calibre:timestamp>
  <calibre:title_sort>Stalin: History and Critique of a Black Legend</calibre:title_sort>
 </rdf:Description>
</rdf:RDF>
</x:xmpmeta>
<?xpacket end='w'?>

The quotes in the values come out escaped as &quot;, but I thought I'd figure that out later. The real problem is reading the custom metadata. I cannot get exiftool to read it as a list of structs, or even as two separate lists of names and values. With the above config only the names are read:
[XMP-calibre]   Custom metadata                 : [{Name=#subtitle},{Name=#translator}]
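
On the escaped quotes mentioned above: in XML character data, &quot; is just another spelling of a double quote, so a conforming parser should hand back the same string either way. The following is a minimal Python sketch of that check, not anything exiftool does itself. The `value_text` helper and the two fragments are made up for illustration and are modelled on the rdf:value lines shown above.

```python
import json
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# One list item in the style ExifTool wrote (&quot; entities in the value).
escaped = (
    "<rdf:li xmlns:rdf='" + RDF + "' rdf:parseType='Resource'>"
    "<rdf:value>{&quot;key&quot;: &quot;value&quot;}</rdf:value>"
    "</rdf:li>"
)
# The same item with literal quotes, the style Calibre uses.
literal = escaped.replace("&quot;", '"')


def value_text(fragment):
    """Return the decoded text of the rdf:value child (hypothetical helper)."""
    return ET.fromstring(fragment).find(f"{{{RDF}}}value").text


# Both spellings decode to the same string, which is still valid JSON.
assert value_text(escaped) == value_text(literal)
decoded = json.loads(value_text(escaped))
print(decoded)
```

If that holds for Calibre's own reader as well (an assumption, not verified here), the escaping is cosmetic and only the missing values on read are a real problem.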

If I remove "Struct => { ... }" from the config, the values are read as a list, but each name seems to come out as a separate duplicate tag (I think; I am not an expert on exiftool's output). There are also warnings in the output:
[ExifTool]      Warning                         : Custom_metadata is not a structure!
[ExifTool]      Warning                         : Custom_metadata is not a structure!
[XMP-calibre]   Custom metadata Name            : #subtitle
[XMP-calibre]   Custom metadata                 : [|{"key": "value"},|{"key": "value"}]
[XMP-calibre]   Custom metadata Name            : #translator

I read through the source files XMP.pm, XMP2.pl, and XMPStruct.pl and found some uses of "Resource => 1" but didn't have any luck with that either. I am really just at a loss here. I feel like it has something to do with the use of "<rdf:value>". Maybe the way Calibre uses it is improper?


Edit: I just noticed some warnings about XMP:Identifier and it seems like someone has tried to solve a similar problem in the past.

Here is what the Identifier element looks like in a Calibre-modified PDF:
<xmp:Identifier>
        <rdf:Bag>
          <rdf:li rdf:parseType="Resource">
            <xmpidq:Scheme>goodreads</xmpidq:Scheme>
            <rdf:value>193770583</rdf:value>
          </rdf:li>
          <rdf:li rdf:parseType="Resource">
            <xmpidq:Scheme>barnesnoble</xmpidq:Scheme>
            <rdf:value>1143809944</rdf:value>
          </rdf:li>
          <rdf:li rdf:parseType="Resource">
            <xmpidq:Scheme>isbn</xmpidq:Scheme>
            <rdf:value>9781087884714</rdf:value>
          </rdf:li>
          <rdf:li rdf:parseType="Resource">
            <xmpidq:Scheme>amazon</xmpidq:Scheme>
            <rdf:value>1087884713</rdf:value>
          </rdf:li>
          <rdf:li rdf:parseType="Resource">
            <xmpidq:Scheme>google</xmpidq:Scheme>
            <rdf:value>s50N0AEACAAJ</rdf:value>
          </rdf:li>
        </rdf:Bag>
      </xmp:Identifier>

And the result of running exiftool:
[ExifTool]      Warning                         : Identifier is not a structure!
[ExifTool]      Warning                         : Identifier is not a structure!
[ExifTool]      Warning                         : Identifier is not a structure!
[ExifTool]      Warning                         : Identifier is not a structure!
[ExifTool]      Warning                         : Identifier is not a structure!
[XMP]           Identifier Scheme               : goodreads
[XMP]           Identifier                      : [193770583,1143809944,9781087884714,1087884713,s50N0AEACAAJ]
[XMP]           Identifier Scheme               : barnesnoble
[XMP]           Identifier Scheme               : isbn
[XMP]           Identifier Scheme               : amazon
[XMP]           Identifier Scheme               : google

So I looked for the definition of Identifier and found some commented-out lines at XMP.pm:378:
#my %sIdentifierScheme = (
#    NAMESPACE   => 'xmpidq',
#    Scheme      => { }, # qualifier for xmp:Identifier only
#);

So it appears somebody has tried to solve this in the past.

StarGeek

I'm not ignoring this, it's just that it's a complicated question and will take a significant amount of time to break down and try to figure out.

hsummrs

Quote from: StarGeek on September 07, 2023, 11:40:11 PM
I'm not ignoring this, it's just that it's a complicated question and will take a significant amount of time to break down and try to figure out.

I understand. I've been researching this since I posted, and this pattern seems to be called a "qualifier" in XMP; Adobe's XMP spec has some information about it. For what it's worth, it appears to be proper, though rare, usage of XMP and RDF, so changes to exiftool might be worthwhile if this isn't solvable via the config file. I'll include links to some of the resources I read, but I'd be glad to explain exactly what's going on RDF-wise if you'd like.
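
To make the pattern concrete: each rdf:li carries its real value in rdf:value, and every other child element (here calibreCC:name) qualifies that value. Below is a rough Python sketch that pairs them up with a generic XML parser, as a stopgap outside exiftool. The function name `qualified_items` and the `SAMPLE` packet are hypothetical; the sample is a cut-down copy of the Calibre XMP from the first post with the JSON values shortened.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CALIBRE = "http://calibre-ebook.com/xmp-namespace"

# Cut-down copy of the Calibre packet; the JSON values are shortened.
SAMPLE = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:calibre="http://calibre-ebook.com/xmp-namespace"
      xmlns:calibreCC="http://calibre-ebook.com/xmp-namespace-custom-columns">
    <calibre:custom_metadata>
      <rdf:Bag>
        <rdf:li rdf:parseType="Resource">
          <calibreCC:name>#subtitle</calibreCC:name>
          <rdf:value>{"label": "subtitle"}</rdf:value>
        </rdf:li>
        <rdf:li rdf:parseType="Resource">
          <calibreCC:name>#translators</calibreCC:name>
          <rdf:value>{"label": "translators"}</rdf:value>
        </rdf:li>
      </rdf:Bag>
    </calibre:custom_metadata>
  </rdf:Description>
</rdf:RDF>
"""


def qualified_items(xml_text, prop_ns, prop_name):
    """Yield (value, qualifiers) for each rdf:li of the named property.

    rdf:value holds the value itself; every other child of the list item
    is treated as a qualifier and keyed by its local name.
    """
    root = ET.fromstring(xml_text)
    for prop in root.iter(f"{{{prop_ns}}}{prop_name}"):
        for li in prop.iter(f"{{{RDF}}}li"):
            value = None
            qualifiers = {}
            for child in li:
                if child.tag == f"{{{RDF}}}value":
                    value = child.text
                else:
                    local_name = child.tag.split("}", 1)[1]
                    qualifiers[local_name] = child.text
            yield value, qualifiers


items = list(qualified_items(SAMPLE, CALIBRE, "custom_metadata"))
for value, qualifiers in items:
    print(qualifiers, "->", value)
```

The same function should work for xmp:Identifier with its xmpidq:Scheme qualifier if given the matching namespace URI. Feeding it a packet pulled out of the PDF with something like "exiftool -xmp -b" is an assumption I have not tested against real files.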

Resources:

Phil Harvey

Sorry for the delay in responding.  It has taken me a while to catch up after my vacation.

You are correct that ExifTool doesn't yet have the ability to handle XMP qualifiers.   This has been on my to-do list for a long time now.  As you suspected, they will have to be handled in a similar way to structures.

Adding this ability will be quite a bit of work, and qualifiers aren't commonly used so there hasn't been much pressure for me to work on this yet, but I will move it up in my to-do list.

- Phil