Hi, I'm trying to use exiftool to read and write Calibre (https://www.calibre-ebook.com/) metadata in PDF files. I wrote a config that writes all of the data more or less correctly, but I can't get it to read the data back properly.
Here is the XMP data from a PDF that I exported from Calibre (I removed any non-Calibre data):
<?xpacket begin="﻿" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description xmlns:calibreCC="http://calibre-ebook.com/xmp-namespace-custom-columns" xmlns:calibre="http://calibre-ebook.com/xmp-namespace" rdf:about="">
      <calibre:rating>8</calibre:rating>
      <calibre:timestamp>0100-12-31T19:00:00-05:00</calibre:timestamp>
      <calibre:title_sort>Stalin: History and Critique of a Black Legend</calibre:title_sort>
      <calibre:author_sort>Losurdo, Domenico & Hakamäki, Henry & Mauro, Salvatore Engel-Di & Losurdo, Domenico & Hakamäki, Henry & Mauro, Salvatore Engel-Di</calibre:author_sort>
      <calibre:custom_metadata>
        <rdf:Bag>
          <rdf:li rdf:parseType="Resource">
            <calibreCC:name>#subtitle</calibreCC:name>
            <rdf:value>{"table": "custom_column_2", "column": "value", "datatype": "text", "is_multiple": null, "kind": "field", "name": "Subtitle", "search_terms": ["#subtitle"], "label": "subtitle", "colnum": 2, "display": {"use_decorations": false, "description": ""}, "is_custom": true, "is_category": true, "link_column": "value", "category_sort": "value", "is_csp": false, "is_editable": true, "rec_index": 22, "#value#": null, "#extra#": null, "is_multiple2": {}}</rdf:value>
          </rdf:li>
          <rdf:li rdf:parseType="Resource">
            <calibreCC:name>#translators</calibreCC:name>
            <rdf:value>{"table": "custom_column_1", "column": "value", "datatype": "text", "is_multiple": "|", "kind": "field", "name": "Translators", "search_terms": ["#translators"], "label": "translators", "colnum": 1, "display": {"is_names": true, "description": ""}, "is_custom": true, "is_category": true, "link_column": "value", "category_sort": "value", "is_csp": false, "is_editable": true, "rec_index": 23, "#value#": [], "#extra#": null, "is_multiple2": {"cache_to_list": "|", "ui_to_list": "&", "list_to_ui": " & "}}</rdf:value>
          </rdf:li>
        </rdf:Bag>
      </calibre:custom_metadata>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
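To make the intended pairing concrete, here is a minimal Python sketch (standard library only, not part of exiftool) that reads the custom_metadata bag directly from the XMP packet and keeps each calibreCC:name together with the JSON decoded from its rdf:value. The helper name read_custom_metadata is hypothetical, and the JSON payloads are shortened to three keys each. The raw packet of a real PDF could be dumped with "exiftool -xmp -b file.pdf".

```python
import json
import xml.etree.ElementTree as ET

# Namespace URIs copied from the Calibre XMP packet above.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "calibre": "http://calibre-ebook.com/xmp-namespace",
    "calibreCC": "http://calibre-ebook.com/xmp-namespace-custom-columns",
}

# Trimmed copy of the packet; the JSON payloads are shortened for brevity.
XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
    xmlns:calibre="http://calibre-ebook.com/xmp-namespace"
    xmlns:calibreCC="http://calibre-ebook.com/xmp-namespace-custom-columns">
   <calibre:custom_metadata>
    <rdf:Bag>
     <rdf:li rdf:parseType="Resource">
      <calibreCC:name>#subtitle</calibreCC:name>
      <rdf:value>{"datatype": "text", "is_multiple": null, "label": "subtitle"}</rdf:value>
     </rdf:li>
     <rdf:li rdf:parseType="Resource">
      <calibreCC:name>#translators</calibreCC:name>
      <rdf:value>{"datatype": "text", "is_multiple": "|", "label": "translators"}</rdf:value>
     </rdf:li>
    </rdf:Bag>
   </calibre:custom_metadata>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""


def read_custom_metadata(xmp: str) -> dict:
    """Map each calibreCC:name to the JSON decoded from its rdf:value."""
    root = ET.fromstring(xmp)
    columns = {}
    for li in root.iterfind(".//calibre:custom_metadata/rdf:Bag/rdf:li", NS):
        name = li.findtext("calibreCC:name", namespaces=NS)
        value = li.findtext("rdf:value", namespaces=NS)
        # rdf:value holds a JSON document; keep it attached to its name.
        columns[name] = json.loads(value) if value else None
    return columns


if __name__ == "__main__":
    for name, column in read_custom_metadata(XMP).items():
        print(name, column["label"], column["is_multiple"])
```

Running it prints each column name with its label and list separator, which is the name/value association that the flattened exiftool output lower in this thread loses.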
And here's the config I wrote:
%Image::ExifTool::UserDefined = (
    'Image::ExifTool::XMP::Main' => {
        calibre => {
            SubDirectory => {
                TagTable => 'Image::ExifTool::UserDefined::calibre',
            },
        },
    },
);
%Image::ExifTool::UserDefined::calibre = (
    GROUPS => { 0 => 'XMP', 1 => 'XMP-calibre', 2 => 'Document' },
    WRITABLE => 'string',
    NAMESPACE => {
        'calibre' => 'http://calibre-ebook.com/xmp-namespace',
        'calibreCC' => 'http://calibre-ebook.com/xmp-namespace-custom-columns',
        'calibreSI' => 'http://calibre-ebook.com/xmp-namespace-series-index'
    },
    rating => {},
    timestamp => {},
    title_sort => {},
    author_sort => {},
    series => {},
    series_index => {},
    link_maps => {},
    user_categories => {},
    author_link_map => {},
    custom_metadata => {
        List => 'Bag',
        Struct => {
            name => { Namespace => 'calibreCC' },
            value => { Namespace => 'rdf' }
        }
    }
);
%Image::ExifTool::UserDefined::Options = (
    Duplicates => 1, # make -a default for the exiftool app
    RequestAll => 3, # request additional tags not normally generated
    Struct => 2,
    StructFormat => 'JSON'
);
1; #end
With this config I can write everything normally, including the custom metadata fields:
exiftool \
  '-XMP-calibre:custom_metadata'+="{name=#subtitle, value=|{\"key\": \"value\"|}}" \
  '-XMP-calibre:custom_metadata'+="{name=#translator, value=|{\"key\": \"value\"|}}" \
  '-XMP-calibre:rating'='8' \
  '-XMP-calibre:timestamp'='0100-12-31T19:00:00-05:00' \
  '-XMP-calibre:title_sort'='Stalin: History and Critique of a Black Legend' \
  '-XMP-calibre:author_sort'='Losurdo, Domenico & Hakamäki, Henry & Mauro, Salvatore Engel-Di & Losurdo, Domenico & Hakamäki, Henry & Mauro, Salvatore Engel-Di' \
  ./src-calibre.pdf
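Because the quoting trouble comes from assembling the serialized structure by hand in the shell, a small Python sketch can build the argument instead. It assumes the escaping used in the command above, a leading "|" before the special characters of exiftool's structure syntax; the exact character set below is an assumption rather than a copy of the exiftool documentation. escape_struct_value and custom_metadata_arg are hypothetical helpers.

```python
import json

# Characters assumed to need a leading "|" inside an exiftool serialized
# structure value (the command above escapes "{" and "}" this way).
SPECIAL = set("|,[]{}")


def escape_struct_value(text: str) -> str:
    """Prefix each assumed-special character with '|'."""
    return "".join("|" + ch if ch in SPECIAL else ch for ch in text)


def custom_metadata_arg(name: str, column: dict) -> str:
    """Build one '-XMP-calibre:custom_metadata+=' argument (hypothetical helper)."""
    value = escape_struct_value(json.dumps(column))
    # Doubled braces in the f-string produce literal "{" and "}".
    return f"-XMP-calibre:custom_metadata+={{name={escape_struct_value(name)}, value={value}}}"


if __name__ == "__main__":
    arg = custom_metadata_arg("#subtitle", {"key": "value"})
    print(arg)
    # Passing the argument as a list element (no shell) avoids the \" quoting:
    # subprocess.run(["exiftool", arg, "./src-calibre.pdf"], check=True)
```

For {"key": "value"} this reproduces the hand-written argument above. Real Calibre JSON also contains commas, which is why the sketch escapes them too.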
This writes the following XMP data (again trimmed):
<?xpacket begin='﻿' id='W5M0MpCehiHzreSzNTczkc9d'?>
<x:xmpmeta xmlns:x='adobe:ns:meta/' x:xmptk='Image::ExifTool 12.60'>
  <rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
    <rdf:Description rdf:about=''
        xmlns:calibre='http://calibre-ebook.com/xmp-namespace'
        xmlns:calibreCC='http://calibre-ebook.com/xmp-namespace-custom-columns'>
      <calibre:author_sort>Losurdo, Domenico & Hakamäki, Henry & Mauro, Salvatore Engel-Di & Losurdo, Domenico & Hakamäki, Henry & Mauro, Salvatore Engel-Di</calibre:author_sort>
      <calibre:custom_metadata>
        <rdf:Bag>
          <rdf:li rdf:parseType='Resource'>
            <calibreCC:name>#subtitle</calibreCC:name>
            <rdf:value>{"key": "value"}</rdf:value>
          </rdf:li>
          <rdf:li rdf:parseType='Resource'>
            <calibreCC:name>#translator</calibreCC:name>
            <rdf:value>{"key": "value"}</rdf:value>
          </rdf:li>
        </rdf:Bag>
      </calibre:custom_metadata>
      <calibre:rating>8</calibre:rating>
      <calibre:timestamp>0100-12-31T19:00:00-05:00</calibre:timestamp>
      <calibre:title_sort>Stalin: History and Critique of a Black Legend</calibre:title_sort>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
<?xpacket end='w'?>
There are some issues with quotes being escaped, but I thought I'd figure those out later. The real problem is reading the custom metadata: I cannot get exiftool to read it as a list of structs, or even as two separate lists of names and values. With the above config, only the names are read:
[XMP-calibre] Custom metadata : [{Name=#subtitle},{Name=#translator}]
If I remove "Struct => { ... }" from the config, it reads the values as a list, but each name seems to be read as a separate duplicate tag (I'm not an expert on exiftool's output, though). The output also contains warnings:
[ExifTool] Warning : Custom_metadata is not a structure!
[ExifTool] Warning : Custom_metadata is not a structure!
[XMP-calibre] Custom metadata Name : #subtitle
[XMP-calibre] Custom metadata : [|{"key": "value"},|{"key": "value"}]
[XMP-calibre] Custom metadata Name : #translator
I read through the source files XMP.pm (https://raw.githubusercontent.com/exiftool/exiftool/master/lib/Image/ExifTool/XMP.pm), XMP2.pl (https://raw.githubusercontent.com/exiftool/exiftool/master/lib/Image/ExifTool/XMP2.pl), and XMPStruct.pl (https://raw.githubusercontent.com/exiftool/exiftool/master/lib/Image/ExifTool/XMPStruct.pl) and found some uses of "Resource => 1", but had no luck with that either. I'm at a loss here. I suspect the problem has something to do with the use of <rdf:value>. Maybe the way Calibre uses it is improper?
Edit: I just noticed some warnings about XMP:Identifier, and it seems someone has tried to solve a similar problem in the past.
Here is what the Identifier element looks like in a Calibre-modified PDF:
<xmp:Identifier>
  <rdf:Bag>
    <rdf:li rdf:parseType="Resource">
      <xmpidq:Scheme>goodreads</xmpidq:Scheme>
      <rdf:value>193770583</rdf:value>
    </rdf:li>
    <rdf:li rdf:parseType="Resource">
      <xmpidq:Scheme>barnesnoble</xmpidq:Scheme>
      <rdf:value>1143809944</rdf:value>
    </rdf:li>
    <rdf:li rdf:parseType="Resource">
      <xmpidq:Scheme>isbn</xmpidq:Scheme>
      <rdf:value>9781087884714</rdf:value>
    </rdf:li>
    <rdf:li rdf:parseType="Resource">
      <xmpidq:Scheme>amazon</xmpidq:Scheme>
      <rdf:value>1087884713</rdf:value>
    </rdf:li>
    <rdf:li rdf:parseType="Resource">
      <xmpidq:Scheme>google</xmpidq:Scheme>
      <rdf:value>s50N0AEACAAJ</rdf:value>
    </rdf:li>
  </rdf:Bag>
</xmp:Identifier>
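For comparison, the scheme/value pairing for xmp:Identifier can be recovered with the Python standard library. This sketch adds the namespace declarations that the snippet omits, assuming the standard xmp and xmpidq namespace URIs; identifiers_by_scheme is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Assumed standard URIs for the prefixes used in the snippet.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "xmp": "http://ns.adobe.com/xap/1.0/",
    "xmpidq": "http://ns.adobe.com/xmp/Identifier/qual/1.0/",
}

# The snippet above, wrapped in an rdf:Description that declares the namespaces.
SNIPPET = """<rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:xmp="http://ns.adobe.com/xap/1.0/"
    xmlns:xmpidq="http://ns.adobe.com/xmp/Identifier/qual/1.0/">
 <xmp:Identifier>
  <rdf:Bag>
   <rdf:li rdf:parseType="Resource">
    <xmpidq:Scheme>goodreads</xmpidq:Scheme><rdf:value>193770583</rdf:value>
   </rdf:li>
   <rdf:li rdf:parseType="Resource">
    <xmpidq:Scheme>barnesnoble</xmpidq:Scheme><rdf:value>1143809944</rdf:value>
   </rdf:li>
   <rdf:li rdf:parseType="Resource">
    <xmpidq:Scheme>isbn</xmpidq:Scheme><rdf:value>9781087884714</rdf:value>
   </rdf:li>
   <rdf:li rdf:parseType="Resource">
    <xmpidq:Scheme>amazon</xmpidq:Scheme><rdf:value>1087884713</rdf:value>
   </rdf:li>
   <rdf:li rdf:parseType="Resource">
    <xmpidq:Scheme>google</xmpidq:Scheme><rdf:value>s50N0AEACAAJ</rdf:value>
   </rdf:li>
  </rdf:Bag>
 </xmp:Identifier>
</rdf:Description>"""


def identifiers_by_scheme(xml_text: str) -> dict:
    """Pair each xmpidq:Scheme qualifier with the rdf:value it qualifies."""
    root = ET.fromstring(xml_text)
    pairs = {}
    for li in root.iterfind("xmp:Identifier/rdf:Bag/rdf:li", NS):
        scheme = li.findtext("xmpidq:Scheme", namespaces=NS)
        pairs[scheme] = li.findtext("rdf:value", namespaces=NS)
    return pairs


if __name__ == "__main__":
    for scheme, value in identifiers_by_scheme(SNIPPET).items():
        print(scheme, value)
```

Each scheme stays attached to its own identifier, whereas the exiftool output below reports the schemes as separate tags and the values as one flat list.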
And the result of running exiftool:
[ExifTool] Warning : Identifier is not a structure!
[ExifTool] Warning : Identifier is not a structure!
[ExifTool] Warning : Identifier is not a structure!
[ExifTool] Warning : Identifier is not a structure!
[ExifTool] Warning : Identifier is not a structure!
[XMP] Identifier Scheme : goodreads
[XMP] Identifier : [193770583,1143809944,9781087884714,1087884713,s50N0AEACAAJ]
[XMP] Identifier Scheme : barnesnoble
[XMP] Identifier Scheme : isbn
[XMP] Identifier Scheme : amazon
[XMP] Identifier Scheme : google
So I looked for the definition of Identifier and found some commented-out lines at XMP.pm:378 (https://github.com/exiftool/exiftool/blob/c5d5eae9fb6924ee859f4b62b5472e242cfb7662/lib/Image/ExifTool/XMP.pm#L378):
#my %sIdentifierScheme = (
# NAMESPACE => 'xmpidq',
# Scheme => { }, # qualifier for xmp:Identifier only
#);
So it appears somebody has tried to solve this in the past.
I'm not ignoring this, it's just that it's a complicated question and will take a significant amount of time to break down and try to figure out.
Quote from: StarGeek on September 07, 2023, 11:40:11 PM
I'm not ignoring this, it's just that it's a complicated question and will take a significant amount of time to break down and try to figure out.
I understand. I've been researching this since I posted it, and this pattern seems to be what XMP calls a "qualifier"; Adobe's XMP specification describes it. For what it's worth, it appears to be proper, though rare, usage of XMP and RDF, so changes to exiftool might be worthwhile if this can't be solved with a config file. I've included links to some of the resources I read, and I'd be glad to explain exactly what's going on RDF-wise if you'd like.
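The distinction exiftool would need to draw can be sketched in a few lines of Python. This assumes the rule from the XMP specification that a resource node with an rdf:value child is a qualified simple value (rdf:value holds the value and every other child is a qualifier), while a node without rdf:value is an ordinary structure. interpret_resource is a hypothetical helper, not an exiftool function.

```python
import xml.etree.ElementTree as ET

# ElementTree reports tags in "{namespace-uri}local-name" form.
RDF_VALUE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}value"


def interpret_resource(node: ET.Element) -> dict:
    """Classify an rdf:parseType="Resource" node (a sketch of the XMP rule).

    With an rdf:value child it is a qualified simple value: rdf:value is the
    value and every other child is a qualifier. Without one it is a structure.
    Repeated child tags overwrite each other here, which is fine for a sketch.
    """
    children = {child.tag: (child.text or "") for child in node}
    if RDF_VALUE in children:
        value = children.pop(RDF_VALUE)
        return {"value": value, "qualifiers": children}
    return {"struct": children}


# One item from the xmp:Identifier bag quoted earlier in the thread.
ITEM = (
    '<rdf:li xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
    ' xmlns:xmpidq="http://ns.adobe.com/xmp/Identifier/qual/1.0/"'
    ' rdf:parseType="Resource">'
    "<xmpidq:Scheme>isbn</xmpidq:Scheme>"
    "<rdf:value>9781087884714</rdf:value>"
    "</rdf:li>"
)

if __name__ == "__main__":
    print(interpret_resource(ET.fromstring(ITEM)))
```

Under this rule the Calibre custom_metadata items become values (the JSON strings) qualified by calibreCC:name, rather than structures containing only a Name field, which is how exiftool currently reads them.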
Resources:
- XMP spec from Adobe (https://github.com/adobe/XMP-Toolkit-SDK/blob/main/docs/XMPSpecificationPart1.pdf)
- RDF/XML spec (https://www.w3.org/TR/2014/REC-rdf-syntax-grammar-20140225/)
- rdf:value documentation (https://www.w3.org/TR/2014/REC-rdf-schema-20140225/Overview.html#ch_value)
Sorry for the delay in responding. It has taken me a while to catch up after my vacation.
You are correct that ExifTool doesn't yet have the ability to handle XMP qualifiers. This has been on my to-do list for a long time now. As you suspected, they will have to be handled in a similar way to structures.
Adding this ability will be quite a bit of work, and qualifiers aren't commonly used, so there hasn't been much pressure for me to work on this yet, but I will move it up in my to-do list.
- Phil