I have a book in both ePub and PDF formats. I would like to extract the following tag information, which is present in both files, in JSON format:
<XMP-dc:Creator>
<rdf:Bag>
<rdf:li>Radhakrishnan Nagarajan</rdf:li>
<rdf:li>Marco Scutari</rdf:li>
<rdf:li>Sophie Lèbre</rdf:li>
</rdf:Bag>
</XMP-dc:Creator>
So I use the -j -G1 -struct switches to get as close to the original metadata as possible.
To my surprise this is what I get for the ePub file:
"XMP-dc:Creator": "Sophie Lèbre"
While I get the proper response on the PDF file:
"XMP-dc:Creator": ["Radhakrishnan Nagarajan","Marco Scutari","Sophie Lèbre"]
Is this a bug or a feature, or rather my ignorance?
Could you send me the ePub file so I can take a look? (philharvey66 at gmail.com)
- Phil
I sent you a PM with the links.
I got the files, thanks.
This is a bit complicated. The difference is that the metadata in the ePub isn't XMP. It is XML, and ExifTool is patched to read this as if it were XMP. Unfortunately, because of this, the -struct option doesn't work as it would with XMP. I can perhaps look into patching this too, but at the moment you have two options:
1) Use -G1:4 instead of -G1. Then you will get the extra Creators as separate tags.
2) Drop the -struct option. (For some reason, this returns Creator as a list.)
But in general, we can't ever expect the -struct option to function correctly for XML metadata. ExifTool doesn't officially support XML because there is just too much variation in the structure.
- Phil
Thanks for the prompt response and for looking into the issue.
I will try to implement one of your suggested solutions.
Every time I learn new stuff here...
For future reference, this is the structure of the XML in the ePub file:
<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="bookid" version="2.0">
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
<dc:creator opf:role="aut" opf:file-as="Nagarajan, Radhakrishnan &amp; Scutari, Marco &amp; Lèbre, Sophie">Radhakrishnan Nagarajan</dc:creator>
<dc:creator opf:role="aut">Marco Scutari</dc:creator>
<dc:creator opf:role="aut">Sophie Lèbre</dc:creator>
</metadata>
</package>
...which is why the items don't appear as a list in the ExifTool -struct output: they aren't in list form. Each creator is a separate, repeated dc:creator element.
Compare this to the XMP, where they are elements of a sequential list:
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/" rdf:about="">
<dc:creator>
<rdf:Seq>
<rdf:li>Radhakrishnan Nagarajan</rdf:li>
<rdf:li>Marco Scutari</rdf:li>
<rdf:li>Sophie Lèbre</rdf:li>
</rdf:Seq>
</dc:creator>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
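To make the structural difference concrete, here is a minimal Python sketch (standard library only) that collects the creators from each layout. It is not how ExifTool parses these files. The function names creators_from_opf and creators_from_xmp are made up for illustration, and the two sample strings are trimmed copies of the snippets above, without the XML declaration, the xpacket wrapper, and the opf:file-as attribute.

```python
import xml.etree.ElementTree as ET

# Namespace URIs taken from the two snippets above.
DC = "http://purl.org/dc/elements/1.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Trimmed copy of the ePub (OPF) metadata: three sibling dc:creator elements.
OPF_SAMPLE = """<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="bookid" version="2.0">
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
<dc:creator opf:role="aut">Radhakrishnan Nagarajan</dc:creator>
<dc:creator opf:role="aut">Marco Scutari</dc:creator>
<dc:creator opf:role="aut">Sophie Lèbre</dc:creator>
</metadata>
</package>"""

# Trimmed copy of the PDF's XMP: one dc:creator holding an rdf:Seq of rdf:li items.
XMP_SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/" rdf:about="">
<dc:creator>
<rdf:Seq>
<rdf:li>Radhakrishnan Nagarajan</rdf:li>
<rdf:li>Marco Scutari</rdf:li>
<rdf:li>Sophie Lèbre</rdf:li>
</rdf:Seq>
</dc:creator>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>"""


def creators_from_opf(xml_text):
    """Hypothetical helper: one list entry per repeated dc:creator element."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(f"{{{DC}}}creator")]


def creators_from_xmp(xml_text):
    """Hypothetical helper: one list entry per rdf:li inside dc:creator."""
    root = ET.fromstring(xml_text)
    names = []
    for creator in root.iter(f"{{{DC}}}creator"):
        names.extend(li.text for li in creator.iter(f"{{{RDF}}}li"))
    return names


# Both layouts carry the same three names; only the container differs.
print(creators_from_opf(OPF_SAMPLE) == creators_from_xmp(XMP_SAMPLE))
```

In the OPF case the list has to be assembled from sibling elements, while in the XMP case the list container is already part of the data, which is what -struct relies on.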
- Phil