Discrepancy in JSON output between PDF and ePub formats (in bags)

Started by Retrography, April 26, 2019, 08:33:42 AM

Retrography

I have a book in both ePub and in PDF. I would like to extract the following tag information, present in both files, in JSON format:


<XMP-dc:Creator>
  <rdf:Bag>
   <rdf:li>Radhakrishnan Nagarajan</rdf:li>
   <rdf:li>Marco Scutari</rdf:li>
   <rdf:li>Sophie Lèbre</rdf:li>
  </rdf:Bag>
</XMP-dc:Creator>


So I use the -j -G1 -struct options to get as close to the original metadata as possible.

To my surprise this is what I get for the ePub file:


"XMP-dc:Creator": "Sophie Lèbre"


While I get the proper response on the PDF file:


"XMP-dc:Creator": ["Radhakrishnan Nagarajan","Marco Scutari","Sophie Lèbre"]


Is this a bug, a feature, or just my ignorance?

Phil Harvey

Could you send me the ePub file so I can take a look?  (philharvey66 at gmail.com)

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).


Phil Harvey

I got the files, thanks.

This is a bit complicated.  The difference is that the metadata in the ePub isn't XMP.  It is plain XML, which ExifTool is patched to read as if it were XMP.  Because of this, the -struct option doesn't work as it would with real XMP.  I can perhaps look into patching this too, but at the moment you have two options:

1) Use -G1:4 instead of -G1.  Then you will get the extra Creators as separate tags.

2) Drop the -struct option.  (For some reason, this returns Creator as a list.)

But in general, we can't ever expect the -struct option to function correctly for XML metadata.  ExifTool doesn't officially support XML because there is just too much variation in the structure.
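
Since the same tag can come back as either a single string or a list depending on the file type and options, a script that consumes the JSON may want to normalize the value before using it. The following is only a minimal sketch: the helper names as_list and creators are made up for illustration, and the two sample strings are modeled on the outputs quoted earlier in this thread, wrapped in the array that -j produces.

```python
import json


def as_list(value):
    """Wrap a lone value in a list; pass lists through; map None to []."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def creators(exiftool_json, tag="XMP-dc:Creator"):
    """Return the creators of the first file in an `exiftool -j` style array.

    `tag` defaults to the -G1 key seen in this thread (an assumption about
    which options were used to produce the JSON).
    """
    record = json.loads(exiftool_json)[0]
    return as_list(record.get(tag))


# Sample data shaped like the two outputs reported above (hypothetical wrapping)
epub_json = '[{"XMP-dc:Creator": "Sophie Lèbre"}]'
pdf_json = (
    '[{"XMP-dc:Creator": '
    '["Radhakrishnan Nagarajan","Marco Scutari","Sophie Lèbre"]}]'
)

if __name__ == "__main__":
    print(creators(epub_json))
    print(creators(pdf_json))
```

This only makes the value's type predictable.  It does not recover creators that were dropped from the output, so one of the two options above is still needed for the ePub.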

- Phil

Retrography

Thanks for the prompt response and for looking into the issue.

I will try to implement one of your suggested solutions.

Every time I learn new stuff here...


Phil Harvey

For future reference, this is the structure of the XML in the ePub file:

<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="bookid" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:creator opf:role="aut" opf:file-as="Nagarajan, Radhakrishnan &amp; Scutari, Marco &amp; Lèbre, Sophie">Radhakrishnan Nagarajan</dc:creator>
    <dc:creator opf:role="aut">Marco Scutari</dc:creator>
    <dc:creator opf:role="aut">Sophie Lèbre</dc:creator>
  </metadata>
</package>


...which is why the items don't appear as a list in the ExifTool -struct output:  They aren't in list form.
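
As an illustration of the structure above (not of how ExifTool itself reads it), here is a minimal Python sketch that collects the repeated dc:creator elements into a list using the standard library.  The function name opf_creators is hypothetical, and the sample string is just the XML shown above.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace, as declared on the <metadata> element above
NS = {"dc": "http://purl.org/dc/elements/1.1/"}

OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="bookid" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:creator opf:role="aut" opf:file-as="Nagarajan, Radhakrishnan &amp; Scutari, Marco &amp; Lèbre, Sophie">Radhakrishnan Nagarajan</dc:creator>
    <dc:creator opf:role="aut">Marco Scutari</dc:creator>
    <dc:creator opf:role="aut">Sophie Lèbre</dc:creator>
  </metadata>
</package>
"""


def opf_creators(xml_text):
    """Return the text of every dc:creator element, in document order."""
    # Parse from bytes so the encoding declaration in the prolog is honored
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [el.text for el in root.findall(".//dc:creator", NS)]


if __name__ == "__main__":
    print(opf_creators(OPF))
```

Each creator is a sibling element with the same name, so the "list" only exists once something gathers the repeats together.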

Compare this to the XMP, where they are elements of a sequential list:

<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/" rdf:about="">
      <dc:creator>
        <rdf:Seq>
          <rdf:li>Radhakrishnan Nagarajan</rdf:li>
          <rdf:li>Marco Scutari</rdf:li>
          <rdf:li>Sophie Lèbre</rdf:li>
        </rdf:Seq>
      </dc:creator>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
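
For comparison, a similar sketch for the XMP above, again only an illustration with a made-up function name (xmp_creators).  Here the list is explicit in the markup: the items are rdf:li children of a single container.  The xpacket wrapper lines are left out of the sample string, and rdf:Bag is also checked because that is the container shown in the first post.

```python
import xml.etree.ElementTree as ET

NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

# The XMP from above, without the <?xpacket ...?> wrapper lines
XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/" rdf:about="">
      <dc:creator>
        <rdf:Seq>
          <rdf:li>Radhakrishnan Nagarajan</rdf:li>
          <rdf:li>Marco Scutari</rdf:li>
          <rdf:li>Sophie Lèbre</rdf:li>
        </rdf:Seq>
      </dc:creator>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
"""


def xmp_creators(xml_text):
    """Return the rdf:li items under dc:creator, from an rdf:Seq or rdf:Bag."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    items = []
    for container in ("Seq", "Bag"):
        path = f".//dc:creator/rdf:{container}/rdf:li"
        items.extend(li.text for li in root.findall(path, NS))
    return items


if __name__ == "__main__":
    print(xmp_creators(XMP))
```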


- Phil