Adding XMP into PDF

Started by jeffreyke81, September 24, 2017, 08:03:28 PM


jeffreyke81

Dear all,

I am sure I have missed something here, but I just cannot figure out what I did wrong. Sorry if this is something super obvious, and thank you in advance for any help you can provide!

I have been trying to embed XMP into PDFs. I read the "adding xmp data" thread (https://exiftool.org/forum/index.php?topic=2922.0). Unfortunately, I cannot use the "exiftool -tagsfromfile xmp.xml "-all>xmp:all" FILE" option, because I have some unusual tags in my XMP input and I need the complete XMP embedded in the XMP box.
So here is the command I tried:
exiftool "-xmp<=pdf-xmp.xml" page-pdf.pdf

And it returns
  Warning: Invalid XMP data for XMP:XMP
      0 image files updated
      1 image files unchanged


I've tried validating my XMP with both the W3C validator (https://www.w3.org/RDF/Validator/rdfval) and the PDFlib XMP Validator (https://www.pdflib.com/knowledge-base/xmp-metadata/free-xmp-validator/).
Here is the XMP source:
---
<?xml version="1.0"?>

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:format>application/pdf</dc:format>
    <dc:title>
      <rdf:Alt>
        <rdf:li xml:lang="en">Coronado Tent City Daily Program (Coronado, CA) 1903-07-30 [p ].</rdf:li>
      </rdf:Alt>
    </dc:title>
    <dc:description>
      <rdf:Alt>
        <rdf:li xml:lang="en">Page from Coronado Tent City Daily Program (newspaper). [See LCCN: sn94051565 for catalog record.]. Prepared on behalf of Coronado Public Library.</rdf:li>
      </rdf:Alt>
    </dc:description>
    <dc:date>
      <rdf:Seq>
        <rdf:li xml:lang="x-default">1903-07-30</rdf:li>
      </rdf:Seq>
    </dc:date>
    <dc:type>
      <rdf:Bag>
        <rdf:li xml:lang="en">text</rdf:li>
        <rdf:li xml:lang="en">newspaper</rdf:li>
      </rdf:Bag>
    </dc:type>
  </rdf:Description>
  <rdf:Description rdf:about=""
                   xmlns:xapMM="http://ns.adobe.com/xap/1.0/mm/">
    <xapMM:InstanceID>uuid:6e360eb7-2c72-4534-9f4a-205b8f590321</xapMM:InstanceID>
    <xapMM:DocumentID>uuid:969f4ad8-a768-4a84-b555-f86e30214b1d</xapMM:DocumentID>
  </rdf:Description>
  <rdf:Description rdf:about=""
                   xmlns:xap="http://ns.adobe.com/xap/1.0/">
    <xap:CreateDate>2011-07-09T00:54:36-05:00</xap:CreateDate>
    <xap:ModifyDate>2011-07-09T08:14:42-05:00</xap:ModifyDate>
    <xap:MetadataDate>2011-07-09T08:14:42-05:00</xap:MetadataDate>
  </rdf:Description>
  <rdf:Description rdf:about=""
                   xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
    <pdf:Producer/>
  </rdf:Description>
</rdf:RDF>


I guess my questions are:
1. Why does exiftool think the XMP is invalid?
2. Is there a way to embed the complete XMP into the PDF regardless of its validity?
Here is an example of a PDF that has my XMP embedded:
---
exiftool -D -b -xmp page-pdf.pdf
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 4.0-c321 44.398116, Tue Aug 04 2009 14:24:39">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about=""
            xmlns:dc="http://purl.org/dc/elements/1.1/">
         <dc:format>application/pdf</dc:format>
         <dc:title>
            <rdf:Alt>
               <rdf:li xml:lang="en">Coronado Tent City Daily Program (Coronado, CA) 1903-07-30 [p ].</rdf:li>
            </rdf:Alt>
         </dc:title>
         <dc:description>
            <rdf:Alt>
               <rdf:li xml:lang="en">Page from Coronado Tent City Daily Program (newspaper). [See LCCN: sn94051565 for catalog record.]. Prepared on behalf of Coronado Public Library.</rdf:li>
            </rdf:Alt>
         </dc:description>
         <dc:date>
            <rdf:Seq>
               <rdf:li xml:lang="x-default">1903-07-30</rdf:li>
            </rdf:Seq>
         </dc:date>
         <dc:type>
            <rdf:Bag>
               <rdf:li xml:lang="en">text</rdf:li>
               <rdf:li xml:lang="en">newspaper</rdf:li>
            </rdf:Bag>
         </dc:type>
      </rdf:Description>
      <rdf:Description rdf:about=""
            xmlns:xapMM="http://ns.adobe.com/xap/1.0/mm/">
         <xapMM:InstanceID>uuid:6e360eb7-2c72-4534-9f4a-205b8f590321</xapMM:InstanceID>
         <xapMM:DocumentID>uuid:969f4ad8-a768-4a84-b555-f86e30214b1d</xapMM:DocumentID>
      </rdf:Description>
      <rdf:Description rdf:about=""
            xmlns:xap="http://ns.adobe.com/xap/1.0/">
         <xap:CreateDate>2011-07-09T00:54:36-05:00</xap:CreateDate>
         <xap:ModifyDate>2011-07-09T08:14:42-05:00</xap:ModifyDate>
         <xap:MetadataDate>2011-07-09T08:14:42-05:00</xap:MetadataDate>
      </rdf:Description>
      <rdf:Description rdf:about=""
            xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
         <pdf:Producer/>
      </rdf:Description>
   </rdf:RDF>
</x:xmpmeta>


Again, thank you very much for shedding some light on this!!
Best regards,
Jeffrey


jeffreyke81

I have spent some more time on this and I think I found a workaround for this problem.

I found that I can accomplish most of what I need by setting each XMP tag individually instead of feeding in the XMP XML.
This almost worked, with one exception: my validator expects an rdf:li element inside the dc:identifier field.
The solution is to override the tag definition through the -config option and add Writable => 'lang-alt'.

%Image::ExifTool::UserDefined = (
  'Image::ExifTool::XMP::dc' => {
    identifier  => { Groups => { 2 => 'Image'  }, Writable => 'lang-alt' },
  },
);
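
For anyone who wants to try this workaround, here is a minimal shell sketch of how the override above might be used. The config file name (dc-identifier.config) and the identifier value are placeholders, not taken from the post, and the trailing "1;" follows the convention of ExifTool's sample config file rather than the snippet above. Note that -config has to be the first option on the exiftool command line.

```shell
#!/bin/sh
# Sketch only: the file name dc-identifier.config and the identifier value
# below are placeholders, not taken from the original post.
cfg=dc-identifier.config

# Save the user-defined tag override shown above to a config file.
cat > "$cfg" <<'EOF'
%Image::ExifTool::UserDefined = (
  'Image::ExifTool::XMP::dc' => {
    identifier => { Groups => { 2 => 'Image' }, Writable => 'lang-alt' },
  },
);
1;  # end
EOF

# Write the tag only if exiftool and the target PDF are both present.
if command -v exiftool >/dev/null 2>&1 && [ -f page-pdf.pdf ]; then
  # -config has to be the first option on the command line.
  exiftool -config "$cfg" "-XMP-dc:Identifier=example-identifier" page-pdf.pdf
  # Dump the resulting XMP packet to check how dc:identifier was written.
  exiftool -b -xmp page-pdf.pdf
else
  echo "exiftool or page-pdf.pdf not found; skipping write" >&2
fi
```

If the override takes effect, the dumped packet should show dc:identifier wrapped in rdf:Alt with an rdf:li child, which is what the validator in this case expected.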


Great design, Phil. That UserDefined capability has already saved me twice!!
Again, thanks for creating this great software,
Best regards,
Jeffrey

Phil Harvey

Hi Jeffrey,

I love it when someone solves their own problem (and posts the solution!).  :)

- Phil