Custom Property Not showing in Adobe Properties

Started by jason.heine, March 09, 2017, 09:54:26 AM


jason.heine

I have the following ExifTool custom config file, which I'm using to add specific data to our PDF documents:

%Image::ExifTool::UserDefined = (
    'Image::ExifTool::XMP::Main' => {
        FGID => {
            SubDirectory => {
                TagTable => 'Image::ExifTool::UserDefined::fgid',
            },
        },
    },
);
%Image::ExifTool::UserDefined::fgid = (
    GROUPS => { 0 => 'XMP', 1 => 'PDFX' },
    NAMESPACE => { 'pdfx' => 'http://ns.adobe.com/pdfx/1.3/' },
    WRITABLE => 'string',
    FGID => { },
);


I ran exiftool -xmp -b original.pdf and got the following on the original document:

<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.2-c001 63.139439, 2010/09/27-13:37:26        ">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about=""
            xmlns:xmp="http://ns.adobe.com/xap/1.0/">
         <xmp:CreateDate>2011-09-19T13:47:41Z</xmp:CreateDate>
         <xmp:ModifyDate>2012-01-25T11:05:29-06:00</xmp:ModifyDate>
         <xmp:MetadataDate>2012-01-25T11:05:29-06:00</xmp:MetadataDate>
      </rdf:Description>
      <rdf:Description rdf:about=""
            xmlns:dc="http://purl.org/dc/elements/1.1/">
         <dc:format>application/pdf</dc:format>
         <dc:creator>
            <rdf:Bag>
               <rdf:li>My Company Name Here</rdf:li>
            </rdf:Bag>
         </dc:creator>
         <dc:title>
            <rdf:Alt>
               <rdf:li xml:lang="x-default">The Title Meta Data Here</rdf:li>
            </rdf:Alt>
         </dc:title>
      </rdf:Description>
      <rdf:Description rdf:about=""
            xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/">
         <xmpMM:DocumentID>uuid:e3d1e614-1e49-4f8b-bea9-3370c03da11b</xmpMM:DocumentID>
         <xmpMM:InstanceID>uuid:cdfa38bc-0d2e-4b18-9dd6-de471aa9c9e8</xmpMM:InstanceID>
      </rdf:Description>
   </rdf:RDF>
</x:xmpmeta>


I then proceed to run the following:

exiftool -config /home/jason/exif.conf -FGID="2325" -overwrite_original  original.pdf

It outputs:
1 image files updated

I then proceed to run exiftool -xmp -b original.pdf again to verify and the output is as follows:

<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>
<x:xmpmeta xmlns:x='adobe:ns:meta/' x:xmptk='Image::ExifTool 9.74'>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>

<rdf:Description rdf:about=''
  xmlns:dc='http://purl.org/dc/elements/1.1/'>
  <dc:creator>
   <rdf:Bag>
    <rdf:li>My Company Name Here</rdf:li>
   </rdf:Bag>
  </dc:creator>
  <dc:format>application/pdf</dc:format>
  <dc:title>
   <rdf:Alt>
    <rdf:li xml:lang='x-default'>The Title Meta Data Here</rdf:li>
   </rdf:Alt>
  </dc:title>
</rdf:Description>

<rdf:Description rdf:about=''
  xmlns:pdfx='http://ns.adobe.com/pdfx/1.3/'>
  <pdfx:FGID>2325</pdfx:FGID>
</rdf:Description>

<rdf:Description rdf:about=''
  xmlns:xmp='http://ns.adobe.com/xap/1.0/'>
  <xmp:CreateDate>2011-09-19T13:47:41Z</xmp:CreateDate>
  <xmp:MetadataDate>2012-01-25T11:05:29-06:00</xmp:MetadataDate>
  <xmp:ModifyDate>2012-01-25T11:05:29-06:00</xmp:ModifyDate>
</rdf:Description>

<rdf:Description rdf:about=''
  xmlns:xmpMM='http://ns.adobe.com/xap/1.0/mm/'>
  <xmpMM:DocumentID>uuid:e3d1e614-1e49-4f8b-bea9-3370c03da11b</xmpMM:DocumentID>
  <xmpMM:InstanceID>uuid:cdfa38bc-0d2e-4b18-9dd6-de471aa9c9e8</xmpMM:InstanceID>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>


The first thing I noticed is that it is formatted slightly differently; however, the key pieces of data are there. The main differences are the xmptk attribute in the xmpmeta tag and the presence of my pdfx:FGID tag.
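
For anyone who wants to confirm the tag from a script rather than by eye, here is a minimal sketch that pulls every simple pdfx:* element out of an XMP packet using only the Python standard library. The function name pdfx_properties is made up for illustration, and the sample is a trimmed copy of the packet above with the xpacket wrapper left out.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the XMP dumps in this thread.
PDFX_NS = "http://ns.adobe.com/pdfx/1.3/"


def pdfx_properties(xmp_packet):
    """Return {local_name: text} for every simple pdfx:* element (hypothetical helper)."""
    root = ET.fromstring(xmp_packet)
    prefix = "{" + PDFX_NS + "}"
    found = {}
    for elem in root.iter():
        # ElementTree expands namespaced tags to "{uri}localname".
        if elem.tag.startswith(prefix):
            found[elem.tag[len(prefix):]] = (elem.text or "").strip()
    return found


# Trimmed copy of the packet ExifTool wrote.
sample = """<x:xmpmeta xmlns:x='adobe:ns:meta/' x:xmptk='Image::ExifTool 9.74'>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
<rdf:Description rdf:about=''
  xmlns:pdfx='http://ns.adobe.com/pdfx/1.3/'>
  <pdfx:FGID>2325</pdfx:FGID>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>"""

print(pdfx_properties(sample))  # {'FGID': '2325'}
```

This only shows that the XMP side is present; it says nothing about other places in the PDF where the property might also need to live.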

So the problem is that Adobe Acrobat is not seeing this data, and neither is one of my applications that reads the metadata for indexing.

When I go into Acrobat, press Ctrl+D, and view the Custom tab, it shows nothing.

If I go to the Description tab, and then click Advanced Metadata, it does not show in the data there either.

Am I doing something wrong to get this information properly inserted into the Custom Property section of the document?

If I add the custom property via Acrobat and then view the XMP data, it shows exactly the same XMP data as when I add it with ExifTool. So I know there is something missing somewhere.

Thanks

Jason





Phil Harvey

Hi Jason,

You can see if it is an XMP problem by copying the XMP from a file written by Acrobat to a file written by ExifTool:

exiftool -tagsfromfile acrobat.pdf -xmp exiftool.pdf

If you can see the metadata in "exiftool.pdf" then it must be something in the XMP.  Post the XMP from a good Acrobat file and a bad ExifTool file and I'll take a look.

The xmptk won't be the problem.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

jason.heine

Hi Phil,

Thank you for the quick response.

I created a sample document in Word and converted it to a PDF. I opened one copy in Acrobat, added the custom property, and saved it.

Here is the xmp from the acrobat modified file:

<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.2-c001 63.139439, 2010/09/27-13:37:26        ">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about=""
            xmlns:dc="http://purl.org/dc/elements/1.1/">
         <dc:format>application/pdf</dc:format>
         <dc:creator>
            <rdf:Seq>
               <rdf:li>My Company</rdf:li>
            </rdf:Seq>
         </dc:creator>
         <dc:title>
            <rdf:Alt>
               <rdf:li xml:lang="x-default">My Title</rdf:li>
            </rdf:Alt>
         </dc:title>
      </rdf:Description>
      <rdf:Description rdf:about=""
            xmlns:xmp="http://ns.adobe.com/xap/1.0/">
         <xmp:CreateDate>2017-03-09T09:52:57-06:00</xmp:CreateDate>
         <xmp:CreatorTool>Microsoft® Word 2013</xmp:CreatorTool>
         <xmp:ModifyDate>2017-03-09T09:53:36-06:00</xmp:ModifyDate>
         <xmp:MetadataDate>2017-03-09T09:53:36-06:00</xmp:MetadataDate>
      </rdf:Description>
      <rdf:Description rdf:about=""
            xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
         <pdf:Producer>Microsoft® Word 2013</pdf:Producer>
      </rdf:Description>
      <rdf:Description rdf:about=""
            xmlns:pdfx="http://ns.adobe.com/pdfx/1.3/">
         <pdfx:FGID>2325</pdfx:FGID>
      </rdf:Description>
      <rdf:Description rdf:about=""
            xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/">
         <xmpMM:DocumentID>uuid:ae07bede-6b47-4069-899b-d4d1486a5442</xmpMM:DocumentID>
         <xmpMM:InstanceID>uuid:a01e716d-a869-4cfc-82c9-f222fa75a085</xmpMM:InstanceID>
      </rdf:Description>
   </rdf:RDF>
</x:xmpmeta>


I then took the unmodified pdf and modified it in exiftool. And here is the xmp from the exiftool.pdf:

<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>
<x:xmpmeta xmlns:x='adobe:ns:meta/' x:xmptk='Image::ExifTool 9.74'>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>

<rdf:Description rdf:about=''
  xmlns:pdfx='http://ns.adobe.com/pdfx/1.3/'>
  <pdfx:FGID>2325</pdfx:FGID>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>


I opened the file modified by ExifTool, and it does not show the custom property in Acrobat.

If I take the originally modified acrobat file and then run the command:

exiftool -tagsfromfile acrobat.pdf -xmp exiftool.pdf I get:

======== acrobat.pdf
XMP                             : (Binary data 3788 bytes, use -b option to extract)
======== exiftool.pdf
    2 image files read


But when I run exiftool -xmp -b exiftool.pdf, the output is empty.

And when I then open exiftool.pdf in Acrobat, the custom property is not there.

I hope I provided you with the required information that you requested. Please let me know if otherwise.

Thanks



Phil Harvey

I don't understand why copying the XMP from the Acrobat file didn't work.  Can you post the exact commands you used and email me the PDF files so I can reproduce this myself (philharvey66 at gmail.com)?

- Phil

jason.heine

I'm running Debian Jessie. Here are the commands:

exiftool -tagsromfile acrobat.pdf -xmp exiftool.pdf

I then used exiftool -xmp -b exiftool.pdf to check the data

I have emailed you the files.

jason.heine

As an additional update.

I did download and run the latest version (10.46) with the same commands, and it gave the same issue.



Phil Harvey

Hmmm.  Mail hasn't arrived yet and I'm just about to go home for the day.  I'll take a look at this first thing tomorrow if the files come in.

- Phil

Phil Harvey

I got the files just as I was stepping out the door.    Copying the XMP works, but I can't test the resulting file (a.pdf) in Acrobat:

> exiftool tmp -xmp
======== tmp/acrobat.pdf
XMP                             : (Binary data 3788 bytes, use -b option to extract)
======== tmp/exiftool.pdf
    1 directories scanned
    2 image files read
> cp tmp/exiftool.pdf a.pdf
> exiftool a.pdf -tagsfromfile tmp/acrobat.pdf -xmp
Warning: [Minor] Ignored duplicate Info dictionary - tmp/acrobat.pdf
    1 image files updated
> exiftool a.pdf -xmp
XMP                             : (Binary data 3788 bytes, use -b option to extract)


...not sure about the duplicate Info dictionary, but I suspect this may be related to the problem.

- Phil

jason.heine

Okay, I will mess around with it more and see what I can find.

I find it odd that even with an empty base PDF, using ExifTool to add the custom property, it still doesn't show in Acrobat or Reader.

When looking at the binary of the file, there are some key differences, so I'm wondering whether Acrobat is adding values that allow the reader to see the properties, and those "values" are not being set when the properties are added outside of the application.


jason.heine

Phil,

I was able to get this to work with PyPDF2 by using the following code:

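import os

# Legacy PyPDF2 API (PdfFileReader/PdfFileWriter), as used at the time of this post.
from PyPDF2 import PdfFileReader, PdfFileWriter
from PyPDF2.generic import NameObject, createStringObject

inpfn = '/home/jason/unmodified_a.pdf'
outfn = inpfn + 'out.pdf'

# Keep the input open until the output is written; PyPDF2 reads page data lazily.
with open(inpfn, 'rb') as fin:
    pdf_in = PdfFileReader(fin)
    writer = PdfFileWriter()

    # Copy every page into the new document.
    for page in range(pdf_in.getNumPages()):
        writer.addPage(pdf_in.getPage(page))

    # Copy the existing Info dictionary entries, then add the custom property.
    infoDict = writer._info.getObject()
    info = pdf_in.documentInfo
    for key in info:
        infoDict.update({NameObject(key): createStringObject(info[key])})
    infoDict.update({NameObject('/FGID'): createStringObject(u'2325')})

    with open(outfn, 'wb') as fout:
        writer.write(fout)

# Replace the original file with the rewritten copy.
os.unlink(inpfn)
os.rename(outfn, inpfn)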


This was able to add the information to the document properly and it showed in Acrobat, and our indexer found the custom property.

I don't like this method because it leaves a lot of room for error: it basically has to re-create the PDF document page by page in memory, and we have millions of documents that we need to modify.

I really would like to use Exiftool to add the Custom Property to the PDF if at all possible.

I would think that the custom exif.conf file could be written in such a way as to insert the data properly into the PDF. This is what I have now:

%Image::ExifTool::UserDefined = (
    'Image::ExifTool::XMP::Main' => {
        FGID => {
            SubDirectory => {
                TagTable => 'Image::ExifTool::UserDefined::fgid',
            },
        },
    },
);
%Image::ExifTool::UserDefined::fgid = (
    GROUPS => { 0 => 'XMP', 1 => 'PDFX' },
    NAMESPACE => { 'pdfx' => 'http://ns.adobe.com/pdfx/1.3/' },
    WRITABLE => 'string',
    FGID => { },
);


Thoughts?

Phil Harvey

Sorry, I haven't had a lot of time recently to spend working on problems like this.

Here is the problem:

> exiftool tmp/acrobat.pdf -G1 -a -fgid
[XMP-pdfx]      Fgid                            : 2325
[PDF]           FGID                            : 2325


FGID is written in two locations by acrobat.  Apparently it ignores the XMP-pdfx location.  Use the following config file to write both:

%Image::ExifTool::UserDefined = (
    'Image::ExifTool::XMP::pdfx' => {
        FGID => { },
    },
    'Image::ExifTool::PDF::Info' => {
        FGID => { },
    },
);


(It turns out that there was no problem with the duplicate Info dictionaries.  Instead, it was a FAQ 3 problem.)
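
If you want to check a batch of files for this from a script, here is a minimal sketch that parses the -G1 -a output shown above and reports which of the two locations is missing. The function names (parse_g1_output, missing_locations, read_fgid_output) are made up for illustration, the regular expression assumes the default bracketed output format seen in this thread, and read_fgid_output assumes exiftool is on the PATH.

```python
import re
import subprocess

# Matches lines of `exiftool -G1 -a` output such as
# "[XMP-pdfx]      Fgid                            : 2325"
LINE_RE = re.compile(r"^\[(?P<group>[^\]]+)\]\s+(?P<tag>.+?)\s*: (?P<value>.*)$")

# Per this thread, Acrobat writes FGID in both of these family-1 groups.
REQUIRED_GROUPS = {"PDF", "XMP-pdfx"}


def parse_g1_output(text):
    """Return a list of (group, tag description, value) tuples."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            rows.append((m.group("group"), m.group("tag"), m.group("value")))
    return rows


def missing_locations(text):
    """Return the required groups that do not appear in the output."""
    present = {group for group, _tag, _value in parse_g1_output(text)}
    return REQUIRED_GROUPS - present


def read_fgid_output(pdf_path):
    """Run the same command as above on pdf_path and return its stdout."""
    result = subprocess.run(
        ["exiftool", "-G1", "-a", "-FGID", pdf_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Output copied from the Acrobat-written file above.
acrobat_output = (
    "[XMP-pdfx]      Fgid                            : 2325\n"
    "[PDF]           FGID                            : 2325\n"
)
print(missing_locations(acrobat_output))  # set()
```

A file written with only the XMP tag defined would report {'PDF'} here, which matches the case where Acrobat shows nothing on the Custom tab.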

- Phil

Edit: Hey.  That was my eleven-thousandth post!

jason.heine

This worked like a charm.

I greatly appreciate you looking at this and helping me resolve this problem.