User defined ID tags with spaces for PDF files

Started by basfmega, May 08, 2020, 07:40:47 PM


basfmega

After creating a simple config file following this code from 2013:

Quote from: Phil Harvey on January 22, 2013, 08:26:58 AM
Yes, this is how to define a tag with a name that is different from the ID:

%Image::ExifTool::UserDefined = (
    'Image::ExifTool::PDF::Info' => {
        'ID with spaces' => {
            Name => 'SomeTagName',
            Description => 'Some description',
        },
    },
);


- Phil

After running:
exiftool -config config.txt -PDF:SomeTagName='test' file.pdf

When I check the results with:
exiftool -a -G1 file.pdf

I get this:
[ExifTool]      ExifTool Version Number         : 11.16
[System]        File Name                       : file.pdf
[System]        Directory                       : .
[System]        File Size                       : 1494 kB
[System]        File Modification Date/Time     : 2020:05:09 01:33:07+02:00
[System]        File Access Date/Time           : 2020:05:09 01:33:10+02:00
[System]        File Inode Change Date/Time     : 2020:05:09 01:33:07+02:00
[System]        File Permissions                : rw-r--r--
[File]          File Type                       : PDF
[File]          File Type Extension             : pdf
[File]          MIME Type                       : application/pdf
[PDF]           PDF Version                     : 1.7
[PDF]           Linearized                      : No
[PDF]           Page Count                      : 50
[PDF]           ID                              : with

Instead of something like:
[PDF]           ID with spaces                 : test

And when I check it in Adobe Acrobat, there are no user-defined tags or values.
I don't know whether the code is no longer usable or I'm making a mistake. I hope somebody can help me.
Thank you very much in advance.

Phil Harvey

Is it legal to have a space in a PDF tag ID?  I don't recall ever having seen this.

- Phil

basfmega

In Adobe Acrobat you can insert tags with spaces in their IDs. When you look at them with exiftool you get a little garbage:

[XMP-pdfx]      Fecha 0020de 0020edicin         : 2001

Here "0020" comes from U+0020, the Unicode code point for a space. I've been trying to replicate this with exiftool without success.


Phil Harvey

That is not a PDF tag, it is an XMP tag.  What does the source XMP look like for this tag?

exiftool -xmp -b FILE

- Phil

basfmega

Thank you very much for your reply.

The whole -xmp -b output is:

<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 4.2.1-c043 52.372728, 2009/01/18-15:56:37        ">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about=""
            xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
         <pdf:Producer>Acrobat Distiller 8.2.2 (Macintosh)</pdf:Producer>
      </rdf:Description>
      <rdf:Description rdf:about=""
            xmlns:xmp="http://ns.adobe.com/xap/1.0/">
         <xmp:CreateDate>2016-01-13T12:42:55+01:00</xmp:CreateDate>
         <xmp:ModifyDate>2020-04-29T20:36:21+02:00</xmp:ModifyDate>
         <xmp:MetadataDate>2020-04-29T20:36:21+02:00</xmp:MetadataDate>
      </rdf:Description>
      <rdf:Description rdf:about=""
            xmlns:dc="http://purl.org/dc/elements/1.1/">
         <dc:format>application/pdf</dc:format>
         <dc:title>
            <rdf:Alt>
               <rdf:li xml:lang="x-default">Título del Libro - Title</rdf:li>
            </rdf:Alt>
         </dc:title>
         <dc:rights>
            <rdf:Alt>
               <rdf:li xml:lang="x-default">Copyright warning</rdf:li>
            </rdf:Alt>
         </dc:rights>
         <dc:creator>
            <rdf:Bag>
               <rdf:li>First Author</rdf:li>
               <rdf:li>Second Author</rdf:li>
            </rdf:Bag>
         </dc:creator>
      </rdf:Description>
      <rdf:Description rdf:about=""
            xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/">
         <xmpMM:DocumentID>uuid:d5cd5673-c467-490c-ac3a-b08d95b5fbce</xmpMM:DocumentID>
         <xmpMM:InstanceID>uuid:592373b2-8f04-8c40-9088-2664bfd1ae38</xmpMM:InstanceID>
      </rdf:Description>
      <rdf:Description rdf:about=""
            xmlns:xmpRights="http://ns.adobe.com/xap/1.0/rights/">
         <xmpRights:WebStatement>http://www.ourwebpage.es</xmpRights:WebStatement>
         <xmpRights:Marked>True</xmpRights:Marked>
      </rdf:Description>
      <rdf:Description rdf:about=""
            xmlns:illustrator="http://ns.adobe.com/illustrator/1.0/">
         <illustrator:StartupProfile>Print</illustrator:StartupProfile>
      </rdf:Description>
      <rdf:Description rdf:about=""
            xmlns:pdfx="http://ns.adobe.com/pdfx/1.3/">
         <pdfx:ISBN>978-12-34-56789-0</pdfx:ISBN>
         <pdfx:Fechaↂ0020deↂ0020edición>2001</pdfx:Fechaↂ0020deↂ0020edición>
         <pdfx:eISBN>978-09-87-65432-1</pdfx:eISBN>
         <pdfx:Idioma>Español</pdfx:Idioma>
         <pdfx:Editor>Our Company</pdfx:Editor>
      </rdf:Description>
   </rdf:RDF>
</x:xmpmeta>

Phil Harvey

That is truly odd.  The actual tag ID is "Fechaↂ0020deↂ0020edición".  Wild.  Or, with the non-ASCII characters escaped as UTF-8 bytes, "Fecha\xe2\x86\x820020de\xe2\x86\x820020edici\xc3\xb3n".  So the config file would look something like this:

%Image::ExifTool::UserDefined = (
    'Image::ExifTool::XMP::pdfx' => {
        "Fecha\xe2\x86\x820020de\xe2\x86\x820020edici\xc3\xb3n" => {
            Name => 'FechaDeEdicion',
        },
    },
);

1;  #end


...but I wouldn't recommend using tags with such crazy IDs.
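
For reference, the escaped ID in the config above is just the UTF-8 bytes of the tag ID, with every non-ASCII byte written as \xNN. Here is a minimal Python sketch of that conversion. The helper name perl_escape is made up for illustration and is not part of ExifTool, and the sketch assumes the ID contains nothing that is special inside a double-quoted Perl string.

```python
def perl_escape(tag_id: str) -> str:
    """Return tag_id with each non-ASCII UTF-8 byte written as \\xNN.

    Hypothetical helper for building a double-quoted Perl string.
    Assumption: the ID has no quote, backslash, $ or @ characters,
    which would need their own escaping in Perl.
    """
    out = []
    for byte in tag_id.encode("utf-8"):
        if byte < 0x80:
            out.append(chr(byte))         # plain ASCII stays readable
        else:
            out.append(f"\\x{byte:02x}")  # e.g. 0xe2 becomes \xe2
    return "".join(out)


# \u2182 is the odd separator character, \u00f3 is the accented o
print(perl_escape("Fecha\u21820020de\u21820020edici\u00f3n"))
# prints Fecha\xe2\x86\x820020de\xe2\x86\x820020edici\xc3\xb3n
```

The same helper should work for any other non-ASCII ID you want to paste into a config file.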

ExifTool tag names cannot contain special characters, so without the config file to change the name, it would look like this:  "Fecha0020de0020edicin".
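
To see where that stripped name comes from, you can approximate it by dropping every character other than ASCII letters, digits, underscore and hyphen. This Python sketch is only an approximation that reproduces this one example. It is not ExifTool's actual code, and the function name approx_tag_name is made up.

```python
import re


def approx_tag_name(tag_id: str) -> str:
    # Assumption: only ASCII letters, digits, '_' and '-' survive in a name.
    return re.sub(r"[^A-Za-z0-9_-]", "", tag_id)


# \u2182 is the odd separator character, \u00f3 is the accented o
print(approx_tag_name("Fecha\u21820020de\u21820020edici\u00f3n"))
# prints Fecha0020de0020edicin
```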

- Phil


basfmega

I've tried it, and Adobe Acrobat doesn't show any tags. However, I found another "space" character, the no-break space (U+00A0, which is \xc2\xa0 in UTF-8), and used it instead:

%Image::ExifTool::UserDefined = (
    'Image::ExifTool::XMP::pdfx' => {
        "Fecha\xc2\xa0de\xc2\xa0edici\xc3\xb3n" => {
            Name => 'FechaDeEdicion',
        },
    },
);

1;  #end


Acrobat shows and exports it as a regular space (Fecha de edición).
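
For anyone who wants to double-check a key like the one above, here is a small Python sketch that decodes the escaped bytes and lists the non-ASCII characters. Nothing in it is ExifTool-specific. It only confirms that the separators are U+00A0 and not U+0020, even though the two look the same on screen.

```python
# Bytes exactly as written in the config key above
key_bytes = b"Fecha\xc2\xa0de\xc2\xa0edici\xc3\xb3n"
key = key_bytes.decode("utf-8")

# List every non-ASCII character by code point
non_ascii = [f"U+{ord(c):04X}" for c in key if ord(c) > 0x7F]
print(non_ascii)   # prints ['U+00A0', 'U+00A0', 'U+00F3']

# The key contains no regular space at all
print(" " in key)  # prints False
```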

Nevertheless, I'll do my best to persuade my team to use standardized tags.
Thank you very, very much for your help and your amazing work.