PDF/A documents no longer identified

Started by dgn, September 13, 2018, 11:05:28 AM


dgn

In upgrading from ExifTool 10.00 to 11.01 for use within the File Information Tool Set (http://fitstool.org), I have noticed a change in the identification of PDF documents. Whereas the old version returned metadata indicating a PDF/A document, the current version identifies the document only as PDF. Here is the relevant tool output:
v.10.00
    <SchemasPrefix>pdfaid</SchemasPrefix>
    <SchemasSchema>PDF/A ID Schema</SchemasSchema>
v.11.01
    <SchemasPrefix>pdf</SchemasPrefix>
    <SchemasSchema>Adobe PDF Schema</SchemasSchema>

Any help resolving this greatly appreciated.

Phil Harvey

I have no idea where the "pdfaid" comes from.  Can you provide a sample PDF file and an example exiftool command that shows this with ExifTool 10.00?

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

dgn

A PDF/A document is attached.
I realized afterwards that I posted "post-processed" data rather than raw ExifTool output.

Using v.10.00 of Exiftool the following relevant data was output for the attached file:
Schemas Prefix                  : pdfaid
Schemas Schema                  : PDF/A ID Schema

Using v.11.01 the same data points:
Schemas Prefix                  : pdf
Schemas Schema                  : Adobe PDF Schema

I don't know why/how "pdfaid" is being output for all PDF/A documents.
However, with v.11.01 I have noticed a new data point being output for PDF/A documents (but not for simple PDF documents):
Conformance                     : B
(for type 1a PDF/A documents this value is "A")

In the past we used the value "Schemas Schema : PDF/A ID Schema" to identify a document as PDF/A.
My question is whether this "Conformance" value can now be reliably used to identify PDF/A documents.

Thanks in advance for your help.

Phil Harvey

The confusion is due to the fact that multiple tags exist with the same name.  Use the -a option to see them all:

> exiftool ~/Desktop/PDFa_equations.pdf -schemasprefix -schemasschema -a
Schemas Prefix                  : pdf
Schemas Prefix                  : xmpMM
Schemas Prefix                  : pdfaid
Schemas Schema                  : Adobe PDF Schema
Schemas Schema                  : XMP Media Management Schema
Schemas Schema                  : PDF/A ID Schema


Without -a, only one will be returned.  The difference in the preferred tag came in ExifTool 10.30 when the priority of unknown XMP tags was lowered.

- Phil

Edit:  Looking more closely at the XMP in this file, the "schemas" metadata is a bag of structures.  ExifTool should probably treat this as a list-type tag, but currently unknown tags are not handled this way.  If this changes, future versions of ExifTool may extract this SchemasPrefix as "pdf, xmpMM, pdfaid".
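As a rough sketch of how a calling tool might apply this, the hypothetical helpers below run the same `-a` query, keep every duplicate value instead of only the preferred one, and look for the pdfaid prefix. Assumptions: Python 3.7+, the `exiftool` executable on the PATH, and ExifTool's default "Tag Name : value" text output as shown above. The comma split exists only in case a future version reports the bag as a single list-type value, as described in the edit above.

```python
import subprocess
from collections import defaultdict


def parse_exiftool_output(text):
    """Parse ExifTool's default 'Tag Name : value' text output into a dict
    mapping each tag name to a list of ALL its values, so duplicates
    printed with -a are kept rather than overwritten."""
    tags = defaultdict(list)
    for line in text.splitlines():
        # Split on the first colon only; values may themselves contain colons.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'name : value' line
        tags[name.strip()].append(value.strip())
    return dict(tags)


def has_pdfa_schema(tags):
    """True if any Schemas Prefix value is 'pdfaid'. Each value is also split
    on commas in case the prefixes arrive as one list, e.g. 'pdf, xmpMM, pdfaid'."""
    for value in tags.get("Schemas Prefix", []):
        if "pdfaid" in (item.strip() for item in value.split(",")):
            return True
    return False


def is_pdfa(path):
    """Hypothetical helper: run 'exiftool -a -schemasprefix FILE' and check
    the result. Assumes exiftool is installed and on the PATH."""
    result = subprocess.run(
        ["exiftool", "-a", "-schemasprefix", path],
        capture_output=True, text=True, check=True,
    )
    return has_pdfa_schema(parse_exiftool_output(result.stdout))
```

Finding pdfaid only shows that the file's XMP declares the PDF/A identification schema; it does not validate that the file actually conforms to PDF/A.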

dgn

Thanks, Phil, this information has been very helpful.