
XMP metadata in pdf

Started by elkesan, August 03, 2018, 03:55:30 AM


elkesan

I am trying to add metadata to a PDF file. The basic fields work well, but XMP is less clear to me.

Because ExifTool can add practically any metadata, I use it instead of the native programs; Corel Draw, for example, can add only some metadata.

I run this command in the command prompt:
exiftool -a -sep ",  " -csv=pdfmuunto.csv *.pdf

And the CSV file is:
SourceFile;Description;Title;Keywords;Categories;Author;XMP-dc:Subject
D10000692A.pdf;Lorem ipsum and long explanation;User manual;Manufacturername,  Productnumber,  Productnumber2;Manual,  Usermanual;Me;Very long liiba laaba

This CSV writes the metadata properly, except for the XMP field. "Categories" is not a standard field in the PDF standard: PDF defines Title, Author, and Keywords, but not a category (compare "Categories" in photo formats; the PDF standard has no equivalent). Still, when I import this PDF into other systems, they read this Categories field and use it...

1. What is wrong with this command? When I open the document in Adobe Reader, it puts this "Subject" after the keywords... So how can I add this XMP field in the CSV?

2. The reason I am trying to add this XMP Subject is the "Description" limit. I can put any string into "Description", and every PDF reader shows it in the PDF "Subject" field. That is fine, but the limit is 256 characters. I want to add a description of up to 2000 characters, just as in photo files. XMP seems to accept that; the standard text is really fuzzy, of course, but there seems to be no limit. So I want a "Description" like in image files, 2000 characters, but in a PDF file. What is the right field name in the CSV file? Or is there another way to do this?

Edit: Now it has stopped working completely. As you know, Adobe Reader shows the document properties, i.e. this metadata: Author, etc. There is also a tab named "Modified" (sorry, my Reader is Finnish, so I have no idea what it is called in the English version). Two or three days ago I made some PDF files that have this "Categories" field, my own, and I made them with ExifTool. BUT now I cannot do it. I also tested this command line: exiftool -Categories="Ohje" test.pdf. Nothing happened. Earlier it was possible to add this "Categories" metadata, but not any more. I have tried everything: deleting the metadata (all:all=), -f -a -wc wcg, etc.

So: is there any method to add this "Categories"? ExifTool does not seem to be consistent here. I am sure it would work if I added enough options (-x -y -z -e etc.), but I cannot find the right ones... And if I export this working PDF (with Categories), I can see there is a field named "Categories", but importing it is not possible....

Phil Harvey

Your CSV (comma separated values) file is not comma separated.  Change the semicolons to commas to get it to work.
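If the spreadsheet program insists on semicolons, the conversion can be scripted. The sketch below uses Python's standard csv module to re-delimit the table; values that themselves contain commas, such as the Keywords list, are quoted so each value stays in one column. The function name is made up for illustration, and it assumes the whole file fits in memory.

```python
import csv
import io


def semicolon_to_comma_csv(text: str) -> str:
    """Convert semicolon-separated rows into standard comma-separated CSV.

    Values containing commas (e.g. a Keywords list) are quoted by the
    csv writer (QUOTE_MINIMAL), so each value still lands in one column.
    """
    reader = csv.reader(io.StringIO(text), delimiter=";")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        "SourceFile;Title;Keywords\n"
        "D10000692A.pdf;User manual;Manufacturername,  Productnumber\n"
    )
    print(semicolon_to_comma_csv(sample), end="")
    # SourceFile,Title,Keywords
    # D10000692A.pdf,User manual,"Manufacturername,  Productnumber"
```

For the real file, read pdfmuunto.csv and write the converted text to a new file opened with newline="" (UTF-8 is an assumption here; see FAQ 10 for how ExifTool handles character sets), then pass that file to -csv=.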

Also, you may want to read FAQ 10 if you run into problems with special characters.

If you run into a length limitation, you are trying to write the old IPTC/IIM, not XMP.  Specify "XMP:" before the tag name to make sure you are writing XMP.
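When driving exiftool from a script, one way to follow the "XMP:" advice is to put the group prefix on every tag name while building the argument list. This is a minimal Python sketch; the helper name and the sample value are made up, and actually running the command assumes exiftool is on the PATH.

```python
from typing import Dict, List


def exiftool_write_args(path: str, tags: Dict[str, str], group: str = "XMP") -> List[str]:
    """Build an argument list that writes every tag with an explicit group.

    "-XMP:Description=..." restricts the write to XMP, whereas a bare
    "-Description=..." lets exiftool decide where to write the tag.
    """
    args = ["exiftool"]
    for name, value in tags.items():
        args.append(f"-{group}:{name}={value}")
    args.append(path)
    return args


if __name__ == "__main__":
    args = exiftool_write_args("testingpdf.pdf", {"Description": "A long description"})
    print(args)
    # ['exiftool', '-XMP:Description=A long description', 'testingpdf.pdf']
    # To run it (requires exiftool on PATH). The list is handed to exiftool
    # without a shell, so the value needs no quoting:
    # import subprocess; subprocess.run(args, check=True)
```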

Also, see FAQ 3 if other software isn't reading what you write.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

elkesan

OK, it looks like this is a bug in ExifTool.

Here is a basic test:
exiftool -xmp:Categories=test testingpdf.pdf
or
exiftool -xmp:Categories="test" testingpdf.pdf
or
exiftool -xmp:Categories='test' testingpdf.pdf
(I also tested "-xmp" and "-XMP"...)

Then open the PDF file in Adobe Reader (the Reader version does not matter) and go to File -> Properties, tab "Modified" (my Reader is Finnish: Tiedosto > Ominaisuudet, tab "Muokattu"). The "Categories" field should be visible on the "Modified" tab.

http://kuvanjako.fi/6mx95.jpg

Here you can see that this basic metadata can be added with ExifTool, but the additional XMP metadata "Categories" cannot. So exiftool -xmp:Categories=xxx filename adds nothing.

Please note: exiftool testingpdf.pdf does show this "Categories", BUT that is not the real test. The real test is to open the file in the original software, Adobe Reader, and check whether it works there. So: it does not work in Adobe Reader ---> bug in ExifTool. This is easy to test: run the exiftool -xmp:categories command and open the file in Adobe Reader. You will see that the "Categories" metadata does not work.


Phil Harvey

...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

elkesan

I am still quite sure this is a bug in ExifTool. I use ExifTool from the Windows 10 (64-bit) command line.

As I wrote:
exiftool -xmp:Categories="test" testingpdf.pdf
or
exiftool -xmp:Categories='test' testingpdf.pdf

Both of these add the Categories tag, but it is not visible in Adobe Reader.

As you can see, I used -xmp:Categories='test', just as you said.

With this new information I tested:
exiftool -xmp:Categories='$test$' testingpdf.pdf
Now "Categories" is '$test$', but Adobe Reader still cannot read it.

Then I also tried putting '$ ---- $' in several other places: -xmp:'$Categories$', ("unknow..."), before -xmp....

Summa summarum: I am sure this is a bug in ExifTool. You can test it easily. In the Windows command line, run exiftool -xmp:Categories='test' testingpdf.pdf. Then run exiftool testingpdf and you will see that "Categories" is inside the PDF. But that is not what matters. What matters is that Adobe Reader does not recognize it ---> so ExifTool writes it, but in a wrong, non-standard way.

https://drive.google.com/file/d/1qtNGUHV_e9L8Fwstnb6bso_icKROkDez/view?usp=sharing

This is my test PDF. As you can see, ExifTool shows this "Categories", but Adobe Reader cannot read it; if it worked, Reader would show it.

This file does contain a working "Categories", and I really cannot imagine how that is possible:
https://drive.google.com/file/d/1yOa4_-gp_2K-gMpPKb8_Cg9jqECP6utK/view?usp=sharing

So I strongly think ExifTool contains a bug.
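A note on the quoting experiments above: the single-quote advice in Phil's signature is about Mac/Linux/PowerShell, where the shell removes the quotes before exiftool sees the argument. Python's shlex module follows POSIX shell rules (as bash or zsh does), so it can illustrate what exiftool receives there. This is only an illustration of POSIX splitting, not of cmd.exe.

```python
import shlex

# shlex.split() applies POSIX shell rules. The single quotes stop "$" from
# being expanded as a shell variable and are then removed by the shell,
# so they never become part of the tag value.
cmd = "exiftool -xmp:Categories='$test$' testingpdf.pdf"
args = shlex.split(cmd)
print(args)  # ['exiftool', '-xmp:Categories=$test$', 'testingpdf.pdf']
```

The Windows command prompt (cmd.exe) does not treat single quotes as quoting characters, so there the apostrophes are passed through and become part of the value, which is why Categories ended up as '$test$'. Quoting does not change which group the tag is written to.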


Phil Harvey

This is not an ExifTool bug.

You aren't writing the correct metadata.

Please try to understand what FAQ 3 is telling you.

- Phil

StarGeek

As Phil says, it's not a bug.  When you run the command in FAQ 3, you will see that your working PDF has three tags that contain the Categories data: XMP-acdsee:Categories, XMP-pdfx:Categories, and PDF:Categories.  When you run it on your test file, you will see that it has only XMP-acdsee:Categories.

Obviously, Reader doesn't recognize XMP-acdsee:Categories.  So if you want the Categories tag to show up in Reader, you don't want to write to that.

Now, the problem is that XMP-pdfx:Categories and PDF:Categories are not tags that Exiftool has definitions for, so while it can output the data, it doesn't know how to write them.  XMP-pdfx:Categories is probably easy enough to create a definition for by reading the example.config file.  I'm not sure about how easy writing the PDF tag would be, though.
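This comparison can be automated. The sketch below parses the text printed by "exiftool -a -G1 -s FILE" (one "[Group] Tag : Value" line per tag) and lists the groups that hold a tag in the working PDF but not in the failing one. It assumes the default -s column layout and skips lines it does not recognize; the function names and the sample lines (modeled on the listings in this thread) are illustrative.

```python
import re
from typing import List, Tuple

# One line of "exiftool -a -G1 -s FILE" output, for example:
# [XMP-pdfx]      Categories                      : Test, testing, Yucca
LINE_RE = re.compile(r"^\[(?P<group>[^\]]+)\]\s+(?P<tag>\S+)\s*:\s?(?P<value>.*)$")


def parse_g1s(output: str) -> List[Tuple[str, str, str]]:
    """Return (group, tag, value) triples; with -a a tag may appear twice."""
    rows = []
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # skip anything that is not "[Group] Tag : Value"
            rows.append((match["group"], match["tag"], match["value"]))
    return rows


def groups_with_tag(output: str, tag: str) -> List[str]:
    """Sorted, de-duplicated list of groups that contain `tag`."""
    return sorted({group for group, name, _ in parse_g1s(output) if name == tag})


def missing_groups(working: str, failing: str, tag: str) -> List[str]:
    """Groups holding `tag` in the working file but not in the failing one."""
    return sorted(set(groups_with_tag(working, tag)) - set(groups_with_tag(failing, tag)))


if __name__ == "__main__":
    # Real input could come from (requires exiftool on PATH):
    # import subprocess
    # text = subprocess.run(["exiftool", "-a", "-G1", "-s", "file.pdf"],
    #                       capture_output=True, text=True).stdout
    working = (
        "[PDF]           Categories                      : Test, testing, Yucca\n"
        "[XMP-pdfx]      Categories                      : Test, testing, Yucca\n"
        "[XMP-acdsee]    Categories                      : Test, testing, Yucca\n"
    )
    failing = "[XMP-acdsee]    Categories                      : test\n"
    print(missing_groups(working, failing, "Categories"))  # ['PDF', 'XMP-pdfx']
```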
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

StarGeek

After downloading Reader and taking a look at your file there, the source of the whole problem becomes clear.  The Categories tag you want to write to is Custom.  It is not a standard property.  You can't expect exiftool to be able to write a tag that doesn't exist in any other pdf.

elkesan

Yes, it is much clearer now. -a -G1 -s shows all the fields. It looks like my biggest mistake was using the CSV export, because then the field names are not exact.

It looks like this "Categories" tag depends on the software. I don't remember how I succeeded with d10000692, but Adobe Reader found "Categories" in it. It looks like I also edited this file with ExifTool.
In this file:
[PDF]           Categories                      Ohje, Käyttöohje
[XMP-pdfx]      Categories                      Ohje,  Käyttöohje
[XMP-acdsee]    Categories                      Ohje,  Käyttöohje
So "Categories" is in THREE places, and Adobe Reader can read it. But I cannot remember how I forced this...

THEN I took a clean PDF file and wrote "Categories" with Adobe Acrobat, and got:
[PDF]           Categories                      Ohje, Käyttöohje
[XMP-pdfx]      Categories                      Ohje,  Käyttöohje

But:
When I read the forum, the tag pages, etc., it looks like XMP-pdfx is not writable. I also tested both from the command line, exiftool -PDF:Categories="liiba" and -XMP-pdfx:Categories="laaba", and neither works. On the PDF tag page I read "XMP pdfx Tags: [no tags known]".

So it looks like Adobe Reader can read only the "PDF" and "XMP-pdfx" fields, and ExifTool cannot write these fields. You are right, it is not a bug in ExifTool; I am sorry, I did not read the manuals carefully. Either ExifTool does not support these tags, or it needs some deeper knowledge... or it is not possible from the command line.

--------------------
SO: I think that if I write "Categories" with Adobe Acrobat, it is the "right" information. And XMP-pdfx and PDF are not writable.

Phil Harvey

You need to create user-defined tags to write these.  The attached config file will allow you to do this with the command:

exiftool -config categories.config -xmp-pdfx:categories="test" -pdf:categories="test" FILE
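Phil's attached config file is not reproduced in this thread. As a hypothetical sketch, the Python below generates a config with the same layout as ExifTool's example.config, declaring each custom property in both the PDF Info table (where Phil later says PDF Categories belongs) and the XMP-pdfx table. The generator, its comments, and the name check are my assumptions rather than Phil's file; compare the output with example.config before relying on it.

```python
import re
from typing import Iterable

# Hypothetical helper: emits an ExifTool user-defined tag config in the
# layout of example.config. Tag tables are Perl hashes keyed by table name;
# an empty "{ }" entry uses the table's default tag settings.
NAME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9]*$")

TABLES = [
    ("Image::ExifTool::PDF::Info", "PDF Info dictionary (Acrobat's Custom properties)"),
    ("Image::ExifTool::XMP::pdfx", "XMP-pdfx namespace (no predefined tags)"),
]


def make_custom_pdf_config(names: Iterable[str]) -> str:
    """Declare each property in both the PDF Info and XMP-pdfx tables."""
    names = list(names)
    for name in names:
        # keep names simple so they are also valid XMP property names
        if not NAME_RE.match(name):
            raise ValueError(f"unsupported property name: {name!r}")
    lines = ["%Image::ExifTool::UserDefined = ("]
    for table, note in TABLES:
        lines.append(f"    # {note}")
        lines.append(f"    '{table}' => {{")
        lines.extend(f"        {name} => {{ }}," for name in names)
        lines.append("    },")
    lines.append(");")
    lines.append("1;  # end with a true value, as example.config does")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_custom_pdf_config(["Categories"]), end="")
```

Saved as categories.config, the output would be loaded with -config exactly as in the command above.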

- Phil

elkesan

Ah, thanks, now the next step: studying config files. Thanks for your patience.

This looks like the right way; now it is possible to write XMP-pdfx. But it looks like Adobe Reader does not understand it.

But for PDF I get "Sorry, PDF:Categories doesn't exist or isn't writable". It looks like Adobe Acrobat really writes the tag to two places: XMP-pdfx and PDF. It looks like this "Main" is some other place. But... when I read a file edited by Adobe Acrobat, there is a [PDF] tag.

Now there seem to be two possibilities:
- Adobe Acrobat writes this [PDF] tag in some special way. The config file says "Main", but... I cannot imagine what else would be in this "Main".
[File]          MIMEType                        : application/pdf
[PDF]           PDFVersion                      : 1.7
[PDF]           Linearized                      : No
[PDF]           PageCount                       : 4
[PDF]           CreateDate                      : 2018:08:21 15:41:41+03:00
[PDF]           ModifyDate                      : 2018:08:21 15:41:41+03:00
[PDF]           Categories                      : Test, testing, Yucca
[XMP-x]         XMPToolkit                      : Adobe XMP Core 4.0-c316 44.253921, Sun Oct 01 2006 17:14:39
[XMP-pdfx]      Categories                      : Test, testing, Yucca
[XMP-xmp]       ModifyDate                      : 2018:08:21 15:41:41+03:00
[XMP-xmp]       CreateDate                      : 2018:08:21 15:41:41+03:00
[XMP-xmp]       MetadataDate                    : 2018:08:21 15:41:41+03:00
[XMP-dc]        Format                          : application/pdf
[XMP-xmpMM]     DocumentID                      : uuid:935c4c92-5479-44d2-929b-39aba5a7f631
[XMP-xmpMM]     InstanceID                      : uuid:ba7e1437-88d6-4f21-837f-31302fc85a0d


Now I cleaned the PDF file and did the same with ExifTool, using the example config file but without PDF, because it is not writable:
[ICC-meas]      MeasurementFlare                : 0.999%
[ICC-meas]      MeasurementIlluminant           : D65
[XMP-x]         XMPToolkit                      : Image::ExifTool 11.10
[XMP-pdfx]      Categories                      : testaus, specialcat, pcat


OK, now we come to the second possibility. The first is that "PDF-main" is not writable. The second is:

[XMP-x]         XMPToolkit                      : Adobe XMP Core 4.0-c316 44.253921, Sun Oct 01 2006 17:14:39


So the second possibility is that Adobe Reader checks XMPToolkit, and if it does not find the word "Adobe" there, it does not show Categories.

OK. My next task is to find out how ExifTool can write this "PDF-main". If that doesn't work, I must try to force-write the XMPToolkit name as in the example...


Phil Harvey

Sorry.  I added PDF Categories to the wrong table in the config file I posted (it should go in the Info table, not the Main table).  I have fixed this and updated the config file in my last post.  With this version you should be able to write PDF:Categories.

- Phil

elkesan

Thanks for your patience. I am really sorry, I am quite a n00b with all these fields. Now it works. But I will try to study ExifTool more; e.g. "exiftool -a -G1 -s file" shows a lot of information, but I really need to study "how to also show the table in this list". I recognize that "this Main, this XMP must be a table", but... I really need to study more.

Interestingly, Adobe Acrobat writes this information to both XMP and the Info table: when you edit a file in Adobe Acrobat under "File -> Properties > Custom", it adds the information to these two tables. BUT I also tested this: I cleared all tags (-all:all= etc.) and then wrote only the Info table (-pdf:Categories="test"). Adobe Reader understands this too.

OK, final result: "now it works".

--
P.S. The "Subject" field is standard. But it looks like every field is limited to 256 characters, in all PDF tags. Any idea how to get around this? I want to add my own internal information. "Subject" is the standard short description, but I am thinking of adding "Caption-Abstract" as in image files, 2000 characters. Is there any table in PDF that accepts an over-length field?

Phil Harvey

ExifTool doesn't impose any length limitation on PDF or XMP metadata.  IPTC metadata (Caption-Abstract) does have size limits that are enforced by ExifTool unless you use the -m option.
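The IPTC limit mentioned here is 2000 bytes for Caption-Abstract (from the IPTC IIM specification), and it counts bytes, not characters, so text with Finnish letters can reach it sooner than the character count suggests. A small Python check, assuming you know which encoding the IPTC text will be stored in (ExifTool's IPTC character-set handling decides this; see FAQ 10). The names are illustrative.

```python
# IPTC IIM Caption-Abstract (dataset 2:120) is limited to 2000 bytes.
CAPTION_ABSTRACT_MAX_BYTES = 2000


def fits_caption_abstract(text: str, encoding: str = "utf-8") -> bool:
    """Return True if `text` fits in Caption-Abstract when stored in `encoding`.

    The limit counts bytes, so "ä" uses 2 of them in UTF-8 but 1 in cp1252.
    """
    return len(text.encode(encoding)) <= CAPTION_ABSTRACT_MAX_BYTES


if __name__ == "__main__":
    text = "Käyttöohje " * 170  # 1870 characters
    print(len(text), fits_caption_abstract(text), fits_caption_abstract(text, "cp1252"))
    # 1870 False True
```

This only matters for IPTC; for PDF and XMP tags ExifTool imposes no such limit.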

- Phil