Bizarre PDF Keywords Metadata Handling

Started by mhanft, December 09, 2023, 05:44:23 AM


mhanft

Hi,

after reading and understanding FAQ#3, I have managed to edit PDF metadata (Creator, Producer and all that) by doubling the data (for example, writing "-PDF:Title" as well as "-XMP-dc:Title" and so on).

However, the "Keywords" still remain a miracle.

In order to find out which fields are concerned, I used Acrobat Professional to write "One, Two, Three" into the "Keywords" field. Apparently, there are three internal fields where these keywords are stored:

mh@home01 ~ $ exiftool -a -G1 -s test.pdf | grep "Three"
[PDF]           Keywords                        : One, Two, Three
[XMP-dc]        Subject                         : One, Two, Three
[XMP-pdf]       Keywords                        : One, Two, Three

Ok, now I want to change the keywords as follows:

mh@home01 ~ $ exiftool -PDF:Keywords="Four, Five, Six" -XMP-dc:Subject="Four, Five, Six" -XMP-pdf:Keywords="Four, Five, Six" test.pdf
    1 image files updated

which leads to good-looking

mh@home01 ~ $ exiftool -a -G1 -s test.pdf | grep "Four"
[PDF]           Keywords                        : Four, Five, Six
[XMP-dc]        Subject                         : Four, Five, Six
[XMP-pdf]       Keywords                        : Four, Five, Six

but when inspecting the PDF metadata using the regular (Windows) "Adobe Reader", the Keywords field is

"Four, Five, Six"; "Four, Five, Six"

I have tried each and every combination of space, comma, semicolon etc., but the Keywords field is always doubled, with one single exception: if I use just spaces, the Keywords field is shown correctly in Adobe Reader (with spaces) - although in one of the internal fields, mysterious commas appear:

mh@home01 ~ $ exiftool -PDF:Keywords="Four Five Six" -XMP-dc:Subject="Four Five Six" -XMP-pdf:Keywords="Four Five Six" test.pdf
    1 image files updated

mh@home01 ~ $ exiftool -a -G1 -s test.pdf | grep "Four"
[PDF]           Keywords                        : Four, Five, Six
[XMP-dc]        Subject                         : Four Five Six
[XMP-pdf]       Keywords                        : Four Five Six

and Windows Adobe Reader now displays indeed "Four Five Six" in the Keywords field.

I can live with that (although I'd prefer comma-separated "Four, Five, Six" or "Four,Five,Six" displayed), but isn't that strange somehow?

By the way, it's a PDF/A-3 file with an embedded XML attachment - but as far as I have seen, the behaviour described above is the same with "just normal" PDF files.

Any comments?

Thanks in advance,

-Matt

StarGeek

Quote from: mhanft on December 09, 2023, 05:44:23 AM
However, the "Keywords" still remain a miracle.

Welcome to the madness that is metadata.

Quote:
Ok, now I want to change the keywords as follows:

mh@home01 ~ $ exiftool -PDF:Keywords="Four, Five, Six" -XMP-dc:Subject="Four, Five, Six" -XMP-pdf:Keywords="Four, Five, Six" test.pdf
    1 image files updated

which leads to good-looking

It might look good, but the data isn't stored correctly because there are two different types of tags here.

PDF:Keywords is a simple string tag.  When you set it as you did, the value is exactly that, "Four, Five, Six".  I believe Adobe is treating this as a comma-separated list, though it does raise the question of how it treats a keyword that includes a comma, such as "Smith, John".

But XMP-dc:Subject is different.  It is a List Type tag (see FAQ #17, List-type tags).  That means that each keyword is saved as a separate entry.  When you set the value to "Four, Five, Six", as you found out, you are setting a single keyword to exactly that, "Four, Five, Six".  What you want is three entries:
Four
Five
Six

XMP-pdf:Keywords is even messier. Exiftool treats it as a List Type tag, the same as Subject, but it is saved in the file as either a comma- or semicolon-separated string.  This probably leads to all sorts of problems if you want to include a comma or semicolon in a keyword. I assume that Adobe treats this the same as PDF:Keywords.
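To make the comma problem concrete, here is a minimal Python sketch. It is not ExifTool's code; `flatten` and `split_back` are hypothetical helpers that only model what happens when separate keywords are stored as one delimited string and split again on reading.

```python
def flatten(keywords, sep=", "):
    """Join separate keywords into one delimited string (hypothetical helper)."""
    return sep.join(keywords)


def split_back(text, sep=", "):
    """Split a delimited string back into keywords (hypothetical helper)."""
    return text.split(sep)


# Two keywords, one of which contains the separator itself.
original = ["Smith, John", "Four"]

stored = flatten(original)      # one string, as a delimited field would hold it
recovered = split_back(stored)  # what a reader that splits on ", " gets back

print(stored)
print(recovered)
print(recovered == original)
```

The round trip does not return the original two keywords, which is why a tag that keeps each item as its own entry is safer for keywords containing commas.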

The command you should be using in this case would be either to use the -sep option
exiftool -sep ", " -PDF:Keywords="Four, Five, Six" -XMP-dc:Subject="Four, Five, Six" -XMP-pdf:Keywords="Four, Five, Six" test.pdf
or set them individually
exiftool -PDF:Keywords="Four, Five, Six" -XMP-dc:Subject=Four -XMP-dc:Subject=Five -XMP-dc:Subject=Six -XMP-pdf:Keywords=Four -XMP-pdf:Keywords=Five -XMP-pdf:Keywords=Six test.pdf
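If the keywords come from a script, the "set them individually" form can be generated instead of typed out. A minimal Python sketch, assuming only the two list-type tags named above; `keyword_args` is a hypothetical helper, PDF:Keywords is left out for brevity, and the command is only built here, not run:

```python
def keyword_args(keywords, list_tags=("XMP-dc:Subject", "XMP-pdf:Keywords")):
    """Return one -TAG=value argument per keyword and per list-type tag.

    Hypothetical helper: the tag names are the ones used in this thread.
    """
    args = []
    for tag in list_tags:
        for keyword in keywords:
            args.append(f"-{tag}={keyword}")
    return args


# Build the full argument list; nothing is executed in this sketch.
args = ["exiftool"] + keyword_args(["Four", "Five", "Six"]) + ["test.pdf"]

for arg in args:
    print(arg)
```

Handing such a list to `subprocess.run(args)` rather than assembling a single command string would sidestep shell quoting for keywords that contain spaces or commas.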

The one tip I can offer would be to skip using the PDF group tags (PDF:Keywords, etc.) unless you're using a program that can't read the XMP tags.  They are an older standard and Adobe writes them for backward compatibility, but Adobe is pushing the newer XMP standard.

All of this might be different if you need to keep the PDF/A setting.
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

mhanft

Quote from: StarGeek on December 09, 2023, 10:38:01 AM
The command you should be using in this case would be either to use the -sep option
exiftool -sep ", " -PDF:Keywords="Four, Five, Six" -XMP-dc:Subject="Four, Five, Six" -XMP-pdf:Keywords="Four, Five, Six" test.pdf
or set them individually
exiftool -PDF:Keywords="Four, Five, Six" -XMP-dc:Subject=Four -XMP-dc:Subject=Five -XMP-dc:Subject=Six -XMP-pdf:Keywords=Four -XMP-pdf:Keywords=Five -XMP-pdf:Keywords=Six test.pdf

Ah, -sep ", " works as expected - thanks a lot! I must have overlooked FAQ#17 - sorry.

Is it correct that -sep ", " is only taken into account for lists - so that I can use it for simple strings without risk (and set a lot of metadata with one single command)? For example,

exiftool -sep ", " -PDF:Keywords="Four, Five, Six" -XMP-dc:Subject="Four, Five, Six" -XMP-pdf:Keywords="Four, Five, Six" -PDF:Producer="PHP, Ghostscript, Mustang, Exiftool, QPDF" -XMP-pdf:Producer="PHP, Ghostscript, Mustang, Exiftool, QPDF" test.pdf

No problem with commas within "Producer" - right?

Thanks again,

-Matt

StarGeek

Quote from: mhanft on December 10, 2023, 05:58:30 AM
Ah, -sep ", " works as expected - thanks a lot! I must have overlooked FAQ#17 - sorry.

No need to be sorry.  Part of the problem is that metadata is so complex that people just starting out don't know what they should be looking for.  Exiftool knows 27,401 different tags.  How would anyone new know how to deal with all that?

Quote:
Is it correct that -sep ", " is only taken into account for lists - so that I can use it for simple strings without risk (and set a lot of metadata with one single command)?

Yes.  It has to do with the way the data is handled internally.  For example, list-type tags in XMP are stored like this, with each item as a separate entry:
<dc:subject>
 <rdf:Bag>
  <rdf:li>keyword 1</rdf:li>
  <rdf:li>keyword 2</rdf:li>
  <rdf:li>keyword , with comma</rdf:li>
 </rdf:Bag>
</dc:subject>

whereas a simple string would be stored as a single value:
<rdf:Description rdf:about=''
  xmlns:pdf='http://ns.adobe.com/pdf/1.3/'>
  <pdf:Producer>PHP, Ghostscript, Mustang, Exiftool, QPDF</pdf:Producer>
</rdf:Description>
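The difference between the two storage forms can be seen by parsing them. A minimal Python sketch using only the standard library; the wrapper XML is an assumed, simplified stand-in for a real XMP packet that combines the two fragments above, and it says nothing about how ExifTool itself reads the data:

```python
import xml.etree.ElementTree as ET

# Assumed, simplified XMP-like packet combining the two fragments above.
XMP = """<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
 <rdf:Description rdf:about=''
   xmlns:dc='http://purl.org/dc/elements/1.1/'
   xmlns:pdf='http://ns.adobe.com/pdf/1.3/'>
  <dc:subject>
   <rdf:Bag>
    <rdf:li>keyword 1</rdf:li>
    <rdf:li>keyword 2</rdf:li>
    <rdf:li>keyword , with comma</rdf:li>
   </rdf:Bag>
  </dc:subject>
  <pdf:Producer>PHP, Ghostscript, Mustang, Exiftool, QPDF</pdf:Producer>
 </rdf:Description>
</rdf:RDF>"""

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "pdf": "http://ns.adobe.com/pdf/1.3/",
}

root = ET.fromstring(XMP)

# List-type tag: every rdf:li is its own item, so embedded commas survive.
subjects = [li.text for li in root.findall(".//dc:subject/rdf:Bag/rdf:li", NS)]

# Simple string tag: one text value, commas and all.
producer = root.find(".//pdf:Producer", NS).text

print(subjects)
print(producer)
```

Because the list items are delimited by markup and not by punctuation, there is nothing to split in the Producer string, and the comma inside the third keyword is never mistaken for a separator.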