Hi
I am running exiftool version 8.15 on Debian stable, which is the latest version available there. When I try to print keywords from PDF files, exiftool outputs nothing, but double-checking with acroread clearly shows that the keywords are present.
Issued command: $ exiftool -keywords file.pdf
Please send me your email address if you need the PDF file in question for testing, so I can send it to you.
Thanks in advance
Matthias
Hi Matthias,
I don't work with PDF files much, so this is just a guess: the Keywords tag is defined for (old) IPTC metadata, and I doubt that's used in PDF. Often, when "keywords" are shown, what's really meant is the data from XMP:Subject. Just an idea, so try:
exiftool -Xmp:subject file.pdf
-or list all metadata and find out where inside the file the "keywords" are stored.
Bogdan
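The "list everything and search" idea can be sketched as a small shell helper. This is only an illustration: the function name find_keyword_tags is invented here, and it assumes exiftool and grep are on the PATH. The -a, -G1 and -s options make ExifTool print duplicate tags, the specific group each tag belongs to, and tag names instead of descriptions.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch): list every tag ExifTool
# can read and keep only the lines that look like keyword or subject fields.
#   -a   also show duplicate tags
#   -G1  prefix each tag with its specific group (for example PDF or XMP-dc)
#   -s   print tag names rather than descriptions
find_keyword_tags() {
    exiftool -a -G1 -s "$1" | grep -i -E 'keywords|subject'
}

# Usage: find_keyword_tags file.pdf
```

If this prints nothing, ExifTool did not read any keyword-like tag from the file, whatever the group.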
Hi Bogdan
Thanks for your reply.
Some background information:
I am trying to get close to a paperless office. Therefore I scan all documents using gscan2pdf, run them through ocropus, save them as PDF and add keywords to them so that my tiny search script (which calls exiftool) can find them easily by comparing content and keywords with a search string.
Having said this, all PDFs are generated by the same API and should therefore be equal to each other in terms of metadata structure (I hope it's clear what I am talking about here).
I already tried listing all metadata to find where the "keywords" are stored; they just did not show up for "some" PDFs. I guess there is something wrong with these specific PDFs themselves, since others work when queried with '-keywords'. Strangely, Adobe Acrobat Reader shows the keywords. Is there some tool, script, ... I can use to check a PDF for integrity?
I will have a look into -Xmp:subject, when I am back home. Currently I am not able to test this.
Regards
Matthias
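On the integrity question, a very crude first check can be done with standard shell tools alone. The sketch below only looks for three structural markers that a PDF normally contains: the %PDF- header near the start, and the startxref keyword and %%EOF marker near the end. The function name pdf_looks_sane and the 1024-byte windows are choices made for this example, and passing the check does not prove the file is valid.

```shell
#!/bin/sh
# Crude structural sanity check for a PDF (not a full validator).
# Function name and the 1024-byte windows are assumptions of this sketch.
pdf_looks_sane() {
    f=$1
    # A PDF should carry a "%PDF-" header near the start of the file.
    head -c 1024 "$f" | grep -a -q '%PDF-' \
        || { echo "$f: missing %PDF- header"; return 1; }
    # The tail should contain "startxref" (pointer to the cross-reference data).
    tail -c 1024 "$f" | grep -a -q 'startxref' \
        || { echo "$f: missing startxref"; return 1; }
    # The tail should also contain the end-of-file marker.
    tail -c 1024 "$f" | grep -a -q '%%EOF' \
        || { echo "$f: missing %%EOF marker"; return 1; }
    echo "$f: basic structure present"
}

# Usage: pdf_looks_sane file.pdf
```

Dedicated tools go much further than this. If qpdf happens to be installed, for instance, qpdf --check file.pdf performs a deeper structural check.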
Hi Matthias,
As I said, I'm just guessing here. What I would do is:
exiftool -g1 -all filename.pdf
-to get metadata grouped by metadata section. If the values you're after exist inside the file, ExifTool will show them for sure! I don't use Acrobat either, so.. is it possible that Acrobat keeps values cached somewhere, and so they aren't really inside the file?
Anyway, maybe you should wait for Phil to come -he can check what's "wrong" with your file.
Bogdan
Hi Bogdan
Unfortunately none of the commands showed the keywords. But as I said earlier, Acrobat Reader does... (see attachment) :o
$ exiftool -Xmp:subject file.pdf
$ exiftool -keywords file.pdf
$ exiftool -Xmp:keywords file.pdf
$ exiftool -g1 -all file.pdf
---- ExifTool ----
ExifTool Version Number : 8.15
---- System ----
File Name : file.pdf
Directory : .
File Size : 185 kB
File Modification Date/Time : 2012:07:18 20:04:42+02:00
File Permissions : rw-r--r--
---- File ----
File Type : PDF
MIME Type : application/pdf
---- PDF ----
PDF Version : 1.4
I also tried writing to this field:
$ exiftool -Xmp:keywords='test' file.pdf
Error: Can't find Root object - file.pdf
0 image files updated
1 files weren't updated due to errors
I just managed to fix that by using pdftk. Here is how I did it:
$ pdftk file_broken.pdf output file_fixed.pdf
After that,
$ exiftool -g1 -all file_fixed.pdf
started to show something useful.
Writing to the file with
$ exiftool -keywords='test' file_fixed.pdf
also worked...
So...problem solved! ;D
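For a whole folder of scans, the pdftk workaround above could be wrapped in a helper along these lines. This is a sketch under assumptions, not a tested recipe: the function name rewrite_if_no_keywords and the .fixed.pdf suffix are invented here, and "no Keywords output" is used as a stand-in for "ExifTool cannot parse the file", so a PDF that genuinely has no keywords will be rewritten too. The original file is never modified.

```shell
#!/bin/sh
# Sketch of the workaround from this thread: if ExifTool reads no Keywords
# from a PDF, rewrite the file with pdftk and try again.
# Function name and the ".fixed.pdf" suffix are invented for this example.
rewrite_if_no_keywords() {
    src=$1
    dst="${src%.pdf}.fixed.pdf"

    # -s3 prints only the tag value, so empty output means no Keywords found.
    if [ -n "$(exiftool -s3 -Keywords "$src" 2>/dev/null)" ]; then
        echo "$src: keywords readable, nothing to do"
        return 0
    fi

    # Same pdftk call as in the post above; the source file stays untouched.
    pdftk "$src" output "$dst" || return 1

    if [ -n "$(exiftool -s3 -Keywords "$dst" 2>/dev/null)" ]; then
        echo "$src: keywords readable after rewrite -> $dst"
    else
        echo "$src: still no keywords after rewrite (file may simply have none)"
        return 2
    fi
}

# Usage: for f in *.pdf; do rewrite_if_no_keywords "$f"; done
```

Writing to a separate output file keeps the unreadable original around, which is useful if it needs to be sent in for analysis later.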
Please mail me a PDF where the keywords show up in Acrobat but not ExifTool (my mail is philharvey66 at gmail.com), or where ExifTool can't parse the PDF. This should not happen. I would like to take a look at this in more detail. It may be some flavour of PDF that ExifTool isn't handling properly.
Thanks.
- Phil