ExifTool Forum

ExifTool => Bug Reports / Feature Requests => Topic started by: matze2ooo on July 18, 2012, 02:16:29 PM

Title: exiftool shows no pdf keywords
Post by: matze2ooo on July 18, 2012, 02:16:29 PM
Hi

I am running exiftool version 8.15 on Debian stable, which is the latest version available there. When I try to print keywords from PDF files, exiftool outputs nothing, but double-checking with acroread clearly shows that keywords are present.

Issued command: $ exiftool -keywords file.pdf

Please send me your email address if you need the PDF file in question to test this, so I can send it to you.

Thanks in advance
Matthias

Title: Re: exiftool shows no pdf keywords
Post by: BogdanH on July 19, 2012, 01:10:07 AM
Hi Matthias,

I don't work with PDF files much, so this is just a guess: the Keywords tag is defined for (old) IPTC metadata, and I wouldn't expect that to be used in PDF. Many times, when "keywords" are shown, what is meant is the data from XMP:Subject. Just an idea... so try:
exiftool -Xmp:subject file.pdf
-or list all metadata and find out where inside the file the "keywords" are stored.

Bogdan

Title: Re: exiftool shows no pdf keywords
Post by: matze2ooo on July 20, 2012, 05:06:00 AM
Hi Bogdan

Thanks for your reply.

Some background information:

I am trying to get close to a paperless office. Therefore I scan all documents using gscan2pdf, run them through ocropus, save them as PDF and add keywords to them, so I can find them easily with my tiny search script (calling exiftool), which compares content and keywords with a search string.
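For readers following along, a search script of this kind could look roughly like the sketch below. It is not the script used in this thread: the function name find_pdfs_by_keyword is made up for illustration, it only checks keywords (not page content), and it assumes exiftool is on the PATH.

```shell
#!/bin/sh
# Hypothetical helper (name and behaviour are assumptions, not from the thread):
# print every PDF under a directory whose Keywords or XMP Subject contain the
# search term, compared case-insensitively as a fixed string.
find_pdfs_by_keyword() {
    term=$1
    dir=${2:-.}
    find "$dir" -type f -name '*.pdf' | while IFS= read -r pdf; do
        # -s -s -s prints tag values only; -q suppresses informational messages.
        # stdin is redirected so exiftool cannot consume the file list.
        if exiftool -q -s -s -s -Keywords -XMP:Subject "$pdf" 2>/dev/null </dev/null |
            grep -q -i -F -- "$term"; then
            printf '%s\n' "$pdf"
        fi
    done
}
```

It would be called as find_pdfs_by_keyword invoice ~/scans. A PDF that exiftool cannot parse simply produces no match, which is exactly the symptom described in this thread.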

That said, all PDFs are generated by the same API and should therefore be equal to each other in terms of metadata structure (I hope it's clear what I am talking about here).

I already tried to list all metadata to find where the "keywords" are stored; they just did not show up on "some" PDFs. I guess there is something wrong with these specific PDFs themselves, since others work when querying '-keywords'. Strangely, Adobe Acrobat Reader shows them. Is there some tool, script, ... I can use to check a PDF for integrity?
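As a very rough first check, the structural markers every PDF needs can be looked for with standard shell tools. The sketch below is an assumption-laden heuristic, not a validator: the function name pdf_looks_intact is made up, it relies on head -c and grep -a as found on GNU and BSD systems, and a file can pass it and still be unreadable by ExifTool.

```shell
# Hypothetical sanity check (a heuristic, not a real PDF validator): it only
# looks for the header, the startxref and %%EOF markers near the end of the
# file, and a /Root reference to the document catalog.
pdf_looks_intact() {
    pdf=$1
    # The file must begin with the "%PDF-" header.
    [ "$(head -c 5 "$pdf")" = "%PDF-" ] || { echo "missing %PDF- header"; return 1; }
    # The last 1024 bytes should hold "startxref" and the "%%EOF" marker.
    tail -c 1024 "$pdf" | grep -a -q 'startxref' || { echo "missing startxref"; return 1; }
    tail -c 1024 "$pdf" | grep -a -q '%%EOF' || { echo "missing %%EOF"; return 1; }
    # The trailer (or cross-reference stream dictionary) must name the catalog.
    grep -a -q '/Root' "$pdf" || { echo "no /Root entry found"; return 1; }
    echo "basic structure present"
}
```

Running pdf_looks_intact file.pdf prints the first problem found and returns a non-zero status, so it can be used in a loop over a scan directory to flag candidates for a closer look.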

I will have a look into -Xmp:subject, when I am back home. Currently I am not able to test this.

Regards
Matthias

Title: Re: exiftool shows no pdf keywords
Post by: BogdanH on July 20, 2012, 07:30:55 AM
Hi Matthias,
As said, I'm just guessing here. What I would do is:
exiftool -g1 -all filename.pdf
-to get metadata grouped by metadata sections. And if the values you're after exist inside the file, ExifTool will show them for sure! I don't use Acrobat either, so... is it possible Acrobat keeps values cached somewhere, and so they aren't really inside the file?
Anyway, maybe you should wait for Phil to come - he can check what's "wrong" with your file.
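To narrow a full dump down to the interesting lines, the output can be filtered for tag names that mention keywords or subject. This is only a sketch: show_keyword_tags is a made-up name, and the filter also matches lines where those words appear in a value.

```shell
# Hypothetical helper: show every output line mentioning "keywords" or
# "subject", together with its group, so it is visible where the data lives.
# -a keeps duplicate tags, -G1 prefixes each tag with its specific group,
# -s prints tag names instead of descriptions.
show_keyword_tags() {
    exiftool -a -G1 -s "$1" | grep -i -E 'keywords|subject'
}
```

If show_keyword_tags file.pdf prints nothing while a viewer shows keywords, either the tag lives under a name this filter does not cover or ExifTool could not parse the file at all.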

Bogdan

Title: Re: exiftool shows no pdf keywords
Post by: matze2ooo on July 21, 2012, 06:14:40 AM
Hi Bogdan

Unfortunately none of the commands showed the keywords. But as said earlier, Acrobat Reader does... (see attachment) :o

$ exiftool -Xmp:subject file.pdf

$ exiftool -keywords file.pdf

$ exiftool -Xmp:keywords file.pdf

$ exiftool -g1 -all file.pdf
---- ExifTool ----
ExifTool Version Number         : 8.15
---- System ----
File Name                       : file.pdf
Directory                       : .
File Size                       : 185 kB
File Modification Date/Time     : 2012:07:18 20:04:42+02:00
File Permissions                : rw-r--r--
---- File ----
File Type                       : PDF
MIME Type                       : application/pdf
---- PDF ----
PDF Version                     : 1.4


I also tried writing to this field:

$ exiftool -Xmp:keywords='test' file.pdf
Error: Can't find Root object - file.pdf
    0 image files updated
    1 files weren't updated due to errors
Title: Re: exiftool shows no pdf keywords
Post by: matze2ooo on July 21, 2012, 06:34:03 AM
I just managed to fix that by using pdftk. Here is how I did it:


$ pdftk file_broken.pdf output file_fixed.pdf


And now $ exiftool -g1 -all file_fixed.pdf shows something useful.

Writing to the file with $ exiftool -keywords='test' file_fixed.pdf also worked...
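The two steps can be combined into a small wrapper that only falls back to pdftk when the write fails. This is a sketch under assumptions: set_pdf_keywords and the _fixed suffix are made up for illustration, and it relies on exiftool returning a non-zero exit status when a file could not be updated.

```shell
# Hypothetical wrapper around the fix described above: if exiftool cannot
# write to the PDF, rewrite it with pdftk and retry on the rewritten copy.
# The "_fixed" suffix is an arbitrary naming choice.
set_pdf_keywords() {
    pdf=$1
    keywords=$2
    # Assumption: exiftool exits non-zero when the file was not updated.
    if exiftool -q -Keywords="$keywords" "$pdf" 2>/dev/null; then
        printf '%s\n' "$pdf"
        return 0
    fi
    fixed="${pdf%.pdf}_fixed.pdf"
    # Let pdftk read the file and write it back out, as in the command above.
    pdftk "$pdf" output "$fixed" || return 1
    exiftool -q -Keywords="$keywords" "$fixed" || return 1
    printf '%s\n' "$fixed"
}
```

The function prints the path of the file that ended up carrying the keywords, for example set_pdf_keywords file.pdf 'test'. Note that exiftool keeps a backup named with an _original suffix next to any file it modifies unless told otherwise.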

So...problem solved!  ;D

Title: Re: exiftool shows no pdf keywords
Post by: Phil Harvey on July 22, 2012, 10:25:13 AM
Please mail me a PDF where the keywords show up in Acrobat but not ExifTool (my mail is philharvey66 at gmail.com), or where ExifTool can't parse the PDF.  This should not happen.  I would like to take a look at this in more detail.  It may be some flavour of PDF that ExifTool isn't handling properly.

Thanks.

- Phil