exiftool shows no pdf keywords

Started by matze2ooo, July 18, 2012, 02:16:29 PM

matze2ooo

Hi

I am running ExifTool version 8.15 on Debian stable, which is the latest version available there. When I try to print keywords from PDF files, exiftool outputs nothing. But double-checking with acroread clearly shows that keywords are present.

Issued command: $ exiftool -keywords file.pdf

Please send me your email address if you need the PDF file in question to test this, so I can send it to you.

Thanks in advance
Matthias


BogdanH

Hi Matthias,

I don't work with PDF files much, so this is just a guess: the Keywords tag is defined for (old) IPTC metadata, and I wouldn't expect that to be used in PDF. Often, when an application shows "keywords", it means the data from XMP:Subject. Just an idea, so try:
exiftool -Xmp:subject file.pdf
or list all metadata and find out where in the file the "keywords" are stored.

Bogdan

matze2ooo

Hi Bogdan

Thanks for your reply.

Some background information:

I am trying to get close to a paperless office. Therefore I scan all documents using gscan2pdf, run them through ocropus, save them as PDF and add keywords to them, so I can find them easily with my tiny search script (calling exiftool), which compares content and keywords with a search string.

That said, all PDFs are generated by the same API and should therefore be identical in terms of metadata structure (I hope it's clear what I am talking about here).

I already tried listing all metadata to find where the "keywords" are stored; they just did not show up for "some" PDFs. I guess there is something wrong with these specific PDFs themselves, since others work when queried with '-keywords'. Strangely, Adobe Acrobat Reader shows them. Is there some tool, script, ... I can use to check a PDF for integrity?

I will have a look into -Xmp:subject, when I am back home. Currently I am not able to test this.

Regards
Matthias
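
A search script of the kind described above could be sketched roughly as follows. This is not Matthias's actual script, which is not shown in the thread. The function name search_pdfs is hypothetical, and the sketch assumes that the keywords live in the PDF Info dictionary (PDF:Keywords), that the ExifTool version supports -s3, and that pdftotext from poppler-utils is installed to read the OCR text layer.

```shell
#!/bin/sh
# search_pdfs PATTERN FILE...
# Print each FILE whose keywords or text layer contain PATTERN
# (case-insensitive, literal match). Hypothetical helper for illustration.
search_pdfs() {
    pattern=$1
    shift
    for f in "$@"; do
        # -s3 prints only the tag value; -PDF:Keywords reads the Info dictionary.
        if exiftool -s3 -PDF:Keywords "$f" 2>/dev/null | grep -q -i -F -e "$pattern"; then
            echo "$f"
        # Assumption: pdftotext (poppler-utils) extracts the text; "-" means stdout.
        elif pdftotext "$f" - 2>/dev/null | grep -q -i -F -e "$pattern"; then
            echo "$f"
        fi
    done
}
```

It would be called as, for example, search_pdfs invoice *.pdf. A file that ExifTool cannot parse simply yields no keyword match here, which is how the problem in this thread would surface.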


BogdanH

Hi Matthias,
As said, I'm just guessing here. What I would do is:
exiftool -g1 -all filename.pdf
to get the metadata grouped by section. If the values you're after exist inside the file, ExifTool will show them for sure! I don't use Acrobat either, so... is it possible that Acrobat keeps the values cached somewhere, so they aren't really inside the file?
Anyway, maybe you should wait for Phil to come by; he can check what's "wrong" with your file.

Bogdan
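
To narrow down which metadata group holds the keywords, the full listing can be filtered by tag name. The sketch below is one way to do that. The function name find_keyword_tags is hypothetical, and the set of names it looks for (Keywords, Subject) is an assumption based on the tags mentioned in this thread.

```shell
#!/bin/sh
# find_keyword_tags FILE
# List every tag whose line mentions "keywords" or "subject", with its group.
#   -a   show duplicate tags, so the same name in several groups stays visible
#   -G1  prefix each tag with its family-1 group (e.g. PDF, XMP-pdf, XMP-dc)
#   -s   print tag names instead of descriptions
# Returns non-zero when nothing matches (grep's exit status).
find_keyword_tags() {
    exiftool -a -G1 -s "$1" | grep -i -E 'keywords|subject'
}
```

If find_keyword_tags file.pdf prints nothing at all, ExifTool found no such tag in any group, which points at a parsing problem with the file and not at a wrong tag name.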

matze2ooo

Hi Bogdan

Unfortunately, none of the commands showed the keywords. But as said earlier, Acrobat Reader does... (see attachment) :o

$ exiftool -Xmp:subject file.pdf

$ exiftool -keywords file.pdf

$ exiftool -Xmp:keywords file.pdf

$ exiftool -g1 -all file.pdf
---- ExifTool ----
ExifTool Version Number         : 8.15
---- System ----
File Name                       : file.pdf
Directory                       : .
File Size                       : 185 kB
File Modification Date/Time     : 2012:07:18 20:04:42+02:00
File Permissions                : rw-r--r--
---- File ----
File Type                       : PDF
MIME Type                       : application/pdf
---- PDF ----
PDF Version                     : 1.4


I also tried writing to this field:

$ exiftool -Xmp:keywords='test' file.pdf
Error: Can't find Root object - file.pdf
    0 image files updated
    1 files weren't updated due to errors

matze2ooo

I just managed to fix that by using pdftk. Here is how I did it:


$ pdftk file_broken.pdf output file_fixed.pdf


And now $ exiftool -g1 -all file_fixed.pdf started to show something useful.

Writing to the file with $ exiftool -keywords='test' file_fixed.pdf also worked...

So...problem solved!  ;D
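
For a whole archive of scans, the same pdftk rewrite could be applied only to files that show the symptom from this thread. The sketch below is a heuristic, not a real integrity check: it assumes a file is "broken" when ExifTool reports no PDF-group tag other than PDFVersion, as in the output posted earlier. The function names are hypothetical, and the original file is kept with an .orig suffix.

```shell
#!/bin/sh
# needs_repair FILE
# Succeeds when ExifTool reports no PDF-group tag besides PDFVersion.
# Assumption: this mirrors the symptom in the thread; it is only a heuristic.
needs_repair() {
    # -PDF:all limits output to the PDF group; -s prints tag names.
    count=$(exiftool -s -PDF:all "$1" 2>/dev/null | grep -v -c '^PDFVersion' || true)
    [ "$count" -eq 0 ]
}

# repair_pdf FILE
# Rewrite FILE with pdftk, keeping the original as FILE.orig.
repair_pdf() {
    tmp="$1.fixed.$$"
    if pdftk "$1" output "$tmp"; then
        mv "$1" "$1.orig" && mv "$tmp" "$1"
    else
        rm -f "$tmp"
        return 1
    fi
}

# repair_if_needed FILE...
# Repair only the files that look unreadable to ExifTool.
repair_if_needed() {
    for f in "$@"; do
        if needs_repair "$f"; then
            echo "repairing: $f"
            repair_pdf "$f" || echo "pdftk failed: $f" >&2
        fi
    done
}
```

A run such as repair_if_needed *.pdf leaves healthy files untouched. A PDF that genuinely has no metadata would also be rewritten by this heuristic, which should be harmless but is worth knowing.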

Phil Harvey

Please mail me a PDF where the keywords show up in Acrobat but not ExifTool (my mail is philharvey66 at gmail.com), or where ExifTool can't parse the PDF.  This should not happen.  I would like to take a look at this in more detail.  It may be some flavour of PDF that ExifTool isn't handling properly.

Thanks.

- Phil