Apologies if this is covered somewhere; I did a fair amount of googling and experimentation before posting, and I've used this awesome tool before.
If I use the Verbose switch, I get all the data of the PDF in question. However, I cannot figure out what the switch(es) are for the specific values I want. I can see the tag names in the verbose readout, but when I try to request them I get no data returned. I'm using the command line directly.
First, if I just issue the regular exiftool command with no switch, I get output from which I can request specific values such as PDFVersion, Linearized, etc. With the verbose switch, however, I see "PDF dictionary (1 of 1) with 4 entries:" and I am unable to crawl this data, which I assume is for the embedded images. Looking at the tags between "[ ]", I see familiar constructs such as Root, Pages, Kids, Resources, XObject and so on, including the information I want. However, if I issue a command like
exiftool -PDF:Root:Pages:Kids:Resources:XObject myfile.pdf
and so on, I get no data returned. No errors, just no data. I have also tried simpler things, such as
exiftool -PDF:all myfile.pdf
and that produces no output either, so I assume nothing under the PDF tag will work. I also thought perhaps those values were being calculated, but -composite is empty as well. I experimented with explicitly using -ee and a wide range of top-level tag names to get even the simplest output in return, in the hope of extending that, but no dice.
Here's a screenshot of the Verbose output that I'd like to capture specific values from. I just can't see what's wrong. Any ideas? It's probably something dumb on my part that I glossed over...
http://imgur.com/pgfDlNr
ExifTool will not extract just anything from a PDF file. It is a metadata utility, so in general I restrict the pre-defined tags to metadata only. See the PDF documentation (https://exiftool.org/TagNames/PDF.html) for a list of tags that are extracted.
If you want to extract anything else, you will need to create user-defined tags in a config file, but this may be very complicated to do.
ExifTool really isn't designed to disassemble a PDF to extract all embedded files/images. For this, there may be other utilities that are more suitable.
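For what it's worth, one way to see which tag names can actually be requested on the command line (as opposed to the raw structure that the verbose output prints) is to list everything with group names. This is only a sketch: it assumes exiftool is on the PATH, and list_pdf_tags is a hypothetical helper name, not part of exiftool.

```shell
# list_pdf_tags FILE  (hypothetical helper, not an exiftool feature)
# Print every tag ExifTool extracts from FILE, one per line, showing the
# family-1 group and the tag name to use on the command line.
list_pdf_tags() {
    if ! command -v exiftool >/dev/null 2>&1; then
        echo "exiftool not found on PATH" >&2
        return 127
    fi
    # -a   allow duplicate tag names to be shown
    # -G1  prefix each tag with its family-1 group name
    # -s   print tag names instead of descriptions
    exiftool -a -G1 -s "$1"
}

# Usage (assumes myfile.pdf exists):
#   list_pdf_tags myfile.pdf
```

Anything that shows up in this listing can be requested by name; anything that appears only in the verbose dump is not a pre-defined tag.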
- Phil
Edit: Apparently it does more than I remembered.
Thanks for the reply, Phil. So should I assume that the tag names in the verbose output that match the ones listed under the PDF tags in your link are only coincidentally matching, and that they appear only because the verbose switch "grabs everything", without being individually accessible? I guess what I was originally after is how the verbose switch knows about these embedded images, their properties, and their values (per my screenshot) when the tag switches can't retrieve them.
I'll look into a custom defined file to attempt this then. Appreciate your time, thanks again for such a great utility.
Oh, right. XObject is listed in the PDF tag documentation, and reading the documentation I see that the contained image information is extracted when the ExtractEmbedded (-ee) option is used. I had forgotten about this. So you can extract all embedded images with this command:
exiftool -ee -embeddedimage -b -W %t%c.%s some.pdf
This will create a bunch of files called "embedded###.jpg" in the current directory (if the embedded images are JPG format).
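The command above can be wrapped in a small shell function to make each option easier to follow. This is a sketch only: extract_pdf_images is a hypothetical helper name, and it assumes exiftool is on the PATH and that the output files should land in the current directory.

```shell
# extract_pdf_images FILE  (hypothetical helper wrapping the command above)
# Write each embedded image found in FILE to its own file in the
# current directory.
extract_pdf_images() {
    # -ee             ExtractEmbedded: also process embedded data
    # -EmbeddedImage  request only the EmbeddedImage tag
    # -b              output the tag value as binary data
    # -W FMT          write each tag value to a separate file named by FMT:
    #                 %t = tag name, %c = copy number if the name is taken,
    #                 %s = suggested extension for the binary data
    # The format string is quoted so the shell passes it through untouched.
    exiftool -ee -EmbeddedImage -b -W '%t%c.%s' "$1"
}

# Usage (assumes some.pdf exists):
#   extract_pdf_images some.pdf
```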
I should learn not to rely on my memory when describing exiftool capabilities. I am continually surprised by all of the cool things that it does. :)
Good thing it's all documented. ;)
- Phil