difficulty extracting all EXIF thumbnails

Started by Paul Heckbert, October 25, 2012, 06:00:38 PM

Paul Heckbert

I've got C++ code that extracts thumbnails (and some other info) from TIFF and JPEG files.  The exiftool commandline utility has been very valuable for debugging, and its documentation (in particular http://www.exiftool.org/TagNames/EXIF.html) has been great for helping me learn the semantics of EXIF IFDs.

My difficulty is that I have a picture bl1.jpg at

https://www.dropbox.com/sh/5f1e6z13dlpr67u/Y5rNrsz7qs

whose EXIF thumbnail I'm unable to read with my current C++ code.  (Note that the picture is visually boring - just blue sky.)  My code knows how to follow 0x8769 (ExifOffset) and 0xA005 (InteroperabilityOffset) tags into sub-IFDs.  I also have code that follows 0x8649 (PhotoshopSettings) tags and decodes Photoshop resources with resource IDs 0x0409 and 0x040c (PhotoshopThumbnail); it works on 90% of the TIFF and JPEG files I've tested (note that some pictures have no thumbnail).  On this file, bl1.jpg, exiftool finds a PhotoshopThumbnail, but my code never finds a PhotoshopSettings tag or a PhotoshopThumbnail resource in the two IFDs it explores.

I'm guessing that there are other IFDs in this file that my code is never finding and exploring.  Note: I want to keep my code C++ and self-contained.  Invoking perl to run exiftool is a very undesirable option in the app I'm developing.  Suggestions for finding all the IFDs?

Phil Harvey

I'll take a look at the JPG when I am on a faster internet connection.

What I would do is use the exiftool -htmldump feature to see what the file contains.  If you don't know about this feature, you should definitely try it out:

exiftool -htmldump FILE > out.html

Then open out.html in a web browser.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Phil Harvey

I've taken a look at your image.

The PhotoshopThumbnail is of course not stored in an IFD (a TIFF/EXIF-based storage format), but in a Photoshop IRB (Image Resource Block).  The -htmldump feature dumps the EXIF IFDs, but not the Photoshop IRBs.  Use the -v3 option to see the IRBs in a verbose dump.  The PhotoshopThumbnail is in IRB 0x040c, after a header of 28 bytes.

- Phil
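
For readers following along in C++, here is a minimal sketch of walking Photoshop resource blocks to reach the 0x040c thumbnail described above. It assumes the usual block layout (the "8BIM" signature, a 2-byte big-endian resource ID, a Pascal-string name padded to an even length, a 4-byte big-endian data size, then data padded to an even length). The names ByteRange, findPhotoshopResource and photoshopThumbnailJpeg are hypothetical and not from exiftool or any library.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper type: a span of bytes inside the file buffer.
struct ByteRange {
    std::size_t offset;
    std::size_t length;  // 0 means "not found"
};

// Scan consecutive "8BIM" resource blocks in buf[begin, end) for wantedId.
// Returns the range of that resource's data, or length 0 if it is absent.
ByteRange findPhotoshopResource(const std::vector<std::uint8_t>& buf,
                                std::size_t begin, std::size_t end,
                                std::uint16_t wantedId) {
    if (end > buf.size()) end = buf.size();
    std::size_t pos = begin;
    // Smallest possible block: 4 (signature) + 2 (ID) + 2 (empty name) + 4 (size).
    while (pos + 12 <= end && std::memcmp(&buf[pos], "8BIM", 4) == 0) {
        std::uint16_t id =
            static_cast<std::uint16_t>((buf[pos + 4] << 8) | buf[pos + 5]);
        std::size_t nameLen = buf[pos + 6];
        // Length byte plus name, rounded up to an even number of bytes.
        std::size_t nameField = (1 + nameLen + 1) & ~static_cast<std::size_t>(1);
        std::size_t sizePos = pos + 6 + nameField;
        if (sizePos + 4 > end) break;  // truncated block
        std::size_t size = (static_cast<std::size_t>(buf[sizePos]) << 24) |
                           (static_cast<std::size_t>(buf[sizePos + 1]) << 16) |
                           (static_cast<std::size_t>(buf[sizePos + 2]) << 8) |
                            static_cast<std::size_t>(buf[sizePos + 3]);
        std::size_t dataPos = sizePos + 4;
        if (size > end - dataPos) break;  // size runs past the segment
        if (id == wantedId) return ByteRange{dataPos, size};
        pos = dataPos + size + (size & 1);  // data is padded to an even length
    }
    return ByteRange{0, 0};
}

// The embedded JPEG starts after the 28-byte thumbnail header of resource 0x040c.
ByteRange photoshopThumbnailJpeg(const std::vector<std::uint8_t>& buf,
                                 std::size_t begin, std::size_t end) {
    const std::size_t kThumbHeader = 28;
    ByteRange r = findPhotoshopResource(buf, begin, end, 0x040C);
    if (r.length <= kThumbHeader) return ByteRange{0, 0};
    return ByteRange{r.offset + kThumbHeader, r.length - kThumbHeader};
}
```

When the blocks live in a JPEG APP13 segment, begin would be the start of the segment data plus 14 bytes, to skip the "Photoshop 3.0" identifier and its terminating zero byte. The sketch only locates the bytes; it does not validate the thumbnail header fields.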

Paul Heckbert

Phil: thanks. -htmldump -v3 is helpful.  Follow-up questions (reply to ph at cs.cmu.edu if you prefer):

So after the two IFD subdirs, at address 02ae there's a string of bytes
ff ed 09 66 50 68 6f 74 6f 73  68 6f 70 20 33 2e 30 00
and then I see what look like several Photoshop resource blocks (I recognize the "8BIM").  I have code to follow 0x8649 (PhotoshopSettings) tags, as I mentioned, and code to parse Photoshop resource blocks, and they work on other TIFF / JPEG files.  But in this file the Photoshop resource blocks don't seem to be inside an IFD, as they were in the files I've seen before; instead they follow the IFDs.

Of course I see that
ff ed 09 66 50 68 6f 74 6f 73  68 6f 70 20 33 2e 30 00
contains "Photoshop 3.0" but otherwise what do the four bytes before it (ff ed 09 66) mean?

Can you point me to a doc that explains what to expect after EXIF IFDs and how to parse what follows the end of the IFDs?  If there's a Photoshop resource that's not in an IFD and I don't find one immediately after the EXIF IFDs, can I assume there are no Photoshop resources?

Do Aperture-generated JPEGs contain thumbnails?

Phil Harvey

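For an explanation of the meaning of ff ed 09 66, read the JPEG/JFIF specification.  This is a JPEG segment header: ff ed is the APP13 marker, and 09 66 is the big-endian segment length, 2406 bytes.  That length includes the 2 length bytes themselves, so 2404 bytes of data follow.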

I can't answer your Aperture question, sorry.

- Phil
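
As an illustration of the segment header layout described above, here is a minimal C++ sketch that lists the marker segments at the start of a JPEG file, so that APP1 (0xE1, EXIF) and APP13 (0xED, Photoshop) can both be found. The names JpegSegment and listJpegSegments are hypothetical. As a simplification, it stops at the first SOS or EOI marker and does not handle standalone markers that carry no length field.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record for one marker segment.
struct JpegSegment {
    std::uint8_t marker;     // second marker byte, e.g. 0xE1 = APP1, 0xED = APP13
    std::size_t dataOffset;  // offset of the first data byte after the length field
    std::size_t dataLength;  // length field minus its own 2 bytes
};

// Walk the segments that follow the SOI marker (ff d8).
std::vector<JpegSegment> listJpegSegments(const std::vector<std::uint8_t>& buf) {
    std::vector<JpegSegment> segments;
    if (buf.size() < 2 || buf[0] != 0xFF || buf[1] != 0xD8) return segments;
    std::size_t pos = 2;
    while (pos + 4 <= buf.size()) {
        if (buf[pos] != 0xFF) break;                  // not at a marker: give up
        std::uint8_t marker = buf[pos + 1];
        if (marker == 0xFF) { ++pos; continue; }      // fill byte before a marker
        if (marker == 0xDA || marker == 0xD9) break;  // SOS or EOI: stop here
        // Big-endian length; it counts the 2 length bytes but not the marker.
        std::size_t len = (static_cast<std::size_t>(buf[pos + 2]) << 8) |
                           static_cast<std::size_t>(buf[pos + 3]);
        if (len < 2 || pos + 2 + len > buf.size()) break;  // corrupt or truncated
        segments.push_back(JpegSegment{marker, pos + 4, len - 2});
        pos += 2 + len;
    }
    return segments;
}
```

For the bytes ff ed 09 66 this yields marker 0xED with a dataLength of 2404. A caller would then check that the data of an 0xED segment begins with "Photoshop 3.0" before parsing the resource blocks that follow it.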

Paul Heckbert

I got my code for Photoshop Thumbnail reading in APP13 segments working, thanks!

pie

What is the simplest way to view the embedded thumbnail in a jpg image? 
Thanks for the help.

Phil Harvey

I don't know about the simplest way, but this will do it (type these 2 commands in a Terminal window):

exiftool -thumbnailimage -b image.jpg > thumb.jpg
open thumb.jpg


(this assumes that "image.jpg" is in the current directory; "open" is the macOS command that opens a file in its default viewer)

- Phil