Managing data beyond JPEG EOI

Started by Bilge, August 31, 2012, 10:37:08 AM


Bilge

Hi,

I can't find any functions to manage data beyond JPEG's EOI marker. I'm aware that this may be used legitimately by some embedded thumbnail implementations and also wannabe hackers who realised stuffing arbitrary data at the end of the file doesn't prevent most renderers from displaying the image.

I'd really like to strip data beyond the EOI to prevent data stuffing as shown in the linked video. Can ExifTool help with this?

Phil Harvey

Yes.

The "Trailer" group represents information after the JPEG EOI.  To quote the exiftool application documentation:

    exiftool -trailer:all= image.jpg

    Delete any trailer found after the end of image (EOI) in a JPEG file.  A
    number of digital cameras store a large PreviewImage after the JPEG EOI, and
    the file size may be reduced significantly by deleting this trailer.  See
    the JPEG Tags documentation for a list of recognized JPEG trailers.
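
To make the term "trailer" concrete, here is a minimal Python sketch of how the data after the EOI could be located. It is not how ExifTool does it, and the names find_eoi_end and split_trailer are hypothetical. The sketch assumes a well-formed JPEG and walks the segments one by one instead of searching for the bytes FF D9, because a thumbnail embedded in an APP segment carries its own EOI.

```python
def find_eoi_end(data: bytes) -> int:
    """Return the offset just past the EOI marker of the main JPEG stream."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos, n = 2, len(data)
    while pos < n:
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        # Skip the 0xFF prefix; repeated 0xFF bytes are legal padding.
        while pos < n and data[pos] == 0xFF:
            pos += 1
        if pos >= n:
            break
        marker = data[pos]
        pos += 1
        if marker == 0xD9:  # EOI: the image ends here
            return pos
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            continue  # standalone markers (TEM, RSTn) carry no length field
        if pos + 2 > n:
            break
        seg_len = int.from_bytes(data[pos:pos + 2], "big")
        if seg_len < 2:
            raise ValueError(f"bad segment length at offset {pos}")
        pos += seg_len  # the length field counts its own two bytes
        if marker == 0xDA:  # SOS: entropy-coded data follows the header
            while True:
                pos = data.find(b"\xff", pos)
                if pos < 0 or pos + 1 >= n:
                    raise ValueError("truncated scan data, no EOI found")
                nxt = data[pos + 1]
                if nxt == 0x00 or 0xD0 <= nxt <= 0xD7:
                    pos += 2  # stuffed byte or restart marker, still scan data
                elif nxt == 0xFF:
                    pos += 1  # fill byte before a marker
                else:
                    break  # a real marker starts at pos
    raise ValueError("no EOI marker found")


def split_trailer(data: bytes) -> tuple[bytes, bytes]:
    """Split a JPEG file into (image up to and including EOI, trailer)."""
    end = find_eoi_end(data)
    return data[:end], data[end:]
```

Calling split_trailer on the contents of a file returns the image bytes and whatever follows them; an empty second element means there is no trailer to delete.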


- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Bilge

Thanks, Phil. Seems like you've thought of everything!

Bilge

Even if I remove the trailer I still have (presumably invalid) entries for thumbnails in tags such as PhotoshopThumbnail, OtherImage and ThumbnailImage. Moreover, some tags such as ThumbnailOffset and ThumbnailLength still exist. Besides removing each one individually, is there some way to mass remove thumbnail tags?

Phil Harvey

The PhotoshopThumbnail isn't stored in the trailer.  The other pointers are likely stored in MakerNote information.  In MakerNotes, tags may not be added or deleted.  Instead, you can write a small dummy image to the file (i.e. "-previewimage=dummy -m").

- Phil

Bilge

Don't you think it would be a good idea to have a smart group that groups all the thumbnail-related tags so that it's easy to manipulate thumbnails?

Phil Harvey

I don't know why you want this, or what exactly you mean.  ThumbnailImage, PreviewImage, JpgFromRaw and OtherImage cover most of the embedded images, but only sometimes are some of these stored in the trailer.

- Phil

Bilge

I would want to do this to strip thumbnails from the file, regardless of where they are stored. If you're not interested in creating a group name then I'll have to manage the tag list myself and keep my eyes open for any other varieties that crop up in the wild, which altogether seems less convenient than having exiftool recognize thumbnail tags.

Similarly it may also be useful to be able to identify and manage tags that embed binary data since I imagine these could be abused to stuff arbitrary data hidden inside the image unless they have inherent size limitations.

Essentially I'm interested in sanitizing JPEGs without having to completely wipe their metadata.

Phil Harvey

You should keep in mind that ExifTool will pass along any unknown data when writing (unless you delete everything).

So to properly sanitize an image, you should delete all metadata then copy back only what you want to keep.
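
A minimal sketch of that delete-then-copy-back approach, assuming the command is assembled from a script. The function name build_sanitize_command and the allow-list in the usage note are hypothetical; the exiftool arguments used are -all= (delete all metadata) and -tagsfromfile @ (copy the listed tags back from the same file).

```python
def build_sanitize_command(path: str, keep_tags: list[str]) -> list[str]:
    """Build an exiftool command that wipes metadata and restores an allow-list.

    keep_tags is a caller-supplied list such as ["EXIF:Orientation"].
    """
    # Order matters: -all= deletes everything first, then the tags listed
    # after -tagsfromfile @ are copied back from the original file.
    args = ["exiftool", "-all=", "-tagsfromfile", "@"]
    args += [f"-{tag}" for tag in keep_tags]
    args.append(path)
    return args
```

For example, build_sanitize_command("image.jpg", ["EXIF:DateTimeOriginal", "EXIF:Orientation"]) yields an argument list that can be passed to subprocess.run. By default exiftool keeps the unmodified file as image.jpg_original.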

- Phil

Bilge

I feel that given the sheer volume of information that can be stored I'd rather write an exclusion policy than an inclusion one. I'm not a big fan of a draconian scraping policy that only includes what I want right now. What if I realise something is important later? Too late then, I've already removed it. Managing a small set of tag groups I don't want to see seems better than trying to manage a gigantic list of permissible ones.

Are you emphatically against a thumbnail tag group?

Phil Harvey

Sounds perfectly reasonable.

I don't think a thumbnail group would be all that useful for other users in general, but you can easily create one for your own use by creating a shortcut in the config file.
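
The config-file shortcut is ExifTool's own mechanism for this. As a script-side alternative, the same exclusion list could be kept in a small wrapper, sketched below under stated assumptions: the function name build_strip_thumbnails_command is hypothetical, and the tag list is just the four embedded-image tags named earlier in this thread. Each -TAG= argument asks exiftool to delete that tag; copies stored inside MakerNotes cannot be deleted this way.

```python
# Embedded-image tags mentioned in this thread; extend as new ones turn up.
THUMBNAIL_TAGS = ("ThumbnailImage", "PreviewImage", "JpgFromRaw", "OtherImage")


def build_strip_thumbnails_command(path: str, tags=THUMBNAIL_TAGS) -> list[str]:
    """Build an exiftool command that deletes each listed tag from one file."""
    # "-TAG=" with an empty value deletes the tag where it is deletable.
    delete_args = [f"-{tag}=" for tag in tags]
    return ["exiftool", *delete_args, path]
```

The list lives in one place, so a newly discovered thumbnail tag only needs to be added once.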

- Phil

Bilge

Can I target EXIF:0x0201 in some way or does ExifTool not support specifying Tag IDs?

Phil Harvey

There is no way to currently reference a known tag by its ID number.

- Phil

Bilge

Quote from: Phil Harvey on September 05, 2012, 12:04:18 PM
There is no way to currently reference a known tag by its ID number.

Would you add this to the tag syntax?

Phil Harvey

No, sorry.  This wouldn't be an easy addition.

- Phil