Metadata Validation in Exiftool

Started by Archive, May 12, 2010, 08:54:31 AM


Archive

[Originally posted by metadatacrucher on 2009-02-16 20:31:41-08]

Today I stumbled upon JPEGSnoop (http://www.impulseadventure.com/photo/jpeg-snoop.html), "a free Windows application that examines and decodes the inner details of JPEG and MotionJPEG AVI files. It can also be used to analyze the source of an image to test its authenticity." (quote)

While it is not free in the FOSS sense, its photo integrity checks deserve special mention:

 - It can verify whether a photo was modified in an image editor or comes directly from the camera

 - It can suggest a list of possible camera models even for photos whose metadata has been wiped (note: if the image was modified in an editor, it will report the editor's name and a likely JPEG quality setting)

JPEGSnoop determines this information by comparing JPEG quantisation table checksums (http://www.impulseadventure.com/photo/jpeg-snoop-uses.html), which is much more resistant to manipulation than relying on EXIF/IPTC metadata alone.
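A minimal sketch of the idea, assuming a baseline JPEG already read into memory: walk the marker segments up to the start of scan, collect the tables from the DQT (0xFFDB) segments, and hash them with MD5. The function names are illustrative, and the resulting digest is not claimed to match what JPEGSnoop or Exiftool report, since their exact hashing inputs are not described here.

```python
import hashlib
import struct


def extract_dqt_tables(data: bytes) -> dict[int, tuple[int, ...]]:
    """Collect quantization tables (zigzag order) from a JPEG's DQT segments."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    tables: dict[int, tuple[int, ...]] = {}
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        if marker in (0xDA, 0xD9):  # SOS or EOI: tables are defined before this
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])  # length includes itself
        segment = data[pos + 4:pos + 2 + length]
        if marker == 0xDB:  # one DQT segment may hold several tables back to back
            i = 0
            while i < len(segment):
                precision, table_id = segment[i] >> 4, segment[i] & 0x0F
                i += 1
                if precision == 0:  # 8-bit entries
                    values = tuple(segment[i:i + 64])
                    i += 64
                else:  # 16-bit big-endian entries
                    values = struct.unpack(">64H", segment[i:i + 128])
                    i += 128
                tables[table_id] = values
        pos += 2 + length
    return tables


def dqt_digest(tables: dict[int, tuple[int, ...]]) -> str:
    """MD5 over each table id and its 64 values, in table-id order."""
    h = hashlib.md5()
    for table_id in sorted(tables):
        h.update(bytes([table_id]))
        h.update(struct.pack(">64H", *tables[table_id]))
    return h.hexdigest()
```

Usage would be along the lines of `dqt_digest(extract_dqt_tables(open("photo.jpg", "rb").read()))`. Hashing the decoded values rather than the raw segment bytes keeps the digest the same whether an encoder writes its tables in one DQT segment or several.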

It would be nice to see similar metadata validation capabilities in Exiftool (e.g. to validate that an image is unmodified, or that the metadata hasn't been tampered with), but I'm afraid that's a bit out of scope and might also exceed Perl's data processing capabilities. Maybe this could be achieved by embedding external tools?

regards,

Franz

Archive

[Originally posted by metadatacrucher on 2009-02-16 20:32:44-08]

P.S.: JPEGSnoop runs fine under Wine

Archive

[Originally posted by exiftool on 2009-02-17 10:42:39-08]

Hi Franz,

I actually have a local version of exiftool which does this.  I
haven't added it to the production version due to the work
involved in getting camera/software samples of all possible
JPEG qualities.  Building up a database like this would be a
LOT of work, and require more resources than I have at my
disposal.

- Phil

Archive

[Originally posted by metadatacrucher on 2009-02-17 12:58:29-08]

Hi Phil,

this sounds promising. Concerning the sample database: maybe you could join forces with the author of JPEGSnoop? His software currently ships with a database of approx. 3000 signatures. However, JPEGSnoop has quite long release intervals, so he might be interested in decoupling his software from the signature database. That way, the database could be opened up to other applications.

So exiftool could make use of this database and in return also generate compression samples for it (maybe in the form of a local signature file that the user can upload). I see a win-win situation for both applications: exiftool gets an interesting feature, and the JPEGSnoop signature database could grow in both quantity and quality thanks to the large exiftool user base.

- Franz

Archive

[Originally posted by exiftool on 2009-02-17 14:06:28-08]

3000 is a pretty tiny database.  I already have a database close to
that size, but I don't consider it to be very comprehensive.
Some applications can generate nearly a thousand different
signatures by themselves (if they have a very fine grained JPEG
quality slider), and my database will only recognize a
handful of different applications.

- Phil

Archive

[Originally posted by metadatacrucher on 2009-02-17 15:49:30-08]

I agree. My proposal: just implement a way to calculate and store the compression checksums of images in a local hash file with a documented structure. I'm sure sooner or later someone will start a web service that allows uploading/sharing of these files.

Maybe you can offer your database as an additional download to play with.
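One possible shape for such a documented hash file, sketched here as an assumption rather than a format Exiftool or JPEGSnoop actually uses: a UTF-8 text file with one `digest<TAB>source` pair per line and `#` comment lines. Because different sources can produce the same digest, the loader maps each digest to a list of candidate sources instead of a single name; `load_signatures` and `lookup` are hypothetical names.

```python
from collections import defaultdict
from typing import Iterable


def load_signatures(lines: Iterable[str]) -> dict[str, list[str]]:
    """Parse 'digest<TAB>source' lines into digest -> list of candidate sources."""
    db: dict[str, list[str]] = defaultdict(list)
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        try:
            digest, source = line.split("\t", 1)
        except ValueError:
            raise ValueError(f"line {lineno}: expected 'digest<TAB>source'") from None
        digest, source = digest.strip().lower(), source.strip()
        if source not in db[digest]:  # ignore exact duplicates from merged files
            db[digest].append(source)
    return dict(db)


def lookup(db: dict[str, list[str]], digest: str) -> list[str]:
    """Return every known source for a digest (empty list if unknown)."""
    return db.get(digest.lower(), [])
```

Keeping the file as plain tab-separated text means signature files from different users can be merged by simple concatenation, with the loader dropping repeated entries.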

About the applications: there is surely the problem of endless JPEG quality settings, but luckily nearly all dominant image processing applications offer scripting/command-line interfaces, which should speed up collecting checksums considerably. IrfanView, for example, allows command-line conversion to JPEG and has an explicit quality parameter.

- Franz

Archive

[Originally posted by exiftool on 2009-02-17 17:45:12-08]

There you go Franz,

I've just released version 7.69 with this feature.

The MD5 sum is calculated only if you specifically request the
JPEGDigest tag to be extracted (e.g. with -JPEGDigest).

The database is contained in the file lib/Image/ExifTool/JPEGDigest.pm.
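To apply this to a whole folder of images, Exiftool's `-j` (JSON) output can be parsed from a script. The sketch below assumes `exiftool` is on the PATH and that each returned JSON object carries a `JPEGDigest` key when the tag could be computed; it makes no assumption about the value's format and simply groups files by whatever string Exiftool reports. The helper names are hypothetical.

```python
import json
import subprocess
from collections import defaultdict


def run_exiftool_digest(paths: list[str]) -> str:
    """Run exiftool once for all files and return its JSON output as text."""
    result = subprocess.run(
        ["exiftool", "-j", "-JPEGDigest", *paths],
        capture_output=True, text=True,
        check=False,  # exiftool may exit non-zero if some files fail; keep the rest
    )
    return result.stdout


def group_by_digest(json_text: str) -> dict[str, list[str]]:
    """Map each reported JPEGDigest value to the files that produced it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for record in json.loads(json_text or "[]"):
        digest = record.get("JPEGDigest", "(none)")  # tag absent, e.g. non-JPEG input
        groups[str(digest)].append(record["SourceFile"])
    return dict(groups)
```

For example, `group_by_digest(run_exiftool_digest(["a.jpg", "b.jpg"]))` returns a dictionary whose keys are digest values and whose values are the files sharing them.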

- Phil

Archive

[Originally posted by metadatacrucher on 2009-02-17 22:01:12-08]

Hi Phil,

thanks a lot for this amazing feature! I already gave it a try, and it offers fantastic insight into the origin of a JPEG. Unfortunately, some digest collisions occur between pictures from cameras and applications; e.g. definitely unmodified photos from my Olympus E-410 are reported as "Apple Quicktime Quality 845". Furthermore, the camera seems to vary the compression, because pictures taken with identical settings yield different compression signatures (some of them unknown).

So while my hope of uniquely identifying camera models by compression digest was somewhat disappointed, I made an interesting discovery: photo-sharing services such as Flickr, Picasa and Facebook all use the Independent JPEG Group (IJG) library to generate their preview images, but with different quality settings: Flickr seems to use 90 for thumbnails and 96 for preview images, while Picasa and Facebook use 85. These settings are global and should apply to all preview images. So one could combine image dimensions with the quality setting to distinguish pictures downloaded from Flickr, Picasa or Facebook, or at least narrow the possible source down to a few online services.
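Quality numbers like these can be recovered because an IJG-based encoder derives its tables by scaling the standard luminance table from Annex K of the JPEG specification. The sketch below reproduces that scaling (as in libjpeg's `jpeg_quality_scaling`, with baseline clamping to 1..255) and searches for the quality whose table matches an observed one. It assumes an unmodified IJG-derived encoder using its default tables, handles only the luminance table, and the function names are hypothetical.

```python
from typing import Optional, Sequence

# Annex K.1 standard luminance table, natural (row-major) order.
STD_LUMA = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]


def zigzag_order() -> list[int]:
    """Natural-order index for each zigzag position (the order DQT stores)."""
    order = []
    for s in range(15):  # anti-diagonals where row + col == s
        cells = [r * 8 + (s - r) for r in range(8) if 0 <= s - r < 8]
        order.extend(reversed(cells) if s % 2 == 0 else cells)
    return order


def ijg_luma_table(quality: int) -> list[int]:
    """Luminance table an IJG encoder emits for quality 1..100 (natural order)."""
    quality = min(max(quality, 1), 100)
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    return [min(max((q * scale + 50) // 100, 1), 255) for q in STD_LUMA]


def estimate_ijg_quality(zigzag_table: Sequence[int]) -> Optional[int]:
    """Return the IJG quality whose luminance table matches, or None."""
    natural = [0] * 64
    for pos, idx in enumerate(zigzag_order()):
        natural[idx] = zigzag_table[pos]
    for quality in range(100, 0, -1):
        if ijg_luma_table(quality) == natural:
            return quality
    return None
```

A table read straight from a DQT segment (zigzag order) can be passed to `estimate_ijg_quality`; a `None` result suggests an encoder with its own tables rather than IJG defaults.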

- Franz

Archive

[Originally posted by exiftool on 2009-02-18 00:46:19-08]

Yes, I think I also noticed a case where a camera used the
same DQT tables as some software, but I didn't research this
in great detail.  But I would be surprised if a camera used different
tables for the same settings.  I do think that I noticed that
different tables may be used if the image is simply rotated,
so you must also consider the image orientation (and possibly
other seemingly unrelated changes).  If you
do any substantial work on this, please consider sending the
results to me so I can add them to the database.

- Phil

Archive

[Originally posted by metadatacrucher on 2009-02-18 09:16:33-08]

Hi Phil,

yes, it seems that variable JPEG compression is becoming more and more popular, especially in consumer DSCs. Calvin Hass, the author of JPEGSnoop, has a good write-up about this:

Variable JPEG Compression: http://www.impulseadventure.com/photo/variable-compression.html

His software also offers a function to scan an executable for DQT tables, so theoretically one could scan camera firmware files for all possible DQT tables. As always, there are exceptions: e.g. the Sony H series uses integer prescale values to adapt the DQT tables to shooting conditions (according to Calvin's write-up).
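The firmware idea can be sketched as a brute-force search, under the assumption (not confirmed here for any camera, and not a description of how JPEGSnoop does it) that a table is stored verbatim in the binary as 64 consecutive 8-bit or 16-bit values. `find_table` and the three encodings it tries are illustrative choices; tables that a camera scales at runtime, like the Sony prescale case above, would not be found this way.

```python
import struct


def find_table(blob: bytes, table: list[int]) -> list[tuple[int, str]]:
    """Return sorted (offset, encoding) pairs for every verbatim copy of a 64-entry table."""
    if len(table) != 64:
        raise ValueError("a quantization table has exactly 64 entries")
    patterns = {
        "uint16-be": struct.pack(">64H", *table),
        "uint16-le": struct.pack("<64H", *table),
    }
    if max(table) <= 255:  # only tables with 8-bit values can be stored as single bytes
        patterns["uint8"] = bytes(table)
    hits = []
    for encoding, needle in patterns.items():
        start = blob.find(needle)
        while start != -1:  # collect every occurrence, not just the first
            hits.append((start, encoding))
            start = blob.find(needle, start + 1)
    return sorted(hits)
```

In practice one would loop this over candidate tables (e.g. known tables in both natural and zigzag order) and over the bytes of a firmware image.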

Very interesting topic, but one of the "miracle inside an enigma" type ;-)

- Franz

Archive

[Originally posted by dzurn on 2009-02-18 11:35:56-08]

I'll take a look at that new version. These capabilities sound really mind-blowing, especially in trying to identify possible photo-manipulations.

JPEGSnoop is Windows/Wine only, but ExifTool has a larger potential audience as a command-line tool.

My fantasy involves a browser plugin for Firefox that would do a little background processing of larger images and put a red border around any JPEG that looks suspicious. Or, if that is too much of a drain, it could check just the images the user selects.

Since much browsing time is spent reading and looking, the idle time after a page loads would be a good moment to check the photos you are viewing in the background.

This would help people recognize more easily when they are getting smoke blown in their faces, and expose some of the worst excesses.

Thanks

Darryl

Archive

[Originally posted by exiftool on 2009-02-18 16:15:38-08]

I've been doing a bit more testing here.  There are many
cases where various Nikon cameras use the same DQT
tables as some Canon, Pentax and Panasonic models.

Pity.

- Phil

Archive

[Originally posted by metadatacrucher on 2009-02-18 19:25:56-08]

I think we have to bury the theory that we can identify the camera model/manufacturer just by analyzing the DQT.

Nevertheless, some useful applications remain: we can verify to a certain degree whether an original image was modified in an editor (by comparing the claimed EXIF make/model to a family of "allowed" JPEG digests).
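The check described above can be sketched as a set lookup. The mapping from a claimed make/model to its "allowed" digests is hypothetical here (it would have to be built from verified camera originals), and given the collisions reported in this thread, a match only means the file is consistent with the claim, not that it is proven unmodified.

```python
from enum import Enum


class Verdict(Enum):
    CONSISTENT = "digest is known for the claimed camera"
    INCONSISTENT = "digest is not among the camera's known digests"
    UNKNOWN_MODEL = "no reference digests for the claimed camera"


def check_claim(make: str, model: str, digest: str,
                allowed: dict[tuple[str, str], set[str]]) -> Verdict:
    """Compare a file's DQT digest with the digests recorded for its EXIF make/model.

    Keys and digests in `allowed` are expected in lower case.
    """
    key = (make.strip().lower(), model.strip().lower())
    known = allowed.get(key)
    if not known:
        return Verdict.UNKNOWN_MODEL  # cannot judge without reference data
    return Verdict.CONSISTENT if digest.lower() in known else Verdict.INCONSISTENT
```

Keeping "unknown model" separate from "inconsistent" matters, because a missing reference entry says nothing about whether the image was edited.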

And we can even check whether lossless transformations were performed on an image (by generating additional checksums for rotation commands). For the crop command, however, we would end up with endless checksums or need a smart formula ;-)
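For lossless rotations, part of that extra bookkeeping can be computed rather than collected: when jpegtran rotates by 90 or 270 degrees (or transposes), it also transposes the quantization tables, while 180-degree rotation and flips leave them unchanged. Under that assumption, the sketch below derives both digests from one original table in natural (row-major) order; the helper names are hypothetical, and a full check would also have to consider other changes, such as the swapped chroma sampling factors.

```python
import hashlib


def transpose_table(natural: list[int]) -> list[int]:
    """Transpose an 8x8 quantization table given in natural (row-major) order."""
    if len(natural) != 64:
        raise ValueError("expected 64 entries")
    return [natural[col * 8 + row] for row in range(8) for col in range(8)]


def table_digest(natural: list[int]) -> str:
    """MD5 of the table values packed as 16-bit big-endian numbers."""
    return hashlib.md5(b"".join(v.to_bytes(2, "big") for v in natural)).hexdigest()


def rotation_digests(natural: list[int]) -> dict[str, str]:
    """Digests expected after lossless transforms, keyed by transform family."""
    return {
        "none/rot180/flip": table_digest(natural),  # tables unchanged
        "rot90/rot270/transpose": table_digest(transpose_table(natural)),  # transposed
    }
```

Note that a symmetric table yields the same digest in both cases, so for such tables a 90-degree rotation cannot be detected from the quantization tables alone.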

-Franz