Detecting file modifications

Started by 11august, November 09, 2013, 02:56:58 PM

11august

Hi Phil,

With my French partners, we are working on a project that (indirectly) involves ExifTool.

It's about trying to detect modifications/alterations in images by comparing metadata stored in the files, with a control set of photos taken with the same camera.

Most of the time, the tags that mark such modifications are easy to identify (XMP metadata, Software tag, Modified Date, etc.).
However, for some tags I cannot say with certainty whether they were added to the file by post-processing software, by some other process, or by a mix of both.

This is especially the case for:
- JFIF Version
- XMP Toolkit
- About
- Date acquired (why is this tag different from Date/Time Original?)
- Creator Tool (why does it sometimes appear alongside the Software tag with the same value?)
- APP14 Flags 0 and APP14 Flags 1 segments.
- XP Keywords and Last Keywords XMP.
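
The control-set comparison described above can be sketched roughly as follows. This is only a minimal illustration in Python, assuming the metadata of each file has already been extracted into a plain dict of tag name to value (for example from ExifTool's JSON output, `exiftool -j`). The function name `extra_tags` and all sample values are hypothetical.

```python
def extra_tags(candidate, control_set):
    """Return tags present in `candidate` but absent from every control file.

    `candidate` is a dict of tag name -> value; `control_set` is a list of
    such dicts for known-original photos taken with the same camera.
    """
    known = set()
    for tags in control_set:
        known.update(tags)  # collect every tag name seen in the originals
    return sorted(set(candidate) - known)


# Hypothetical sample data, for illustration only.
controls = [
    {"Make": "Canon", "Model": "Canon EOS 40D", "DateTimeOriginal": "2013:11:01 10:00:00"},
    {"Make": "Canon", "Model": "Canon EOS 40D", "DateTimeOriginal": "2013:11:02 11:30:00"},
]
suspect = {
    "Make": "Canon",
    "Model": "Canon EOS 40D",
    "DateTimeOriginal": "2013:11:03 09:15:00",
    "Software": "Some Editor 1.0",
    "XMPToolkit": "Some XMP Core",
}
print(extra_tags(suspect, controls))
```

A tag reported here is only a candidate marker: it shows that the file differs from the control set, not which process added the tag.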

Moreover, for our project, we use exiv2 to extract IPTC and XMP data, but we don't know for sure why all the tags in the list above aren't listed by it.

Is this the result of an ExifTool "special" operation when reading the file? Is there any other better way to have ALL the XMP/IPTC metadata easily recognizable as markers of post-process modifications?

Any help will be greatly appreciated!  :)

Thanks
Co-author and developer of the GEIPAN groupe image analysis software IPACO, part of the French Space Agency CNES

Phil Harvey

I'm not sure what Exiv2 extracts, but in general I would expect ExifTool to extract some things that other utilities do not.  (This is especially the case for EXIF maker note information.  Although Exiv2's maker note decoding is based on ExifTool's, they lag behind somewhat in the new discoveries.)

Things like XMPToolkit and About are often just ignored by metadata readers.  These could be considered part of the XMP header, but ExifTool extracts them as separate tags because I thought they were potentially useful information.  I suspect that most XMP readers out there are based on the Adobe XMP toolkit (not sure about Exiv2 though), but ExifTool processes the XMP itself, so it isn't constrained as a more conventional reader could be.

I rambled on a bit, but I don't know if I helped to answer your question.  This is because I'm not really clear about what the question is, or exactly what you are looking for.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

11august

Thank you for your reply Phil, and sorry if I wasn't clear enough in my questions.

Yes, for us (and that was a really good idea to extract these in ExifTool!) XMPToolkit and About are really useful in the comparison of two files.

And what about JFIF? Do you think that the presence of this tag in the EXIF should be considered as a sign of a significant modification of the file?

I'm also not sure about the meaning and origin of the tag "XP Keywords". Shouldn't it be written "XMP Keywords" instead?

I guess that, instead of using exiv2 for our project, we should use ExifTool! :D I'll ask my development partner and come back to you ASAP.

Phil Harvey

JFIF is not EXIF.  It is an older standard that was used before EXIF was developed.  The EXIF standard makes JFIF obsolete, but it is still commonly used.  It could certainly be an indication that the file was touched by some other software.

If you see XPKeywords (or any other XP tag) or OffsetSchema then the file has likely been touched by Microsoft software.  These are Microsoft EXIF tags.
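
As a rough illustration of this check, the sketch below looks for the Microsoft EXIF tags in a dict of extracted tag names. It assumes ExifTool-style tag names; the explicit list of XP tags is used instead of a simple "XP" prefix test so that unrelated tag names that merely begin with those letters are not matched. The function name `microsoft_tags` and the sample input are hypothetical.

```python
# Microsoft EXIF tags, spelled as ExifTool tag names (assumed list).
MICROSOFT_TAGS = {
    "XPTitle", "XPComment", "XPAuthor", "XPKeywords", "XPSubject",
    "OffsetSchema",
}


def microsoft_tags(tags):
    """Return the Microsoft EXIF tags found among the given tag names."""
    return sorted(MICROSOFT_TAGS & set(tags))


# Hypothetical sample input, for illustration only.
sample = {"Make": "Canon", "XPKeywords": "sky;lights", "OffsetSchema": 4210}
print(microsoft_tags(sample))
```

An empty result does not prove the file is untouched; it only means none of these particular tags is present.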

This is actually mentioned in the EXIF Tags documentation.  And browsing through all of the JPEG Tags documentation may be helpful.

- Phil

11august

Thank you Phil for your reply.

Things will be clearer if I tell you what we are after.

My partners/friends and I are involved in investigations around "Unidentified Aerospace Phenomena", in the area of photo/video analysis. For 5 years, we have been working on the development and the validation of a powerful dedicated software tool, in close cooperation with the French Space Agency.
This software, called IPACO (http://www.ipaco.fr/page28.html), is derived from a well-established operational tool dedicated to image intelligence from satellite imagery for military purposes, previously developed and distributed by the company one of us has been heading. So far, IPACO has been focussing on interactive tools designed to extract all possible useful information from the documents, but now we are trying to add a "forensic" part to it, so as to alert the analyst when a document may not be authentic.

Our software tool already includes a number of open-source libraries, in particular libjpeg, libexif and - more recently - Exiv2.
Of course, we would have loved to integrate an ExifTool library if we had found one as a C++ linkable library (if this does exist, please tell us).
On the other hand, we do not need to be as exhaustive as ExifTool is, since our goal is only to identify any tag that may indicate that the picture has been modified after the initial shot. More precisely, we track modifications of the pixels themselves, not necessarily of surrounding information such as filename or copyright.

Our difficulty today, when we compare our results with what can be found using ExifTool, is that we have no access yet to JFIF tags and more generally to any tags that are not explicitly EXIF, IPTC or XMP (manageable through libexif and Exiv2). Is there a library available to reach such tags, or do we have to grab individual bytes in the tables of the JPEG files?

Phil Harvey

For JFIF, I am surprised that Exiv2 doesn't read it... You should submit a feature request.  Or maybe this is handled by the JPEG library?  JPEG/JFIF was a standard for a long time before EXIF came around, so there must be lots of C++ libraries out there that will read it.
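
For reference, the JFIF header lives in an APP0 segment near the start of the file, so detecting it by hand takes little code. The sketch below (in Python rather than C++, for brevity) walks the marker segments that precede the image data and lists each APPn segment with its identifier string. It is a simplified illustration under the usual JPEG segment layout, not a full parser; the names `list_app_segments` and `has_jfif` and the synthetic sample bytes are hypothetical.

```python
import struct


def list_app_segments(data):
    """Return (segment name, identifier) pairs for APPn segments in the header."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing SOI marker)")
    segments = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break  # malformed stream: stop rather than guess
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:  # markers with no length field
            pos += 2
            continue
        if marker in (0xD9, 0xDA):  # EOI or start of scan: header section is over
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            break  # invalid length: stop rather than loop forever
        payload = data[pos + 4:pos + 2 + length]
        if 0xE0 <= marker <= 0xEF:
            # The identifier is the text up to the first null byte.
            ident = payload.split(b"\x00", 1)[0].decode("ascii", "replace")
            segments.append((f"APP{marker - 0xE0}", ident))
        pos += 2 + length
    return segments


def has_jfif(data):
    """True if the header contains an APP0 segment identified as JFIF."""
    return ("APP0", "JFIF") in list_app_segments(data)


# Synthetic header for illustration: SOI, a JFIF APP0, an Exif APP1, EOI.
sample = (
    b"\xff\xd8"
    + b"\xff\xe0" + struct.pack(">H", 16) + b"JFIF\x00" + bytes(9)
    + b"\xff\xe1" + struct.pack(">H", 8) + b"Exif\x00\x00"
    + b"\xff\xd9"
)
print(list_app_segments(sample))
```

On a real file, the same function would be called on the bytes read from disk, for example `list_app_segments(open(path, "rb").read())`. An APP14 segment identified as "Adobe" would show up in the same list.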

I have had a few requests from people who are doing similar things.  ExifTool really isn't designed to help with original image verification, but I don't know if there is anything better out there.  However, I know of no C++ library interfaces for ExifTool.  Maybe I'll write one sometime for the fun of it.

- Phil

11august

Hi Phil,

After checking, yes, it is handled by the JPEG library and works perfectly! Thanks for the tip.

You said last week about the presence of the JFIF tag that "it could certainly be an indication that the file was touched by some other software". But do you have any idea what process could be applied to the original photo for this tag to appear in the metadata?

Now, I have another question about the presence of the tag "Warning" when reading a file with ExifTool.
We have a few files in our test panel that have this tag without anything else suspicious (in terms of post-processing).
This is the case for example with the "Bad InteropIFD Directory". What does that mean? In forensic terms, do we have to consider the file corrupt because of some manipulation?

BTW, I read the forum archives about the general meaning of the Warning tag, and you said that it is informative only and that it indicates that the makernotes were damaged when the image was edited with another utility. So, if I understand correctly, we can consider (still from a forensic point of view) that every time it appears in ExifTool, the image was processed. Is that right?

Thanks! :)

Phil Harvey

Quote from: 11august on November 19, 2013, 04:55:33 AM
You said last week about the presence of the JFIF tag that "it could certainly be an indication that the file was touched by some other software". But do you have any idea what process could be applied to the original photo for this tag to appear in the metadata?

Specifically, no.  But in general all it would take is to open the file and re-save with some software.  Windows is notorious for modifying files like this, sometimes even when the user doesn't realize it (although I don't think it adds JFIF, but it does do more nasty things than this).

Quote:
Now, I have another question about the presence of the tag "Warning" when reading a file with ExifTool.
We have a few files in our test panel that have this tag without anything else suspicious (in terms of post-processing).
This is the case for example with the "Bad InteropIFD Directory". What does that mean? In forensic terms, do we have to consider the file corrupt because of some manipulation?

All it means is that the file was written by buggy software.  Unfortunately, camera firmware is often buggy (read here for some examples), so a Warning doesn't necessarily indicate that the file has been modified.

Quote:
BTW, I read the forum archives about the general meaning of the Warning tag, and you said that it is informative only and that it indicates that the makernotes were damaged when the image was edited with another utility.

Yes, makernote errors are often generated by modifying the file with unaware software.

Quote:
So, if I understand correctly, we can consider (still from a forensic point of view) that every time it appears in ExifTool, the image was processed. Is that right?

No.

- Phil

11august

Thank you again Phil for all your valuable replies!

One last thing (but maybe you don't have any idea): there are, in our test panel, two images that are really surprising, as their EXIF, read by ExifTool, shows no difference compared with original images taken with the same camera.

They are, without any doubt, post-processed, as there is a visible watermark on each of them.

Any idea how this could be possible? I know that completely replacing the metadata is easy and could be an explanation, but I was just wondering if you're aware of any other way.

One of the images is attached.

Phil Harvey

Yes, copying all the metadata as a block would be one way to do this.  But for this particular image, if you are willing to do a lot of work, you could use the Canon OriginalDecisionData to tell that the image was tampered with.  I believe the structure of this is fully understood, although ExifTool doesn't do anything with this data.

- Phil

11august

Thank you, and forgive my ignorance, but what is "Canon OriginalDecisionData"? BTW, I noticed this tag in ExifTool for one of our test photos: "Original Decision Data".
Where can I find more information on this?

Thanks again! :)

Phil Harvey

Do some googling and you should be able to find out more about this.  I'm pretty sure the reverse-engineered spec is out there somewhere too.  It was Canon's way of validating that an image hadn't been edited.

- Phil

Phil Harvey

Quote from: Phil Harvey on November 16, 2013, 04:27:34 PM
I know of no C++ library interfaces for ExifTool.  Maybe I'll write one sometime for the fun of it.

It was fun, but a lot of work too.  I'm very close to having this finished now, and the results are pretty satisfying.  I'll make it public after I do some more testing and write the documentation.  I still need to test it on Windows, and I am not looking forward to this.  Works great on the Mac though!  Very fast too.  It will be called "C++ ExifTool", and will appear in the programming section of the ExifTool home page when it is done.

- Phil