Writing metadata: To raw or not to raw?
This article is meant for those who are thinking about saving metadata directly into raw files but are not sure about the "risks" involved.
One can read all over the web that it's "better not to" do that. And when we ask why not, we get answers like: it's not safe; in the future you might not be able to open raw files anymore; better not to mess with proprietary files; one simply doesn't modify "original" files; etc. If such answers are enough to convince you, then don't read further!
In this article, I'll try to explain what it's all about, and then you decide. And if you still decide not to write metadata into raw files, you will at least know why you made that decision.
Here, we will mostly talk about Canon CR2 raw files. The reason is simple: as a Canon camera owner, I have access to Canon software, so I could make all the necessary checks over the years. However, as you will see, everything written here also applies to most other raw formats (NEF, DNG, ...). The only difference is that there are things I can't check without having (for example) Nikon raw software.
Almighty TIFF
First, we need to distinguish between the TIFF file format and TIFF image data. The TIFF file format is a standard which describes how the various data inside a file must be organized so that it can be read later. That is why we so often read that TIFF is actually a "container" which can hold various data about an image, including the image itself.
Inside a TIFF file, data is divided into two major groups: a metadata group and an image data group. This is not entirely correct (as we will see later), but it makes things easier to understand:
TIFF Header
Obviously, it's not enough to just give a file the *.TIF suffix: the content must match the TIFF specification. To make this checkable, every TIFF file begins with a so-called header. The header occupies only a few bytes, and its main purpose is to make initial checks possible (thus answering: does the file really contain TIFF data?):
Instead of Intel byte order, a TIFF file can use Motorola byte order to store data. In this case, the same header would look like this:
In the text that follows, an Intel byte order TIFF file is assumed (unless noted otherwise).
Let's summarize: if the first four bytes contain the value 49 49 2A 00 or 4D 4D 00 2A, then we can assume we are dealing with a TIFF file.
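The header check described above can be sketched in a few lines of Python. This is only a minimal sketch, not how any particular tool does it: the function name parse_tiff_header is my own, and the two sample headers are assembled from the byte values mentioned in this section, assuming the first IFD sits at offset 8.

```python
import struct

def parse_tiff_header(data: bytes):
    """Return (endian, ifd0_offset) for a TIFF header, or raise ValueError.

    `endian` is "<" for Intel (little-endian) and ">" for Motorola
    (big-endian), ready to be used as a prefix for struct format strings.
    """
    if len(data) < 8:
        raise ValueError("too short to hold a TIFF header")
    order = data[:2]
    if order == b"II":        # 49 49
        endian = "<"
    elif order == b"MM":      # 4D 4D
        endian = ">"
    else:
        raise ValueError("unknown byte-order mark")
    # Bytes 2-3: the number 42 ($002A); bytes 4-7: offset of the first IFD.
    magic, ifd0_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("magic number is not 42")
    return endian, ifd0_offset

# Sample headers (assumed values, first IFD at offset 8).
intel = bytes.fromhex("49492A0008000000")
motorola = bytes.fromhex("4D4D002A00000008")

print(parse_tiff_header(intel))     # ('<', 8)
print(parse_tiff_header(motorola))  # ('>', 8)
```

Note that both headers describe exactly the same thing; only the order of the bytes inside each value differs.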
IFD: Image File Directory
Above we said that the third value points to the IFD. In our case this value is 08000000, saying: the IFD begins at offset $00000008 from the beginning of the file. Thus, if we skip the first eight bytes of this file, we land at the beginning of the IFD segment. We must be aware, though, that an IFD doesn't always start at "address" $00000008. If that were the case, we wouldn't need that value at all.
Because more than one IFD can exist inside a TIFF file, the first IFD is called IFD0, the next one IFD1, etc.
The IFD is the area where metadata is stored. The content looks like:
Let's "analyze" the first entry in the IFD:
-hey, we just "decoded" one metadata tag! If interested, you can find the rest of the Tag IDs (and the decoding mechanism) in the TIFF specification.
What's left is the field (four bytes long) after the last IFD entry (here, at offset $00FA). This field contains a pointer (offset) to the next IFD segment. In our case, this field contains the value 00000000, indicating that no further IFD exists.
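To make the IFD layout more tangible, here is a rough Python sketch that walks one IFD: a 2-byte entry count, then 12 bytes per entry (TagID, data type, count, value or offset), then the 4-byte pointer to the next IFD. The function name read_ifd and the tiny in-memory "file" are made up for illustration; in particular, the width of 5184 is just an assumed value, not one taken from the file discussed above.

```python
import struct

def read_ifd(data: bytes, offset: int, endian: str = "<"):
    """Return (entries, next_ifd_offset) for the IFD starting at `offset`.

    Each entry is a tuple (tag, data_type, count, raw), where `raw` holds
    the last four bytes of the entry: either the value itself or an offset.
    """
    (n_entries,) = struct.unpack_from(endian + "H", data, offset)
    entries = []
    pos = offset + 2
    for _ in range(n_entries):
        entries.append(struct.unpack_from(endian + "HHI4s", data, pos))
        pos += 12                      # every IFD entry is 12 bytes long
    (next_ifd,) = struct.unpack_from(endian + "I", data, pos)
    return entries, next_ifd

# A made-up Intel byte order file: header + IFD0 with a single entry.
header = bytes.fromhex("49492A0008000000")
ifd0 = (struct.pack("<H", 1)                        # one entry
        + struct.pack("<HHII", 0x0100, 4, 1, 5184)  # ImageWidth, LONG, count 1
        + struct.pack("<I", 0))                     # no further IFD
data = header + ifd0

entries, next_ifd = read_ifd(data, 8)
tag, data_type, count, raw = entries[0]
(width,) = struct.unpack("<I", raw)
print(f"TagID=${tag:04X} type={data_type} count={count} value={width}")
print("next IFD at:", next_ifd)   # 0 means: no further IFD
```

With 20 entries, for example, the pointer to the next IFD would sit at 8 + 2 + 20 x 12 = 250 = $00FA.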
OK, we have seen an example of how 4-byte (meta)data can be stored inside an IFD. But what if we need to store something that needs more space, like the camera name, for example? In this case, inside IFD0, there will be an entry like:
-now we can see that the IFD entry points to an address inside the file where the camera name is saved (in our case, this address is $0000FD40). And because the entry holds only a pointer, it also holds the size (length) of the data, here $0E bytes (=14 decimal).
If we now look at the content at address $0000FD40 and interpret the values there as characters, we get "Canon EOS 60D" =Exif:Model data (13 characters plus a terminating zero byte).
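Following such a pointer can be sketched like this. Again, this is only an illustration under my own assumptions: read_ascii is a hypothetical helper, and the in-memory file is much smaller than a real one, so the string sits at offset 26 instead of $0000FD40.

```python
import struct

def read_ascii(data: bytes, entry, endian: str = "<") -> str:
    """Decode an ASCII (type 2) IFD entry given as (tag, type, count, raw)."""
    tag, data_type, count, raw = entry
    if data_type != 2:
        raise ValueError("not an ASCII entry")
    if count <= 4:
        payload = raw[:count]            # short strings fit into the entry
    else:
        (offset,) = struct.unpack(endian + "I", raw)
        payload = data[offset:offset + count]
    return payload.rstrip(b"\x00").decode("ascii")

model = b"Canon EOS 60D\x00"             # 13 characters + zero byte = 14
header = bytes.fromhex("49492A0008000000")
# IFD0 occupies bytes 8..25 (2 + 12 + 4), so the string starts at offset 26.
ifd0 = (struct.pack("<H", 1)
        + struct.pack("<HHII", 0x0110, 2, len(model), 26)   # Exif:Model
        + struct.pack("<I", 0))
data = header + ifd0 + model

entry = struct.unpack_from("<HHI4s", data, 10)   # first entry of IFD0
print(len(model), repr(read_ascii(data, entry)))
```

The entry itself never contains the text; it only says where the text is and how long it is.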
Here we examined two kinds of data in an IFD: a numeric value (image width) and an array of characters (camera model). Of course, IFD entries can point to many kinds of data. For example, an IFD entry can point to an "ExifIFD" segment (which is similar to a regular IFD, but contains other data), or to an Xmp data segment, or to "pure" image data, etc.
None of the above is meant to be a metadata decoding manual, nor do I like to complicate things just for fun... The thing is, only if you really understand the above "system" will you also understand what's "behind" TIFF.
Anyway, here's a simplified TIFF structure (one of many possible):
Conclusions
Now we are ready to draw some conclusions... For example, to delete a metadata tag, we need to delete the particular IFD entry and (if needed) the data that entry pointed to. At the same time, all offset values after the deleted IFD entry must be checked and recalculated where necessary (otherwise offset values could point to wrong addresses). The same applies if we add a metadata tag or change the value/content of an existing one.
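The bookkeeping behind "recalculating offsets" can be illustrated with a deliberately simplified sketch. The function relocate and the pointer table below are hypothetical; a real metadata writer must first find every pointer in the file (IFD0, ExifIFD, MakerNote, image data, ...) before it can adjust them.

```python
def relocate(offsets: dict, edit_pos: int, delta: int) -> dict:
    """Return a copy of `offsets` adjusted for an edit at `edit_pos`.

    `delta` is the number of bytes added (positive) or removed (negative).
    Pointers to data located before the edit stay as they are; pointers at
    or after the edit position move by `delta`. A pointer to the deleted
    data itself must be dropped by the caller, together with its IFD entry.
    """
    return {name: off + delta if off >= edit_pos else off
            for name, off in offsets.items()}

# Hypothetical pointers found in some file (name -> absolute file offset).
pointers = {"IFD0": 0x0008, "Model": 0xFD40, "ImageData": 0x10000}

# Suppose 14 bytes stored at $0100 are removed from the file.
updated = relocate(pointers, 0x0100, -14)
for name, off in updated.items():
    print(f"{name}: ${off:08X}")
```

Forget to update just one of these values and the pointer lands 14 bytes away from where the data really is.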
To summarize further: the TIFF standard anticipates various data inside a file, and it also anticipates that data inside a TIFF file will be modified. For example, after resizing an image (=changing "pure" image data), the "ImageWidth" tag we decoded above is updated by the image editing software.
We also know that an IFD entry can point to many kinds of data, e.g. Exif, Iptc, Xmp, ... And again: the TIFF standard defines how all that data is "connected".
One doesn't need to be a genius to conclude that it's up to the software (image/metadata editor) to manage all those connections inside TIFF. Yes, connections: because as long as we talk about manipulating metadata inside TIFF, that's all it's about.
For the purpose of this article, the structure of the image data is not relevant. All we need to know is that somewhere there must be an IFD entry pointing to that data, and there must be an entry telling how big that data is, etc.
Do you edit metadata inside TIFF?
If you are reading this article, you probably do (some do it without even knowing). But are you aware of the possible risks involved? I mean, now we know that e.g. a single wrongly written offset value can lead to total loss of image data!
Yeah, there's always a risk involved (lightning can strike the PC)... But most image editors seem to handle that stuff pretty well. OK, let's say it: we modify metadata inside TIFF files without thinking much about it.
Where's the article about raw files?
Above. You have just read it!
Yes, camera manufacturers aren't so stupid as to reinvent the wheel. They use what's there and is proven to work well. The main difference between all those files (TIF, CR2, DNG, NEF, ...) is how the image data is "encoded". Everything else is (almost) the same: Header, IFD segments, ExifIFD, Iptc, Xmp, ... all that is there, and follows the same TIFF principle.
Another TIFF file
Here's an example:
At the beginning of the file (first line), there's the familiar TIFF header, which points to IFD0 (in this case, at address $00000010).
Right now, we don't know the meaning of the content at address $00000008 (second line), so we skip it and go to the start of IFD0. Here we see that IFD0 contains $0011 (=17 decimal) entries. The first entry contains ImageWidth, the second entry ImageHeight, etc. Nothing new, actually.
But that second row of data still bothers us... Now, if we decode the first two bytes there ($43 and $52) as characters, we get:
The next two bytes, taken as a single value, represent:
CR2..? Yes, we are looking at a Canon raw file! And the next four bytes contain an offset to address $0000B714, where we can find another IFD section containing IFD entries... and finally: one of the entries there points to the raw image data.
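The decoding of these first 16 bytes can be sketched as follows. Take it as an illustration only: parse_cr2_header is my own name, and the sample bytes are reconstructed from the values mentioned above (IFD0 at $00000010, the characters $43 $52, the offset $0000B714); the two version bytes 02 00 are my assumption.

```python
import struct

def parse_cr2_header(data: bytes):
    """Return (ifd0_offset, version, raw_ifd_offset) from the first 16 bytes.

    Only Intel byte order is handled, as assumed throughout this article.
    """
    if data[:4] != bytes.fromhex("49492A00"):
        raise ValueError("not an Intel byte order TIFF header")
    (ifd0_offset,) = struct.unpack_from("<I", data, 4)
    if data[8:10] != b"CR":                           # $43 $52
        raise ValueError("no CR signature at offset 8")
    (version,) = struct.unpack_from("<H", data, 10)   # two bytes as one value
    (raw_ifd_offset,) = struct.unpack_from("<I", data, 12)
    return ifd0_offset, version, raw_ifd_offset

# First two "lines" of the file, reconstructed (version bytes are assumed).
sample = bytes.fromhex("49492A0010000000" "4352020014B70000")

ifd0, version, raw_ifd = parse_cr2_header(sample)
print(f"IFD0 at ${ifd0:08X}")
print(f"signature: CR{version}")
print(f"raw IFD at ${raw_ifd:08X}")
```

A plain TIFF reader simply never looks at bytes 8 to 15, because the header sends it straight to offset $10.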
We just decoded (part of) the metadata inside a Canon 60D raw file. Classy, huh? And all that by simply following "the TIFF book".
What about "proprietary" MakerNotes metadata?
Let's take another look at the IFD0 entries of our CR2 file above. There's an entry (the last one, in our case) with TagID:
-this TagID value ($8769) says that the entry contains a pointer to ExifIFD:
-here, ExifIFD contains $0026 (=38 decimal) entries, and one of them has a TagID value of $927C. According to the Exif specification, TagID value $927C says: "this tag points to MakerNote metadata". In our case, MakerNote occupies $0000B094 bytes (=45204 decimal) and starts at address $000003D8.
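The chain IFD0 -> ExifIFD -> MakerNote can be followed with one small search routine. This is a sketch under my own assumptions: find_entry is a hypothetical helper, and the miniature file below contains only the two relevant entries and six dummy MakerNote bytes, so its offsets are much smaller than the ones quoted above.

```python
import struct

EXIF_IFD_POINTER = 0x8769   # IFD0 entry pointing to ExifIFD
MAKER_NOTE = 0x927C         # ExifIFD entry pointing to MakerNote

def find_entry(data: bytes, ifd_offset: int, wanted_tag: int):
    """Return (data_type, count, value_or_offset) of a tag, or None.

    Intel byte order is assumed.
    """
    (n_entries,) = struct.unpack_from("<H", data, ifd_offset)
    for i in range(n_entries):
        tag, data_type, count, value = struct.unpack_from(
            "<HHII", data, ifd_offset + 2 + 12 * i)
        if tag == wanted_tag:
            return data_type, count, value
    return None

# Miniature file: header (0..7), IFD0 (8..25), ExifIFD (26..43), MakerNote (44..49).
header = bytes.fromhex("49492A0008000000")
ifd0 = struct.pack("<HHHIII", 1, EXIF_IFD_POINTER, 4, 1, 26, 0)
exif_ifd = struct.pack("<HHHIII", 1, MAKER_NOTE, 7, 6, 44, 0)
maker_note = bytes(6)                    # dummy content
data = header + ifd0 + exif_ifd + maker_note

_, _, exif_offset = find_entry(data, 8, EXIF_IFD_POINTER)
_, mn_size, mn_offset = find_entry(data, exif_offset, MAKER_NOTE)
print(f"ExifIFD at ${exif_offset:08X}")
print(f"MakerNote: {mn_size} bytes at ${mn_offset:08X}")
```

Nothing here is Canon specific yet: up to this point, any Exif-aware reader can find where the MakerNote block starts and how big it is.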
Up to now, everything conformed to the TIFF/Exif standard. Here, however, that ends. For most raw image files, the content of the MakerNote section is officially unknown territory.
But, curious as we are, let's see what's there:
-we can recognize that, in the case of a Canon raw file, MakerNote points to a structure which is the same as a regular IFD! And that being the case, manipulating that metadata is as simple as working with TIFF/Exif metadata. However, there's a catch...
Canon didn't publish the meanings of the TagID values there. That is, we can see that e.g. the first entry (TagID=$0001) points to address $000005CA, and that the data there occupies $00000031 bytes, but we don't know the meaning of that data. Officially, nobody knows that! ..except ExifTool.
Over time, much (but not all) of the MakerNote content has been "decoded". And once decoded, modifying a particular metadata tag is as "simple" as doing that with TIFF/Exif tags. For the last time (are you bored?), let's look at the IFD entry at address $0422 above:
Remember how we "decoded" Exif:Model (camera name) at the beginning of this article? Same thing here... in this case, however, the data contains.. exactly! -TagID $0009 is Canon:OwnerName.
It seems so simple.. and the principle is simple. But that doesn't mean it's easy: writing metadata requires a lot of programming discipline and metadata knowledge.
To raw or not to raw?
Above are the facts, and it's up to you to decide. If you are interested in other raw files, you can check them with some "Hex viewer" or, even better, with ExifToolGUI, where you can export metadata into an html file.
If nothing else, you have at least realized that modifying metadata doesn't alter the image data at all.
My personal opinion (if you're interested)
Image metadata is data about the image, so it's actually part of the image data. And as such, it belongs inside the image file. That said, I refuse to use software which isn't capable of safely writing metadata directly into image files, or of reading metadata I've previously saved inside image files.
Using ExifTool, I have been successfully writing metadata into my Canon raw files for years. And I don't need any additional (expensive) software to keep track of where my metadata is: it's always there, where the image file is.
Here are a few screenshots of the starting parts of various raw files (as generated by ExifTool):