Raw image file structure or format

Started by Skids, December 30, 2019, 07:05:44 AM


Skids

Hi,

I wonder if anyone knows of a tool or documentation that describes the structure of a raw file, specifically a Panasonic RW2 file.  I have conducted a few searches and believe that the files are probably based on the TIFF file format and are container files.  I want to be able to determine what is stored in the container, e.g. thumbnail.jpg, preview.jpeg, EXIF data and raw data.

Any thoughts?

best wishes

Simon

StarGeek

I believe that most RAW files are TIFF based, with proprietary alterations, except for Canon .CR3 files (MP4 video based?)

While I don't know the details, you might look into the LibRaw project (based upon the old DCRaw code).  It's an open source attempt to decode the various RAW formats for use in other programs.  Also RawTherapee, though I don't know if it's related to LibRaw or separate.
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

Phil Harvey

The ExifTool -htmldump feature gives you a good way to visualize the structure of TIFF-based files like RW2.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Skids

Hi Phil,

Sorry for the very late reply, but I missed your post of late last year.  htmldump is great but leads to further questions (sorry).  As I believe I said, I am looking to extract the thumbnail from a Panasonic raw file (.rw2).  I can see the thumbnail data but I am still uncertain how to read the IFD fields to locate the thumbnail image data.  I have attached a copy of the htmldump file just in case you have time to take a look.

One of the descriptions of IFDs in TIFFs implies that if the 4 bytes at the end of an IFD are zero then there are no more IFDs:
Quote: An IFD contains information about the image as well as pointers to the actual image data. It consists of a 2-byte count of the number of directory entries (i.e. the number of fields), followed by a sequence of 12-byte field entries, followed by a 4-byte offset of the next IFD (or 0 if none). There must be at least 1 IFD in a TIFF file and each IFD must have at least one entry.
The dump file appears to have two numbered IFDs (0 and 1), with the JPEG thumbnail described in IFD1 (&85b8).  Yet the first IFD ends with the next-IFD offset set to zero.  The tag JpegFromRaw leads me to &1200, but this is metadata that includes one offset that seems odd (line &1282) because it seems too small.  HtmlDump tries to subtract &1200 from the offset and ends up going negative, while the unadjusted offset leads back into the first IFD.
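
For anyone following along, the IFD layout in the quoted description can be sketched in Python roughly as follows. This is only a minimal sketch: the function name read_ifd and its base parameter are made up for illustration, it assumes the caller already knows the byte order, and it reads each 4-byte value field as a plain unsigned integer without interpreting the field type.

```python
import struct

def read_ifd(data, ifd_offset, base=0, endian="<"):
    """Parse one IFD and return (entries, next_ifd_offset).

    Hypothetical helper: 'base' is added to ifd_offset so the same code
    can be used for a TIFF structure that does not start at byte 0.
    """
    pos = base + ifd_offset
    # 2-byte count of directory entries
    (count,) = struct.unpack_from(endian + "H", data, pos)
    entries = []
    for i in range(count):
        # Each 12-byte entry: tag (2), type (2), count (4), value/offset (4)
        tag, typ, n, value = struct.unpack_from(
            endian + "HHII", data, pos + 2 + 12 * i
        )
        entries.append((tag, typ, n, value))
    # 4-byte offset of the next IFD, or 0 if there is none
    (next_ifd,) = struct.unpack_from(endian + "I", data, pos + 2 + 12 * count)
    return entries, next_ifd

# Synthetic little-endian TIFF with a single one-entry IFD at offset 8
demo = (
    b"II*\x00" + struct.pack("<I", 8)
    + struct.pack("<H", 1)
    + struct.pack("<HHII", 0x0100, 4, 1, 640)
    + struct.pack("<I", 0)
)
entries, next_ifd = read_ifd(demo, 8)
print(entries, next_ifd)
```

A next-IFD offset of 0 here only says that this particular chain has ended; it says nothing about IFDs that belong to a separate structure elsewhere in the file.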

I was hoping that it would be obvious how to step through the IFD picking out the data I want, but it seems rather more challenging.  Perhaps I would be better off using the brute force approach of searching the file for hex FF D8 FF DB (start of JPEG file data) and FF D9 (end of JPEG data).  What do you think?
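
As a rough illustration of that brute force idea, the sketch below lists every position where a JPEG start marker (FF D8 FF) or end marker (FF D9) occurs in a byte string. The function name scan_jpeg_markers and the sample bytes are invented for the example. It matches on FF D8 FF rather than FF D8 FF DB, since the fourth byte differs from file to file.

```python
def scan_jpeg_markers(data):
    """Return (starts, ends): offsets of every FF D8 FF and the offsets
    just past every FF D9 found in data."""
    soi = b"\xff\xd8\xff"   # start-of-image marker plus first byte of next marker
    eoi = b"\xff\xd9"       # end-of-image marker
    starts = []
    pos = data.find(soi)
    while pos != -1:
        starts.append(pos)
        pos = data.find(soi, pos + 1)
    ends = []
    pos = data.find(eoi)
    while pos != -1:
        ends.append(pos + 2)
        pos = data.find(eoi, pos + 1)
    return starts, ends

# Invented sample: an outer JPEG that carries a nested thumbnail JPEG
sample = (
    b"xx"
    + b"\xff\xd8\xff\xe1" + b"meta"
    + b"\xff\xd8\xff\xdb" + b"thumb" + b"\xff\xd9"
    + b"main" + b"\xff\xd9"
    + b"raw"
)
print(scan_jpeg_markers(sample))
```

The sample shows the weakness of this approach: the first end marker after the outer start belongs to the nested thumbnail, so pairing markers naively truncates the outer image. Raw sensor data may also contain these byte pairs by chance.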

best wishes

Simon

Phil Harvey

There is only IFD0 in the main image.  The dump also shows the metadata in the embedded JPEG preview image, which contains another IFD0 and an IFD1 for the thumbnail.

You're right, the reported offset base for the IFD entries in the embedded JPEG preview is incorrect.  I'll have to look into this.

Finding the preview is easy:  Just look for tag 0x002e in the main IFD0.

- Phil

Skids

Hi,
Quote: Finding the preview is easy:  Just look for tag 0x002e in the main IFD0.

I've spent hours trying to work out how to get to the thumbnail tag, so I can't agree that it is easy.  The tag 0x002e points to offset 0x1200.  There is what seems to be a JPEG header (0xff d8 ff e1), but the following bytes read as another TIFF header, not a JPEG file.  This seems to mean that a raw/TIFF file may contain other TIFFs?!

If so, then all subsequent offsets are relative to address 0x120c, which seems to be the start of the second TIFF (0x4949 2a).

Following this path arrives at 0x12b2 (Next IFD), which holds the bytes ac 73.  Read in little-endian order that is 0x73ac, and 0x73ac + 0x120c = 0x85b8, which is the address of the target IFD.

The weak link in the chain of jumps is from the first IFD to the supposed JPEG header.  I can't see how I should know that there is another TIFF waiting to be found.  This Panasonic format seems over-complex.  My limited reading of the TIFF spec suggests that, at a minimum, all offsets refer to the start of the file.  I suspect that rather than an outer and an inner file it's more likely to be two or more TIFF-like files butted together.

The dump routine tries to subtract 0x1200 when it should be adding 0x120c to offsets in the second TIFF file.  I am interested in how you will arrive at 0x120c.
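
The same arithmetic can be checked in a few lines of Python. This is just a worked example using the numbers from the dump; it assumes the two upper bytes of the 4-byte field are zero, which the figures in this post imply but do not show.

```python
import struct

# Bytes of the Next IFD field at 0x12b2 (upper two bytes assumed to be zero)
raw = bytes.fromhex("ac730000")

# The embedded TIFF header starts with "II" (0x4949), i.e. little-endian
(next_ifd,) = struct.unpack("<I", raw)

tiff_base = 0x120C              # start of the embedded TIFF header
absolute = tiff_base + next_ifd # position of the target IFD within the file

print(hex(next_ifd), hex(absolute))  # 0x73ac 0x85b8
```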

best wishes

Simon

Phil Harvey

Tag 0x002e points to the start of the JPEG, and the size of the tag gives the length of the JPEG.  The JPEG has the EXIF segment near the start, which is the TIFF-format information you are seeing.

- Phil

Skids

Hi Phil,

Thanks for your post - I think you missed out ending it with the words "you dolt!".  The penny has now dropped.  For other raw / TIFF tyros out there, here is my new understanding.

The first IFD in the raw file has a tag field "JpegFromRaw" that describes a block of data by encoding its start point (an offset) and size (in bytes).  This block should be treated as a separate entity; the information within it has nothing more to do with the TIFF that acts as its container.  In other words, once the tag is read we know what it is, where it is and how large it is.

In the case of Panasonic RW2 files this block holds a medium-sized JPEG, a thumbnail JPEG and a complete set of metadata.  For my use case this is largely irrelevant, as the block of data referred to extracts as a fully functional JPEG of just under 600 KB which may be passed to any JPEG reader.

It's amazing, the benefits of a good night's sleep....
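
For other readers, that understanding can be sketched in Python along the following lines. This is a minimal sketch, not a tested RW2 reader: the function name extract_jpg_from_raw is invented, and it assumes that the entry for tag 0x002e stores the block's byte length in its count field and the file offset in its value field, as described in this thread.

```python
import struct
import sys

JPG_FROM_RAW = 0x002E  # tag number given earlier in this thread

def extract_jpg_from_raw(data):
    """Return the bytes of the embedded JPEG, or None if the tag is absent.

    Assumption: the entry's count is the JPEG length in bytes and its
    value field is an offset from the start of the file.
    """
    if data[:2] == b"II":
        endian = "<"   # little-endian
    elif data[:2] == b"MM":
        endian = ">"   # big-endian
    else:
        raise ValueError("not a TIFF-style header")
    # Bytes 4..7 of the header hold the offset of IFD0
    (ifd0,) = struct.unpack_from(endian + "I", data, 4)
    (count,) = struct.unpack_from(endian + "H", data, ifd0)
    for i in range(count):
        tag, typ, n, value = struct.unpack_from(
            endian + "HHII", data, ifd0 + 2 + 12 * i
        )
        if tag == JPG_FROM_RAW:
            return data[value:value + n]
    return None

if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (file names are placeholders): python extract.py input.rw2 output.jpg
    with open(sys.argv[1], "rb") as f:
        jpeg = extract_jpg_from_raw(f.read())
    if jpeg is not None:
        with open(sys.argv[2], "wb") as f:
            f.write(jpeg)
```

Nothing inside the extracted block needs to be parsed for this use case; the result is handed to a JPEG reader as-is.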

best wishes

Simon