What are the risks of writing metadata to DNG files?

Started by Skids, May 21, 2022, 08:17:48 AM

Previous topic - Next topic

Skids

Hi,
I am seeking informed opinions on the risks involved in writing metadata, such as keywords and ratings, into my collection of DNG files. I am not an expert, but I have looked inside DNG and some raw files and can see that they are based on TIFF files, which in my opinion are a pretty horrible format. I think this because every single pointer has to be updated when new data is inserted into the file. I'm pretty confident that ExifTool gets all this counting correct, but I don't actually know.

A second risk might be an I/O error during a file update. I understand from reading the forums that the recommended method of adding a keyword is to let ExifTool create an updated copy while leaving the original available for reuse, should checking the new file with the -validate option raise a serious error. I am uncertain what settings I should use with -validate, or how to interpret the results, so I wonder if anyone can point me to the documentation or tell me what results would trigger a revert to the original?

Or should I just live with XMP sidecar files?

best wishes

Simon

Joanna Carter

I have been using ExifTool to write directly to all sorts of RAW files, as well as DNG files, for years. In all that time, I have never had any corruption or image loss.

I have written a Mac app that leverages ExifTool and none of my beta testers have had any problems either.

Of course, the ultimate protection is a daily backup or, on a Mac, a Time Machine backup running all the time. But ExifTool writes its changes to a copy of the file and only renames (or, optionally, deletes) the original once that copy has been written successfully, so there is only a very slim chance of losing anything.

madison_wi_gal

I would be interested in that Mac app. I don't mind using the command line to get (for example) the lens info right, but a GUI (if that is what the app is) might be nicer to use.

StarGeek

Quote from: Skids on May 21, 2022, 08:17:48 AM
see that they are based on TIFF files, which in my opinion are a pretty horrible format

I couldn't say how good or bad the format is, but I can say that metadata support for TIFFs is pretty much universal, unlike PNGs, which have almost no metadata support. And the read/write code for TIFFs has had decades to work out any bugs.

Some newer RAW file types, such as CR3 (and HEIC?), are MP4-based.

Quote
I think this because every single pointer has to be updated when new data is inserted into the file. I'm pretty confident that ExifTool gets all this counting correct, but I don't actually know.

In the past, there have been a very few cases where a bug crept into exiftool, but Phil has always been able to fix the problem and repair the corrupted files. It's been very, very rare; that's much better support than I've seen from any other program.

But make sure you read Known Issues to see the exceptions.

Quote
A second risk might be an I/O error during a file update. I understand from reading the forums that the recommended method of adding a keyword is to let ExifTool create an updated copy while leaving the original available for reuse, should checking the new file with the -validate option raise a serious error.

Well, not so much. It's good when you're first starting out with commands or trying something tricky, but I can't think of anywhere that it's formally recommended. As long as you have a proper backup solution, it shouldn't be a worry. I've been running every command with the -overwrite_original option for close to a decade now, and the only time I've made a mistake that wasn't directly reversible was when I severely screwed up a renaming command, which isn't covered by the _original backup files anyway.

But to address the I/O error problem: the way exiftool works is that it creates a new file containing all the edits. Only if that succeeds does exiftool rename the original file to add the _original extension and rename the new file to the original name. With the -overwrite_original option, the original is deleted and the new file is renamed.

The only time an I/O error would be a danger is with the -overwrite_original_in_place option. In that case, after the new file has been created, the original file is opened for writing and the contents of the new file are copied into it. But the actual use cases for this option are fairly limited, mostly where there is file system data that would otherwise be lost. On a Mac, this would be the various MDItem and XAttr properties, which are fairly common. On Windows, it would be Alternate Data Streams, which are pretty rare; I've only ever encountered one program that uses them.
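The write-then-rename sequence described above can be sketched in a few lines. This is only an illustrative Python model of the ordering (not ExifTool's actual code); the temporary file name and the `safe_write` helper are made up:

```python
import os
import shutil
import tempfile

def safe_write(path, edit):
    """Toy model of ExifTool's default write sequence (not its actual code)."""
    tmp = path + "_tmp"                   # hypothetical temporary name
    shutil.copyfile(path, tmp)            # 1. edits are written to a new copy
    edit(tmp)                             #    (stand-in for the metadata edit)
    os.replace(path, path + "_original")  # 2. original is kept as *_original
    os.replace(tmp, path)                 # 3. edited copy takes the original name

# demo with a throwaway file
workdir = tempfile.mkdtemp()
photo = os.path.join(workdir, "photo.dng")
with open(photo, "w") as f:
    f.write("raw")
safe_write(photo, lambda p: open(p, "a").write("+keyword"))
edited = open(photo).read()
backup = open(photo + "_original").read()
```

The point of the ordering is that the original file is never touched until the edited copy exists in full, so a crash or I/O error mid-write can only ever lose the copy.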

For more details, see "Under the Hood" of ExifTool.

Quote
I am uncertain what settings I should use with -validate, or how to interpret the results, so I wonder if anyone can point me to the documentation or tell me what results would trigger a revert to the original?

For me, the only thing that would trigger a revert is the file not being readable. With regard to -validate, a large number of the "issues" it returns are extremely minor and rarely prevent rendering of the image. For example, Missing required JPEG ExifIFD tag 0xa003 ExifImageHeight/Width is really common. But even though the ExifImageHeight/Width tag is required by the EXIF spec, it's a completely useless tag, IMO. Any program reading the image will look at the actual data size rather than give priority to a tag which can be completely out of sync with the actual image (a fairly common occurrence).
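That rule of thumb (revert only on an outright error, not on minor warnings) can be written down as a tiny filter. This is a hypothetical sketch, not part of ExifTool; the `should_revert` helper and the exact message strings are assumptions, though the ExifImageHeight warning text is the example quoted above:

```python
def should_revert(messages):
    """Revert to the original only if validation reported an actual Error
    (e.g. the file could not be read); Warnings are usually cosmetic."""
    return any(m.startswith("Error") for m in messages)

# illustrative message lists
minor = ["Warning: Missing required JPEG ExifIFD tag 0xa003 ExifImageHeight"]
fatal = ["Error: File format error"]
```

In practice you would feed it whatever lines your own script captures from an `exiftool -validate` run, and tune the matching to taste.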

Quote
Or should I just live with XMP sidecar files?

This is up to you, as there are pros and cons either way. If you're using a Digital Asset Management (DAM) program, it probably only supports XMP sidecars, and you would have to go out of your way to constantly sync the XMP data into the files. XMP sidecars will also significantly speed up backups, as any change requires updating only the small sidecar file rather than the much larger image file.

On the other hand, embedding the data means it cannot be lost and will travel with the image, for example when uploading it to a website or cloud service. Also, if you use a variety of programs, you might have to go out of your way to make sure the XMP sidecars are updated and copied whenever you move the image files.
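If you do go the sidecar route, the "sidecar left behind" failure mode is easy to check for. Here is a small illustrative sketch (the `missing_sidecars` helper is hypothetical, and it assumes the common convention of a same-stem `.xmp` file next to each `.dng`):

```python
from pathlib import Path
import tempfile

def missing_sidecars(folder):
    """List .dng files in folder that have no matching .xmp sidecar."""
    folder = Path(folder)
    dngs = (p for p in folder.iterdir() if p.suffix.lower() == ".dng")
    return sorted(p.name for p in dngs if not p.with_suffix(".xmp").exists())

# demo in a throwaway directory
d = Path(tempfile.mkdtemp())
(d / "a.dng").touch()
(d / "a.xmp").touch()
(d / "b.dng").touch()
orphans = missing_sidecars(d)
```

Running something like this after a move or copy tells you at a glance which images have lost their metadata companion.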
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

Skids

Thanks, all of you, for your thoughtful and in-depth replies. The reason for my question is that, at the moment, I manage my images by simply writing keywords into the filenames. It works, but it is a little inflexible.

I have recently discovered a couple of applications that offer the option of writing keywords directly into DNG files, and I am writing a utility to extract the keywords from the filenames and write them to either an XMP sidecar or directly into the DNG using ExifTool. I want to make sure that I understand the pitfalls before I start modifying my images and putting them at risk. I'll read the links posted above.
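For what it's worth, the filename-to-keywords step of such a utility can be sketched as follows. This is only an illustrative sketch: the `keyword_args` helper and the underscore-separated naming scheme are assumptions, though `-keywords+=` is standard ExifTool syntax for appending to a list tag:

```python
from pathlib import Path

def keyword_args(filename, sep="_"):
    """Build an exiftool argument list that adds each part of the file name
    as a keyword. Assumes keywords are separated by `sep` in the name."""
    words = Path(filename).stem.split(sep)
    args = ["exiftool"]
    for w in words:
        args.append(f"-keywords+={w}")  # += appends to the Keywords list tag
    args.append(filename)
    return args

cmd = keyword_args("beach_sunset_2022.dng")
```

The resulting list could be handed to `subprocess.run` once you are happy with it; building the arguments separately from executing them makes it easy to dry-run the utility on the whole collection first.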


thanks again
Simon

Phil Harvey

ExifTool should be fairly robust when writing DNG files.  The only catch is that the Adobe engineers have blundered in the past with changes to the DNG specification that are not forward compatible with existing writers.  (This is really stupid because they could have easily made the changes compatible if they had a brain amongst them.)  However, when a new specification comes out I try to update ExifTool as quickly as possible if necessary.

- Phil