"Direct Write" option

Started by Mac2, October 04, 2011, 04:57:10 PM


Mac2

The mechanism used by ExifTool to write only to copies of the original file is very safe and sound.

But it's also a bummer when you work with large TIFF or PSD files (50 MB - 1 GB), and even more so over a network. To set a rating for two 300 MB standard PSD files takes 10 seconds locally, and may take two minutes over a network.

Changing the rating of 10 or even 100 of these files takes a very long time, not because ExifTool needs long to update the rating tag, but because of all the file copying involved. I'm forced to use overwrite_original_in_place because I need to support Unicode file names (see also https://exiftool.org/forum/index.php/topic,3565.msg16195.html#msg16195), which makes things worse.


Is there a "write directly to the original file, and YES, I promise I have made backups of all files" option?

Something that just opens the file, updates the XMP data and closes the file again?

I understand that this will not be possible for all metadata formats. And it may not always be possible for XMP either, e.g. when the XMP record has to grow and the file needs to be spliced and re-combined after growing the data block...

But for the most common cases in DAM workflows, e.g. setting rating and label, adding or removing a keyword, this would be a big improvement. If sufficient padding is added to the XMP record on the first write (usually it is), even adding a headline, multiple keywords or comments could be done in-place.
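To make the padding idea concrete, here is a minimal Python sketch. It is not ExifTool code and says nothing about how ExifTool works internally. It assumes the XMP packet has already been located and extracted as bytes, that the packet is writable (ends with the `end="w"` trailer), and that the padding between `</x:xmpmeta>` and the trailer is plain whitespace. The function name `replace_in_packet` is made up for this illustration.

```python
TRAILER = b'<?xpacket end="w"?>'   # trailer of a writable XMP packet
CLOSE = b"</x:xmpmeta>"            # end of the XMP body, padding follows


def replace_in_packet(packet: bytes, old: bytes, new: bytes):
    """Return a packet of the same length with the first `old` replaced by
    `new`, or None if the change cannot be absorbed by the padding.

    Hypothetical helper for illustration only.
    """
    if not packet.endswith(TRAILER):
        return None  # read-only or unexpected packet: leave it alone
    body_end = packet.rfind(CLOSE)
    if body_end < 0:
        return None
    body_end += len(CLOSE)
    body = packet[:body_end]
    padding = packet[body_end:-len(TRAILER)]
    if padding.strip() or old not in body:
        return None  # padding is not pure whitespace, or nothing to replace
    new_body = body.replace(old, new, 1)
    pad_len = len(packet) - len(new_body) - len(TRAILER)
    if pad_len < 0:
        return None  # the packet would have to grow: no in-place update
    # Shrink or grow the padding so the total packet length is unchanged.
    return new_body + b" " * pad_len + TRAILER
```

Because the returned packet has exactly the same length as the original, it could be written back over the old bytes without moving anything else in the file. A `None` result means "fall back to the normal copy-and-rewrite path".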

I don't understand enough of the inner workings of ExifTool to know if this would be possible at all, or if it would be worth it for Phil to spend time on this. But image files grow larger all the time, and network storage, NAS boxes or even Cloud storage are becoming more and more common. Usually users have backups, so no data is lost if ExifTool fails to update a file correctly or there is a crash or power failure during the process. The performance boost of a direct update mode, even if only for some limited cases, would be great.

Phil Harvey

This is certainly possible, but I don't know if I would ever add a feature like this to ExifTool.

There would be limitations even if I did add this ability:

1) You couldn't use this technique to update some file types.

2) You couldn't do this unless the file already contained XMP.

3) The amount of information you added would be limited to about 4 kB or less.

I'm never happy with limitations, especially serious ones like this, so that makes me very unenthusiastic when it comes to adding a feature like this.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Mac2

Hi, Phil

I fully understand your point, of course.
And I too see the limits and implications of such a special case. In no way would I want to weaken the clean and solid design of ExifTool. It's extremely reliable, and its design is a big part of that reliability. And that's how we all like it  :)

But you can make some pragmatic assumptions about the typical workflows in professional / hobby usage:

A file very often has embedded XMP before it even reaches a DAM product / ExifTool. Either the XMP was added in-camera, during the initial ingest process, in ACR, Photoshop, Nikon Capture or whatever. Even if this is not the case, we would gain from the special case after the initial XMP record was written.

For users who only work with plain JPEG files, the problem is not as dramatic anyway. 3 to 6 MB JPEG files can be copied very fast, so the standard ExifTool approach works fine with these files. It's mostly large files, files on "slow" network servers or large batches of files I'm thinking about.

4 kB is more than enough for many cases. Caption, By-line, photographer address, usage restrictions, a number of keywords, a rating and a label. That's usually all that goes into a file before it is archived. I estimate the standard XMP data change volume is 200 to 1,000 bytes. Perhaps 2 kB if you work with large controlled vocabularies or need to add larger scientific copy.

If support for selected (!) unmovable "fixed length" fields like EXIF date/time, EXIF orientation or (already existing) GPS coordinates could be included, this would save a lot of time too. These fields are the most often updated tags in typical DAM environments. And they are often updated in large batches, which means high volume I/O.

Even having a very restrictive implementation like:

A) Works only for XMP (and selected fixed length EXIF tags)
B) Requires existing XMP
C) Requires sufficient "slack" space in the existing XMP for the changes
D) In case of any doubt, don't

could lead to a tremendous performance gain for many use cases.
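Rules A) to D) can be written down as a small decision function. The following Python sketch is only an illustration of the proposal, not anything that exists in ExifTool. The `Change` class, the `plan_write` function and the contents of `FIXED_LENGTH_EXIF` are assumptions made up for this example, and the byte sizes would have to come from a real parser.

```python
from dataclasses import dataclass

# Illustrative set of fixed-length EXIF tags (an assumption for this sketch).
FIXED_LENGTH_EXIF = {"EXIF:DateTimeOriginal", "EXIF:Orientation"}


@dataclass
class Change:
    tag: str        # e.g. "XMP:Rating"
    old_size: int   # bytes the current value occupies in the file
    new_size: int   # bytes the new value would need


def plan_write(changes, has_xmp: bool, xmp_slack: int) -> str:
    """Return "in_place" only if every rule holds, otherwise "rewrite"."""
    growth = 0
    for c in changes:
        if c.tag in FIXED_LENGTH_EXIF:
            # Rule A: fixed-length EXIF tags must keep their exact size.
            if c.new_size != c.old_size:
                return "rewrite"
        elif c.tag.startswith("XMP:"):
            # Rule B: XMP must already exist in the file.
            if not has_xmp:
                return "rewrite"
            growth += c.new_size - c.old_size
        else:
            # Rule D: anything else is "in doubt", so don't.
            return "rewrite"
    # Rule C: the total XMP growth must fit into the existing slack space.
    return "in_place" if growth <= xmp_slack else "rewrite"
```

For example, `plan_write([Change("XMP:Rating", 1, 1)], has_xmp=True, xmp_slack=0)` yields `"in_place"`, while the same change on a file without XMP yields `"rewrite"`, i.e. the normal safe copy mechanism.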

Changing an XMP rating could be done by writing a single byte, instead of copying a 100 MB TIFF file over a network. Even adding or removing a keyword or label could be done in-line.
Correcting the EXIF date and time for 2000 JPEG files from this year's vacation on a slow NAS would probably drop to a couple of seconds instead of 10 minutes. Much better for Cloud / wireless applications and mobile usage.
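As a rough illustration of such a fixed-length update, the Python sketch below overwrites a value of known length at a known offset without copying the file. It is a sketch under assumptions, not ExifTool's method: the function name `patch_fixed_length` is made up, and the offset is assumed to come from a parser that has already located the value (for an EXIF date/time, the 19 characters of `YYYY:MM:DD HH:MM:SS`).

```python
def patch_fixed_length(path: str, offset: int, expected: bytes, new: bytes) -> bool:
    """Overwrite the bytes at `offset` with `new`, touching nothing else.

    The write only happens if `new` has the same length as `expected` and
    the file currently contains `expected` at `offset`. Returns True when
    the file was patched. Hypothetical helper for illustration only.
    """
    if len(new) != len(expected):
        return False  # a different length would shift the rest of the file
    with open(path, "r+b") as f:   # read/write, no truncation, no copy
        f.seek(offset)
        if f.read(len(expected)) != expected:
            return False  # in case of any doubt, don't
        f.seek(offset)
        f.write(new)
    return True
```

Only the bytes of the value travel over the network, so the cost no longer depends on the file size. There is no protection against a crash in the middle of the write, which is why this only makes sense when backups exist.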

Phil Harvey

I don't disagree with anything you say, and I realize that a feature like this would be useful to many people.  It could also allow me to add partial write support for some formats which aren't yet supported for writing (AVI, MOV for example).  Actually, this feature is already on my to-do list, but not very near the top -- I'll move it up a bit.

- Phil

Mac2

Hi, Phil

then you've got me sitting silent, hoping that this will show up some day  :)
No rush, though. I'm sure you have plenty of other things on your to-do list. And also a real life  ;)

Phil Harvey

I just checked my to-do list.  This was item number 235.  I'll move it up to number 100. :)

Please don't get your hopes up too high... things don't tend to get completed until they move up to the top 20 or 30 in the list.  Currently there are 361 items in the "to-do" list, with 669 items already moved to the "done" list.

- Phil

Mac2

Already made the Top-100 - Yeah! I say  ;D

Looking at how fast you squeeze out updates, there is definitely light at the end of the tunnel.
If this also brings us (limited) write support for ID3 or other tags too, even more worth the wait.

(BTW: Thank you for adding the feature to use -if with group names in your latest release!)