Are image files updated in-place, or completely re-written?

Started by dd-b, June 26, 2017, 11:55:10 AM

dd-b

When deleting keywords, and when adding keywords, to a JPEG, TIFF, or camera-original RAW file (Nikon NEF, Fuji S2 RAF, Fuji F11 RW2, Olympus ORF in particular), will the file be completely re-written (copied to a new inode on a Unix box), or are the updates sometimes made in-place or by extending the end of the file?

I ask because I'm gearing up to do a large clean-up of keywording on a lifetime of photos, and they live on a ZFS filesystem with snapshots being taken and kept.  That means that, if the whole file is rewritten, the free space I need to do the clean-up is vastly larger than if the data is mostly updated in-place or by extending the end of the existing file.  I'm using the Perl library interface, not the standalone application.
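The space concern above can be put into numbers with a rough estimate. The sketch below is an illustration only, not part of ExifTool: the function name `rewrite_cost_bytes` and the extension list are assumptions, and it ignores ZFS compression and block-size effects. It sums the sizes of the files that would be rewritten, which approximates how much old data a pre-existing snapshot would keep alive if every file is fully rewritten.

```python
import os
import tempfile

def rewrite_cost_bytes(root, extensions=(".jpg", ".tif", ".nef")):
    """Sum the sizes of matching files under root (hypothetical helper).

    If each of these files is completely rewritten, a snapshot taken
    beforehand keeps roughly this many bytes of old data referenced.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(extensions):
                total += os.path.getsize(os.path.join(dirpath, name))
    return total

# Small demonstration with stand-in files instead of real photos.
with tempfile.TemporaryDirectory() as root:
    for name, size in (("a.jpg", 1000), ("b.NEF", 2500), ("notes.txt", 10)):
        with open(os.path.join(root, name), "wb") as f:
            f.write(b"\0" * size)
    demo_total = rewrite_cost_bytes(root)  # notes.txt is not counted
    print(demo_total)
```

Running the same function over the real photo tree gives a worst-case figure to compare against the pool's free space.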

Phil Harvey

ExifTool always creates a new output file.  The original is not touched unless you specify -overwrite_original (in which case the output file is renamed to replace the original), or -overwrite_original_in_place (in which case the contents of the output file are used to overwrite the original, and the output file is deleted).

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).
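To make the inode question concrete, here is a minimal sketch of the two replacement strategies Phil describes. It is not ExifTool's code: the helper names and the `_tmp` suffix are assumptions chosen for illustration. It shows that write-then-rename gives the path a new inode, while copying the new contents back over the original keeps the inode.

```python
import os
import tempfile

def replace_via_rename(path, new_bytes):
    """Write a temp file, then rename it over the original."""
    tmp = path + "_tmp"  # hypothetical temp name
    with open(tmp, "wb") as f:
        f.write(new_bytes)
    os.replace(tmp, path)  # the path now refers to the temp file's inode

def overwrite_in_place(path, new_bytes):
    """Write a temp file, copy its contents into the original, delete it."""
    tmp = path + "_tmp"  # hypothetical temp name
    with open(tmp, "wb") as f:
        f.write(new_bytes)
    with open(tmp, "rb") as src, open(path, "r+b") as dst:
        dst.write(src.read())
        dst.truncate()  # drop any leftover tail of the old contents
    os.remove(tmp)

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "photo.jpg")
    with open(path, "wb") as f:
        f.write(b"old")
    ino_before = os.stat(path).st_ino
    replace_via_rename(path, b"new data")
    ino_after_rename = os.stat(path).st_ino
    overwrite_in_place(path, b"newer")
    ino_after_in_place = os.stat(path).st_ino
    with open(path, "rb") as f:
        final = f.read()
```

Either way the whole file is written out at least once. On a copy-on-write filesystem such as ZFS, keeping the inode does not by itself save snapshot space, because overwritten blocks are still written to new locations.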

StarGeek

You might want to look into XMP sidecar files to see if that would work better for you.  If your software supports them, that is.  Smaller files that mean less impact on your snapshots.
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

dd-b

Too many other programs in the toolchain don't support sidecar files. Also, part of what I'm doing is fixing accumulated errors (some of the digital files go back 23 years, and some of the images themselves go back 60 years on film) and standardizing some things, so I need to take out existing keywords.

dd-b

So, given the write-and-rename process you describe, I have to worry very little about file damage in crashes (or from SIGINT), it sounds like. The rename is atomic, and aborting anything else at worst leaves a temp file with a funny name sitting around?
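The crash scenario in this question can be simulated. The sketch below is an illustration under stated assumptions, not ExifTool's implementation: `safe_rewrite`, the `_tmp` suffix, and the `fail_midway` switch are all hypothetical. It interrupts the temp-file write halfway and checks that the original is untouched and only a stray temp file is left behind.

```python
import os
import tempfile

def safe_rewrite(path, new_bytes, fail_midway=False):
    """Write new contents to a temp file, then rename over the original."""
    tmp = path + "_tmp"  # hypothetical temp name
    half = len(new_bytes) // 2
    with open(tmp, "wb") as f:
        f.write(new_bytes[:half])
        if fail_midway:
            # Stand-in for a crash or SIGINT during the write.
            raise RuntimeError("simulated interruption")
        f.write(new_bytes[half:])
        f.flush()
        os.fsync(f.fileno())  # push the data to disk before renaming
    os.replace(tmp, path)  # the original is only replaced at this point

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "photo.jpg")
    with open(path, "wb") as f:
        f.write(b"original keywords")
    try:
        safe_rewrite(path, b"cleaned-up keywords", fail_midway=True)
    except RuntimeError:
        pass
    with open(path, "rb") as f:
        after_failure = f.read()
    leftover_exists = os.path.exists(path + "_tmp")
    safe_rewrite(path, b"cleaned-up keywords")
    with open(path, "rb") as f:
        after_success = f.read()
```

The key design point is that nothing touches the original path until the final rename, so an interruption before that step costs only a partial temp file.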

Phil Harvey

...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

dd-b

Quote from: StarGeek on June 26, 2017, 01:03:50 PM
You might want to look into XMP sidecar files to see if that would work better for you.

Although, on looking more deeply into how Photo Mechanic and Adobe products handle sidecar files, maybe that *is* a useful solution.  Because there's just *one* crucial program in the toolchain that doesn't handle them, and giving it up might be easier than replacing all the production and backup disks with bigger ones.

Martin Z

Quote from: Phil Harvey on June 26, 2017, 11:57:36 AM
-overwrite_original (in which case the output file is renamed to replace the original)
-overwrite_original_in_place (in which case the contents of the output file are used to overwrite the original, and the output file is deleted)

Hi Phil,

I was trying to think these through and work out which might be best to use...
•  If I'm right, -overwrite_original is equivalent of having a file [File1.jpg], EXIFtool creates [File2.jpg] and then renames [File2.jpg]-->[File1.jpg](?)
•  Whereas -overwrite_original_in_place is like doing [File1.jpg] --> 'Save as' [File2.jpg], then [File2.jpg] --> 'Save as' [File1.jpg], and then deleting [File2.jpg](?)



EDIT: I initially thought these were much more similar than they were, as they both preserved the original file's attributes, however looking a bit more into this, I believe I have figured out the difference(s) now and hopefully the info below summarises what the two parameters mainly do (but please let me know if the below is not right)...

•  -overwrite_original: By default, when writing to a file, EXIFtool will preserve the original by renaming it and write a new file to the path provided. -overwrite_original tells EXIFtool not to move the original file 'out of the way' and just write the new file over the top of the old one.
 └  For explanation purposes, this parameter could be considered as: -skip_backup
 └  ⚠️ The 'created' and 'modified' timestamps (and other attributes) are reset and are not preserved

•  -overwrite_original_in_place: This parameter has the same -skip_backup behaviour as -overwrite_original but also preserves the original file's timestamps and other attributes
 └  No changes are made to the file's 'created' and 'modified' timestamps, or any other attributes
 └  ✅ To some (at least myself) this may be more of a true 'overwrite original' mode (especially if you want to update the file 'silently' without changing the modified date, etc)
 └  ⚠️ However, be advised that using this option will slow down performance (Phil recommends: "the -overwrite_original option should be used instead unless necessary")

StarGeek

Quote from: Martin Z on May 22, 2023, 08:12:51 PM
I was trying to think these through and work out which might be best to use...

On Windows, it is almost always best to use -overwrite_original, except on the very rare occasions when a file has an Alternate Data Stream.  The only program I have ever come across that uses them is ComicRack, a digital comic book organizer.

On a Mac, there are a lot of file system tags that Finder uses, and these would be lost with -overwrite_original.  See the MacOS MDItem* and XAttr* tags.  Keeping those requires -overwrite_original_in_place.

Quote:
•  If I'm right, -overwrite_original is equivalent of having a file [File1.jpg], EXIFtool creates [File2.jpg] and then renames [File2.jpg]-->[File1.jpg](?)

Technically, it creates File1.jpg_exiftool_tmp, and if successful, deletes File1.jpg and renames File1.jpg_exiftool_tmp to the original name.

Quote:
•  Whereas -overwrite_original_in_place is like doing [File1.jpg] --> 'Save as' [File2.jpg], then [File2.jpg] --> 'Save as' [File1.jpg], and then deleting [File2.jpg](?)

It creates File1.jpg_exiftool_tmp, and if that is successful, it opens File1.jpg for writing, copies the entire contents of File1.jpg_exiftool_tmp into File1.jpg, and if that is successful, deletes File1.jpg_exiftool_tmp.  This takes about twice as long because the full file has to be written twice: once to create the _exiftool_tmp copy, and once again to copy all the contents of the new file into the original one.

Quote:
EDIT: I initially thought these were much more similar than they were, as they both preserved the original file's attributes,

The FileModifyDate is not preserved unless the -P (-preserve) option is used.  This is technically correct, as the file has now been changed, but a lot of people like to sort using the FileModifyDate.
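The general idea behind preserving a modification date across a rewrite can be sketched as follows. This is only an analogy to what -P achieves, not ExifTool's implementation: the helper name `rewrite_keeping_mtime` and the `_tmp` suffix are assumptions. The timestamps are read before the rewrite and re-applied afterwards.

```python
import os
import tempfile

def rewrite_keeping_mtime(path, new_bytes):
    """Replace a file's contents but restore its access/modify times."""
    st = os.stat(path)  # remember the timestamps first
    tmp = path + "_tmp"  # hypothetical temp name
    with open(tmp, "wb") as f:
        f.write(new_bytes)
    os.replace(tmp, path)  # the new file has a fresh modify time
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))  # put the old one back

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "photo.jpg")
    with open(path, "wb") as f:
        f.write(b"old")
    # Give the file an obviously old modify time (September 2001).
    os.utime(path, (1_000_000_000, 1_000_000_000))
    mtime_before = os.stat(path).st_mtime_ns
    rewrite_keeping_mtime(path, b"new")
    mtime_after = os.stat(path).st_mtime_ns
    with open(path, "rb") as f:
        contents = f.read()
```

Sorting by modify date then behaves as if the file had never been edited, which is the behaviour many people want and also the reason the default is to update the date.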

Quote:
To some (at least myself) this may be more of a true 'overwrite original' mode (especially if you want to update the file 'silently' without changing the modified date, etc)

There is a rare danger that you have to worry about with -overwrite_original_in_place.  What if you have a power outage (or a roommate trips over the power cord)?  Or take my case, where for some strange reason the outlets in the master bathroom on one side of the house use the same breaker as the ones in my computer room on the other side of the house.  So during the summer, when I have the room air conditioner on in the computer room and someone starts vacuuming using the bathroom outlet, they both go out.  The original file is corrupted because the re-write wasn't completed.  You'll only have the new copy, and you'll have to rename it to remove the _exiftool_tmp.  And now, if you want to keep the MacOS file attributes or Windows ADS, you'll have to find a way to copy that information over to the new file.
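After an interrupted run, the first recovery step is finding any leftover temp files. The sketch below is a report-only illustration: the `_exiftool_tmp` suffix comes from the description earlier in this post, while the function name `find_leftovers` and the decision not to rename anything automatically are assumptions. A script cannot tell whether a leftover temp file is complete, so each one should be checked by hand.

```python
import os
import tempfile

TMP_SUFFIX = "_exiftool_tmp"  # suffix named earlier in the thread

def find_leftovers(root):
    """Return (temp_path, original_path, original_exists) for each leftover."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(TMP_SUFFIX):
                tmp = os.path.join(dirpath, name)
                original = tmp[: -len(TMP_SUFFIX)]
                found.append((tmp, original, os.path.exists(original)))
    return found

# Demonstration with stand-in files.
with tempfile.TemporaryDirectory() as d:
    for name in ("File1.jpg", "File1.jpg_exiftool_tmp", "File2.jpg"):
        with open(os.path.join(d, name), "wb") as f:
            f.write(b"x")
    leftovers = [
        (os.path.basename(tmp), os.path.basename(orig), exists)
        for tmp, orig, exists in find_leftovers(d)
    ]
    print(leftovers)
```

Each reported pair can then be inspected (for example by opening both files in a viewer) before deciding which copy to keep.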

flying_brick

Can I ask which of the save/overwrite scenarios in this thread is happening when using the perl module with the code below?

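use strict;
use warnings;
use Image::ExifTool;

my $filepath = 'myjpg.jpg';
my $keyword = 'Flowers';
my $exifTool = Image::ExifTool->new;
my $info = $exifTool->ImageInfo($filepath);
# Queue the new keyword, then write; with no destination given, the original file is replaced
$exifTool->SetNewValue('IPTC:keywords', $keyword);
$exifTool->WriteInfo($filepath);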

It seems to update the information incredibly quickly, it doesn't modify the timestamp on my Mac, and there are no '_original' files hanging around. This always led me to assume that it was somehow saving the header in-place rather than overwriting the whole file. Looking at the information above, I'm guessing the above code would most closely match up with '-overwrite_original_in_place -preserve'?

Phil Harvey

This is the same as the -overwrite_original option.  A temporary file is created and renamed to replace the original after copying the file attributes from the original file.

- Phil

StarGeek

Quote from: flying_brick on October 12, 2023, 08:55:50 PM
This always led me to assume that it was somehow saving the header in-place rather than overwriting the whole file.

Exiftool always rewrites the file.  There is no edit in place functionality.