ExifTool Forum

ExifTool => Bug Reports / Feature Requests => Topic started by: sidneyd on February 25, 2023, 06:46:31 AM

Title: Image data corruption when update large number of raw files with exiftool
Post by: sidneyd on February 25, 2023, 06:46:31 AM
While I know that exiftool is not supposed to touch the image data itself, when updating a large number of files I suspect there could be some memory leak or other issue which leads to random corruption of the image data itself.

I have been using exiftool (12.57) via a third-party tool, GeoSetter, and have seen corruption in Nikon raw images (.NEF files) for several months (including with prior versions) whenever a large number of files is updated.

To check that it was not the third-party tool, I spent a few days testing exiftool directly on a Windows 10 x64 system with a fast AMD 5900X CPU and 64GB RAM. I created a few folders with a large number (1361) of Nikon raw images (.NEF), then created some batch scripts and examined the NEF files in Adobe Photoshop, Adobe Bridge and some other tools to check for any visual corruption of the image data both before and after running exiftool.

1. When running exiftool as a single command (as per example 1 batch file) against all 1361 NEF files, some files will be intermittently corrupted - that is the image data itself gets damaged. 

2. If I reduce the number of images in the directory to a smaller number, let's say 200, then repeat the task, then the same command never leads to any image data corruption no matter how many times the activity is repeated.

3. When I used a for loop in the batch script (as per example 2 below), which calls exiftool with a practically identical command but changes only one NEF file at a time, there were never any data corruptions, despite multiple runs of the command with slight variations in tag data.



Example Trial 1 Batch Script
"C:\Program Files (x86)\Geosetter\tools\exiftool" -v0 -overwrite_original -preserve -F ^
 "-FileModifyDate=Now" "-FileCreateDate<DateTimeOriginal" ^
  "-GPSLatitude=52 00 0.00 N" ^
  "-GPSLongitude=01 01 0.00 W" ^
  "-GPSAltitude=222" ^
  "-GPSDateTime<DateTimeOriginal" ^
  "-CountryCode=GBR" ^
  "-IPTC:Country-PrimaryLocationCode=GBR" ^
  "-IPTC:Province-State=England" ^
  "-IPTC:Sub-location=RightHere" ^
  "-Location=LOCATION" ^
  "-City=City Name" ^
  "-Title=The Title" ^
  "-ObjectName=Object Name" ^
  "-Headline=The Headline of Someone" ^
  "-Copyright=(c) Copyright 2222 ABC, all rights reserved" ^
  -ext nef .

Example Trial 2 Batch Script
setlocal enabledelayedexpansion
cd /d %~dp0
FOR %%A IN (*.nef) DO (
  Echo Processing %%A
  "C:\Program Files (x86)\Geosetter\tools\exiftool" -v0 -overwrite_original -preserve -F ^
  "-FileModifyDate=Now" "-FileCreateDate<DateTimeOriginal" ^
  "-GPSLatitude=52 00 0.00 N" ^
  "-GPSLongitude=01 01 0.00 W" ^
  "-GPSAltitude=222" ^
  "-GPSDateTime<DateTimeOriginal" ^
  "-CountryCode=GBR" ^
  "-IPTC:Country-PrimaryLocationCode=GBR" ^
  "-IPTC:Province-State=England" ^
  "-IPTC:Sub-location=RightHere" ^
  "-Location=LOCATION" ^
  "-City=City Name" ^
  "-Title=The Title" ^
  "-ObjectName=Object Name" ^
  "-Headline=The Headline of Someone" ^
  "-Copyright=(c) Copyright 2222 ABC, all rights reserved" ^
    "%%A"
)
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: Phil Harvey on February 25, 2023, 06:59:54 AM
Can you repeat this test with your antivirus software disabled and on another disk drive?

I suspect either a failing disk drive or interference by antivirus software.

- Phil
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: sidneyd on February 25, 2023, 08:39:09 AM
I had already tried on different SSDs, so that is not the cause. ;D

I will check antivirus, though I do not understand why antivirus would affect it when it operates without an issue in one by one mode vs processing large number of files.
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: Phil Harvey on February 25, 2023, 11:46:37 AM
Quote from: sidneyd on February 25, 2023, 08:39:09 AM
do not understand why antivirus would affect it when it operates without an issue in one by one mode vs processing large number of files.

Ditto for any exiftool problem.  A memory leak in ExifTool would cause an out-of-memory crash, not the symptoms you are seeing.

If disabling the AV doesn't work, send me one of the corrupted files (and the original too if you can), and I'll take a close look at it to see if I can come up with any theories.  My email is philharvey66 at gmail.com

- Phil
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: StarGeek on February 25, 2023, 12:06:30 PM
How were you able to detect corrupted NEFs?  I want to try and replicate your results but individually loading up 1,000+ files is a formidable task.
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: sidneyd on February 25, 2023, 02:20:36 PM
How I detected the corrupted NEF files was through a rather painful process of visual inspection in a tool such as Adobe Bridge, Thumbs Plus etc. which can decode the file. I wish there had been another method; I looked at a number of tools, but nothing worked better than visual inspection.  The long pull is that you have to wait for the program to scan through over 1000 40MB images before you can browse the folder. Then it is usually very clear to see the damaged files, as they often have marked colour bands which start part way through the image and cover the rest of it, an artifact the original did not have.

As to Phil's question: is there some other way to send the files to you, Phil? Email gets upset at sending the over 40MB Nikon D810 NEF files.

For further information, I ran the following tests on different hardware and different SSDs, with AV on or off:
PC Config (all W10x64 21H2)         Test 1 (1361 NEF)   Test 1B (200 NEF)   Test 2 (1361 NEF one by one)
AMD 5900X 64GB RAM SSD D:           Corruption          All OK              All OK
AMD 5900X 64GB RAM SSD D: AV Off    Corruption          All OK              All OK
AMD 5900X 64GB RAM SSD E:           Corruption          All OK              All OK
AMD 5900X 64GB RAM SSD E: AV Off    Corruption          All OK              All OK
I5-10310U 32GB RAM SSD D:           Corruption          All OK              All OK
I5-10310U 32GB RAM SSD D: AV Off    Corruption          All OK              All OK
I5-10310U 16GB RAM SSD D:           Corruption          All OK              All OK
I5-10310U 16GB RAM SSD D: AV Off    Corruption          All OK              All OK
I5-6600 16GB RAM SSD D:             Corruption          All OK              All OK
I5-6600 16GB RAM SSD D: AV Off      Corruption          All OK              All OK
I5-6600 16GB SSD D:                 Corruption          All OK              All OK
I5-6600 16GB SSD D: AV Off          Corruption          All OK              All OK
I5-9400 8GB SSD D:                  Corruption          All OK              All OK
I5-9400 8GB SSD D: AV Off           Corruption          All OK              All OK

Test 1 was repeated multiple times on each system configuration specified, using different permutations of SSD and antivirus state to rule out those possibilities. When corruptions occurred in test 1, they occurred randomly throughout the folder and never hit the same files. If exiftool was called separately for each file (as in test 2) or with a smaller number of raw files (as in test 1B with just 200 NEF files in the folder), there was never a corruption, no matter what the CPU, SSD, memory or antivirus setting.

This is highly suggestive of some scaling issue - where some resource, pointer etc is being overwritten, out of bounds etc.  When I was involved in product R&D these were the kinds of scaling issues which Systems and Test Engineering would invariably love inflicting on the development team to stress test the program or system.

Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: philbond87 on February 25, 2023, 02:46:30 PM
@sidneyd,

Out of curiosity, have you tested this with any other file types?

Thanks,
Phil (not the Phil)
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: StarGeek on February 25, 2023, 08:15:42 PM
For what it's worth, I was not able to replicate this problem.  But my NEFs are from a 5100, so their size is from 15-20MB instead of 40MB.

I ran your exiftool command over 1,454 random NEFs.  Then I opened Bridge and checked the folder.  No corruption on any file.  Opened up IMatch, made sure it was set to not use WIC codecs but its own RAW processing, and loaded up the files in that.  None of the files showed any sign of corruption.
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: Phil Harvey on February 25, 2023, 09:53:37 PM
I compared the raw data from two of your files (_DSC9212.nef and _DSC9212_CORRUPT.nef).  There were 14 single-bit differences in the data (see the "hex" column in the binary difference output below).

      offset        char    hex       long    short1 short2    float      double      date
-----------------   ---- -------- ----------- ------ ------ ---------- ----------- ----------
23426844 0165771c   Y/.] 592f9b5d  1570451289  12121  23963  1.398e+18   4.549e-25 2019-10-07 (60%)
23426844 0165771c > Y/.] 592f1b5d  1562062681  12121  23835  6.989e+17   4.549e-25 2019-07-02 (60%)
23426852 01657724   .09. 9b3039bc -1137102693  12443 -17351    -0.0113  -1.543e-10 1933-12-20 (60%)
23426852 01657724 > .0.. 9b30b9bc -1128714085  12443 -17223   -0.02261  -1.543e-10 1934-03-27 (60%)
23426892 0165774c   CC^. 43435e8c -1939979453  17219 -29602 -1.712e-31  -1.759e+30 1908-07-11 (60%)
23426892 0165774c > CCN. 43434e8c -1941028029  17219 -29618 -1.589e-31  -1.759e+30 1908-06-29 (60%)
23426900 01657754   .Ul. c1556c82 -2106829375  21953 -32148 -1.736e-37  4.601e-105 1903-03-29 (60%)
23426900 01657754 > .U.. c155ec82 -2098440767  21953 -32020 -3.473e-37  4.601e-105 1903-07-04 (60%)
23426908 0165775c   .`.. b560ccfc   -53714763  24757   -820  -8.49e+36  -9.355e-15 1968-04-19 (60%)
23426908 0165775c > .`L. b5604cfc   -62103371  24757   -948 -4.245e+36  -9.355e-15 1968-01-13 (60%)
23426916 01657764   ..Y. f0a159e3  -480665104 -24080  -7335 -4.015e+21  -1.03e+128 1954-10-08 (60%)
23426916 01657764 > .... f0a1d9e3  -472276496 -24080  -7207 -8.029e+21  -1.03e+128 1955-01-13 (60%)
23426924 0165776c   .Xb# e3586223   593647843  22755   9058  1.227e-17  2.734e+199 1988-10-23 (60%)
23426924 0165776c > .X.# e358e223   602036451  22755   9186  2.454e-17  2.734e+199 1989-01-29 (60%)
23426948 01657784   f... 660fe496 -1763438746   3942 -26908 -3.685e-25  3.381e+112 1914-02-13 (60%)
23426948 01657784 > f... 660ff496 -1762390170   3942 -26892 -3.943e-25  3.381e+112 1914-02-25 (60%)
23426956 0165778c   ..2. b4ac320a   171093172 -21324   2610  8.603e-33  4.628e-303 1975-06-04 (60%)
23426956 0165778c > ..". b4ac220a   170044596 -21324   2594  7.832e-33  4.628e-303 1975-05-23 (60%)
23427324 016578fc   .!.Q 8121a551  1369776513   8577  20901  8.865e+10  -6.096e+97 2013-05-28 (60%)
23427324 016578fc > .!.Q 8121b551  1370825089   8577  20917  9.724e+10  -6.096e+97 2013-06-10 (60%)
23427796 01657ad4   .... 9fd90a94 -1811228257  -9825 -27638  -7.01e-27    1.8e+304 1912-08-09 (60%)
23427796 01657ad4 > .... 9fd98a94 -1802839649  -9825 -27510 -1.402e-26    1.8e+304 1912-11-14 (60%)
23427804 01657adc   .C.. 9e438f92 -1836104802  17310 -28017 -9.041e-28 -2.974e-297 1911-10-26 (60%)
23427804 01657adc > .C.. 9e430f92 -1844493410  17310 -28145 -4.521e-28 -2.974e-297 1911-07-21 (60%)
23428012 01657bac   .#1. 19233116   372319001   8985   5681  1.431e-25  -1.467e+78 1981-10-19 (60%)
23428012 01657bac > .#.. 1923b116   380707609   8985   5809  2.862e-25  -1.467e+78 1982-01-24 (60%)
23428044 01657bcc   .... e4f1e6cb  -874057244  -3612 -13338 -3.027e+07   7.669e-70 1942-04-21 (60%)
23428044 01657bcc > .... e4f1f6cb  -873008668  -3612 -13322 -3.237e+07   7.669e-70 1942-05-03 (60%)

It is interesting that many of the errors are spaced by exactly 8 bytes, and all of the errors are in the same byte-mod-8, with all in either bit 4 or bit 7.  I see no possible way that ExifTool could cause a problem like this.  It is most certainly a hardware problem.  Bit flips like this can't be software when you are just copying blocks directly from disk to disk.  FYI, here is the ExifTool code that does the copy:

#------------------------------------------------------------------------------
# Copy data block from RAF to output file in max 64kB chunks
# Inputs: 0) RAF ref, 1) outfile ref, 2) block size
# Returns: 1 on success, 0 on read error, undef on write error
sub CopyBlock($$$)
{
    my ($raf, $outfile, $size) = @_;
    my $buff;
    for (;;) {
        last unless $size > 0;
        my $n = $size > 65536 ? 65536 : $size;
        $raf->Read($buff, $n) == $n or return 0;
        Write($outfile, $buff) or return undef;
        $size -= $n;
    }
    return 1;
}

Your theory about pointers being out-of-bounds or overwritten doesn't wash.  This is a hardware issue cut-and-dried.

Since it isn't the disk or AV, it must be a RAM or cache issue of some kind in your system.

- Phil

Edit:  Hmm.  After reading your last post fully I see you have run on multiple hardware systems.  Is there any commonality between these systems other than ExifTool?  If not, you make a strong case, but I still can't see how ExifTool could be the cause.  One more thing to try (although I hate to suggest it because if this fixes the problem we are no closer to finding the cause) is to use one of the other ExifTool packages:  Either the alternate Windows version (https://oliverbetz.de/pages/Artikel/ExifTool-for-Windows), or the pure Perl version if you have Perl installed.
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: sidneyd on February 26, 2023, 03:03:06 AM
Thanks for the investigation, and as noted in the footer above, yes, I have found this problem on multiple systems, which rules out any hardware such as CPU, RAM, SSD, HDD, GPU.

As to Software commonality, they all run Windows 10 Pro x64 21H2 (19044.2604) which is the latest Windoze 10 and which all have the latest drivers for their hardware and any MS patches.

What I will do, though it will take some time, is dig through my archives to see if I have any files smaller than the large 14-bit encoded NEFs produced by the D810. Then I will also create some batch processes to make some variations on the stress test, with the number of NEFs per folder starting at 200 and going up to 2200.

I will also try doing identical runs with the alternate Windows version you mention above to get more data points.

This may take a day or more to run, so stay tuned.
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: Phil Harvey on February 26, 2023, 10:37:53 AM
I analyzed the other 2 samples you sent.  Same thing, but fewer bit errors.  There was 1 bad bit in _DSC9366_CORRUPT.nef and 2 bad bits in _DSC9717_CORRUPT.nef .  The 2 bad bits were again separated by a multiple of 8 bytes, and all were either bit 4 or bit 7.

> subfile ~/Desktop/forum14536/_DSC9366.nef t1 0xf6a18
> subfile ~/Desktop/forum14536/_DSC9366_CORRUPT.nef t2 0xf6a74
> phdump t1 t2
      offset        char    hex       long    short1 short2    float      double      date
-----------------   ---- -------- ----------- ------ ------ ---------- ----------- ----------
 4306456 0041b618   f.G. 66c247c8  -934821274 -15770 -14265 -2.046e+05  1.195e+243 1940-05-18 (10%)
 4306456 0041b618 > v.G. 76c247c8  -934821258 -15754 -14265 -2.046e+05  1.195e+243 1940-05-18 (10%)
> subfile ~/Desktop/forum14536/_DSC9717.nef t1 0xb3034
> subfile ~/Desktop/forum14536/_DSC9717_CORRUPT.nef t2 0xb3094
> phdump t1 t2
      offset        char    hex       long    short1 short2    float      double      date
-----------------   ---- -------- ----------- ------ ------ ---------- ----------- ----------
24596684 017750cc   &O.. 264fe7be -1092137178  20262 -16665    -0.4518  1.604e+299 1935-05-24 (63%)
24596684 017750cc > &O.. 264fe7ae -1360572634  20262 -20761 -1.052e-10  1.604e+299 1926-11-20 (63%)
24597316 01775344   }.~. 7dff7ee7  -411107459   -131  -6274 -1.204e+24  -3.704e+57 1956-12-21 (63%)
24597316 01775344 > }.~g 7dff7e67  1736376189   -131  26494  1.204e+24  -3.704e+57 2025-01-08 (63%)
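
For anyone wanting to reproduce this kind of analysis, here is a minimal Python sketch (it is not the subfile/phdump tooling used above). It assumes the raw image data has already been extracted from both files into equal-length byte strings, since the data sits at a different file offset once the metadata changes size. The names bit_flips and summarize are made up for this example. It lists every flipped bit and tallies the bit position and the offset modulo 8.

```python
from collections import Counter


def bit_flips(original: bytes, corrupted: bytes):
    """Return (offset, bit) for every differing bit; bit 0 is the least significant."""
    if len(original) != len(corrupted):
        raise ValueError("inputs must have the same length")
    flips = []
    for offset, (a, b) in enumerate(zip(original, corrupted)):
        if a == b:
            continue
        diff = a ^ b  # set bits mark the positions that changed
        for bit in range(8):
            if diff & (1 << bit):
                flips.append((offset, bit))
    return flips


def summarize(flips):
    """Count flips per bit position and per (offset mod 8)."""
    by_bit = Counter(bit for _, bit in flips)
    by_mod8 = Counter(offset % 8 for offset, _ in flips)
    return by_bit, by_mod8


if __name__ == "__main__":
    # Two of the 4-byte words from the first dump in this thread, joined together.
    good = bytes.fromhex("592f9b5d" "43435e8c")
    bad = bytes.fromhex("592f1b5d" "43434e8c")
    flips = bit_flips(good, bad)
    print(flips)  # [(2, 7), (6, 4)]
    print(summarize(flips))
```

The offsets here are relative to the start of the extracted data, so the modulo-8 tally only lines up with file offsets if the data start is itself taken into account.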

- Phil
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: Phil Harvey on February 26, 2023, 09:47:55 PM
I should also point out that most of the bit errors happened at around 24 to 25 MB into the file, but one occurred at around the 5 MB mark.

- Phil
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: sidneyd on February 28, 2023, 12:04:18 PM
I have currently run about 1 million updates with exiftool and will share the data in the next few days when all the test runs are complete. Preliminary data does hint that large files such as D810 or other high resolution raw files are more susceptible to corruption.

While I am still conducting additional tests to gather more data points, I started thinking laterally and wondered if there could be an enhancement to exiftool such as a -RobustCopyImage flag. ;D   If this flag were set, the program would perform a CRC checksum of the source image data and compare that with a CRC checksum of the destination image data.  If there were a mismatch, exiftool could generate a major warning error and not write the output file.  While not addressing the problem head-on, it could provide a higher confidence mode that the image data was copied intact.
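
A rough Python sketch of the comparison described above, assuming the offset and length of the image data are already known in both files. The names region_crc32 and image_data_intact are hypothetical and are not ExifTool functions. Two limits are worth keeping in mind: data re-read straight after writing may be served from the OS cache rather than the disk, and a bit that flips before the source CRC is taken would go unnoticed.

```python
import io
import zlib


def region_crc32(stream, offset, size, chunk=65536):
    """CRC-32 of `size` bytes starting at `offset` in a seekable binary stream."""
    stream.seek(offset)
    crc = 0
    remaining = size
    while remaining > 0:
        data = stream.read(min(chunk, remaining))
        if not data:
            raise EOFError("region extends past the end of the stream")
        crc = zlib.crc32(data, crc)  # running CRC across chunks
        remaining -= len(data)
    return crc


def image_data_intact(src, src_offset, dst, dst_offset, size):
    """True when the image-data regions of source and destination have equal CRCs."""
    return region_crc32(src, src_offset, size) == region_crc32(dst, dst_offset, size)


if __name__ == "__main__":
    payload = bytes(range(256)) * 400          # stand-in for raw image data
    old_header = b"OLD TAGS"
    new_header = b"NEW, LONGER TAGS"           # metadata grew, so the data moved
    src = io.BytesIO(old_header + payload)
    dst = io.BytesIO(new_header + payload)
    print(image_data_intact(src, len(old_header), dst, len(new_header), len(payload)))

    damaged = bytearray(new_header + payload)
    damaged[len(new_header) + 1000] ^= 0x80    # flip bit 7 of one byte
    dst = io.BytesIO(bytes(damaged))
    print(image_data_intact(src, len(old_header), dst, len(new_header), len(payload)))
```

The first check prints True and the second prints False, since a single flipped bit always changes the CRC-32.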

Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: Phil Harvey on February 28, 2023, 01:16:02 PM
I have had to deal with many instances of bit errors in the past, and detecting them is not always as easy as you think.  If I added the -RobustCopyImage flag then the read-back would certainly be fast enough to come out of the disk memory cache, which may not reflect the actual value stored.  (Of course, here I'm still assuming some sort of hardware issue, which is still contentious, but has been the source of 100% of problems like this I have seen in the past.)

My first inclination for an ExifTool mod to patch this problem would be to change the 64 kB buffer size to 1 MB to reduce the frequency of read/write cycles.  I could see this being much more efficient on modern systems, and I'm not sure how common it would be for a system to be asked to switch this quickly between read and write for a large data transfer.  It's a long shot, but it is possible that radiated energy at this switching frequency is exciting the data lines of the two error bits, leading to the problem.  A problem like this would be common to all systems with motherboards of the same layout, but changing the frequency would fix it in all cases.  However, we can strategize more after you present your new test results.
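
To put numbers on the read/write cycle count, here is a Python sketch of the same kind of chunked-copy loop with the chunk size as a parameter. It is an illustration only, not ExifTool code; copy_block and its return value (the number of read/write cycles) are assumptions made for this example.

```python
import io


def copy_block(src, dst, size, chunk=65536):
    """Copy `size` bytes from src to dst in chunks; return the number of read/write cycles."""
    cycles = 0
    while size > 0:
        n = min(chunk, size)
        data = src.read(n)
        if len(data) != n:
            raise EOFError("short read from source")
        dst.write(data)
        size -= n
        cycles += 1
    return cycles


if __name__ == "__main__":
    size = 40 * 1024 * 1024  # roughly the size of the NEF files in this thread
    for chunk in (64 * 1024, 1024 * 1024):
        src = io.BytesIO(bytes(size))
        dst = io.BytesIO()
        print(chunk, copy_block(src, dst, size, chunk))
```

For a 40 MiB block this gives 640 cycles with a 64 kB buffer and 40 cycles with a 1 MB buffer, so the switching between read and write happens 16 times less often.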

- Phil

Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: sidneyd on February 28, 2023, 01:38:35 PM
Hi Phil,

I know from many years working in R&D that these types of issues can be extremely challenging.

Certainly a different chunk size could be an interesting data point or option.  If you did add that mod, perhaps leave 64k as the default in case it breaks something else for other people, and use the bigger chunks only when an option such as -LargeChunks is set.

There is about one more day of testing to run, as I want to ensure I get a full picture covering as many different permutations on one platform, then also to see if that holds true when shifting to a different system with different CPU (AMD vs Intel), Motherboard, Memory, SSD etc.

Regards
Sidney
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: sidneyd on March 06, 2023, 03:07:34 AM
The additional testing was conducted using an 18MB Nikon D3 NEF file and a 45MB Nikon D810 file. Testing was primarily conducted on the AMD 5900X AM4 system with MSI motherboard, 64GB RAM, and three different NVMe SSDs. To provide additional data points, I also ran these same tests on an Intel i5-10310U with NVMe SSD, an Intel i5-6600 with SATA SSD and an Intel i5-6600 with HDD.

I used various batch files to create a spread of testing scenarios, with the same D3 or D800 or D810 file copied and renamed into directories with 200, 400, 800, 1200, 1600 and 2200 copies of the same NEF image. Using the same image 6,400 times with different filenames made checking for corruption easier, as I could calculate a CRC from the "non-tag" portion of the file.  A set of 4 batch files then executed either a simple or complex set of tag changes to all files in a given directory, using either the standard or alternative exiftool Windows executable. This allowed testing up to 2,200 files processed with one exiftool command.  Additional wrapper batch scripts were then used to run through 30 iterations of each test, which resulted in 192,000 file updates for each of the 4 variants.  A batch script then checked the non-tag data CRC of each file against a control file, and any files with variations were flagged and renamed for additional visual comparison in tools such as Adobe Bridge.
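
The check-and-flag step could be sketched in Python roughly as follows. This is not the batch script that was actually used. It hashes the whole file, which only works if every copy received exactly the same tag updates as the control file; a digest over the non-tag portion can be passed in through the digest parameter instead. The function names and the _CORRUPT suffix convention are assumptions for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path, chunk=1 << 20):
    """MD5 of a whole file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def flag_mismatches(folder, control, pattern="*.nef", digest=file_md5):
    """Rename every file whose digest differs from the control file; return the new paths."""
    control = Path(control).resolve()
    expected = digest(control)
    flagged = []
    for path in sorted(Path(folder).glob(pattern)):
        # Skip the control file itself and anything flagged on an earlier run.
        if path.resolve() == control or path.stem.endswith("_CORRUPT"):
            continue
        if digest(path) != expected:
            target = path.with_name(path.stem + "_CORRUPT" + path.suffix)
            path.rename(target)
            flagged.append(target)
    return flagged


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        good = b"\x00" * 4096
        (tmp / "control.nef").write_bytes(good)
        (tmp / "copy_0001.nef").write_bytes(good)
        (tmp / "copy_0002.nef").write_bytes(good[:100] + b"\x10" + good[101:])
        print([p.name for p in flag_mismatches(tmp, tmp / "control.nef")])
        # prints ['copy_0002_CORRUPT.nef']
```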

During testing, I found some tag options that seemed to change what I believe should be "non-tag data" in the NEF file.  See later section "Issues getting just the raw data without tags for CRC calculations".

So what did the results show?

On all systems, the memory subsystem was checked and passed without error by running PassMark MemTest86 and in the case of the AMD platform this was run for 12 hours without errors. I verified that the latest firmware was installed on all SSD, which it was.

When using a Nikon D3 NEF file with all 4 different scenarios being run through 60 iterations (meaning 1.5 million NEF file updates), I was never able to detect any file corruption via CRC or visual changes on any of the test platforms.

However, using virtually identical batch scripts and tests with the 45MB Nikon D810 NEF files, intermittent data corruptions occurred. When running through a 30-iteration cycle of the test, 192,000 file updates were performed by each test variant. Errors would only occur in folders with more than 400 images, meaning exiftool was processing 400 or more files via one command, rather than each file being updated by a separate instance of exiftool.

Further testing showed that if the copyright tag was not updated as part of the tag set, then these tests would result in no errors. If the copyright tag was updated as part of the multiple tag update with the 45MB Nikon D810 NEF files, then corruption would typically occur at about 2 per 100,000 updates on Intel platforms (and later on the AMD platform).  Initially on the AMD platform, corruption when setting the copyright tag in the D810 NEF files occurred at a significantly higher rate of 20-plus corruptions per 100,000 updates.
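
As a quick sanity check of the figures quoted in this post, the arithmetic can be written out in a few lines of Python. The helper expected_corruptions is a made-up name, and it treats the quoted rate as applying uniformly to every update, which is a simplification given that errors only showed up in the larger folders.

```python
FOLDER_SIZES = [200, 400, 800, 1200, 1600, 2200]  # copies per test directory
ITERATIONS = 30


def expected_corruptions(updates, per_100k):
    """Expected number of corrupted files for a given rate per 100,000 updates."""
    return updates * per_100k / 100_000


files_per_pass = sum(FOLDER_SIZES)
updates_per_variant = files_per_pass * ITERATIONS

print(files_per_pass)                                  # 6400
print(updates_per_variant)                             # 192000
print(expected_corruptions(updates_per_variant, 2))    # 3.84
print(expected_corruptions(updates_per_variant, 20))   # 38.4
```

So at about 2 per 100,000, a full 30-iteration run of one variant would be expected to show only around four corrupted files, and around 38 at the initial AMD rate.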

With the data corruption occurring at a higher rate on the AMD 5900X system, many days were spent running additional tests and different scenarios to see if any contributing factors could be uncovered.  Many different options and configurations were tried, the only two which resulted in any change of corruption rates were as follows:
-    Running the test on a SATA-attached SSD rather than one of the three NVMe SSDs I had previously been using: with this slower storage, interestingly, the rate of corruption was about 1/5 of that in the previous tests.
-    After upgrading the UEFI firmware on the MSI motherboard, which included the AGESA 1.2.0.8 update, the corruption rate dropped to be in line with the Intel systems, and subsequent use of SATA or NVMe storage did not appear to make any discernible difference. The only information I could find about the AGESA 1.2.0.7 to AGESA 1.2.0.8 update is that it fixed several security items in the UEFI firmware, so I am at a loss to explain why there is now a dramatic change in the corruption rate when using exiftool with large NEF files on the AMD system. Perhaps some rare corner-case timing issue of the kind Phil had mentioned?

In Summary:

Testing was conducted on three Intel systems and one AMD 5900X system, with over 10 million NEF file updates being performed. All systems had thorough memory tests conducted and hardware burn-in tests run. For the AMD 5900X system, once the UEFI firmware was updated on the MSI motherboard to the latest version, which included AGESA 1.2.0.8, the NEF corruption reduced to be in line with levels seen on the Intel systems.

No corruptions were ever detected for smaller NEF files such as the D3, or even D7000 files.

The file corruption issue only occurs with exiftool when updating large NEF files such as a 45MB D810 NEF, when processing 400 or more files at once, and when the copyright tag is included in the group of tags to be updated.  Once the AMD issue was resolved, corruption occurred at a very low rate of about 2 per 100,000 updates on all platforms, AMD and Intel.

The chunk size enhancement has been mentioned as a possible workaround for large files.


Issues getting just the raw data without tags for CRC calculations:

To conduct these tests, I developed some batch commands for exiftool to hopefully output just the raw data without tags, so a checksum could be calculated before and after any tag updates.  However, if either of the tags "-*:Software=" or "-Copyright=(c) something" was used (but not -rights= or -CopyrightNotice=), then the CRC calculated for what I assumed should be the non-tagged portion of the file changed.

I used a myriad of different exiftool commands in an attempt to get just the raw component of the NEF file minus any tags.  The following command produced the best results, but still had variations when the copyright tag had been updated on the image.

exiftool -F -EXIF:all= -IPTC:all=  -XMP:all= -allDates= -all= -q -q  -o - _DSC00000.nef | MD5 -n

I even tried using "-makernotes=" and some other variations, but to no avail. Maybe Phil can comment on why exiftool still seems to output some copyright data.


Enhancement Ideas:

Chunk size - As previously discussed in this thread, it could be useful to consider having a larger chunk size option for exiftool, as this could reduce the possibility of any bit flipping as Phil mentioned.  It may also make exiftool more efficient on newer systems.  One idea is that if the file size is larger than 20MB, exiftool could automatically use a larger chunk size. Not too sure of Phil's plans for implementing this?

Raw only data - if there could be some option made to just output the raw data or generate a CRC for the raw data this would be great.

Lastly, there is always the utopia of a CRC being performed as the data is chunked from source to destination.
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: Phil Harvey on March 06, 2023, 01:10:16 PM
Wow, thanks for all of the very thorough testing.

Very interesting.  So the error rate dropped by a factor of 10x with the motherboard firmware update on the AMD 5900X?  And dropped another 5x when you used a SATA-attached SSD?  It is worrying that it never went to zero.

Here is a version of 12.57 (12.57p) (https://exiftool.org/exiftool-12.57p.zip) which uses a 1 MB buffer instead of 64 kB.  There is also the alternate Windows version (https://oliverbetz.de/pages/Artikel/ExifTool-for-Windows) to test (but this still uses a 64 kB buffer).

I won't be happy until we get an error rate of zero.

Unfortunately, doing a CRC would have a huge impact on performance, so this isn't really an option.

I don't know why you need a raw-data-only CRC for testing.  If you are writing the same thing to the same files then the result should be the same.  You can just do a CRC on the entire file to see if there was any corruption.  (You are using a set of identical source files, correct?)

- Phil
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: Phil Harvey on March 08, 2023, 09:22:31 AM
Quote from: Phil Harvey on March 06, 2023, 01:10:16 PM
Unfortunately, doing a CRC would have a huge impact on performance, so this isn't really an option.

...also, it isn't clear to me that this would detect the error.  We don't know if the bit error is in the disk read, disk cache, i/o, ram cache, ram, or disk write phase, and doing a CRC would test only a subset of these.  (Of course, if it is software related then the problem occurs while data is in ram, but I still don't see how this could be the case -- software writes whole bytes, it would be a very unique failure to toggle individual bits.)

- Phil
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: sidneyd on March 08, 2023, 01:22:49 PM
Thanks, Phil, I had previously tested with the alt version and that had typically performed the same as the standard version.

I have now tested with the larger buffer version (12.57p), and after a quick 24,000-update test run there were 7 corruptions, so somehow things actually seemed worse.  ::)

I wondered if there was some issue with the self-extracting Perl environment, so I deleted the cache-exiftool-12.57p folder in the temporary par-7369646e6579 directory. Then I reran this short test.  This time the results of the short test were good, with zero errors. I then deleted the cache folder another time & repeated the tests, again all OK.  :o

I quickly did the same tests on one of the other Intel machines, deleting the cache after each run. One time out of 10, a similar bunch of errors showed up on that machine. Again, when rerunning the test after the cache was deleted there were no errors. ???

This of course makes one wonder how deleting the cache and then rerunning the test could cause a different result, and why, when things do go wrong, they seem to end up with weird bit flips happening. I am also wondering if somehow the block copy function could be tested independently, as my SSDs are getting some serious wear.

As to the second part, regarding a CRC/MD5 of the raw portion

Given the corruption issues which have been experienced and with a library of over 400,000 NEF files, you could probably understand why there is some degree of anxiousness. I wanted some way to double-check image integrity, so I could be sure that no silent corruption is creeping into the image library, in which some tags get changed.

I have something close which can export the "non-tag" portion, but there still seems to be some embedded data which gets changed by exiftool and cannot be excluded from the output. The closest I have come to getting non-tag data from the NEF is using
%EXIFTOOL% -F "-EXIF:all=" "-IPTC:all=" "-XMP:all=" "-allDates=" "-all=" -q -q  -o -  filename
This seems to work except when -Copyright= has been changed, or when some embedded date (perhaps the GPS date) has been changed.

Based on the above exiftool command, I have built a set of batch files to store an MD5 checksum in a CSV file for each folder. Then other batch files can perform validation against this data and inform me if some alteration to the non-tag portion of the file has occurred.  Doing this has already shown up just over 20 other corruptions in my raw image archive, which would not otherwise have been detected.
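
The per-folder manifest idea could look roughly like this in Python. This is a sketch only and is not the batch files mentioned above. stripped_md5 pipes the output of the exiftool command quoted in this post into MD5; the manifest file name, the column names and the function names are all assumptions for the example, and any other digest function can be swapped in.

```python
import csv
import hashlib
import subprocess
from pathlib import Path

# Tag-stripping arguments taken from the command quoted in this post.
STRIP_ARGS = ["-F", "-EXIF:all=", "-IPTC:all=", "-XMP:all=", "-allDates=",
              "-all=", "-q", "-q", "-o", "-"]


def stripped_md5(path, exiftool="exiftool"):
    """MD5 of exiftool's tag-stripped output for one file."""
    result = subprocess.run([exiftool, *STRIP_ARGS, str(path)], capture_output=True)
    if not result.stdout:
        raise RuntimeError(f"exiftool produced no output for {path}")
    return hashlib.md5(result.stdout).hexdigest()


def write_manifest(folder, manifest="checksums.csv", digest=stripped_md5, pattern="*.nef"):
    """Store one filename,md5 row per matching file in the folder."""
    folder = Path(folder)
    with open(folder / manifest, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "md5"])
        for path in sorted(folder.glob(pattern)):
            writer.writerow([path.name, digest(path)])


def verify_manifest(folder, manifest="checksums.csv", digest=stripped_md5):
    """Return the names of files that are missing or whose digest has changed."""
    folder = Path(folder)
    changed = []
    with open(folder / manifest, newline="") as f:
        for row in csv.DictReader(f):
            path = folder / row["filename"]
            if not path.exists() or digest(path) != row["md5"]:
                changed.append(row["filename"])
    return changed
```

Running write_manifest once per folder before any tag edits, and verify_manifest afterwards, would list the files whose stripped output no longer matches. As noted above, changes to the copyright tag can alter the stripped output too, so a reported file still needs a visual check.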
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: Phil Harvey on March 08, 2023, 01:41:33 PM
Wow.  Darn.  OK.  So changing things that I thought may make a difference (exiftool version and buffer size) didn't help at all.  But changing something that should be unrelated (clearing the ExifTool temporary files) does have an effect.  The alternate version has a different method for unpacking the temporary files, but it also shows problems.

So we're no closer to finding the source of the bit errors.

Even if you come up with a method to verify the raw data itself is OK, I don't know if we can disregard the possibility of a bit error in the rest of the file, which could also make it unreadable. :(

I can't run any tests myself to see if I could reproduce this problem because I don't have enough free disk space on my system here, but I would be very surprised if I was able to replicate this behaviour on MacOS.

I don't know what to suggest at this point.

- Phil
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: sidneyd on March 08, 2023, 02:37:36 PM
Yes I am left with a bit of proverbial head scratching also...

Wondering if it would be worth considering grabbing perl and trying a setup that way?
If so where is a good guide to do that?
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: Phil Harvey on March 08, 2023, 03:13:26 PM
Doing this with ActivePerl (https://www.activestate.com/products/perl/) is easy.  Just install ActivePerl then you can run the pure Perl version of ExifTool (https://exiftool.org/Image-ExifTool-12.57.tar.gz) from any directory (just unpack and run -- no need to install).  (But you may need to run exiftool by typing "perl exiftool" instead of "exiftool".)

- Phil
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: Phil Harvey on March 12, 2023, 07:35:31 AM
I see how it could be useful to verify the image data after writing, so I'm going to look into adding an ImageDataMD5 tag which would represent the MD5 of the image data only.  This tag wouldn't be generated unless specifically requested.

- Phil
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: sidneyd on March 12, 2023, 08:35:23 AM
Thanks Phil, that would be a great benefit.

When you do, could you look at supporting both writing the value into a tag and generating the MD5 as an output? Both would be good ideas.  I have added three batch scripts to the previous share so you can see examples of how my current method of running an external MD5 mechanism works.

I will be happy to do any testing for you on this.

Regards
Sidney
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: Phil Harvey on March 12, 2023, 08:39:50 AM
Hi Sidney,

Quote from: sidneyd on March 12, 2023, 08:35:23 AM
could you look at supporting both writing the value into a tag and generating the MD5 as an output? Both would be good ideas.

These are built-in features for all tags, including an ImageDataMD5 tag if I can add it.

- Phil
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: Phil Harvey on March 15, 2023, 01:42:08 PM
I've just released version 12.58 with the new ImageDataMD5 feature.  To write this to a tag, you would do this:

exiftool "-SOMETAG<imagedatamd5" FILE

- Phil

Note that for some JPEG images the ImageDataMD5 value will change in the next ExifTool release (version 12.59).  In this version I will also add JPEG RST segments to the MD5 calculation.

Also note:  I ran a number of tests trying to reproduce this problem on my MacOS system without success (see this post (https://exiftool.org/forum/index.php?msg=78689)).
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: sidneyd on March 16, 2023, 04:38:31 AM
Phil,

thanks for the new functionality and the additional guidance on using this new MD5 feature. I have run a quick test to generate a CSV file containing MD5 image data checksums and it is looking good. I will run this against the much larger collection over the next week.

I have included some examples of the commands I am using for these tests, which may be useful for others as a reference:

To generate a CSV file of checksums:
exiftool -p "$filename, $imagedatamd5" -ext nef . > checksum.csv
To write an MD5 image checksum to xmp:identifier:
exiftool -overwrite_original -preserve -F "-FileModifyDate=Now" "-xmp:identifier<imagedatamd5" -ext nef .
To simply display the filename, imagedatamd5 and xmp:identifier side by side for comparison:
exiftool -p "$filename, $imagedatamd5, $xmp:identifier" -ext nef .
To check if the value of $xmp:identifier exists:
exiftool -p "$filename" -if "not defined $xmp:identifier or $xmp:identifier eq ''" -ext nef .
To write a value to xmp:identifier if it does not exist, with the MD5 image data:
exiftool -if "not defined $xmp:identifier or $xmp:identifier eq ''" -overwrite_original -preserve -F "-FileModifyDate=Now" "-xmp:identifier<imagedatamd5" -ext nef .
To display all MD5 validated images:
exiftool -q -p "MD5 OK for: $filename" -if "$xmp:identifier eq $imagedatamd5" -ext nef .
To display all images in which the MD5 image data checksum is different to xmp:identifier:
exiftool -p "Bad MD5 in: $filename" -if "$xmp:identifier ne $imagedatamd5" -ext nef .
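
[Editor's sketch] For large folders it can be handy to post-process the side-by-side report rather than run two separate -if passes. The following is a minimal Python sketch, assuming the report was produced with the -p "$filename, $imagedatamd5, $xmp:identifier" command above and captured as text; the function name classify is hypothetical.

```python
import re

# A 32-character lowercase hex string, i.e. the textual form of an MD5 digest.
MD5_RE = re.compile(r"^[0-9a-f]{32}$")


def classify(report_text):
    """Split 'filename, imagedatamd5, xmp:identifier' lines into (ok, bad) name lists."""
    ok, bad = [], []
    for line in report_text.splitlines():
        # Split from the right so a comma inside the file name does not matter.
        parts = [field.strip() for field in line.rsplit(",", 2)]
        if len(parts) != 3 or not MD5_RE.match(parts[1].lower()):
            continue  # skip summary lines, warnings and anything else
        name, current, stored = parts
        if current.lower() == stored.lower():
            ok.append(name)
        else:
            bad.append(name)
    return ok, bad
```

Files where xmp:identifier is not defined do not appear in the -p output at all (exiftool skips the line and issues a minor warning), so they need the separate "not defined" check listed above.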
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: sidneyd on March 18, 2023, 06:34:14 AM
It seems that $imagedatamd5 is only working for Nikon raw images and older Canon CR2 files at the moment, if I am correct?

- $imagedatamd5 works for all Nikon .nef or .nrw raw images which I could find, even the latest Z9.
- $imagedatamd5 works for all Canon .cr2 raw images.
- It does not seem to work for Canon .cr3
- It does not seem to work for Minolta .mrw
- It does not seem to work for Panasonic .rw2

In cases where it does not work, issuing a command such as the following results in a warning:
exiftool -p "$Filename,$ImageDataMD5" IMG_0344.CR3
Warning: [Minor] Tag 'ImageDataMD5' not defined - ./IMG_0344.CR3
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: Phil Harvey on March 20, 2023, 07:34:30 AM
Yes, some file types are not yet supported -- I'll be adding more in the next release.  Currently only JPG and TIFF-based formats (except Panasonic raw) are supported.

- Phil
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: sidneyd on March 20, 2023, 08:01:19 AM
Thanks Phil for the confirmation, just wanted to make sure what was currently supported is working  ;D
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: sidneyd on March 20, 2023, 09:46:37 AM
Hi Phil,

Interestingly, minor errors are sometimes generated when using "-XMP:Identifier<$ImageDataMD5".

When writing XMP:Identifier using the following command, minor errors occur for a few files:
exiftool.exe -overwrite_original -preserve -F "-FileModifyDate=Now" "-XMP:Identifier<$ImageDataMD5" -ext nef .
Warning: [minor] Error 3 placing :filterId in structure or list - ./_DSC2769.nef
Warning: [minor] Error 3 placing :filterParametersIntegerName in structure or list - ./_DSC2772.nef
Warning: [minor] Error 3 placing :filterActive in structure or list - ./_DSC2773.nef


If I changed the command to store a static value, such as hello world or one of the calculated MD5 values, there were no minor errors:
exiftool.exe -overwrite_original -preserve -F "-FileModifyDate=Now" "-XMP:Identifier=518fef2958a1e1866f6391a699ec3fd5" -ext nef .

However, as soon as I revert to "-XMP:Identifier<$ImageDataMD5", these minor errors start popping up again.
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: Phil Harvey on March 20, 2023, 10:27:12 AM
This can happen when copying tags if the XMP is structured incorrectly in the source file.  What software wrote this XMP, and could you post the raw XMP so I can take a look?  (the output of exiftool -b -xmp FILE > out.xmp)

- Phil

Note: I have split off MichaelKnight's thread concerning corruption on MacOS into a separate topic (https://exiftool.org/forum/index.php?topic=14614.0) because I think this was a different problem.
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: Phil Harvey on March 20, 2023, 11:18:23 AM
In the next ExifTool release I will add an XMP-et:OriginalImageMD5 tag as a better place to store the ImageDataMD5 value (as opposed to repurposing XMP:Identifier).

- Phil
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: sidneyd on March 20, 2023, 11:39:50 AM
These files had previously been updated with GPS and location data via GeoSetter, which used exiftool to write the data.  I believe the data had been written using ExifTool 12.21.

I have copied the nef files as well as the xmp outputs to the previously supplied Google Drive.

Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: Phil Harvey on March 20, 2023, 12:01:51 PM
Thanks.  These warnings are coming from the XMP embedded in the Nikon trailer from editing with some Nikon software.  I'll have to look into this in more detail to see exactly why these warnings are generated, but it could be just because ExifTool doesn't have definitions for these (in which case the warnings should probably be suppressed).

- Phil

Edit: I've looked in more detail and it isn't actually true XMP in the Nikon trailer, hence the warnings.  I'll add a patch to suppress these in ExifTool 12.59.
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: sidneyd on March 20, 2023, 01:41:08 PM
I do remember there was one folder where I had tried Nikon StudioNX to look at GPS tags sometime in the past, so perhaps it was Nikon StudioNX which mangled the tags in some of these files. 

I will use the following command as it is my understanding this should rewrite any damaged tag structures in the files.
exiftool -m -r -overwrite_original -all= -tagsfromfile @ -all:all -ext nef .
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: Phil Harvey on March 20, 2023, 02:12:41 PM
Just to be clear:  The Nikon software didn't mangle any XMP tags -- it was XML-format metadata that ExifTool was processing (for convenience) as XMP.  The warnings shouldn't have been generated.

I've got the ImageDataMD5 feature working for RW2, CR3, MRW, MOV and MP4 files now too.  I've been working on this since last week -- it has actually required a fair bit of effort since these different formats store the image data differently.

- Phil
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: Phil Harvey on March 20, 2023, 02:15:06 PM
OUCH!  I just read the command in your last post.

DO NOT APPLY -all= TO ANY RAW FILE!  This may remove proprietary data that can't be added back in.

The command you gave should only be used for JPEG-format files.

- Phil
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: sidneyd on March 20, 2023, 02:42:22 PM
So would exiftool -m -r -overwrite_original -tagsfromfile @ -all:all -ext nef . be OK to use?
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: Phil Harvey on March 20, 2023, 03:17:40 PM
I wouldn't use any blunt instrument like this on a raw file.  In theory it should be OK, but it is very risky to change so much in a raw file.  Also, what are you trying to accomplish with this command?

- Phil
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: sidneyd on March 20, 2023, 04:02:26 PM
I wanted to fix or clean up the embedded XMP stuff which was left behind. After digging through the various tags left by Nikon StudioNX, I will try the following:
exiftool.exe -r -overwrite_original -NikonApp:all= -ext nef .
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: Phil Harvey on March 21, 2023, 08:52:51 AM
This is a safe way to do what you want.

- Phil
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: sidneyd on April 22, 2023, 06:00:11 AM
Hi Phil,

I have been occupied with some other projects recently, so only now have had a chance to test all of the new MD5 functionality with 12.60 in a number of tests.
:D MD5 generation for heaps of different image types now works great.
:D It seems that EXIFTOOL "-XMP-et:OriginalImageMD5<$ImageDataMD5" abc.nef works,

However I can not seem to be able to read the value of -XMP-et:OriginalImageMD5 back from the file as per following examples. 


For example
exiftool.exe -v2 -overwrite_original "-XMP-et:OriginalImageMD5=1234"  -ext nef .
Writing XMP-et:OriginalImageMD5
======== ./_DSC0653.nef
Rewriting ./_DSC0653.nef...
  Editing tags in: File IFD0 TIFF XMP
  Creating tags in: File IFD0 TIFF XMP
  FileType = NEF
  FileTypeExtension = NEF
  MIMEType = image/x-nikon-nef
  Rewriting IFD0
  Rewriting SubIFD
  Rewriting SubIFD1
  Rewriting SubIFD2
  Rewriting XMP
    - XMP-et:OriginalImageMD5 = '4333b1a3541e03dee7d6a26cef86ee90'
    + XMP-et:OriginalImageMD5 = '1234'

  Rewriting IPTC


But the command to read or extract the stored MD5 checksum
exiftool.exe -table "-Filename" "-XMP-et:OriginalImageMD5"  -ext nef .
_DSC0653.nef    -

exiftool.exe -p "$Filename,$ImageDataMD5,$XMP-et:OriginalImageMD5"  -ext nef .
Warning: [Minor] Tag 'XMP-et:OriginalImageMD5' not defined - ./_DSC0653.nef
    1 directories scanned
    1 image files read


exiftool.exe -a  "-XMP:all" "-XMP-et:all" -ext nef .
======== ./_DSC0653.nef
XMP Toolkit                    : Image::ExifTool 12.60
Rights                          : (c) Copyright 123
Date/Time Digitized            : 2023:03:12 08:15:10+01:00
Date/Time Original              : 2023:03:12 08:15:10+01:00
Date Created                    : 2023:03:12 08:15:10+01:00
Copyright Status                : Protected
Create Date                    : 2023:03:12 08:15:10.55
Creator Tool                    : NIKON D810 Ver.1.14
Modify Date                    : 2023:03:12 08:15:10+01:00
Marked                          : True
    1 directories scanned
    1 image files read


Am I missing something here?
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: Phil Harvey on April 22, 2023, 07:14:11 AM
Oops.  You're right. Sorry, I should have tested this.

The value is written, but ExifTool is ignoring the "et" namespace when reading for reasons that I don't remember related to reading the -X option RDF/XML.

I'll fix this in the next release.

- Phil
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: sidneyd on April 22, 2023, 07:50:31 AM
Hey Phil, thanks for confirming that I was not doing something silly :) and as always your fast responses.

If you want to drop me a pre-release version to test, let me know, otherwise I will keep an eye out for the next release.

Regards
Sidney
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: sidneyd on April 28, 2023, 01:09:35 PM
Hi Phil,

I just tried reading and displaying the $ImageDataMD5 value with 12.61 release and it still comes up as not defined whether I use $XMP-et:OriginalImageMD5 or $OriginalImageMD5.

As example:
exiftool.exe -ver
12.61

exiftool.exe -p "$Filename,$ImageDataMD5,$XMP-et:OriginalImageMD5"  -ext nef .
Warning: [Minor] Tag 'XMP-et:OriginalImageMD5' not defined - ./_DSC0653.nef
Warning: [Minor] Tag 'XMP-et:OriginalImageMD5' not defined - ./_DSC0654.nef
Warning: [Minor] Tag 'XMP-et:OriginalImageMD5' not defined - ./_DSC0655.nef
Warning: [Minor] Tag 'XMP-et:OriginalImageMD5' not defined - ./_DSC7632.nef
Warning: [Minor] Tag 'XMP-et:OriginalImageMD5' not defined - ./_DSC7634.nef
Warning: [Minor] Tag 'XMP-et:OriginalImageMD5' not defined - ./_DSC7635.nef
    1 directories scanned
    6 image files read


exiftool.exe -p "$Filename,$ImageDataMD5,$OriginalImageMD5"  -ext nef .
Warning: [Minor] Tag 'OriginalImageMD5' not defined - ./_DSC0653.nef
Warning: [Minor] Tag 'OriginalImageMD5' not defined - ./_DSC0654.nef
Warning: [Minor] Tag 'OriginalImageMD5' not defined - ./_DSC0655.nef
Warning: [Minor] Tag 'OriginalImageMD5' not defined - ./_DSC7632.nef
Warning: [Minor] Tag 'OriginalImageMD5' not defined - ./_DSC7634.nef
Warning: [Minor] Tag 'OriginalImageMD5' not defined - ./_DSC7635.nef
    1 directories scanned
    6 image files read

exiftool.exe "-Filename" "-ImageDataMD5" "-$XMP-et:OriginalImageMD5" "-OriginalImageMD5"  -ext nef .
Invalid TAG name: "$XMP-et:OriginalImageMD5"
======== ./_DSC0653.nef
File Name                      : _DSC0653.nef
Image Data MD5                  : 4333b1a3541e03dee7d6a26cef86ee90
======== ./_DSC0654.nef
File Name                      : _DSC0654.nef
Image Data MD5                  : 80e452f8a0adb3b8725acc365f4b2e56
======== ./_DSC0655.nef
File Name                      : _DSC0655.nef
Image Data MD5                  : 9b11a58338dd7ea11acfc78e3bd9ef33
======== ./_DSC7632.nef
File Name                      : _DSC7632.nef
Image Data MD5                  : 87a28d422a3c61c13ebb47b3e5b410c8
======== ./_DSC7634.nef
File Name                      : _DSC7634.nef
Image Data MD5                  : c4151d2a2b199e0b97c29873369b563c
======== ./_DSC7635.nef
File Name                      : _DSC7635.nef
Image Data MD5                  : cb3182c5151ed2a44cffb5454e97ee0d
    1 directories scanned
    6 image files read


Regards
Sidney
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: StarGeek on April 28, 2023, 02:07:30 PM
Have you written a value to XMP-et:OriginalImageMD5 yet?  This warning comes up when using the -p (-printFormat) option (https://exiftool.org/exiftool_pod.html#p-FMTFILE-or-STR--printFormat) for any tag when the tag does not exist.

C:\>exiftool -G1 -a -s -Description y:\!temp\Test4.jpg

C:\>exiftool -G1 -a -s -p "$Description" y:\!temp\Test4.jpg
Warning: [Minor] Tag 'Description' not defined - y:/!temp/Test4.jpg

C:\>exiftool -P -overwrite_original -Description=test y:\!temp\Test4.jpg
    1 image files updated

C:\>exiftool -G1 -a -s -p "$Description" y:\!temp\Test4.jpg
test

From the docs
     If a specified tag does not exist, a minor warning is issued and the line with the missing tag is not printed
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: sidneyd on April 28, 2023, 02:28:21 PM
Hi Stargeek,

Yes, the value had been written to the files as indicated, and Phil had indicated the bug was present in the previous code and would be fixed in 12.61.  I had written the value into the NEF files using a command similar to:
exiftool.exe -v2 -overwrite_original "-XMP-et:OriginalImageMD5<$ImageDataMD5"  -ext nef .

As the display issue was present, I set the value to 1234 and used the -v2 option so that exiftool would show that the value was changed, which it seems to have been, as below.

exiftool.exe -v2 -overwrite_original "-XMP-et:OriginalImageMD5=1234" _DSC0653.nef
Writing XMP-et:OriginalImageMD5
======== ./_DSC0653.nef
Rewriting ./_DSC0653.nef...
  Editing tags in: File IFD0 TIFF XMP
  Creating tags in: File IFD0 TIFF XMP
  FileType = NEF
  FileTypeExtension = NEF
  MIMEType = image/x-nikon-nef
  Rewriting IFD0
  Rewriting SubIFD
  Rewriting SubIFD1
  Rewriting SubIFD2
  Rewriting XMP
    - XMP-et:OriginalImageMD5 = '4333b1a3541e03dee7d6a26cef86ee90'
    + XMP-et:OriginalImageMD5 = '1234'
  Rewriting IPTC
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: sidneyd on April 28, 2023, 03:08:23 PM
Hi StarGeek & Phil.

After I had responded to Stargeek, I did some additional tests.

The value of XMP-et:OriginalImageMD5 had "seemingly" been written to the files with the 12.60 version, but could never be displayed (as earlier in the thread), and I had assumed that the value was correctly written to the file.

So out of curiosity, I used the latest version 12.61 to write XMP-et:OriginalImageMD5 into the file again.
If I use 12.61 to write the value into the NEF file, it can now be correctly displayed.
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: olegos on April 30, 2023, 12:20:17 PM
This ImageDataMD5 tag is very useful, thanks for adding it. Could you support it for HEIC images too, please.

By the way, I've used this command with JPEGs before for this purpose, usually to verify that two files are actually the same image with only metadata changes:

jpegtran -copy none image.jpg | md5sum
Interesting that ImageDataMD5 gives a different value than the above.
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: StarGeek on April 30, 2023, 08:10:45 PM
Jpegtran is probably optimizing the image data.  Cameras don't always use the most optimized routines to write a jpeg, as speed is more essential.

As an example, I processed a file with
exiftool -all= file.jpg
processed another with the default jpegtran command and -copy none.  Then finally once more with jpegtran with -copy none and -optimize.  The resulting image will still have identical image data but it will be formatted differently.

C:\>exiftool -G1 -a -s -fileorder filename -filesize# Y:\!temp\aaaa
======== Y:/!temp/aaaa/2023-03-25_12.33.53-Exiftool.JPG
[System]        FileSize                        : 7184204
======== Y:/!temp/aaaa/2023-03-25_12.33.53-JpegTranDefault.JPG
[System]        FileSize                        : 7184238
======== Y:/!temp/aaaa/2023-03-25_12.33.53-JpegTranOptimized.JPG
[System]        FileSize                        : 7088213
======== Y:/!temp/aaaa/2023-03-25_12.33.53.JPG
[System]        FileSize                        : 7230200
    1 directories scanned
    4 image files read

One thing that jumps out is that the JpegTran default is larger than the exiftool result.  Looking closer, jpegtran is adding a JFIF block, which exiftool does not.  But with the optimize option, the file is smaller.

So my first assumption was wrong.  The difference would be the JFIF block that jpegtran is adding.

Taking it a step further, checking with exiftool's ImageDataMD5
C:\>exiftool -G1 -a -s -fileorder filename -ImageDataMD5 Y:\!temp\aaaa
======== Y:/!temp/aaaa/2023-03-25_12.33.53-Exiftool.JPG
[File]          ImageDataMD5                    : 4a583b879c8b2a2f645f2870b9fff172
======== Y:/!temp/aaaa/2023-03-25_12.33.53-JpegTranDefault.JPG
[File]          ImageDataMD5                    : 4a583b879c8b2a2f645f2870b9fff172
======== Y:/!temp/aaaa/2023-03-25_12.33.53-JpegTranOptimized.JPG
[File]          ImageDataMD5                    : 17934ab470e00f176f36e9c5345415a1
======== Y:/!temp/aaaa/2023-03-25_12.33.53.JPG
[File]          ImageDataMD5                    : 4a583b879c8b2a2f645f2870b9fff172
    1 directories scanned
    4 image files read

The original, the one stripped by exiftool, and the one stripped by jpegtran all have the same hash.  So by default jpegtran isn't changing the image stream from the original.  The optimized one, of course, has a different hash.  But this was a lossless optimization and if you compare the images, for example, with ImageMagick's Compare option (https://imagemagick.org/script/compare.php), you will find the image data to be identical.
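
[Editor's sketch] To illustrate why a checksum over the image stream can stay the same while the whole-file MD5 differs (for example when a JFIF block is added), here is a minimal Python sketch that hashes a JPEG while skipping the APPn and COM segments. It is not ExifTool's ImageDataMD5 algorithm and will not reproduce its values; the function name jpeg_payload_md5 is hypothetical, and the parser assumes a well-formed file with no fill bytes between markers.

```python
import hashlib
import struct


def jpeg_payload_md5(data):
    """MD5 of a JPEG byte string with APPn and COM segments left out.

    Illustration only: everything from the SOS marker to the end of the
    data is hashed as-is, so any trailer after EOI is included too.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    md5 = hashlib.md5()
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xDA:  # SOS: scan data follows, hash the remainder
            md5.update(data[pos:])
            return md5.hexdigest()
        # The 2-byte big-endian length counts itself but not the marker bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        end = pos + 2 + length
        is_metadata = 0xE0 <= marker <= 0xEF or marker == 0xFE  # APP0-APP15, COM
        if not is_metadata:
            md5.update(data[pos:end])  # tables and frame header count as image data
        pos = end
    raise ValueError("no SOS marker found")
```

With this kind of hash, two files that differ only in a JFIF (APP0) or EXIF (APP1) block give the same digest, while a file re-encoded with different Huffman tables, as with -optimize, gives a different one even though the decoded pixels are identical.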
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: Phil Harvey on May 01, 2023, 08:24:24 AM
Quote from: sidneyd on April 28, 2023, 03:08:23 PM
The value of XMP-et:OriginalImageMD5 had "seemingly" been written to the files with the 12.60 version, but could never be displayed (as earlier in the thread), and I had assumed that the value was correctly written to the file.

> Image-ExifTool-12.60/exiftool a.nef "-originalimagemd5<imagedatamd5"
    1 image files updated
> Image-ExifTool-12.60/exiftool a.nef -originalimagemd5
> Image-ExifTool-12.61/exiftool a.nef -originalimagemd5
Original Image MD5              : 518fef2958a1e1866f6391a699ec3fd5

- Phil
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: Phil Harvey on May 03, 2023, 01:57:08 PM
Quote from: olegos on April 30, 2023, 12:20:17 PM
Could you support it for HEIC images too, please.

Done, and version 12.62 is now available with this feature.

- Phil
Title: Re: Image data corruption when update large number of raw files with exiftool
Post by: olegos on May 03, 2023, 10:43:06 PM
Quote from: Phil Harvey on May 03, 2023, 01:57:08 PM
Quote from: olegos on April 30, 2023, 12:20:17 PM
Could you support it for HEIC images too, please.
Done, and version 12.62 is now available with this feature.

- Phil
Thank you!