Strange behaviour: DNG file size shrinks by more than 50% after metadata update

Started by martenzi, September 22, 2018, 05:22:03 AM

martenzi

Attached are htmldumps of the original unopened and untagged file (1), of the file after a metadata update without opening it (2), and of the file after opening and closing it and then writing metadata (3).

I have a collection of DNGs that were recovered after an earlier accidental file deletion. Their file size is consistently 21 or 31 MB, and I have compared this with newly created DNGs from the same kind of CR2 source, but of different pictures. The strange thing is that I can repeatedly and consistently shrink their file size by more than 50% as follows:

1. Open them in Camera Raw, then open the image in Photoshop
2. Close the file without saving
3. Update the metadata with the following command:
exiftool '-EXIF:copyright=2018 @ Martenzi' '-EXIF:Artist=Martenzi' '-FileModifyDate<DateTimeOriginal' FILE

It is my understanding that ExifTool can rearrange and effectively "clean up/correct" metadata while writing, and that this may result in a smaller file. However, the observed shrinkage seems far too large for that, and it does not occur unless the files are first opened and closed without saving in Photoshop. I have used Kaleidoscope to compare the files and there is no difference in the picture itself. I have also compared the htmldumps with Kaleidoscope, and both the changed and the deleted text seem far too small to account for such a large size reduction.

The size reduction does not occur unless the file has been opened. I have tried to extract an embedded CR2 from the images, and Adobe's utility finds nothing. I have validated all pictures in Lightroom before and after the changes and everything seems fine. I can't figure out what accounts for the major size reduction.

Perhaps Phil or anyone else can have a look and share an explanation for this.

P.S. I forgot one thing: pay no attention to the Finder tags "Green", "Labeled" and "Opened". They are irrelevant to this.

Phil Harvey

First, let me say that it is disturbing that Adobe software will modify a DNG just by opening it (without saving).

You can see in the dumps of the original files that there is 12 MB of unreferenced data at the end of the file.  This data is not part of the DNG specification, and searching your dump files I can't find any offset that points to this data.

Looking at the small part of the data included in the dump, I don't see anything I recognize.  I think it is likely garbage left over from the disk after undeleting the file.  I can't explain why ExifTool only drops this garbage if the file has been opened by Camera Raw first.  Could you send me the samples so I can take a look?  (philharvey66 at gmail.com)

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

martenzi

Thanks for the fast reply, Phil,

I've shared a Dropbox link with you via email. Let me know if you need another download source or other files.

One concern I have is whether I risk losing any data by being forced to open this collection of files just to reduce their size. Do you think there is a workaround with ExifTool that cleans up the unused data without needing any additional app or opening the files?

Cheers,
Martenzi

Phil Harvey

Hi Martenzi,

I'll look at these as soon as possible, but it may have to wait until Monday since my weekend plans are about to kick in.

- Phil

Phil Harvey

Hi Martenzi,

I've looked at all of the files you sent.  I have run ExifTool on all of them and none of them changes size.  Could you upload one where ExifTool shrinks the file so I can try to reproduce this?  (i.e. a file that has just been opened, but not yet shrunk.)  Thanks.

- Phil

martenzi

Hi Phil,

Just to make sure, did you open them in Camera Raw and then click "Open image" to continue into Photoshop?
The newest ones, labeled "Opened-Only", have only been opened.

Phil Harvey

I mean whatever image you used _before_ running ExifTool in step 3 of your original post.

I want the image that changes size when you write it.

- Phil

martenzi

Hi Phil,

The images in the folder "Opened-Only" in the Dropbox are originals that have been opened but not yet written to with ExifTool. The images in "Original-Large" have not been opened. It is possible that I used ExifTool on these images before recovering them with Disk Drill. However, I have also recovered DNGs with the "correct" file size.

* All the Large images change size once they have been opened and then written with ExifTool. I have tested this by making copies and running the workflow on all of them.

Phil Harvey

Ah, yes.  Sorry.  I missed this folder when I downloaded the other images.

I was just curious as to why the unknown trailer is lost after the file was opened.  The reason is that Adobe writes the XMP after the unknown trailer, so the last data in the file is now the XMP and there is no longer any unrecognized trailer.  What was the trailer then looks just like unused data inside the file, so it is lost when ExifTool rewrites the file.

And to answer your earlier question, this command may be used to remove this unnecessary trailer in both cases:

exiftool -trailer:all= -forcewrite=exif FILE

The -trailer:all= argument removes the unknown trailer, and -forcewrite=exif forces the EXIF to be rewritten, removing any unused data blocks.
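
For a whole collection of files, the same command can be wrapped in a small loop. The following is only a sketch, not a tested recipe: the function names file_size and strip_trailers and the EXIFTOOL variable are made up for illustration, and it assumes the files end in lowercase .dng. It runs the command above on each file and prints the size before and after, so it is easy to see which files actually carried a trailer.

```shell
#!/bin/sh
# Sketch: apply the trailer-removal command to every DNG in a folder
# and report the size change. EXIFTOOL is a hypothetical override so a
# different exiftool binary (or a stub) can be substituted.
EXIFTOOL="${EXIFTOOL:-exiftool}"

# Print the size of file $1 in bytes (wc -c works on both macOS and Linux;
# tr strips the leading spaces that some wc versions emit).
file_size() {
    wc -c < "$1" | tr -d ' '
}

# Rewrite every .dng file directly inside directory $1.
# Usage: strip_trailers /path/to/folder
strip_trailers() {
    for f in "$1"/*.dng; do
        [ -e "$f" ] || continue          # the glob matched nothing
        before=$(file_size "$f")
        # Same arguments as the single-file command above.
        "$EXIFTOOL" -trailer:all= -forcewrite=exif "$f" >/dev/null || continue
        after=$(file_size "$f")
        echo "$f: $before -> $after bytes"
    done
}
```

By default ExifTool keeps a copy of each file with "_original" appended to the name, so it is worth checking a few of the rewritten files in Lightroom before deleting those backups.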

- Phil