Exiftool and effect of jpegtran optimization

Started by yegiv, January 13, 2025, 07:24:36 AM


yegiv

How can I use ExifTool to find out why my photos are so large, and what information is deleted when optimizing them with jpegtran (-optimize)?

For example, an original photo from a vivo X100 is 4.35 MB. After lossless optimization (baseline, JPEG quality still 96), it is 3.27 MB. After optimization with EXIF removal, it is 2.66 MB.

Neither of the first two cases changes the image or the tags, so where does the extra megabyte come from?
The second case removes about 600 KB of data, yet that metadata takes up only 1800 bytes in text form.

Is it possible to find out what is happening?

Phil Harvey

...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).


Phil Harvey

I would start by using the ExifTool -htmldump and/or -diff feature(s) to look at and/or compare the files.  I can't help more without seeing some samples.

- Phil

yegiv

Thank you, Phil. I can't interpret the comparison results on my own :(

original file:
https://limewire.com/d/c373a5c0-e467-470e-95c5-f17744f81fe8#ADxa6oHrzFmlVDD-vEamOgbzy0b7R2N7edzeVhesjdE

optimized by jpegtran file:
https://limewire.com/d/588b9d88-a6a6-4f03-8bc4-9e0634b7d722#SEaKdxTGRwlB59SX2PgnRNICT657hgBZqSANGVdGjgY

StarGeek

Quote from: yegiv on January 13, 2025, 07:24:36 AM
How to use Exiftool to find out why photos are so heavy and what information is deleted during optimization using the jpegtran (-optimize).

For example, the original photo from vivo x100 is a 4.35 MB. After lossless optimization (Baseline, jpeg quality still 96), the size is 3.27 MB.

No information is deleted when jpegtran optimizes an image. To quote the Wikipedia page, the -optimize option performs "optimisation of the Huffman coding layer of a JPEG file to increase compression". Cameras don't necessarily create the smallest possible file, because that would require more computing power and slow down saving the image. See also my quote from ExifTool FAQ #13 below.
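To illustrate why rebuilding the Huffman tables can shrink a file without touching the image, here is a generic Huffman sketch (not jpegtran's code): a code built from the actual symbol frequencies never costs more bits than a table tuned for some other distribution. The frequencies are a textbook example, not data from a JPEG, and the fixed 3-bit table is a hypothetical stand-in for a generic table. Real JPEG Huffman tables also cap code lengths at 16 bits, which this sketch ignores.

```python
import heapq


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for an optimal Huffman code.

    Simplification: JPEG also limits code lengths to 16 bits; not enforced here.
    """
    if len(freqs) == 1:
        # A lone symbol still needs one bit.
        return {sym: 1 for sym in freqs}
    # Heap entries: (weight, tie_breaker, {symbol: depth so far}).
    # The tie_breaker keeps heapq from ever comparing the dicts.
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def total_bits(freqs, lengths):
    """Encoded size in bits when each symbol costs lengths[symbol] bits."""
    return sum(freqs[sym] * lengths[sym] for sym in freqs)


if __name__ == "__main__":
    freqs = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
    optimal = huffman_code_lengths(freqs)
    generic = {sym: 3 for sym in freqs}  # hypothetical one-size-fits-all table
    print(total_bits(freqs, optimal))    # 224
    print(total_bits(freqs, generic))    # 300
```

The same pixels, coded with tables fitted to this particular image, simply take fewer bits.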

Quote
After optimization with exif removal, the size is 2.66 MB.
...
The second case erases the part with 600 KB of data, which in text form takes up 1800 bytes.

Part of this is binary data in the file, such as the ICC_Profile and ThumbnailImage, which together account for about 50 KB.

The rest is probably explained by FAQ #13a, "Why is my file smaller after I use ExifTool to write information?" From that FAQ:
Quote
The reason for this could be to simplify camera algorithms by allowing variable-sized information to be written at fixed offsets in the output image
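ExifTool's -htmldump shows graphically where the bytes go. As a rough stdlib-only alternative, the sketch below (an illustration under stated assumptions, not how ExifTool works) tallies the size of each JPEG header segment up to the start of scan (SOS). Everything from SOS onward, including any trailer, is lumped into one final entry. The script name jpeg_segments.py used below is hypothetical.

```python
import struct
import sys

# A few common marker names; anything else is shown as its hex code.
NAMES = {0xC0: "SOF0", 0xC2: "SOF2", 0xC4: "DHT", 0xDB: "DQT", 0xDD: "DRI", 0xFE: "COM"}


def segment_sizes(data):
    """List (name, size_in_bytes) for each JPEG header segment before SOS.

    The last entry lumps together everything from the SOS marker to the end
    of the file: compressed image data, any later segments, and any trailer.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    result = [("SOI", 2)]
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # padding byte before a marker
            pos += 1
            continue
        if marker == 0xDA:  # SOS: entropy-coded data follows
            result.append(("SOS onward", len(data) - pos))
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if 0xE0 <= marker <= 0xEF:
            name = f"APP{marker - 0xE0}"
        else:
            name = NAMES.get(marker, f"0xFF{marker:02X}")
        # 2 marker bytes + the length field's value (which counts itself)
        result.append((name, 2 + length))
        pos += 2 + length
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for name, size in segment_sizes(f.read()):
            print(f"{name:12} {size:>12,} bytes")
```

Run as `python jpeg_segments.py original.jpg` and compare the output with the same run on the optimized file; a large APPn entry that shrinks, or a "SOS onward" figure much bigger than expected, points to where the size went.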

I've used jpegtran for over a decade to losslessly optimize my photos, and I did a lot of double-checking to make sure it was safe. This StackOverflow answer shows how you can compare the before and after images using ImageMagick. You can do similar comparisons with image programs such as Lightroom/GIMP.

The one thing you do need to watch for is metadata stored in a trailer at the end of the image file. You can check with:
exiftool -Trailer:all file.jpg

Samsung products like to add a trailer, especially for motion photos, where the motion part is an MP4 video attached as a trailer. Some programs, such as FotoStation, also attach data as a trailer. Trailer data will be lost when using jpegtran. I don't think exiftool can copy a trailer, but I can't remember offhand.
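A rough way to measure a trailer without ExifTool is to walk the marker segments to the real end-of-image (EOI, FF D9) marker and count what follows. This is a sketch assuming a well-formed baseline or progressive JPEG; it is not ExifTool's implementation. Skipping header segments by their length fields keeps an EOI inside an embedded thumbnail from being mistaken for the real one.

```python
import struct
import sys


def trailer_offset(data):
    """Return the offset just past the JPEG's real EOI marker (FF D9).

    Bytes from that offset to the end of the file form the trailer.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    pos = 2
    n = len(data)
    while pos + 1 < n:
        if data[pos] != 0xFF:
            # Inside entropy-coded scan data: step to the next byte.
            pos += 1
            continue
        marker = data[pos + 1]
        if marker == 0xD9:  # EOI
            return pos + 2
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        if marker == 0x00 or 0xD0 <= marker <= 0xD7:
            # Stuffed byte (FF 00) or restart marker: still scan data.
            pos += 2
            continue
        # Any other marker (APPn, DQT, DHT, SOS, ...) has a 2-byte length
        # that counts itself, so skip the whole segment, thumbnails included.
        if pos + 4 > n:
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + length
    raise ValueError("no EOI marker found")


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            data = f.read()
        extra = len(data) - trailer_offset(data)
        print(f"{path}: {extra:,} trailer bytes")
```

A nonzero count means jpegtran would drop that many bytes from the file.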
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

StarGeek

Digging through the archives, this post shows what ImageMagick returns if there is a difference between two images. In that case, they loaded the image into Paint and resaved it.

The difference between your original and optimized images looks like this.
diff.jpg

Notice that there are no red pixels, which would indicate a difference in color.
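The idea behind that red/no-red check can be sketched in a few lines, assuming both images have already been decoded to raw RGB bytes of the same dimensions (for example with Pillow's `Image.open(path).convert("RGB").tobytes()`; Pillow is an assumption, not something used in this thread). The function name is hypothetical. For a lossless jpegtran run, the mask should contain no True entries.

```python
def diff_mask(pixels_a, pixels_b, channels=3):
    """Compare two decoded images given as raw interleaved bytes.

    Returns one bool per pixel: True where any channel differs, i.e. the
    pixels a compare tool would highlight.
    """
    if len(pixels_a) != len(pixels_b):
        raise ValueError("images differ in size")
    if len(pixels_a) % channels:
        raise ValueError("buffer length is not a multiple of channels")
    return [
        pixels_a[i:i + channels] != pixels_b[i:i + channels]
        for i in range(0, len(pixels_a), channels)
    ]
```

`sum(diff_mask(a, b))` gives the number of differing pixels; zero means the decoded images are identical.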

Phil Harvey

Using the -htmldump option I can see that there is an unknown 789445-byte trailer on original.jpg that is missing from optimized.jpg.  The trailer contains a JPEG image that looks like some sort of HDR data, plus this text:

vivo{"com.android.camera.hdr":83458,
"com.android.camera.joint.fullview.orientation":0,
"com.android.camera.fisheye":-1,
"com.android.camera.takenmodel":"vivo X100 Pro",
"com.android.camera.watermarkVersion":null,
"com.android.camera.camerafacing":"0",
"com.android.camera.moduleid":"photo",
"version":2014,
"com.android.camera.project":"PD2324",
"com.vivo.xdr.maxGain":0.80559444,
"com.android.camera.joint.fullview":false}
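To read that text block programmatically, here is a heuristic sketch. It assumes the trailer bytes have already been extracted and that the block is the ASCII tag `vivo` directly followed by a JSON object, as in the dump above. vivo's actual trailer layout is undocumented, so the function names and the search strategy are hypothetical.

```python
import json


def find_vivo_json(trailer):
    """Decode the JSON object that follows the ASCII tag b"vivo", or None.

    Raises json.JSONDecodeError if the text after the tag is not valid JSON.
    """
    start = trailer.find(b"vivo{")
    if start < 0:
        return None
    # Decode from the opening brace; raw_decode stops at the end of the
    # object and ignores whatever bytes follow it.
    text = trailer[start + len(b"vivo"):].decode("utf-8", errors="replace")
    obj, _end = json.JSONDecoder().raw_decode(text)
    return obj


def embedded_jpeg_offset(trailer):
    """Offset of the first JPEG signature (FF D8 FF) in the trailer, or -1."""
    return trailer.find(b"\xff\xd8\xff")
```

For the sample file in this thread, the decoded dict should contain keys such as "com.android.camera.hdr" and "com.vivo.xdr.maxGain".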

- Phil

StarGeek

Ah, so -Trailer:all doesn't show it, but -Trailer does:
C:\>exiftool -G1 -a -s -trailer Y:\!temp\x\y\original.jpg
[File]          Trailer                         : (Binary data 789445 bytes, use -b option to extract)

So you would not want to run jpegtran on these files if the trailer data is important.

Phil Harvey

Quote from: StarGeek on January 13, 2025, 12:57:40 PM
Ah, so using -Trailer:all doesn't show it

Yes, -trailer:all didn't show it because there were no tags extracted (and the trailer wasn't requested as a block).  But ExifTool 13.13 will extract some tags from this trailer.

- Phil

yegiv

Quote from: Phil Harvey on January 13, 2025, 12:23:40 PM
The trailer contains a JPEG image that looks like some sort of HDR data, plus this text
Yes, this is it! Thank you, solved. I checked several photos, and this invisible "HDR" trailer is about 0.7-1.1 MB in each one.