JPEGDigest md5 zero byte sometimes missing?

Started by m2rtin, October 12, 2020, 10:33:02 AM


m2rtin

Hi, I calculate the MD5 digest of the quantization tables the same way exiftool does in the Calculate method at https://github.com/exiftool/exiftool/blob/master/lib/Image/ExifTool/JPEGDigest.pm. On some images, however, I get a different result from exiftool. I can reproduce exiftool's result on those images only if I leave out the zero-byte q_table separator.

OS: macOS
exiftool version: 12.07
command: exiftool -jpegdigest "picture_in_question*"
output: JPEG Digest                     : Unknown (78f761ede4edd41e399aa2a5a87236b0:111111)

expected output: JPEG Digest                     : Unknown (33ec2a36f9ab3a9371020ee36b897d5c:111111)

The picture in question is attached.
I picked the first online Perl interpreter I could find (https://www.tutorialspoint.com/execute_perl_online.php) and wrote my first Perl code as follows:


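# Digest::MD5 is needed to compute the digest; stop if it is not available
# (a standalone script has no $et object and cannot "return" at top level)
unless (eval { require Digest::MD5 }) {
    die "Digest::MD5 must be installed to calculate JPEGDigest\n";
}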

# each qt is prefixed with qt_id
my $qt1 = pack("C*", (0, 3, 2, 2, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 7, 9, 9, 9, 9, 7, 7, 9, 8, 9, 9, 8, 9, 7, 9, 7, 7, 7, 8, 9, 9, 7, 7, 7, 8, 7, 7, 7, 7, 8, 8, 7, 10, 7, 7, 7, 8, 9, 9, 9, 7, 7, 13, 13, 10, 8, 13, 7, 8, 9, 8));

my $qt2 = pack( "C*", (1, 3, 4, 4, 6, 5, 6, 10, 6, 6, 10, 15, 13, 8, 13, 15, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13));

# qtables of https://upload.wikimedia.org/wikipedia/commons/3/38/JPEG_example_JPG_RIP_001.jpg
# my $qt1 = pack("C*", (0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255));

# my $qt2 = pack( "C*", (1, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255));


my @dqtList = ($qt1, $qt2);

my $dqt = '';
my $dat;

foreach $dat (@dqtList) {
    next unless $dat;
    # $dqt .= "\0" if $dqt;
    $dqt .= $dat;
}

my $md5 = unpack 'H*', Digest::MD5::md5($dqt);

print "\nmd5:'$md5'\n";


So here I feed the q_tables to the digest manually. Notice that to obtain the value exiftool returns ('78f761ede4edd41e399aa2a5a87236b0'), I have to comment out the zero-byte separator between the q_tables. If I leave the separator in, I get '33ec2a36f9ab3a9371020ee36b897d5c'. I can't figure out why that is. On most other images I get the same results as exiftool. For example:

The exiftool output for the image https://upload.wikimedia.org/wikipedia/commons/3/38/JPEG_example_JPG_RIP_001.jpg is 'f83c5bc303fa1f74265863c2c6844edf' => 'Independent JPEG Group library (used by many applications), Quality 0 or 1'.
The snippet above also returns 'f83c5bc303fa1f74265863c2c6844edf' if the separator line is uncommented.

I don't know if it is a bug per se, but I'm confused about what is going on and why I need to leave out the separator in some cases and keep it in others to get the same md5.


m2rtin

Ok, I figured out what is going on.

If the image in question has both quantization tables defined within the same DQT block, there will be no zero byte separator. If there are two DQT blocks (one for each q table), there will be a zero byte separator.
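To illustrate the difference, here is a minimal sketch. It assumes, based only on the behavior described in this thread, that each DQT block's payload is treated as one entry and that a zero byte is inserted only between entries. The helper name dqt_digest and the table values are made up for the example; they are not taken from exiftool or from the attached image.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper mirroring the loop in the snippet above: the
# payloads are concatenated with a zero byte between consecutive entries.
sub dqt_digest {
    my @segments = @_;
    my $dqt = '';
    foreach my $seg (@segments) {
        next unless defined $seg and length $seg;
        $dqt .= "\0" if length $dqt;
        $dqt .= $seg;
    }
    return md5_hex($dqt);
}

# Made-up 8-bit tables: one id byte followed by 64 values.
my $qt1 = pack('C*', 0, (16) x 64);
my $qt2 = pack('C*', 1, (17) x 64);

# Both tables in a single DQT block: one entry, so no separator.
my $one_block  = dqt_digest($qt1 . $qt2);
# One DQT block per table: two entries, so a zero byte in between.
my $two_blocks = dqt_digest($qt1, $qt2);

print "one DQT block:  $one_block\n";
print "two DQT blocks: $two_blocks\n";
```

The two printed digests differ even though the table contents are identical, which matches what I see with the attached picture.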

Therefore, in this case, it is a bug, because two pairs of identical quantization tables will have different hashes based on the way they are encoded in the file.
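For anyone who wants a digest that does not depend on how the tables are grouped into DQT blocks, one possible approach is to split each block into its individual tables first and always put the zero byte between tables. The sketch below is not what exiftool does; the helper names split_dqt_tables and normalized_dqt_digest and the table values are hypothetical. It relies on the DQT layout from the JPEG specification: each table starts with one byte holding the precision (high nibble) and table id (low nibble), followed by 64 entries of one byte (8-bit precision) or two bytes (16-bit precision).

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: split one DQT block payload into its tables.
sub split_dqt_tables {
    my ($payload) = @_;
    my @tables;
    my $pos = 0;
    while ($pos < length($payload)) {
        my $pq_tq = unpack('C', substr($payload, $pos, 1));
        # A non-zero high nibble means 16-bit entries, otherwise 8-bit.
        my $len = 1 + 64 * (($pq_tq >> 4) ? 2 : 1);
        last if $pos + $len > length($payload);   # truncated table: stop
        push @tables, substr($payload, $pos, $len);
        $pos += $len;
    }
    return @tables;
}

# Hypothetical digest: tables are kept in file order and always joined
# with a zero byte, regardless of how they were grouped into blocks.
sub normalized_dqt_digest {
    my @tables = map { split_dqt_tables($_) } @_;
    return md5_hex(join("\0", @tables));
}

# Made-up 8-bit tables: one id byte followed by 64 values.
my $qt1 = pack('C*', 0, (16) x 64);
my $qt2 = pack('C*', 1, (17) x 64);

my $grouped  = normalized_dqt_digest($qt1 . $qt2);   # one DQT block
my $separate = normalized_dqt_digest($qt1, $qt2);    # two DQT blocks
print(($grouped eq $separate) ? "same digest\n" : "different digests\n");
```

With this normalization both layouts give the same digest, at the cost of no longer matching the values exiftool prints for files that store several tables in one block.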

Phil Harvey

Interesting, thanks.

But it is significant if the quantization tables are encoded differently in the file, because this likely indicates the file was written by different software, so in my view the hashes should be different in this case.

- Phil