Are these photos hosting hidden messages?

Started by Keith, March 21, 2020, 04:38:14 AM

Keith

After years of putting this off, I'm finally making a serious attempt to organize my photos into a structured archive. To that end, ExifTool and BeyondCompare have been invaluable tools for identifying and confirming duplicate photos among my collections.  Generally, it's been a smooth - albeit time-consuming - process.  But...

I've recently encountered a very curious anomaly among a subset of 'group' photos which were uploaded to Google+ Events by myself and friends.  For those who may not recall, Google+ (RIP) had a feature where those who attended 'Events' together could share photos to a common archive.  The photos would remain more or less intact, including their metadata, for the entire group to view and download.  Over the years I downloaded Event sets to my local machine for future archiving.  Then when Google+ was shut down last April, I downloaded Event photos of particular sentiment (my social group was heavily integrated with G+).  I then moved the old and new sets together, resulting in a dozen or so duplicate sets.  Or so I thought...

When I compared many of the duplicate sets to each other using BeyondCompare, ~90% of the photos were binary identical.  But a few here and there had small binary differences.  At first I checked to see if perhaps the differences were due to metadata.  But using ExifTool, I found the metadata to be identical.  Then I assumed Google had re-compressed some of the photos to save space, as they now offer in Google Photos.  But the 'duplicate' photos were barely any different in size - differing by 2-20 bytes (< 0.0001%).

So then I did a pixel-wise comparison and noticed something strange.  Generally when a photo has been re-compressed, the pixel differences are seemingly random and scattered throughout the image.  But this wasn't the case.  Instead, these differences almost resembled small QR codes embedded within the visual data of the photos.  Some were more pattern-like, and others looked like possibly encoded structures.  I will attach some screenshots here for reference.  It's unlike anything I've encountered before, and I'm very curious what these might mean.  Apart from the small visual differences, I can find no indication of which photo was the earlier version and which was the later one (again, the metadata is identical).  I even learned a bit about steganography (the process of embedding data into photo files), but nothing there quite matched what I see here.
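For anyone who wants to reproduce the pixel-wise comparison, here is a minimal sketch of one way to locate the differences on a block grid. It assumes both images have already been decoded elsewhere (for example with an imaging library) into flat, row-major pixel sequences of the same dimensions; the function name `differing_blocks` and the 8-pixel default block size are illustrative choices, not something any of the tools above provide.

```python
def differing_blocks(pixels_a, pixels_b, width, height, block=8):
    """Return sorted (block_col, block_row) pairs whose pixels differ.

    pixels_a and pixels_b are flat, row-major sequences of per-pixel values
    (ints or tuples), each of length width * height. Decoding the image
    files into these sequences is assumed to happen elsewhere.
    """
    expected = width * height
    if len(pixels_a) != expected or len(pixels_b) != expected:
        raise ValueError("pixel data does not match the given dimensions")
    changed = set()
    for i, (pa, pb) in enumerate(zip(pixels_a, pixels_b)):
        if pa != pb:
            y, x = divmod(i, width)  # row and column of this pixel
            changed.add((x // block, y // block))
    return sorted(changed)


if __name__ == "__main__":
    # Synthetic 16x16 grayscale example: one pixel changed at x=9, y=3.
    w = h = 16
    a = [128] * (w * h)
    b = list(a)
    b[3 * w + 9] = 130
    print(differing_blocks(a, b, w, h))  # [(1, 0)]
```

If the differences cluster into a handful of grid-aligned blocks rather than being scattered across the whole image, that is at least consistent with a change confined to individual JPEG blocks.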

Can anyone here help me to understand what I'm seeing?

Attached (I will attempt to edit this post to include images in-line):
-screenshot-1 (first full image) with three close-ups of the artifacts (screenshot-1a/b/c)
-screenshot-2 (second full image) with three close-ups of the artifacts (screenshot-2a/b/c)

[Inline images: artifact close-ups and full comparison for each screenshot]

StarGeek

These look to me to just be some minor corruption in the file. Maybe a download error, maybe it happened on the upload, or maybe bit rot.  JPEGs are stored in 8x8 pixel blocks and these look like they're just a changed bit in a couple of those 8x8 blocks.  See this Computerphile video for an attempt to explain the very complex process that makes a JPEG image.  At about 3 minutes in it gets into the breakdown of the 8x8 pixel block.
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

Keith

@StarGeek - That's an interesting suggestion, and the video was well worth the watch.  I had previously not known the underlying encoding of JPG data.

That being said, I'm not convinced that bit rot is what's going on here.  At least not the latent (random) bit flip variety.  I do think you're onto something with the size of the blocks (all appear to be 8x8 or 16x16).  But apart from that there are several issues I see with this being bit rot or even incidental corruption.  I'm open to convincing though.  Here's my thought process:

(1) Statistically, I have binary-compared perhaps 200k-500k photos from local sources prior to this and, apart from one THM file, never found another JPG file with bit rot.  By comparison, about 5-10% of these grouped images have binary differences.  Granted, this could simply point to a common source of corruption.

(2) All of the images I've opened for visual inspection (~20+) appear perfectly identical and with no visual indication that anything is wrong.  Other JPG corruption I've seen in the past is usually very obvious, casting major color shifts or destroying image data for entire sections of the image.  I'm seeing nothing like that here, even at a pixel-wise level.

(3) Despite the highly-localized pixel differences, large segments of the binary data (the vast majority of the image data) differ between the two versions.  See image below for reference.  Considering the major binary differences, I'd expect these to be either unrelated images or heavily recompressed.  But that simply doesn't match the pixel data.  Am I missing something about this?
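To put numbers on point (3), a rough sketch like the following summarizes where two files differ byte for byte. The helper names `diff_runs` and `summarize` are made up for illustration. One caveat: a purely positional comparison flags everything after an inserted byte or a shifted bitstream as different, so the offset of the first difference is often more telling than the total count.

```python
def diff_runs(data_a, data_b):
    """Return half-open (start, end) ranges where the two byte strings
    differ, compared position by position over their common length."""
    runs = []
    start = None
    n = min(len(data_a), len(data_b))
    for i in range(n):
        if data_a[i] != data_b[i]:
            if start is None:
                start = i  # a new run of differing bytes begins here
        elif start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, n))
    return runs


def summarize(data_a, data_b):
    """Summarize size delta, differing byte count, and first difference."""
    runs = diff_runs(data_a, data_b)
    n = min(len(data_a), len(data_b))
    differing = sum(end - start for start, end in runs)
    return {
        "size_delta": len(data_b) - len(data_a),
        "differing_bytes": differing,
        "differing_fraction": differing / n if n else 0.0,
        "first_difference": runs[0][0] if runs else None,
    }


if __name__ == "__main__":
    print(summarize(b"ABCDEFGH", b"ABXDEFZZ"))
```

For real files, the two arguments would be the full contents read in binary mode, e.g. `open(path, "rb").read()`.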



One other thing I didn't note before was that likely both sets of images were originally downloaded as zip files.  I don't see why this would cause the discrepancies seen here, but I'm mentioning it now just in case.

Thank you for any thoughts you'd like to share.

Keith

@StarGeek - I was wondering if you had any additional thoughts on my last post.  You seem quite knowledgeable, and I would really appreciate your help in resolving this question so that I can move on with my archiving.

Also, regardless of the cause of the discrepancies, is there any way to determine which is the most 'original' version and which was modified later?  The EXIF data is identical, so no clues there.

StarGeek

Quote from: Keith on March 27, 2020, 07:50:29 PM
@StarGeek - I was wondering if you had any additional thoughts on my last post.  You seem quite knowledgeable, and I would really appreciate your help in resolving this question so that I can move on with my archiving.

I don't have anything else to add.  It appears to be minor corruption to me.  Corruption can affect a single block if that is how the file was encoded.  See the man page for the jpegtran utility, specifically the -restart option, for example -restart 1B.  That option restarts the encoding at set intervals, and 1B restarts it every block.  In such a case, several bytes could be corrupted and it would only affect a block or two.  I believe this is how various jpeg "compressor" programs such as TinyJpg work when they encounter large areas of solid color: they just re-encode the blocks where there can be more savings.
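One way to check whether a given file actually uses restart intervals is to look for the DRI segment (marker 0xFFDD) in the header and count the RSTn markers (0xFFD0 to 0xFFD7) in the scan data. The following is only a simplified sketch: the function name `restart_info` is made up, and it assumes a well-formed, single-scan baseline JPEG, so progressive files would need more careful handling.

```python
import struct


def restart_info(jpeg_bytes):
    """Return (restart_interval, rst_marker_count) for a JPEG byte string.

    restart_interval is the value stored in the DRI segment, or None if
    the file has no DRI segment. Assumes a single-scan baseline JPEG.
    """
    if jpeg_bytes[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker; not a JPEG stream")
    interval = None
    pos = 2
    # Walk the header segments up to and including the SOS header.
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg_bytes[pos + 1]
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        if marker == 0xDD:  # DRI: define restart interval
            (interval,) = struct.unpack(">H", jpeg_bytes[pos + 4:pos + 6])
        pos += 2 + length
        if marker == 0xDA:  # SOS: entropy-coded data follows
            break
    # In entropy-coded data, 0xFF is followed by 0x00 (byte stuffing)
    # or by a marker, so 0xFF 0xD0..0xD7 pairs are restart markers.
    count = 0
    i = pos
    while i + 1 < len(jpeg_bytes):
        if jpeg_bytes[i] == 0xFF and 0xD0 <= jpeg_bytes[i + 1] <= 0xD7:
            count += 1
            i += 2
        else:
            i += 1
    return interval, count
```

A result of `(None, 0)` would suggest the file has no restart markers at all, in which case a damaged byte in the scan data would normally be expected to disturb more than one or two blocks.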

The thing to remember is that jpegs can be losslessly changed on the block level.  You can clip on the block grid, you can rotate, you can change it to a progressive jpeg and back again, you can optimize the encoding so it can be an even smaller jpg, all without a single bit of lost or changed color.  Just because some of the individual bytes are different doesn't mean the resulting colors are different when decoded.

Quote
Also, regardless of the cause of the discrepancies, is there any way to determine which is the most 'original' version and which was modified later?  The EXIF data is identical, so no clues there.

I wish there was.  It's something I've looked into over and over, without finding an answer.  The best I've been able to come up with is I load the two images I'm comparing into Irfanview, and sharpen each one 6-10 times.  Then zoom in on a section and try to judge which one has "more jpeg", i.e. more jpeg artifacts.  Multiple recompressions lead to more patterns when sharpened multiple times, while the closer to the original it is, the smoother it looks.

Phil Harvey

If it is recompressed then the quantization tables would likely change.  In this case, the following command may give a clue as to what did the compressing:

exiftool -jpegdigest FILE

- Phil
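To compare the quantization tables of two files directly, a sketch along these lines hashes the raw DQT segments (marker 0xFFDB) found before the first scan. This is not ExifTool's JpegDigest computation, only a rough stand-in for checking whether two files carry the same tables; the function name `quant_table_digest` and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import struct


def quant_table_digest(jpeg_bytes):
    """Return a SHA-256 hex digest over the payloads of all DQT segments
    that appear before the first scan, or None if there are none."""
    if jpeg_bytes[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker; not a JPEG stream")
    digest = hashlib.sha256()
    found = False
    pos = 2
    while pos + 4 <= len(jpeg_bytes) and jpeg_bytes[pos] == 0xFF:
        marker = jpeg_bytes[pos + 1]
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        if marker == 0xDA:  # SOS: header segments end here
            break
        if marker == 0xDB:  # DQT: define quantization table(s)
            # The length field counts itself, so the payload starts 4
            # bytes after the marker and ends at pos + 2 + length.
            digest.update(jpeg_bytes[pos + 4:pos + 2 + length])
            found = True
        pos += 2 + length
    return digest.hexdigest() if found else None
```

Two files that return the same digest share byte-identical quantization tables, which would argue against a recompression at a different quality setting but says nothing about changes to the scan data itself.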

Keith

Quote from: StarGeek on March 27, 2020, 11:14:03 PM
I don't have anything else to add.  It appears to be minor corruption to me. [...]

Thank you for once again taking the time to share your deep knowledge of JPEG compression.  Unfortunately, it looks like I may not find a satisfying answer to what is going on here.  If the discrepancies were limited to one or a few bytes here or there, I think I could chalk it up to corruption, either during the zipping or transfer.  And if it was 're-compressed' for some reason, I can't understand why, since the sizes are nearly identical.  But without the original online source available any longer, I'll just have to accept that, while frustrating from an archiving perspective, the image data ultimately remains basically identical, and select one for posterity.

Your help has been much appreciated.

Keith

Quote from: Phil Harvey on March 28, 2020, 08:15:56 AM
If it is recompressed then the quantization tables would likely change.  In this case, the following command may give a clue as to what did the compressing:

exiftool -jpegdigest FILE

- Phil

I checked this command on a few different photos (and both versions of each one), but the output was identical: 'Unknown' followed by a long hex string.  So from this I presume you're saying that no standard recompression occurred.

Phil Harvey

ExifTool's list of JpegDigest values is by no means comprehensive.  But if the hex numbers are the same then it is a fair indication that the images were produced by the same software.

- Phil