JPEG being detected as TIFF

Started by TSM, June 08, 2017, 07:15:20 AM

TSM

I have a JPEG being detected as a TIFF. The file command confirms it's a JPEG, as does Photoshop (it will not open if I rename it to .tiff). jpeginfo says the header is corrupt, so I fixed it by passing it through jpegoptim; ExifTool then reads it as a JPEG and shows lots of extra metadata.
Not sure why some tools read it correctly while ExifTool does not (even if it is corrupt).

I have the image but can't distribute it publicly due to copyright. I can send a link directly if required to diagnose the issue.

Phil Harvey

My email is philharvey66 at gmail.com

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Phil Harvey

This file gives the following warning:

Warning                         : Skipped unknown 14 byte header

It isn't recognized as a JPEG image because the JPEG header is not valid (it contains a couple of extra null bytes).  But ExifTool finds the (TIFF-format) EXIF information 14 bytes into the file, which makes it look like a TIFF image with an unrecognized header.
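To illustrate how the TIFF-format EXIF block can end up 14 bytes into such a file, here is a minimal Python sketch. It is not ExifTool's actual logic: the function name `find_tiff_header`, the 64-byte search limit, and the sample bytes (including where the two extra null bytes sit) are all assumptions made for illustration.

```python
# TIFF byte-order signatures: little-endian ("II") and big-endian ("MM").
TIFF_MAGICS = (b"II*\x00", b"MM\x00*")


def find_tiff_header(data: bytes, limit: int = 64):
    """Return the offset of the first TIFF signature within the first
    `limit` bytes of `data`, or None if no signature is found."""
    hits = [pos for pos in (data.find(m, 0, limit) for m in TIFF_MAGICS)
            if pos != -1]
    return min(hits) if hits else None


# Hypothetical damaged file: SOI, two stray null bytes (assumed position),
# an APP1 marker with its length, the "Exif\0\0" identifier, then TIFF data.
sample = (
    b"\xff\xd8"              # SOI marker
    b"\x00\x00"              # stray null bytes (assumption)
    b"\xff\xe1\x00\x10"      # APP1 marker + segment length
    b"Exif\x00\x00"          # EXIF identifier
    b"II*\x00\x08\x00\x00\x00"  # TIFF header (little-endian)
)

print(find_tiff_header(sample))
```

In an intact JPEG with EXIF in the first segment, the same signature would normally sit at offset 12; the two extra bytes in this made-up sample push it to 14.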

- Phil

TSM

Thanks for the quick diagnosis.

In cases of corruption, could it not (by way of a flag to enable extended operation) instead look for image markers, i.e. the 0xffd8 SOI, using something like File::MMagic, and still throw an error? Or would this mean scanning the file too much, or breaking cross-platform capabilities?
Linux file magic figures it out correctly because it looks for the JPEG SOI marker, but we currently don't use it as it does not return the MIME type, which we prefer.
An incorrect file type may be more problematic than a corrupt one in our DAM, where we look at what ExifTool passes back and then switch what we do with the file.

Phil Harvey

It isn't a valid image, so I see your point that ExifTool shouldn't return a type of "TIFF".  But what should it return?  "application/unknown" makes sense for MIME type, but what about FileType and FileTypeExtension?

- Phil

Edit:  Actually, I think that ExifTool shouldn't return any of FileType, FileTypeExtension or MIMEType for these files.  This is what it does for truly unknown files.  If you agree, I'll make this change in the next release.  Also, the warning will be changed to:

Warning                         : Processing TIFF-like data after unknown 14-byte header
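For a system that switches on ExifTool's output, the proposed behaviour would mean checking whether FileType is present at all. The sketch below is only an illustration under assumptions: it parses JSON of the shape produced by `exiftool -j`, the function name `classify` is hypothetical, and the sample strings are made up rather than captured from a real run.

```python
import json


def classify(exiftool_json: str) -> str:
    """Return the FileType reported for the first file in `exiftool -j`
    style output, or "unknown" when no FileType tag is present."""
    tags = json.loads(exiftool_json)[0]
    return tags.get("FileType", "unknown")


# Hypothetical outputs: one for a normal JPEG, one for a file where
# FileType, FileTypeExtension and MIMEType are all omitted.
normal = '[{"SourceFile": "a.jpg", "FileType": "JPEG", "MIMEType": "image/jpeg"}]'
damaged = '[{"SourceFile": "b.jpg", "Warning": "Processing TIFF-like data after unknown 14-byte header"}]'

print(classify(normal))
print(classify(damaged))
```

A caller could route the "unknown" case to a repair or manual-review step instead of treating the file as a TIFF.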

TSM

In this instance the file was a JPEG image. Some programs read it fine (Photoshop, Windows Photo Viewer, etc.), but ImageMagick will not. Once I pass it through jpegoptim it is readable again. Weird, but I guess it depends on how they write the decoder, and that's where I bow out.

It may be best to handle it like an unknown file, but this could throw off many systems that are tolerant of corrupted JPEGs, which are very common. Maybe add a flag to enable strict file typing, and also indicate in the output via a specific option that the file type was not certain. This would not break current operation.

Not sure. I can see problems with both ways, I defer to your judgement on this.

Phil Harvey

Regardless, I think we can both agree that identifying as TIFF is wrong.  And since it really isn't a valid JPEG either, I lean towards treating this as unknown.  A valid JPEG image must start with ff d8 ff, but it seems that some decoders only look for the ff d8.  I don't like the idea of relaxing this check in ExifTool.
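The difference between the two checks can be sketched in a few lines of Python. This is an illustration only, not code from ExifTool or any particular decoder; the function name `jpeg_signature` and its return labels are made up for the example.

```python
def jpeg_signature(data: bytes) -> str:
    """Classify the leading bytes of `data`:
    "strict"  - starts with ff d8 ff (SOI followed by another marker)
    "lenient" - starts with ff d8 only (what some decoders accept)
    "none"    - does not start with an SOI marker at all"""
    if data[:3] == b"\xff\xd8\xff":
        return "strict"
    if data[:2] == b"\xff\xd8":
        return "lenient"
    return "none"


print(jpeg_signature(b"\xff\xd8\xff\xe1"))      # well-formed start
print(jpeg_signature(b"\xff\xd8\x00\x00\xff"))  # SOI followed by stray bytes
print(jpeg_signature(b"II*\x00"))               # TIFF, not JPEG
```

A file that only passes the lenient test is the kind that tolerant viewers may open while a strict reader rejects it.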

- Phil