ExifTool Forum

ExifTool => Newbies => Topic started by: jonharvey on August 17, 2022, 08:24:22 AM

Title: Exiftool read file vs stream gives different FileType, MIMEType
Post by: jonharvey on August 17, 2022, 08:24:22 AM
Hi Phil,
I have a PDF file created in Adobe Illustrator. When I read the metadata from the file:

exiftool -fast -j -c %+.6f 1KwRy3PXvWXNpsPCkwBM3.pdf > pdf-file.json

The pdf-file.json has the expected:
  "FileType": "PDF",
  "FileTypeExtension": "pdf",
  "MIMEType": "application/pdf"

And then as a stream with:

curl -s -S -f file:///Users/jonharvey/Downloads/exiftool-pdf-issue/1KwRy3PXvWXNpsPCkwBM3.pdf | exiftool -fast -j -c %+.6f - > pdf-stream.json

However, the pdf-stream.json has:
  "FileType": "AI",
  "FileTypeExtension": "ai",
  "MIMEType": "application/vnd.adobe.illustrator"

I'd like the stream output to match the file output, so that FileType, FileTypeExtension and MIMEType match the original file extension (pdf).

Is this possible with any exiftool command options or am I missing a specific curl option?

Thanks,
Jon
Title: Re: Exiftool read file vs stream gives different FileType, MIMEType
Post by: StarGeek on August 17, 2022, 10:15:50 AM
cURL from a local file? Interesting.  Didn't realize that was something cURL could do.

Try removing the -fast option (https://exiftool.org/exiftool_pod.html#fast-NUM). When I use that option with cURL on an online file, the whole file is not downloaded, so maybe some of the data needed to identify the file properly is getting cut off.
Title: Re: Exiftool read file vs stream gives different FileType, MIMEType
Post by: jonharvey on August 19, 2022, 12:59:27 PM
Hi,

I tried removing the -fast option for the stream input, i.e.:
curl -s -S -f file:///Users/jonharvey/Downloads/exiftool-pdf-issue/1KwRy3PXvWXNpsPCkwBM3.pdf | exiftool -j -c %+.6f - > pdf-stream2.json

But I still get:

"FileType": "AI",
"FileTypeExtension": "ai",
"MIMEType": "application/vnd.adobe.illustrator"

Title: Re: Exiftool read file vs stream gives different FileType, MIMEType
Post by: StarGeek on August 19, 2022, 08:26:57 PM
Phil will have to comment on this.  I can't replicate the problem locally with the PDFs I tried.
Title: Re: Exiftool read file vs stream gives different FileType, MIMEType
Post by: jonharvey on August 22, 2022, 11:14:48 AM
Thanks for the quick reply and for taking a look, StarGeek.

I have had a chance to look at the source code, and it looks like there is logic in lib/Image/ExifTool/PDF.pm starting at line 364 which uses the file's FILE_EXT to set the FileType to either AI or PDF. Here it is:

 364     Illustrator => {
 365         # assume this is an illustrator file if it contains this directory
 366         # and doesn't have a ".PDF" extension
 367         Condition => q{
 368             $self->OverrideFileType("AI") unless $$self{FILE_EXT} and $$self{FILE_EXT} eq 'PDF';
 369             return 1;
 370         },
 371         SubDirectory => { TagTable => 'Image::ExifTool::PDF::Illustrator' },
 372     },

So my guess is that because the input is a stream, FILE_EXT is not set, so the FileType becomes AI.
When the input is a file, FILE_EXT is set to PDF, so the FileType remains PDF.
But like I said that's just my guess...

If I could somehow pass in the input stream's file name as an option so that FILE_EXT is set, then PDF.pm would probably behave the same as with a file input.


Thanks again,
Jon
Title: Re: Exiftool read file vs stream gives different FileType, MIMEType
Post by: Phil Harvey on August 22, 2022, 12:29:45 PM
Illustrator files are difficult to distinguish from regular PDF files so ExifTool uses the file extension as a clue.  Normally I try to avoid this if possible.

- Phil
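
Since the file extension is the clue, one possible workaround is to give the streamed data a name ending in .pdf before ExifTool reads it. The sketch below is only an illustration of that idea, not something suggested in this thread: stream_to_named_tmp is a hypothetical helper (it is not part of ExifTool or curl), and the approach assumes it is acceptable to write the whole stream to a temporary file, which also means -fast no longer saves any download time.

```shell
#!/bin/sh
# Hypothetical helper (not part of ExifTool): copy stdin into a temporary
# file whose name ends in the given extension, then print that file's path.
stream_to_named_tmp() {
    ext=$1
    # A fresh temporary directory lets us choose the file name freely.
    dir=$(mktemp -d) || return 1
    path="$dir/stream.$ext"
    # Write everything from stdin to the named file.
    cat > "$path" || return 1
    printf '%s\n' "$path"
}

# Usage with the command line from the original post ($url is a placeholder
# for the file:// or https:// location of the PDF):
#   tmp=$(curl -s -S -f "$url" | stream_to_named_tmp pdf)
#   exiftool -fast -j -c %+.6f "$tmp" > pdf-stream.json
#   rm -rf "$(dirname "$tmp")"
```

With this approach ExifTool sees an ordinary file with a .pdf extension, so the result should match the file case shown in the first post. File-system tags such as FileName and Directory would then describe the temporary file rather than the original.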