Exiftool read file vs stream gives different FileType, MIMEType

Started by jonharvey, August 17, 2022, 08:24:22 AM


jonharvey

Hi Phil,
I have a pdf file created in Adobe Illustrator. When I read the meta info from the file:

exiftool -fast -j -c %+.6f 1KwRy3PXvWXNpsPCkwBM3.pdf > pdf-file.json

The pdf-file.json has the expected:
  "FileType": "PDF",
  "FileTypeExtension": "pdf",
  "MIMEType": "application/pdf"

And then as a stream with:

curl -s -S -f file:///Users/jonharvey/Downloads/exiftool-pdf-issue/1KwRy3PXvWXNpsPCkwBM3.pdf | exiftool -fast -j -c %+.6f - > pdf-stream.json

However, the pdf-stream.json has:
  "FileType": "AI",
  "FileTypeExtension": "ai",
  "MIMEType": "application/vnd.adobe.illustrator"

I'd like the stream output to match the file output, so that FileType, FileTypeExtension, and MIMEType reflect the original file extension (pdf).

Is this possible with any exiftool command options or am I missing a specific curl option?

Thanks,
Jon

StarGeek

cURL from a local file? Interesting.  Didn't realize that was something cURL could do.

Try removing the -fast option. I know that when I use that option with cURL with an online file, the whole file is not downloaded, so maybe some data is getting cut that would be needed to properly identify the file.
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

jonharvey

Hi,

I tried removing the fast option for the stream input, i.e.:
curl -s -S -f file:///Users/jonharvey/Downloads/exiftool-pdf-issue/1KwRy3PXvWXNpsPCkwBM3.pdf | exiftool -j -c %+.6f - > pdf-stream2.json

But still get:

"FileType": "AI",
"FileTypeExtension": "ai",
"MIMEType": "application/vnd.adobe.illustrator"


StarGeek

Phil will have to comment on this.  I can't replicate the problem locally with the PDFs I tried.

jonharvey

Thanks for the quick reply and for taking a look, StarGeek.

I have had a chance to look at the source code, and it looks like there is logic in lib/Image/ExifTool/PDF.pm, starting at line 364, which uses the file's FILE_EXT to set the FileType to either AI or PDF. Here it is:

 364     Illustrator => {
 365         # assume this is an illustrator file if it contains this directory
 366         # and doesn't have a ".PDF" extension
 367         Condition => q{
 368             $self->OverrideFileType("AI") unless $$self{FILE_EXT} and $$self{FILE_EXT} eq 'PDF';
 369             return 1;
 370         },
 371         SubDirectory => { TagTable => 'Image::ExifTool::PDF::Illustrator' },
 372     },

So my guess is that as the input is a stream, the FILE_EXT is not set, so the FileType becomes AI.
When the input is a file, FILE_EXT is set to PDF, so the FileType remains PDF.
But like I said that's just my guess...
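
To make that guess concrete, here is a toy Python model of the quoted Condition. It is not ExifTool code: the function name detect_file_type and its parameters are made up for illustration, and it assumes that FILE_EXT is simply unset when the input is a stream.

```python
from typing import Optional


def detect_file_type(has_illustrator_dir: bool, file_ext: Optional[str]) -> str:
    """Toy model of the quoted Condition in PDF.pm (hypothetical helper)."""
    file_type = "PDF"  # the input has already been recognised as a PDF
    # Mirrors: OverrideFileType("AI") unless FILE_EXT is set and equals 'PDF'
    if has_illustrator_dir and not (file_ext and file_ext == "PDF"):
        file_type = "AI"
    return file_type


# Named file input: the extension is known.
print(detect_file_type(True, "PDF"))  # PDF
# Stream input: assumed to carry no extension.
print(detect_file_type(True, None))   # AI
```

Under that assumption, the same bytes give PDF from a named file and AI from a stream, which matches what I am seeing.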

If I could somehow pass in the input stream's fileName as an option, so FILE_EXT is set then PDF.pm would probably behave the same as with a file input.
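
Failing that, one possible workaround is to spool the stream to a temporary file that ends in .pdf and point exiftool at that file instead of "-". The sketch below is only an illustration: spool_to_temp and read_metadata_from_stream are hypothetical helper names, it assumes exiftool is on the PATH, and it costs one extra copy of the data on disk.

```python
import json
import os
import shutil
import subprocess
import tempfile
from typing import BinaryIO


def spool_to_temp(stream: BinaryIO, suffix: str = ".pdf") -> str:
    """Copy a binary stream to a temporary file with the given suffix; return its path."""
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        shutil.copyfileobj(stream, tmp)
        return tmp.name


def read_metadata_from_stream(stream: BinaryIO, suffix: str = ".pdf") -> dict:
    """Run exiftool on a spooled copy so that it sees a file extension."""
    path = spool_to_temp(stream, suffix)
    try:
        # -j makes exiftool print a JSON array with one object per input file.
        result = subprocess.run(
            ["exiftool", "-j", path],
            check=True,
            capture_output=True,
            text=True,
        )
        return json.loads(result.stdout)[0]
    finally:
        os.remove(path)  # always clean up the temporary copy
```

Calling read_metadata_from_stream(sys.stdin.buffer) at the end of the curl pipe would take the place of "exiftool -j -". Tags derived from the file itself, such as FileName, would then describe the temporary file rather than the original.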


Thanks again,
Jon

Phil Harvey

Illustrator files are difficult to distinguish from regular PDF files so ExifTool uses the file extension as a clue.  Normally I try to avoid this if possible.

- Phil