Hello Phil,
The original ticket covered "printable" characters showing before the %PDF header:
https://exiftool.org/forum/index.php/topic,9086.0.html
However, some non-English PDF documents appear to have non-printable characters (0xca, 0xff) before the %PDF marker, so the fix introduced in the above ticket cannot identify them as PDFs. These documents are clearly not compliant with the standard, but apparently some tools still produce them. Would it be possible to modify the regex to use .*%PDF instead of \s*%PDF in both PDF.pm and ExifTool.pm?
Regards,
Mike
Hi Mike,
As I wrote:
Quote from: Phil Harvey on April 10, 2018, 07:46:35 AM
If I allow up to 1024 random bytes before the PDF header (as apparently Adobe Reader does), this would substantially increase the possibility of misidentifying some other file type as PDF. So I don't like this idea.
I would prefer not to do this.
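To illustrate the trade-off, here is a rough sketch in Python (not ExifTool's actual Perl code). The names STRICT, LOOSE and looks_like_pdf are hypothetical, and the 1024-byte bound on the loose pattern is an assumption taken from the quote above rather than from the `.*%PDF` in the request. It only shows how a permissive prefix accepts both the broken PDFs and unrelated files that happen to contain "%PDF" near the start.

```python
import re

# Hypothetical patterns for illustration only; not copied from ExifTool.
# STRICT: only whitespace may precede the header.
STRICT = re.compile(rb"\s*%PDF")
# LOOSE: up to 1024 arbitrary bytes may precede the header (assumed bound).
LOOSE = re.compile(rb".{0,1024}?%PDF", re.DOTALL)


def looks_like_pdf(data: bytes, pattern: re.Pattern) -> bool:
    """Return True if the pattern matches at the very start of the data."""
    return pattern.match(data) is not None


# A well-formed header, a header with non-printable junk in front of it,
# and an HTML file that merely mentions the header string.
clean = b"%PDF-1.4\n"
junk_prefix = b"\xca\xff%PDF-1.7\n"
not_pdf = b"<html><body>The file starts with %PDF-1.4</body></html>"

for name, sample in [("clean", clean), ("junk_prefix", junk_prefix), ("not_pdf", not_pdf)]:
    print(name, looks_like_pdf(sample, STRICT), looks_like_pdf(sample, LOOSE))
```

The strict pattern rejects the file with the junk prefix, while the loose pattern accepts it along with the HTML file, which is the kind of misidentification I am worried about.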
- Phil