ExifTool handles HTML files in ANSI/ASCII and produces some basic data.
If the same file is encoded in UTF-8 or 16-bit Unicode, ExifTool always returns "File Format Error".
I have never seen this. Can you post a sample?
- Phil
Hi, Phil
thanks for looking into this.
I have prepared two sample HTML files and attached them.
The ASCII version is processed correctly. The same file saved in UTF8 produces the "File Format Error".
Ah, OK. This file only has a UTF-8 BOM at the start. In your first post you mentioned 16-bit Unicode, so I was thinking UTF-16, which I have never seen.
Adding support for a leading UTF-8 BOM is easy. ExifTool 9.54 will allow this.
Thanks for pointing out this problem.
- Phil
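For anyone following along, here is a minimal sketch of what checking for a leading BOM can look like. This is not ExifTool's code (ExifTool is written in Perl); it is only a Python illustration, and the names `sniff_bom` and `decode_html` are made up for this example. The Latin-1 fallback for files without a BOM is also an assumption, chosen only so that every byte decodes.

```python
import codecs

# Byte-order marks as defined by the Unicode standard.
# UTF-32 is deliberately ignored here; its little-endian BOM (FF FE 00 00)
# would otherwise be mistaken for the UTF-16 LE one.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE (the Windows "Unicode" default)
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if there is no BOM."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_html(data: bytes) -> str:
    """Strip a leading BOM, if any, and decode the rest of the file."""
    encoding, skip = sniff_bom(data)
    # Assumption: BOM-less input is decoded as Latin-1 (hypothetical fallback).
    return data[skip:].decode(encoding or "latin-1")
```

With something like this in front of an HTML parser, the ASCII sample and the UTF-8 sample with a BOM both end up as the same text, which is why skipping a UTF-8 BOM is the easy case and UTF-16 needs an actual decoding step.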
Hi, Phil
Great. I've attached the same file in 16-bit Unicode (the Windows default) and in 16-bit big-endian Unicode. These are rarely used in the wild, although the Windows default format is common in corporate environments that process and emit data in the Windows standard 16-bit Unicode format without converting to UTF-8.
Thanks. I think I'll hold off implementing support for UTF-16 HTML files until there is actually a need (you don't have a need for this, do you?), because it would be a bit ugly to implement.
- Phil
I doubt that 16-Bit Unicode is in wide use, if at all.
Once future ExifTool versions handle UTF-8, that should cover most real-world files.