Glitch - HTM files in UTF8/UNICODE encoding always return "File Format Error"

Started by Mac2, February 25, 2014, 03:44:10 PM


Mac2

ExifTool handles HTML files in ANSI/ASCII encoding and produces some basic data.
If the same file is encoded in UTF-8 or 16-bit Unicode, ExifTool always returns "File Format Error".

Phil Harvey

...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Mac2

Hi, Phil

thanks for looking into this.
I have prepared two sample HTML files and attached them.
The ASCII version is processed correctly. The same file saved in UTF8 produces the "File Format Error".

Phil Harvey

Ah, OK.  This file only has a UTF-8 BOM at the start.  In your first post you mentioned 16-bit Unicode, so I was thinking UTF-16, which I have never seen.

Adding support for a leading UTF-8 BOM is easy.  ExifTool 9.54 will allow this.
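
To illustrate what allowing a leading UTF-8 BOM can look like, here is a minimal Python sketch. It is not ExifTool's actual code (ExifTool is written in Perl), and the function names `strip_utf8_bom` and `looks_like_html` are hypothetical. It assumes an HTML file is recognised by the first tag in the file, which is only a rough stand-in for a real format check.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def looks_like_html(data: bytes) -> bool:
    """Crude illustrative check: skip an optional UTF-8 BOM and leading
    whitespace, then look for an HTML doctype or <html> tag."""
    head = strip_utf8_bom(data).lstrip()[:64].lower()
    return head.startswith((b"<!doctype html", b"<html"))
```

Without the BOM-stripping step, the three BOM bytes sit in front of the first tag and a check of this kind fails, even though the rest of the file is unchanged.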

Thanks for pointing out this problem.

- Phil

Mac2

Hi, Phil

Great. I've attached the same file in 16-bit Unicode (the Windows default) and in 16-bit big-endian Unicode. These are rarely used in the wild, though. Still, the Windows default format is often used in corporate environments, which process and emit data in the Windows standard 16-bit Unicode format without converting to UTF-8.
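
For files like these, one possible workaround is to transcode them to UTF-8 before handing them to other tools. The Python sketch below assumes the file starts with a UTF-16 byte order mark (FF FE for little-endian, FE FF for big-endian); the function name `utf16_to_utf8` is hypothetical, and nothing in this thread confirms how ExifTool treats the converted output.

```python
import codecs


def utf16_to_utf8(data: bytes) -> bytes:
    """Transcode BOM-prefixed UTF-16 data to UTF-8 (without a BOM).

    Data without a UTF-16 BOM is returned unchanged. Note that a UTF-32
    little-endian BOM also begins with FF FE; this sketch does not
    try to distinguish that case.
    """
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order
        # and drops it from the decoded text.
        return data.decode("utf-16").encode("utf-8")
    return data
```

A caller would read the file in binary mode, pass the bytes through `utf16_to_utf8`, and write the result to a new file, leaving the original untouched.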

Phil Harvey

Thanks.  I think I'll hold off implementing support for UTF-16 HTML files until there is actually a need (you don't have a need for this, do you?), because it would be a bit ugly to implement.

- Phil

Mac2

I doubt that 16-bit Unicode is in wide use, if at all.

Once future ExifTool versions handle UTF-8, that should cover most real-world files.