Bad character encoding for certain QuickTime metadata

Started by mjr, November 07, 2011, 05:52:34 PM

mjr

Hi Phil, thanks for making such a useful tool!

I've run into a bug while processing a particular movie file with a script. The file has a 'name' field in a user data ('udta') atom. The name is apparently in MacRoman encoding and contains an "é" character, stored as the byte \x8e. I'm exporting in JSON format to import into a database, and this byte is being passed through as-is. A lone \x8e isn't valid UTF-8, so my script dies.

By the QuickTime file spec, this field is not subject to internationalization since its tag doesn't start with \xa9. The problem seems to be that ExifTool passes the value through without doing any charset testing at all. The safe thing to do would be to sanitize the output so that it complies with whatever output charset is used (i.e. UTF-8). The spec doesn't say what charset this field uses, but it's a good guess that it won't be UTF-8. The only alternative is to special-case this field as MacRoman instead of doing a straight passthrough.
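To make the failure concrete, here is a minimal Python sketch of the kind of sanitizing described above. It is not how ExifTool handles this; the helper name `to_text`, the sample value `b"Caf\x8e"`, and the choice to try UTF-8 first before falling back to MacRoman are all assumptions made for illustration.

```python
import json


def to_text(raw: bytes, fallback: str = "mac_roman") -> str:
    """Return raw as text: UTF-8 if it decodes cleanly, else the fallback charset.

    Hypothetical helper; trying UTF-8 first is an assumption, not ExifTool's logic.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # A lone \x8e is a stray UTF-8 continuation byte, so we end up here.
        return raw.decode(fallback)


# Hypothetical 'name' value as stored in the udta atom (MacRoman, \x8e is "é").
raw_name = b"Caf\x8e"
name = to_text(raw_name)

# ensure_ascii=False keeps the decoded character readable in the JSON output.
print(json.dumps({"Name": name}, ensure_ascii=False))
```

Passing a different `fallback` codec name is the script-side analogue of making the encoding configurable rather than hard-coding MacRoman.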

Here's the command line:
exiftool -j exifbug.mov > exifbug.mov.out

I have stripped the video data out of the movie and sent you a zipped version of the result.

Best Wishes,
Michael Rondinelli

Phil Harvey

Hi Michael,

Good suggestion, and thanks for the sample.

I will decode these strings assuming MacRoman by default, but will add an option to allow this encoding to be specified.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Phil Harvey

This is now implemented in ExifTool 8.69 (just released).

- Phil