Hi Phil, thanks for making such a useful tool!
I've encountered a bug in a processing script for a particular movie file. This file has a 'name' field in a user data ('udta') atom. The name is apparently in MacRoman encoding and contains an "é" character, byte \x8e. I'm exporting in JSON format to import into a database, and this byte is passed through as-is. Obviously that isn't valid UTF-8, so my script dies.
By the QuickTime file spec, this field is not subject to internationalization since its tag doesn't start with \xa9. The problem seems to be that ExifTool passes the value through without doing any charset checking at all. The safe thing to do would be to sanitize the output to ensure it complies with whatever output charset is in use (i.e. UTF-8). The spec doesn't say which charset this field uses, but it's a good guess that it won't be UTF-8. The only alternative is to special-case this field as MacRoman instead of passing it straight through.
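For now I'm working around it in my import script with something like this (a minimal Python sketch; the sample value is made up, and it assumes the raw bytes really are MacRoman, which is just a guess):

    # Hypothetical workaround: re-decode the passed-through bytes as MacRoman
    # and re-encode as UTF-8 before handing the value to the database.
    raw = b"Caf\x8e"                  # illustrative 'name' value as emitted in the JSON
    fixed = raw.decode("mac_roman")   # 0x8E -> "é" under MacRoman
    utf8_value = fixed.encode("utf-8")  # now valid UTF-8 for the database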
Here's the command line:
exiftool -j exifbug.mov > exifbug.mov.out
I have stripped the video data out of the movie and sent you a zipped version of the result.
Best Wishes,
Michael Rondinelli
Hi Michael,
Good suggestion, and thanks for the sample.
I will decode these strings assuming MacRoman by default, but will add an option to allow this encoding to be specified.
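Assuming it plugs into the existing -charset mechanism (the exact name may change), overriding the default would look something like:

exiftool -charset QuickTime=Latin -j exifbug.mov > exifbug.mov.out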
- Phil
This is now implemented in ExifTool 8.69 (just released).
- Phil