Request 0x0000 instead of 0x55c4 as MP4 data atom default language flag

Started by dae65, August 04, 2020, 09:33:28 AM


dae65

Unless a language code is expressly specified using the tag name-language-country syntax, Image::ExifTool writes MP4 data atoms under the moov.udta.meta.ilst.* hierarchy (aka the ItemList group) with a default language field of 0x55c4. This value is the packed ISO 639-2 code und, which means undetermined language.
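To see why 0x55c4 reads as und, here is a minimal Python sketch. It assumes the usual MP4/QuickTime packing for ISO 639-2/T codes: each lowercase letter is stored as its ASCII value minus 0x60 in 5 bits, with the three letters occupying the low 15 bits. The function names are illustrative only and are not part of ExifTool.

```python
def pack_iso639(code: str) -> int:
    """Pack a 3-letter lowercase ISO 639-2/T code into 15 bits (5 bits per letter)."""
    if len(code) != 3 or not code.isascii() or not code.isalpha() or not code.islower():
        raise ValueError(f"expected 3 lowercase ASCII letters, got {code!r}")
    value = 0
    for ch in code:
        # 'a' (0x61) becomes 1, 'z' (0x7a) becomes 26
        value = (value << 5) | (ord(ch) - 0x60)
    return value


def unpack_iso639(value: int) -> str:
    """Reverse of pack_iso639: read three 5-bit fields, high to low."""
    return "".join(chr(((value >> shift) & 0x1F) + 0x60) for shift in (10, 5, 0))


print(hex(pack_iso639("und")))  # 0x55c4
print(unpack_iso639(0x55C4))    # und
```

Under this packing, u = 21, n = 14, d = 4, so (21 << 10) | (14 << 5) | 4 = 0x55c4, which matches the bytes in the dumps.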

This was introduced in ExifTool 11.40 (see WriteQuickTime.pl) in order to fix "QuickTime writing to preserve existing same-named default-language tags in other groups when writing a default language tag" (see Changes). This is a feature, but it has also introduced what looks like a bug.

Firstly, files end up cluttered with 0x55c4 everywhere, even needlessly in atoms such as tmpo (music tempo, aka BeatsPerMinute), where language codes are meaningless:

$ exiftool -ItemList:BeatsPerMinute=100 file.mp4
    1 image files updated
$ hexdump -s 0x86e -C file.mp4
0000086e  00 00 00 1a 74 6d 70 6f  00 00 00 12 64 61 74 61  |....tmpo....data|
0000087e  00 00 00 16 00 00 55 c4  00 64                    |......U..d|


Secondly, other MP4 editors such as Apple iTunes (12.06) and AtomicParsley (0.9.6) write null bytes as the default language code for data atoms (rightly so, in my opinion; see below), whereas ExifTool leaves 0x55c4 behind as a kind of fingerprint. Compare the ©nam atoms (aka Title) from files edited with iTunes, AtomicParsley, and ExifTool respectively:


$ hexdump -s 0x12bf9 -n 29 -C itunes.m4a
00012bf9  00 00 00 1d a9 6e 61 6d  00 00 00 15 64 61 74 61  |.....nam....data|
00012c09  00 00 00 01 00 00 00 00  54 49 54 4c 45           |........TITLE|
00012c16


$ AtomicParsley atomicparsley.m4a --title TITLE --overWrite
Updating metadata...
$ hexdump -s 0x133c9 -n 29 -C atomicparsley.m4a
000133c9  00 00 00 1d a9 6e 61 6d  00 00 00 15 64 61 74 61  |.....nam....data|
000133d9  00 00 00 01 00 00 00 00  54 49 54 4c 45           |........TITLE|
000133e6


$ exiftool -ItemList:Title=TITLE exiftool.m4a
    1 image files updated
$ hexdump -s 0x12bf9 -n 29 -C exiftool.m4a
00012bf9  00 00 00 1d a9 6e 61 6d  00 00 00 15 64 61 74 61  |.....nam....data|
00012c09  00 00 00 01 00 00 55 c4  54 49 54 4c 45           |......U.TITLE|
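The three dumps differ only in the last two bytes of the 8-byte field that follows 'data'. Here is a minimal Python sketch that pulls those fields out of the bytes shown above. It assumes the data atom layout from Apple's QuickTime File Format Specification: a 32-bit size, 'data', a 32-bit type indicator, a 16-bit country and a 16-bit language, then the value. parse_data_atom is a hypothetical helper, not an ExifTool function.

```python
import struct


def parse_data_atom(buf: bytes) -> dict:
    """Split a 'data' atom into its header fields and value (hypothetical helper)."""
    size, kind = struct.unpack_from(">I4s", buf, 0)
    if kind != b"data" or size < 16 or size > len(buf):
        raise ValueError("not a complete 'data' atom")
    # 4-byte type indicator, then a 4-byte locale: 2-byte country + 2-byte language
    type_indicator, country, language = struct.unpack_from(">IHH", buf, 8)
    return {"type": type_indicator, "country": country,
            "language": language, "value": buf[16:size]}


# 'data' atoms copied from the hexdumps (the parent tmpo/©nam header is skipped)
itunes_nam = bytes.fromhex("00000015 64617461 00000001 00000000") + b"TITLE"
exiftool_nam = bytes.fromhex("00000015 64617461 00000001 000055c4") + b"TITLE"
exiftool_tmpo = bytes.fromhex("00000012 64617461 00000016 000055c4 0064")

for name, atom in [("iTunes title", itunes_nam),
                   ("ExifTool title", exiftool_nam),
                   ("ExifTool tmpo", exiftool_tmpo)]:
    fields = parse_data_atom(atom)
    print(f"{name}: language=0x{fields['language']:04x}")
# iTunes title: language=0x0000
# ExifTool title: language=0x55c4
# ExifTool tmpo: language=0x55c4
```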


Thirdly, here are a few quotes saying we should refrain from using 'und':

From Apple's QuickTime File Format Specification:
Quote
Software applications that read metadata may be customized for a specific set of countries or languages. If a metadata writer does not want to limit a metadata item to a specific set of countries, it should use the reserved value ZZ from ISO 3166 as its country code. Similarly if the metadata writer does not want to limit the user's language (this is not recommended) it uses the value 'und' (undetermined) from the ISO 639-2/T specification.

From RFC 3066:
Quote
   5. You SHOULD NOT use the UND (Undetermined) code unless the protocol
      in use forces you to give a value for the language tag, even if
      the language is unknown.  Omitting the tag is preferred.

From RFC 4646:
Quote
   4.  The 'und' (Undetermined) primary language subtag SHOULD NOT be
       used to label content, even if the language is unknown.  Omitting
       the language tag altogether is preferred to using a tag with a
       primary language subtag of 'und'.  The 'und' subtag MAY be useful
       for protocols that require a language tag to be provided.  The
       'und' subtag MAY also be useful when matching language tags in
       certain situations.

It's unclear to me whether this is a bug or not. If you agree that it is, I'd like to request that ExifTool revert to the pre-11.40 behavior of writing the language field of data atoms as 0x0000 by default, and find another way of preserving same-named default-language tags in other groups without writing 0x55c4 across the board.

Thank you. :)

Phil Harvey

Thanks for this report.  I'm really impressed with the research you do and the level of your understanding for these reports.

I should be able to add a simple patch to write 0x0000 instead of 0x55c4 for the 'und' language code.  I've got a test version that does this and so far it seems to work fine.  I don't think this should cause problems wrt the bug fixed by 11.40.
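Here is a minimal Python sketch of the behavior described above, assuming the same data atom layout. When no language is given, or the language is und, the 16-bit language field is written as 0x0000. Any other ISO 639-2 code is packed as usual. This is a hypothetical illustration, not ExifTool's actual Perl code in WriteQuickTime.pl, and it always leaves the country field at 0.

```python
import struct


def language_field(lang=None) -> int:
    """Return the 16-bit language value for a 'data' atom (hypothetical helper)."""
    if lang is None or lang == "und":
        # Requested behavior: no language / 'und' is written as 0x0000, not 0x55c4
        return 0
    if len(lang) != 3 or not lang.isascii() or not lang.isalpha() or not lang.islower():
        raise ValueError(f"expected a 3-letter ISO 639-2 code, got {lang!r}")
    value = 0
    for ch in lang:
        value = (value << 5) | (ord(ch) - 0x60)  # 5 bits per letter
    return value


def build_data_atom(type_indicator: int, payload: bytes, lang=None) -> bytes:
    """Assemble a 'data' atom: size, 'data', type, country (0), language, value."""
    body = struct.pack(">IHH", type_indicator, 0, language_field(lang)) + payload
    return struct.pack(">I4s", 8 + len(body), b"data") + body


atom = build_data_atom(1, b"TITLE")  # type 1 = UTF-8 text, default language
print(atom.hex(" "))
# 00 00 00 15 64 61 74 61 00 00 00 01 00 00 00 00 54 49 54 4c 45
```

With the default language, the output matches the data atom inside the iTunes and AtomicParsley ©nam dumps byte for byte. (bytes.hex with a separator needs Python 3.8 or later.)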

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).


yandazhuang

Phil Harvey is an eternal God in my heart. Although I am Chinese and can't read English, I will learn from you and worship you. The greatest God in my heart: Phil Harvey!

Phil Harvey

Quote from: yandazhuang on August 18, 2020, 03:05:53 AM
Can all EXIF parameters be changed in an MP4 video, higher-ups?

MP4 videos do not generally support EXIF metadata.

- Phil