Incorrect duration in Matroska files

Started by bluebird, May 10, 2023, 09:02:06 AM


bluebird

Hi Phil,

I found that the duration reported for Matroska files is incorrect in recent versions of ExifTool.

This is the command I use:
exiftool -s -g1 -j -n -q -u filename.ext

It's incorrect because:
1. I'm getting "hh:mm:ss.xxxx" instead of a float, even though the -n flag is passed.
2. Even if I manually convert the given "hh:mm:ss.xxxx" value to a float, it's still incorrect.
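For reference, the manual conversion mentioned in point 2 can be sketched as below. This is only a minimal illustration, assuming the string always has the form "hh:mm:ss.fraction" as in the output quoted in this thread; the function name hms_to_seconds is hypothetical and not part of ExifTool.

```python
def hms_to_seconds(text: str) -> float:
    """Convert an 'hh:mm:ss.fraction' string to seconds as a float.

    Assumes exactly three colon-separated fields (hypothetical helper).
    """
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


if __name__ == "__main__":
    # Value taken from the ID-DURATION tag shown later in this thread.
    print(hms_to_seconds("00:29:01.610000000"))
```

Even after this conversion, the result comes from a per-track "DURATION" tag, so it need not match the container's own duration field.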

I tested this with version 12.62.
Version 12.50 is correct and does not behave as described above.

After some experimentation, the most likely cause appears to be the introduction of tag extraction in recent versions.
Some Matroska authoring software adds tags called "DURATION" (along with other tags such as NUMBER_OF_FRAMES, BPS, etc.). These are added not only to the video tracks but also to other tracks such as audio and subtitles.
The presence of "DURATION" tags in tracks overrides the Duration field.
A further observation: when there are multiple tracks and all or some of them have "DURATION" tags, the "DURATION" tag value of the last track becomes the Matroska duration. In fact, all tags of the last track, not just the duration, are added to the "Matroska" section.

To avoid the mix-up caused by the introduction of tags in recent versions, you might want to consider:
1. Separating tags from fields. For example, if any of the tracks has a tag called "DOCTYPE" with value "Hello", then it creates "Doctype": "Hello" in addition to "DocType": "matroska". Unlike DocType, which has an uppercase letter in the middle, Duration has none, so there is only one spelling and the tag ends up with the same name as the field.
2. Putting the values of tags within their own tracks instead of adding them to the "Matroska" section. This is because tracks often have the same tags with different values, for example ENCODER = "Lavc58.xx.100 libx264" in the video track, ENCODER = "Lavc58.yy.100 aac" in the commentary audio track, ENCODER = "Lavc58.zz.100" in the German subtitle track, etc. The Matroska container itself might have ENCODER = "Lavf58.nn.100", for example.

Edit: The comment "Separating tags from fields." should be re-worded to "Clearly distinguish tags from other fields."

StarGeek

Try including the -a (-duplicates) option and removing the -j (-json) option. Or change the -G1 into -G4.

I believe the problem here is that tags with the same name are being hidden from you. While the -json option automatically implies -a, it suppresses tags with the same name. From the docs:
     The -a option is implied when -json is used, but entries with identical JSON names are suppressed in the output. (-G4 may be used to ensure that all tags have unique JSON names.)
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

bluebird

Thanks for taking the time to reply.

Removing -j is not an option for me because I need to programmatically process the output.
That is the purpose of using the -j option.

Adding -a does not help.
Replacing -g1 with -G4 or -g4 does help.
However, the correct duration goes to the "Copy1" section.
With the duration appearing in the "Unknown", "Copy1", "Copy2", "Copy3", etc. sections, how would one know which tag is the one to look for if there is no clear way of telling them apart?
If we use -g1 or -G1, we can clearly identify which tags belong to which track, or to the container, i.e. Matroska.
Therefore, while the -g4 and -G4 options do show the duration, they do not really help, for the reason mentioned above.

I do understand your comment about the issue of tags with the same name, which I also pointed out in my original post.

From the perspective of a user who needs to look up the duration in Matroska files (along with other information about individual tracks and the container), recent versions no longer work, and I will have to keep using version 12.50.

StarGeek

Quote from: bluebird on May 10, 2023, 06:46:56 PM
However, the correct duration goes to the "Copy1" section.
With the duration appearing in the "Unknown", "Copy1", "Copy2", "Copy3", etc. sections, how would one know which tag is the one to look for if there is no clear way of telling them apart?

Another option might be to add family 7 to the -G option, which shows tag IDs. I'm pretty sure that the tags that show up with ID-DURATION are the TAG:VALUE pairs, i.e. the Matroska tags rather than the container's duration field.

Example
C:\`>exiftool -j -G1:7 -a -s -e --file:all -duration# 20220323_113515-edit.mkv
[{
  "SourceFile": "20220323_113515-edit.mkv",
  "Matroska:ID-1161:Duration": 1741.681,
  "Matroska:ID-DURATION:Duration": "00:29:01.610000000"
}]

C:\`>exiftool -j -G1:7 -a -s -e --file:all -duration# Y:\!temp\yy\4-vidH-41CFDCA01.54D327A8.684EA0.mxf.mkv
[{
  "SourceFile": "Y:/!temp/yy/4-vidH-41CFDCA01.54D327A8.684EA0.mxf.mkv",
  "Matroska:ID-1161:Duration": 16.123,
  "Matroska:ID-DURATION:Duration": "00:00:16.123000000"
}]

C:\`>exiftool -j -G1:7 -a -s -e --file:all -duration# "Y:\!NoOnlineBackup\Steam_Library\steamapps\common\Warhammer 40000 Gladius - Relics of War\Data\Cinematics\Clips\Intros\AdeptusMechanicus.mkv"
[{
  "SourceFile": "Y:/!NoOnlineBackup/Steam_Library/steamapps/common/Warhammer 40000 Gladius - Relics of War/Data/Cinematics/Clips/Intros/AdeptusMechanicus.mkv",
  "Matroska:ID-1161:Duration": 69.166,
  "Matroska:ID-DURATION:Duration": "00:01:09.166000000"
}]

Also, since you're using the -n (--printConv) option, the correct duration is going to be the raw time in seconds.  The other tags are strings and will have colons in them.
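For programmatic processing of the JSON, the selection could be sketched roughly as follows. This is only an illustration, assuming the key naming matches the -G1:7 output shown above (numeric value under an ID other than ID-DURATION, string value under ID-DURATION); the function name container_duration is hypothetical, and SAMPLE is copied from the first example rather than produced by running exiftool.

```python
import json

# Sample copied from the first example output above.
SAMPLE = """[{
  "SourceFile": "20220323_113515-edit.mkv",
  "Matroska:ID-1161:Duration": 1741.681,
  "Matroska:ID-DURATION:Duration": "00:29:01.610000000"
}]"""


def container_duration(entry):
    """Return the numeric Duration that is not a DURATION tag, or None.

    Assumes keys of the form "Group1:ID-xxx:TagName" (as with -G1:7).
    """
    for key, value in entry.items():
        group, _, name = key.rpartition(":")
        if name != "Duration" or "ID-DURATION" in group:
            continue
        # With -n, the duration field is a raw number of seconds.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return float(value)
    return None


if __name__ == "__main__":
    entry = json.loads(SAMPLE)[0]
    print(container_duration(entry))
```

Filtering on both the tag ID and the value type is deliberately conservative: it skips the string-valued DURATION tags even if a file carries several of them.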
