Text file identified as video/mpeg

Started by crobbers, March 21, 2016, 05:54:22 PM

crobbers

Kind of a strange one here. I have a text file "example.txt", and if it contains these characters on the first line, the file is identified as MIME type video/mpeg, file type M2T, in ExifTool 10.00.

Gxxxxxxx

The first "G" seems important: if I change it to another character, the file goes back to "Unknown file type". Any of the "x" characters can be swapped for another character, but removing any of them causes the file to no longer be identified as video/mpeg.

So essentially, a "G" followed by at least 7 consecutive other characters.

Hayo Baan

You stumbled on a file that starts with the "magic number" of a video file. Magic numbers are the first few bytes of a file and are used to identify its type, so that the whole file doesn't have to be read to determine the type. As you can see, this sometimes goes wrong. For more info on magic numbers, have a look at this website.
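To illustrate the idea, here is a minimal Python sketch of magic-number detection. It is not how ExifTool is implemented. The signature table, the function name `guess_type`, and the 8-byte minimum length are assumptions chosen to mirror the behaviour reported above. The PNG, JPEG and PDF signatures are the standard ones. The last rule copies the weak "G as 1st or 5th byte" check that causes the false positive.

```python
# Hypothetical signature table: (leading bytes, MIME type).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
]


def guess_type(head: bytes) -> str:
    """Guess a MIME type from the first few bytes of a file."""
    for magic, mime in SIGNATURES:
        if head.startswith(magic):
            return mime
    # Weak rule (assumption): at least 8 bytes, with the transport-stream
    # sync byte 0x47 ("G") as the 1st or 5th byte.
    if len(head) >= 8 and (head[0] == 0x47 or head[4] == 0x47):
        return "video/mpeg"
    return "unknown"


if __name__ == "__main__":
    print(guess_type(b"Gxxxxxxx\n"))  # plain text, but matches the weak rule
    print(guess_type(b"Hxxxxxxx\n"))
```

A one-byte signature that happens to be a printable ASCII letter will match many ordinary text files, which is why stronger signatures use several bytes that are unlikely to occur together by chance.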
Hayo Baan – Photography
Web: www.hayobaan.nl

Phil Harvey

The M2TS file type has very weak identification, and false positives are a distinct possibility.  There isn't much that ExifTool can look for other than a sync byte as the 1st or 5th byte in the file.  Unfortunately, the sync byte is the ASCII letter "G".

I'll see if I can do any additional checks here to improve the reliability.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Phil Harvey

I've looked into this in a bit more detail.

Some other software uses these constraints to identify M2TS files:

1. byte[4 + N * 192] == 0x47 "G" (tested for N=0,1,2,3)
2. byte[5] == 0x40
3. byte[6] == 0x00
4. (byte[7] & 0x0f) == 0x00

Unfortunately, I have some valid M2TS files that fail ALL of these tests.  These are M2TS files that don't have an embedded timecode.  For these files, the following logic is the best I can do:

1. byte[N * 188] == 0x47 "G"

So other than testing the sync byte (0x47 "G") of a few more packets, I don't see any way to improve the recognition of these files.  ExifTool was previously testing only the first packet, but I will change this to test the first 4.
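As a rough illustration of the multi-packet check described above, here is a Python sketch. It is not ExifTool's actual code. The function names `sync_ok` and `looks_like_m2ts` are hypothetical, and rejecting files shorter than 4 packets is an assumption of this sketch. It accepts data if the sync byte appears at the start of each of the first 4 packets, for either 192-byte packets (4-byte timecode first) or plain 188-byte packets.

```python
SYNC = 0x47  # transport-stream sync byte, ASCII "G"


def sync_ok(data: bytes, packet_len: int, offset: int, packets: int = 4) -> bool:
    """True if each of the first `packets` packets has SYNC at `offset`."""
    for n in range(packets):
        pos = offset + n * packet_len
        # Assumption: data too short to hold this packet's sync byte fails.
        if pos >= len(data) or data[pos] != SYNC:
            return False
    return True


def looks_like_m2ts(data: bytes, packets: int = 4) -> bool:
    """Check sync bytes for both packet layouts mentioned in the thread."""
    with_timecode = sync_ok(data, 192, 4, packets)    # byte[4 + N * 192]
    without_timecode = sync_ok(data, 188, 0, packets)  # byte[N * 188]
    return with_timecode or without_timecode


if __name__ == "__main__":
    text = b"Gxxxxxxx\n"
    plain_ts = (b"\x47" + bytes(187)) * 4
    print(looks_like_m2ts(text), looks_like_m2ts(plain_ts))
```

With this check, the short text file from the first post is no longer accepted, although a text file that happens to have "G" at every 188th byte would still be a false positive.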

- Phil


crobbers