Decoding Insta360 records doubts

Cheloute · February 12, 2023, 06:04:13 PM

Hi,

Yesterday I saw a 2019 thread here in which Phil and some members of this community were trying to decode Insta360 records (0x300, 0x200, 0x101, ...), starting with a OneX video sample. Unfortunately, that sample is no longer available, so it can't help in understanding how to decode this data.

I have an EVO camera and can parse the .insv files correctly with exiftool, but I can't understand how to locate and decode (I mean manually) those records. Could anyone explain it to me?

My first question is about location. I found that the gyro data are included in the last atom. I suppose that 0x300, 0x200, ... refer to "03 00", "02 00", ... hex values. By analogy with the CAMM spec, 0x300 would be the accelerometer and 0x200 the gyro, ... (but maybe not?). So my question is: is the location of these samples defined anywhere? With CAMM, for instance, there is a dedicated track with sample tables that point into mdat. Here the data are not in mdat, and it looks like a proprietary format. Has anyone found how these data are indexed?

Next, when processing CAMM data, exiftool is very handy because the -v3 option shows which offset and hex value correspond to each decoded value. Here, even with -v5, I can "only" see where the block holding all the merged data starts, plus the decoded values. But I can't figure out the pattern of each kind of record (size and layout) well enough to recalculate the values myself.

If someone could take a moment to explain how to decode this last atom, that would be great.

Thanks!

Phil Harvey · February 12, 2023, 09:36:34 PM

A lot of progress has been made since 2019. I think you will find that all of this metadata should be decoded with the current version of ExifTool (12.56). To learn how this is decoded, look at the camm tables and the ProcessCAMM function in lib/Image/ExifTool/QuickTimeStream.pl.

- Phil

Cheloute · February 13, 2023, 07:46:25 AM

Hi, thanks Phil for pointing me to the source code that shows how exiftool does the job.

The function I was looking for is ProcessInsta360 in the same file.

I'm not familiar with Perl, but as far as I understand the code, it reads the last atom from end to beginning, right?
Reading it this way, I better understand where to find the 0x101 tag, its size, and the block holding the camera info. That's great! I'm going to keep decoding other samples and try to understand better how this works.
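Below is a minimal C sketch of that end-to-beginning walk. C is used because, as noted later in the thread, the Perl logic carries over almost directly. The sketch assumes that each record is stored as its data followed by a 6-byte footer: a little-endian 16-bit record ID, then a little-endian 32-bit data length. This layout should be checked against ProcessInsta360. walk_records_backward, RecordInfo, and the in-memory buffer are hypothetical. Finding where the record area ends in a real .insv file is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one record found by the walk. */
typedef struct {
    uint16_t id;      /* record type, e.g. 0x101, 0x200, 0x300 */
    size_t offset;    /* offset of the record data within buf */
    uint32_t len;     /* length of the record data in bytes */
} RecordInfo;

static uint16_t rd_u16le(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd_u32le(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walk records backward, starting at position `end` in `buf`.
 * ASSUMED layout per record: [data: len bytes][id: u16 LE][len: u32 LE].
 * Stores at most max_out records (last record first) and returns the count. */
size_t walk_records_backward(const unsigned char *buf, size_t end,
                             RecordInfo *out, size_t max_out) {
    size_t n = 0;
    size_t pos = end;
    while (n < max_out && pos >= 6) {
        const unsigned char *footer = buf + pos - 6;
        uint16_t id = rd_u16le(footer);
        uint32_t len = rd_u32le(footer + 2);
        if (len > pos - 6) break;       /* data would start before buf: stop */
        pos -= 6 + (size_t)len;         /* move to the start of this record's data */
        out[n].id = id;
        out[n].offset = pos;
        out[n].len = len;
        n++;
    }
    return n;
}
```

Because the footer sits after the data, the length has to be read before the data can be found. That is why the walk naturally runs from the end toward the beginning.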

Thanks for your help and, uh, congratulations on figuring this out. That's very interesting!

blue-j · February 13, 2023, 02:56:08 PM

Cheloute, might you share some sample files for us to examine?

- J

Phil Harvey · February 13, 2023, 02:59:08 PM

Quote from: Cheloute on February 13, 2023, 07:46:25 AM: "it reads the last atom from end to beginning, right?"

Right.

-Phil

Cheloute · February 27, 2023, 07:37:24 AM

Hi,

I have another question about this. What does a 0x200 sample actually represent?
I can't understand this piece of code:

elsif ($id == 0x200) {
  $et->FoundTag(PreviewImage => $buff);
}

I also saw that the code has nothing for decoding 0x500 samples. Is that intentional, or has nobody found a way to decode them?

Thanks!

StarGeek · February 27, 2023, 10:18:47 AM

Quote from: Cheloute on February 27, 2023, 07:37:24 AM: "I saw in the code there's nothing on how to decode 0x500 samples. Is it intentional or nobody found a way to decode them?"

Most camera companies do not share the internals of their MakerNotes, and some deliberately obfuscate the data. Additionally, some of the data in the MakerNotes may not have any useful purpose or, I'm guessing here, may be a placeholder for a potential future change.

If you look through the version history and older version history, you'll see references to various sources where something has been decoded. For example, you'll see a lot of references to LibRaw, exiv2, and Greybeard (who is a poster on these forums).

So if exiftool doesn't decode something, it's because nobody has had the time to decode it, nobody has been able to decode it, or its purpose hasn't been found. Nothing is being intentionally hidden.

StarGeek · February 27, 2023, 10:21:30 AM

As for the code, it looks to me like it finds the PreviewImage (see the Composite tags page). That is a smaller embedded image, usually a JPEG, which programs can extract to display thumbnails quickly without having to read and decode the whole image.

But Phil will have to clarify that.

Phil Harvey · February 27, 2023, 01:26:07 PM

Quote from: StarGeek on February 27, 2023, 10:21:30 AM: "that looks to me that finds the PreviewImage (see the Composite tags page), which is an smaller embedded image, usually a jpg"

Correct.

- Phil
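The C sketch below illustrates the record IDs discussed so far. It attaches informal labels to them and checks whether a 0x200 payload starts with the JPEG start-of-image marker (0xFF 0xD8). The labels come from this thread, not from any Insta360 documentation. record_label and looks_like_jpeg are hypothetical names.

```c
#include <stddef.h>

/* Informal labels for the record IDs mentioned in this thread
 * (not official Insta360 names). */
const char *record_label(unsigned id) {
    switch (id) {
    case 0x101: return "camera info";
    case 0x200: return "preview image";
    case 0x300: return "accelerometer/gyro";
    default:    return "unknown";   /* e.g. 0x500, which had no decoder in this thread */
    }
}

/* A JPEG stream begins with the SOI marker 0xFF 0xD8. */
int looks_like_jpeg(const unsigned char *data, size_t len) {
    return len >= 2 && data[0] == 0xFF && data[1] == 0xD8;
}
```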

Cheloute · March 21, 2023, 04:51:13 AM

Hi,

Sorry, I missed these last messages.
First, thanks for your help; that's clear now.

And, uh... you really did a good job with this metadata. It helps a lot!
Thanks

Cheloute · March 28, 2023, 04:26:08 AM

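if ($len % 20 and not $len % 56) {
    # multiple of 56 but not of 20: 56-byte records
    $dlen = 56;
} elsif ($len % 56 and not $len % 20) {
    # multiple of 20 but not of 56: 20-byte records
    $dlen = 20;
} else {
    # ambiguous (multiple of both, or of neither): peek at the first 20 bytes
    if ($raf->Read($buff, 20) == 20) {
        # three zero bytes at offset 16 are taken to mean 56-byte records
        if (substr($buff, 16, 3) eq "\0\0\0") {
            $dlen = 56;
        } else {
            $dlen = 20;
        }
    }
    # restore the file position (relative to end of file) after peeking
    $raf->Seek($epos, 2) or last;
}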

I may be wrong (as I said, I don't know Perl), but my understanding is this. If the content size of a 0x300 sample is a multiple of 20, then each record inside (timecode, accelerometer[3], angularVelocity[3]) is 56 bytes long. If the content size is a multiple of 56, the records are 20 bytes long (otherwise, ...). Am I right? I find it surprising that a content size that is a multiple of 20 (for instance) gives 56-byte records, because that means some bytes won't be used. The same goes for size % 56 == 0 (record length = 20). Yet I can see elsewhere that these bytes are used. Am I missing something, or did I read the if/elsif conditions backwards?
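The two record sizes can be checked against the fields listed above: 56 = 8 + 6×8 and 20 = 8 + 6×2. That would fit an 8-byte timecode followed by six 8-byte doubles or six 2-byte integers. The C sketch below decodes one record under exactly that assumption: little-endian data, with IEEE-754 doubles on the host. The field order, units, and any scaling of the 20-byte variant are guesses to verify against ProcessInsta360. MotionRecord and decode_motion_record are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded 0x300 record: timecode, accelerometer[3], angularVelocity[3]. */
typedef struct {
    uint64_t timecode;
    double accel[3];
    double gyro[3];
} MotionRecord;

static uint64_t rd_u64le(const unsigned char *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) v = (v << 8) | p[i];
    return v;
}

/* ASSUMED layout: 8-byte LE timecode, then six values that are either
 * LE doubles (56-byte records) or LE signed 16-bit ints (20-byte records,
 * returned unscaled). Returns 0 on success, -1 for an unsupported size. */
int decode_motion_record(const unsigned char *rec, int dlen, MotionRecord *out) {
    double vals[6];
    if (dlen != 56 && dlen != 20) return -1;
    out->timecode = rd_u64le(rec);
    for (int i = 0; i < 6; i++) {
        if (dlen == 56) {
            uint64_t bits = rd_u64le(rec + 8 + 8 * i);
            memcpy(&vals[i], &bits, sizeof vals[i]);   /* assumes IEEE-754 doubles */
        } else {
            const unsigned char *p = rec + 8 + 2 * i;
            vals[i] = (double)(int16_t)(uint16_t)(p[0] | (p[1] << 8));
        }
    }
    for (int i = 0; i < 3; i++) {
        out->accel[i] = vals[i];
        out->gyro[i] = vals[i + 3];
    }
    return 0;
}
```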

By the way, I see that exiftool reports the lens offsets as "param" (for instance: u2_1480.79_1518.71_1500.17_0.187049_0.571009_-179.055_1482.6_4553.57_1517.15_0.252409_0.74121_-0.528465_6080_3040_3105 for footage shot with a One R with the 360 mod). I saw in an older post (https://exiftool.org/forum/index.php?topic=9884.90) that you didn't know what this means. If that's still the case, I can try to explain:

Based on my previous sample:

u: identifies the mode in which the footage was shot. In this case, u means flat 360, but that only applies to the One R. The EVO, for instance, uses "m", and "p" for VR180.
2: may be 1 or 2; it specifies how many lenses were used to shoot the footage.
1480.79 1518.71 1500.17 0.187049 0.571009 -179.055: the offsets used by the "main" lens (6 axes). I haven't identified all of them (I haven't needed to yet), nor their units. The first one zooms in or out. The second seems to adjust the yaw, but I'm not totally sure. The third seems to be the pitch, but I'm not totally sure either. The fourth moves the POV up or down. The fifth seems to move the POV from right to left. The last one is in degrees and adjusts the roll. I'm only sure about the first, fourth, and last values.
1482.6 4553.57 1517.15 0.252409 0.74121 -0.528465: the same as above, but for the second lens.
6080 3040: total resolution of the raw footage shot by the camera. After stitching, the maximum resolution is 5.7K because part of the image is cropped in the process.
3105: no clue. Changing this value clearly affects the footage, but I can't work out what it actually does. It seems to depend on the previous values (the lens offsets).

Thanks!
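The field breakdown above can be turned into a small parser. This C sketch handles only the two-lens form shown in the sample: two blocks of six values followed by three integers. The single-lens layout is not covered because the post doesn't describe it. InstaParam and parse_insta_param are hypothetical names.

```c
#include <stdlib.h>

/* Hypothetical container for the "param" fields (two-lens layout only). */
typedef struct {
    char mode;            /* e.g. 'u', 'm', 'p' */
    int lens_count;       /* always 2 in this sketch */
    double lens[2][6];    /* six offset values per lens */
    long width, height;   /* raw footage resolution */
    long unknown;         /* last field, meaning unknown */
} InstaParam;

/* Parses strings shaped like "u2_1480.79_..._6080_3040_3105".
 * ASSUMES 2 lenses x 6 values + 3 integers; returns 0 on success, -1 otherwise. */
int parse_insta_param(const char *s, InstaParam *p) {
    char *end = NULL;
    if (!s[0] || s[1] != '2' || s[2] != '_') return -1;  /* only the 2-lens form */
    p->mode = s[0];
    p->lens_count = 2;
    s += 3;
    for (int l = 0; l < 2; l++) {
        for (int i = 0; i < 6; i++) {
            p->lens[l][i] = strtod(s, &end);
            if (end == s || *end != '_') return -1;
            s = end + 1;
        }
    }
    long *ints[3] = { &p->width, &p->height, &p->unknown };
    for (int i = 0; i < 3; i++) {
        *ints[i] = strtol(s, &end, 10);
        if (end == s) return -1;
        if (i < 2) {
            if (*end != '_') return -1;
            s = end + 1;
        }
    }
    return *end == '\0' ? 0 : -1;   /* reject trailing characters */
}
```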

Phil Harvey · March 28, 2023, 12:36:45 PM

Quote from: Cheloute on March 28, 2023, 04:26:08 AM: "I may be wrong (as I said, I don't know Perl),"

The code is exactly the same in C, except that "and" is && and "not" is !.

Quote: "I'm understanding that if the content size of a x300 sample is a multiple of 20, then each record inside (timecode, accelerometer[3], angularVelocity[3]) will be defined by bunch of 56 bytes."

No. If it is a multiple of 20 and not a multiple of 56, the records are 20 bytes long (the % operator returns non-zero if the value is not an exact multiple). If it is a multiple of 56 and not of 20, the records are 56 bytes long. Otherwise I read 20 bytes and use the content to determine the record size: if bytes 16 to 18 are all zero, the records are taken to be 56 bytes long, otherwise 20.
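Since the Perl maps directly onto C, the size selection can be written as a small C function. The branches follow the quoted Perl. choose_record_len and the convention of passing NULL when the 20-byte peek fails are hypothetical.

```c
#include <stddef.h>

/* Pick 56- or 20-byte records for a 0x300 block of `len` bytes, mirroring the
 * Perl branches. `first` points to the first 20 bytes of record data, or is
 * NULL if they could not be read. Returns 0 if the size cannot be decided. */
int choose_record_len(size_t len, const unsigned char *first) {
    if ((len % 20) && !(len % 56)) return 56;   /* multiple of 56 only */
    if ((len % 56) && !(len % 20)) return 20;   /* multiple of 20 only */
    if (first == NULL) return 0;                /* the Perl fragment assigns no $dlen here */
    /* ambiguous: three zero bytes at offset 16 indicate 56-byte records */
    return (first[16] == 0 && first[17] == 0 && first[18] == 0) ? 56 : 20;
}
```

For example, a 40-byte block gives 20-byte records and a 112-byte block gives 56-byte records. A 280-byte block (a multiple of both) falls through to the check of bytes 16 to 18.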

Quote: "By the way, I see exiftool report the lenses offset as "param"..."

Thanks. I've added a note in the documentation.

- Phil
