Hello.
Please tell me what encoding is used for makernotes tags in cr2-files?
I noticed that this is not UTF-8, but rather ANSI (Windows 10).
Can I read more about this somewhere?
If the encoding is known then ExifTool will convert it to UTF8 by default, or whatever you set using the -charset option.
If ExifTool is not doing this properly for some makernote tags, then email me a sample file and I'll take a look (philharvey66 at gmail.com)
- Phil
Thanks, Phil.
I run this command:
exiftool -charset filename="" -charset Cyrillic -charset exif=UTF8 -makernotes:ownername="© Автор Author" -exif:ownername="© Автор Author" -xmp-exifex:ownername="© Автор Author" -ext cr2 .
Two bytes per symbol
(https://exiftool.org/forum/index.php?action=dlattach;topic=10984.0;attach=3558)
One byte per symbol
(https://exiftool.org/forum/index.php?action=dlattach;topic=10984.0;attach=3559)
Two bytes per symbol
(https://exiftool.org/forum/index.php?action=dlattach;topic=10984.0;attach=3560)
I'll send you a photo now.
Is this wrong? Do you know what character set Canon uses? If the character set is not known, then no translation is done, and the Cyrillic character is stored directly.
- Phil
Quote: Do you know what character set Canon uses?
I don't understand what this has to do with Canon. This was not recorded by the camera, this was recorded by me using ExifTool. Or I don't understand something... I expected 2 bytes everywhere.
It is Canon that defines the string encoding in their maker notes.
- Phil
Quote from: Phil Harvey on March 26, 2020, 09:57:39 AM
It is Canon that defines the string encoding in their maker notes.
Are you referring to some specification where ExifTool writes one byte per character (ANSI) to these tags?
My system passes cp1251 to ExifTool, so I use -charset Cyrillic, and I expected ExifTool to convert everything to UTF-8.
It was unexpected for me to see only A9 (cp1251) for the copyright mark, instead of C2A9 (UTF-8).
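For reference, the byte values seen in the hex editor can be reproduced outside ExifTool. The following is a minimal Python sketch that only uses the standard codecs and the string from the command above; it says nothing about how ExifTool itself works internally.

```python
# The string written by the command above.
text = "© Автор Author"

# One byte per character in cp1251 (Windows Cyrillic).
cp1251_bytes = text.encode("cp1251")

# One byte for ASCII, two bytes for "©" and each Cyrillic letter in UTF-8.
utf8_bytes = text.encode("utf-8")

print(cp1251_bytes.hex(" "))   # starts with a9
print(utf8_bytes.hex(" "))     # starts with c2 a9
print(len(cp1251_bytes), len(utf8_bytes))
```

The copyright sign is the single byte A9 in cp1251 and the pair C2 A9 in UTF-8, which matches the difference between the one-byte and two-byte screenshots.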
This is really simple:
ExifTool can't convert something to UTF-8 if it doesn't know the encoding that was used in the file.
- Phil
But there was nothing in the file, it was ExifTool that wrote the line © Автор Author there.
But in some tags it wrote this string in UTF-8, while in -makernotes:ownername it seems to have written it in cp1251. Same command, but different results for different tags. I can't figure out why.
Quote from: Andrei Korzhyts on March 26, 2020, 10:47:21 AM
But there was nothing in the file, it was ExifTool that wrote the line © Автор Author there.
But when reading the file back again, ExifTool doesn't know who wrote the metadata.
(just think this through and you should realize why it is Canon that defines the encoding)
- Phil
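The point about reading the file back can be illustrated with a small Python sketch. It assumes the bytes in the file are the cp1251 form of "Автор" and shows that the same bytes decode to different text depending on which character set the reader assumes. The choice of Latin-1 as the "wrong" guess is only an example.

```python
# Bytes as they would sit in the file if "Автор" was stored as cp1251.
raw = "Автор".encode("cp1251")

# A reader that assumes cp1251 recovers the original text.
as_cp1251 = raw.decode("cp1251")

# A reader that assumes Latin-1 gets different characters from the same bytes.
as_latin1 = raw.decode("latin-1")

print(raw.hex(" "))
print(as_cp1251)
print(as_latin1)
```

Nothing in the bytes themselves says which interpretation is correct, so a reader can only convert them to UTF-8 reliably if the encoding for that tag is defined somewhere.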
I think we're talking about different things. Some kind of misunderstanding.
I'm looking at the result in a hex editor and I don't understand why some tags have 2 bytes per character, and others (makernotes) have one. The hex editor shows the result in cp1251.
Okay. What do you mean when you say that Canon defines the encoding? What does that look like technically?
Yes, we certainly don't understand each other.
You have read FAQ 10, so you know that different types of metadata use different encodings. So when you are looking at the raw data in a file, the copyright character will be encoded as 2 bytes for some metadata, and 1 for others, and beyond this the bytes may be different for different types of metadata.
If the encoding is not known, as with this Canon tag, ExifTool just passes along whatever it is given without recoding it. In this case it is up to the user to write whatever encoding they feel is right. In your case you are using Cyrillic (cp1251), so that is what is written.
- Phil
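The behaviour described in this last reply can be modelled with a short Python sketch. This is a toy model under stated assumptions, not ExifTool's actual code: the function `store_value` and its parameters are hypothetical names invented for illustration, and `tag_charset=None` stands for "encoding not known for this tag".

```python
from typing import Optional


def store_value(value: bytes, input_charset: str,
                tag_charset: Optional[str]) -> bytes:
    """Hypothetical model: return the bytes that end up in the file."""
    if tag_charset is None:
        # Encoding of the tag is unknown: pass the input through unchanged.
        return value
    # Encoding of the tag is known: recode from the input charset.
    return value.decode(input_charset).encode(tag_charset)


# The shell hands over cp1251 bytes (the -charset Cyrillic case).
typed = "©".encode("cp1251")

known = store_value(typed, "cp1251", "utf-8")   # tag with a known UTF-8 encoding
unknown = store_value(typed, "cp1251", None)    # tag with no defined encoding

print(known.hex(" "))
print(unknown.hex(" "))
```

In this model the same input produces C2 A9 for a tag whose encoding is known to be UTF-8 and a bare A9 for a tag with no defined encoding, which is the pattern seen in the hex editor.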