A recent discussion regarding Exiv2 involved a specific image which was intended to have a UserComment field inserted by ExifPro.
The intended string was "äöüßÄÖÜ Exif-Usercomment", and from what I can tell, the field was written as it ought to have been, according to my current understanding of the Exif metadata spec.
The reason I am asking about this image here is that I use the HTML output from ExifTool to view the hex data. (FWIW, kudos to Phil for providing that option: it is THE command line option I use most.)
My question is about the text representation of this field in the HTML output.
My understanding is that the code points in this field are supposed to be UCS-2, but ExifTool (12.05, library 11.63; I still have not resolved that version mismatch, in case it matters) does not display the umlaut characters, even though the hex codes correspond to the proper UCS-2 characters according to http://www.columbia.edu/kermit/ucs2.html
The output shows the proper characters for the rest of the string, but not the special characters.
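For reference, the UCS-2 bytes of the intended string can be checked directly against that table. The sketch below assumes Python 3.8+ and relies on the fact that, for characters in the Basic Multilingual Plane (which covers every character in this string), UTF-16 without a byte-order mark produces exactly the UCS-2 bytes. `ucs2_bytes` is a hypothetical helper for illustration, not part of ExifTool.

```python
# Encode the intended UserComment text as UCS-2 and show each special character's bytes.
TEXT = "äöüßÄÖÜ Exif-Usercomment"


def ucs2_bytes(text: str, big_endian: bool = True) -> bytes:
    """Encode text as UCS-2 (two bytes per character, no byte-order mark)."""
    for ch in text:
        if ord(ch) > 0xFFFF:
            # UCS-2 has no surrogate pairs, so it cannot represent these characters.
            raise ValueError(f"{ch!r} cannot be represented in UCS-2")
    # For BMP-only text, BOM-less UTF-16 is byte-for-byte identical to UCS-2.
    return text.encode("utf-16-be" if big_endian else "utf-16-le")


if __name__ == "__main__":
    for ch in TEXT[:7]:
        be = ucs2_bytes(ch).hex(" ")
        le = ucs2_bytes(ch, big_endian=False).hex(" ")
        print(f"{ch}  U+{ord(ch):04X}  BE: {be}  LE: {le}")
```

Running it prints one line per special character, for example `ä  U+00E4  BE: 00 e4  LE: e4 00`, which can be compared with the Kermit table and with the hex column of the dump.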
I am confused. You say that you are using the HTML (-h I presume) output, which is this for the file you sent:
<tr><td>User Comment</td><td>äöüßÄÖÜ Exif-Usercomment</td></tr>
As far as I can tell, this is correct. If it isn't displayed properly, then isn't this a problem with your browser rather than ExifTool?
- Phil
:-\ arghhh another potential wrinkle.
But actually, I am referring to the output from the -htmldump option.
The -htmlDump option shows only raw unformatted data.
- Phil
I was hoping to see the characters ä..ß...Ü in the right-hand text column as well, just as Exif-Usercomment shows up there, whether the text is treated as UCS-2 or UTF-16.
Just as a matter of curiosity, which of those two does ExifTool assume? Or can it handle either of these and more?
UserComment is stored in binary format. If the first 8 bytes are "UNICODE\0", then ExifTool interprets the following bytes as UCS-2 text, as per the specification. But the "Value:" line of the -htmldump tooltip shows the binary data (including the leading "UNICODE\0") represented in ASCII form.
- Phil
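The format Phil describes can be sketched as a small decoder. This is a minimal illustration, not ExifTool's code: `decode_user_comment` is a hypothetical name, and it assumes the UCS-2 payload uses the byte order of the file's TIFF header ("MM" for big-endian, "II" for little-endian). Only the UNICODE and ASCII character codes are handled explicitly; anything else falls back to Latin-1 so that every byte stays visible.

```python
# Character-code prefixes defined for UserComment by the Exif spec (8 bytes each).
UNICODE_CODE = b"UNICODE\x00"
ASCII_CODE = b"ASCII\x00\x00\x00"


def decode_user_comment(value: bytes, big_endian: bool) -> str:
    """Decode a raw UserComment value.

    Assumption: big_endian reflects the file's TIFF header byte order
    ("MM" -> True, "II" -> False) and applies to the UCS-2 payload.
    """
    if len(value) < 8:
        raise ValueError("UserComment is shorter than its 8-byte character code")
    code, payload = value[:8], value[8:]
    if code == UNICODE_CODE:
        # UCS-2 is two bytes per character. For characters up to U+FFFF it is
        # identical to UTF-16 (they differ only above U+FFFF, where UTF-16 uses
        # surrogate pairs), so the UTF-16 codec decodes it. Drop an odd trailing byte.
        payload = payload[: len(payload) // 2 * 2]
        text = payload.decode("utf-16-be" if big_endian else "utf-16-le")
    elif code == ASCII_CODE:
        text = payload.decode("ascii", errors="replace")
    else:
        # JIS and undefined codes are not handled in this sketch; Latin-1 maps every byte.
        text = payload.decode("latin-1")
    # Trailing NULs and spaces are commonly used as padding.
    return text.rstrip("\x00 ")


if __name__ == "__main__":
    raw = UNICODE_CODE + "äöüßÄÖÜ Exif-Usercomment".encode("utf-16-be")
    print(decode_user_comment(raw, big_endian=True))
```

The decoded string is the text; the "Value:" tooltip in -htmldump, by contrast, shows the same bytes (header included) before any of this decoding happens.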
Understood, but the attached screenshot may clarify what I am referring to.
OK. So you are referring to the ASCII dump column, not the Value in the popup.
- Phil
Both, if that is possible :-)
My point is that both are showing the ASCII representation. There is no reasonable way to show special characters here.
- Phil
Since I have never tried anything as complicated as what you have achieved with this htmldump (it has been very helpful in my understanding of the underlying data), I can't really argue that point.
Still, I can't help but wonder if, given a particular code page for translation, it ought to be possible.
After all, these sorts of characters are regularly displayed on all sorts of HTML pages.
Whether that also applies to the pop-up, I have no idea.
Technically, it is easy to display special characters in both displays. The thing is that both displays show individual bytes, but these are multi-byte characters. Also, the same display is used for data that isn't characters at all, and for characters from different character sets. It would be a real mess if I tried to display non-ASCII characters.
- Phil
Not trying to be a pest, but I don't understand the difference between the displayed data for the rest of the string ".E.x.i.f.-.U ......." and the other characters ä.........Ü. Each 2-byte UCS-2/UTF-16 character takes one spot in the display, and the extra bytes are signified/replaced by '.'.
One good reason: It would be very difficult to pick out ASCII text from mixed binary/text data. By setting all bytes above 127 to ".", ASCII strings are easy to pick out. Also, there is the question of what character set to use, which is a real pickle. Think about it, and read FAQ 10 (https://exiftool.org/faq.html#Q10), and you'll get an idea of the depth of this rabbit hole.
- Phil
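The difference asked about above comes from the dump column working per byte, not per character. In big-endian UCS-2, "E" is the two bytes 00 45 and "ä" is 00 E4, so "E" appears as ".E" while "ä" appears as "..", because 0xE4 is not a printable ASCII byte. A minimal sketch of such a dump, assuming the common hex-dump convention of printing bytes 0x20 to 0x7E as themselves and everything else as "." (a convention, not taken from ExifTool's source); `hex_dump` and `ascii_column` are hypothetical helpers, and the payload is assumed big-endian.

```python
from typing import List


def ascii_column(chunk: bytes) -> str:
    """Printable ASCII bytes (0x20-0x7E) as themselves, every other byte as '.'."""
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)


def hex_dump(data: bytes, width: int = 16) -> List[str]:
    """Offset / hex / ASCII dump, one output line per `width` input bytes."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Pad the hex part so the ASCII column lines up on a short final row.
        lines.append(f"{offset:04x}  {hex_part:<{width * 3 - 1}}  {ascii_column(chunk)}")
    return lines


if __name__ == "__main__":
    value = b"UNICODE\x00" + "äöüßÄÖÜ Exif-Usercomment".encode("utf-16-be")
    for line in hex_dump(value):
        print(line)
```

Seen this way, the umlauts are not being dropped: their bytes are in the hex column, and only the ASCII column declines to guess a character set for them.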
Very much appreciate your patience.
Evidently you have thought much more about the twists and turns involved than I have :D