ExifTool HTML dump & UserComment

Started by ScannerBoy, September 08, 2020, 02:08:34 PM


ScannerBoy

A recent discussion regarding Exiv2 gave a specific image which was intended to have a UserComment field inserted by ExifPro.
The intended string was "äöüßÄÖÜ Exif-Usercomment" and from what I can tell, the field was written as it ought to have been, according to my current understanding of the Exif metadata spec.
The reason I am asking about this image and issue here, is because I use the HTML output from Exiftool to view the hex data - and FWIW,  kudos to Phil for providing that option - it is THE most used command line option for me.
My question is about the text representation of this field in the HTML output.

My understanding is that the code points in this field are to be UCS-2, but Exiftool (12.05, library 11.63 - I still have not resolved that issue, in case it matters) does not display the umlaut characters, even though the hex codes correspond to the proper UCS-2 characters - according to http://www.columbia.edu/kermit/ucs2.html
The output shows the proper characters for the rest of the string, but not the special characters.

Phil Harvey

I am confused.  You say that you are using the HTML (-h I presume) output, which is this for the file you sent:

<tr><td>User Comment</td><td>&auml;&ouml;&uuml;&szlig;&Auml;&Ouml;&Uuml; Exif-Usercomment</td></tr>

As far as I can tell, this is correct.  If it isn't displayed properly, then isn't this a problem with your browser, not ExifTool?

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

ScannerBoy

 :-\ arghhh another potential wrinkle.
But, actually I am referring to output from the -htmldump option

Phil Harvey

The -htmlDump option shows only raw unformatted data.

- Phil

ScannerBoy

I was looking for the characters ä..ß...Ü to also show in the right hand text field, just as Exif-Usercomment does, whether the text is considered UCS-2 or UTF-16.

Just as a matter of curiosity, which one of those 2 does Exiftool assume? Or can it handle either of these & more?

Phil Harvey

UserComment is stored in binary format.  If the first 8 bytes are "UNICODE\0", then ExifTool interprets the following bytes as UCS-2 text, as per the specification.  But the "Value:" line of the -htmldump tooltip shows the binary data (including the leading "UNICODE\0") represented in ASCII form.

- Phil
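
As an illustration of the layout described above, here is a minimal Python sketch (not ExifTool's code). The helper name `decode_user_comment` and the `big_endian` parameter are hypothetical; the byte order of the text is assumed to be supplied by the caller, and Python's UTF-16 codec is used as a stand-in for UCS-2, which it matches for characters in the Basic Multilingual Plane.

```python
def decode_user_comment(raw: bytes, big_endian: bool = False) -> str:
    """Decode an Exif UserComment payload (hypothetical helper, for illustration)."""
    # The first 8 bytes name the character code; the text follows.
    prefix, body = raw[:8], raw[8:]
    if prefix == b"UNICODE\x00":
        # Assumption: the caller knows the byte order of the stored text.
        codec = "utf-16-be" if big_endian else "utf-16-le"
        return body.decode(codec, errors="replace")
    if prefix == b"ASCII\x00\x00\x00":
        return body.decode("ascii", errors="replace")
    # Unknown or undefined character code: show a byte-per-character view.
    return raw.decode("latin-1")


# Build a sample payload the way the thread describes it being stored.
sample = b"UNICODE\x00" + "äöüßÄÖÜ Exif-Usercomment".encode("utf-16-le")
print(decode_user_comment(sample))
```

The same bytes viewed without decoding still begin with the literal "UNICODE\0" marker, which is what the "Value:" line of the tooltip is reported to show.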

ScannerBoy

Understood, but the attached screenshot may clarify what I am referring to

Phil Harvey

OK.  So you are referring to the ASCII dump column, not the Value in the popup.

- Phil

ScannerBoy


Phil Harvey

My point is that both are showing the ASCII representation.  There is no reasonable way to show special characters here.

- Phil

ScannerBoy

Since I have never tried anything as complicated as you have achieved with this htmldump - it has been very helpful in my understanding of the underlying data - I can't really argue on that point.

Still, I can't help but wonder if, given a particular code page for translation, it ought to be possible.
After all, these sorts of characters are regularly displayed on all sorts of HTML pages.
Whether that also applies to the pop-up, I have no idea.

Phil Harvey

Technically, it is easy to display special characters in both displays.  The thing is that both of these displays are showing individual bytes, but these are multi-byte characters.  Also, the same display is used for data that isn't characters at all, and for characters from different character sets.  It would be a real mess if I tried to display non-ASCII characters.

- Phil

ScannerBoy

Not trying to be a pest, but I don't understand the difference between the displayed data for the rest of the string ".E.x.i.f.-.U ......." and the other chars ä.........Ü. Each 2 byte UCS-2/UTF-16 char takes one spot in the display and the extra bytes are signified/replaced by '.'

Phil Harvey

One good reason:  It would be very difficult to pick out ASCII text from mixed binary/text data.  By setting all characters above 255 to ".", ASCII strings are easy to pick out. Also, there is the question of what character set to use, which is a real pickle.  Think about it, and read FAQ 10, and you'll get an idea of the depth of this rabbit hole.

- Phil
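
The effect discussed in this thread can be reproduced with a few lines of Python. This is only a sketch under assumptions: `ascii_column` is a hypothetical helper, the rule "printable ASCII stays, everything else becomes '.'" is an assumed approximation of the dump column rather than ExifTool's actual code, and big-endian byte order is chosen to match the ".E.x.i.f." pattern quoted above.

```python
def ascii_column(data: bytes) -> str:
    """Render each byte as one character, as a byte-oriented hex dump would."""
    # Assumed rule: printable ASCII (0x20..0x7E) is shown, anything else is ".".
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in data)


raw = "äÜ Exif".encode("utf-16-be")
print(raw.hex(" "))       # 00 e4 00 dc 00 20 00 45 00 78 00 69 00 66
print(ascii_column(raw))  # ..... .E.x.i.f
```

Both bytes of "ä" (00 e4) and "Ü" (00 dc) fall outside printable ASCII, so each becomes ".", while the low byte of "E", "x", "i" and "f" is itself an ASCII letter and survives. Showing 0xE4 as "ä" would require picking a single-byte character set for data that may not be text at all.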

ScannerBoy

Very much appreciate your patience.
Evidently you have thought much more about the twists and turns involved, than I have  :D