ExifTool Forum

ExifTool => The "exiftool" Application => Topic started by: ScannerBoy on September 08, 2020, 02:08:34 PM

Title: ExifTool HTML dump & UserComment
Post by: ScannerBoy on September 08, 2020, 02:08:34 PM
A recent discussion regarding Exiv2 referenced a specific image that was meant to have a UserComment field written by ExifPro.
The intended string was "äöüßÄÖÜ Exif-Usercomment" and, from what I can tell, the field was written as it should have been, according to my current understanding of the Exif metadata spec.
I am asking about this image here because I use the HTML output from ExifTool to view the hex data (and FWIW, kudos to Phil for providing that option: it is THE command-line option I use most).
My question is about the text representation of this field in the HTML output.

My understanding is that the code points in this field are supposed to be UCS-2, but ExifTool (12.05, library 11.63; I still have not resolved that issue, in case it matters) does not display the umlaut characters, even though the hex codes correspond to the proper UCS-2 characters according to http://www.columbia.edu/kermit/ucs2.html
The output shows the proper characters for the rest of the string, but not the special characters.
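For reference, a small Python sketch (not part of ExifTool) that prints the 2-byte UCS-2 value one would expect to see in the hex dump for each special character of the intended string. Big-endian byte order is an assumption here; the order actually stored depends on how the file was written.

```python
# Print each special character of the intended UserComment with its code point
# and its 2-byte UCS-2 encoding (big-endian order assumed for this sketch).
intended = "äöüßÄÖÜ Exif-Usercomment"


def ucs2_be_hex(ch: str) -> str:
    """Return the big-endian UCS-2 bytes of one BMP character as spaced hex."""
    # Python has no "ucs-2" codec; for BMP characters UTF-16BE yields
    # exactly the same two bytes.
    return ch.encode("utf-16-be").hex(" ")


for ch in intended[:7]:
    print(f"{ch}  U+{ord(ch):04X}  {ucs2_be_hex(ch)}")
```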
Post by: Phil Harvey on September 08, 2020, 02:26:05 PM
I am confused.  You say that you are using the HTML (-h I presume) output, which is this for the file you sent:

<tr><td>User Comment</td><td>&auml;&ouml;&uuml;&szlig;&Auml;&Ouml;&Uuml; Exif-Usercomment</td></tr>

As far as I can tell, this is correct.  If it isn't displayed properly, then isn't this a problem with your browser, not ExifTool?

- Phil
Post by: ScannerBoy on September 08, 2020, 04:16:03 PM
 :-\ arghhh another potential wrinkle.
But actually, I am referring to the output of the -htmldump option.
Post by: Phil Harvey on September 08, 2020, 09:08:50 PM
The -htmlDump option shows only raw unformatted data.

- Phil
Post by: ScannerBoy on September 09, 2020, 12:08:52 PM
I was hoping to see the characters ä..ß...Ü also shown in the right-hand text column, just as "Exif-Usercomment" is, whether the text is considered UCS-2 or UTF-16.

Just out of curiosity, which of those two does ExifTool assume? Or can it handle either of these, and more?
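For this particular string the UCS-2 vs. UTF-16 distinction makes no difference: every character lies in the Basic Multilingual Plane, where both encodings use the same two bytes per character. The two only diverge for characters above U+FFFF, which UTF-16 stores as a 4-byte surrogate pair and UCS-2 cannot represent at all. A minimal Python sketch of that check (the helper name `fits_ucs2` is hypothetical):

```python
def fits_ucs2(text: str) -> bool:
    """True if every character is in the BMP and is not a surrogate code point,
    i.e. UCS-2 and UTF-16 would produce identical bytes for this text."""
    return all(ord(c) <= 0xFFFF and not 0xD800 <= ord(c) <= 0xDFFF for c in text)


intended = "äöüßÄÖÜ Exif-Usercomment"
print(fits_ucs2(intended))                # True
print(len(intended.encode("utf-16-be")))  # 48 (24 characters x 2 bytes)

# A character outside the BMP needs a surrogate pair (4 bytes) in UTF-16.
emoji = "\U0001F600"
print(fits_ucs2(emoji), len(emoji.encode("utf-16-be")))  # False 4
```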
Post by: Phil Harvey on September 09, 2020, 02:58:06 PM
UserComment is stored in binary format.  If the first 8 bytes are "UNICODE\0", then ExifTool interprets the following bytes as UCS-2 text, as per the specification.  But the "Value:" line of the -htmldump tooltip shows the binary data (including the leading "UNICODE\0") represented in ASCII form.

- Phil
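The reading rule described above can be sketched in Python as follows. This is an illustration, not ExifTool's implementation: the caller supplies the byte order of a UNICODE payload (assumed here to follow the file's TIFF header), treating an all-zero "undefined" header as ASCII is an assumption, and JIS is not handled. The constant and function names are hypothetical.

```python
# 8-byte character-code headers used by Exif UserComment.
ASCII_ID = b"ASCII\0\0\0"
UNICODE_ID = b"UNICODE\0"
UNDEFINED_ID = b"\0" * 8


def decode_user_comment(raw: bytes, big_endian: bool = True) -> str:
    """Decode a raw UserComment value (illustrative sketch only).

    The byte order of a UNICODE payload is passed in by the caller; this
    sketch assumes it matches the byte order of the file's TIFF header.
    """
    header, payload = raw[:8], raw[8:]
    if header == UNICODE_ID:
        # Drop a stray odd byte so the 2-byte decoder does not fail.
        payload = payload[: len(payload) // 2 * 2]
        # Python has no UCS-2 codec; UTF-16 gives the same text for BMP characters.
        codec = "utf-16-be" if big_endian else "utf-16-le"
        return payload.decode(codec).rstrip("\0")
    if header in (ASCII_ID, UNDEFINED_ID):
        # Undefined-code comments are treated as ASCII here (an assumption).
        return payload.decode("ascii", errors="replace").rstrip("\0")
    # JIS and unrecognized headers are left out of this sketch.
    raise ValueError(f"unsupported character code {header!r}")


raw = UNICODE_ID + "äöüßÄÖÜ Exif-Usercomment".encode("utf-16-be")
print(decode_user_comment(raw))
```

Note that the 8-byte header is part of the stored value, which is why a raw byte view of the tag starts with "UNICODE" followed by a null.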
Post by: ScannerBoy on September 09, 2020, 03:59:47 PM
Understood, but the attached screenshot may clarify what I am referring to.
Post by: Phil Harvey on September 09, 2020, 09:38:24 PM
OK.  So you are referring to the ASCII dump column, not the Value in the popup.

- Phil
Post by: ScannerBoy on September 10, 2020, 01:01:49 PM
Both, if that is possible :-)
Post by: Phil Harvey on September 10, 2020, 02:51:43 PM
My point is that both are showing the ASCII representation.  There is no reasonable way to show special characters here.

- Phil
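To make the byte-versus-character point concrete, here is a minimal Python sketch of a conventional hex-dump ASCII column. The rule used (bytes 0x20 to 0x7E shown as themselves, every other byte as ".") is an assumption for illustration; ExifTool's own -htmlDump rendering may differ in detail. Because each UCS-2 character is two bytes, "E" (00 45) becomes ".E", while "ä" (00 E4) has no printable byte and becomes "..".

```python
def ascii_column(data: bytes) -> str:
    """Render bytes the way a classic hex dump's ASCII column does:
    printable ASCII (0x20 to 0x7E) as itself, any other byte as '.'."""
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in data)


# Big-endian UCS-2 bytes of the comment (byte order assumed for this sketch).
text = "äöüßÄÖÜ Exif-Usercomment"
print(ascii_column(b"UNICODE\0" + text.encode("utf-16-be")))
```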
Post by: ScannerBoy on September 10, 2020, 03:55:51 PM
Since I have never attempted anything as complicated as what you have achieved with -htmldump (it has been very helpful in my understanding of the underlying data), I can't really argue that point.

Still, I can't help but wonder whether, given a particular code page for translation, it ought to be possible.
After all, these sorts of characters are regularly displayed on all kinds of HTML pages.
Whether that also applies to the pop-up, I have no idea.
Post by: Phil Harvey on September 10, 2020, 09:05:23 PM
Technically, it is easy to display special characters in both displays.  The thing is that both displays show individual bytes, but these are multi-byte characters.  Also, the same display is used for data that isn't characters at all, and for characters from different character sets.  It would be a real mess if I tried to display non-ASCII characters.

- Phil
Post by: ScannerBoy on September 11, 2020, 12:48:03 PM
Not trying to be a pest, but I don't understand the difference between the displayed data for the rest of the string (".E.x.i.f.-.U .......") and the other characters ä.........Ü. Each 2-byte UCS-2/UTF-16 character takes one spot in the display, and the extra bytes are signified/replaced by '.'
Post by: Phil Harvey on September 11, 2020, 12:53:44 PM
One good reason:  It would be very difficult to pick out ASCII text from mixed binary/text data.  By setting all characters above 255 to ".", ASCII strings are easy to pick out. Also, there is the question of what character set to use, which is a real pickle.  Think about it, and read FAQ 10 (https://exiftool.org/faq.html#Q10), and you'll get an idea of the depth of this rabbit hole.

- Phil
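The character-set problem can be illustrated with a short Python sketch: the two bytes that encode "ä" in big-endian UCS-2 decode to different text depending on which character set and byte order a per-byte display assumes. The codec list is chosen only for illustration, and the helper name `interpretations` is hypothetical.

```python
def interpretations(data: bytes) -> dict:
    """Decode the same bytes under several character-set assumptions."""
    codecs = ("utf-16-be", "utf-16-le", "latin-1", "mac_roman")
    return {name: data.decode(name, errors="replace") for name in codecs}


# 'ä' stored as big-endian UCS-2 is the byte pair 00 E4.
for name, text in interpretations(b"\x00\xe4").items():
    print(f"{name:10} -> {text!r}")
```

Only one of these readings is the intended "ä"; a dump that knows nothing about the surrounding data has no reliable way to pick it.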
Post by: ScannerBoy on September 11, 2020, 05:11:27 PM
Very much appreciate your patience.
Evidently you have thought much more about the twists and turns involved than I have  :D