How does LR2 interpret the IPTC 1:90 tag?

Started by Archive, May 12, 2010, 08:54:39 AM


Archive

[Originally posted by murat on 2009-08-12 07:14:55-07]

Hello Phil and ExifTool community,

Does anybody know how Lightroom 2 interprets IPTC CodedCharacterSet (1:90)?

For example:

I have an image with only IPTC:City="some national text". I converted it to UTF8 and added IPTC:CodedCharacterSet=UTF8.

In my opinion, this means that LR should convert the city from UTF8 to Unicode or the default codepage and display it correctly. But LR ignores CodedCharacterSet and displays the text as raw UTF8 bytes (which gives the wrong result for national characters).

The only way I get a valid result is to leave IPTC:City as is, without the "DefaultToUTF8" conversion. But in my opinion this is wrong, because the image has IPTC:CodedCharacterSet=UTF8.
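
To illustrate the symptom, here is a minimal Python sketch of what happens when a reader ignores 1:90 and treats UTF-8 bytes as a single-byte code page. The sample city name and the choice of Latin-1 as the assumed code page are hypothetical; the thread does not say what text was used or what LR2 does internally. The only IPTC-specific fact relied on is that IPTC-IIM marks UTF-8 in dataset 1:90 with the escape sequence ESC % G.

```python
# Hypothetical sample text; the original post does not give the actual value.
city = "Москва"

# What gets stored in IPTC:City after converting the text to UTF-8.
utf8_bytes = city.encode("utf-8")

# IPTC-IIM dataset 1:90 signals UTF-8 with the escape sequence ESC % G.
UTF8_MARKER = b"\x1b%G"

# A reader that respects 1:90 decodes the stored bytes as UTF-8.
respected = utf8_bytes.decode("utf-8")

# A reader that ignores 1:90 and assumes a single-byte code page
# (Latin-1 here, purely as an assumption) shows one character per byte.
ignored = utf8_bytes.decode("latin-1")

print(repr(respected))
print(repr(ignored))
```

Each Cyrillic letter occupies two bytes in UTF-8, so the second reader shows twice as many characters as the original text, none of them correct.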

Thanks for any help. Please don't send me to LR's forum.

--

Murat

Archive

[Originally posted by exiftool on 2009-08-12 10:51:48-07]

I can't say what LR2 does, but Adobe helped write the current
recommendation for handling IPTC character encoding.  It would
be reasonable if LR2 used this technique:

Code:
In some non-XMP metadata containers, the encoding is stored in the container
along with the metadata. For example, the Exif UserComment tag has a prefix
that indicates the encoding. Another important example is the IPTC-IIM
metadata container that optionally supports the Coded Character Set 1:90
DataSet indicating the encoding of all the string properties in that
container. This document requires that compliant consumers MUST respect any
stored encoding indicators such as the above examples.

For other metadata string properties the encoding may be undefined by the
container specification. Or the encoding may be de-facto undefined because
in practice, a large number of files exist which are stored in a variety of
encodings. In these situations a compliant reader SHOULD use a reasonable
heuristic to infer the encoding used.

This document recommends that the following heuristic SHOULD be used to
infer the encoding of string properties when the encoding is undefined:

  - Scan the string to see if all bytes are in the range 0..127.
    - If so, assume the string is ASCII.
    - Otherwise, scan the string to see if it is consistent with valid UTF-8.
      - If so, assume the string is UTF-8.
      - Otherwise, assume a reasonable fallback encoding.

The choice of a reasonable fallback encoding is application and workflow
dependent. It can be determined by querying the locale information of the
host device or the user's preference.
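
The heuristic quoted above can be sketched in Python roughly as follows. This is only an illustration of the recommendation, not how ExifTool or LR2 implement it; the function names and the `cp1252` fallback are assumptions, and a real application would take the fallback from the locale or a user preference.

```python
from typing import Optional


def infer_encoding(raw: bytes, fallback: str = "cp1252") -> str:
    """Guess the encoding of a metadata string with no stored indicator."""
    # Step 1: all bytes in the range 0..127 means plain ASCII.
    if all(b < 128 for b in raw):
        return "ascii"
    # Step 2: otherwise check whether the bytes are consistent with valid UTF-8.
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # Step 3: neither ASCII nor valid UTF-8, so use the fallback.
        return fallback
    return "utf-8"


def decode_string(raw: bytes, declared: Optional[str] = None,
                  fallback: str = "cp1252") -> str:
    """Respect a stored encoding indicator (such as 1:90) when one exists."""
    encoding = declared or infer_encoding(raw, fallback)
    return raw.decode(encoding, errors="replace")


print(infer_encoding(b"Paris"))
print(infer_encoding("Москва".encode("utf-8")))
print(infer_encoding("Москва".encode("cp1251"), fallback="cp1251"))
```

Note that the heuristic only applies when no indicator is stored. If 1:90 is present, `declared` wins and no guessing takes place.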

- Phil

Archive

[Originally posted by exiftool on 2009-08-12 10:53:01-07]

I should have mentioned that the quote is from the most recent
MWG (Metadata Working Group) recommendation.

Archive

[Originally posted by murat on 2009-08-12 16:53:19-07]

Thanks Phil for the comprehensive answer.

It seems that LR2 ignores UTF8 text. I've uploaded a sample (a blank image from the ExifTool test files) with:

IPTC 1:90 = UTF8

IPTC Headline = Some russian characters

IPTC City = Japanese text

You can see that City (Japanese text) is displayed correctly in LR2, but Headline (Russian characters) is not.

download the image - ~~~~~iptc_codecharspage_def.jpg

--

Murat

Archive

[Originally posted by exiftool on 2009-08-13 10:44:50-07]

It looks to me like the IPTC:Headline is not valid UTF8 in this sample.

Archive

[Originally posted by murat on 2009-08-14 08:04:07-07]

Phil, I've checked it. Everything seems OK, except for one thing. I guess this is an LR2 bug. Unfortunately, it's not convenient to format text here, so I have sent the feedback and two sample files directly to your email.

I appreciate your help.

-- Murat