First: Thank you so much for a great piece of software!
ExifTool writes UTF-8 encoded IPTC metadata correctly, but when extracting metadata, it displays it incorrectly as Latin-1, leading to corrupted characters (ÆØÃ... instead of ÆØÅ).
I used ExifTool Version 13.10 (latest via Homebrew) on macOS Sonoma. I tested the results in macOS Preview and Photoshop.
Steps to Reproduce:
I applied metadata using this command:
exiftool -overwrite_original -charset IPTC=UTF8 \
-IPTC:Keywords="Sample text ÆØÅĐŊŠŦ æøåđŋšŧ" \
-XMP-dc:Subject="Sample text ÆØÅĐŊŠŦ æøåđŋšŧ" \
-XMP-dc:Title="Sample text ÆØÅĐŊŠŦ æøåđŋšŧ" \
-XMP-dc:Description="Sample text ÆØÅĐŊŠŦ æøåđŋšŧ" \
-Caption-Abstract="Sample text ÆØÅĐŊŠŦ æøåđŋšŧ" \
-IPTC:Caption-Abstract="Sample text ÆØÅĐŊŠŦ æøåđŋšŧ" \
-Country="Sample text ÆØÅĐŊŠŦ æøåđŋšŧ" \
-IPTC:Country-PrimaryLocationName="Sample text ÆØÅĐŊŠŦ æøåđŋšŧ" \
-Creator="Sample text ÆØÅĐŊŠŦ æøåđŋšŧ" \
-IPTC:By-line="Sample text ÆØÅĐŊŠŦ æøåđŋšŧ" \
sample-image.jpg
Extracting the metadata in ExifTool:
exiftool -G1 -s -IPTC:Keywords -XMP-dc:Subject -XMP-dc:Title -XMP-dc:Description -Caption-Abstract sample-image.jpg
Output:
[IPTC] Keywords : Sample text ÆØÃ...Ä�ŊŠŦ æøåÄ'ŋšŧ
[XMP-dc] Subject : Sample text ÆØÅĐŊŠŦ æøåđŋšŧ
[XMP-dc] Title : Sample text ÆØÅĐŊŠŦ æøåđŋšŧ
[XMP-dc] Description : Sample text ÆØÅĐŊŠŦ æøåđŋšŧ
[IPTC] Caption-Abstract : Sample text ÆØÃ...Ä�ŊŠŦ æøåÄ'ŋšŧ
Photoshop & macOS Preview display the IPTC metadata correctly (as "Sample text ÆØÅĐŊŠŦ æøåđŋšŧ").
ExifTool misinterprets its own stored data.
Add
-CodedCharacterSet=UTF8
From FAQ #10, How does ExifTool handle coded character sets? (https://exiftool.org/faq.html#Q10)
Quote: if CodedCharacterSet exists and has a value of "UTF8" (or "ESC % G") then string values are assumed to be stored as UTF‑8. Otherwise the internal IPTC encoding is assumed to be Windows Latin1 (cp1252)
StarGeek beat me to it, but here was the response that I composed:
This is explained in the IPTC section of FAQ #10 (https://exiftool.org/faq.html#iptc).
Instead of properly setting IPTC:CodedCharacterSet to UTF8, you are using -charset iptc=utf8, which writes UTF-8 to IPTC but doesn't record the character set. If you do this, you will need to use -charset iptc=utf8 when reading back again, unless the file happens to have CodedCharacterSet already set properly.
- Phil
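To illustrate the rule quoted from the FAQ, here is a minimal Python sketch of the decoding decision. It is not ExifTool's actual code: the function name decode_iptc_string and the constant UTF8_ESCAPE are hypothetical, and the only behavior modeled is "UTF-8 if CodedCharacterSet says so, otherwise cp1252".

```python
# Hypothetical sketch of the FAQ #10 rule; not ExifTool's implementation.
UTF8_ESCAPE = "\x1b%G"  # the "ESC % G" escape sequence that designates UTF-8


def decode_iptc_string(raw: bytes, coded_character_set=None) -> str:
    """Decode an IPTC string value according to CodedCharacterSet."""
    if coded_character_set in ("UTF8", UTF8_ESCAPE):
        return raw.decode("utf-8")
    # No (or other) CodedCharacterSet: assume Windows Latin1 (cp1252).
    # errors="replace" is needed because a few bytes (e.g. 0x90) are
    # undefined in Python's cp1252 codec and would otherwise raise.
    return raw.decode("cp1252", errors="replace")


# Bytes as written by "-charset iptc=utf8" without CodedCharacterSet.
stored = "ÆØÅ".encode("utf-8")

print(decode_iptc_string(stored) == "ÆØÅ")          # read back as cp1252
print(decode_iptc_string(stored, "UTF8") == "ÆØÅ")  # read back as UTF-8
```

Each accented character is stored as two UTF-8 bytes, and reading those bytes as cp1252 turns each byte into its own character. That is the pattern visible in the [IPTC] lines of the output above, including the replacement character where a byte has no cp1252 mapping.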
I had to dig a bit, and the IPTC IIM 4.1 spec says:
Quote: If 1:90 (CodedCharacterSet) is omitted, the default for records 2-6 and 8 is ISO 646 IRV (7 bits) or ISO 4873 DV (8 bits).
The first appears to be plain ASCII and would correspond to Windows code page 20127 (us-ascii), according to ChatGPT. Wikipedia lists it as having been succeeded by a couple of other ISO standards.
The second (again, according to ChatGPT) doesn't map directly to a code page, but is closely related to Latin1 and Latin2.
So technically, unless there is a newer version of the IPTC IIM spec, it would be incorrect to treat the IPTC IIM tags as UTF-8 unless they are marked as such.
Edit: Found the 4.2 spec and there is no change.
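As a rough illustration of what the 7-bit default means in practice, the following Python sketch checks whether a value fits in plain ASCII. The helper name needs_coded_character_set is hypothetical, and the sketch only covers the 7-bit case from the quote, not ISO 4873.

```python
def needs_coded_character_set(value: str) -> bool:
    """Return True if the value contains characters outside 7-bit ASCII.

    Such a value cannot be represented under the 7-bit default, so the
    file should declare its encoding via 1:90 (CodedCharacterSet).
    """
    return not value.isascii()


print(needs_coded_character_set("Sample text"))
print(needs_coded_character_set("Sample text ÆØÅĐŊŠŦ æøåđŋšŧ"))
```

The plain string passes as ASCII, while the sample keyword from the first post does not, so that value needs CodedCharacterSet to be set when it is written as UTF-8.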
Quote from: ebben on January 20, 2025, 08:18:47 PM
Photoshop & macOS Preview display the IPTC metadata correctly (as "Sample text ÆØÅĐŊŠŦ æøåđŋšŧ").
One question: how do you know that they are reading the IPTC IIM data and not the XMP data, which is the IPTC Photo Metadata Standard (https://www.iptc.org/std/photometadata/specification/IPTC-PhotoMetadata)? Most apps appear to favor XMP data over IPTC IIM data unless the IPTCDigest doesn't match.
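The digest check mentioned here can be sketched roughly as follows in Python. This assumes that IPTCDigest is an MD5 hash of the raw IPTC IIM block. The function names, the sample bytes, and the choice to fall back to XMP when no digest is stored are all assumptions for illustration, not the behavior of any particular application.

```python
import hashlib
from typing import Optional


def iptc_digest(iim_block: bytes) -> str:
    """MD5 of the raw IPTC IIM block, as a lowercase hex string."""
    return hashlib.md5(iim_block).hexdigest()


def preferred_source(iim_block: bytes, stored_digest: Optional[str]) -> str:
    """Return "iim" if the IIM block changed after the digest was stored."""
    if stored_digest is not None and stored_digest.lower() != iptc_digest(iim_block):
        # IIM was edited without updating XMP, so IIM is the newer data.
        return "iim"
    # Digest matches (or is absent, by assumption): favor XMP.
    return "xmp"


original = b"hypothetical IIM bytes"         # placeholder, not a real IIM block
edited = b"hypothetical IIM bytes, edited"   # placeholder for a later edit
digest = iptc_digest(original)

print(preferred_source(original, digest))
print(preferred_source(edited, digest))
```

If an application follows this kind of logic and the file also carries matching XMP, a correct display in Photoshop or Preview may say nothing about how the IPTC IIM bytes were decoded.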