ExifTool Misinterprets UTF-8 IPTC Metadata as Latin-1 When Extracting Data

Started by ebben, January 20, 2025, 08:18:47 PM

ebben

First: Thank you so much for a great piece of software!

ExifTool writes UTF-8 encoded IPTC metadata correctly, but when extracting metadata, it displays it incorrectly as Latin-1, leading to corrupted characters (ÆØÃ... instead of ÆØÅ).

I used ExifTool Version 13.10 (latest via Homebrew) on macOS Sonoma. I tested the results in macOS Preview and Photoshop.

Steps to Reproduce:

I applied metadata using this command:

exiftool -overwrite_original -charset IPTC=UTF8 \
-IPTC:Keywords="Sample text ÆØÅĐŊŠŦ æøåđŋšŧ" \
-XMP-dc:Subject="Sample text ÆØÅĐŊŠŦ æøåđŋšŧ" \
-XMP-dc:Title="Sample text ÆØÅĐŊŠŦ æøåđŋšŧ" \
-XMP-dc:Description="Sample text ÆØÅĐŊŠŦ æøåđŋšŧ" \
-Caption-Abstract="Sample text ÆØÅĐŊŠŦ æøåđŋšŧ" \
-IPTC:Caption-Abstract="Sample text ÆØÅĐŊŠŦ æøåđŋšŧ" \
-Country="Sample text ÆØÅĐŊŠŦ æøåđŋšŧ" \
-IPTC:Country-PrimaryLocationName="Sample text ÆØÅĐŊŠŦ æøåđŋšŧ" \
-Creator="Sample text ÆØÅĐŊŠŦ æøåđŋšŧ" \
-IPTC:By-line="Sample text ÆØÅĐŊŠŦ æøåđŋšŧ" \
sample-image.jpg

Extracting the metadata in ExifTool:

exiftool -G1 -s -IPTC:Keywords -XMP-dc:Subject -XMP-dc:Title -XMP-dc:Description -Caption-Abstract sample-image.jpg

Output:
[IPTC]          Keywords                        : Sample text ÆØÃ...Ä�ŊŠŦ æøåÄ'ŋšŧ
[XMP-dc]        Subject                         : Sample text ÆØÅĐŊŠŦ æøåđŋšŧ
[XMP-dc]        Title                           : Sample text ÆØÅĐŊŠŦ æøåđŋšŧ
[XMP-dc]        Description                     : Sample text ÆØÅĐŊŠŦ æøåđŋšŧ
[IPTC]          Caption-Abstract                : Sample text ÆØÃ...Ä�ŊŠŦ æøåÄ'ŋšŧ
Photoshop & macOS Preview display the IPTC metadata correctly (as "Sample text ÆØÅĐŊŠŦ æøåđŋšŧ").

ExifTool misinterprets its own stored data.

StarGeek

Add this to your write command:
-CodedCharacterSet=UTF8

From FAQ #10, How does ExifTool handle coded character sets?
Quote:
if CodedCharacterSet exists and has a value of "UTF8" (or "ESC % G") then string values are assumed to be stored as UTF‑8. Otherwise the internal IPTC encoding is assumed to be Windows Latin1 (cp1252)
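
The rule quoted above can be sketched in a few lines of Python. This is only an illustration of the FAQ wording, not ExifTool's actual code: the helper name `iptc_text_encoding` and the way the tag value is passed in are assumptions made for the example.

```python
from typing import Optional

# ISO 2022 escape sequence "ESC % G", the raw form of a UTF-8 CodedCharacterSet.
ESC_PERCENT_G = b"\x1b%G"


def iptc_text_encoding(coded_character_set: Optional[bytes]) -> str:
    """Return the Python codec name implied by the FAQ #10 rule.

    Hypothetical helper: `coded_character_set` is the tag value,
    or None when the tag is absent from the file.
    """
    if coded_character_set in (b"UTF8", ESC_PERCENT_G):
        return "utf-8"
    # Tag missing or set to anything else: fall back to Windows Latin1.
    return "cp1252"


# UTF-8 bytes as they would be stored by a write that used -charset iptc=utf8.
stored = "æøå".encode("utf-8")

print(stored.decode(iptc_text_encoding(ESC_PERCENT_G)))  # tag present: æøå
print(stored.decode(iptc_text_encoding(None)))           # tag missing: Ã¦Ã¸Ã¥
```

The same bytes come out either readable or garbled depending only on whether the marker is there, which matches the output in the first post.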
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

Phil Harvey

StarGeek beat me to it, but here was the response that I composed:

This is explained in the IPTC section of FAQ #10.

Instead of properly setting IPTC:CodedCharacterSet to UTF8, you are using -charset iptc=utf8 which writes UTF8 to IPTC but doesn't record the character set.  If you do this, you will need to use -charset iptc=utf8 when reading back again unless the file happens to have CodedCharacterSet already set properly.
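
A byte-level sketch of this case, using plain Python rather than ExifTool (so nothing here touches a real file): the bytes written with -charset iptc=utf8 are valid UTF-8, and the garbling only appears when they are later decoded as cp1252. Reversing the wrong decode recovers the original text, which suggests the stored data itself is intact.

```python
original = "ÆØÅ"

# Bytes as written when the IPTC charset is UTF-8.
stored = original.encode("utf-8")

# Read back without a charset marker: interpreted as Windows Latin1 (cp1252).
misread = stored.decode("cp1252")
print(misread)  # Ã†Ã˜Ã…

# Read back as UTF-8, which is what a UTF-8 charset setting selects.
print(stored.decode("utf-8"))  # ÆØÅ

# The misreading is reversible, so no information was lost in the stored bytes.
assert misread.encode("cp1252").decode("utf-8") == original
```

The full sample string also contains Đ, whose second UTF-8 byte (0x90) is unassigned in cp1252. That would be consistent with the replacement character showing up in the reported output.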

- Phil

StarGeek

I had to dig a bit and in the IPTC IIM 4.1 specs, it says that

Quote:
If 1:90 (CodedCharacterSet) is omitted, the default for records 2-6 and 8 is ISO 646 IRV (7 bits) or ISO 4873 DV (8 bits).

The first appears to be simple ASCII and would correlate to Windows cp 20127 us-ascii (according to ChatGPT). It is listed on Wikipedia as having been succeeded by a couple of other ISOs.

The second (again, according to ChatGPT) doesn't have a direct mapping to a code page, but is closely related to Latin1 and Latin2.

So technically, unless there is a newer version of the IPTC IIM spec, it would be incorrect to treat the IPTC IIM tags as UTF-8 unless marked as such. Edit: Found the 4.2 spec and there is no change.
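
As a rough illustration of why a 7-bit default cannot carry the sample text from the first post: a 7-bit set only covers code points 0 to 127. The snippet below uses Python's `ascii` codec as a stand-in for ISO 646 IRV, which is an assumption of this sketch and not something the quoted spec text states.

```python
sample = "Sample text ÆØÅĐŊŠŦ æøåđŋšŧ"

# Characters that a 7-bit character set cannot represent.
outside_7bit = [ch for ch in sample if ord(ch) > 0x7F]
print(len(outside_7bit), "characters fall outside the 7-bit range")

try:
    sample.encode("ascii")
except UnicodeEncodeError as err:
    # Reports the first offending character and its position in the string.
    print(f"not 7-bit clean: {err.object[err.start]!r} at index {err.start}")
```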

Quote from: ebben on January 20, 2025, 08:18:47 PM
Photoshop & macOS Preview display the IPTC metadata correctly (as "Sample text ÆØÅĐŊŠŦ æøåđŋšŧ").

One question: how do you know that they are reading the IPTC IIM data and not the XMP data, which is the IPTC Photo Metadata Standard? Most apps appear to favor XMP data over IPTC IIM data unless the IPTCDigest doesn't match.