ExifTool Misinterprets UTF-8 IPTC Metadata as Latin-1 When Extracting Data

Started by ebben, January 20, 2025, 08:18:47 PM

ebben

First: Thank you so much for a great piece of software!

ExifTool writes UTF-8 encoded IPTC metadata correctly, but when extracting metadata, it displays it incorrectly as Latin-1, leading to corrupted characters (ÆØÃ... instead of ÆØÅ).

I used ExifTool Version 13.10 (latest via Homebrew) on macOS Sonoma. I tested the results in macOS Preview and Photoshop.

Steps to Reproduce:

I applied metadata using this command:

exiftool -overwrite_original -charset IPTC=UTF8 \
-IPTC:Keywords="Sample text ÆØÅĐŊŠŦ æøåđŋšŧ" \
-XMP-dc:Subject="Sample text ÆØÅĐŊŠŦ æøåđŋšŧ" \
-XMP-dc:Title="Sample text ÆØÅĐŊŠŦ æøåđŋšŧ" \
-XMP-dc:Description="Sample text ÆØÅĐŊŠŦ æøåđŋšŧ" \
-Caption-Abstract="Sample text ÆØÅĐŊŠŦ æøåđŋšŧ" \
-IPTC:Caption-Abstract="Sample text ÆØÅĐŊŠŦ æøåđŋšŧ" \
-Country="Sample text ÆØÅĐŊŠŦ æøåđŋšŧ" \
-IPTC:Country-PrimaryLocationName="Sample text ÆØÅĐŊŠŦ æøåđŋšŧ" \
-Creator="Sample text ÆØÅĐŊŠŦ æøåđŋšŧ" \
-IPTC:By-line="Sample text ÆØÅĐŊŠŦ æøåđŋšŧ" \
sample-image.jpg

Extracting the metadata in ExifTool:

exiftool -G1 -s -IPTC:Keywords -XMP-dc:Subject -XMP-dc:Title -XMP-dc:Description -Caption-Abstract sample-image.jpg

Output:
[IPTC]          Keywords                        : Sample text ÆØÃ...Ä�ŊŠŦ æøåÄ'ŋšŧ
[XMP-dc]        Subject                         : Sample text ÆØÅĐŊŠŦ æøåđŋšŧ
[XMP-dc]        Title                           : Sample text ÆØÅĐŊŠŦ æøåđŋšŧ
[XMP-dc]        Description                     : Sample text ÆØÅĐŊŠŦ æøåđŋšŧ
[IPTC]          Caption-Abstract                : Sample text ÆØÃ...Ä�ŊŠŦ æøåÄ'ŋšŧ

Photoshop & macOS Preview display the IPTC metadata correctly (as "Sample text ÆØÅĐŊŠŦ æøåđŋšŧ").

ExifTool misinterprets its own stored data.

StarGeek

Add this option to the command that writes the metadata:
-CodedCharacterSet=UTF8

From FAQ #10, How does ExifTool handle coded character sets?
Quote: "if CodedCharacterSet exists and has a value of "UTF8" (or "ESC % G") then string values are assumed to be stored as UTF‑8. Otherwise the internal IPTC encoding is assumed to be Windows Latin1 (cp1252)"
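
To illustrate the fallback described in that quote, here is a small sketch of what happens when UTF-8 bytes are decoded as cp1252. It is not ExifTool's own code; it assumes an iconv that knows the WINDOWS-1252 charset, and the helper name as_cp1252 is made up for this example. Note that Å comes out as Ã followed by an ellipsis character, which may be where the "Ã..." in the output above comes from.

```shell
# Hypothetical helper: treat the bytes on stdin as Windows Latin1 (cp1252)
# and re-emit the resulting characters as UTF-8.
as_cp1252() {
  iconv -f WINDOWS-1252 -t UTF-8
}

# The UTF-8 bytes for "ÆØÅ" (C3 86, C3 98, C3 85), written as octal escapes
# so the example does not depend on the encoding of this script.
printf '\303\206\303\230\303\205\n' | as_cp1252
# prints: Ã†Ã˜Ã…
```

Each two-byte UTF-8 character turns into two cp1252 characters, which is the pattern seen when IPTC strings are stored as UTF-8 without CodedCharacterSet.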

Phil Harvey

StarGeek beat me to it, but here was the response that I composed:

This is explained in the IPTC section of FAQ #10.

Instead of properly setting IPTC:CodedCharacterSet to UTF8, you are using -charset iptc=utf8, which writes UTF-8 to IPTC but doesn't record the character set. If you do this, you will need to use -charset iptc=utf8 when reading back as well, unless the file happens to have CodedCharacterSet already set properly.

- Phil
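
For reference, the two approaches from the replies above could look roughly like this. It is only a sketch: it assumes exiftool is on the PATH and that sample-image.jpg from the first post is in the current directory, and it does nothing if either is missing. Only options already mentioned in this thread are used.

```shell
img=sample-image.jpg   # file name taken from the first post

if command -v exiftool >/dev/null 2>&1 && [ -f "$img" ]; then
  # Option 1: record the encoding in the file while writing the IPTC strings.
  exiftool -overwrite_original -IPTC:CodedCharacterSet=UTF8 \
    -IPTC:Keywords="Sample text ÆØÅĐŊŠŦ æøåđŋšŧ" \
    -IPTC:Caption-Abstract="Sample text ÆØÅĐŊŠŦ æøåđŋšŧ" \
    "$img"

  # With CodedCharacterSet recorded, a plain read should decode IPTC as UTF-8.
  exiftool -G1 -s -IPTC:CodedCharacterSet -IPTC:Keywords -IPTC:Caption-Abstract "$img"

  # Option 2: for a file written with only -charset iptc=utf8 (no
  # CodedCharacterSet), state the IPTC encoding again when reading.
  exiftool -charset iptc=utf8 -G1 -s -IPTC:Keywords -IPTC:Caption-Abstract "$img"
else
  echo "exiftool or $img not found; skipping" >&2
fi
```

Option 1 is usually the more robust choice, since the encoding then travels with the file and other readers do not have to guess it.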