Exiftool output in Mac Terminal displaying special characters incorrectly

Started by Hayo Baan, October 03, 2012, 06:02:16 AM

Hayo Baan

Hi Phil,

When running exiftool on the command-line in a Mac Terminal window, special characters are displayed incorrectly.

My Mac terminal is set up to use UTF-8 (LANG=en_GB.UTF-8) and exiftool is set to output UTF-8 as well, yet special characters still display incorrectly. However, if I tell exiftool to translate the output to Latin encoding, everything is suddenly fine :o

Other programs, including my own perl scripts using the exiftool library, that use UTF-8 as character encoding work just fine though, so I don't understand why exiftool on the command-line doesn't...

Latin: Correct output (accented í):
~$ exiftool -charset Latin -Caption-Abstract test.jpg
Caption-Abstract                : Palau de les Arts Reina Sofía (architect: Santiago Calatrava)


UTF-8: Incorrect output (garbled Ã­ instead of accented í):
~$ exiftool -charset UTF8 -Caption-Abstract test.jpg
Caption-Abstract                : Palau de les Arts Reina SofÃ­a (architect: Santiago Calatrava)


Piping the output through od -txC -c reveals that the former encodes í as the bytes "c3 ad" (hex), while the latter emits "c3 83 c2 ad" (hex) instead.
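
For reference, the four-byte sequence is what results when the two UTF-8 bytes of í are treated as two Latin-1 characters and then encoded to UTF-8 a second time. A minimal sketch that reproduces both byte patterns outside exiftool, assuming iconv, od and tr are available (the variable names are only illustrative):

```shell
# UTF-8 bytes of "í" (c3 ad), written with octal escapes for portability
once=$(printf '\303\255' | od -An -tx1 | tr -d ' \n')

# Misread those bytes as Latin-1 and encode to UTF-8 again (double encoding)
twice=$(printf '\303\255' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n')

# Reverse the extra step: decode the four bytes back to the original two
undone=$(printf '\303\203\302\255' | iconv -f UTF-8 -t ISO-8859-1 | od -An -tx1 | tr -d ' \n')

echo "encoded once:  $once"
echo "encoded twice: $twice"
echo "undone:        $undone"
```

The "encoded twice" value matches the bytes in the od dump, which suggests that something is applying a second UTF-8 encoding to output that is already UTF-8. It would also explain why -charset Latin appears to work: a single Latin-1 byte for í, encoded once more, ends up as valid UTF-8.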
Hayo Baan – Photography
Web: www.hayobaan.nl

Phil Harvey

The problem is that the IPTC is not encoded properly.  Read FAQ 10 for more information.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Hayo Baan

Quote from: Phil Harvey on October 03, 2012, 07:01:47 AM
The problem is that the IPTC is not encoded properly.  Read FAQ 10 for more information.

Hi Phil, thanks for your prompt reply. However, this is not the problem (IPTC is encoded correctly and matches the encoding I configured for it). The same problem exists with XMP (UTF8 encoded) fields:
Latin: Correct output (accented í):
~$ exiftool -charset Latin -Caption-Abstract -Description test.jpg
Caption-Abstract                : Palau de les Arts Reina Sofía (architect: Santiago Calatrava)
Description                     : Palau de les Arts Reina Sofía (architect: Santiago Calatrava)


UTF-8: Incorrect output (garbled Ã­ instead of accented í):
~$ exiftool -charset UTF8 -Caption-Abstract -Description test.jpg
Caption-Abstract                : Palau de les Arts Reina SofÃ­a (architect: Santiago Calatrava)
Description                     : Palau de les Arts Reina SofÃ­a (architect: Santiago Calatrava)


Furthermore, as I mentioned, when I use the perl library in one of my scripts to output the caption, it displays the characters correctly.

Phil Harvey

OK, then I really don't understand because I use a Mac myself and have never seen this problem.

Could you post a sample image (or email it to philharvey66 at gmail.com)? What version of OS X are you using?

- Phil

Hayo Baan

Hi Phil,

Attached sample produces the following output for me:
~$ exiftool -charset UTF8 -charset EXIF=UTF8 -Caption-Abstract -Description -ImageDescription -Warning -CodedCharacterSet test.jpg
Caption-Abstract                : Palau de les Arts Reina SofÃ­a (architect: Santiago Calatrava)
Description                     : Palau de les Arts Reina SofÃ­a (architect: Santiago Calatrava)
Image Description               : Palau de les Arts Reina SofÃ­a (architect: Santiago Calatrava)
Coded Character Set             : UTF8

~$ exiftool -charset Latin -charset EXIF=UTF8 -Caption-Abstract -Description -ImageDescription -Warning -CodedCharacterSet test.jpg
Caption-Abstract                : Palau de les Arts Reina Sofía (architect: Santiago Calatrava)
Description                     : Palau de les Arts Reina Sofía (architect: Santiago Calatrava)
Image Description               : Palau de les Arts Reina Sofía (architect: Santiago Calatrava)
Coded Character Set             : UTF8

Hayo Baan

Did some more testing and I think the problem may be related to the LANG variable. When I specify, e.g., LANG=en_GB.ISO8859-1 or LANG=en_GB.ISO8859-15, exiftool output is correct with -charset UTF8 but incorrect with -charset Latin. This should actually be the expected behaviour, as my terminal has Unicode (UTF-8) specified as its encoding.

When I change the terminal encoding to, e.g., Windows Latin 1 and try again in a new window, the output is as expected for that encoding (i.e., correct with -charset Latin, incorrect with -charset UTF8).

Hope this helps you find why in my specific case things are not as expected.


Phil Harvey

I think your LANG setting observation may be the key.

What happens if you set your LANG to "en_GB.UTF-8" ?

Mine is "en_CA.UTF-8", and your test file gives the correct output when the Terminal is set for UTF-8.

- Phil

Hayo Baan

Hi Phil,

Hmm, that doesn't seem to be it either... My LANG setting was originally at en_GB.UTF-8 already. I tried yours, but there too I get the same erroneous results. So there must be something else at play (is your terminal configured as Unicode (UTF-8) like mine too?)

UPDATE: Ah, found it: I had PERL_UNICODE set to SDAL, and that seems to have been the culprit; unsetting it fixes everything. Now I need to determine what to set it to in order to stay compatible with my own scripts (I set it this way because of wide character problems in them).

Anyway, thanks for all your support!

Hayo Baan

Hi Phil,

Unless you have a neater way to fix this, I have now resorted to creating an alias for exiftool which sets PERL_UNICODE to 0 when running the command.
alias exiftool='PERL_UNICODE=0 /usr/local/bin/exiftool'

Sorry for having bothered you with this all...

Phil Harvey

I'm glad you figured this out.  I never would have thought of this!  I didn't even know about the PERL_UNICODE setting.

- Phil

Phil Harvey

I did a bit of reading about the PERL_UNICODE environment setting. Apparently it causes problems with other modules too; see "Should PERL_UNICODE be considered harmful?".

I really dislike the Perl Unicode support, and I have wasted a lot of time trying to turn off the Unicode features added in newer versions of Perl. The PERL_UNICODE environment variable seems to be another way to subvert these efforts.  :(

- Phil

Hayo Baan

Yeah, I read that guy's question. I too found that Perl's Unicode support is quite complex (to say the least...). With the PERL_UNICODE setting I thought I had found the easiest/most stable solution. Guess I was wrong ;D

Perhaps it's time I revise my "solution" and rethink the Unicode approach of my scripts...

Cheers, and thanks again!