ExifTool Forum

ExifTool => The "exiftool" Application => Topic started by: Hayo Baan on October 03, 2012, 06:02:16 AM

Title: Exiftool output in Mac Terminal displaying special characters incorrectly
Post by: Hayo Baan on October 03, 2012, 06:02:16 AM
Hi Phil,

When running exiftool on the command line in a Mac Terminal window, special characters are displayed incorrectly.

My Mac terminal is set up to use UTF-8 (LANG=en_GB.UTF-8) and exiftool is set to output UTF-8 as well, yet special characters still display incorrectly. However, if I tell exiftool to translate the output to Latin encoding, everything is suddenly fine :o

Other programs that use UTF-8 as their character encoding, including my own Perl scripts using the exiftool library, work just fine, so I don't understand why exiftool on the command line doesn't...

Latin: Correct output (accented í):
~$ exiftool -charset Latin -Caption-Abstract test.jpg
Caption-Abstract                : Palau de les Arts Reina Sofía (architect: Santiago Calatrava)


UTF-8: Incorrect output (garbled í instead of accented í):
~$ exiftool -charset UTF8 -Caption-Abstract test.jpg
Caption-Abstract                : Palau de les Arts Reina Sofía (architect: Santiago Calatrava)


Running the output through od -txC -c reveals that the former uses characters "c3  ad" (hex) for character í, while the latter uses "c3  83  c2 ad" (hex) instead.
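
The two byte sequences are consistent with an extra UTF-8 encoding step being applied to whatever bytes exiftool writes, with each byte treated as a Latin-1 character first. A minimal Python sketch of that effect, under that assumption (the function name through_utf8_layer is made up for illustration and is not part of exiftool):

```python
def through_utf8_layer(raw: bytes) -> bytes:
    """Model an output layer that treats each incoming byte as a
    Latin-1 character and then encodes the result as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")

latin_out = "í".encode("latin-1")   # what -charset Latin would emit: ed
utf8_out = "í".encode("utf-8")      # what -charset UTF8 would emit: c3 ad

print(through_utf8_layer(latin_out).hex(" "))  # c3 ad        (looks correct on a UTF-8 terminal)
print(through_utf8_layer(utf8_out).hex(" "))   # c3 83 c2 ad  (double-encoded, shown garbled)
```

If this model is right, the Latin output only looks correct because the extra step happens to turn it into valid UTF-8.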
Title: Re: Exiftool output in Mac Terminal displaying special characters incorrectly
Post by: Phil Harvey on October 03, 2012, 07:01:47 AM
The problem is that the IPTC is not encoded properly.  Read FAQ 10 (https://exiftool.org/faq.html#Q10) for more information.

- Phil
Title: Re: Exiftool output in Mac Terminal displaying special characters incorrectly
Post by: Hayo Baan on October 03, 2012, 07:22:55 AM
Quote from: Phil Harvey on October 03, 2012, 07:01:47 AM
The problem is that the IPTC is not encoded properly.  Read FAQ 10 (https://exiftool.org/faq.html#Q10) for more information.

Hi Phil, thanks for your prompt reply. However, this is not the problem (IPTC is encoded correctly and matches the encoding I configured for it). The same problem exists with XMP (UTF8 encoded) fields:
Latin: Correct output (accented í):
~$ exiftool -charset Latin -Caption-Abstract -Description test.jpg
Caption-Abstract                : Palau de les Arts Reina Sofía (architect: Santiago Calatrava)
Description                     : Palau de les Arts Reina Sofía (architect: Santiago Calatrava)


UTF-8: Incorrect output (garbled í instead of accented í):
~$ exiftool -charset UTF8 -Caption-Abstract -Description test.jpg
Caption-Abstract                : Palau de les Arts Reina Sofía (architect: Santiago Calatrava)
Description                     : Palau de les Arts Reina Sofía (architect: Santiago Calatrava)


Furthermore, as I mentioned, when I use the Perl library in one of my scripts to output the caption, it displays the characters correctly.
Title: Re: Exiftool output in Mac Terminal displaying special characters incorrectly
Post by: Phil Harvey on October 03, 2012, 07:27:11 AM
OK, then I really don't understand because I use a Mac myself and have never seen this problem.

Could you post a sample image? (or email to philharvey66 at gmail.com) What version of OS X are you using? 

- Phil
Title: Re: Exiftool output in Mac Terminal displaying special characters incorrectly
Post by: Hayo Baan on October 03, 2012, 07:37:45 AM
Hi Phil,

Attached sample produces the following output for me:
~$ exiftool -charset UTF8 -charset EXIF=UTF8 -Caption-Abstract -Description -ImageDescription -Warning -CodedCharacterSet test.jpg
Caption-Abstract                : Palau de les Arts Reina Sofía (architect: Santiago Calatrava)
Description                     : Palau de les Arts Reina Sofía (architect: Santiago Calatrava)
Image Description               : Palau de les Arts Reina Sofía (architect: Santiago Calatrava)
Coded Character Set             : UTF8

~$ exiftool -charset Latin -charset EXIF=UTF8 -Caption-Abstract -Description -ImageDescription -Warning -CodedCharacterSet test.jpg
Caption-Abstract                : Palau de les Arts Reina Sofía (architect: Santiago Calatrava)
Description                     : Palau de les Arts Reina Sofía (architect: Santiago Calatrava)
Image Description               : Palau de les Arts Reina Sofía (architect: Santiago Calatrava)
Coded Character Set             : UTF8

Title: Re: Exiftool output in Mac Terminal displaying special characters incorrectly
Post by: Hayo Baan on October 03, 2012, 07:40:54 AM
Forgot: I'm using OS X 10.7.4

Title: Re: Exiftool output in Mac Terminal displaying special characters incorrectly
Post by: Hayo Baan on October 03, 2012, 08:14:59 AM
Did some more testing and I think the problem may be related to the LANG variable. When I specify, e.g., LANG=en_GB.ISO8859-1 or LANG=en_GB.ISO8859-15, exiftool output is correct with -charset UTF8 but incorrect with -charset Latin. This should actually be the expected behaviour, as my terminal has Unicode UTF-8 specified as its encoding.

When I change the encoding to e.g., Windows Latin 1 and try again in a new window, output seems as expected (e.g., correct with -charset Latin, incorrect with -charset UTF8).

Hope this helps you find why in my specific case things are not as expected.

Title: Re: Exiftool output in Mac Terminal displaying special characters incorrectly
Post by: Phil Harvey on October 03, 2012, 08:35:26 AM
I think your LANG setting observation may be the key.

What happens if you set your LANG to "en_GB.UTF-8" ?

Mine is "en_CA.UTF-8", and your test file gives the correct output when the Terminal is set for UTF-8.

- Phil
Title: Re: Exiftool output in Mac Terminal displaying special characters incorrectly
Post by: Hayo Baan on October 03, 2012, 08:56:14 AM
Hi Phil,

Hmm, that doesn't seem to be it either... My LANG setting was originally at en_GB.UTF-8 already. I tried yours, but there too I get the same erroneous results. So there must be something else at play (is your terminal configured as Unicode (UTF-8) like mine too?)

UPDATE: Ah, found it: I had PERL_UNICODE set to SDAL, that seemed to be the culprit; unsetting it fixes everything. Now to determine what I need to set it to to maintain compatibility with my own scripts (as I set it this way because of wide character problems in my own scripts).

Anyway, thanks for all your support!
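
For reference, the letters in PERL_UNICODE are the same flags as Perl's -C switch, documented in perlrun. The Python sketch below decodes a value such as SDAL into its bitmask and estimates whether STDOUT would get a UTF-8 layer. The helper names are hypothetical, and the locale check is a simplification that looks only at LANG, whereas Perl also consults LC_ALL and LC_CTYPE.

```python
# Letter values for PERL_UNICODE / the -C switch, as listed in perlrun.
FLAG_BITS = {
    "I": 1,    # STDIN is assumed to be UTF-8
    "O": 2,    # STDOUT will be UTF-8
    "E": 4,    # STDERR will be UTF-8
    "S": 7,    # I + O + E
    "i": 8,    # UTF-8 is the default PerlIO layer for input streams
    "o": 16,   # UTF-8 is the default PerlIO layer for output streams
    "D": 24,   # i + o
    "A": 32,   # @ARGV elements are expected to be UTF-8
    "L": 64,   # make the flags above conditional on a UTF-8 locale
}

def perl_unicode_mask(value: str) -> int:
    """Return the bitmask for a letter list ('SDAL') or a number ('0')."""
    if value.isdigit():
        return int(value)
    mask = 0
    for letter in value:
        mask |= FLAG_BITS[letter]
    return mask

def stdout_is_utf8_layered(value: str, lang: str) -> bool:
    """Hypothetical helper: would STDOUT get a UTF-8 layer?

    Simplification: only LANG is inspected here.
    """
    mask = perl_unicode_mask(value)
    if not mask & FLAG_BITS["O"]:
        return False
    if mask & FLAG_BITS["L"]:
        return "utf8" in lang.lower().replace("-", "")
    return True

print(perl_unicode_mask("SDAL"))                           # 127
print(stdout_is_utf8_layered("SDAL", "en_GB.UTF-8"))       # True
print(stdout_is_utf8_layered("SDAL", "en_GB.ISO8859-1"))   # False
print(stdout_is_utf8_layered("0", "en_GB.UTF-8"))          # False
```

Under these assumptions, SDAL in a UTF-8 locale puts a UTF-8 layer on STDOUT, which would re-encode the UTF-8 bytes exiftool already writes. With a non-UTF-8 LANG the L flag switches the layers off, which would match the LANG observation earlier in the thread.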
Title: Re: Exiftool output in Mac Terminal displaying special characters incorrectly
Post by: Hayo Baan on October 03, 2012, 09:17:01 AM
Hi Phil,

Unless you have a neater way to fix this, I have now resorted to creating an alias for exiftool which sets PERL_UNICODE to 0 when running the command.
alias exiftool='PERL_UNICODE=0 /usr/local/bin/exiftool'

Sorry for having bothered you with this all...
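
The same idea as the alias can be applied when exiftool is launched from a script rather than an interactive shell: override PERL_UNICODE only in the child process's environment. A rough Python sketch, assuming the install path from the alias; the helper names are made up for illustration, and decoding the output as UTF-8 assumes exiftool's output charset is UTF8:

```python
import os
import subprocess

# Path taken from the alias above; adjust for your own installation.
EXIFTOOL = "/usr/local/bin/exiftool"

def env_with_perl_unicode_off(base=None):
    """Return a copy of the environment with PERL_UNICODE set to '0'.

    The caller's environment (or the given mapping) is left untouched.
    """
    env = dict(os.environ if base is None else base)
    env["PERL_UNICODE"] = "0"
    return env

def run_exiftool(*args):
    """Run exiftool with PERL_UNICODE disabled and return its output as text."""
    result = subprocess.run(
        [EXIFTOOL, *args],
        env=env_with_perl_unicode_off(),
        capture_output=True,
        check=True,
    )
    # Assumes exiftool wrote UTF-8 (i.e. -charset UTF8 is in effect).
    return result.stdout.decode("utf-8")

# Example call (requires exiftool and a test.jpg to be present):
# print(run_exiftool("-charset", "UTF8", "-Caption-Abstract", "test.jpg"))
```

This keeps PERL_UNICODE=SDAL in place for other Perl scripts while exiftool itself runs without it.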
Title: Re: Exiftool output in Mac Terminal displaying special characters incorrectly
Post by: Phil Harvey on October 03, 2012, 09:30:05 AM
I'm glad you figured this out.  I never would have thought of this!  I didn't even know about the PERL_UNICODE setting.

- Phil
Title: Re: Exiftool output in Mac Terminal displaying special characters incorrectly
Post by: Phil Harvey on October 03, 2012, 10:32:49 AM
I did a bit of reading about the PERL_UNICODE environment setting.  Apparently it causes problems with other modules too: Should PERL_UNICODE be considered harmful? (http://svok.blogspot.ca/2009/10/should-perlunicode-be-considered.html).

I really dislike the Perl Unicode support, and I have wasted a lot of time trying to turn off the Unicode features added in newer versions of Perl.  The PERL_UNICODE environment variable seems to be another way to subvert these efforts.  :(

- Phil
Title: Re: Exiftool output in Mac Terminal displaying special characters incorrectly
Post by: Hayo Baan on October 03, 2012, 12:57:45 PM
Yeah, I read the guy's question. I too found that Perl's Unicode support is quite complex (to say the least...). With the PERL_UNICODE setting I thought I had found the easiest/most stable solution. Guess I was wrong ;D

Perhaps it's time I revise my "solution" and rethink the Unicode approach of my scripts...

Cheers, and thanks again!