Hi Phil,
When running exiftool on the command-line in a Mac Terminal window, special characters are displayed incorrectly.
My Mac terminal is set up to use UTF-8 (LANG=en_GB.UTF-8) and exiftool is set to output UTF-8 as well, but special characters still display incorrectly. However, if I tell exiftool to translate the output to Latin encoding, everything is suddenly fine :o
Other programs, including my own perl scripts using the exiftool library, that use UTF-8 as character encoding work just fine though, so I don't understand why exiftool on the command-line doesn't...
Latin: Correct output (accented í):
~$ exiftool -charset Latin -Caption-Abstract test.jpg
Caption-Abstract : Palau de les Arts Reina Sofía (architect: Santiago Calatrava)
UTF-8: Incorrect output (garbled Ã instead of accented í):
~$ exiftool -charset UTF8 -Caption-Abstract test.jpg
Caption-Abstract : Palau de les Arts Reina SofÃa (architect: Santiago Calatrava)
Running the output through od -txC -c reveals that the former uses the bytes "c3 ad" (hex) for the character í, while the latter uses "c3 83 c2 ad" (hex) instead.
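A note on that byte pattern: "c3 83 c2 ad" is what you get when the two UTF-8 bytes of í (c3 ad) are each read as a Latin-1 character and then encoded to UTF-8 a second time. Here is a minimal sketch that reproduces it, assuming iconv, od and xargs are available; the helper name hex_bytes is made up for illustration and is not part of exiftool.

```shell
# hex_bytes (hypothetical helper): show stdin as space-separated hex bytes
hex_bytes() { od -An -tx1 | xargs; }

# í encoded once as UTF-8 (octal 303 255 = hex c3 ad)
once=$(printf '\303\255' | hex_bytes)

# The same two bytes misread as Latin-1 and converted to UTF-8 again
twice=$(printf '\303\255' | iconv -f ISO-8859-1 -t UTF-8 | hex_bytes)

echo "once:  $once"    # c3 ad
echo "twice: $twice"   # c3 83 c2 ad
```

So the garbled output looks like correct UTF-8 that some later stage has encoded once more on its way to the terminal.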
The problem is that the IPTC is not encoded properly. Read FAQ 10 (https://exiftool.org/faq.html#Q10) for more information.
- Phil
Quote from: Phil Harvey on October 03, 2012, 07:01:47 AM
The problem is that the IPTC is not encoded properly. Read FAQ 10 (https://exiftool.org/faq.html#Q10) for more information.
Hi Phil, thanks for your prompt reply. However, this is not the problem (IPTC is encoded correctly and matches the encoding I configured for it). The same problem exists with XMP (UTF8 encoded) fields:
Latin: Correct output (accented í):
~$ exiftool -charset Latin -Caption-Abstract -Description test.jpg
Caption-Abstract : Palau de les Arts Reina Sofía (architect: Santiago Calatrava)
Description : Palau de les Arts Reina Sofía (architect: Santiago Calatrava)
UTF-8: Incorrect output (garbled Ã instead of accented í):
~$ exiftool -charset UTF8 -Caption-Abstract -Description test.jpg
Caption-Abstract : Palau de les Arts Reina SofÃa (architect: Santiago Calatrava)
Description : Palau de les Arts Reina SofÃ­a (architect: Santiago Calatrava)
Furthermore, as I mentioned, when I use the perl library in one of my scripts to output the caption, it displays the characters correctly.
OK, then I really don't understand because I use a Mac myself and have never seen this problem.
Could you post a sample image? (or email to philharvey66 at gmail.com) What version of OS X are you using?
- Phil
Hi Phil,
Attached sample produces the following output for me:
~$ exiftool -charset UTF8 -charset EXIF=UTF8 -Caption-Abstract -Description -ImageDescription -Warning -CodedCharacterSet test.jpg
Caption-Abstract : Palau de les Arts Reina SofÃa (architect: Santiago Calatrava)
Description : Palau de les Arts Reina SofÃa (architect: Santiago Calatrava)
Image Description : Palau de les Arts Reina SofÃa (architect: Santiago Calatrava)
Coded Character Set : UTF8
~$ exiftool -charset Latin -charset EXIF=UTF8 -Caption-Abstract -Description -ImageDescription -Warning -CodedCharacterSet test.jpg
Caption-Abstract : Palau de les Arts Reina Sofía (architect: Santiago Calatrava)
Description : Palau de les Arts Reina Sofía (architect: Santiago Calatrava)
Image Description : Palau de les Arts Reina Sofía (architect: Santiago Calatrava)
Coded Character Set : UTF8
Forgot: I'm using OS X 10.7.4
Did some more testing and I think the problem may be related to the LANG variable. When I specify e.g. LANG=en_GB.ISO8859-1 or LANG=en_GB.ISO8859-15, exiftool output is correct when specifying -charset UTF8, but incorrect when specifying -charset Latin. This is actually the behaviour I would expect, as my terminal has Unicode UTF-8 specified as its encoding.
When I change the encoding to e.g., Windows Latin 1 and try again in a new window, output seems as expected (e.g., correct with -charset Latin, incorrect with -charset UTF8).
Hope this helps you find why in my specific case things are not as expected.
I think your LANG setting observation may be the key.
What happens if you set your LANG to "en_GB.UTF-8" ?
Mine is "en_CA.UTF-8", and your test file gives the correct output when the Terminal is set for UTF-8.
- Phil
Hi Phil,
Hmm, that doesn't seem to be it either... My LANG setting was originally at en_GB.UTF-8 already. I tried yours, but there too I get the same erroneous results. So there must be something else at play (is your terminal configured as Unicode (UTF-8) like mine too?)
UPDATE: Ah, found it: I had PERL_UNICODE set to SDAL, and that seems to have been the culprit; unsetting it fixes everything. Now I need to determine what to set it to in order to stay compatible with my own scripts (I set it this way because of wide character problems in them).
Anyway, thanks for all your support!
Hi Phil,
Unless you have a neater way to fix this, I have now resorted to creating an alias for exiftool which sets PERL_UNICODE to 0 when running the command.
alias exiftool='PERL_UNICODE=0 /usr/local/bin/exiftool'
Sorry for having bothered you with this all...
I'm glad you figured this out. I never would have thought of this! I didn't even know about the PERL_UNICODE setting.
- Phil
I did a bit of reading about the PERL_UNICODE environment setting. Apparently it causes problems with other modules too: Should PERL_UNICODE be considered harmful? (http://svok.blogspot.ca/2009/10/should-perlunicode-be-considered.html).
I really dislike the Perl Unicode support, and I have wasted a lot of time trying to turn off the Unicode features added in newer versions of Perl. The PERL_UNICODE environment variable seems to be another way to subvert these efforts. :(
- Phil
Yeah, I read that guy's question. I too found that Perl's Unicode support is quite complex (to say the least...). With the PERL_UNICODE setting I thought I had found the easiest/most stable solution. Guess I was wrong ;D
Perhaps it's time I revise my "solution" and rethink the Unicode approach of my scripts...
Cheers, and thanks again!