ExifTool Forum

ExifTool => Archives => Topic started by: Archive on May 12, 2010, 08:54:35 AM

Title: Having trouble with character encoding and © symbol
Post by: Archive on May 12, 2010, 08:54:35 AM
[Originally posted by djeyewater on 2009-05-13 15:37:34-07]

I'm using exiftool with the -json output option, but it isn't encoding an EXIF value that contains the © character as UTF-8, so when I try to decode the JSON it doesn't work (I'm using the PHP json_decode function). I also tried the -EscapeHTML option, but the copyright symbol is still output as © rather than &copy;.

I tried adding some Chinese characters to the EXIF, and exiftool encoded them as UTF-8 when extracting, just not the copyright symbol.
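To illustrate the symptom, here is a minimal Python sketch (not PHP). It assumes the copyright string was stored with © as the single byte 0xA9, which is its code in ISO-8859-1 and Windows-1252. On its own, 0xA9 is a UTF-8 continuation byte, so JSON text containing it is not valid UTF-8 and a strict decoder rejects it. The UTF-8 form (0xC2 0xA9) decodes normally. PHP's json_decode likewise requires UTF-8 input and returns null on malformed UTF-8. The sample JSON strings are hypothetical.

```python
import json

# Hypothetical -json output where Copyright holds the Latin-1 byte 0xA9.
latin1_output = b'[{"SourceFile": "a.jpg", "Copyright": "\xa9 2009"}]'

# The same output with the copyright symbol encoded as UTF-8 (0xC2 0xA9).
utf8_output = '[{"SourceFile": "a.jpg", "Copyright": "\u00a9 2009"}]'.encode("utf-8")


def strict_decode(raw: bytes):
    """Parse JSON bytes, requiring valid UTF-8 (as PHP's json_decode does)."""
    try:
        return json.loads(raw.decode("utf-8"))
    except UnicodeDecodeError as err:
        # A lone 0xA9 cannot start a UTF-8 sequence, so decoding fails here.
        print(f"invalid UTF-8: {err}")
        return None


print(strict_decode(latin1_output))  # prints the error message, then None
print(strict_decode(utf8_output))
```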

Any ideas on what's causing this/how to fix it?

Thanks

Dave
Title: Re: Having trouble with character encoding and © symbol
Post by: Archive on May 12, 2010, 08:54:35 AM
[Originally posted by exiftool on 2009-05-13 15:53:58-07]

Hi Dave,

The EXIF copyright tag isn't translated since the encoding is
not specified by the EXIF spec.  However, this should work if you
write the string in UTF-8:

exiftool a.jpg -copyright=$'\302\251'

    1 image files updated

exiftool a.jpg -copyright

Copyright                       : ©

exiftool a.jpg -copyright -json

[{
  "SourceFile": "a.jpg",
  "Copyright": "©"
}]

exiftool a.jpg -copyright -json -escapehtml

[{
  "SourceFile": "a.jpg",
  "Copyright": "&copy;"
}]
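For reference, here is a small Python sketch (not ExifTool code) that checks the byte arithmetic behind the octal escape in the write command. \302 and \251 are octal for 0xC2 and 0xA9, which together form the two-byte UTF-8 encoding of U+00A9. The sketch also looks up the named HTML entity for that code point in Python's standard html.entities table. It does not reproduce ExifTool's own -escapeHTML implementation.

```python
from html.entities import codepoint2name

# The octal escapes \302 and \251 are the byte values 0o302 and 0o251.
raw = bytes([0o302, 0o251])
print(raw.hex())                        # c2a9

# Decoded as UTF-8, those two bytes are the single code point U+00A9.
char = raw.decode("utf-8")
print(f"U+{ord(char):04X}", char)       # U+00A9 ©

# Manual check of the 2-byte UTF-8 layout 110xxxxx 10xxxxxx:
# the payload bits of both bytes reassemble to 0xA9.
code_point = ((raw[0] & 0x1F) << 6) | (raw[1] & 0x3F)

# Named HTML entity for this code point, from Python's standard table.
print(f"&{codepoint2name[code_point]};")  # &copy;
```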

- Phil
Title: Re: Having trouble with character encoding and © symbol
Post by: Archive on May 12, 2010, 08:54:35 AM
[Originally posted by djeyewater on 2009-05-14 10:48:54-07]

Thanks, I didn't realise the problem was the value not being encoded in Unicode.

Unfortunately I can't ensure that the EXIF values will be encoded in Unicode, so I guess what I can do is extract the metadata using -json -escapehtml, and then run utf8_encode on the JSON string in PHP before decoding it.
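Here is a related approach, sketched in Python rather than PHP. It works byte by byte instead of converting the whole string. It assumes that any byte which is not part of a valid UTF-8 sequence was written as ISO-8859-1, which is the same assumption PHP's utf8_encode makes about its input. Valid UTF-8 such as the Chinese characters is kept as-is, and only the offending bytes are reinterpreted. The error-handler name "latin1_fallback" and the helper decode_exiftool_json are hypothetical. This is a heuristic: Latin-1 text that happens to form valid UTF-8 will still be misread.

```python
import codecs
import json


def _latin1_fallback(err: UnicodeError):
    """Decode the bytes that failed UTF-8 decoding as ISO-8859-1 instead."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    # Return the replacement text and the position to resume decoding from.
    return bad.decode("latin-1"), err.end


# Hypothetical handler name, registered once per process.
codecs.register_error("latin1_fallback", _latin1_fallback)


def decode_exiftool_json(raw: bytes):
    """Parse JSON bytes that are mostly UTF-8 but may contain stray Latin-1 bytes.

    Assumption: any byte outside a valid UTF-8 sequence is ISO-8859-1.
    """
    return json.loads(raw.decode("utf-8", errors="latin1_fallback"))


# Mixed input: a Latin-1 copyright byte (0xA9) plus UTF-8 Chinese text.
mixed = (
    b'[{"SourceFile": "a.jpg", "Copyright": "\xa9 2009", '
    + '"Title": "\u4e2d\u6587"}]'.encode("utf-8")
)
print(decode_exiftool_json(mixed))
```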

As a suggestion for a possible future feature, I'd find it quite useful if ExifTool had an option to convert all non-UTF-8 strings to UTF-8.

Regards

Dave
Title: Re: Having trouble with character encoding and © symbol
Post by: Archive on May 12, 2010, 08:54:35 AM
[Originally posted by exiftool on 2009-05-14 11:15:22-07]

Hi Dave,

Note that some EXIF tags are stored as UCS-2, and these are converted
to UTF-8.  See the EXIF description in FAQ number 10 for details.
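As an illustration of that kind of conversion, here is a Python sketch (not ExifTool's implementation). UCS-2 stores each character as one 16-bit unit, so converting to UTF-8 means decoding those units and re-encoding the characters. The sketch assumes little-endian byte order by default, no surrogate pairs, and optional trailing NUL padding. The function name ucs2_to_utf8 is hypothetical, and the actual byte order of a given tag may differ.

```python
def ucs2_to_utf8(raw: bytes, little_endian: bool = True) -> bytes:
    """Convert a UCS-2 byte string to UTF-8 (illustrative sketch).

    Assumes true UCS-2 (no surrogate pairs) and strips trailing NUL padding.
    """
    if len(raw) % 2:
        raise ValueError("UCS-2 data must have an even number of bytes")
    codec = "utf-16-le" if little_endian else "utf-16-be"
    text = raw.decode(codec).rstrip("\x00")
    return text.encode("utf-8")


# "© 2009" stored as little-endian UCS-2 with a NUL terminator.
stored = "\u00a9 2009\x00".encode("utf-16-le")
print(stored.hex(" "))        # each character takes two bytes
print(ucs2_to_utf8(stored))   # © becomes the two UTF-8 bytes C2 A9
```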

It is not possible to reliably convert other values because
there is no way to determine the original encoding (the only option
would be to ask the user to provide these details).
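You can see the ambiguity directly in a short Python sketch that uses codecs from the standard library. The list of candidate encodings is arbitrary. A single stored byte 0xA9 is invalid as UTF-8, and it maps to different characters depending on which single-byte encoding is assumed. Nothing in the byte itself records which one the writer used. The helper name interpretations is hypothetical.

```python
CANDIDATES = ("utf-8", "latin-1", "cp437", "iso8859_5", "mac_roman")


def interpretations(stored: bytes, codecs_to_try=CANDIDATES):
    """Map each codec name to the decoded text, or None if decoding fails."""
    results = {}
    for codec in codecs_to_try:
        try:
            results[codec] = stored.decode(codec)
        except UnicodeDecodeError:
            results[codec] = None
    return results


# One byte from a hypothetical non-UTF-8 EXIF string.
for codec, text in interpretations(b"\xa9").items():
    shown = "not valid" if text is None else f"U+{ord(text):04X} {text}"
    print(f"{codec:>10}: {shown}")
```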

- Phil