ExifTool Forum

ExifTool => Bug Reports / Feature Requests => Topic started by: springm on June 20, 2012, 06:13:26 AM

Title: 8.90: Problem writing Czech utf-8 Sub-location string
Post by: springm on June 20, 2012, 06:13:26 AM
Hi,
I have the Czech string 'Růžová', which I want to write as the Sub-location into an image file:


#!/usr/bin/perl
use warnings;
use strict;
use lib qw!/home/springm/perl/Image-ExifTool-8.90/lib!;
use Image::ExifTool;
my $imagefile = "20120520-204858mws.rw2";
my $exifTool = new Image::ExifTool;     # create ExifTool object
$exifTool->ExtractInfo($imagefile, {}); # read existing metadata from the file
$exifTool->SetNewValue('Sub-location', 'Růžová');
$exifTool->WriteInfo($imagefile);


Unfortunately, that string gets garbled, either when it is written into the file or when it is read back for output:


springm@denkzwerg:~/Bilder/test$ ./ext2.pl
springm@denkzwerg:~/Bilder/test$ exiftool 20120520-204858mws.rw2 | grep Sub-location
Sub-location                    : R?žová


(I cross-checked, it's not the terminal and also not the editor, as I tried reading the string from a verified utf-8 file as well)

From the docs I understood that Image::ExifTool should handle all utf-8 without additional measures, so what is going wrong here?

Markus
Title: Re: 8.90: Problem writing Czech utf-8 Sub-location string
Post by: Phil Harvey on June 20, 2012, 07:27:47 AM
Hi Markus,

Did you read the IPTC section of FAQ number 10 (https://exiftool.org/faq.html#Q10)?

- Phil
Title: Re: 8.90: Problem writing Czech utf-8 Sub-location string
Post by: springm on June 20, 2012, 07:30:56 AM
Phil, I read it but obviously did not fully understand it. I'll try again.

Markus
Title: Re: 8.90: Problem writing Czech utf-8 Sub-location string
Post by: Phil Harvey on June 20, 2012, 08:02:33 AM
OK, I just wanted to make sure you read this.  I can help if you still don't understand.

Basically, the bottom line is that you should set CodedCharacterSet to "UTF8" when writing any IPTC.  If you do this, and pass the proper encoding to ExifTool (corresponding to the Charset setting, which is UTF8 by default), then it should work.
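
A minimal sketch of that advice using the Perl API, not a definitive recipe. It assumes the script is saved as UTF-8 and does not `use utf8`, so the string literal reaches ExifTool as UTF-8 bytes, matching the default Charset. The helper name `write_sublocation` is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Image::ExifTool;

# Hypothetical helper: write an IPTC Sub-location and mark the IPTC as UTF-8.
# $value is assumed to be a UTF-8 encoded byte string (the default Charset).
sub write_sublocation {
    my ($file, $value) = @_;
    my $exifTool = Image::ExifTool->new;
    # Declare the IPTC encoding so readers can decode the value reliably
    $exifTool->SetNewValue('IPTC:CodedCharacterSet', 'UTF8');
    $exifTool->SetNewValue('IPTC:Sub-location', $value);
    # WriteInfo with one argument rewrites the file in place; 0 means failure
    my $result = $exifTool->WriteInfo($file);
    warn 'Error: ' . ($exifTool->GetValue('Error') || 'unknown') . "\n"
        unless $result;
    return $result;
}

# Only runs when a file name is given on the command line
write_sublocation($ARGV[0], 'Růžová') if @ARGV;
```

If the file already contains IPTC written in another encoding, setting CodedCharacterSet alone can leave the old values mislabeled, so existing IPTC may need recoding as well.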

Beware though, that existing IPTC may need to be recoded when you set CodedCharacterSet.  FAQ 10 gives an example of how this is done.
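
As a rough API equivalent of the recoding approach in FAQ 10, the sketch below copies all IPTC from the file back onto itself while setting CodedCharacterSet. It assumes the existing IPTC has no CodedCharacterSet and is in the encoding ExifTool expects via its CharsetIPTC option; the helper name `recode_iptc_to_utf8` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Image::ExifTool;

# Hypothetical helper: rewrite existing IPTC as UTF-8.
sub recode_iptc_to_utf8 {
    my ($file) = @_;
    my $exifTool = Image::ExifTool->new;
    # Queue every IPTC tag in the file to be written back to the same file;
    # values are decoded on reading and re-encoded on writing
    $exifTool->SetNewValuesFromFile($file, 'IPTC:All');
    # Write the values back as UTF-8 and record that encoding in the file
    $exifTool->SetNewValue('IPTC:CodedCharacterSet', 'UTF8');
    return $exifTool->WriteInfo($file);    # 0 on failure
}

# Only runs when a file name is given on the command line
recode_iptc_to_utf8($ARGV[0]) if @ARGV;
```

Trying this on a copy of the image first is a sensible precaution, since a wrong guess about the original encoding would garble the existing values.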

- Phil
Title: Re: 8.90: Problem writing Czech utf-8 Sub-location string
Post by: springm on June 20, 2012, 01:31:10 PM
Phil, thanks for the hint. Now I got it right and the correct string gets recorded in the file.

As reverse gazetteering through Google yields utf8 strings, it might be worth mentioning this explicitly in the documentation.

Best - Markus
Title: Re: 8.90: Problem writing Czech utf-8 Sub-location string
Post by: Phil Harvey on June 20, 2012, 01:38:04 PM
Hi Markus,

I'm not sure where else you would like this mentioned in the documentation.  FAQ 10 has this:

    Note that unless UTF‑8 is used, applications have no reliable way to determine
    the IPTC character encoding. For this reason, it is recommended that
    CodedCharacterSet be set to "UTF8" when creating new IPTC.

- Phil
Title: Re: 8.90: Problem writing Czech utf-8 Sub-location string
Post by: springm on June 20, 2012, 03:27:56 PM
Phil, of course it's in the FAQ. But the SetNewValue part of the module documentation gives many examples of how to set metadata values, yet mentions charset as a relevant option only in the very last line.

I have to state, however, that the Image::ExifTool documentation is one of the most precise and best-structured pieces of writing about an immensely complex program. So most probably my wish for a modification of the docs has a lot to do with my own shortcomings in trying to understand this module.

And yes, I should have said so before: Thank you very much for this module, and especially for your decision to give free access to it.

Best - Markus
Title: Re: 8.90: Problem writing Czech utf-8 Sub-location string
Post by: Phil Harvey on June 21, 2012, 07:43:49 AM
Hi Markus,

Thanks for your suggestion.

The documentation is of course a trade-off between verbosity and ease of use.  I often duplicate information which is important and overlooked, but otherwise try to keep the documentation as concise as possible -- or else nobody could ever be expected to read it.  However, I forgot you are using the API, which can certainly be more verbose than the application documentation.

And thanks for your compliments on the documentation.  It is refreshing to know that someone actually reads it. :)

- Phil