8.90: Problem writing Czech utf-8 Sub-location string

Started by springm, June 20, 2012, 06:13:26 AM


springm

Hi,
I have the Czech string 'Růžová', which I want to write as the Sub-location into an image file:


#!/usr/bin/perl
use warnings;
use strict;
use lib qw!/home/springm/perl/Image-ExifTool-8.90/lib!;
use Image::ExifTool;
my $imagefile = "20120520-204858mws.rw2";
my $exifTool = Image::ExifTool->new;     # create ExifTool object
$exifTool->ExtractInfo($imagefile, {});  # read the existing metadata
$exifTool->SetNewValue('Sub-location', 'Růžová');
$exifTool->WriteInfo($imagefile);


Unfortunately, that string gets garbled, either when it is written to the file or when it is read back:


springm@denkzwerg:~/Bilder/test$ ./ext2.pl
springm@denkzwerg:~/Bilder/test$ exiftool 20120520-204858mws.rw2 | grep Sub-location
Sub-location                    : R?žová


(I cross-checked: it's neither the terminal nor the editor, as I also tried reading the string from a verified UTF-8 file.)

From the docs I understood that Image::ExifTool should handle all UTF-8 without additional measures, so what is going wrong here?

Markus

Phil Harvey

...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

springm

Phil, I read it but obviously did not fully understand it. I'll try again.

Markus

Phil Harvey

OK, I just wanted to make sure you read this.  I can help if you still don't understand.

Basically, the bottom line is that you should set CodedCharacterSet to "UTF8" when writing any IPTC.  If you do this, and pass the proper encoding to ExifTool (corresponding to the Charset setting, which is UTF8 by default), then it should work.

Beware though, that existing IPTC may need to be recoded when you set CodedCharacterSet.  FAQ 10 gives an example of how this is done.
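
As a minimal sketch of the advice above, the following script writes CodedCharacterSet together with the Sub-location. The helper name write_sublocation and its return convention are made up for this illustration and are not part of the ExifTool API. It assumes the script file itself is saved as UTF-8 and does not enable "use utf8", so the string literal reaches ExifTool as UTF-8 bytes, matching the default Charset setting. It does not recode any IPTC already present in the file; see FAQ 10 for that case.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Image::ExifTool;

# Hypothetical helper (not part of ExifTool): write an IPTC Sub-location
# and mark the IPTC text as UTF-8 in the same write.
# $value is expected to be a UTF-8 encoded byte string.
# Returns (1, '') on success or (0, $message) on failure.
sub write_sublocation {
    my ($file, $value) = @_;
    my $exifTool = Image::ExifTool->new;

    # Declare the IPTC character set so readers can decode the text reliably.
    $exifTool->SetNewValue('IPTC:CodedCharacterSet', 'UTF8');
    $exifTool->SetNewValue('IPTC:Sub-location', $value);

    # WriteInfo returns 1 if the file was written, 2 if it was written
    # but nothing changed, and 0 on error.
    my $result = $exifTool->WriteInfo($file);
    return (1, '') if $result;
    return (0, $exifTool->GetValue('Error') // 'unknown error');
}

# Usage: ./script.pl IMAGEFILE
if (@ARGV) {
    my ($ok, $error) = write_sublocation($ARGV[0], 'Růžová');
    die "Write failed: $error\n" unless $ok;
}
```

Setting both tags in one WriteInfo call keeps the character set declaration and the new text consistent with each other.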

- Phil

springm

Phil, thanks for the hint. Now I got it right and the correct string gets recorded in the file.

Since reverse gazetteering through Google yields UTF-8 strings, it might be worth mentioning this explicitly in the documentation.

Best - Markus

Phil Harvey

Hi Markus,

I'm not sure where else you would like this mentioned in the documentation.  FAQ 10 has this:

    Note that unless UTF‑8 is used, applications have no reliable way to determine
    the IPTC character encoding. For this reason, it is recommended that
    CodedCharacterSet be set to "UTF8" when creating new IPTC.
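
The garbled 'R?žová' at the top of this thread is consistent with the text having been converted to Windows Latin 1 (cp1252), which has a code for 'ž' but none for 'ů'. That is a likely explanation rather than something confirmed in the thread; it assumes ExifTool falls back to its Latin character set for IPTC when CodedCharacterSet is absent. The sketch below uses only Perl's core Encode module, not ExifTool, to show the effect:

```perl
use strict;
use warnings;
use utf8;                        # the string literal below is Unicode text
use Encode qw(encode decode);

my $name = 'Růžová';

# Encode as Windows Latin 1 (cp1252). By default, Encode substitutes '?'
# for any character the target encoding cannot represent.
my $latin_bytes = encode('cp1252', $name);
my $as_read     = decode('cp1252', $latin_bytes);

# Encode as UTF-8 instead: every character survives the round trip.
my $utf8_bytes = encode('UTF-8', $name);
my $utf8_read  = decode('UTF-8', $utf8_bytes);

binmode STDOUT, ':encoding(UTF-8)';
print "cp1252 round trip: $as_read\n";
print "UTF-8 round trip:  $utf8_read\n";
```

The cp1252 round trip prints 'R?žová', the same damage seen in the exiftool output, while the UTF-8 round trip returns the original string.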


- Phil

springm

Phil, of course it's in the FAQ. But the SetNewValue part of the module documentation gives many examples of how to set metadata values, yet mentions charset as an effective option only in the very last line.

I have to say, however, that the Image::ExifTool documentation is one of the most precise and best-structured pieces of writing about an immensely complex program. So most probably my wish for a change to the docs has a lot to do with my own shortcomings in trying to understand this module.

And yes, I should have said so before: Thank you very much for this module, and especially for your decision to give free access to it.

Best - Markus

Phil Harvey

Hi Markus,

Thanks for your suggestion.

The documentation is of course a trade-off between verbosity and ease of use.  I often duplicate information which is important and overlooked, but otherwise try to keep the documentation as concise as possible -- or else nobody could ever be expected to read it.  However, I forgot you are using the API, which can certainly be more verbose than the application documentation.

And thanks for your compliments on the documentation.  It is refreshing to know that someone actually reads it. :)

- Phil