I have a simple script (a minimal test case) that copies the "XMP:HierarchicalSubject" tag from XMP to the "IPTC:Keywords" tag of a JPEG file and also prints this data to STDOUT.
If this tag contains UTF-8 data, it is copied correctly (the JPEG file contains proper UTF-8 data), but STDOUT gets the data double-encoded (the UTF-8 bytes are encoded into UTF-8 again!).
Printing UTF-8 strings from Perl itself (from the script source) works fine!
I have this preamble in my script:
use utf8;
use v5.12;
use strict;
use warnings;
use warnings qw(FATAL utf8);
use open qw(:std :utf8);
and I use these options for the "destination" metadata:
$dstExif->Options('PrintConv' => 0, 'Charset' => 'UTF8', 'CharsetEXIF' => 'UTF8');
$dstExif->SetNewValue('*'); # Forget anything!
$dstExif->SetNewValue('CodedCharacterSet', 'UTF8', 'Type' => 'PrintConv', 'AddValue' => 0, 'Replace' => 1, 'Protected' => 1);
After that, this works (the destination file is OK):
$xmpExif->Options('PrintConv' => 0);
my @v = $xmpExif->GetValue('HierarchicalSubject');
$dstExif->SetNewValue('Keywords', \@v, 'Type' => 'Raw', 'AddValue' => 0, 'Replace' => 1);
but
my @v = $xmpExif->GetValue('HierarchicalSubject');
print join(", ", @v), "\n";
shows double-encoded characters!
Adding a constant Perl string with non-Latin characters to @v works too (and such an array prints really weirdly: one string is OK, the second is double-encoded)! Both non-Latin tags are written to the destination correctly!
What is wrong with the UTF-8 returned by GetValue()?
It seems from your "use open" that you are opening the file in UTF-8 mode? ExifTool expects to read binary files. If you pass a file opened in UTF-8 mode I would expect something funny to happen like this.
- Phil
I'm using the form of ImageInfo() that takes a pathname, but UTF-8 is set as the default encoding for open() calls that don't specify an encoding.
OK, I see, ExifTool uses two-argument open() in my case. I'll pass a proper file handle then.
Nope, passing $fh, which was opened with open($fh, '<:raw', $path); doesn't help. The destination file is OK, as in the previous case, but the log output is double-encoded!
OK. Hmm. Try removing the "use utf8;" to see if that helps. In general, I do not recommend "use utf8" with ExifTool. If you treat characters as bytes throughout, then I don't think you will see this problem.
An alternative may be to call Encode::decode_utf8() on the returned strings (as mentioned in the API docs).
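As a rough sketch of that decode step, and not taken from the original script: the array below is only a stand-in for what GetValue() returns, assuming it is a byte string holding UTF-8 bytes (here the Cyrillic word "Тест"). No image file is read.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Stand-in for values returned by GetValue(): the UTF-8 bytes of "Тест"
# held in a Perl byte string (an assumption made for this sketch).
my @v = ("\xD0\xA2\xD0\xB5\xD1\x81\xD1\x82");

# Turn the bytes into Perl character strings before printing.
my @decoded = map { decode_utf8($_) } @v;

# STDOUT encodes characters to UTF-8 exactly once on the way out.
binmode(STDOUT, ':encoding(UTF-8)');
print join(", ", @decoded), "\n";
```

The undecoded @v can still be passed to SetNewValue() as before; only the copy meant for printing is decoded.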
- Phil
Nope! Removing "use utf8" doesn't help either! Again, the result of WriteInfo() is valid and correct, but a simple "print $v[0]", where @v contains Cyrillic letters loaded from XMP, shows double-encoded bytes!
It looks like magic :)
OK, really, it is only cosmetic: a debug output problem.
Oh wait. Is stdout somehow set to utf8 mode?
- Phil
Quote from: Phil Harvey on June 15, 2015, 12:55:16 PM
Oh wait. Is stdout somehow set to utf8 mode?
- Phil
Yes, that's what the
use open qw(:std :utf8); does.
Note to blacklion: if you are looking for a more automatic way of setting up UTF-8 in your Perl scripts, have a look at the utf8::all module on CPAN. It sets lots of things automatically for you with just one statement:
use utf8::all; (I have contributed to this module and can wholeheartedly recommend it ;))
Quote from: Hayo Baan on June 15, 2015, 01:04:10 PM
Yes, that's what the use open qw(:std :utf8); does.
Well that makes sense then. I've never done this myself.
- Phil
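For anyone hitting the same thing, here is a minimal sketch of the effect discussed above. It assumes the value is a byte string holding UTF-8 bytes, and it uses an in-memory file handle with a :utf8 layer in place of STDOUT so the resulting bytes can be inspected. The variable names are made up for the illustration.

```perl
use strict;
use warnings;

# UTF-8 bytes for the Cyrillic letter "Т" (U+0422) in a byte string.
my $bytes = "\xD0\xA2";

# Write through a :utf8 layer, the same kind of layer that
# "use open qw(:std :utf8)" puts on STDOUT, but capture in memory.
my $out = '';
open(my $fh, '>:utf8', \$out) or die "cannot open in-memory handle: $!";
print {$fh} $bytes;
close($fh);

# Each input byte was treated as one character and encoded again.
my $hex = join(' ', map { sprintf '%02X', ord } split //, $out);
print "$hex\n";   # prints "C3 90 C2 A2"
```

Two bytes go in and four come out, which matches the double-encoded log output, while the file written by ExifTool is unaffected because it never passes through that layer.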