Problems with UTF-8 in values returned by GetValue()

Started by blacklion, June 14, 2015, 02:26:58 PM


blacklion

I have a simple script (a minimal test case) which copies the "XMP:HierarchicalSubject" tag from XMP to the "IPTC:Keywords" tag of a JPEG file and also prints this data to STDOUT.
If the tag contains UTF-8 data, it is copied correctly (the JPEG file contains proper UTF-8 data), but STDOUT gets the data double-encoded (the UTF-8 bytes are encoded into UTF-8 again!).
Printing UTF-8 strings from Perl itself (literals in the script source) works fine!

I have this preamble in my script:


use utf8;
use v5.12;
use strict;
use warnings;
use warnings  qw(FATAL utf8);
use open      qw(:std :utf8);


and I use these options for the "destination" metadata:


$dstExif->Options('PrintConv' => 0, 'Charset' => 'UTF8', 'CharsetEXIF' => 'UTF8');
$dstExif->SetNewValue('*'); # Forget anything!
$dstExif->SetNewValue('CodedCharacterSet', 'UTF8', 'Type' => 'PrintConv', 'AddValue' => 0, 'Replace' => 1, 'Protected' => 1);


After that, this works (the destination file is OK):


$xmpExif->Options('PrintConv' => 0);
my @v = $xmpExif->GetValue('HierarchicalSubject');
$dstExif->SetNewValue('Keywords', \@v, 'Type' => 'Raw', 'AddValue' => 0, 'Replace' => 1);


but


my @v = $xmpExif->GetValue('HierarchicalSubject');
print join(", ", @v), "\n";


shows double-encoded characters!

Adding a constant Perl string with non-Latin characters to @v works too (and such an array prints really weirdly: one string is OK, the second is double-encoded)! Both non-Latin values are written to the destination correctly!

What is wrong with UTF8 returned by GetValue()?

Phil Harvey

It seems from your "use open" that you are opening the file in UTF-8 mode?  ExifTool expects to read binary files.  If you pass a file opened in UTF-8 mode I would expect something funny to happen like this.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

blacklion

I'm using the form of ImageInfo() that takes a pathname, but UTF-8 is set as the default encoding for open() calls which don't specify an encoding.

blacklion

OK, I see, ExifTool uses two-argument open() in my case. I'll pass a proper file handle then.

blacklion

Nope, passing $fh, which was opened with open($fh, '<:raw', $path);, doesn't help. The destination file is OK, as in the previous case, but the log output is double-encoded!

Phil Harvey

OK.  Hmm.  Try removing the "use utf8;" to see if that helps.  In general, I do not recommend "use utf8" with ExifTool.  If you treat characters as bytes throughout, then I don't think you will see this problem.
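
A minimal sketch of the bytes-throughout approach, under stated assumptions: $octets is a hypothetical stand-in for a value returned by GetValue() (assumed to be undecoded UTF-8 octets), and an in-memory handle stands in for a STDOUT that has no encoding layer. With no layer in the way, the octets should come out exactly as they went in.

```perl
use strict;
use warnings;

# Stand-in for a GetValue() result: the UTF-8 octets of U+043A
# (an assumption for illustration, not taken from a real file).
my $octets = "\xD0\xBA";

# In-memory handle without an encoding layer, standing in for a STDOUT
# left in byte mode (no ":std" in "use open", no binmode to :utf8).
my $buffer = '';
open(my $out, '>:raw', \$buffer) or die "cannot open in-memory handle: $!";
print {$out} $octets;
close($out);

# Show what actually reached the handle, as hex.
my $hex = unpack('H*', $buffer);
print "$hex\n";
```

The same two octets (d0ba) reach the output, so a UTF-8 terminal would display the character correctly.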

An alternative may be to call Encode::decode_utf8() on the returned strings (as mentioned in the API docs).
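
A minimal sketch of that alternative, assuming the returned values are undecoded UTF-8 octets. The array @raw is a hypothetical stand-in for what GetValue() returned in this thread, so the example runs without an image file; decode_utf8() is the real function from the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Stand-in for the list returned by GetValue(): raw UTF-8 octets.
# (Assumption for illustration; these six bytes encode three Cyrillic letters.)
my @raw = ("\xD0\xBA\xD0\xBE\xD1\x82");

# Turn each octet string into a Perl character string.
my @chars = map { decode_utf8($_) } @raw;

# Six octets before decoding, three characters after.
printf "octets: %d, characters: %d\n", length($raw[0]), length($chars[0]);

# With decoded strings, an encoding layer on STDOUT encodes exactly once.
binmode(STDOUT, ':encoding(UTF-8)');
print join(", ", @chars), "\n";
```

The undecoded values in @raw remain the ones to pass to SetNewValue(), since writing them to the destination file already worked in the original script.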

- Phil

blacklion

Nope! Removing "use utf8" doesn't help either! Again, the result of WriteInfo() is valid and correct, but a simple "print $v[0]", where @v contains Cyrillic letters loaded from XMP, shows double-encoded bytes!

It looks like magic :)

OK, really, it is only cosmetic: a problem with debug output.

Phil Harvey

Oh wait.  Is stdout somehow set to utf8 mode?

- Phil

Hayo Baan

Quote from: Phil Harvey on June 15, 2015, 12:55:16 PM
Oh wait.  Is stdout somehow set to utf8 mode?

- Phil

Yes, that's what use open qw(:std :utf8); does.
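
The effect can be sketched without ExifTool. This is an illustration under assumptions: an in-memory handle opened with a :utf8 layer stands in for STDOUT, $octets stands in for a value from GetValue(), and through_utf8_layer() is a hypothetical helper that exists only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: print a string through a :utf8 layer into an
# in-memory buffer and return the bytes written, as hex.
sub through_utf8_layer {
    my ($string) = @_;
    my $buffer = '';
    open(my $out, '>:utf8', \$buffer) or die "cannot open in-memory handle: $!";
    print {$out} $string;
    close($out);
    return unpack('H*', $buffer);
}

# One character, U+043A, as a Perl character string (like a literal under "use utf8").
my $character = "\x{43A}";

# The same character as two UTF-8 octets (assumed shape of a GetValue() result).
my $octets = "\xD0\xBA";

my $single = through_utf8_layer($character);  # encoded once: d0ba
my $double = through_utf8_layer($octets);     # each octet encoded again: c390c2ba

print "character string: $single\n";
print "octet string:     $double\n";
```

This matches the earlier observation that a constant Perl string printed fine while the value from XMP came out double-encoded in the same array.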

Note to blacklion: if you are looking for a more automatic way of setting up UTF-8 in your Perl scripts, have a look at the utf8::all module on CPAN. It sets lots of things automatically for you with just one statement: use utf8::all; (I have contributed to this module and can wholeheartedly recommend it  ;))
Hayo Baan – Photography
Web: www.hayobaan.nl

Phil Harvey

Quote from: Hayo Baan on June 15, 2015, 01:04:10 PM
Yes, that's what use open qw(:std :utf8); does.

Well that makes sense then.  I've never done this myself.

- Phil