ExifTool Forum

ExifTool => The Image::ExifTool API => Topic started by: blacklion on June 14, 2015, 02:26:58 PM

Title: Problems with UTF-8 in values returned by GetValue()
Post by: blacklion on June 14, 2015, 02:26:58 PM
I have a simple script (a minimal test case) which copies the "XMP:HierarchicalSubject" tag from XMP to the "IPTC:Keywords" tag of a JPEG file and also prints this data to STDOUT.
If this tag contains UTF-8 data, it is copied correctly (the JPEG file contains proper UTF-8 data), but STDOUT gets the data double-encoded (UTF-8 bytes encoded into UTF-8 again!).
Printing UTF-8 strings from Perl itself (from the script source) works well!

I have this preamble in my script:


use utf8;
use v5.12;
use strict;
use warnings;
use warnings  qw(FATAL utf8);
use open      qw(:std :utf8);


and I use these options for the "destination" metadata:


$dstExif->Options('PrintConv' => 0, 'Charset' => 'UTF8', 'CharsetEXIF' => 'UTF8');
$dstExif->SetNewValue('*'); # Forget anything!
$dstExif->SetNewValue('CodedCharacterSet', 'UTF8', 'Type' => 'PrintConv', 'AddValue' => 0, 'Replace' => 1, 'Protected' => 1);


After that this works (destination file is Ok):


$xmpExif->Options('PrintConv' => 0);
my @v = $xmpExif->GetValue('HierarchicalSubject');
$dstExif->SetNewValue('Keywords', \@v, 'Type' => 'Raw', 'AddValue' => 0, 'Replace' => 1);


but


my @v = $xmpExif->GetValue('HierarchicalSubject');
print join(", ", @v), "\n";


shows double-encoded characters!

Adding a constant Perl string with non-Latin characters to @v works too (and such an array prints out really weird: one string is OK, the second is double-encoded)! Both non-Latin values are written to the destination correctly!

What is wrong with UTF8 returned by GetValue()?
Title: Re: Problems with UTF-8 in values returned by GetValue()
Post by: Phil Harvey on June 14, 2015, 05:06:28 PM
It seems from your "use open" that you are opening the file in UTF-8 mode?  ExifTool expects to read binary files.  If you pass a file opened in UTF-8 mode I would expect something funny to happen like this.

- Phil
Title: Re: Problems with UTF-8 in values returned by GetValue()
Post by: blacklion on June 14, 2015, 05:34:28 PM
I'm using the form of ImageInfo() that takes a pathname, but UTF-8 is set as the default encoding for open() calls which don't specify an encoding.
Title: Re: Problems with UTF-8 in values returned by GetValue()
Post by: blacklion on June 14, 2015, 05:40:10 PM
OK, I see, ExifTool uses two-argument open() in my case. I'll pass a proper file handle then.
Title: Re: Problems with UTF-8 in values returned by GetValue()
Post by: blacklion on June 14, 2015, 05:54:07 PM
Nope, passing $fh, which was opened with open($fh, '<:raw', $path), doesn't help. The destination file is OK, as in the previous case, but the log output is double-encoded!
Title: Re: Problems with UTF-8 in values returned by GetValue()
Post by: Phil Harvey on June 14, 2015, 07:14:02 PM
OK.  Hmm.  Try removing the "use utf8;" to see if that helps.  In general, I do not recommend "use utf8" with ExifTool.  If you treat characters as bytes throughout, then I don't think you will see this problem.

An alternative may be to call Encode::decode_utf8() on the returned strings (as mentioned in the API docs).

- Phil
Title: Re: Problems with UTF-8 in values returned by GetValue()
Post by: blacklion on June 15, 2015, 09:43:59 AM
Nope! Removing "use utf8" doesn't help either! Again, the result of WriteInfo() is valid and correct, but a simple "print $v[0]", where @v contains Cyrillic letters loaded from XMP, shows double-encoded bytes!

It looks like magic :)

OK, really, it is only cosmetic: a debug output problem.
Title: Re: Problems with UTF-8 in values returned by GetValue()
Post by: Phil Harvey on June 15, 2015, 12:55:16 PM
Oh wait.  Is stdout somehow set to utf8 mode?

- Phil
Title: Re: Problems with UTF-8 in values returned by GetValue()
Post by: Hayo Baan on June 15, 2015, 01:04:10 PM
Quote from: Phil Harvey on June 15, 2015, 12:55:16 PM
Oh wait.  Is stdout somehow set to utf8 mode?

- Phil

Yes, that's what the use open (:std :utf8); does.

Note to blacklion: if you are looking for a more automatic way of setting up UTF-8 in your Perl scripts, have a look at the utf8::all module on CPAN. It sets lots of things automatically for you with just one statement: use utf8::all; (I have contributed to this module and can wholeheartedly recommend it  ;))
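
Editor's sketch (not part of the original posts): one way to check the point above is to list the I/O layers on STDOUT after the same "use open" pragma. It uses only PerlIO::get_layers(), which is built into perl; the variable names are illustrative.

```perl
use strict;
use warnings;
use open qw(:std :utf8);   # same pragma as in the preamble of the first post

# PerlIO::get_layers() returns the names of the I/O layers on a handle.
my @layers = PerlIO::get_layers(\*STDOUT);
print "STDOUT layers: @layers\n";

# A 'utf8' entry means every string printed to STDOUT is UTF-8 encoded on output.
my $has_utf8 = grep { $_ eq 'utf8' } @layers;
print $has_utf8
    ? "STDOUT encodes its output as UTF-8\n"
    : "STDOUT is a plain byte stream\n";
```

If 'utf8' is in the list, a string that already holds UTF-8 bytes gets encoded a second time when printed.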
Title: Re: Problems with UTF-8 in values returned by GetValue()
Post by: Phil Harvey on June 15, 2015, 02:30:15 PM
Quote from: Hayo Baan on June 15, 2015, 01:04:10 PM
Yes, that's what the use open (:std :utf8); does.

Well that makes sense then.  I've never done this myself.

- Phil
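
Editor's sketch (not part of the original posts): the thread ends without a worked fix, so here is a minimal illustration of the effect and of the Encode::decode_utf8() approach Phil mentioned earlier. It assumes the strings returned by GetValue() are UTF-8 byte strings, which matches the symptoms described. To stay self-contained it does not call ExifTool; $bytes is a hypothetical stand-in for a returned value.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8);

# Stand-in for a GetValue() result: the UTF-8 bytes of a six-letter Cyrillic word.
my $bytes = "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82";

# A :utf8 output layer treats each byte of a byte string as a Latin-1
# character and encodes it again; encode_utf8() reproduces that step here.
my $double = encode_utf8($bytes);

# Decoding first yields a character string, which is then encoded only once.
my $chars  = decode_utf8($bytes);
my $single = encode_utf8($chars);

printf "bytes: %d, double-encoded: %d, chars: %d, encoded once: %d\n",
    length($bytes), length($double), length($chars), length($single);
# prints: bytes: 12, double-encoded: 24, chars: 6, encoded once: 12
```

In the script from the first post, the equivalent would be something like print join(", ", map { decode_utf8($_) } @v), "\n"; while still passing the undecoded @v to SetNewValue(), since the written file was already correct.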