writing utf-8 UserComment

Started by Archive, May 12, 2010, 08:54:34 AM

Archive

[Originally posted by voda on 2009-05-07 12:58:59-07]

Hello,

I have a problem writing a UTF-8 string to the ExifIFD:UserComment field from a Perl script.
When I use $ exiftool -UserComment="this is a comment:ěščřžýáíé" test.jpg everything is OK.

But with this script:
Code:
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use diagnostics;

binmode STDOUT, ':utf8';
use Image::ExifTool qw(:Public);

my $tag = "UserComment";
my $val = "this is a comment:ěščřžýáíé";
my $file = "test.jpg";

my $exifTool = new Image::ExifTool;
$exifTool->Options(Charset => 'UTF8');
my ($success, $errStr) = $exifTool->SetNewValue($tag, $val);
if ($success) {
   print "OK\n";
} else {
   print "FAILED => $errStr\n";
}
$success = $exifTool->WriteInfo($file);
if ($success) {
   print "OK\n";
} else {
   print "FAILED\n";
}
The comment string does not really contain a newline; the only problem is with the Czech characters.

I get this output:

OK

OK

and output from diagnostics:
Code:
Character in 'C' format wrapped in pack at
        /usr/share/perl5/Image/ExifTool/Writer.pl line 217 (#1)
    (W pack) You said

        pack("C", $x)

    where $x is either less than 0 or more than 255; the "C" format is
    only for encoding native operating system characters (ASCII, EBCDIC,
    and so on) and not for Unicode characters, so Perl behaved as if you meant

        pack("C", $x & 255)

    If you actually want to pack Unicode codepoints, use the "U" format
    instead.
and the image is unchanged (except for the FileModifyDate tag).

Is something wrong in my script, or where is the problem?

Thanks voda
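
The wrapping described in the diagnostic above can be reproduced without ExifTool. The following is only a minimal sketch using core Perl; the variable names are illustrative, and it does not claim to show what Writer.pl actually does at that line. It packs the code point of "ě" (U+011B) with the "C" format and shows that only the low 8 bits survive.

```perl
use strict;
use warnings;

# U+011B is "ě", one of the Czech characters from the comment string.
my $code_point = 0x11b;

my $packed;
{
    # Silence the "Character in 'C' format wrapped in pack" warning
    # for this demonstration only.
    no warnings 'pack';
    $packed = pack('C', $code_point);
}

# Only the low 8 bits of the code point end up in the packed byte.
printf "packed byte: %02x\n", ord $packed;
printf "low 8 bits:  %02x\n", $code_point & 255;
```

Both lines print the same value, so a wide character that reaches a "C" pack is silently turned into a different byte.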

Archive

[Originally posted by exiftool on 2009-05-07 13:13:59-07]

What version of Perl are you using?  Your script works for me with Perl 5.8.6.

But problems like this can be created if exiftool receives a string with
wide characters.  Internally, I attempt to convert all strings to a series
of UTF8 bytes (not characters), but it appears this may not be working
for you for some reason.

- Phil
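
To illustrate the distinction between wide characters and UTF-8 bytes mentioned above, here is a minimal sketch that uses only the core Encode module. It does not call ExifTool, and the variable names are illustrative. It builds the same two-letter text once as a character string and once as a byte string, then compares them.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A character string: one element per Unicode code point ("ěš").
my $chars = "\x{11b}\x{161}";

# The same text as UTF-8 bytes: one element per byte.
my $bytes = encode('UTF-8', $chars);

printf "chars: length %d, is_utf8 %s\n",
    length($chars), utf8::is_utf8($chars) ? "true" : "false";
printf "bytes: length %d, is_utf8 %s\n",
    length($bytes), utf8::is_utf8($bytes) ? "true" : "false";

# Hex dump of the byte string.
printf "bytes: %s\n", join ' ', map { sprintf '%02x', ord } split //, $bytes;
```

The character string has length 2 and the byte string has length 4 (c4 9b c5 a1), which is the kind of difference that matters when a library expects bytes.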

Archive

[Originally posted by exiftool on 2009-05-07 13:15:00-07]

Also, what version of ExifTool?

Archive

[Originally posted by exiftool on 2009-05-07 13:26:09-07]

Note that there are known problems in this area with
Perl 5.10 which were patched in ExifTool 7.18.  So
you should be using 7.18 or later (but I suggest using
the latest version, 7.74).

- Phil

Archive

[Originally posted by voda on 2009-05-07 13:30:17-07]

Perl: 5.10.0-19

ExifTool: 7.30

all from debian testing

Archive

[Originally posted by exiftool on 2009-05-07 13:39:32-07]

Hi Voda,

Thanks.  I can reproduce this with Perl 5.10.0 and ExifTool 7.74.

The problem goes away if you remove the "use utf8;" in your script,
but I will have to investigate further to see if there is something
I can do to avoid this problem if the script invokes "use utf8".

 - Phil

Archive

[Originally posted by voda on 2009-05-07 13:59:57-07]

Now it works.

Thanks voda

Some more info:

The same problem occurs when I use data from a web page (which is in UTF-8):

$html = decode ('UTF-8', get $url); => doesn't work (same output with diagnostics as before)

$html = get $url; => works

Archive

[Originally posted by exiftool on 2009-05-07 15:12:05-07]

Thanks for the information. I should be able to fix this with
the next exiftool release.

I hate Perl Unicode.  All I ever want to do is to use byte data,
but converting UTF8 to byte data is a moving target in Perl.
I need a technique that works across all Perl versions.

- Phil

Archive

[Originally posted by exiftool on 2009-05-08 12:16:15-07]

I am posting this for John Ellis - PH

John wrote:

[I tried posting this on the CPAN forum, but I couldn't figure out how
to get it to accept my <code> tags.] - I think the forum didn't
like the apostrophe characters that you were using, so I changed them - PH


 
I'm having trouble storing Unicode characters in the XMP:Description
field from Perl.  The script below takes a single argument, a filename.
It writes a string containing the Unicode character \x{263a} (a smiley)
into XMP:Description.  But when the XMP:Description field is retrieved,
the smiley has been changed to a ":".  I've verified this by using
"exiftool -b -xmp" to examine the contents as well.

 
Am I doing something wrong?   I've spent several hours debugging
this (and learning much more than I anticipated about Perl and Unicode).
Perl 5.10.0 (ActiveState), Exiftool 7.67.

 
Thanks very much for any help,

John

 
-----------------------------------------------------------------------

Here is the result of running the script:

 
Code:
>exif-uni.pl b.jpg
 
Value written:
length:  7
is_utf8: true
chars: 3c 3c 3c 263a 3e 3e 3e
bytes: 3c 3c 3c e2 98 ba 3e 3e 3e
 
Value read:
length:  7
is_utf8: false
chars: 3c 3c 3c 3a 3e 3e 3e
bytes: 3c 3c 3c 3a 3e 3e 3e
 
----------------------------------------------------------------------
And here is the script:
 
# usage: exif-uni.pl <file>
 
use strict;
use Image::ExifTool ();
use Encode;
 
my $value = "<<<\x{263a}>>>";
 
my $exifTool = new Image::ExifTool;
$exifTool->SetNewValue ('Description', $value);
$exifTool->Options (PrintConv => 0);
$exifTool->WriteInfo ($ARGV [0]);
DumpStr ("Value written", $value);
 
my $exifTool = new Image::ExifTool;
$exifTool->Options (PrintConv => 0);
$exifTool->ExtractInfo ($ARGV [0]);
$value = $exifTool->GetValue ('Description', 'Raw');
 
DumpStr ("Value read", $value);
 
sub DumpStr {
    my ($label, $s) = @_;
 
    print $label, ":\n";
    print "length:  ", length ($s), "\n";
    print "is_utf8: ", Encode::is_utf8 ($s) ? "true" : "false", "\n";
 
    print "chars:";
    for my $i (0 .. length ($s) - 1) {
        printf " %2x", ord (substr ($s, $i, 1)); }
    printf "\n";
 
    Encode::_utf8_off ($s);
    print "bytes:";
    for my $i (0 .. length ($s) - 1) {
        printf " %2x", ord (substr ($s, $i, 1)); }
    print "\n\n";}

Archive

[Originally posted by exiftool on 2009-05-08 12:21:44-07]

The problem is that ExifTool expects all input in raw bytes.
Internally, exiftool attempts to convert UTF-8 strings to
byte strings, but this is broken when used with Perl 5.10.
This will be fixed in version 7.75 when it is released, but
until then you can manually convert the UTF-8 values
before passing them to ExifTool:

Code:
$value = Encode::encode('utf8', $value);

Doing this won't be necessary for version 7.75 or later, but it
won't hurt.

All values returned by exiftool are byte strings, so you must
decode them as UTF-8 if you want to use them this way in
your script (this won't change with 7.75).
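
As a rough sketch of the decoding step on the reading side, the example below uses a hard-coded byte string as a stand-in for a value returned by ExifTool (an assumption for illustration; a real script would take it from GetValue). The bytes are the UTF-8 encoding of the "<<<☺>>>" test value from John's post.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Stand-in for a value returned by ExifTool (hypothetical literal):
# the UTF-8 bytes 3c 3c 3c e2 98 ba 3e 3e 3e.
my $raw = "<<<\xe2\x98\xba>>>";

# Decode the bytes into a Perl character string.
my $text = decode('UTF-8', $raw);

printf "raw:  length %d\n", length $raw;
printf "text: length %d\n", length $text;
printf "4th character: U+%04X\n", ord substr($text, 3, 1);

# Encoding the character string again gives back the original bytes.
my $again = encode('UTF-8', $text);
print "round trip ok\n" if $again eq $raw;
```

The raw value is 9 bytes long, while the decoded string is 7 characters long with U+263A in the middle.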

I hope this makes sense.

Sorry for any inconvenience. This whole Perl/Unicode business
really is a PITA.

- Phil

Archive

[Originally posted by exiftool on 2009-05-08 12:46:18-07]

I have just uploaded a 7.75 pre-release
(https://exiftool.org/Image-ExifTool-7.75p.tar.gz)
for you to test out.  Please let me know if this version solves your problem.

Thanks.

- Phil

Archive

[Originally posted by johnrellis on 2009-05-08 18:32:19-07]

Yes, my test script now works as you described with the pre-release -- SetNewValue accepts Perl Unicode strings and GetValue returns those strings as raw bytes that need to be decoded.  I have not fully tested my full program with the pre-release -- perhaps later this weekend.   Any guess as to the release date?  

I agree that Perl Unicode is painful -- an unfinished work in progress.  Thanks very much for the prompt response,

John

Archive

[Originally posted by exiftool on 2009-05-08 22:39:04-07]

Hi John,

Thanks for the test.  If I have a chance, I hope to release 7.75 officially tomorrow
or Sunday.

- Phil

Archive

[Originally posted by johnrellis on 2009-05-08 22:57:02-07]

A minor suggestion for the API documentation: Include the caveat that for tags like XMP:Description that officially support Unicode, the string values returned by Exiftool are taken directly from the file and are strings of 8-bit bytes representing the UTF-8 encoding of the value.

Thanks.