Non-ascii filenames in utf-8 encoded argfile

Started by Archive, May 12, 2010, 08:54:33 AM

Archive

[Originally posted by hoh on 2009-03-25 10:46:55-07]

I am running exiftool.exe on Windows and using the -@ option to pass the arguments via a file, e.g.

exiftool.exe -@ args.txt

I am encoding args.txt as UTF-8 because some of the tag values I insert are UTF-8 encoded.
This all works fine until I try to tag a photo which has a non-ASCII character in its name, e.g. a 'u' with an umlaut (two dots) over it. Then I get a "File not found" error.

It works fine if I pass the same photo name on the command line. It also works fine if args.txt is saved in ASCII format. However, if I save args.txt as UTF-8, exiftool can't find the file.

Am I doing something wrong?

Archive

[Originally posted by exiftool on 2009-03-25 11:28:35-07]

I assume you're in Windows.  Other people have had
this problem in Windows too.  I don't know how to specify
special characters in filenames in Windows using UTF-8.
The only way I know is to use the default Windows encoding
(Latin1 in North America) to specify the filename.  It is certainly
possible to mix encodings in a single .args file, and exiftool won't
have a problem with this.  However, if you edit the arg files
with an editor, it may not be able to handle this.  Or
you could take the filename out of the .args file and
pass it directly, then make the .args file UTF-8 and the
command line parameters Latin1.

There may be better solutions, but I'm not a Windows
expert.

This problem doesn't occur on other systems.

- Phil

Archive

[Originally posted by hoh on 2009-03-25 12:04:19-07]

How is the Windows exe built? Is it the Perl scripts that open and parse the args file to get the filename, or is it a wrapper around the Perl scripts that does this?

Archive

[Originally posted by exiftool on 2009-03-25 12:11:33-07]

This is all done using standard Perl.  No character translation
is done by exiftool on file names.  The .args file is read
using Perl, and the filename arguments are passed directly
to the Perl open() function (with the exception of filenames
containing '*' or '?', which are first passed through
File::Glob::bsd_glob()).
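
The routing described above can be sketched roughly as follows. This is only an illustration, not exiftool's actual source, and the helper name expand_file_arg is hypothetical.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# Hypothetical helper: expand one filename argument as the post describes.
# Arguments containing '*' or '?' go through bsd_glob(); anything else is
# returned untouched, byte for byte, ready to be handed to open().
# Call it in list context, since a pattern can match several files.
sub expand_file_arg {
    my ($arg) = @_;
    return $arg =~ /[*?]/ ? bsd_glob($arg) : ($arg);
}
```

Since nothing is translated on the way, whatever bytes are in the argument are the bytes that open() receives.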

- Phil

Archive

[Originally posted by hoh on 2009-03-25 12:31:40-07]

When the .args file is read using Perl, is the encoding specified? Could it be that the assumed encoding on Windows is different from that on other platforms?

Thanks for your responsiveness!

- Howard

Archive

[Originally posted by exiftool on 2009-03-25 14:18:34-07]

The .args file is read as a binary file.  No encoding is specified.
The only thing I do is to ignore a leading UTF-8 BOM if it exists.
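
A minimal sketch of that kind of reading, for illustration only. It is not the exiftool code; the helper name read_argfile_bytes is hypothetical, and using the ':raw' layer is an assumption made here to keep "no encoding" explicit.

```perl
use strict;
use warnings;

# Hypothetical helper: read an argfile as raw bytes, one argument per line,
# dropping a leading UTF-8 BOM (EF BB BF) if present. No decoding is done,
# so each returned string holds exactly the bytes stored in the file.
sub read_argfile_bytes {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!\n";
    my @args;
    while (my $line = <$fh>) {
        $line =~ s/^\xEF\xBB\xBF// if $. == 1;   # skip the BOM on the first line only
        $line =~ s/\r?\n\z//;                    # accept both CRLF and LF line endings
        push @args, $line if length $line;       # ignore empty lines
    }
    close($fh);
    return @args;
}
```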

I just tried this here, and it works for me if I specify the filename
in Latin1 in the .args file.  And I am sure I would
have no problems with other arguments in UTF-8 in the
same file.

- Phil

Archive

[Originally posted by atvonk on 2009-03-25 16:48:28-07]

My Windows Vista computer seems to want filenames for open() encoded in ISO 8859-1 (aka Latin-1). This computer is a standard US installation. The following code works for me in a UTF-8 encoded Perl script. The filename has an o-acute character in it (written as "o" in the code below, since this message board does not allow such characters in code; it should be "photo/tst/09 Se a Cabó.wma", a Santana song), which renders differently in various encodings:

Code:
use 5.010;
use Encode;

my $fn5 = "photo/tst/09 Se a Cabo.wma";

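# The name is still UTF-8 bytes, so the file is not found and this prints n/a
say "File $fn5, valid UTF8: ", utf8::valid($fn5), " size: ", (-s $fn5) || 'n/a';
# Convert the name in place from UTF-8 bytes to ISO-8859-1 bytes
Encode::from_to($fn5, 'utf8', 'iso-8859-1');
# And now it prints the file size
say "File $fn5, valid UTF8: ", utf8::valid($fn5), " size: ", (-s $fn5) || 'n/a';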

my $nfn = "utf8-names.txt";
say "\nFilenames from $nfn";
open(my $utf8file, "<", $nfn) or die "Cannot open $nfn $! $^E\n";
while (<$utf8file>) {
    chomp;
    # This prints n/a at the end
    say "File $_, valid UTF8: ", utf8::valid($_), " size: ", -s $_ ? -s $_ : 'n/a';
    Encode::from_to($_, 'utf8', 'iso-8859-1');
    # And now it prints the filesize
    say "File $_, valid UTF8: ", utf8::valid($_), " size: ", -s $_ ? -s $_ : 'n/a';
}
close($utf8file);

The key idea seems to be to use Encode::from_to to convert the filename from UTF-8 to ISO-8859-1.

Alexander.

Archive

[Originally posted by exiftool on 2009-03-25 18:55:19-07]

Hi Alexander,

This is consistent with my tests.

But I'm not sure what the point is.  If you are using
a Perl script to write your .args file, then your
problem is solved.  I hope you aren't proposing
that I add a patch to exiftool to do this, because
I don't think this is appropriate.

- Phil

Archive

[Originally posted by atvonk on 2009-03-25 19:10:01-07]

Hi Phil,

My point was not to suggest any patch to exiftool. I wanted to point out that a general Perl script can be written to fix Howard's problem.
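
One way such a script might look is sketched below. It assumes the filenames are representable in Latin-1, and the helper name write_mixed_argfile and the -Title tag in the usage note are only examples.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: write an argfile whose tag arguments are UTF-8 bytes
# and whose filename lines are Latin-1 bytes. $tags and $files are array
# references holding Perl character strings.
sub write_mixed_argfile {
    my ($path, $tags, $files) = @_;
    open(my $out, '>:raw', $path) or die "Cannot write $path: $!\n";
    for my $tag (@$tags) {
        print {$out} encode('UTF-8', $tag), "\n";
    }
    for my $file (@$files) {
        # FB_CROAK makes encode() die on characters Latin-1 cannot represent;
        # LEAVE_SRC keeps the caller's string unmodified.
        print {$out} encode('iso-8859-1', $file,
                            Encode::FB_CROAK() | Encode::LEAVE_SRC()), "\n";
    }
    close($out) or die "Cannot close $path: $!\n";
    return;
}
```

For example, write_mixed_argfile('args.txt', ["-Title=Caf\x{e9}"], ["Cab\x{f3}.wma"]) produces a file that could then be passed with -@ args.txt.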

Alexander.

Archive

[Originally posted by exiftool on 2009-03-25 20:18:34-07]

Hi Alexander,

Great.  Thanks for clearing this up.

And thanks for your input.

- Phil

Archive

[Originally posted by hoh on 2009-03-26 09:47:50-07]

Hi Phil,

Unfortunately my problem can't be fixed by using Latin-1 encoded file names, as I also want to handle Far Eastern languages, e.g. Chinese.
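
To make the limitation concrete, here is a small sketch that checks whether a UTF-8 filename survives conversion to ISO-8859-1. The helper name fits_latin1 is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: returns 1 if a UTF-8 encoded filename (raw bytes) can
# be converted to ISO-8859-1 without losing any character, otherwise 0.
sub fits_latin1 {
    my ($utf8_bytes) = @_;
    my $ok = eval {
        my $chars = decode('UTF-8', $utf8_bytes,
                           Encode::FB_CROAK() | Encode::LEAVE_SRC());
        encode('iso-8859-1', $chars,
               Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 1 : 0;
}
```

A name with an o-acute passes this check, while a name containing Chinese characters does not, so a UTF-8 to Latin-1 conversion cannot cover that case.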

I've never touched Perl before, but I had a look at where the argfile is opened: are you sure it's opened as binary? A quick web search told me that the open function can take a parameter which tells it the encoding of the file, e.g.

Code:
unless (open(ARGFILE,"<:utf8",$argFile)) {

Unfortunately I don't have any way of easily testing this, but it could potentially fix my problem. I agree that it's a Windows-only issue :-(
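
For anyone wanting to see what such a layer changes, here is a small sketch that reads the same line with and without decoding. The helper name first_line_via is hypothetical. It only shows how the file contents are interpreted; what open() then does with a decoded filename on Windows is a separate question that this does not test.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first line of a file, read through the
# given PerlIO layer (for example ':raw' or ':encoding(UTF-8)').
sub first_line_via {
    my ($layer, $path) = @_;
    open(my $fh, "<$layer", $path) or die "Cannot open $path: $!\n";
    my $line = <$fh>;
    close($fh);
    chomp $line;
    return $line;
}
```

With ':raw' the o-acute comes back as two bytes, and with ':encoding(UTF-8)' as a single character, so the two strings differ in length by one.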

- Howard

Archive

[Originally posted by exiftool on 2009-03-26 11:35:02-07]

Hi Howard,

I cannot impose a fixed encoding like this on the argfile
because it is not necessarily UTF-8.  Also, I suspect this
won't solve your problem anyway.

I checked, and I actually don't set binmode for the argfile
after it is opened, but that only affects the handling of
linefeeds, not the special character encoding.

A possible alternative is to use the short forms for the filenames.
These are short DOS-compatible names which don't contain
any special characters and can be used to reference any
file on a Windows system.  The only thing is that I don't
know how to do the Windows-to-DOS filename conversion.
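
One possible route for the conversion is the Win32 module that comes with Windows builds of Perl, which provides Win32::GetShortPathName(). The sketch below is untested against the non-ASCII case discussed here: whether the call accepts a name outside the system code page, and whether short names exist at all on a given volume, are open assumptions. The helper name short_path is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the short (8.3) form of a path on Windows.
# On other systems, or when no short name comes back, the path is returned
# unchanged.
sub short_path {
    my ($path) = @_;
    return $path unless $^O eq 'MSWin32';
    require Win32;                               # only loaded on Windows
    my $short = Win32::GetShortPathName($path);  # may be undef, e.g. for a missing path
    return (defined($short) && length($short)) ? $short : $path;
}
```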

- Phil

Archive

[Originally posted by hoh on 2009-03-26 11:49:39-07]

Hi Phil,

Is there any way I can build the Perl code as a Windows exe myself to test if it works?

I agree it would be wrong to impose a fixed encoding on the argfile. Could it be added as an option instead (if it works, that is)?

- Howard

Archive

[Originally posted by exiftool on 2009-03-26 13:05:01-07]

Hi Howard,

If you want to test it, install ActivePerl and run the Perl
version of exiftool with it.  Compiling the Windows .exe is complicated.

- Phil