Non-Unicode filenames within Unicode directory can not be read.

Started by FrankB, September 17, 2023, 06:45:49 PM


FrankB

- I am on Windows 10, 64-bit
- I run ExifTool 12.65 (Oliver Betz installer)

I'm currently working on a new release of ExiftoolGui to support international characters better. The initial problem, as reported by GeoVan, is described here: https://exiftool.org/forum/index.php?topic=15171.0

By now it already works much better. I decided to always use a UTF-8 args file to specify the filenames, and that solves nearly all the issues.

But in my tests I have found a strange case that I cannot explain other than as a bug in ExifTool. I hope I'm wrong though.

Consider this directory/file structure.

D:\data\Greek\Greek.jpg
D:\data\Greek\Ελληνικα.jpg
D:\data\Ελληνικα\Greek.jpg
D:\data\Ελληνικα\Ελληνικα.jpg

(Ελληνικα means Greek, as GeoVan explained)

If I create a small args file (utf8) in D:\data called args_subdir with
-charset
filename=utf8
-charset
utf8
-filename
*.jpg
Greek.jpg
Ελληνικα.jpg

and I execute in directory D:\data\Ελληνικα the command:

exiftool -@ ..\args_subdir

it will NOT read Greek.jpg

If I do the same in D:\data\Greek there is no problem.

Please find the attached files and a Readme containing additional info in the ZIP.

Thanks in advance,
Frank

As a workaround for ExiftoolGui I now put the complete pathname in the args file.

EDIT: 18 Sept. Turns out if the directory is on a hard-drive then it works, on a network share it doesn't. See my reply.


FrankB

I tried to reproduce the bug with Strawberry Perl, hoping to be able to provide more info. At first I wasn't able to reproduce it, until I realized that I had always tested with directories on a network share with a drive mapping.

So the problem is reproducible on network shares, not on a directory on the hard-drive.

Changed the original post to read d:\data instead of c:\data

FrankB

I did a deep dive (deep for someone who has never programmed in Perl) into the image-exiftool library. I'm convinced that I have found where it goes wrong. I am not saying I have a definitive solution, but I can now pinpoint the problem.

In EncodeFileName, at around line 4299, there is a test for whether the file name contains 'non-standard' (non-ASCII) characters:
if ($file =~ /[\x80-\xff]/ or $force)
$file contains only the file name, not the directory name, which explains my findings.

To allow better testing I added "or $enc eq 'UTF8'" to the condition, and that solves my problem. But you will of course need to add '-charset filename=utf8' to the options!

I think it would be best to test for the directory name also, but I was unable to find where the current directory is stored. Maybe it isn't?

#------------------------------------------------------------------------------
# Encode file name for calls to system i/o routines
# Inputs: 0) ExifTool ref, 1) file name in CharSetFileName, 2) flag to force conversion
# Returns: true if Windows Unicode routines should be used (in which case
#          the file name will be encoded as a null-terminated UTF-16LE string)
sub EncodeFileName($$;$)
{
    my ($self, $file, $force) = @_;
    my $enc = $$self{OPTIONS}{CharsetFileName};

    if ($enc) {
        # Frank B: if CharsetFileName is UTF8, always encode.
        # Note: only the file name is checked for \x80-\xff, not the directory name.
        # Original test: if ($file =~ /[\x80-\xff]/ or $force) {
        if ($file =~ /[\x80-\xff]/ or $force or $enc eq 'UTF8') {
            # encode for use in Windows Unicode functions if necessary
            if ($^O eq 'MSWin32') {
                local $SIG{'__WARN__'} = \&SetWarning;
                if (eval { require Win32API::File }) {
                    # recode as UTF-16LE and add null terminator
                    $_[1] = $self->Decode($file, $enc, undef, 'UTF16', 'II') . "\0\0";
                    return 1;
                }
                $self->WarnOnce('Install Win32API::File for Windows Unicode file support');
            } else {
                # recode as UTF-8 for other platforms if necessary
                $_[1] = $self->Decode($file, $enc, undef, 'UTF8') unless $enc eq 'UTF8';
            }
        }
    } elsif ($^O eq 'MSWin32' and $file =~ /[\x80-\xff]/ and not defined $enc) {
        $self->WarnOnce('FileName encoding not specified') if IsUTF8(\$file) < 0;
    }
    return 0;
}
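
To see in isolation why Greek.jpg slips past the test, here is a minimal standalone sketch. The helper names needs_wide_call and to_utf16le_z are hypothetical and are not part of ExifTool, and the core Encode module stands in for ExifTool's own Decode method. It only mirrors the condition quoted above; it does not touch the file system or reproduce the network-share behaviour.

```perl
use strict;
use warnings;
use utf8;                           # the Greek literal below is UTF-8 source text
use Encode qw(decode encode);

# Hypothetical stand-in for the test in EncodeFileName: it inspects only
# the bytes of the name it is given, never the current directory.
sub needs_wide_call {
    my ($file, $force) = @_;
    my $wide = ($file =~ /[\x80-\xff]/ or $force);
    return $wide ? 1 : 0;
}

# Hypothetical helper: UTF-8 bytes to a null-terminated UTF-16LE string,
# using the core Encode module instead of ExifTool's own Decode method.
sub to_utf16le_z {
    my ($file) = @_;
    return encode('UTF-16LE', decode('UTF-8', $file)) . "\0\0";
}

# File names as UTF-8 byte strings, as they would arrive from a UTF-8 args file
my $ascii = encode('UTF-8', 'Greek.jpg');
my $greek = encode('UTF-8', 'Ελληνικα.jpg');

print "ASCII name, no force:   ", needs_wide_call($ascii), "\n";      # 0
print "Greek name, no force:   ", needs_wide_call($greek), "\n";      # 1
print "ASCII name, with force: ", needs_wide_call($ascii, 1), "\n";   # 1
print "UTF-16LE length:        ", length(to_utf16le_z($ascii)), "\n"; # 20
```

The pure-ASCII name takes the non-Unicode path no matter which directory it sits in, unless the force flag is set.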

Phil Harvey

I must say that I don't understand this result.  The current directory name isn't used in any of the function calls, but somehow the Windows Unicode functions must be called if the current directory name contains Unicode characters, but only on a network share.

This explains some oddities we have seen for network shares.

Testing the current directory name doesn't make sense because this isn't necessary for local disks.

Your solution would affect the existing behaviour.  To be safe I will instead implement this with a new option:

sub EncodeFileName($$;$)
{
    my ($self, $file, $force) = @_;
    my $enc = $$self{OPTIONS}{CharsetFileName};
    $force = 1 if $$self{OPTIONS}{WindowsWideFile};

So you should get the behaviour you want by setting -api WindowsWideFile=1 in your command.  This will appear in ExifTool 12.66.

- Phil
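
For reference, a sketch of how the args file from the first post might look with the new option added. It assumes ExifTool 12.66 or later, and that the rest of EncodeFileName is unchanged, so that -charset filename=utf8 is still required for the forced conversion to take effect:

```
-charset
filename=utf8
-charset
utf8
-api
WindowsWideFile=1
-filename
Greek.jpg
Ελληνικα.jpg
```

As in the original args file, each argument goes on its own line.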

FrankB

I am very happy with your solution Phil.
For the time being I will keep passing the full path name in Exiftoolgui, so no hurry.

Thanks Frank

Phil Harvey


FrankB

Thanks Phil.

Will test it next week with ExiftoolGui and report back.

Frank

FrankB

Did a test in ExifToolGui, and it works perfectly, Phil.

In fact, as far as I'm concerned, WindowsWideFile=1 could have been the default. But don't change anything, I will make that the default in ExifToolGui.

Thanks Frank