Hello,
On the ExifTool homepage, under the heading "Known Problems", the following is written:
"In Windows, ExifTool will not process files with Unicode characters in the file name.
This is due to an underlying lack of support for Unicode filenames in the Windows standard C I/O libraries."
I ignored the above message and ran some tests using the Windows short filename (DOS 8.3 filename)
for files with Unicode characters in the filename.
The filename to be discussed is <directory_path>\<filename>.
For my tests I used various ExifTool versions up to 8.50, and I tested on a Windows 2000 and a Windows XP system.
(I had no other Windows systems available for testing.)
The extension of <filename> always contained only ASCII characters, like *.jpg.
I use the following terminology:
- An ASCII character is a character with hex value < 128.
- An ANSI character is a character with hex value < 256 AND which is a valid character within the PC's system code page.
E.g. on my machine with code page 1252, the German umlauts Ä, Ö etc. are valid ANSI characters.
Chinese characters are of course Unicode characters, because their hex value is > 255.
Cyrillic characters are also Unicode characters on my system, because they are not valid within code page 1252.
- A Unicode character is a character with hex value > 255 OR which is not a valid character in the PC's system code page.
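The three classes above can be sketched in Python. This is only an illustration, not anything ExifTool does itself: the function names `classify_char` and `classify_name` are made up, code page 1252 is assumed as the default because that is the one mentioned above, and "valid in the code page" is interpreted as "can be encoded in that code page" (so a character such as the euro sign, which code page 1252 does contain, counts as ANSI here).

```python
def classify_char(ch, codepage="cp1252"):
    """Classify one character as 'ASCII', 'ANSI' or 'Unicode'.

    Assumption: 'ANSI' means the character can be encoded in the given
    system code page; everything else is treated as 'Unicode'.
    """
    if ord(ch) < 128:
        return "ASCII"
    try:
        ch.encode(codepage)
    except UnicodeEncodeError:
        return "Unicode"
    return "ANSI"


def classify_name(name, codepage="cp1252"):
    """Return the most demanding class found in a file or directory name."""
    order = ["ASCII", "ANSI", "Unicode"]
    classes = (classify_char(c, codepage) for c in name)
    return max(classes, key=order.index, default="ASCII")
```

A name classified as "Unicode" by this sketch corresponds to cases 3) and 4) below; "ASCII" and "ANSI" correspond to cases 1) and 2).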
For <directory_path> and <filename> I observed the following:
1) Both <directory_path> and <filename> contain only ASCII-characters.
You all know what a great job ExifTool does.
2) <directory_path> and/or <filename> also contain some ANSI characters.
No difference from 1).
3) <filename> contains only ANSI characters, but <directory_path> contains at least one Unicode character.
I addressed the file using <short_name_of_directory_path>\<filename> (e.g. D:\direct~1\testpicture.jpg),
which contains only ANSI characters, and I have seen no restrictions when working with such files.
For such files it is also possible to create e.g. a *.MIE file using <short_name_of_directory_path>/<filename.MIE>.
4) <filename> contains at least one Unicode character.
I addressed the file as follows:
a) <short_name_of_directory_path>\<short_filename>
if <directory_path> also contains a Unicode character (e.g. D:\direct~1\filena~1.jpg)
b) <directory_path>/<short_filename>
if <directory_path> contains only ANSI characters (e.g. D:\testdirectory\filena~1.jpg).
In combination with the option -overwrite_original_in_place, ExifTool opens the file as in case 1),
so it is also possible, for example, to modify metadata tags.
I have seen no restrictions on the ExifTool side.
If you modify a metadata tag WITHOUT the option -overwrite_original_in_place, you get a file in the
specified directory, but its long filename is now the given <short_filename> (which contains only ASCII characters).
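For anyone who wants to look up the short names in cases 3) and 4) programmatically, here is a minimal Python sketch. It assumes the Win32 function GetShortPathNameW is called through ctypes; the helper name `short_path` is made up for illustration. On non-Windows systems the sketch simply returns the path unchanged, since 8.3 names do not exist there.

```python
import ctypes
import sys


def short_path(long_path):
    """Return the 8.3 (short) form of an existing path on Windows.

    Hypothetical helper: on other platforms the path is returned as-is.
    """
    if sys.platform != "win32":
        return long_path

    from ctypes import wintypes  # only needed for the Windows branch

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    get_short = kernel32.GetShortPathNameW
    get_short.argtypes = [wintypes.LPCWSTR, wintypes.LPWSTR, wintypes.DWORD]
    get_short.restype = wintypes.DWORD

    # A first call with no buffer returns the required size in characters,
    # including the terminating null character (0 means failure).
    needed = get_short(long_path, None, 0)
    if needed == 0:
        raise ctypes.WinError(ctypes.get_last_error())

    buf = ctypes.create_unicode_buffer(needed)
    if get_short(long_path, buf, needed) == 0:
        raise ctypes.WinError(ctypes.get_last_error())
    return buf.value
```

The result could then be passed to exiftool on the command line in place of the long name. Note that the file must already exist, and that on volumes where 8.3 name generation is switched off the returned path may still contain the long name.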
This behaviour of ExifTool (together with Perl) works wonderfully for me.
Now I have the following feature request:
Please do NOT change this behaviour when opening/accessing a file.
Thanks in advance,
Herb
Hi Herb,
Thanks!
This is very interesting and useful, and explains why I wasn't able to reproduce this problem when I was testing (I was probably using what you call ANSI characters in my tests).
Doesn't this boil down to the fact that you can always get exiftool to read/write a file by specifying the short directory and short filename? I'm not exactly sure why you use long names at all.
But unfortunately I think there are still cases where exiftool won't do what it should:
1) When extracting information from multiple files by specifying a directory name, and one or more files in the directory contain Unicode characters in their names.
2) When writing output to a different directory with Unicode characters in the name. Here, exiftool will create directories if they don't exist, but it won't be able to create directories with Unicode characters.
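To get a feel for how many files case 1) would affect in a given directory, something like the following Python sketch could be used. It is only a rough check under stated assumptions: code page 1252 is assumed as the system code page, "problematic" is taken to mean "cannot be encoded in that code page", and the function names `split_by_codepage` and `scan_directory` are made up for illustration.

```python
import os


def split_by_codepage(names, codepage="cp1252"):
    """Split names into (encodable, not_encodable) for the given code page."""
    ok, problem = [], []
    for name in names:
        try:
            name.encode(codepage)
        except UnicodeEncodeError:
            problem.append(name)  # would need the short-name workaround
        else:
            ok.append(name)
    return ok, problem


def scan_directory(directory, codepage="cp1252"):
    """Apply split_by_codepage to the entries of a directory."""
    return split_by_codepage(os.listdir(directory), codepage)
```

Files in the second list are the ones that would have to be passed individually by their short names rather than picked up through the directory name.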
- Phil