ExifTool Forum

ExifTool => The "exiftool" Application => Topic started by: eed on September 04, 2020, 08:02:13 AM

Title: FAQ: 18b. "I'm having problems with special characters on the Windows
Post by: eed on September 04, 2020, 08:02:13 AM
Windows 10, 64-bit. At the command prompt (cmd), I first changed the code page to UTF-8:


C:\work>chcp 65001
Active code page: 65001


Now I'm trying to execute this command:

C:\work>ExifTool -charset filename=utf8 -@ D:\αβγ\test.txt
"Error opening arg file D:\???\test.txt"


Note that the arg file test.txt is in a folder with the Greek name "αβγ".

I've set the code page to UTF-8 and I'm using the "-charset filename=utf8" option.

What am I missing?

The Greek name is just an example.
If I have (on the same computer) folders named in different languages (Greek, Cyrillic, Turkish, etc.), how can I process an arg file in these folders?
Title: Re: FAQ: 18b. "I'm having problems with special characters on the Windows
Post by: Phil Harvey on September 05, 2020, 07:13:15 AM
I don't know if you are missing anything.  I can't test this right now, but special characters in Windows file names are a real problem.

- Phil
Title: Re: FAQ: 18b. "I'm having problems with special characters on the Windows
Post by: StarGeek on September 05, 2020, 11:04:28 AM
This is something I've always had a problem with.  I couldn't get these characters to work even with the new Windows Terminal.  I still haven't gotten around to installing the Windows Subsystem for Linux to see if that works.
Title: Re: FAQ: 18b. "I'm having problems with special characters on the Windows
Post by: eed on September 11, 2020, 05:34:41 AM
Hi, Phil.

I did some tests and found that the problem with special characters on the Windows command line is not related to ExifTool; it is a Windows issue.
I wrote a small console program with one task: to output a hex dump of the bytes of its own command line.

To get the command line, two Windows API functions can be used: GetCommandLineA or GetCommandLineW.
The "-A" version (ANSI) works with code pages, while the "-W" version (wide) works with Unicode.

Results from tests:
With GetCommandLineW, the command line is in UTF-16 (UTF-16LE, little-endian byte order, to be more specific).
The result is always UTF-16, regardless of the code page set with chcp. Even if the code page is set to UTF-8 (chcp 65001), the result is UTF-16.
The test app can be started from the command prompt, from Windows Explorer, or by another app via CreateProcess; in all cases we get UTF-16.

With GetCommandLineA, the command line is in ANSI format.
The original UTF-16 command line is converted to ANSI with possible data loss, because not all Unicode characters can be mapped to ANSI.
And usually there is data loss (unless we limit ourselves to English only).
Unfortunately, even if the console code page is set to UTF-8, the command line is NOT Unicode, but ANSI.
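
The difference between the two results can be reproduced without the Windows API. The following is a minimal Python sketch, not the test program described above: Python's codecs stand in for the conversion Windows performs, cp1252 is assumed as the ANSI code page, and the helper name hex_dump is hypothetical.

```python
# Sketch only: Python codecs stand in for the Windows conversions.
# Assumption: the system ANSI code page is cp1252 (Western European).

def hex_dump(data: bytes) -> str:
    """Return the bytes as space-separated two-digit hex values."""
    return " ".join(f"{b:02X}" for b in data)

path = "D:\\αβγ\\test.txt"

# What a GetCommandLineW-style call would hand over: UTF-16LE, lossless.
wide = path.encode("utf-16-le")

# What a GetCommandLineA-style call would hand over: ANSI, lossy.
# errors="replace" substitutes "?" for unmappable characters; the real
# Windows conversion may instead pick "best fit" look-alike characters.
ansi = path.encode("cp1252", errors="replace")

print(hex_dump(wide[:10]))                # 44 00 3A 00 5C 00 B1 03 B2 03
print(ansi.decode("cp1252"))              # D:\???\test.txt
print(wide.decode("utf-16-le") == path)   # True
```

With a Greek ANSI code page such as cp1253 the three letters would survive, but a Cyrillic or Turkish folder name on the same command line would then be lost instead.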


Based on that, I have one suggestion for ExifTool (for Windows only):
Instead of using the standard input file "STDIN", use GetCommandLineW to get the command line (in UTF-16LE).
If needed, it can be converted to UTF-8 with the Windows API function WideCharToMultiByte.

Benefits: The command line no longer depends on the code page that is set.
The same command line can contain any characters in any language, even several different languages at once.
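
As a rough illustration of the conversion step, here is a Python sketch. It is not ExifTool code and it does not call WideCharToMultiByte; it uses Python's codecs to perform the equivalent UTF-16LE to UTF-8 re-encoding. The function name utf16le_to_utf8 and the folder and file names are made up for the example.

```python
# Sketch only: mimics converting a UTF-16LE command line to UTF-8,
# which is what WideCharToMultiByte does when called with CP_UTF8.

def utf16le_to_utf8(raw: bytes) -> bytes:
    """Re-encode UTF-16LE bytes as UTF-8 without loss."""
    return raw.decode("utf-16-le").encode("utf-8")

# Hypothetical command line mixing Greek, Cyrillic and Turkish names.
command_line = 'exiftool "D:\\αβγ\\a.jpg" "D:\\Фото\\b.jpg" "D:\\ışık\\c.jpg"'

raw = command_line.encode("utf-16-le")   # as a wide API would deliver it
utf8 = utf16le_to_utf8(raw)

# The round trip is lossless, whatever the console code page is.
print(utf8.decode("utf-8") == command_line)   # True
```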


I'm not sure if this is a good idea or not. Just a suggestion.
Title: Re: FAQ: 18b. "I'm having problems with special characters on the Windows
Post by: Phil Harvey on September 11, 2020, 08:03:46 AM
Thanks for this suggestion.  Unfortunately, it would be some work to switch to using GetCommandLineW instead of the standard Perl argument handling.  I think this would be possible, but it would require me to spend far more time in my Windows virtual machine than I would like.  (My virtual machine is dead slow because I don't have enough RAM on that system.  Also, I would rather be dealing with metadata than adding system-specific patches.)

- Phil