Proposal for a robust and simple Windows version

Started by obetz, May 16, 2019, 07:23:36 AM


allsan8

Quote: forgive me if I was not clear in my wording

You were clear.  (I avoid the command prompt.   :)  It isn't open unless it has to be open.)  I just uninstalled and installed to see what I was doing wrong.  I think I figured out why it wasn't working here.  After both installs, I immediately checked the path information.  I left those windows open while I tried the command prompt.  This time, I checked the path information, closed out their related windows, opened the command prompt.  Worked.  ExifToolGUI worked.  Sorry about that.  Thank you for your patience and making this available.

allsan8

Installed on the Windows 10 computer.  Installed in the Admin account.  Chose all users.  Installed everything.  Used all defaults.

Went into the Restricted account ... everything works (that I could quickly test and use most often) without Admin privileges.  Command prompt from folder.  ExifToolGUI.  Programs that open other programs such as ExifToolGUI.

Thank you!

obetz

thanks for the feedback.

A benefit of the "for all" installation is that ExifTool is by default installed to %Programfiles(x86)%, where it is protected against accidental or malicious corruption.

Oliver

johnrellis

Hi Oliver,

I just learned about your Portable Perl Applications on Windows -- great stuff.

It just occurred to me that, with only a small amount of effort, it could greatly simplify ExifTool's long-standing issues with Unicode filenames on Windows.  To get a feel for those complexities, see my notes attempting to fully document the current ExifTool behavior: https://exiftool.org/forum/index.php/topic,8382.msg43116.html#msg43116.

Summary of the proposal: Change "ppl.c" to use wmain() to accept 16-bit Unicode characters in the argument and environment vectors. "ppl.c" would then convert that 16-bit Unicode to UTF-8 before invoking RunPerl().

Details

I believe the root of the problem is that Windows Perl uses a C-standard main() defined to accept 8-bit char strings:

int main(int argc, char **argv, char **env)

Those 8-bit character strings are interpreted in the current Windows System Code Page.

Internally, Windows uses 16-bit Unicode throughout. When a process is started, it receives its arguments and environment in 16-bit Unicode. The Windows Standard C library converts the 16-bit Unicode argument and environment vector (wchar_t) to 8-bit characters (char) before invoking main(). It  uses the current Windows System Code Page to do the conversion, which will truncate any Unicode characters not in the current system code page.

For example, by default, Windows computers in English (United States) are in code page 1252 Windows Latin 1 (ANSI). If you pass a 16-bit argument vector to Windows Perl containing Unicode characters that aren't in Windows Latin 1, those characters get truncated to 8 bits. This behavior is different from Unix systems, in which UTF-8 is the usual encoding of characters passed in "argv".
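
To make the information loss concrete, here is a minimal portable sketch of narrowing a 16-bit string to an 8-bit one. It is not what the Windows C runtime actually does: the function name narrow_to_latin1 is made up, the target is assumed to be plain ISO-8859-1 rather than the real code page 1252 table, and unmappable characters are assumed to become '?'.

```c
#include <stddef.h>
#include <stdint.h>

/* Narrow a NUL-terminated UTF-16 string to an 8-bit string.
 * ASSUMPTION: the target is plain ISO-8859-1, where code units 0x00-0xFF
 * map to themselves. The real code page 1252 table differs in 0x80-0x9F.
 * Every other code unit is replaced by '?'.
 * Returns the number of code units replaced, or -1 if out is too small. */
int narrow_to_latin1(const uint16_t *in, char *out, size_t out_size)
{
    size_t n = 0;
    int lost = 0;

    for (; *in != 0; in++) {
        if (n + 2 > out_size)      /* room for this char plus the NUL */
            return -1;
        if (*in <= 0xFF) {
            out[n++] = (char)*in;  /* representable in 8 bits */
        } else {
            out[n++] = '?';        /* not representable: information is lost */
            lost++;
        }
    }
    if (n + 1 > out_size)
        return -1;
    out[n] = '\0';
    return lost;
}
```

With the input "a", "é" (U+00E9), "中" (U+4E2D), "ф" (U+0444), the first two survive and the last two come out as '?', so a file name containing them can no longer be recovered from argv.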

The solution is straightforward -- "ppl.c" should use wmain() instead:

int wmain(int argc, wchar_t **wargv, wchar_t **wenv)

It then converts "wargv" and "wenv" from 16-bit Unicode to UTF-8 and passes those converted argument vectors to RunPerl().  I found sample code here: https://github.com/circulosmeos/Perl-with-Unicode-for-Windows/blob/master/runperl.c

This would then allow ExifTool to receive full Unicode command-line arguments from any Windows program, including the Windows Console (cmd.exe), without any changes to ExifTool itself.
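
The conversion step described above can be sketched in portable C. This is only an illustration under assumptions: utf16_to_utf8 is a hypothetical helper, not code from "ppl.c" or from the linked runperl.c, and on Windows one would more likely call WideCharToMultiByte() with CP_UTF8 for each element of "wargv" and "wenv" instead of hand-rolling the encoder.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert a NUL-terminated UTF-16 string to UTF-8.
 * Returns the number of bytes written (excluding the NUL), or (size_t)-1
 * if the output buffer is too small. Unpaired surrogates become U+FFFD. */
size_t utf16_to_utf8(const uint16_t *in, char *out, size_t out_size)
{
    size_t n = 0;

    if (out_size == 0)
        return (size_t)-1;

    while (*in != 0) {
        uint32_t cp = *in++;

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: combine with a following low surrogate. */
            if (*in >= 0xDC00 && *in <= 0xDFFF)
                cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(*in++ - 0xDC00);
            else
                cp = 0xFFFD;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  /* stray low surrogate */
        }

        size_t len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (n + len + 1 > out_size)  /* keep room for the NUL */
            return (size_t)-1;

        if (len == 1) {
            out[n++] = (char)cp;
        } else if (len == 2) {
            out[n++] = (char)(0xC0 | (cp >> 6));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        } else if (len == 3) {
            out[n++] = (char)(0xE0 | (cp >> 12));
            out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[n++] = (char)(0xF0 | (cp >> 18));
            out[n++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    out[n] = '\0';
    return n;
}
```

A wmain() wrapper would apply such a conversion to every argument and environment string, collect the results in new char* vectors, and hand those to RunPerl().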

Warning: I am not an expert in Windows, Perl, ANSI C, or ExifTool. But I've spent too many hours trying to write Unicode-capable plugins that invoke ExifTool and I've learned way more than I wanted to.



johnrellis

"ExifTool's long-standing issues with Unicode filenames" => "ExifTool's long-standing issues with Unicode command-line arguments"

obetz

Hi johnrellis,

it would be great to have this additional benefit.

I will look into this in the next days.

Can you provide a simple test case, IOW the ExifTool command, the expected result and the current result?

Oliver

johnrellis

I packaged up a simple ExifTool test case in this folder: https://www.dropbox.com/s/jqg6szsyhntc3vr/exiftool-unicode-test-2019-06-27.zip?dl=0 . See "test.bat" for a detailed explanation.

"test.bat" also runs two programs, "wecho.exe" and "echo.exe", which use wmain() and main() respectively and which hex-dump their arguments to stdout. This lets you see easily what the console and cmd.exe are passing to wmain() and main().  (The whole business with the console's current code page and the Windows System Code Page can be quite confusing.)

Also, "ppl.c" should use the wide-char versions of the file functions (e.g. FindFirstFile).  See https://docs.microsoft.com/en-us/windows/desktop/learnwin32/working-with-strings. I think this would allow "exiftool" to be installed in a directory path containing arbitrary Unicode characters, something that doesn't currently work.
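
For readers without the ZIP, the hex-dumping idea behind "wecho.exe" and "echo.exe" can be sketched roughly as below. This is not the source of those programs; hex_dump is a hypothetical helper that only shows how dumping the raw bytes of an argument reveals which encoding the program actually received.

```c
#include <stddef.h>
#include <stdio.h>

/* Format len bytes as space-separated uppercase hex (e.g. "C3 A9") into out.
 * Returns 0 on success, -1 if out is too small. */
int hex_dump(const void *data, size_t len, char *out, size_t out_size)
{
    const unsigned char *p = data;
    /* Three slots per byte: two hex digits plus a separator; the last
     * separator slot holds the terminating NUL. */
    size_t needed = len ? len * 3 : 1;

    if (out_size < needed)
        return -1;
    out[0] = '\0';
    for (size_t i = 0; i < len; i++)
        snprintf(out + i * 3, out_size - i * 3, "%02X%s",
                 p[i], i + 1 < len ? " " : "");
    return 0;
}
```

For example, an argument containing "é" dumps as "E9" if it arrived in code page 1252 and as "C3 A9" if it arrived as UTF-8.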

obetz

this could be a minefield, full of legacies, strange development decisions, potential side-effects. It will take a  while to sort this out.

Quote from: johnrellis on June 27, 2019, 08:30:00 PM
Also, "ppl.c" should use the wide-char versions of the file functions (e.g. FindFirstFile).  See https://docs.microsoft.com/en-us/windows/desktop/learnwin32/working-with-strings. I think this would allow "exiftool" to be installed in a directory path containing arbitrary Unicode characters, something that doesn't currently work.

Do you seriously think somebody should want to install ExifTool (or any other program) in a folder with non-ASCII characters, e.g. c:\програ́мма\ExifTool? I don't think that I will actively support this unless somebody points out a very good reason.

Oliver

herb

Hello to all,

please allow me to give my 2 cents.
The answer to your question is simple: YES

My friend in Singapore always uses paths with Chinese characters.

Best regards
Herb

obetz

Quote from: herb on June 29, 2019, 07:01:41 AM
My friend in Singapore always uses paths with Chinese characters.

for installing programs?

Or just for his data?

Oliver

herb

Hello,

Afaik, he uses it for both.
Files that are installed for testing (e.g. also ExifTool) have paths with Unicode characters.

Best regards
Herb

johnrellis

Quote from: obetz on June 29, 2019, 03:12:39 AM
Do you seriously think somebody should want to install ExifTool (or any other program) in a folder with non-ASCII characters, e.g. c:\програ́мма\ExifTool? I don't think that I will actively support this unless somebody points out a very good reason.

There are a fair number of Adobe Lightroom plugins from at least several different authors (me and others) that include ExifTool (and other utilities like ImageMagick) bundled in their implementation.  These plugins can (and do) get installed anywhere. Typically they get installed either in C:\Users\user or on an external drive, both of which can have non-ANSI Latin 1 characters.

Sometimes installing ExifTool on such a path works, sometimes not -- it depends on whether the path character is in the current Windows System Code Page.  My experience is that lots of non-English-speaking users end up using path characters that aren't in the current Windows System Code Page, probably because they're using computers that have been configured for English.

Note that if "ppl.c" uses wmain(), it will receive its command-line arguments as 16-bit characters, so using the wide string and file routines is the natural thing to do (and a straightforward textual change using "#define UNICODE").

johnrellis

Quote from: obetz on June 29, 2019, 03:12:39 AM
this could be a minefield, full of legacies, strange development decisions, potential side-effects. It will take a  while to sort this out.
Agreed that there's a lot of legacy here. But with some surgical care, your "ppl.c" could significantly improve a very sore pain point with ExifTool.  The typical user would get full Unicode compatibility from the command line merely by doing "chcp 65001" in their batch files.  That's much simpler than the current situation (see my notes I linked above).

(The original sin is the Windows port of Perl using main() instead of wmain().)

Luckily, ExifTool has a clean interface at the command line -- it will accept UTF-8 in all of its arguments, including paths. So that provides a clean interface for "ppl.c".

obetz

Quote from: johnrellis on June 29, 2019, 11:36:15 AM
The typical user would get full Unicode compatibility from the command line merely by doing "chcp 65001" in their batch files.

I'm not yet sure about this.

Quote from: johnrellis on June 29, 2019, 11:36:15 AM
The original sin is the Windows port of Perl using main() instead of wmain().

GetCommandLineW() is an alternative to wmain. Many aspects have to be considered...

In the meantime, I found that Unicode support of several tools is still, hmm, "not mature".

Even the BMP (Basic Multilingual Plane) isn't handled correctly by several widespread text editors, e.g. Latin Extended-D ꜦꜧꜨꜩꜲꜳꝎ is rendered incorrectly in PSPad and NPP (Notepad++) as well as in comparison windows of Beyond Compare. Don't ask what my older programming editor does...

Even worse is the handling of characters from higher planes (beyond UCS-2).  This forum software (SMF) told me "The following error or errors occurred while posting this message: The message body was left empty" when I composed this reply initially containing a "Five spoked asterisk" U+1F7AF. Windows Explorer doesn't display the clef U+1D11E.
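
One common reason software stumbles on these characters is that anything above U+FFFF needs two 16-bit units (a surrogate pair) in UTF-16, which code written for UCS-2 does not expect. Whether that is what went wrong in SMF or Explorer is not established here. The sketch below only shows the encoding itself; utf16_encode is a made-up name.

```c
#include <stdint.h>

/* Encode one Unicode code point as UTF-16.
 * Returns the number of 16-bit units written (1 or 2), or 0 if cp is not a
 * valid scalar value (above U+10FFFF or inside the surrogate range). */
int utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;  /* BMP: a single unit, same as UCS-2 */
        return 1;
    }
    cp -= 0x10000;              /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate: bottom 10 bits */
    return 2;
}
```

The clef U+1D11E becomes D834 DD1E and the asterisk U+1F7AF becomes D83D DFAF, while a BMP character such as U+00E9 stays a single unit.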

Complex diacritics like Ỗ (O + Circumflex + Tilde) produce funny results.

In the attached ZIP, you find some files with "interesting" names.

Now I know even better why I use only 7-bit ASCII and no spaces in file names, and I'm glad that my native language has only a few non-ASCII characters, which are easily replaced by ASCII tuples.

Oliver

johnrellis

Quote: I found that Unicode support of several tools is still, hmm, "not mature".
Windows 10 made some significant changes in which fonts get installed by default, and these tools may not have been updated accordingly: https://support.microsoft.com/en-us/help/3083806/why-does-some-text-display-with-square-boxes-in-some-apps-on-windows-1. Or they could be just not setting suitable fallback fonts to display all Unicode characters.

Quote: In the attached ZIP, you find some files with "interesting" names.
I compared the display of those in Windows File Explorer with Mac Finder, and with the exception of the MUSICAL SYMBOL G CLEF in File Explorer, they all displayed the same. File Explorer's fallback fonts must not include that character.