Proposal for a robust and simple Windows version

Started by obetz, May 16, 2019, 07:23:36 AM


obetz

Quote from: johnrellis on June 30, 2019, 08:29:14 PM
Quote: I found that Unicode support of several tools is still, hmm, "not mature".
Windows 10 made some significant changes in which fonts get installed by default, and these tools may not have been updated accordingly:

The problems with the tools mentioned in my previous post go far beyond missing glyphs.

Even notepad.exe counts two columns for each character in plane 1 (SMP), and it is not a powerful editor anyway. At least it showed no severe display or editing errors.

Sublime worked, but is not (yet?) my preferred editor.

Quote from: johnrellis on June 30, 2019, 08:29:14 PM
Quote: In the attached ZIP, you find some files with "interesting" names.
I compared the display of those in Windows File Explorer with Mac Finder, and with the exception of the MUSICAL SYMBOL G CLEF in File Explorer, they all displayed the same.

Try "dir" in a cmd window, since we are talking about command line handling...

The command window doesn't seem to have fallback to other (proportional) fonts, and even "DejaVu Sans Mono" (reported to have the best coverage) lacks many glyphs.

Ceterum censeo: I still consider it a risk to install software to directories with Unicode path names. I'm so old that I prefer to stick with the "POSIX portable filename character set".

I'm now sorting out some subtle issues and will then likely publish a Unicode-enabled launcher.

johnrellis

Re editors, I primarily use Sublime on Mac, and I've had no issues with its Unicode support, but I haven't used it on Windows. On Windows, I do a limited amount of editing with Unicode on Visual Studio 2017 and Word, and no issues there. I used TextPad on Windows for code editing (including the occasional non-ANSI Latin characters) up until 2014, and it was OK.

Re the command console, when I need to work with a larger range of languages, I use NSimSun, which, of the seven fonts pre-installed in the console for English (United States), seems to have the best language coverage. It doesn't cover many of the more exotic symbols.

(My Lightroom plugins that include ExifTool are used in 112 countries, so I've had a fair amount of experience troubleshooting problems in many other languages, including Cyrillic and Asian languages. This motivated me, with Phil's help, to figure out exactly how the Windows console, main(), Perl, and ExifTool interacted.)

johnrellis

I experimented with modifications to "ppl.c" to use wmain() and encode the argv in UTF-8 before passing it to ExifTool, and in my testing so far it works fine (as long as ExifTool's -charset is set to the default "utf8").   

To support ExifTool's current semantics, a small change would need to be made to ExifTool. So to decouple the Unicode issue from your proposed "ppl.c" launcher (which is very worthwhile independent of Unicode), I posted a separate proposal for Unicode: https://exiftool.org/forum/index.php/topic,10246.0.html

You can see my modified version here: https://www.dropbox.com/s/prdk5yyqsb2j40t/ppl-2019-07-01.c?dl=0. I was focusing on the ExifTool use case, not the more general use cases you're supporting.  A diff should cleanly highlight the changes:

- Using "wchar_t" instead of "char" for command-line arguments and file paths.
- Providing encode(str, codepage), which encodes a wide string into any code page (e.g. UTF-8 or the current system code page).
- Using ARRAYSIZE instead of "sizeof" to compute the number of elements in a wchar_t array.
- Using the wide-char versions of the string library routines (e.g. "wcsrchr" instead of "strrchr").
- Using the printf format code %S instead of %s for wide strings.

- Converting "env" from UTF-16 to the current system code page, so that RunPerl receives it just as it would from main().

- Converting argv [2 - argc-1] from UTF-16 to UTF-8, and converting argv [0 - 1] to the current system code page. Windows Perl always opens scripts with the 8-bit file i/o routines, regardless of whether -Ci is specified, so RunPerl needs to receive the script argument encoded in the current system code page, just as it is passed by main().
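For readers who want to see what the UTF-16 to UTF-8 step in the list above involves, here is a minimal portable sketch. It is not the encode() from the posted "ppl.c"; on Windows that conversion would normally be left to the Win32 API. The function name utf16_to_utf8 and its return convention are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from ppl.c: converts a NUL-terminated
 * UTF-16 string to UTF-8. Returns the number of bytes written (without
 * the terminating NUL), or (size_t)-1 if the buffer is too small or the
 * input contains an unpaired surrogate. */
size_t utf16_to_utf8(const uint16_t *src, char *dst, size_t dst_size)
{
    size_t out = 0;

    while (*src != 0) {
        uint32_t cp = *src++;

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            uint32_t lo = *src;
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;
            src++;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;              /* stray low surrogate */
        }

        /* Number of UTF-8 bytes needed for this code point. */
        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (out + need >= dst_size)         /* keep room for the NUL */
            return (size_t)-1;

        if (need == 1) {
            dst[out++] = (char)cp;
        } else if (need == 2) {
            dst[out++] = (char)(0xC0 | (cp >> 6));
            dst[out++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            dst[out++] = (char)(0xE0 | (cp >> 12));
            dst[out++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (char)(0x80 | (cp & 0x3F));
        } else {
            dst[out++] = (char)(0xF0 | (cp >> 18));
            dst[out++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            dst[out++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (char)(0x80 | (cp & 0x3F));
        }
    }

    if (out >= dst_size)
        return (size_t)-1;
    dst[out] = '\0';
    return out;
}
```

Characters outside the BMP, such as MUSICAL SYMBOL G CLEF (U+1D11E), arrive as a surrogate pair in UTF-16 and become four bytes in UTF-8, which is the case the surrogate branch handles.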

There wasn't any need to change the names of the file routines, since I compiled with UNICODE defined in the preprocessor. (Visual Studio did that for me -- I don't know how that works precisely with gcc.)



obetz

Quote from: johnrellis on July 01, 2019, 05:35:23 PM
I experimented with modifications to "ppl.c" to use wmain() and encode the argv in UTF-8 before passing it to ExifTool, and in my testing so far it works fine (as long as ExifTool's -charset is set to the default "utf8").   

You can see my modified version here: https://www.dropbox.com/s/prdk5yyqsb2j40t/ppl-2019-07-01.c?dl=0. I was focusing on the ExifTool use case, not the more general use cases you're supporting.

Starting with simple tests, with plain Perl and a diagnostic script, shows the details and saves a lot of time.

I converted your source file from (unusual?) UTF-16 LE BOM to ASCII and replaced the wide function calls by universal calls so the same sources can be used for Unicode and 8 Bit.

I enabled the perl.exe mode and added some debug statements.

Find sources and diagnostic script attached.

Quote from: johnrellis on July 01, 2019, 05:35:23 PM
- Converting "env" from UTF-16 to the current system code page, so that RunPerl receives it just as it would from main().

"just as it would from main()"? Or just as 8 Bit?

I tried to feed Perl with UTF-8 encoded environment with funny results: The names are taken in UTF-8, but not the values.

Try env vars with Unicode names, Unicode values and both, e.g.
Set Φοο=Bar
Set Foo=Βαρ
Set Βαρ=Φοο

I didn't manage to get perl info.pl e€e to output all parts correctly.
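When diagnosing which encoding an argument or environment string actually arrives in, dumping its raw bytes is often more reliable than looking at console output. The following is a minimal sketch of such a dump; hex_bytes is a hypothetical helper, not part of the attached sources. For "e€e", UTF-8 gives 65 e2 82 ac 65, whereas Windows-1252 gives 65 80 65.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical diagnostic helper: writes the bytes of `s` as
 * space-separated lowercase hex into `out`. Returns the number of
 * characters written, or -1 if `out` is too small. */
int hex_bytes(const char *s, char *out, size_t out_size)
{
    size_t pos = 0;

    if (out_size == 0)
        return -1;
    out[0] = '\0';

    for (; *s != '\0'; s++) {
        /* Prefix every byte except the first with a space. */
        int n = snprintf(out + pos, out_size - pos, "%s%02x",
                         pos ? " " : "", (unsigned)(unsigned char)*s);
        if (n < 0 || (size_t)n >= out_size - pos)
            return -1;
        pos += (size_t)n;
    }
    return (int)pos;
}
```

A launcher could call this on each argv and env entry just before handing them to Perl, and the diagnostic script could do the same on its side, so that the two dumps can be compared byte by byte.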

Quote from: johnrellis on July 01, 2019, 05:35:23 PM
- Converting argv [2 - argc-1] from UTF-16 to UTF-8, and converting argv [0 - 1] to the current system code page. Windows Perl always opens scripts with the 8-bit file i/o routines, regardless of whether -Ci is specified, so RunPerl needs to receive the script argument encoded in the current system code page, just as it is passed by main().

In my understanding, -Ci is not intended to resolve this. Besides, you can have valid Unicode file paths not accessible via any 8 bit code page. Since I don't see much value in running software in "c:\users\Влади́мир Παπαδόπουλος" I will not invest more work here.

Quote from: johnrellis on July 01, 2019, 05:35:23 PM
There wasn't any need to change the names of the file routines, since I compiled with UNICODE defined in the preprocessor. (Visual Studio did that for me -- I don't know how that works precisely with gcc.)

There are universal macros resolving to the short or wide function depending on UNICODE or _UNICODE.

TEXT() or _T() makes wide or short literals.

An uppercase %S format specifier is a non-standard alias to %ls but gcc assumes wide arguments even with lowercase %s.
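To illustrate how such universal macros work, here is a minimal portable sketch of the same idea. All DEMO_* and demo_* names are hypothetical; they only mirror what <tchar.h> provides with _T() and _tcsrchr, and DEMO_UNICODE stands in for _UNICODE.

```c
#include <string.h>
#include <wchar.h>

/* DEMO_* names are hypothetical; they mimic the generic-text mappings
 * of <tchar.h>, where one switch selects narrow or wide variants. */
#ifdef DEMO_UNICODE
typedef wchar_t demo_char;
#define DEMO_T(s)     L##s      /* wide literal */
#define demo_strrchr  wcsrchr
#define demo_strlen   wcslen
#else
typedef char demo_char;
#define DEMO_T(s)     s         /* narrow literal */
#define demo_strrchr  strrchr
#define demo_strlen   strlen
#endif

/* Returns a pointer to the file-name part of a Windows path.
 * The same source compiles for both character widths. */
const demo_char *demo_basename(const demo_char *path)
{
    const demo_char *sep = demo_strrchr(path, DEMO_T('\\'));
    return sep ? sep + 1 : path;
}
```

Compiling with -DDEMO_UNICODE switches every literal and call to the wide variant without touching the function body, which is what makes it possible to build the Unicode and the 8 Bit launcher from the same sources.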

Oliver

obetz

I forgot to mention an important observation: The unicode exe hangs just before the "return" for several tenths of a second.

The delay doesn't happen if the RunPerl() line is commented out. So calling RunPerl() in the Unicode environment has a delayed side-effect, really weird! I will postpone this Unicode experiment until new information is available.

Oliver

P.S.: I updated the (traditional non-Unicode) Strawberry Perl  based ExifTool Windows Installer to version 11.54:
https://oliverbetz.de/ExifTool_install_11.54.exe

johnrellis

"I converted your source file from (unusual?) UTF-16 LE BOM"

That's the default encoding used by Visual Studio.

johnrellis

Re the delay on exit, do you observe that with gcc?  I now see a subsecond delay with Visual Studio.

johnrellis

I installed mingw-w64 and compiled and linked with this:

gcc -DUNICODE -D_UNICODE -DEXPLICITLINKING -municode -m32 -o exiftool.exe ppl.c

and I don't observe any delay. Interesting.

obetz

MinGW also here: "gcc (i686-posix-dwarf-rev0, Built by MinGW-W64 project) 8.1.0"

The delay is caused by the "-O2" optimization switch. The options I used initially:


CFLAGS    := -Wall -Wstrict-prototypes -O2 -s -mms-bitfields -fwrapv -municode -D_UNICODE
# -O2 Optimize even more, performs nearly all supported optimizations that do not involve a space-speed tradeoff
# -Wstrict-prototypes is not needed if a reasonable static code checker is used. -Wall might (!) catch problems not detected by PC-Lint
# -s Remove all symbol table and relocation information from the executable, reduces size by more than 50%!
# -mms-bitfields Microsoft compatible bitfields
# -fwrapv instruct the compiler to assume that signed arithmetic overflow wraps, else "i+1 > i" is always true


Without any -O statement, it exits fast. Really weird.

We don't really need the optimization, but as long as I don't know how the delay is possible at all, I don't trust the code.
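As a side note on the -fwrapv comment in the flags above: the "i+1 > i" test only behaves as a wraparound check when signed overflow is defined to wrap. A minimal sketch of the difference follows; both function names are made up for illustration.

```c
#include <limits.h>
#include <stdbool.h>

/* Relies on signed wraparound, so it is only meaningful when built with
 * -fwrapv. Without that flag, signed overflow is undefined behaviour and
 * an optimizing compiler may fold the expression to "true". */
bool can_increment_wrapv(int i)
{
    return i + 1 > i;
}

/* Portable alternative: never overflows, so it does not depend on
 * compiler flags or optimization level. */
bool can_increment(int i)
{
    return i < INT_MAX;
}
```

Code written in the second style gives the same result with and without -O2, which removes one possible source of optimization-dependent behaviour.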

johnrellis

The delay is caused by incorrectly declaring RunPerl_t with the CALLBACK calling convention (a macro for __stdcall):

typedef int (CALLBACK* RunPerl_t)(int argc, char **argv, char **env);

RunPerl in "perl524.dll" was compiled with the __cdecl calling convention. Deleting CALLBACK (causing __cdecl to be assumed) eliminates the delay, with and without -O2, in both Visual Studio and mingw-w64 "gcc".

Details

I verified that -O[123] with "gcc" also caused the delay.  I looked at unoptimized and optimized assembly and noticed a difference in how they implemented "return". The unoptimized code pops the wmain() stack frame by subtracting a constant from the frame pointer saved away on entry to  wmain():

leal   -12(%ebp), %esp   

Whereas the optimized code pops the stack frame by adding a constant to the stack pointer:

addl   $636, %esp   

This suggested that perhaps some function called by wmain() wasn't properly restoring the stack pointer on return. In unoptimized code, this wouldn't matter, since wmain() would properly restore the stack pointer from the saved-away frame pointer regardless. But in optimized code, wmain() would return with an improperly restored stack pointer.

I read up on the calling conventions: In __stdcall, the called function pops the arguments from the stack before returning, while in __cdecl, the caller pops the arguments after the function returns.  So the obvious suspect was RunPerl.

The perl-5.30.0 sources show this declaration for RunPerl in "perllib.c":

EXTERN_C DllExport int
RunPerl(int argc, char **argv, char **env)


The DllExport macro is defined in "win32/win32.h" as "__declspec(dllexport)", which doesn't specify a calling convention.

I didn't track down the compile flags that the distribution would set by default for compiling "perllib.c", but I assumed they wouldn't override the default calling convention (__cdecl).  But using the Visual Studio Disassembly window, I did verify that RunPerl in "perl524.dll" uses the __cdecl convention, not popping its arguments from the stack.

I don't know why this bug caused wmain() to delay on return but not main(). But strange behavior is always to be expected when called functions mangle the stack.
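The fix described above can be sketched as follows. PPL_CDECL, fake_run_perl and call_run_perl are hypothetical names for illustration; in the real launcher the pointer would presumably be obtained from the Perl DLL rather than from a local stand-in function.

```c
#include <stddef.h>

/* PPL_CDECL is a hypothetical macro: the calling-convention keyword only
 * matters for 32-bit x86 Windows builds, so it expands to nothing on
 * other platforms. */
#if defined(_WIN32)
#define PPL_CDECL __cdecl
#else
#define PPL_CDECL
#endif

/* Corrected pointer type: cdecl, so the caller removes the arguments
 * from the stack after the call returns. */
typedef int (PPL_CDECL *RunPerl_t)(int argc, char **argv, char **env);

/* Stand-in for the function exported by the Perl DLL, so that the
 * sketch compiles and runs without Perl. */
static int PPL_CDECL fake_run_perl(int argc, char **argv, char **env)
{
    (void)argv;
    (void)env;
    return argc;
}

/* Calls through the pointer; returns -1 if the pointer is NULL,
 * e.g. because the symbol lookup failed. */
int call_run_perl(RunPerl_t run, int argc, char **argv, char **env)
{
    return run ? run(argc, argv, env) : -1;
}
```

Spelling out the convention in the typedef, instead of relying on the compiler default, documents the intent and protects against builds where the default has been changed by a compiler switch.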

obetz

Quote from: johnrellis on July 05, 2019, 04:40:18 PM
The delay is caused by incorrectly declaring RunPerl_t with the CALLBACK calling convention (a macro for __stdcall):

typedef int (CALLBACK* RunPerl_t)(int argc, char **argv, char **env);

thanks for spotting this stupid copy/paste error. The installer has been updated.

Caution: The "unicode" version is to be considered "experimental".

Oliver

obetz

this is an answer to a post in another thread, since I think that it belongs to this thread:

Quote from: herb on August 10, 2019, 06:31:12 AM
(Windows)Exiftool.exe is also used by other applications (e.g. XnViewMP or IMatch) and the *.exe file is stored in a subdirectory of the application installation directory.
For me an advantage is that all these Exiftool.exe files use the "unpacked Perl modules" in ONE common directory - in %appdata%\... or defined by global environment variable PAR_GLOBAL_TEMP.

At the moment I do not see how this can be achieved using your Ppl(exiftool).exe.
All necessary files stored in directory "exiftool_files" should be stored only once on a Windows system.

First of all: PAR_GLOBAL_TEMP is deprecated and dangerous especially for the scenario you describe. Don't use it.

Without PAR_GLOBAL_TEMP, having the executables in %temp% also has several drawbacks. See https://oliverbetz.de/pages/Artikel/Portable-Perl-Applications for details. In the end, you waste even more space because there is no automatic cleanup of old versions.

IMatch prefers to have its specific version of ExifTool because there are certain version dependencies and there is a risk of breaking IMatch by using a different ExifTool version than shipped with IMatch, so don't fiddle with the embedded ExifTool unless there is a good reason.

Other applications like ExifTool Gui can use an "installed" ExifTool added to the "Path" environment variable.

I didn't yet investigate the ExifTool integration of XnViewMP.

Let me know whether you use both IMatch and XnViewMP or any other set of applications bringing their own ExifTool, then I will check for impact and options.

Any other concern you think I should investigate?

Oliver

herb

Hello Oliver,

thanks for your quick reply and sorry that I asked within another post.
In asking my question I thought of a small enhancement of (Ppl)Exiftool.exe.

Today it starts Perl.exe that is stored in the subdirectory "exiftool_files" of its installation directory.
I thought about a global environment variable or a *.cfg file (parallel to *.exe) or ... that points to Perl.exe (to be used) - with an absolute path.

In my case it would allow to have only "1 installation" of (Ppl)Exiftool.exe for XnViewMP and my application.
I agree that some more investigations have to be done for IMatch.

Only a thought
Best regards
Herb

herb

(Again) Hello Oliver,

I have an additional question about the default config file .exiftool_config.
Why must this file be stored inside subdirectory exiftool_files using your (Ppl)Exiftool.exe?

It is not used if it is stored inside the current directory parallel to (Ppl)Exiftool.exe.
This makes my goal "to have only 1 installation of exiftool_files" a little bit more complicated.

Best regards
Herb

obetz

Quote from: herb on August 10, 2019, 09:06:35 AM
In asking my question I thought of a small enhancement of (Ppl)Exiftool.exe.

Today it starts Perl.exe that is stored in the subdirectory "exiftool_files" of its installation directory.
I thought about a global environment variable or a *.cfg file (parallel to *.exe) or ... that points to Perl.exe (to be used) - with an absolute path.

My launcher doesn't call Perl.exe but perl5xx.dll

The larger part of ExifTool is not the DLL files but the ExifTool and Perl components being executed.

There are well established mechanisms to run an "installed" version of a program from "anywhere", for example the "Path" environment variable.

If a software author embedding ExifTool decides not to use "Path", it's not exactly my job to undermine his decision.

I could imagine ways to achieve what you want, for example:

  • Check whether we were called via a symlink
  • Use a stub

But is it worth the effort?

Maybe you can symlink the whole exiftool_files folder and duplicate only the small exiftool.exe. That's not beautiful but might do the job.
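The symlink idea relies on the launcher looking for "exiftool_files" next to its own executable, as described earlier in the thread. A minimal sketch of that lookup, assuming this is how the path is derived; data_dir_from_exe is a hypothetical helper and not the launcher's actual code:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds "<directory of exe>\exiftool_files" in
 * `out`. Returns 0 on success, -1 if `out` is too small. */
int data_dir_from_exe(const char *exe_path, char *out, size_t out_size)
{
    const char *sep = strrchr(exe_path, '\\');
    /* Length of the directory part including the trailing backslash,
     * or 0 if the path has no directory component. */
    size_t dir_len = sep ? (size_t)(sep - exe_path) + 1 : 0;

    int n = snprintf(out, out_size, "%.*s%s",
                     (int)dir_len, exe_path, "exiftool_files");
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

With a lookup like this, each copy of the small exiftool.exe resolves the folder in its own directory, so a symlink placed there can point all copies at one shared "exiftool_files" tree.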