Exiftool and Umlaut file names

Started by Mac2, August 18, 2013, 10:03:52 AM


Mac2

I apologize for the long post. But this drives me nuts, really.  :o

I've read all related posts, the FAQ and everything else I could find, but I cannot get ExifTool to work with umlauts or other file names that go beyond plain ASCII/ANSI characters. I have to deal with Chinese, Russian, Japanese and other file names.

I use exiftool.exe in my Windows application.
I spawn it via the CreateProcess API function and the following command line:

c:\exiftool\exiftool.exe -config c:\exiftool\myconfig.CON -stay_open True -@ -

I redirect the standard handles into my application and communicate with ExifTool using these handles. When I want ExifTool to do something, I set up an ARGS file in memory and send it to ExifTool via stdin. ExifTool returns the result via stdout. All this works well and is quite efficient.

But as soon as I try to process a file name like

c:\images\glück.jpg

ExifTool fails :'( and returns:

File 'c:\images\glück.jpg' does not exist for -tagsFromFile option.

Since ExifTool has the correct file name in the error message I think that the transfer of the file name to ExifTool via stdin and the ARGS file works. But that the file system functions in Perl somehow cannot handle the file name?!

So I tried the same file on the command line. I opened a command prompt (cmd.exe) and wrote the ARGS file by hand using Notepad. I then ran this ARGS file via exiftool -@ test.args and the file was processed without a problem. Doh!

I figured that this was maybe a code page problem or something. Via chcp I checked that the command line window I used for my test used the standard Windows code page 850. It also works when I switch to UTF-8 via chcp 65001, just for testing.

What's different when I run ExifTool.exe from my program then? My program is a GUI application and thus has no console attached. But when I use CreateProcess Windows creates a new console in which exiftool.exe is then started.

I used AllocConsole, GetConsoleProcessList and GetConsoleCP to determine a) which processes are created when I do a CreateProcess(exiftool.exe) and b) whether these processes have consoles and which code pages these consoles use.

I noticed that a single CreateProcess("exiftool.exe" ...) with the arguments as shown above spawns two exiftool.exe processes. I can see them in TaskManager and GetConsoleProcessList also lists these two exiftool.exe processes. No idea why. But for each process GetConsoleCP returns code page 850, which is the same code page used for my command line experiment. This should allow ExifTool to process file names with German Umlauts just fine.

This is as far as I can look into this myself. I don't see any further diagnostic tools in exiftool.exe to dig deeper into this. My questions:

a) If ExifTool receives the correct file name (and can return it in the "does not exist" error message), why does it fail to access the file?
b) Why are two exiftool.exe processes created? Maybe this is caused by -stay_open ?


PS.: I know about the possibility to rename or copy files before sending them to ExifTool. But I often have to deal with read-only files, or very big files of 50 MB to 1 GB. Renaming does not always work, and copying a 1 GB file over the network to set an XMP rating is just not possible.

I'm also aware of the GetShortPathName() function in Windows, which makes file names with non-ASCII characters digestible for ExifTool. But this function fails for file names shorter than 8.3 on removable media and network shares (it just returns the original name, with the umlauts).

Using this function also prevents the use of -overwrite_original because ExifTool then renames the file to the short file name and the original file name is lost. The alternative -overwrite_original_in_place can be used, but makes processing of large files several times slower. Setting the rating in a 500 MB TIFF file takes 10 seconds with -overwrite_original and 35 seconds with -overwrite_original_in_place. On network files this is even 5 to 10 times slower.

If somebody has an idea I would be really grateful.

Phil Harvey

This is a known problem.  There are work-arounds as you mentioned using the equivalent 8.3 filenames, but they aren't a complete solution, plus I think these are dependent on your system settings.  Frankly, I don't understand Windows handling of special characters in file names, and sometimes something that works for one person won't work for someone else.  I suspect this is similar to the differences you are seeing when running ExifTool from the command line vs. from your program.  I haven't yet found a complete solution at the ExifTool end.  Here is a list of threads I have collected on this subject:

  --> https://exiftool.org/forum/index.php/topic,3155.0.html
  --> https://exiftool.org/forum/index.php/topic,2394.0.html
  --> https://exiftool.org/forum/index.php/topic,3224.0.html
  --> https://exiftool.org/forum/index.php/topic,4029.0.html
  --> https://exiftool.org/forum/index.php/topic,3565.0.html
  --> https://exiftool.org/forum/index.php/topic,4649.0.html
  --> https://exiftool.org/forum/index.php/topic,2677.0.html
  --> https://exiftool.org/forum/index.php/topic,5193.0.html
  --> http://stackoverflow.com/questions/4232397/perl-managing-path-encoding-on-windows

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Mac2

Thanks, Phil

I've already read (most) of these topics and will read the two I have not yet read.

Part of the problem is Perl's missing support for Unicode or UTF-8 (?) file names (after, what, 15 years or so). This creates all kinds of problems once you leave the ASCII zone. Like most IT folks I never have a problem with this because I normally don't use file names with special characters, not even umlauts (I'm German). But normal users use all kinds of funky file names in Windows, because they can.

Usually you have the situation where a German/Russian/Hungarian/... user just uses file names with characters from their local code page. This usually does not create file names which require Unicode encoding, just proper handling of the file names in whatever code page is set for the user.

The command prompt window created by Windows uses the default system code page, so the file names almost always work there. But they fail when exiftool.exe is used from within a program. When I do a CreateProcess on exiftool.exe, it creates its own environment plus a console because it is a command line application. Yesterday I checked that this console uses the proper code page. It does, so the file names should work there exactly the same. Yet it fails. ExifTool reports that the file cannot be found. Bugger.

Do you have any additional info about the wrapper you use to convert your library into a standalone executable? Which product is it? Maybe there's a hint about a) code page usage and b) why it spawns an additional exiftool.exe process for each process I create.

Phil Harvey

The Windows .exe is packaged using PAR.

I share your frustration, and am sorry I can't be of more help.

- Phil

Mac2

I've done some reading and apparently current Perl versions support UTF-8 for file names?
Since ExifTool itself handles all file names correctly, I wonder if there is a way to UTF-8 encode them before handing them over to Perl?

Phil Harvey

Yes, UTF-8 file names work great with ExifTool on Mac and Linux.  Not Windows though. :(

- Phil

FixEUser

BE WARNED: THIS MAY NOT WORK FOR ALL YOUR FILENAMES!
(In the meantime I've found some cases where this workaround still changes the original filename, especially when using exiftool with a directory instead of a single file.)


Maybe I have found a workaround for this problem.

Every time I try to strip all the tags inside an image having a name with "strange" characters, like
Фактор страха (Театр военных действий, Акт 1).jpg
I get this error: "File not found".

But if I use this syntax inside a command-line batch file:

"C:\Program Files (x86)\Tools\Grafik\exiftool\exiftool.exe" -all= %~sf1 -overwrite_original
REN "%~sf1" "%~n1%~x1"

it seems to work!

Quote:
%~s1 "Expanded path contains short names only"
%~f1 "Expands %1 to a fully qualified path name"
%~n1 "Expands %1 to a file name"
%~x1 "Expands %1 to a file extension"
Every passed file name (drag & drop or command line) is automatically converted to 8.3 syntax. This allows ExifTool to run the -all= stripping command above.
For the rare cases where the file ends up renamed to its 8.3 name, the following REN command changes it back to its original name.

Maybe this can help other Windows users too?

Hayo Baan

With the latest versions of exiftool your workaround should no longer be necessary, as it should now support special characters under Windows too.
Hayo Baan – Photography
Web: www.hayobaan.nl

FixEUser

With version 9.89 I still get this error:

Quote:
FileName encoding not specified.  Use "-charset FileName=CHARSET"
No matching files

With the above approach of using the short file name and then renaming it back to its fully qualified name, it seems to work in almost all cases.
Another filename example: Counting Crows (Films About Ghostsː The Best Of...).jpg

Hayo Baan

Have you already tried specifying a filename encoding? You need to tell exiftool what encoding to use (this triggers wide-character use on Windows), e.g. exiftool -charset filename=Cyrillic or exiftool -charset filename=Latin (which encoding to use depends on how you have set up your command line under Windows).

Here's part of the relevant documentation:

     -charset [[TYPE=]CHARSET]
         If TYPE is "ExifTool" or not specified, this option sets the ExifTool
         character encoding for output tag values when reading and input
         values when writing. The default ExifTool encoding is "UTF8". If no
         CHARSET is given, a list of available character sets is returned.
         Valid CHARSET values are:

             CHARSET     Alias(es)        Description
             ----------  ---------------  ----------------------------------
             UTF8        cp65001, UTF-8   UTF-8 characters (default)
             Latin       cp1252, Latin1   Windows Latin1 (West European)
             Latin2      cp1250           Windows Latin2 (Central European)
             Cyrillic    cp1251, Russian  Windows Cyrillic
             Greek       cp1253           Windows Greek
             Turkish     cp1254           Windows Turkish
             Hebrew      cp1255           Windows Hebrew
             Arabic      cp1256           Windows Arabic
             Baltic      cp1257           Windows Baltic
             Vietnam     cp1258           Windows Vietnamese
             Thai        cp874            Windows Thai
             MacRoman    cp10000, Roman   Macintosh Roman
             MacLatin2   cp10029          Macintosh Latin2 (Central Europe)
             MacCyrillic cp10007          Macintosh Cyrillic
             MacGreek    cp10006          Macintosh Greek
             MacTurkish  cp10081          Macintosh Turkish
             MacRomanian cp10010          Macintosh Romanian
             MacIceland  cp10079          Macintosh Icelandic
             MacCroatian cp10082          Macintosh Croatian

         TYPE may be "FileName" to specify the encoding of file names on the
         command line (ie. FILE arguments). In Windows, this triggers use of
         wide-character i/o routines, thus providing support for Unicode file
         names. See the "WINDOWS UNICODE FILE NAMES" section below for
         details.

and
WINDOWS UNICODE FILE NAMES
    In Windows, by default, file and directory names are specified on the
    command line (or in arg files) using the system code page, which varies
    with the system settings. Unfortunately, these code pages are not complete
    character sets, so not all file names may be represented.

    ExifTool 9.79 and later allow the file name encoding to be specified with
    "-charset filename=CHARSET", where "CHARSET" is the name of a valid
    ExifTool character set, preferably "UTF8" (see the -charset option for a
    complete list). Setting this triggers the use of Windows wide-character
    i/o routines, thus providing support for all Unicode file names. But note
    that it is not trivial to pass properly encoded file names on the Windows
    command line (see <https://exiftool.org/faq.html#Q18>
    for details), so placing them in a UTF-8 encoded -@ argfile is recommended
    if possible.

    When a directory name is provided, the file name encoding need not be
    specified (unless the directory name contains special characters), and
    ExifTool will automatically use wide-character routines to scan the
    directory.

    The filename character set applies to the FILE arguments as well as
    filename arguments of -@, -geotag, -o, -p, -srcfile, -tagsFromFile, -csv=,
    -j= and -TAG<=. However, it does not apply to the -config filename, which
    always uses the system character set. The "-charset filename=" option must
    come before the -@ option to be effective, but the order doesn't matter
    with respect to other options.

    Notes:

    1) FileName and Directory tag values still use the same encoding as other
    tag values, and are converted to/from the filename character set when
    writing/reading if specified.

    2) Unicode support is not yet implemented for other Windows-based systems
    like Cygwin.
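
As a minimal sketch of the argfile route recommended in the documentation above, the Python helpers below write the arguments into a UTF-8 encoded argfile and place -charset filename=UTF8 ahead of -@ on the command line. The helper names (write_argfile, build_command), the exiftool executable name and the example arguments are assumptions for illustration. Note that the filename character set also applies to the -@ argument itself, so the sketch assumes the temporary argfile path is plain ASCII.

```python
import os
import tempfile


def write_argfile(arguments, directory=None):
    """Write one argument per line to a UTF-8 encoded argfile and return its path."""
    fd, path = tempfile.mkstemp(suffix=".args", dir=directory)
    # newline="\n" keeps the bytes identical on every platform.
    with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as handle:
        for argument in arguments:
            handle.write(argument + "\n")
    return path


def build_command(argfile, exe="exiftool"):
    """Build the command line; -charset filename= must precede -@ to be effective."""
    return [exe, "-charset", "filename=UTF8", "-@", argfile]
```

For example, write_argfile(["-XMP:Rating=3", r"c:\images\glück.jpg"]) produces an argfile whose path can be handed to build_command, and the resulting list can be passed to subprocess.run.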


Hope this helps,
Hayo

Newsky

If I process images in a folder with a Cyrillic name, and the images have keywords in Cyrillic and a space in the file name, must -charset Cyrillic be specified for every operation regardless?

Or does it depend on the situation? For example:

Cyrillic in folder name: -charset filename=Cyrillic
Cyrillic in key words: -charset IPTC=Cyrillic
a space in the file name: -charset filename=Cyrillic
Sorry for my english. I use Google translator

Phil Harvey

-charset filename=cyrillic is necessary if you pass file names to ExifTool with special characters in Cyrillic encoding.

-charset IPTC=cyrillic is necessary to read/write special characters in Cyrillic-encoded IPTC.

You don't need to use -charset just because there is a space in the file name.

- Phil