Args files, UTF8, file names and character sets

Started by Mac2, November 28, 2013, 03:03:08 PM


Mac2

I'm running Windows with German console settings (code page is 850).

I can manually run an exiftool command like this:

exiftool -iptc:all über.jpg

and the German Umlaut ü in the file name is correctly handled by ExifTool/Perl.

From my application, I communicate with ExifTool using -stay_open and by streaming ARG files to the ExifTool process. I encode these ARG files in UTF8 in order to be able to handle metadata in all languages. All this works well with Czech, Russian and metadata in other languages. Thanks to UTF8, ExifTool can handle the parameters and produces proper metadata in IPTC and XMP.

The only problem is file names in this context.

I create an ARG file in Notepad:

-iptc:all
über.jpg


and save it as ANSI. I can run it from the command line

exiftool -@ test.args

without problems and ExifTool returns the IPTC data.

But when I save the file in UTF8 encoding, ExifTool no longer finds the file! The output is:

V:\ExifTool>exiftool -@ 1.args
File not found: ├╝ber.jpg


ExifTool takes the UTF8-encoded file name 'as-is' and passes it to Perl, which obviously does not handle it (on Windows), even though the same file name in non-UTF8 form can be processed!
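To make the byte-level difference concrete, here is a small Python sketch (not part of my application, just an illustration). It assumes that "ANSI" in Notepad means Windows-1252 and that the console uses code page 850, as described above.

```python
# Illustration only: why the UTF-8 argfile produces the garbled name shown above.
# Assumptions: Notepad "ANSI" = Windows-1252, console code page = 850.
name = "über.jpg"

utf8_bytes = name.encode("utf-8")    # bytes stored in the UTF8 argfile
ansi_bytes = name.encode("cp1252")   # bytes stored in the ANSI argfile

# The console renders the UTF-8 bytes one byte per character in code page 850.
shown_on_console = utf8_bytes.decode("cp850")

print(utf8_bytes)        # two bytes for the umlaut
print(ansi_bytes)        # one byte for the umlaut
print(shown_on_console)  # the name as it appears in the error message
```

The umlaut takes two bytes in UTF-8 but only one in the ANSI code page, so the bytes handed to Perl no longer match the name it looks up on disk.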

Question: Would it be possible to add a parameter which tells ExifTool to convert file names from UTF-8 back into the current code page before sending the file names to Perl? This way we could use ARG files in UTF8 but work with file names in the system code page.

The "short 8.3 file name" trick I use to prepare file names for ExifTool in non-ASCII environments works less and less reliably, because a) Windows 8 has support for 8.3 file name functions disabled by default (I think) and b) the short path name functions do not work reliably on removable media, networks and file names shorter than 8.3.

I really hope that Perl on Windows handles UTF8 file names or UNICODE file names real soon, else we'll run into real problems processing files from users who use non ASCII characters in their file names.

Phil Harvey

Quote from: Mac2 on November 28, 2013, 03:03:08 PM
Question: Would it be possible to add a parameter which tells ExifTool to convert file names from UTF-8 back into the current code page before sending the file names to Perl?

Why not just write the argfile in mixed encoding?  This would solve your problem without requiring any new exiftool options.  ExifTool makes no assumptions about the encoding of the argfile.  It can, in fact, even contain binary data (although newlines must be escaped), which will come in very handy if you ever want to write a binary value to a tag.

Quote
I really hope that Perl on Windows handles UTF8 file names or UNICODE file names real soon, else we'll run into real problems processing files from users who use non ASCII characters in their file names.

Unfortunately, I think the chances of this are near zero.  More likely that hell will freeze over, or that I'll finally apply a Windows-specific patch to ExifTool to deal with this problem.

- Phil

P.S.  This is too late for your game, and I don't know what language you are programming in, but I'm currently working on a C++ interface for ExifTool, just in case you might be interested.  I haven't yet tested it in Windows or released it officially.

Mac2

Quote from: Phil Harvey on November 28, 2013, 03:36:56 PM
Why not just write the argfile in mixed encoding?
Interesting idea. I usually produce rather large argfiles, processing 20 to 50 files in a batch, with calls to external xmp2... and ...2xmp files. A lot of file names, hundreds of lines of arguments. So far I prepare everything in memory using Windows Unicode. Finally, I convert the entire argfile in memory into UTF8 and send it to the ExifTool process. I will need to split all that up to encode each "line" separately, so the file names can remain ANSI and the rest will be UTF8. Quite a bit of work, but not complicated. I will set aside some time to try this.
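A rough Python sketch of the per-line encoding I have in mind (my real code is not Python). The function name `build_mixed_argfile`, the `(text, is_filename)` pairs and the `cp1252` default are all made up for illustration; the tag value is just sample text.

```python
# Hypothetical helper: encode each argfile line separately so that file names
# stay in the system code page while all other arguments are UTF-8.
def build_mixed_argfile(entries, filename_codepage="cp1252"):
    """entries is a list of (text, is_filename) pairs; returns raw bytes."""
    lines = []
    for text, is_filename in entries:
        encoding = filename_codepage if is_filename else "utf-8"
        # Raises UnicodeEncodeError if a file name cannot be represented
        # in the chosen code page.
        lines.append(text.encode(encoding))
    return b"\n".join(lines) + b"\n"


payload = build_mixed_argfile([
    ("-XMP-dc:Title=Příliš žluťoučký kůň", False),  # metadata: UTF-8
    ("über.jpg", True),                             # file name: code page
])
print(payload)
```

The resulting bytes would be written to the argfile or streamed to the ExifTool process unchanged. A file name containing characters outside the system code page still cannot be expressed this way, so the encode step fails for those names.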

Quote from: Phil Harvey on November 28, 2013, 03:36:56 PM
Unfortunately, I think the chances of this are near zero.  More likely that hell will freeze over, or that I'll finally apply a Windows-specific patch to ExifTool to deal with this problem.
I know you don't want to make ExifTool platform-dependent, which is a good thing. I don't want ExifTool to change. What I would like is that, when you hand the UTF-8 file names I send to ExifTool over to Perl unmodified, Perl knows how to make them work on Posix/Windows. Windows has supported Unicode for 20 years and Perl should be able to handle that transparently.

Quote from: Phil Harvey on November 28, 2013, 03:36:56 PM
P.S.  This is too late for your game, and I don't know what language you are programming in, but I'm currently working on a C++ interface for ExifTool, just in case you might be interested.
Sounds awesome!  :)
I can use C/C++ directly from all languages I use. And I would have to change only a small part of my code to switch from juggling external ExifTool processes to direct API calls. Should solve a lot of headaches immediately. Maybe add a few new ones as well  ;)

Let me know if you need support with testing or something. I'll be happy to give some time for this port.

Phil Harvey

Quote from: Mac2 on November 29, 2013, 02:23:24 AM
Sounds awesome!  :)
I can use C/C++ directly from all languages I use. And I would have to change only a small part of my code to switch from juggling external ExifTool processes to direct API calls. Should solve a lot of headaches immediately. Maybe add a few new ones as well  ;)

Let me know if you need support with testing or something. I'll be happy to give some time for this port.

You may just be a little too excited about this.  It isn't a port.  It is an interface that handles running and communicating with an external exiftool process, probably just like you are doing already.  All it would have done is to save you some work.  But if your offer to help with testing stands, let me know.

- Phil

Mac2

I had somehow hoped for a wrapper which wraps ExifTool in a DLL or assembly or COM component  :) Silly me.
But a standard 'official' wrapper would be a good thing, especially for people starting to use ExifTool in their apps or components. My wrapper is specific to Windows, but I assume yours would be cross-platform, which would allow us to run ExifTool also on tablets or even mobile phones. Good thing, may need that some day  :)

Please send me an email if you have something I can run some tests against.