Unicode file names and how to make them work (Windows)

Started by Mac2, September 11, 2011, 05:02:46 AM


Mac2

I often have to deal with folders and file names that contain non-Latin characters not covered by the current code page. As we all know, the Perl language on which ExifTool is based still has problems with Unicode file names, for various reasons. A quick look at the Perl web site indicates that this will not be solved in the foreseeable future. Or at all. And there's not much Phil can do about this :-\

A typical file name I have to handle is:

第142期定時株主総会を終えて.jpg

I can read metadata from such a file by using the Windows API function GetShortPathName() and supplying the resulting short name to ExifTool. The short file name for the above file name is:

142~1.JPG

This also works with folder names containing Unicode characters; the folder names are shortened too. GetShortPathName has some obstacles in some environments (for example, 8.3 short name generation can be disabled on a volume; see http://msdn.microsoft.com/en-us/library/aa365247%28v=vs.85%29.aspx#short_vs._long_names), but it is usually reliable.
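For anyone who wants to call GetShortPathName() from their own code, here is a minimal sketch in Python using ctypes. Only GetShortPathNameW itself is the real Windows API; the helper name short_path is hypothetical. It assumes the file already exists. If the volume has no 8.3 names, Windows returns the long name unchanged, so ExifTool would still see the Unicode path.

```python
import ctypes
import sys


def short_path(long_path):
    """Return the 8.3 short form of long_path (hypothetical helper).

    On non-Windows platforms the path is returned unchanged.
    """
    if sys.platform != "win32":
        return long_path
    from ctypes import wintypes

    get_short = ctypes.windll.kernel32.GetShortPathNameW
    get_short.argtypes = [wintypes.LPCWSTR, wintypes.LPWSTR, wintypes.DWORD]
    get_short.restype = wintypes.DWORD

    # With a zero-sized buffer the call returns the required size,
    # including the terminating NUL. The path must already exist.
    needed = get_short(long_path, None, 0)
    if needed == 0:
        raise ctypes.WinError()
    buf = ctypes.create_unicode_buffer(needed)
    if get_short(long_path, buf, needed) == 0:
        raise ctypes.WinError()
    return buf.value
```

For the file above, short_path(r"C:\photos\第142期定時株主総会を終えて.jpg") would give something like C:\photos\142~1.JPG (the folder is just an example), and that string is what gets passed to exiftool.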

As long as I'm only reading data, all is thus well and ExifTool works fine with these short names.

Problems start when I have to write data to files with Unicode characters in the file and/or folder name. Even when using the supposedly safe short names, the results are erratic.

I first used -overwrite_original, but this fails: after the write operation, the original file is gone and only the file with the short path name remains.

But switching to -overwrite_original_in_place in combination with GetShortPathName() made it work.
This combination allows ExifTool to write to files with Unicode file names or in folders with Unicode names  :)
I tested this on Windows Vista and Windows 7. No XP test for now but it should work there too.
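A minimal sketch of the write step in Python, assuming the short name has already been obtained (for example 142~1.JPG) and exiftool is on the PATH. The helper names build_write_command and write_tags are hypothetical. The only real pieces are the exiftool options -overwrite_original_in_place and -rating=5 used in this thread.

```python
import subprocess


def build_write_command(short_name, tag_args, exiftool="exiftool"):
    # -overwrite_original_in_place is the option reported in this thread
    # to work with short names (-overwrite_original lost the file).
    return [exiftool, "-overwrite_original_in_place", *tag_args, short_name]


def write_tags(short_name, tag_args, exiftool="exiftool"):
    """Run exiftool on an 8.3 short name; raise if it reports failure."""
    cmd = build_write_command(short_name, tag_args, exiftool)
    result = subprocess.run(cmd, capture_output=True)
    if result.returncode != 0:
        raise RuntimeError(result.stderr.decode(errors="replace").strip())
    return result.stdout.decode(errors="replace")
```

For example, write_tags("142~1.JPG", ["-rating=5"]) runs exiftool -overwrite_original_in_place -rating=5 142~1.JPG. Passing the arguments as a list avoids any shell quoting issues.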

I wonder if the Perl library exposes the GetShortPathName() function so that Phil could use it. We could then have a parameter that tells exiftool.exe to internally apply GetShortPathName() to file names supplied on the command line or in an args file.

Phil Harvey

There are definitely Windows-specific functions that I could use in exiftool to open a simple file with Unicode characters in the name on Windows, but I need more than just this functionality. See this thread for more details.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Mac2

Hi, Phil

I know. I've read this thread and the others about Unicode file names and the troubles Perl causes in this area.
I just thought this may be useful for users who do simple things, without the need for directory traversal or on-the-fly creation.
If you already get garbled file names from Perl functions, you cannot do anything about it.

Anyway, I'm happy that it works at all, from within an application. Was just a bit worried about having to copy 200 MB TIFF files over the network to a "safe" folder and file name just in order to change the rating  ::)


Phil Harvey

One thing I think I should mention:  ExifTool doesn't care about the file name when you edit a file.

While this doesn't seem very important, it enables something that may be useful to you. As well as acting on files, exiftool also works on pipes. For example, the following works perfectly in Mac/Linux, and I think it should also work reliably in Windows (provided the Windows "type" command doesn't try to modify the file somehow):

type 第142期定時株主総会を終えて.jpg | exiftool -rating=5 - > 第142期定時株主総会を終えて_modified.jpg

Then, as long as the Unicode file names are supported by the command shell (hopefully, but I wouldn't put it past Microsoft to drop the ball on this one too), this should allow you to use exiftool to edit a file with any name. If you do this from within your application, you could even read and write the file yourself and be guaranteed that it does what you want. All you need to do then is pipe the file to the stdin of the exiftool command you execute and read the edited file back from its stdout.
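A sketch of the pipe approach from within an application, in Python. The helper edit_via_pipe and its exiftool_cmd parameter are hypothetical. Python opens the source and destination files itself (Python 3 uses the Windows wide-character file APIs, so the Unicode names are not a problem there). The open file handles are then passed as exiftool's stdin and stdout, so the image is streamed rather than loaded into memory. The "-" file name tells exiftool to read from stdin and write to stdout, as in the command above.

```python
import os
import subprocess


def edit_via_pipe(src_path, dst_path, tag_args, exiftool_cmd=("exiftool",)):
    """Stream src_path through exiftool and save the result as dst_path.

    Hypothetical helper: exiftool never sees a file name, only "-".
    """
    cmd = [*exiftool_cmd, *tag_args, "-"]
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # The OS handles of the open files become the child's stdin/stdout,
        # so the image data is not read into this process's memory.
        result = subprocess.run(cmd, stdin=src, stdout=dst,
                                stderr=subprocess.PIPE)
    if result.returncode != 0:
        os.remove(dst_path)  # do not leave a half-written output file behind
        raise RuntimeError(result.stderr.decode(errors="replace").strip())
```

Usage: edit_via_pipe("第142期定時株主総会を終えて.jpg", "第142期定時株主総会を終えて_modified.jpg", ["-rating=5"]). The original is left untouched, and the output file is removed if exiftool exits with an error. Streaming matters for large files such as 200 MB TIFFs.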

- Phil

BTW. The nickname "Mac2" is counter-intuitive since we're talking about Windows here. ;)

Mac2

Yes, I have pondered using streams too. But as always, there is never enough time. Metadata is only about 10% of my application, but a very important 10%.

Since it works, I won't change it now. But I'll look into streaming when I have some spare time for experiments, just in case something crops up.

Quote: BTW. The nickname "Mac2" is counter-intuitive since we're talking about Windows here.

Ah, that nick was given to me a loooong time ago by a Scottish colleague. ;)
Sometime around when the Apple ][ was hip.