Exiftool + Windows + UTF8. Again.

Started by ribtoks, March 23, 2016, 09:16:49 AM


ribtoks

Hi

I have a directory (its name contains Unicode Cyrillic characters) with several files whose names are latin1-only.
Exiftool version is 10.13 (latest today). Windows 10 x64.

ExifTool is launched from cmd.exe with an arguments file encoded in UTF-8: http://pastebin.com/fAW18YGN
The command to start ExifTool is: exiftool -charset FileName=UTF8 -@ path_to_the_args_file
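
For anyone trying to reproduce this setup, here is a minimal sketch of how such an arguments file and command line could be produced. It is Python, not part of ExifTool, and the function names and file paths are made up for illustration. It assumes one argument per line and plain UTF-8 without a byte-order mark; whether the original args file had a BOM is not known from this post.

```python
from pathlib import Path


def write_argfile(argfile, image_paths):
    """Write one argument per line as UTF-8 bytes, without a BOM.

    Writing bytes directly avoids any newline or code-page translation
    by the platform.
    """
    text = "".join(f"{p}\n" for p in image_paths)
    Path(argfile).write_bytes(text.encode("utf-8"))


def build_command(argfile):
    """Return the command from the post as an argument list."""
    # -charset FileName=UTF8 and -@ ARGFILE are the options quoted above.
    return ["exiftool", "-charset", "FileName=UTF8", "-@", str(argfile)]
```

The list returned by build_command() could be passed to subprocess.run() to launch ExifTool without going through extra shell quoting.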

When launched, exiftool is able to read 5 of the 6 files. One file always fails to be read. Usually it's file #3, but sometimes it's #1 or #2. Sounds like a race condition.

When the same directory is renamed to use latin1 characters only, everything (6/6 files) is read successfully.

What other information can I provide to help resolve this as soon as possible?

Phil Harvey

Thanks.  I'm not sure where to start on this, but maybe simply adding a re-try if a file can't be opened.  Unfortunately I'm going away today for a long weekend, but I'll take a look at this as soon as I can when I get back.

- Phil

Edit: If you want to try to debug this yourself, you can install ActivePerl and download the full ExifTool distribution, then play with the source code (use "perl exiftool ..." to run it via ActivePerl).   Search for "sub Open" in lib/Image/ExifTool.pm for the function that opens the files.
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

ribtoks

Quote from: Phil Harvey on March 23, 2016, 09:45:30 AM
Edit: If you want to try to debug this yourself, you can install ActivePerl and download the full ExifTool distribution, then play with the source code (use "perl exiftool ..." to run it via ActivePerl).   Search for "sub Open" in lib/Image/ExifTool.pm for the function that opens the files.

Thank you for quick response.

I can try to debug it on OS X or Linux, since on Windows I only test whether my software works.
When you talk about debugging, do you mean real debugging (is there a debugger for Perl?) or just debug prints?

Phil Harvey

I am sure this problem won't occur in OS X or Linux because UTF-8 characters in filenames aren't handled differently on these platforms.  For debugging in Windows, I would just add print statements to the Open() function to 1. check to be sure the file names are passed correctly (use HexDump(\$file) to dump the file name in hexadecimal -- this may be useful for Unicode values), and 2. check the error return codes from the file i/o functions.  If you find out which function is failing, try putting it in a loop and retrying a few times, perhaps with a short delay (eg. select(undef,undef,undef,0.01) will delay for 0.01 seconds).
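
To illustrate the two ideas (dump the name in hex, then retry the failing call with a short delay), here is a rough sketch. It is written in Python rather than Perl and is not the ExifTool Open() code; hex_dump, open_with_retry and the opener parameter are hypothetical names used only for this example, and the sketch assumes the failure shows up as an OSError.

```python
import sys
import time


def hex_dump(name):
    """Return the UTF-8 bytes of a file name as space-separated hex."""
    return " ".join(f"{b:02x}" for b in name.encode("utf-8"))


def open_with_retry(path, attempts=5, delay=0.01, opener=open):
    """Open path for binary reading, retrying after a short delay on OSError.

    opener is injectable so the retry logic can be exercised without a
    real flaky file system.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return opener(path, "rb")
        except OSError as err:
            last_error = err
            # Log the raw bytes of the name so encoding problems are visible.
            print(f"attempt {attempt} failed [{hex_dump(path)}]: {err}",
                  file=sys.stderr)
            if attempt < attempts:
                time.sleep(delay)  # 0.01 s, like the select() delay above
    raise last_error
```

If the open succeeds on a later attempt, that would point to a timing problem rather than a badly encoded name.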

This is the ExifTool Open() function to look at, and here is the documentation for the Win32API::File functions that it uses for opening files with Unicode names in Windows.

You may have to install Win32API::File if it isn't part of the standard ActivePerl installation.  ExifTool will give you a warning if it isn't installed when needed.

- Phil

ribtoks

Quote from: Phil Harvey on March 23, 2016, 10:34:38 AM
You may have to install Win32API::File if it isn't part of the standard ActivePerl installation.  ExifTool will give you a warning if it isn't installed when needed.
- Phil

Ok, thank you. Will try it.

ribtoks

Quote from: Phil Harvey on March 23, 2016, 10:34:38 AM
For debugging in Windows
- Phil

I can't get it working in Windows. The Perl script does not handle Unicode paths. Is there anything special I should install besides Win32::File?

bin\Image-ExifTool-10.13\exiftool.pl -charset FileName=UTF8 -@ arg_exiftool -> says all files not found

Phil Harvey

It will say this if Win32API::File isn't available, but then you should also get a warning or error message if you check the -warning and -error tags.

I'm just about to head out for the weekend.  I will be able to check the forum occasionally, but won't have access to a Windows machine or have much time to spend on this until Tuesday.

Another possibility is to debug using the temporary files extracted from the exiftool.exe package, since this bundle contains all of the necessary libraries.  The problem is that I think the package manager may rename them and possibly duplicate some files in the temp directory.  But the exe version is essentially running the Perl version out of the temp directory.

- Phil

ribtoks

Hi Phil

Unfortunately I didn't manage to get it working with Perl and Unicode, and as a result I couldn't debug it. I tried changing code pages and launching it from a different terminal (git bash), but nothing helped.
Just a friendly reminder to take a look at the bug when you have time.

Thank you

Phil Harvey

Thanks for the friendly reminder.

After spending most of my time trying to figure out how to create a directory named "лахемаа" in Windows, I finally was able to test this out.

I used the exact same command you used, and my argfile was identical except for the file names, which were:

tmp/лахемаа/a.jpg
tmp/лахемаа/b.jpg
tmp/лахемаа/c.jpg
tmp/лахемаа/d.jpg
tmp/лахемаа/e.jpg
tmp/лахемаа/f.jpg
tmp/лахемаа/g.jpg
tmp/лахемаа/h.jpg
tmp/лахемаа/i.jpg
tmp/лахемаа/j.jpg


I was using cmd.exe and Windows XP for my test.

I ran both the perl version of ExifTool and the Windows EXE version many times, but it always read all 10 files without any problems.

Are you running from a cmd.exe window?  If not, do you get this problem when using cmd.exe to run exiftool?

Do the file names matter?  Does it still give you problems if you use short names (a-j.jpg as I did)?

- Phil

ribtoks

Quote from: Phil Harvey on March 31, 2016, 10:26:57 AM

I was using cmd.exe and Windows XP for my test.

I ran both the perl version of ExifTool and the Windows EXE version many times, but it always read all 10 files without any problems.

Are you running from a cmd.exe window?  If not, do you get this problem when using cmd.exe to run exiftool?

Do the file names matter?  Does it still give you problems if you use short names (a-j.jpg as I did)?

- Phil

I'm running exiftool from cmd, of course, on Windows 10 x64. Are you able to check it under Windows 7 or 8.1? A lot of APIs have changed since Windows XP, and even though XP still has a fair number of users, testing there is of little use for a number of reasons (MS has abandoned it, photo software is developed for Windows 7 and higher, etc.).

Phil Harvey

Unfortunately my access to Windows systems is very limited.  Perhaps one option would be to figure out how to run a Windows 10 emulator on one of my OS X machines, but that would be a lot of work to set up.

I would be disturbed if it worked on XP but showed this sort of inconsistent behaviour on Windows 10.  However, other people have seen different (and inconsistent) problems trying to run ExifTool on Windows 10, so maybe this is the case here too.

I just had another thought:  Does the problem go away if you disable your antivirus software?

- Phil

ribtoks

Quote from: Phil Harvey on March 31, 2016, 10:56:40 AM

I just had another thought:  Does the problem go away if you disable your antivirus software?

- Phil

I don't use antivirus software )

Phil Harvey

OK.  Do you have access to a Windows 7 or 8 system?  It would be useful to know if this problem is isolated to Windows 10.

I hate to suggest it, and I don't understand how this could help, but another user found that his ExifTool/Windows 10 problems went away when he did a clean install of Windows 10 (as opposed to an upgrade over an older system).  So do you have a clean Windows 10 system that you could test with?

- Phil

ribtoks

Quote from: Phil Harvey on March 31, 2016, 11:02:05 AM
OK.  Do you have access to a Windows 7 or 8 system?  It would be useful to know if this problem is isolated to Windows 10.

I hate to suggest it, and I don't understand how this could help, but another user found that his ExifTool/Windows 10 problems went away when he did a clean install of Windows 10 (as opposed to an upgrade over an older system).  So do you have a clean Windows 10 system that you could test with?

- Phil

I have Win 7 at home as virtual machines, both x64 and x86. Unfortunately I don't know whether my Win 10 install was clean or not. It's not my own PC, and I got it with Win 10. I can also try on Win 8.1.

It is so weird that Perl does not work with a UTF-8 encoded file. If it worked, I could possibly find the problem myself.

Phil Harvey

Great.  Any help you can provide with testing on other Windows systems would be useful.  Also try the short file names in case that makes a difference.

- Phil