Windows filesystem sync issues?

Started by ribtoks, March 16, 2016, 11:03:04 AM

ribtoks

I've run into an interesting problem using ExifTool 10.10+, Qt 5.5.1 and Windows 10.

I'm creating a temporary file with QTemporaryFile, writing some data to it, waiting until it's flushed using `FlushFileBuffers()`, and then passing it to `exiftool` as an arguments file. The data I write is a list of UTF-8 encoded paths to images:

    if (argumentsFile.open()) {
        QStringList exiftoolArguments = createArgumentsList();
        foreach (const QString &line, exiftoolArguments) {
            argumentsFile.write(line.toUtf8());
            argumentsFile.write("\r\n");
        }
    }

    // fsync stuff here...
    // starting exiftool with -@ argumentsFile.fileName() parameter here
    // also with -charset filename=UTF8

So the problem is the following: when filenames do not contain Unicode symbols, ExifTool reads the images, imports the Exif metadata and everything is fine.

But when filenames contain Unicode symbols, ExifTool sometimes fails to pick them up unless I insert a `QThread::sleep(msec)` call, which makes the current thread yield and possibly gives other threads a chance to sync their buffers (write to the hard drive).

ExifTool run from the command line with the same file always reads the metadata; it only fails when started with QProcess in the way explained above. What could be the issue?
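
One way to rule out buffering on the writer's side is to close the arguments file and read its bytes back before starting the child process. The following is only a minimal sketch of that check, assuming plain standard C++ streams instead of QTemporaryFile; `writeArgFile` and `readAll` are hypothetical helper names, not part of Qt or ExifTool.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: write one argument per line (raw UTF-8 bytes, CRLF
// line endings as in the snippet above), then flush and close the stream.
// Returns true only if the file was opened and every write succeeded.
bool writeArgFile(const std::string &path, const std::vector<std::string> &args) {
    assert(!path.empty());
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out.is_open()) {
        return false;
    }
    for (const std::string &arg : args) {
        out << arg << "\r\n";
    }
    out.flush();
    const bool written = out.good();
    out.close();  // release the file before any reader is started
    return written && !out.fail();
}

// Hypothetical helper: read the whole file back as raw bytes so the caller
// can compare them with what was supposed to be written.
std::string readAll(const std::string &path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

If the bytes read back match what was written, the arguments file itself is complete on disk, and the failure is more likely to come from how the reader interprets it than from unflushed data.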

Phil Harvey

I don't know the solution, but the difference is that ExifTool opens the file using the Windows file i/o libraries instead of the standard C file i/o if the filename contains any Unicode characters.  My guess is that the buffering of the Windows file i/o is different.  Are there any Windows i/o calls that you can use to sync this?

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

ribtoks

Quote from: Phil Harvey on March 16, 2016, 02:19:05 PM
Are there any Windows i/o calls that you can use to sync this?

- Phil

Yes, I was calling FlushFileBuffers (https://msdn.microsoft.com/en-us/library/windows/desktop/aa364439%28v=vs.85%29.aspx):

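    #ifdef Q_OS_WIN
        // FlushFileBuffers only flushes OS-level buffers; data still held in
        // QFile's own buffer needs argumentsFile.flush() before this point.
        HANDLE fileHandle = (HANDLE)_get_osfhandle(argumentsFile.handle());
        // FlushFileBuffers returns a BOOL (nonzero on success).
        bool flushResult = FlushFileBuffers(fileHandle) != 0;
        LOG_DEBUG << "Windows flush result:" << flushResult;
    #else
        // ... (non-Windows branch not shown in the original post)
    #endif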


FlushFileBuffers returned true, but that didn't help much. ExifTool still read the input file (which was possibly out of sync) with random success.

Phil Harvey

I don't know if I can help here.  But can't you avoid this problem since you are creating the temporary file yourself? -- just use a name that doesn't contain Unicode characters.

- Phil

ribtoks

Quote from: Phil Harvey on March 16, 2016, 03:47:27 PM
just use a name that doesn't contain Unicode characters.

- Phil

No-no-no, the filename of course does not contain Unicode. It's something like 12345.tmp, as under Linux. It is the contents of the file that are Unicode: paths to images, which I feed to exiftool through -charset filename=UTF8 -@ tmp_argfile

Phil Harvey

Ah.  I wasn't paying close enough attention.  It must be reading your temporary argument file properly if regular file names work as arguments in that file.  And it sounds like it works when you read the same argument file from exiftool run at the command line, right?  Are you sure you are running the same version of ExifTool from QProcess as from the command line?

I don't see what the difference could be.  Did you look at the stderr output to see what messages exiftool was giving?  Presumably it just can't open the file?  You say that sometimes it doesn't work.  It is always with the same files?

- Phil

ribtoks

Quote from: Phil Harvey on March 16, 2016, 04:10:20 PM
I don't see what the difference could be.  Did you look at the stderr output to see what messages exiftool was giving?  Presumably it just can't open the file?  You say that sometimes it doesn't work.  It is always with the same files?

- Phil

I only open one file, whose name has several non-Latin characters, say "жжжж", followed by another Unicode character (e.g. the letter A with an acute accent, Á), so the name is жжжжÁ.jpg

It's a problem on Windows. ExifTool is not in PATH; it sits in the same directory as the .exe which spawns it, so I'm sure it's the same exiftool. STDERR says: "Can't open file C:/Blah-blah-blah/??????Á.jpg".

If I create a (non-temporary) file with the arguments and run exiftool from the command line the same way I do it programmatically, ExifTool does not exit with an error.

If I add a sleep(X) after flushing the temporary file used as the -@ file, ExifTool does not exit with an error (depending on X, of course).

It only fails if it starts reading immediately after I close the temporary file (closing does not remove it, don't worry).
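
One possible reading of the stderr message, not confirmed anywhere in this thread: the percent-encoded name of the sample image posted later in the thread decodes to six Cyrillic letters followed by Á, and the message shows exactly six question marks followed by Á. That pattern is what would appear if the UTF-8 bytes were decoded correctly and the resulting characters were then squeezed into a single-byte character set that contains Á but no Cyrillic. The sketch below reproduces that effect using Latin-1 as a stand-in for such a character set; both function names are hypothetical and this is not ExifTool's code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Decode UTF-8 bytes into Unicode code points. Malformed or truncated
// sequences yield U+FFFD. Simplified: overlong forms are not rejected.
std::vector<char32_t> decodeUtf8(const std::string &bytes) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }  // stray byte

        bool valid = (i + extra < bytes.size());
        for (std::size_t k = 1; valid && k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) { valid = false; break; }
            cp = (cp << 6) | (c & 0x3F);
        }
        if (!valid) { out.push_back(0xFFFD); ++i; continue; }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}

// Lossy conversion to Latin-1 (assumed stand-in for a single-byte ANSI
// code page): code points above U+00FF become '?'.
std::string toLatin1Lossy(const std::string &utf8) {
    const std::vector<char32_t> codePoints = decodeUtf8(utf8);
    std::string out;
    for (char32_t cp : codePoints) {
        out.push_back(cp <= 0xFF ? static_cast<char>(cp) : '?');
    }
    assert(out.size() == codePoints.size());  // one byte per character
    return out;
}
```

Note that a reader treating the UTF-8 bytes as single-byte text would produce two garbage characters per Cyrillic letter rather than one question mark each, so the count of question marks is a useful hint when comparing a failing run with a working one.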

Phil Harvey

Yes, I think I understand the symptoms, but I find it hard to believe that ExifTool would read different data from a file (only for non-ASCII characters!), depending on how long it has been closed.

But I just had a thought.  If, somehow, Windows doesn't have enough file descriptors to open another file, then the Windows open could fail until a descriptor is freed.  That doesn't make much sense though.  However, if there is some common resource that isn't available, it could cause a problem like this.

- Phil

ribtoks

Quote from: Phil Harvey on March 16, 2016, 08:44:12 PM
Yes, I think I understand the symptoms, but I find it hard to believe that ExifTool would read different data from a file (only for non-ASCII characters!), depending on how long it has been closed.

But I just had a thought.  If, somehow, Windows doesn't have enough file descriptors to open another file, then the Windows open could fail until a descriptor is freed.  That doesn't make much sense though.  However, if there is some common resource that isn't available, it could cause a problem like this.

- Phil

I think running out of descriptors is not the case. I'm not blaming exiftool for reading wrong characters; it's more likely inconsistent behavior of the Windows API. But the error still persists and I need to tackle it somehow. If you have any ideas, drop me a message.

ribtoks

Quote from: Phil Harvey on March 16, 2016, 08:44:12 PM
Yes, I think I understand the symptoms, but I find it hard to believe that ExifTool would read different data from a file (only for non-ASCII characters!), depending on how long it has been closed.
- Phil

Hi Phil

I have new data which might be interesting for you. Now I think the bug/feature is in ExifTool and related to some environment issue.

Scenario:
- the arguments file exists, saved as UTF-8 with Notepad++
- exiftool is launched from the command line (cmd.exe): exiftool -charset FileName=UTF8 -@ C:/path/to/arg_exiftool. Everything is fine, ExifTool reads the data correctly.
- exiftool is launched from QProcess with the same arguments. What happens: within one session of my application, on the first launch exiftool fails to read the file with a FileNotFound error (encoding stuff), BUT when I try to import the same file again in the same session, it is read correctly! ExifTool picks up the correct encoding.

So I suppose that on the first launch ExifTool sets up some internal environment regarding Unicode, and I would like to know what it is. Can it be fixed on my side with some kind of pre-run?

Phil Harvey

Could you post an example of an argfile that exhibits this behaviour?

- Phil

ribtoks

Quote from: Phil Harvey on March 17, 2016, 07:12:34 AM
Could you post an example of an argfile that exhibits this behaviour?

- Phil

http://pastebin.com/NjU8m1iR - File contents
https://dl.dropboxusercontent.com/u/14391423/%D1%8E%D0%BD%D1%96%D0%BA%D0%BE%D0%B4%C3%81.jpg - Image
https://dl.dropboxusercontent.com/u/14391423/arg_exiftool - File

Again.. works OK if launched from 'cmd'..

Phil Harvey

Your argfile doesn't specify a -charset filename ?

- Phil

Edit:  Ah, I see.  It is run with:

    exiftool -charset FileName=UTF8 -@ C:/path/to/arg_exiftool

ribtoks

Quote from: Phil Harvey on March 17, 2016, 07:29:36 AM
Your argfile doesn't specify a -charset filename ?

If specified inside argfile it gives error There's no such TAG "-charset".

Phil Harvey

Quote from: ribtoks on March 17, 2016, 07:32:50 AM
If specified inside argfile it gives error There's no such TAG "-charset".

You need to put it on 2 lines:

    -charset
    filename=UTF8


And make sure there are no extra spaces on each line.

- Phil
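
The rule above (one argument per line, no extra spaces) can be enforced when the argfile text is generated. This is a minimal sketch under the assumption that the caller already has the arguments as separate tokens; `buildArgFile` and `stripTrailingSpace` are hypothetical names, and only trailing whitespace is removed so that paths with leading or embedded spaces stay intact.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Remove trailing spaces, tabs and line-ending characters from one line.
std::string stripTrailingSpace(std::string line) {
    const std::size_t end = line.find_last_not_of(" \t\r\n");
    line.erase(end == std::string::npos ? 0 : end + 1);
    return line;
}

// Build argfile text with exactly one argument per line, so that an option
// and its value (e.g. "-charset" and "filename=UTF8") land on two lines.
// Arguments that are empty after stripping are skipped.
std::string buildArgFile(const std::vector<std::string> &args) {
    std::string text;
    for (const std::string &arg : args) {
        const std::string line = stripTrailingSpace(arg);
        if (line.empty()) {
            continue;
        }
        // An embedded newline would silently split one argument in two.
        assert(line.find('\n') == std::string::npos);
        text += line;
        text += '\n';
    }
    return text;
}
```

Passing "-charset" and "filename=UTF8" as two separate elements yields the two-line form shown above; passing them as a single string with a space in between would reproduce the "no such TAG" error.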