-@ filename and -common_args

Started by kornelix, December 17, 2011, 03:19:23 PM


kornelix

I got -stay_open working after some failed attempts. I wanted to ask about performance and report a bug.

First the bug:
I develop with Linux.
I am running exiftool using popen() to start the exiftool process and receive its outputs.
The popen() command is: exiftool  -stay_open True  -@ inputfile
To the input file I am writing:
-s2
-m
-keyword1
-keyword2
...
filename.jpg
-execute

This is working OK. I can read back the keyword values and {ready} as expected.
I tried adding -common_args -s2 -m before the -stay_open and omitted these from the input file.
The result: exiftool went into a loop processing the same file over and over.

Now the performance question:
In the past, in order to get better performance in the special case of extracting the same keys for many files,
I used a command like this:
exiftool -keyword1 -keyword2 ... file1 file2 file3  ... file20
This allowed me to get the data for 20 files with one startup overhead.
This approach actually performs almost 2x faster than using -stay_open and -@ inputfile, which is surprising.
(by 2x I mean 0.010 seconds per file compared to 0.019, roughly).
(startup overhead is about 0.1 seconds)

Overall, I am very grateful for this powerful tool.

Phil Harvey

#1
Quote from: kornelix on December 17, 2011, 03:19:23 PM
I tried adding -common_args -s2 -m before the -stay_open and omitted these from the input file.
The result: exiftool went into a loop processing the same file over and over.

Yes, it will. :)

As you may have discovered, the -common_args option must come after the -@ option, or else you run the risk of recursively processing your argfile.  (I don't want to think about what you actually did, because it makes my brain hurt.)  If you wish, we can have a debate about whether this was exiftool's bug, or yours. ;)
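
For the record, the safe ordering would be something like this (using your options):

exiftool -stay_open True -@ inputfile -common_args -s2 -m

With -common_args at the end, the common arguments are appended to each command you -execute, and the argfile itself is never reprocessed.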

Quote from: kornelix on December 17, 2011, 03:19:23 PM
Now the performance question:
In the past, in order to get better performance in the special case of extracting the same keys for many files,
I used a command like this:
exiftool -keyword1 -keyword2 ... file1 file2 file3  ... file20
This allowed me to get the data for 20 files with one startup overhead.
This approach actually performs almost 2x faster than using -stay_open and -@ inputfile, which is surprising.
(by 2x I mean 0.010 seconds per file compared to 0.019, roughly).
(startup overhead is about 0.1 seconds).

I think you mean a difference of about 0.01 seconds per file.  This makes sense because exiftool waits for 0.01 seconds after hitting the end of file when reading from an input argfile.  The delay is there to avoid chewing up 100% of your CPU in a tight read loop.  (If there is another technique for doing this, I would like to know.)
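
To illustrate what I mean, here is a rough sketch of this type of polling read loop (not exiftool's actual code; process_args() is just a stand-in):

# poll for new arguments: sysread() returns 0 at end-of-file,
# so pause briefly before trying again instead of spinning the CPU
for (;;) {
    my $buf;
    my $n = sysread(ARGFILE, $buf, 65536);
    if ($n) {
        process_args($buf);                 # stand-in for real processing
    } else {
        select(undef, undef, undef, 0.01);  # sleep 10 ms, then poll again
    }
}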

However, there are 2 possible ways around this:

1) I'm not clear on the details (maybe you know more about this), but from experience I know that within exiftool, the sysread() call returns 0 when it hits the end of a file (this is where the 0.01 second delay comes in), but blocks until data is available when reading from a pipe or a tty device (i.e. the console).  I'm not 100% sure, but I think you may be able to get the advantage of a blocking sysread() if you use "-@ -" and pipe the arguments to exiftool instead of using a file.  I know this works from the console, but I'm not sure whether it also works when piped from another program.

2) You can write the arguments for the next command before the first command is done processing.  If you do this, there will be no delay when exiftool tries to read the argfile immediately after the first command completes.  Interleaving commands like this, however, may be more difficult for you.
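
A rough sketch of this interleaving technique (assuming the ARGFILE and OUT handles from your popen() setup, and a hypothetical @files list):

# prime the pipeline with the first command
print ARGFILE "-keyword1\n$files[0]\n-execute\n";
for my $i (1 .. $#files) {
    # queue the next command before reading the previous result,
    # so exiftool never catches up with an empty argfile
    print ARGFILE "-keyword1\n$files[$i]\n-execute\n";
    while (defined(my $line = readline(OUT))) {
        last if $line eq "{ready}\n";
    }
}
# drain the output of the final command
while (defined(my $line = readline(OUT))) {
    last if $line eq "{ready}\n";
}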

- Phil

Phil Harvey

#2
I have run some tests, and technique 1) above does work.  Also, I have added a bit of code to exiftool which will allow a 3rd technique to be used with version 8.74 or later:

3) Send a "CONT" signal to the exiftool process after writing to the -stay_open ARGFILE to wake up exiftool and avoid the possible 0.01 second delay.

Below is the pair of Perl scripts that I used to test this out for both file and piped arguments.  Note that both scripts have an extra 0.001 second delay before sending the arguments to exiftool.  A normal application would not do this.  I am using this delay because without it my script is too fast and the next command has already been sent by the time exiftool tries to read it (in which case I can't test whether exiftool's read would block or not).

Both scripts run at nearly the same speed.  After the time for the unnecessary delay is subtracted, the time to execute the 1000 commands is 1.21 seconds when reading from file, and 1.20 seconds when reading from pipe.  (The test was run on my 1.3 GHz iMac, using "t/images/Writer.jpg" from the ExifTool distribution as the test image.)

#!/usr/bin/perl -w
#
# Sample perl script to test -stay_open and reading from file
#
use strict;
use IO::File;

# make sure our ARGFILE exists, and is empty
open ARGFILE, ">a.args" or die "Error creating a.args\n";
close ARGFILE;

# re-open the ARGFILE for appending commands
open ARGFILE, ">>a.args" or die "Error opening a.args\n";

# enable autoflushing after each line written to ARGFILE
autoflush ARGFILE 1;

# start the exiftool process and pipe output to OUT
my $pid = open OUT, "./exiftool -stay_open true -@ a.args |" or die "Error starting exiftool\n";

# execute 1000 commands as a test
for (my $i=0; $i<1000; ++$i) {

    # the following 1 ms delay is normally totally unnecessary!
    # (we use it here only to make sure we don't beat exiftool to the
    # read delay so we can test the effectiveness of the CONT signal)
    select(undef,undef,undef,0.001);

    # send the command arguments to exiftool (via the ARGFILE)
    print ARGFILE "-aaa\na.jpg\n-execute\n";

    # send a "CONT" signal to stop exiftool from waiting
    # (note: requires ExifTool 8.74 or later)
    kill 19, $pid;

    # read the exiftool output
    for (;;) {
        my $line = readline(OUT);
        last unless defined $line;   # bail out if exiftool quit unexpectedly
        last if $line eq "{ready}\n";
        #
        # (this is where a normal application would process the exiftool output)
        #
    }
}

# terminate and close the exiftool process
print ARGFILE "-stay_open\nfalse\n";
close ARGFILE;
close OUT;

print "done\n";
# end


#!/usr/bin/perl -w
#
# Sample perl script to test -stay_open and reading a pipe
#
use strict;
use IPC::Open2;

# start the exiftool process, reading input from IN and piping to OUT
# (open2 dies if exiftool can't be started, and turns on autoflush
#  for the IN handle, so each command is sent immediately)
my $pid = open2(\*OUT, \*IN, "./exiftool -stay_open true -@ -");

# execute 1000 commands as a test
for (my $i=0; $i<1000; ++$i) {

    # the following 1 ms delay is normally totally unnecessary!
    # (we use it here only to compare times with the reading-from-file
    #  technique, but in this piped example it doesn't actually matter
    #  because the exiftool read will block when reading from a pipe,
    #  so the delay in exiftool's read loop will never get executed)
    select(undef,undef,undef,0.001);

    # send the command arguments to exiftool (via the pipe)
    print IN "-aaa\na.jpg\n-execute\n";

    # read the exiftool output
    for (;;) {
        my $line = readline(OUT);
        last unless defined $line;   # bail out if exiftool quit unexpectedly
        last if $line eq "{ready}\n";
        #
        # (this is where a normal application would process the exiftool output)
        #
    }
}

# terminate the exiftool process
print IN "-stay_open\nfalse\n";

# wait for exiftool to exit
waitpid($pid, 0);

print "done\n";
# end


(Note that I am not handling the stderr output of exiftool in either example, but a real application would need to do this.)
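
For example, one technique would be to use IPC::Open3 instead of IPC::Open2 so that stderr can be captured too (a sketch only):

use strict;
use IPC::Open3;
use Symbol 'gensym';

# open3 takes the child's stdin handle first, then stdout, then stderr;
# the stderr handle must be pre-created with gensym, otherwise open3
# merges stderr into stdout
my $err = gensym;
my $pid = open3(\*IN, \*OUT, $err, "./exiftool -stay_open true -@ -");

# ... then send commands to IN and read OUT as before, but also read
# from $err (a real application should select() on OUT and $err so a
# burst of error messages can't fill the pipe and block exiftool)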

- Phil

Edit: Also note the different ways that I needed to wait for the exiftool process to terminate in the 2 techniques.  In the first example, closing the exiftool output pipe waits for the process to terminate, but in the 2nd example I open the process differently and must wait for it explicitly with waitpid().

Phil Harvey

#3
At the risk of hurting my brain, I sat down and thought this one through.  An infinite processing loop will occur if one uses -execute inside -common_args, regardless of whether or not it is inside an argfile.  (Since the common arguments are appended to every executed command, a -execute there queues yet another command each time, without end.)

ExifTool 8.74 will add a check to protect against this.  It will issue a warning and ignore the -execute and any subsequent arguments in the common arguments.

- Phil

Edit: Hmmm.  There is another infinite processing path with -common_args and -stay_open True that is independent of the -execute option.  Avoiding these paths is difficult because I allow nested argfiles in exiftool.  My brain is hurting again...

Phil Harvey

#4

That was a lot of work, and I'm not done testing yet, but I now have a working version that protects users against themselves by checking for 2 possible infinite recursion paths:

1) -execute inside -common_args

2) re-opening of a -stay_open ARGFILE from within the same ARGFILE

However, there is one remaining path that would be more difficult to guard against:

3) re-loading of an ARGFILE from within the same ARGFILE

Path 3) is very different from path 2) because there is only a single -stay_open ARGFILE to check against, while ordinary ARGFILEs may be deeply nested.
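
To illustrate path 3), an ordinary argfile could re-load itself like this (a hypothetical a.args, one argument per line):

-m
-@
a.args

Each pass through this file queues another read of the same file, so the argument processing never ends.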

Adding these checks was more difficult than it may appear at first, because arguments may be loaded from deeply nested argument files, so it isn't as easy as just scanning the command-line arguments for recursion paths.

- Phil

P.S. Thank god for Advil. :P