exiftool performance benchmark

Started by Phil Harvey, April 16, 2010, 11:25:53 AM


Phil Harvey

Christian Etter has done some timing tests with the Windows exiftool.exe application, and has determined that for his set of test images the startup overhead of running exiftool on each image separately accounts for 98.4% of the execution time.  This means that you can get a speed-up factor of 60x by running exiftool in batch mode on a large set of images, rather than executing it separately for each image.
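The 60x figure follows from the 98.4% number. Here is a minimal Python sketch of the arithmetic. It assumes the startup overhead is a fixed per-process cost that batch mode pays only once, while the per-image work stays the same. `batch_speedup` is a hypothetical helper name.

```python
def batch_speedup(overhead_fraction: float) -> float:
    """Approximate speed-up from batching.

    Assumes the startup overhead is paid once for the whole batch
    (negligible per image) instead of once per image, and that the
    per-image processing work itself is unchanged."""
    if not 0.0 <= overhead_fraction < 1.0:
        raise ValueError("overhead_fraction must be in [0, 1)")
    # Per-image time drops from (overhead + work) to just (work),
    # and work is (1 - overhead_fraction) of the original time.
    return 1.0 / (1.0 - overhead_fraction)


print(round(batch_speedup(0.984), 1))  # 62.5, i.e. roughly the quoted 60x
```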

I have always advocated using the exiftool batch-mode capabilities whenever possible, but this gives some concrete numbers to demonstrate why.

Read Christian's blog entry for all the details.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

BogdanH

Thank you for posting this valuable info. Even though I had assumed something like this, I also thought that I simply didn't know how to do it any better in the GUI.
I really don't know whether it's possible (it probably is) or how to do it, but it would be great if Perl could run as a "service" inside Windows... the speed increase would be tremendous. Just an idea.

If any GUI user is reading this: when multiple files are selected in the GUI, ExifTool is called only once whenever possible (i.e. in batch mode).

Bogdan

MOL

Well, the most obvious solution would be to translate the Perl code into C/C++ and create a nice little DLL... ;)

Christian Etter

The idea behind the testing was to give developers some guidelines with regard to improving performance.

As Phil has mentioned, the overall conclusion was that the load/parse operation is strongly CPU-bound, while the actual extraction of information consumes only a small amount of processing time. Hence the only way to drastically increase speed is to use batch processing. The -fast and -fast2 options yield smaller improvements - and seem to be geared more towards accessing files over a network connection or slow storage system.

If you really have to process files one by one and have multiple cpu cores available, consider using several threads, which led to a 3x increase on my system (8 virtual cores).
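The multi-threaded approach can be sketched in Python with a thread pool in which each worker launches its own exiftool process. This is not Christian's code. It is a minimal sketch that assumes `exiftool` is on the PATH. `run_one`, `run_parallel` and the `cmd_prefix` parameter are hypothetical names introduced here. The 3x figure is Christian's measurement on 8 virtual cores, and other machines will differ.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_one(path, cmd_prefix=("exiftool",)):
    # A separate exiftool process per file: each one still pays the full
    # startup cost, but several can run at once on a multi-core CPU.
    result = subprocess.run([*cmd_prefix, path], capture_output=True, text=True)
    return result.stdout


def run_parallel(paths, workers=4, cmd_prefix=("exiftool",)):
    """Return {path: exiftool stdout for that path}, using `workers` threads."""
    paths = list(paths)  # iterated twice below, so materialize it
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outputs = list(pool.map(lambda p: run_one(p, cmd_prefix), paths))
    return dict(zip(paths, outputs))
```

Threads are sufficient here because the real work happens in the child processes. Batch mode, when it is possible, avoids the per-file startup cost altogether.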

For those interested, I have some code samples up on my web site: http://www.christian-etter.de/?tag=exiftool

As Bogdan has suggested, it would be great to have some kind of mechanism to keep ExifTool loaded and running after processing a file or batch. Although I doubt that turning it into a Windows service is the right way. Perhaps there is a way of keeping the program running (like the -k parameter) and listening for more input on stdin after processing the first file? That way the Unicode path problem would also be solved.

Christian

Phil Harvey

Quote from: Christian Etter on April 22, 2010, 09:10:49 AM
Perhaps there is a way of keeping the program running (like the -k parameter) and listening for more input on stdin after processing the first file? That way the Unicode path problem would also be solved.

Interesting idea.  I'll think about this.

- Phil

MOL

Quote from: Phil Harvey on April 23, 2010, 07:13:53 AM
Interesting idea.  I'll think about this.

Are you still thinking about it? Such a feature would be awesome.

Phil Harvey

I am now. ;)

I didn't come up with any ideas I was happy with.  My best idea was a new option which would cause exiftool to keep reading from a -@ input file (which could be stdin if you want), even after it hits the EOF.  This would involve a new option, called something like -stayOpen.  You would set -stayOpen true anytime before exiftool hits the EOF of an input -@ argfile, then exiftool would keep the file open and keep reading, executing a new command each time it read a -execute option, until it received a -stayOpen false option, after which it would close the argfile the next time it hits the EOF.  I know this would work on Mac and Linux (I have done similar things on these platforms before), but I think this idea is heavily dependent on the operating system, and from what I know about Windows I don't think it would work there. (Can you open and read a file as it is being written in Windows?)

But if anyone has any comment or ideas, I'm happy to listen to them.

- Phil

Edit: changed -keepOpen to -stayOpen

MOL

Seems that simultaneous reading and writing to a file is possible in Windows:

http://www.windowsdevelop.com/windows-forms-general/reading-writing-fromto-a-file-at-the-same-time-in-c-23151.shtml

Speedwise, wouldn't it help to keep the Perl interpreter itself in an idle state and prevent it from killing its own process in memory after a script has been executed? I don't know if that's possible without changing the interpreter code, though. I'm just wondering why it should be so time-consuming to load ExifTool again for each file if the interpreter stays in memory.

Phil Harvey

Even if Perl stays in memory it still must recompile exiftool when it runs.  So you must keep exiftool from exiting if you want to save the compile time.

I'll run some tests myself to see if my idea works in Windows.  Of course, this idea would only be useful for developers who want to use exiftool for their applications, and not useful at all for the average exiftool user.

Would the -stayOpen idea be useful for you?

- Phil

BogdanH

Hi,
As said, such things could be useful for those who use exiftool in their (Windows) applications, and in that case it would be great if exiftool's "response time" were much shorter than it is now. Considering that "regular" exiftool users wouldn't benefit at all... it's up to you to decide.

I must admit that I can only barely imagine the benefits of the -stayOpen option... If it would be something like "wait (in memory) until you're called" once exiftool -stayOpen true is executed, then this is what I wish, of course :)

Bogdan

Phil Harvey

Quote from: BogdanH on October 24, 2010, 10:07:48 AM
If it would be something like "wait (in memory) until you're called" once exiftool -stayOpen true is executed, then this is what I wish, of course :)

Yes, this is basically what would happen.  The steps would be:

1) Call exiftool -stayOpen true -@ ARGFILE, where ARGFILE is the name of an existing (possibly empty) argument file or - to pipe arguments from stdin.

2) Write exiftool command-line arguments to ARGFILE (one argument per line as usual).

3) Write "-execute\n" to ARGFILE to get exiftool to execute the command.

4) Repeat steps 2 and 3 as many times as you wish.

5) Write "-stayopen\nfalse\n" to ARGFILE when you are done.  This will cause exiftool to process any remaining arguments then exit normally.
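Steps 2 to 5 only involve writing plain text to ARGFILE. Here is a minimal Python sketch of that text. It uses -stay_open, the final option name that this thread settles on later. `command_text` and `STOP_TEXT` are hypothetical names. The lowercase "false" sidesteps the 8.36 case-sensitivity bug reported further down.

```python
def command_text(args):
    """Text for steps 2 and 3: one argument per line, then -execute.

    The trailing newline after -execute matters (a point also made later
    in this thread), so every line, including the last, is terminated."""
    for arg in args:
        if "\n" in arg:
            raise ValueError("an argument cannot contain a newline: %r" % arg)
    return "".join(arg + "\n" for arg in args) + "-execute\n"


# Text for step 5: exiftool processes any remaining arguments, then exits.
STOP_TEXT = "-stay_open\nfalse\n"
```

For example, `command_text(["-EXIF:All", "test.jpg"])` gives the three lines `-EXIF:All`, `test.jpg` and `-execute`, ready to be appended to ARGFILE or written to a pipe.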

I have tested the feasibility of this on Windows, and it does work as mol indicated.

- Phil

Phil Harvey

@Christian:  I realize this isn't exactly what you were requesting (which I believe was the ability to pipe multiple input image files themselves to exiftool), but there were other problems with implementing your suggestion.  The current idea does fix the startup lag but doesn't help with the filename problem.

- Phil

MOL

Quote from: Phil Harvey on October 24, 2010, 07:46:21 AM
Even if Perl stays in memory it still must recompile exiftool when it runs.

I was under the impression that starting the interpreter was the most time-consuming part of the process.


Quote from: Phil Harvey on October 24, 2010, 07:46:21 AM
Would the -stayOpen idea be useful for you?

Absolutely! I'm currently working on an application which makes heavy use of ExifTool and handles hundreds of files at a time. The -stayOpen option would be a godsend.

Phil Harvey

OK then.  This option will appear in the next release.

There will be a necessary but small delay before processing begins after each -execute argument is sent unless I can figure out how to get select to block properly until more data is available from the input file. But with my testing so far, this delay can be set to 1/100 sec with no appreciable drain on the CPU, so if 1/100 sec isn't too long to wait then this work-around should be acceptable.  (You can always send more arguments after -execute, before exiftool has finished, to avoid this delay.  But if you are doing this when extracting information you will need some way to tell that each command has finished, so I will write a "[ready]" message to stdout after processing is done for each command.)

- Phil

MOL

That's great, Phil. Thank you so much for your support!

BogdanH

Quote from: Phil Harvey on October 24, 2010, 12:32:31 PM
...
But if you are doing this when extracting information you will need some way to tell that each command has finished, so I will write a "[ready]" message to stdout after processing is done for each command.)
-I assume "[ready]" will only be sent, if ExifTool is in -stayOpen "mode".
Anyway, I'm very curious about how much impact that feature will have :)

Bogdan

Phil Harvey

Quote from: BogdanH on October 29, 2010, 12:46:52 PM
-I assume "[ready]" will only be sent, if ExifTool is in -stayOpen "mode".

Yes. Only if -stay_open is used.

I changed a few details: The option is now -stay_open (with an underline), and the ready message is now "{ready}" (with curly brackets).
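When a program reads exiftool's output, it can collect each command's output up to the "{ready}" line. Here is a minimal Python sketch that assumes the marker arrives on a line of its own. `read_until_ready` is a hypothetical helper, and the sample stream stands in for exiftool's stdout.

```python
import io


def read_until_ready(stream, marker="{ready}"):
    """Collect the output lines of one -execute'd command.

    Reads line by line from a text stream (e.g. exiftool's stdout pipe)
    and stops at the marker line, which is not included in the result.
    Raises EOFError if the stream ends first (e.g. exiftool exited)."""
    lines = []
    while True:
        line = stream.readline()
        if line == "":
            raise EOFError("stream closed before %s was seen" % marker)
        line = line.rstrip("\r\n")
        if line == marker:
            return lines
        lines.append(line)


sample = io.StringIO("File Name : a.jpg\n{ready}\nFile Name : b.jpg\n{ready}\n")
print(read_until_ready(sample))  # ['File Name : a.jpg']
print(read_until_ready(sample))  # ['File Name : b.jpg']
```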

Quote
Anyway, I'm very curious about how much impact that feature will have :)

I plan to release this tomorrow so you will see then how much it speeds things up.

- Phil

Phil Harvey

I've released 8.36 with the -stay_open option.  Already there are 2 bugs discovered by a user.  These will be fixed in the next release:

1) -stay_open false only works for all lowercase "false".

2) Comments in the ARGFILE aren't ignored as they should be when -stay_open is used.

- Phil

Edit:  And another thing I will change in the next release:  ExifTool currently prints the documentation page if nothing was done, but this doesn't make any sense if a -stay_open argfile was parsed (could be the application was quit before any exiftool commands were needed).  So I will disable the help page in this case.

etcetera

Testing -stay_open, looks mostly fine to me...

Notes on usage:

1. Fire up exiftool: (todo.arg is an empty but existing file)
exiftool -stay_open True -@ todo.arg

2. Write a bunch of commands to a second file called "todoparams.arg", then write this to todo.arg:
   -@
   todoparams.arg
   filename.ext
   -execute

(make sure there's a newline after -execute!)
This means I can keep writing new argfile content to todoparams.arg, and then just append those 4 lines to the original todo.arg.

3. At the end I add
   -stay_open
   false


to the todo.arg file, and I'm done...

Works for me. A wish could be a way to clear the initial file completely between new commands, such that it need not grow and/or leave a trace of all commands issued. If I clear the initial todo.arg file now, exiftool seems to lose track of the current EOF position in the file. I think the last checked file position should be reset to the file size after any checking/reading of new arguments... (but it's not a big issue!)

-etc



Phil Harvey

Quote from: etcetera on October 31, 2010, 07:38:11 AM
A wish could be a way to clear the initial file completely between new commands, such that it need not grow and/or leave a trace of all commands issued.


Thanks for this suggestion.  I did think about this already.  You can always avoid the file growing by using a pipe, but I understand that this may not be possible or convenient for some applications.

I couldn't figure out how to reset the file length because there are race conditions which would make it too easy to miss commands this way. (i.e. if you reset the file and write exactly the same commands again before exiftool reads anything, then it would have no way to tell that you actually wrote anything.)

But I have just come up with a mechanism to allow you to switch input ARGFILEs by writing the following to the currently open ARGFILE:

    -stay_open
    True
    -@
    NEWARGFILE


This will allow you to effectively reset the argfile by switching to a new one whenever you want.  ExifTool version 8.37 (just released) has this new feature.
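A program can make the switch by appending those four lines to the current argfile. Here is a minimal Python sketch that assumes ExifTool 8.37 or later. `switch_argfile` and the file names are hypothetical.

```python
def switch_argfile(current_path, new_path):
    """Point a running `exiftool -stay_open True -@ current_path` at a new,
    empty argfile (assumes ExifTool 8.37 or later, per this thread)."""
    # Create the new file first, so it already exists when exiftool reads
    # the switch lines below and tries to open it.
    open(new_path, "w").close()
    # Append (never overwrite) the switch lines to the file being read now.
    with open(current_path, "a") as f:
        f.write("-stay_open\nTrue\n-@\n" + new_path + "\n")
```

Later commands are then appended to `new_path`, so no single argfile has to keep growing.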

- Phil

etcetera


MOL

Is there a way to access an ARGFILE which has been created in memory from within EXIFTOOL? Just wondering.

Phil Harvey

Quote from: MOL on October 31, 2010, 05:58:18 PM
Is there a way to access an ARGFILE which has been created in memory from within EXIFTOOL? Just wondering.

There is no way to share memory directly, but going through a pipe avoids creating an ARGFILE:  Execute the exiftool command with -@ - and your standard output piped to exiftool's standard input, then print the contents of your memory to stdout.

- Phil

Mart

Should the -recurse option also be considered as having "no Perl startup time penalty"?

Phil Harvey

Quote from: Mart on November 09, 2010, 07:20:59 AM
Should the -recurse option also be considered as having "no Perl startup time penalty"?

The -recurse option processes all subdirectories as part of the same command, so there is no additional overhead for processing subdirectories (or additional files and/or directories specified on the same command line for that matter).

- Phil

Christian Etter

A M A Z I N G

To see this new option...

Looking forward to publishing another benchmark soon.

I already suspected that it would be difficult to pipe more than one file into stdin because of the lack of file separators, so the missing Unicode support in this case is something we have to accept.

Christian


jean

Hello

* I create an empty file named todo.arg
* I open an MS-DOS box and enter
     exiftool -stay_open True -@ todo.arg
* I open another MS-DOS box and create (in the same folder) a file named test.arg
* I enter in test.arg:
-EXIF:All
c:\exif\test.jpg
-execute

(there is a new line after -execute)
* I copy test.arg onto todo.arg
In the first MS-DOS box the EXIF info is displayed :-)
The last line is {ready}

* I reopen test.arg and change -EXIF:All to -IPTC:All
* I copy test.arg onto todo.arg
and... nothing happens. (The file does contain IPTC info.)

Please help

jean

I found a solution (perhaps it's the 'normal' behaviour, sorry):
I appended the second file after the first one.
That worked and todo.arg was read, but the first two characters were mangled.
e.g.:

-EXIF:All
c:\test\exif.jpg
-execute

OK

Then I add

-IPTC:All
c:\test\iptc.jpg
-execute

File not found: PTC:All


Phil Harvey

Instead of copying test.arg onto todo.arg, do this:

type test.arg >> todo.arg

This adds the lines to todo.arg.  Exiftool continues reading from the last position in the file, so it won't work if you replace lines that it has already read.

- Phil

jean

I can use 'type' from the command line, but I don't think it's possible from a program.

Phil Harvey

From a program, just open todo.arg in append mode.
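In Python, for example, append mode is `open(path, "a")`. Here is a minimal sketch of the programmatic equivalent of `type test.arg >> todo.arg`. `append_to_argfile` is a hypothetical helper. Closing the file at the end of the `with` block flushes the data so that exiftool can see it.

```python
def append_to_argfile(path, args):
    """Append one command to an argfile that `exiftool -stay_open` is reading.

    Append mode leaves the existing lines (which exiftool may already have
    read) untouched and adds the new lines after them, which is what
    exiftool expects, since it continues from its last read position."""
    text = "".join(arg + "\n" for arg in args) + "-execute\n"
    with open(path, "a") as f:
        f.write(text)
    # Leaving the with-block closes the file, which also flushes it.
```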

- Phil

jean


jean

Is it possible to communicate with exiftool.exe using pipes?
I create a process for exiftool, then I create two pipes for reading from and writing to exiftool.
The read pipe uses stdout.
I can read the errors through the pipe, but not the info (EXIF and so on).
Where is the info returned by exiftool supposed to go???

Phil Harvey

If you use "-@ -", you should be able to handle all I/O with 3 pipes:

1) exiftool receives command-line arguments from STDIN

2) exiftool writes tag information to STDOUT

3) exiftool writes error messages to STDERR

- Phil

jean

Are you sure that stdin can be redirected?

Phil Harvey

Excerpt from a Bourne shell session to prove this works:

> ls -l a.jpg b.jpg
ls: b.jpg: No such file or directory
-rwxr-xr-x   1 phil  phil  281767 Nov 17 12:09 a.jpg

> cat a.arg
a.jpg
b.jpg
-filename

> cat a.arg | exiftool -@ - 1>std.out 2>err.out

> cat std.out
======== a.jpg
File Name                       : a.jpg
    1 image files read
    1 files could not be read

> cat err.out
File not found: b.jpg


- Phil

Edit: To make things easier to see, I could have redirected stdin directly from a file rather than piping the output of "cat".  The effect of this command is the same:

exiftool -@ - <a.arg 1>std.out 2>err.out

jean

I meant redirecting stdin under Windows: creating a process for exiftool.exe and giving it pipes for stdin, stdout and stderr.
I can get stdout if I create the process with parameters such as "-EXIF:All test.jpg".
In that case I don't use stdin.

But if I just create the process with "-@ -", I can't pass parameters through the pipe that redirects stdin.

Phil Harvey

I'm not a Windows programmer, and I don't even know what programming language you are using, but redirecting stdin is very common, and it should be possible from any platform/language.

- Phil

jean

Yes, I've been using redirection for years with many programs, but I can't make it work with ExifTool.
(I use C.)

Phil Harvey

Sorry for the delay, but I had to wait until I got home to try this in windows:

exiftool -@ - <a.arg 1>std.out 2>err.out

This exact command works perfectly in the Windows cmd shell.  Undoubtedly this shell is written in C (or C++), so there must be some way for you to do this from C in Windows.

I'm thinking that maybe you are having problems because you aren't flushing the file piped to exiftool's stdin after writing?  Any write buffering by your program could put a wrench in the whole works (as noted in my -stay_open documentation).

- Phil

jean

I'm going to investigate a little more, i will post my results  :)

A small bug: -lang does not seem to work:

exiftools.exe -lang it -EXIF:All test.jpg

returns:

Invalid or unsupported language 'it'
then the list of Available languages (with it)

Phil Harvey

Quote from: jean on November 19, 2010, 12:01:03 AM
Invalid or unsupported language 'it'
then the list of Available languages (with it)

It sounds like your exiftool installation is bad.  My guess is that lib/Image/ExifTool/Lang/it.pm is missing or corrupted.  Try re-installing.

- Phil

jean

I recompiled it using pp.
I added it.pm to the file pp_build_exe.args
When exiftool is launched it creates its temp folder, and it.pm is copied into this temp folder twice: once under
inc\lib\image\exiftool\lang and once under inc\lib

jean

I tried a lot of different ways, and none of them works ???
Can you explain how you add a single line to pp_build_exe.args?

Phil Harvey

Attached is my current copy of pp_build_exe.args

- Phil

jean

Thank you Phil, I included the 'it' language ;D