exiftool performance benchmark

Started by Phil Harvey, April 16, 2010, 11:25:53 AM


Phil Harvey

Christian Etter has done some timing tests with the Windows exiftool.exe application, and has determined that for his set of test images the startup overhead of running exiftool on each image separately accounts for 98.4% of the execution time.  This means that you can get a speed-up factor of 60x by running exiftool in batch mode on a large set of images, rather than executing it separately for each image.

I have always advocated using the exiftool batch-mode capabilities whenever possible, but this gives some concrete numbers to demonstrate why.
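
For example, instead of running a separate command for each image (the file names here are just placeholders):

exiftool -common first.jpg
exiftool -common second.jpg
exiftool -common third.jpg

a single batch command pays the startup cost only once:

exiftool -common DIR

...where DIR is the name of a directory/folder containing the images.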

Read Christian's blog entry for all the details.

- Phil

BogdanH

Thank you for posting this valuable info. Even though I assumed something like this, I also thought that I simply didn't know how to do it better from a GUI.
I really don't know if it's possible (it probably is) or how to do it, but it would be great if Perl could run as a "service" inside Windows... the speed increase would be tremendous. Just an idea.

If a GUI user is reading this: when multiple files are selected inside the GUI, ExifTool is called only once whenever possible (that is, in batch mode).

Bogdan

MOL

Well, the most obvious solution would be to translate the Perl code into C/C++ and create a nice little DLL... ;)

Christian Etter

The idea behind the testing was to give developers some guidelines with regard to improving performance.

As Phil has mentioned, the overall conclusion was that the load/parse operation is strongly CPU-bound, while the actual extraction of information consumes only a small amount of processing time. Hence the only way to drastically increase speed is to use batch processing. The -fast and -fast2 options yield smaller improvements - and seem to be geared more towards accessing files over a network connection or slow storage system.
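
(For reference, these options are used like any other on the command line, e.g. something like exiftool -fast2 -common DIR when reading a folder of images over a slow connection. That is only an illustration; DIR is a placeholder for the folder name and -common is just one possible tag selection.)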

If you really have to process files one by one and have multiple CPU cores available, consider using several threads; on my system (8 virtual cores) this gave a 3x speed increase.
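
To sketch the idea of using several workers in parallel (a rough Mac/Linux Perl sketch using separate processes rather than threads, not the code from my samples; the worker count of 4 and the -common tag selection are just assumptions):

#!/usr/bin/perl
# Rough sketch: split the image list across several exiftool processes so
# that more than one CPU core is used when a single batch isn't possible.
use strict;
use warnings;

my $procs = 4;                  # assumed number of workers; adjust to your CPU
my @files = @ARGV;              # image files given on the command line
my @buckets;
push @{ $buckets[$_ % $procs] }, $files[$_] for 0 .. $#files;

my @pids;
for my $list (grep { $_ && @$_ } @buckets) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {            # child: run one exiftool batch for this bucket
        exec 'exiftool', '-common', @$list or die "exec failed: $!";
    }
    push @pids, $pid;
}
waitpid($_, 0) for @pids;       # wait for all workers to finish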

For those interested, I have some code samples up on my web site: http://www.christian-etter.de/?tag=exiftool

As Bogdan has suggested, it would be great to have some kind of mechanism to keep ExifTool loaded and running after processing a file or batch, although I doubt that turning it into a Windows service is the right way. Perhaps there is a way of keeping the program running (like the -k parameter) and listening for more input on stdin after processing the first file? That way the Unicode path problem would also be solved.

Christian

Phil Harvey

Quote from: Christian Etter on April 22, 2010, 09:10:49 AM
Perhaps there is a way of keeping the program running (like the -k parameter) and listening for more input on stdin after processing the first file? That way the Unicode path problem would also be solved.

Interesting idea.  I'll think about this.

- Phil

MOL

Quote from: Phil Harvey on April 23, 2010, 07:13:53 AM
Interesting idea.  I'll think about this.

Are you still thinking about it? Such a feature would be awesome.

Phil Harvey

I am now. ;)

I didn't come up with any ideas I was happy with.  My best idea was a new option which would cause exiftool to keep reading from a -@ input file (which could be stdin if you want), even after it hits the EOF.  This would involve a new option, called something like -stayOpen.  You would set -stayOpen true any time before exiftool hits the EOF of an input -@ argfile, then exiftool would keep the file open and keep reading, executing a new command each time it reads a -execute option, until it receives a -stayOpen false option, after which it would close the argfile the next time it hits the EOF.  I know this would work on Mac and Linux (I have done similar things on these platforms before), but I think this idea is heavily dependent on the operating system, and from what I know about Windows I don't think it would work there. (Can you open and read a file as it is being written in Windows?)
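
To make that concrete, by the time everything is done the argfile might contain something like this (one argument per line as usual; the file names are placeholders, and the option name could still change):

-stayOpen
true
-common
first.jpg
-execute
-common
second.jpg
-execute
-stayOpen
false

Each -execute would run one command using the arguments accumulated since the previous one.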

But if anyone has any comment or ideas, I'm happy to listen to them.

- Phil

Edit: changed -keepOpen to -stayOpen

MOL

Seems that simultaneous reading and writing to a file is possible in Windows:

http://www.windowsdevelop.com/windows-forms-general/reading-writing-fromto-a-file-at-the-same-time-in-c-23151.shtml

Speed-wise, wouldn't it help to keep the Perl interpreter itself idle in memory and prevent it from killing its own process after a script has been executed? I don't know if that's possible without changing the interpreter code, though. I'm just wondering why it should be so time-consuming to load ExifTool again for each file if the interpreter stays in memory.

Phil Harvey

Even if Perl stays in memory it still must recompile exiftool when it runs.  So you must keep exiftool from exiting if you want to save the compile time.
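
For applications written in Perl this problem doesn't really exist, because the Image::ExifTool module can be loaded once and reused for any number of files; a minimal sketch:

#!/usr/bin/perl
# Minimal sketch: compile Image::ExifTool once, then reuse the same object
# for every file passed on the command line.
use strict;
use warnings;
use Image::ExifTool;

my $et = Image::ExifTool->new;            # compiled/loaded only once
for my $file (@ARGV) {
    my $info = $et->ImageInfo($file);     # extract metadata from one file
    printf "%s: %s\n", $file, $info->{Model} || '(no Model tag)';
}

The point of a -stayOpen option would be to give non-Perl applications the same sort of ability.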

I'll run some tests myself to see if my idea works in Windows.  Of course, this idea would only be useful for developers who want to use exiftool for their applications, and not useful at all for the average exiftool user.

Would the -stayOpen idea be useful for you?

- Phil

BogdanH

Hi,
As said, such things could be useful for those who use exiftool in their (Windows) applications, and in that case it would be great if exiftool's "response time" were much shorter than it is now. Considering that "regular" exiftool users wouldn't benefit at all... it's up to you to decide.

I must admit that I can only barely imagine the benefits of a -stayOpen option... If it would be something like "wait (in memory) until you're called" once exiftool -stayOpen true is executed, then this is what I wish, of course :)

Bogdan

Phil Harvey

Quote from: BogdanH on October 24, 2010, 10:07:48 AM
If it would be something like "wait (in memory) until you're called" once exiftool -stayOpen true is executed, then this is what I wish, of course :)

Yes, this is basically what would happen.  The steps would be:

1) Call exiftool -stayOpen true -@ ARGFILE, where ARGFILE is the name of an existing (possibly empty) argument file or - to pipe arguments from stdin.

2) Write exiftool command-line arguments to ARGFILE (one argument per line as usual).

3) Write "-execute\n" to ARGFILE to get exiftool to execute the command.

4) Repeat steps 2 and 3 as many times as you wish.

5) Write "-stayOpen\nfalse\n" to ARGFILE when you are done.  This will cause exiftool to process any remaining arguments and then exit normally.

I have tested the feasibility of this on Windows, and it does work as MOL indicated.
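
In Perl, a driver for these steps might look roughly like this (a sketch only, written for Mac/Linux Perl, piping the arguments to exiftool's stdin as the "-" argfile; the -common tag selection is just an example, and the option names follow the proposal above):

#!/usr/bin/perl
# Sketch of steps 1-5 above, feeding arguments to exiftool through a pipe.
use strict;
use warnings;

open(my $et, '|-', 'exiftool', '-stayOpen', 'true', '-@', '-')   # step 1
    or die "cannot start exiftool: $!";
select((select($et), $| = 1)[0]);   # unbuffer so each command is sent at once

for my $file (@ARGV) {
    print $et "-common\n$file\n";   # step 2: arguments, one per line
    print $et "-execute\n";         # step 3: run the command
}                                   # step 4: the loop repeats steps 2 and 3
print $et "-stayOpen\nfalse\n";     # step 5: process anything left and exit
close $et;                          # waits for exiftool to finish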

- Phil

Phil Harvey

@Christian:  I realize this isn't exactly what you were requesting (which I believe was the ability to pipe multiple input image files themselves to exiftool), but there were other problems with implementing your suggestion.  The current idea does fix the startup lag but doesn't help with the filename problem.

- Phil

MOL

Quote from: Phil Harvey on October 24, 2010, 07:46:21 AM
Even if Perl stays in memory it still must recompile exiftool when it runs.

I was under the impression that starting the interpreter was the most time-consuming part of the process.


Quote from: Phil Harvey on October 24, 2010, 07:46:21 AM
Would the -stayOpen idea be useful for you?

Absolutely! I'm currently working on an application which makes heavy use of ExifTool and handles hundreds of files at a time. The -stayOpen option would be a godsend.

Phil Harvey

OK then.  This option will appear in the next release.

There will be a necessary but small delay before processing begins after each -execute argument is sent, unless I can figure out how to get select() to block properly until more data is available from the input file. In my testing so far this delay can be set to 1/100 sec with no appreciable drain on the CPU, so if 1/100 sec isn't too long to wait then this work-around should be acceptable.  (You can always send more arguments after -execute, before exiftool has finished, to avoid this delay.  But if you do this when extracting information you will need some way to tell when each command has finished, so I will write a "[ready]" message to stdout after processing is done for each command.)
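
Building on the earlier sketch, a caller that waits for that marker between commands might look something like this (again only a rough sketch, assuming the "[ready]" text and -stayOpen behaviour described above):

#!/usr/bin/perl
# Sketch: send one command at a time and wait for the "[ready]" marker on
# exiftool's stdout before sending the next.
use strict;
use warnings;
use IPC::Open2;

my $pid = open2(my $out, my $in, 'exiftool', '-stayOpen', 'true', '-@', '-');
select((select($in), $| = 1)[0]);   # unbuffer writes to exiftool

for my $file (@ARGV) {
    print $in "-common\n$file\n-execute\n";
    while (defined(my $line = <$out>)) {
        last if $line =~ /^\[ready\]/;   # this command has finished
        print $line;                     # pass the command's output through
    }
}
print $in "-stayOpen\nfalse\n";
close $in;
waitpid($pid, 0);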

- Phil

MOL

That's great, Phil. Thank you so much for your support!