Christian Etter has done some timing tests with the Windows exiftool.exe application, and has determined that for his set of test images the startup overhead of running exiftool on each image separately accounts for 98.4% of the execution time. This means that you can get a speed-up factor of 60x by running exiftool in batch mode on a large set of images, rather than executing it separately for each image.
I have always advocated using the exiftool batch-mode capabilities whenever possible, but this gives some concrete numbers to demonstrate why.
Read Christian's blog entry (http://www.christian-etter.de/?p=458) for all the details.
- Phil
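As a rough check of the numbers above: if startup accounts for a fraction f of the time spent on each separate run, and a large batch pays that cost only once, the best-case speed-up approaches 1/(1-f). The sketch below assumes that simple model; it is not Christian's benchmark code, and the function name is only illustrative.

```python
def max_batch_speedup(startup_fraction: float) -> float:
    """Upper bound on the speed-up when startup is paid once instead of per file.

    Assumed model: a separate run costs (startup + work) per file, while a
    large batch costs roughly (work) per file because startup is amortized.
    """
    if not 0.0 <= startup_fraction < 1.0:
        raise ValueError("startup_fraction must be in [0, 1)")
    return 1.0 / (1.0 - startup_fraction)


# 98.4% startup overhead is the figure quoted from the timing tests.
print(round(max_batch_speedup(0.984), 1))  # 62.5, in line with "60x"
```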
Thank you for posting this valuable info. Even though I had assumed something like this, I also thought that I simply don't know how to do it better in GUI.
I really don't know if it's possible (probably is) or how to do it, but it would be great if Perl would run as a "service" inside Windows... speed increase would be tremendous -just an idea.
If some GUI user is reading this: when multiple files are selected inside GUI, then (whenever possible) Exiftool is only called once (meaning: in batch mode).
Bogdan
Well, the most obvious solution would be to translate the Perl code into C/C++ and create a nice little DLL... ;)
The idea behind the testing was to give developers some guidelines with regard to improving performance.
As Phil has mentioned, the overall conclusion was that the load/parse operation is strongly CPU-bound, while the actual extraction of information consumes only a small amount of processing time. Hence the only way to drastically increase speed is to use batch processing. The -fast and -fast2 options yield smaller improvements - and seem to be geared more towards accessing files over a network connection or slow storage system.
If you really have to process files one by one and have multiple cpu cores available, consider using several threads, which led to a 3x increase on my system (8 virtual cores).
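A minimal sketch of the one-process-per-file approach spread over several threads, assuming an exiftool executable is on the PATH. The helper names and the default of 8 workers (matching the 8 virtual cores mentioned above) are illustrative only, and the 3x figure is a measurement from one system, not something this code guarantees.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def run_exiftool(path: str) -> str:
    """Run one exiftool process for one file and return its stdout.

    Assumes an 'exiftool' executable can be found on the PATH.
    """
    result = subprocess.run(["exiftool", path], capture_output=True, text=True)
    return result.stdout


def process_in_parallel(paths: Iterable[str],
                        worker: Callable[[str], str] = run_exiftool,
                        max_workers: int = 8) -> List[str]:
    """Process files one by one, but several at a time.

    Threads are sufficient here because each worker mostly waits on a child
    process. Results are returned in the same order as the input paths.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, paths))
```

Every file still pays the full startup cost here; the threads only overlap those costs across cores, which is why batch mode remains the larger win.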
For those interested, I have some code samples up on my web site: http://www.christian-etter.de/?tag=exiftool
As Bogdan has suggested, it would be great to have some kind of mechanism to keep ExifTool loaded and running after processing a file or batch. Although I doubt that turning it into a Windows service is the right way. Perhaps there is a way of keeping the program running (like the -k parameter) and listening for more input on stdin after processing the first file? That way the Unicode path problem would also be solved.
Christian
Quote from: Christian Etter on April 22, 2010, 09:10:49 AM
Perhaps there is a way of keeping the program running (like the -k parameter) and listening for more input on stdin after processing the first file? That way the Unicode path problem would also be solved.
Interesting idea. I'll think about this.
- Phil
Quote from: Phil Harvey on April 23, 2010, 07:13:53 AM
Interesting idea. I'll think about this.
Are you still thinking about it? Such a feature would be awesome.
I am now. ;)
I didn't come up with any ideas I was happy with. My best idea was a new option which would cause exiftool to keep reading from a -@ input file (which could be stdin if you want), even after it hits the EOF. This would involve a new option, called something like -stayOpen. You would set -stayOpen true anytime before exiftool hits the EOF of an input -@ argfile, then exiftool would keep the file open and keep reading, executing a new command each time it read a -execute option, until it received a -stayOpen false option, after which it would close the argfile the next time it hits the EOF. I know this would work on Mac and Linux (I have done similar things on these platforms before), but I think this idea is heavily dependent on the operating system, and from what I know about Windows I don't think it would work there. (Can you open and read a file as it is being written in Windows?)
But if anyone has any comment or ideas, I'm happy to listen to them.
- Phil
Edit: changed -keepOpen to -stayOpen
Seems that simultaneous reading and writing to a file is possible in Windows:
http://www.windowsdevelop.com/windows-forms-general/reading-writing-fromto-a-file-at-the-same-time-in-c-23151.shtml
Speedwise, wouldn't it help to keep the PERL interpreter itself in an idle state and prevent it from killing its own process in memory after a script has been executed? Don't know if that's possible without changing the interpreter code, though. Just wondering why it should be so time-consuming to load EXIFTOOL again for each file if the interpreter stays in memory.
Even if Perl stays in memory it still must recompile exiftool when it runs. So you must keep exiftool from exiting if you want to save the compile time.
I'll run some tests myself to see if my idea works in Windows. Of course, this idea would only be useful for developers who want to use exiftool for their applications, and not useful at all for the average exiftool user.
Would the -stayOpen idea be useful for you?
- Phil
Hi,
As said, such things could be useful for those who use exiftool in their (Windows) applications, and in this case it would be great if exiftool's "response time" were much shorter than it is now. Considering that "regular" exiftool users wouldn't benefit at all... it's up to you to decide.
I must admit that I can only barely imagine the benefits of the -stayOpen option... If it would be something like "wait (in memory) until you're called" once exiftool -stayOpen true is executed, then this is what I wish, of course :)
Bogdan
Quote from: BogdanH on October 24, 2010, 10:07:48 AM
If it would be something like "wait (in memory) until you're called" once exiftool -stayOpen true is executed, then this is what I wish, of course :)
Yes, this is basically what would happen. The steps would be:
1) Call exiftool -stayOpen true -@ ARGFILE, where ARGFILE is the name of an existing (possibly empty) argument file or - to pipe arguments from stdin.
2) Write exiftool command-line arguments to ARGFILE (one argument per line as usual).
3) Write "-execute\n" to ARGFILE to get exiftool to execute the command.
4) Repeat steps 2 and 3 as many times as you wish.
5) Write "-stayopen\nfalse\n" to ARGFILE when you are done. This will cause exiftool to process any remaining arguments then exit normally.
I have tested the feasibility of this on Windows, and it does work as mol indicated.
- Phil
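Steps 2, 3 and 5 above amount to producing a particular text layout. The sketch below builds that text, using the -stay_open spelling under which the option was eventually released (see later in this thread) and a lowercase "false". The function names are hypothetical helpers, not part of exiftool.

```python
from typing import Sequence


def format_command(args: Sequence[str]) -> str:
    """Steps 2 and 3: one argument per line, then -execute to run the command."""
    for arg in args:
        if "\n" in arg or "\r" in arg:
            raise ValueError("an argfile argument cannot contain a line break")
    return "".join(f"{arg}\n" for arg in args) + "-execute\n"


def format_shutdown() -> str:
    """Step 5: tell exiftool to stop waiting for more input and exit."""
    return "-stay_open\nfalse\n"


# Text for one command followed by the shutdown request.
text = format_command(["-EXIF:All", "a.jpg"]) + format_shutdown()
print(text, end="")
```

The resulting text would be appended to ARGFILE, or written to the pipe when "-" is used as the ARGFILE.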
@Christian: I realize this isn't exactly what you were requesting (which I believe was the ability to pipe multiple input image files themselves to exiftool), but there were other problems with implementing your suggestion. The current idea does fix the startup lag but doesn't help with the filename problem.
- Phil
Quote from: Phil Harvey on October 24, 2010, 07:46:21 AM
Even if Perl stays in memory it still must recompile exiftool when it runs.
I was under the impression that starting the interpreter was the most time-consuming part of the process.
Quote from: Phil Harvey on October 24, 2010, 07:46:21 AM
Would the -stayOpen idea be useful for you?
Absolutely! I'm currently working on an application which makes heavy use of EXIFTOOL and handles hundreds of files at a time. The -stayopen option would be a godsend.
OK then. This option will appear in the next release.
There will be a necessary but small delay before processing begins after each -execute argument is sent unless I can figure out how to get select to block properly until more data is available from the input file. But with my testing so far, this delay can be set to 1/100 sec with no appreciable drain on the CPU, so if 1/100 sec isn't too long to wait then this work-around should be acceptable. (You can always send more arguments after -execute, before exiftool has finished, to avoid this delay. But if you are doing this when extracting information you will need some way to tell that each command has finished, so I will write a "[ready]" message to stdout after processing is done for each command.)
- Phil
That's great, Phil. Thank you so much for your support!
Quote from: Phil Harvey on October 24, 2010, 12:32:31 PM
...
But if you are doing this when extracting information you will need some way to tell that each command has finished, so I will write a "[ready]" message to stdout after processing is done for each command.)
-I assume "[ready]" will only be sent, if ExifTool is in -stayOpen "mode".
Anyway, I'm very curious on how much impact that feature will have :)
Bogdan
Quote from: BogdanH on October 29, 2010, 12:46:52 PM
-I assume "[ready]" will only be sent, if ExifTool is in -stayOpen "mode".
Yes. Only if -stay_open is used.
I changed a few details: The option is now -stay_open (with an underline), and the ready message is now "{ready}" (with curly brackets).
Quote
Anyway, I'm very curious on how much impact that feature will have :)
I plan to release this tomorrow so you will see then how much it speeds things up.
- Phil
I've released 8.36 with the -stay_open option. Already there are 2 bugs discovered by a user. These will be fixed in the next release:
1) -stay_open false only works for all lowercase "false".
2) Comments in the ARGFILE aren't ignored as they should be when -stay_open is used.
- Phil
Edit: And another thing I will change in the next release: ExifTool currently prints the documentation page if nothing was done, but this doesn't make any sense if a -stay_open argfile was parsed (could be the application was quit before any exiftool commands were needed). So I will disable the help page in this case.
Testing -stay_open, looks mostly fine to me...
Notes on usage:
1. Fire up exiftool: (todo.arg is an empty but existing file)
exiftool -stay_open True -@ todo.arg
2. writing a bunch of commands to a 2nd file called "todoparams.arg", and then writing this to the todo.arg:
-@
todoparams.arg
filename.ext
-execute
(make sure there's a <nl> after -execute!)
This means I can keep building new arg-file content to todoparams.arg, and then just adding those 4 lines to the original todo.arg
3. In the end I then add
-stay_open
false
to the todo.arg file, and I'm done...
Works for me. A wish could be a way to clear the initial file completely between new commands, such that it need not grow and/or leave a trace of all commands issued. If I clear the initial todo.arg file now, it seems to lose track of the current EOF-position of the file. I think the last checked file position should be reset to filesize after any checking/reading of new arguments... (but not a big issue!)
-etc
Quote from: etcetera on October 31, 2010, 07:38:11 AM
A wish could be a way to clear the initial file completely between new commands, such that it need not grow and/or leave a trace of all commands issued.
Thanks for this suggestion. I did think about this already. You can always avoid the file growing by using a pipe, but I understand that this may not be possible or convenient for some applications.
I couldn't figure out how to reset the file length because there are race conditions which would make it too easy to miss commands this way. (ie. if you reset the file and write exactly the same commands again before exiftool reads anything, then it would have no way to tell that you actually wrote anything.)
But I have just come up with a mechanism to allow you to switch input ARGFILEs by writing the following to the currently open ARGFILE:
-stay_open
True
-@
NEWARGFILE
This will allow you to effectively reset the argfile by switching to a new one whenever you want. ExifTool version 8.37 (just released) has this new feature.
- Phil
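A sketch of how a program might perform that switch: append the four lines to the currently open argfile and carry on writing to the new one. The helper name is hypothetical, and creating the new file beforehand is an assumption based on the earlier note that the initial ARGFILE must already exist.

```python
import os
import tempfile


def switch_argfile(current_argfile: str, new_argfile: str) -> None:
    """Tell a running exiftool to continue reading from new_argfile."""
    # Assumption: like the initial ARGFILE, the new one should exist
    # (it may be empty) before exiftool is told to open it.
    open(new_argfile, "a").close()
    with open(current_argfile, "a") as f:
        f.write(f"-stay_open\nTrue\n-@\n{new_argfile}\n")


# Demonstration on throwaway files (no exiftool involved here).
demo_dir = tempfile.mkdtemp()
old_file = os.path.join(demo_dir, "todo.arg")
new_file = os.path.join(demo_dir, "todo2.arg")
open(old_file, "a").close()
switch_argfile(old_file, new_file)
with open(old_file) as f:
    old_content = f.read()
```

After the switch, new commands go to the new file, and the old one can be deleted once exiftool has moved on.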
yup, works!
thx
-etc
Is there a way to access an ARGFILE which has been created in memory from within EXIFTOOL? Just wondering.
Quote from: MOL on October 31, 2010, 05:58:18 PM
Is there a way to access an ARGFILE which has been created in memory from within EXIFTOOL? Just wondering.
There is no way to share memory directly, but going through a pipe avoids creating an ARGFILE: Execute the exiftool command with -@ - and your standard output piped to exiftool's standard input, then print the contents of your memory to stdout.
- Phil
Should the -recurse option also be considered as having "no Perl startup time penalty"?
Quote from: Mart on November 09, 2010, 07:20:59 AM
Should the -recurse option also be considered as having "no Perl startup time penalty"?
The -recurse option processes all subdirectories as part of the same command, so there is no additional overhead for processing subdirectories (or additional files and/or directories specified on the same command line for that matter).
- Phil
A M A Z I N G
To see this new option...
Looking forward to publishing another benchmark soon.
I already suspected that it would be difficult to pipe more than one file into stdin due to a lack of file separators, so the lack of Unicode support in this case is something we need to accept.
Christian
Hello
* I create an empty file named todo.arg
* I open an MS-DOS box and enter
exiftool -stay_open True -@ todo.arg
* I open another MS-DOS box and create (in the same folder) a file named test.arg
* I enter in test.arg:
-EXIF:All
c:\exif\test.jpg
-execute
(there is a new line after -execute)
* I copy test.arg onto todo.arg
In the first MS-DOS box the EXIF info is displayed :-)
The last line is {ready}
* I reopen test.arg and replace -EXIF:All with -IPTC:All
* I copy test.arg onto todo.arg
and... nothing happens. (IPTC info exists in the file)
Please help
I found a solution (it's perhaps the 'normal' behaviour, sorry)
I added the second file after the first one.
It worked: todo.arg was read, but the first two characters were mangled.
eg:
-EXIF:All
c:\test\exif.jpg
-execute
OK
then i add
-IPTC:All
c:\test\iptc.jpg
-execute
File not found: PTC:All
Instead of copying test.arg onto todo.arg, do this:
type test.arg >> todo.arg
This adds the lines to todo.arg. Exiftool continues reading from the last position in the file, so it won't work if you replace lines that it has already read.
- Phil
I can use 'type' from a command line, but I don't think it's possible from a program.
From a program, just open todo.arg in append mode.
- Phil
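A minimal sketch of appending from a program, assuming Python as the calling language. The point is the "a" (append) mode: each command is added after the lines exiftool has already read, rather than replacing them. The function name is hypothetical, and the file paths are the ones from the example above.

```python
import os
import tempfile
from typing import Sequence


def append_command(argfile: str, args: Sequence[str]) -> None:
    """Append one command to the argfile without touching earlier lines."""
    with open(argfile, "a") as f:  # "a" appends; "w" would truncate the file
        for arg in args:
            f.write(arg + "\n")
        f.write("-execute\n")
        f.flush()  # push the lines out of this program's write buffer


# Demonstration on a throwaway file (no exiftool involved here).
demo_dir = tempfile.mkdtemp()
demo_argfile = os.path.join(demo_dir, "todo.arg")
append_command(demo_argfile, ["-EXIF:All", r"c:\test\exif.jpg"])
append_command(demo_argfile, ["-IPTC:All", r"c:\test\iptc.jpg"])
with open(demo_argfile) as f:
    content = f.read()
```

After the two calls the file holds both commands in order, which is what exiftool needs since it continues reading from its last position.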
It's a good idea, thank you Phil :)
Is it possible to communicate with exiftool.exe using pipes?
I create a process with exiftool, then I create two pipes for reading from and writing to exiftool.
The read pipe uses stdout.
I can read the errors with the pipe but not the info (EXIF and so on).
Where is the info returned by exiftool supposed to go?
If you use "-@ -", you should be able to handle all I/O with 3 pipes:
1) exiftool receives command-line arguments from STDIN
2) exiftool writes tag information to STDOUT
3) exiftool writes error messages to STDERR
- Phil
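The three-pipe arrangement can be sketched as follows, here combined with -stay_open so that one process serves many commands. This assumes Python as the calling language, an exiftool executable on the PATH, and that the "{ready}" message arrives on a line of its own. All function names are hypothetical.

```python
import subprocess
from typing import Iterable, List, Sequence

READY_MARKER = "{ready}"  # written to stdout after each -execute


def read_until_ready(lines: Iterable[str]) -> List[str]:
    """Collect output lines up to (not including) the ready marker."""
    collected: List[str] = []
    for line in lines:
        if line.strip() == READY_MARKER:
            return collected
        collected.append(line.rstrip("\r\n"))
    raise EOFError("output ended before the ready marker was seen")


def start_exiftool(exe: str = "exiftool") -> subprocess.Popen:
    """Start one long-lived exiftool that reads its arguments from stdin."""
    return subprocess.Popen(
        [exe, "-stay_open", "True", "-@", "-"],
        stdin=subprocess.PIPE,   # 1) command-line arguments
        stdout=subprocess.PIPE,  # 2) tag information
        stderr=subprocess.PIPE,  # 3) error messages
        text=True,
    )


def run_command(proc: subprocess.Popen, args: Sequence[str]) -> List[str]:
    """Send one command and return the lines it wrote to stdout."""
    proc.stdin.write("".join(f"{arg}\n" for arg in args) + "-execute\n")
    proc.stdin.flush()  # unflushed writes would leave exiftool waiting
    return read_until_ready(proc.stdout)


def stop_exiftool(proc: subprocess.Popen) -> str:
    """Ask exiftool to exit and return whatever it wrote to stderr."""
    _, errors = proc.communicate("-stay_open\nfalse\n")
    return errors
```

A caller would use start_exiftool() once, run_command(proc, ["-EXIF:All", "test.jpg"]) as often as needed, and stop_exiftool(proc) at the end. Note that stderr is only drained at shutdown in this sketch; a program expecting many error messages should read that pipe concurrently so exiftool cannot block on a full pipe.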
Are you sure that stdin can be redirected?
Excerpt from a Bourne shell session to prove this works:
> ls -l a.jpg b.jpg
ls: b.jpg: No such file or directory
-rwxr-xr-x 1 phil phil 281767 Nov 17 12:09 a.jpg
> cat a.arg
a.jpg
b.jpg
-filename
> cat a.arg | exiftool -@ - 1>std.out 2>err.out
> cat std.out
======== a.jpg
File Name : a.jpg
1 image files read
1 files could not be read
> cat err.out
File not found: b.jpg
- Phil
Edit: To make things easier to see, I could have redirected stdin directly from a file rather than piping the output of "cat". The effect of this command is the same:
exiftool -@ - <a.arg 1>std.out 2>err.out
I meant redirecting stdin under Windows, creating a process for exiftools.exe and giving it pipes for stdin, stdout and stderr.
I can get stdout if I create the process giving parameters such as "-EXIF:All test.jpg".
In that case I don't use stdin.
But if I just create the process with "-@ -", I can't give parameters using the pipe redirecting stdin.
I'm not a Windows programmer, and I don't even know what programming language you are using, but redirecting stdin is very common, and it should be possible from any platform/language.
- Phil
Yes, I have used redirection for years with many programs, but I can't make it work with ExifTool.
(I use C)
Sorry for the delay, but I had to wait until I got home to try this in windows:
exiftool -@ - <a.arg 1>std.out 2>err.out
This exact command works perfectly in the Windows cmd shell. Undoubtedly this shell is written in C (or C++), so there must be some way for you to do this from C in Windows.
I'm thinking that maybe you are having problems because you aren't flushing the file piped to exiftool's stdin after writing? Any write buffering by your program could put a wrench in the whole works (as noted in my -stay_open documentation).
- Phil
I'm going to investigate a little more; I will post my results :)
A small bug: -lang does not seem to work:
exiftools.exe -lang it -EXIF:All test.jpg
returns:
Invalid or unsupported language 'it'
then the list of Available languages (with it)
Quote from: jean on November 19, 2010, 12:01:03 AM
Invalid or unsupported language 'it'
then the list of Available languages (with it)
It sounds like your exiftool installation is bad. My guess is that lib/Image/ExifTool/Lang/it.pm is missing or corrupted. Try re-installing.
- Phil
I recompiled it using pp.
I added it.pm to the file pp_build_exe.args
When exiftool is launched it creates its temp folder, and in this temp folder it.pm is copied twice, once under inc\lib\image\exiftool\lang and once under inc\lib
I tried many different ways; none works.
Can you explain how you add a single line to pp_build_exe.args?
Attached is my current copy of pp_build_exe.args
- Phil
Thank you Phil, I included the it lang ;D