-@ Arg file size when using -stay_open

Started by ryerman, July 26, 2011, 12:04:31 PM


ryerman

While searching the forum to learn more about the -stay_open option, I saw a post that raises a concern about the number of lines in the arg file.  My aim is to process 70,000 audio files (extracting the image size of the embedded picture). Before starting my usual tortuous trial & error scripting (VBS), I'd like some advice to avoid any problems when I finally apply my solution to the entire music collection.
These 12 lines (2x(open, write 6 lines, close) per file) provide the required information:
-b
-picture
F:\anyfile.mp3
-w
out
-execute
-p
${imagewidth}x${imageheight}  ${directory}
F:\anyfile.out
-w
txt
-execute


When processing all 70,000 files, the resulting arg file will have over 800,000 lines.
Is that many lines a problem for Exiftool?  For Windows?
I saw Phil's response that tells how to limit the number of lines or avoid the arg file altogether, and I may be able to figure that out, but is it necessary to limit the number of lines?  Or maybe only desired?
My estimate for size of the arg file is 3 GB, at most.  Does that matter, assuming I have the disk space?
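
As a rough illustration of the 12-line pattern listed above, here is a Python sketch that builds the lines for one audio file and counts the total for 70,000 files. The function name argfile_lines is made up for this example, and it assumes the extracted picture is written next to the source file with an .out extension, as in the listing.

```python
from pathlib import PureWindowsPath


def argfile_lines(audio_path):
    """Return the 12 argfile lines (two exiftool commands) for one audio file."""
    src = PureWindowsPath(audio_path)
    extracted = src.with_suffix(".out")  # picture written by the first command
    return [
        # Command 1: extract the embedded picture to <name>.out
        "-b", "-picture", str(src), "-w", "out", "-execute",
        # Command 2: print the picture dimensions and directory to <name>.txt
        "-p", "${imagewidth}x${imageheight}  ${directory}",
        str(extracted), "-w", "txt", "-execute",
    ]


lines = argfile_lines(r"F:\anyfile.mp3")
print(len(lines))            # lines per file
print(70_000 * len(lines))   # lines for the whole collection
```

At 12 lines per file, 70,000 files come to 840,000 lines, which is where the "over 800,000" figure comes from.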

Any comments, observations or suggestions are also welcome.

Jim

PS: This post doesn't concern the explicit processing of any image type file. I hope that doesn't bother any of the photographers in the crowd. ;)
      That is an example of the versatility and usefulness of Exiftool!

Edit: Fixed bad URL
Windows 10 Home 64 bit, Exiftool v12.61

Phil Harvey

If the -stay_open option is not used the -@ argfile is loaded completely into memory, so a very large file may be a problem because it may result in excessive memory usage.  But you can avoid this problem by using -stay_open true -@ my.args in the command, then putting "-stay_open" and "false" as the last 2 lines of the argfile.  Then the only thing you have to worry about is filling up the disk with a big argfile.

This behaviour is totally undocumented, and therefore may change at some point in the future, but this is the only problem I can see with a huge argfile, and at least you have a work-around.
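
A minimal Python sketch of the work-around described above: the argfile is written with "-stay_open" and "False" as its last two lines, so exiftool stops waiting for more arguments once it reaches the end. The helper names and the file name my.args are placeholders, and the UTF-8 encoding is an assumption that may need adjusting for file names with special characters.

```python
def with_stay_open_terminator(command_lines):
    """Append the two lines that end a -stay_open session."""
    return list(command_lines) + ["-stay_open", "False"]


def write_argfile(path, command_lines):
    """Write one argument per line, ending with the terminator lines."""
    # Encoding is an assumption; adjust if file names need another charset.
    with open(path, "w", encoding="utf-8") as f:
        for line in with_stay_open_terminator(command_lines):
            f.write(line + "\n")


# Command to run afterwards: exiftool -stay_open True -@ my.args
COMMAND = ["exiftool", "-stay_open", "True", "-@", "my.args"]
```

Without the two terminating lines, exiftool would keep the argfile open and wait for further arguments instead of exiting.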

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

ryerman

Hi Phil

I ended up writing a 42 MB argfile with 1.5 million lines and used it with the -stay_open option.
exiftool -stay_open true -@ ArgFile.txt
It took 41 minutes to process 67,000 files, without a problem.
When the same argfile was used without stay_open, it took 39 minutes and that puzzled me because I thought stay_open would reduce the processing time.  Then, I remembered a post concerning the case of the True/False flag.  So I changed true to 1, ran the command again, and this time it took 40 minutes.

A little background:
Earlier, I had used a call to exiftool for each file:
exiftool -picture -b "somefile.mp3" | exiftool - -p ${imageheight}x${imagewidth} > "resultfile.txt"
That took over 3 minutes for 500 files.  I estimated that 67,000 files would take around 8 hours, which I wanted to reduce.  So I investigated -stay_open, which led to -@.  When I finally realized that source file paths could be included in an argfile (I had assumed only options were allowed; it really pays to read the documentation carefully :) ), I created one for 500 files and used it with -stay_open.  I had to change the options in my commands to conform to the argfile requirements, but the commands were essentially the same, and I even added one to check for no picture tag.  When it took 18 seconds instead of the previous 3 minutes, I was content to use it for a large number of files.
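
The extrapolation can be checked with a few lines of Python. This is only back-of-the-envelope arithmetic using the figures quoted in this thread; "over 3 minutes" is taken as exactly 180 seconds, so the first result is a lower bound.

```python
FILES_TOTAL = 67_000
SAMPLE = 500

# Separate launches (two exiftool processes per file via the pipe):
# "over 3 minutes" for 500 files, taken here as 180 seconds.
per_file_separate = 180 / SAMPLE
# One launch reading an argfile: 18 seconds for the same 500 files.
per_file_argfile = 18 / SAMPLE

hours_separate = FILES_TOTAL * per_file_separate / 3600
minutes_argfile = FILES_TOTAL * per_file_argfile / 60

print(f"separate launches: at least {hours_separate:.1f} h")
print(f"one argfile:       about {minutes_argfile:.0f} min")
```

This gives at least 6.7 hours for separate launches versus about 40 minutes with one argfile, which is in line with the measured runs of 39 to 41 minutes.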

It seems that using an argfile greatly reduced processing time for my task, but -stay_open had no benefit for that particular argfile.

Is that atypical, maybe because of the particular command and options?

Maybe -stay_open has other uses that are not clear to me.

Thanks,

Jim

BogdanH

Hi Jim,

As you assume, -stay_open has other uses: every time you call exiftool, it takes some time to load and initialize. Let's say it takes 1 sec; if we call exiftool five times, then we lose 4 sec on the additional calls. If -stay_open is used (properly!), then once exiftool is called, it stays in memory and is ready to execute the next time it's needed.
However, if a (single) ArgFile is used, then exiftool is called only once anyway (if I'm not mistaken), thus there is no benefit from using -stay_open. ...well, something like that.

Btw. you've made a nice benchmark  :)

Bogdan

Phil Harvey

Bogdan is correct.  Using -stay_open only saves time if you were otherwise executing exiftool separately for each command.  The processing time is identical to using one large -@ argfile for all commands.  The only difference is memory usage, which may turn into a performance difference only if your system is memory limited and the extra memory usage results in virtual memory swapping.

Likewise, using the argfile by itself won't improve performance over a normal command-line invocation.  But if you use -execute to reduce the number of invocations in one case and not the other, this is the reason for the difference.  It is always faster if you can avoid invoking exiftool unnecessarily.

But 67,000 files in 41 minutes is about 27 files per second, which isn't too bad.
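
The case where -stay_open does pay off is a program that sends commands one at a time to a single long-running exiftool process. A minimal Python sketch of that pattern follows. The class and function names are invented for illustration, it assumes exiftool is on the PATH, and it assumes text output only (binary output such as -b would need byte-mode pipes).

```python
import subprocess


def encode_command(*args):
    """One argument per line, followed by -execute to run the command."""
    return "\n".join(args) + "\n-execute\n"


class ExifToolSession:
    """Keeps one exiftool process alive and feeds it commands over stdin."""

    def __init__(self, executable="exiftool"):
        # "-@ -" makes exiftool read its arguments from standard input.
        self.proc = subprocess.Popen(
            [executable, "-stay_open", "True", "-@", "-"],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
        )

    def run(self, *args):
        """Send one command and return its output, up to the {ready} marker."""
        self.proc.stdin.write(encode_command(*args))
        self.proc.stdin.flush()
        output = []
        for line in self.proc.stdout:
            if line.strip() == "{ready}":
                break
            output.append(line)
        return "".join(output)

    def close(self):
        """End the session so the exiftool process exits."""
        self.proc.stdin.write("-stay_open\nFalse\n")
        self.proc.stdin.flush()
        self.proc.wait()
```

A caller would create one ExifToolSession, call run() once per file, and call close() at the end, so the startup cost is paid only once.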

- Phil

ryerman

Thanks Bogdan and Phil for your replies.

Quote from: Phil Harvey on July 28, 2011, 07:50:43 PM
...Likewise, using the argfile by itself won't improve performance over a normal command-line invocation.  But if you use -execute to reduce the number of invocations in one case and not the other, this is the reason for the difference.  It is always faster if you can avoid invoking exiftool unnecessarily....
I thought -execute did invoke exiftool! :o
Now that I know better, it seems my efforts should be directed to improving my command line and abandoning the -@ argfile in favour of -r recursion.  I've learned some more options that should help, but this was a one-time project (famous last words) so I may just leave well enough alone.
I wish there was some sort of modified -if option that allowed one of two different courses of action depending on some condition.  For example, "if a tag exists do this, otherwise do that".

Jim


Phil Harvey

Hi Jim,

The terminology is tricky, but what I am calling an "invocation" is when exiftool is launched from the command line.  The -execute option is used to split a single command line into separate commands.

Quote from: ryerman on July 28, 2011, 11:14:02 PM
I wish there was some sort of modified -if option that allowed one of two different courses of action depending on some condition.

I really like this idea.  It could be implemented through a -else option.  However, the implementation would be complex because I would have to maintain a parallel set of all command-line variables for the alternate condition, and there are lots of these.  So unfortunately this idea fails the cost/benefit test.
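
In the absence of such an option, one possible approximation is to run two commands separated by -execute, with complementary -if conditions. The Python sketch below only generates the argfile lines. The function name and the example -p strings are hypothetical, and the approach assumes it is acceptable to read each file twice.

```python
def if_else_commands(path, then_args, else_args, condition="$picture"):
    """Build two argfile commands: one for the condition, one for its negation."""
    return [
        # Runs only when the condition is true (here: a Picture tag exists).
        "-if", condition, *then_args, path, "-execute",
        # Runs only when the condition is false.
        "-if", f"not ({condition})", *else_args, path, "-execute",
    ]


lines = if_else_commands(
    "somefile.mp3",
    then_args=["-p", "${imagewidth}x${imageheight}"],
    else_args=["-p", "no picture: ${filename}"],
)
print("\n".join(lines))
```

Each file is opened once per branch, so this trades some processing time for the either/or behaviour.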

- Phil

ryerman

Hi Phil

Let's see if I understand correctly:

1. There is no essential difference between a command line typed at the console and one read from an arg file.
    Both cause one and only one invocation of exiftool (loading into memory).
2. Any command line can be re-written as an arg file, excluding any non-exiftool arguments like piping and redirection.
3. Any arg file can be re-written as a single command line, although length limitations may make it impossible to use.
4. Any command line or arg file can use -execute to separate commands without "opening" and "closing" exiftool.
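
Points 2 and 3 can be sketched in Python: an argfile is the same argument list written one per line, without shell quoting. The function names are made up for this example, and shlex is used here with POSIX-style quoting rules, so it is not a faithful parser for Windows cmd lines containing backslashes. Pipes and redirection are not handled, as noted in point 2.

```python
import shlex


def command_line_to_argfile(command_line):
    """Turn a shell command line into argfile text (one argument per line)."""
    tokens = shlex.split(command_line)
    if tokens and tokens[0] == "exiftool":
        tokens = tokens[1:]  # the program name is not part of the argfile
    return "\n".join(tokens) + "\n"


def argfile_to_command_line(argfile_text):
    """Turn argfile text back into a single, shell-quoted command line."""
    args = [line for line in argfile_text.splitlines() if line]
    return shlex.join(["exiftool", *args])


text = command_line_to_argfile(
    'exiftool -p "${imagewidth}x${imageheight}" "my song.mp3"'
)
print(text)
print(argfile_to_command_line(text))
```

Note that the quotes disappear in the argfile: each line is passed to exiftool as a single argument, spaces included.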

Jim

Phil Harvey

Excellent.  You have learned well, grasshopper.

The only minor nit-picky point I can make concerns point 3: the line length limitation is Windows-only.  Mac and Linux have no such limitation.

- Phil