speedup/optimization: order multiple passes by image instead of by pass...

Started by f_lynx, October 09, 2019, 12:00:20 PM


f_lynx

Hi,

Currently, when exiftool runs multiple passes (example below), it runs each pass semi-independently. With a large number of images, each image drops out of the OS cache between passes and has to be re-read and re-cached on every pass.

It should be substantially faster to run the passes interleaved, i.e. group the passes by input directory, read each input image once, perform every pass's operations on it in sequence, then move on to the next image, etc.


Here is a real example from my workflow that would benefit significantly from this optimization:

# some configuration...
METADATA_DIR="metadata"
RAW_PREVIEW_DIR="preview (RAW)"

ARCHIVE_ROOT="./some/path/"

# output patterns...
PREVIEW_NAME="%-:1d/${RAW_PREVIEW_DIR}/%f.jpg"
JSON_NAME="%-:1d/${METADATA_DIR}/%f.json"

exiftool -if '$jpgfromraw' -b -jpgfromraw -w "$PREVIEW_NAME" \
   -execute -if '$previewimage' -b -previewimage -w "$PREVIEW_NAME" \
   -execute '-FileModifyDate<DateTimeOriginal' -addtagsfromfile @ \
      -srcfile "$PREVIEW_NAME" '-all>all' '-xmp' \
      -overwrite_original \
   -execute -j -G -w "$JSON_NAME" \
   -common_args --ext jpg -r "./$ARCHIVE_ROOT" -progress


For reference, here is the full script:
https://github.com/flynx/ImageGrid/blob/master/scripts/process-archive.sh


Thanks!
Alex A. Naanou
https://github.com/flynx
https://flic.kr/f_lynx

Phil Harvey

The first two commands may be combined into one using the BigImage user-defined tag from the sample config file instead of JpgFromRaw and PreviewImage.

As for combining the others, interleaving commands like this would require a major change to the way ExifTool operates, and would require an additional option since the commands separated by -execute are advertised as being independent.

I don't see this as being a feature that would be worthwhile to add.

You could work around this yourself by using a script to loop through the files.  You would want to use -@ and -stay_open to avoid the startup overhead, and you would have to mimic the -progress option yourself, so it wouldn't be trivial.
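A minimal sketch of this workaround in plain POSIX shell, not a tested implementation. It assumes the PREVIEW_NAME and JSON_NAME patterns from the first post; emit_passes and run_interleaved are hypothetical helper names, not part of exiftool, and the find filter is only an assumed stand-in for --ext jpg. The tag-copying pass is left out for brevity and would follow the same pattern. Paths containing newlines are not handled, and -progress would still have to be mimicked separately.

```shell
#!/bin/sh
# Output patterns copied from the first post.
METADATA_DIR="metadata"
RAW_PREVIEW_DIR="preview (RAW)"
PREVIEW_NAME="%-:1d/${RAW_PREVIEW_DIR}/%f.jpg"
JSON_NAME="%-:1d/${METADATA_DIR}/%f.json"

# Hypothetical helper: print the arguments of every pass for ONE file,
# one argument per line (the argfile format), each pass ended by -execute.
# No shell quoting is needed inside an argfile, so $jpgfromraw is printed bare.
emit_passes() {
    f=$1
    printf '%s\n' -if '$jpgfromraw' -b -jpgfromraw -w "$PREVIEW_NAME" "$f" -execute
    printf '%s\n' -if '$previewimage' -b -previewimage -w "$PREVIEW_NAME" "$f" -execute
    printf '%s\n' -j -G -w "$JSON_NAME" "$f" -execute
}

# Hypothetical driver: stream the per-file blocks into a single exiftool
# process, then tell it to exit by sending "-stay_open" / "False".
run_interleaved() {
    {
        find "$1" -type f ! -name '*.[jJ][pP][gG]' | while IFS= read -r f; do
            emit_passes "$f"
        done
        printf '%s\n' -stay_open False
    } | exiftool -stay_open True -@ -
}
```

Calling run_interleaved ./some/path would then process each file with all passes back to back, so the file should still be in the OS cache for the later passes.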

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

f_lynx

Well, I would not be against an -interleave option as an alternative to -execute, which would make this directly available to everyone. But I get where you are coming from: the list of people who regularly run 3+ -execute commands in one go on very large batches of files is likely not that long ;)

This approach (reading commands from a file or pipe) would require some experimentation but looks quite promising. If all goes well, it's not going to be more complex than piping find to exiftool. The long-term cost for me would just be maintaining the list of raw file extensions to cover all the formats exiftool knows, but that's not a big deal.

I'll post an example as soon as I get it working.

BTW, can exiftool print the list of file extensions it supports? (I glanced through the man pages and found nothing.)


And thanks for the tip!
Alex A. Naanou
https://github.com/flynx
https://flic.kr/f_lynx

Hayo Baan

Here's the full list of supported file/metadata types: https://www.exiftool.org/index.html#supported

Have a look at the various -listxx options (especially -listf) to have exiftool produce a list of supported files at run-time.
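As a rough sketch of how the -listf output could feed a script: the helper below turns it into one lowercase extension per line. It assumes the output is a header line ending in a colon followed by whitespace-separated extensions, which should be checked against your exiftool version; list_extensions is a hypothetical name, not an exiftool feature.

```shell
#!/bin/sh
# Hypothetical helper: read `exiftool -listf` output on stdin and print
# one lowercase extension per line.
# Assumption: a header line ending in ':' followed by lines of
# whitespace-separated extensions.
list_extensions() {
    grep -v ':$' |                          # drop the header line
        tr -s ' \t' '\n\n' |                # one extension per line
        grep -v '^$' |                      # drop blanks left by indentation
        tr '[:upper:]' '[:lower:]'
}

# Intended usage (needs exiftool on PATH):
#   exiftool -listf | list_extensions
```

The resulting list could then be used to build the file filter for find instead of maintaining the extensions by hand.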

Instead of using e.g. -execute to perform various commands on the same file, have you considered using the exiftool perl API directly in a perl script? That is extremely powerful and flexible. It does require some perl knowledge, but I think you should be able to get something working quite quickly.
Hayo Baan – Photography
Web: www.hayobaan.nl

f_lynx

Quote from: Hayo Baan on October 12, 2019, 04:58:13 AM
have you considered using the exiftool perl API directly in a perl script?

Nope -- as tempting as that is, I'm trying to limit the number of languages/dependencies in a project to as small a number as possible...
...every time I dive into updating an old bash script I think of porting it to perl or python, and every time I package the project and organize a new group of volunteer testers, I hate myself for not resisting the temptation to move from POSIX shell to bash. I really do not like these two balancing acts: keeping the stand-alone distribution manageable and portable yet easy to maintain/develop, and keeping the thing scriptable and extensible without alienating non-power users ;)


And thanks for the -listf tip. I have not gotten to it yet, but it will make long-term maintenance quite a bit simpler.
Alex A. Naanou
https://github.com/flynx
https://flic.kr/f_lynx

f_lynx

Spent some time today on this and the "elegant" way out seems to have failed, but before letting it go and moving on to semi-brute-force or (hopefully not) brute-force approaches I thought I'd ask...

My logic was to find all the images, stage a pipeline in exiftool as before, and then run the images through it in small-ish batches with minimal changes to the above code. Essentially, instead of -r "./some/path" I'm piping in a batch of paths followed by -execute, then the next batch, etc.

...but as a result, I get the first batch processed correctly then exiftool stops with:
Ignoring -common_args from -execute onwards to avoid infinite recursion

I understand where this error is coming from but since I'm handling the input list myself and keeping the input and output files separate...

Is there a possibility to disable this guard?


P.S. if at all possible I'd like to avoid either re-staging the pipeline for each batch, or fully manually controlling exiftool for each op essentially re-implementing part of its own logic that already works quite well...
Alex A. Naanou
https://github.com/flynx
https://flic.kr/f_lynx

Phil Harvey

Quote from: f_lynx on October 13, 2019, 08:17:14 PM
Ignoring -common_args from -execute onwards to avoid infinite recursion

Is there a possibility to disable this guard?

Just don't use -common_args.  These arguments are the arguments to repeat in each -execute command.  It doesn't make any sense to have -execute as a common argument.
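One way to read this advice for the batched case, as a hedged sketch: repeat the shared arguments by hand in every command and write out one command per pass for each batch, so that -common_args never appears. emit_batch is a hypothetical helper name; the patterns are the ones from the first post, -progress stands in for the shared arguments, and only two of the passes are shown.

```shell
#!/bin/sh
# Output patterns copied from the first post.
METADATA_DIR="metadata"
RAW_PREVIEW_DIR="preview (RAW)"
PREVIEW_NAME="%-:1d/${RAW_PREVIEW_DIR}/%f.jpg"
JSON_NAME="%-:1d/${METADATA_DIR}/%f.json"

# Hypothetical helper: given a batch of file paths as arguments, print one
# complete command per pass (one argument per line, argfile format).
# The shared argument (-progress) is repeated in each command by hand.
emit_batch() {
    printf '%s\n' -if '$jpgfromraw' -b -jpgfromraw -w "$PREVIEW_NAME" -progress "$@" -execute
    printf '%s\n' -j -G -w "$JSON_NAME" -progress "$@" -execute
}

# Intended usage (needs exiftool on PATH): one emit_batch call per batch,
# all piped into a single process, e.g.
#   { emit_batch a.cr2 b.cr2; emit_batch c.nef; } | exiftool -@ -
```

With this layout -progress counts files within each batch only, so an overall progress display would still have to be produced by the calling script.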

- Phil

f_lynx

Quote from: Phil Harvey on October 13, 2019, 10:04:22 PM
Just don't use -common_args.

We were thinking of this a bit differently. I've been looking at -common_args and -execute as a sort-of explicit closure and an action/call curry respectively, and in this context recursion makes perfect sense (though yes, it can get complicated). One of the things I really like about how you organized the exiftool command-line interface is that it's almost its own domain-specific language, and this concept fits it quite well.

I would argue for this concept, at least as an experiment, but after I deal with the task at hand ;)

...now I guess it's back to manual argument threading.
Alex A. Naanou
https://github.com/flynx
https://flic.kr/f_lynx