Run Exiftool using multiple cores

Started by RossTP, October 17, 2016, 02:01:14 AM


RossTP

Does Exiftool run on multiple cores, if they're available? If not, is there a way to do this using something like GNU Parallel? I'm trying to clear GPS metadata from >50,000 images (a task I run almost every week), but it takes quite a long time to run.

To do this I first need to clear the Makernotes (because there are major and minor errors in many of the images I need to work with):

exiftool -r -all= -tagsfromfile @ -all:all -unsafe -icc_profile -overwrite_original -ext jpg .

And then I clear the GPS metadata using this:

exiftool -r -gps:all= -xmp:geotag= -overwrite_original -ext jpg .

Any ideas on how to speed this process up?
FYI - I always work on a copy of my data, which is why I overwrite the originals in the above commands.

Thanks in advance!

Phil Harvey

You could separate your images into groups and run multiple exiftool commands simultaneously, one on each group.

What platform are you on?  It shouldn't be too hard to create a script to do this without the need to physically separate the images into different directories.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

RossTP

Hi Phil,

I run both Mac (OS X El Capitan) and Windows-based machines. It would be great if some code could work for both, but if not then a Windows 10 system would be the most convenient.

Appreciate any help.

Cheers,
Ross

Phil Harvey

Hi Ross,

If scripted, the script would have to be different for Windows and Mac.  I wouldn't know how to write the Windows script (or .bat file).

But if you could find a clear way to divide the images then it could work for both.

- Phil

RossTP

Hi Phil,

So I've managed to figure out how to split the image folder up into smaller folders using terminal/bash, but once I've done that, how do I run multiple exiftool commands simultaneously on these folders? I've currently got three smaller folders, containing ±10,000 images in each.

Thanks in advance.
Ross

Hayo Baan

On a Mac/Linux system, you can start any command in the background by putting an & at the end of it. E.g. exiftool ARGS DIR & runs exiftool as a background process, so you can start several at once. The output of these background processes will mingle in your terminal, so you'd probably want to redirect each one to its own output file. If your main script needs to wait for the background processes to finish, use the shell's wait command; see the man page of your shell for more information.
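To make the & and wait idea concrete, here is a minimal sketch, not a tested production script. The helper name run_parallel and the log file names are made up for illustration, and the exiftool arguments are simply the GPS-clearing command from the first post.

```shell
#!/bin/bash
# Sketch: run one exiftool process per directory in the background,
# then wait for all of them. run_parallel is a hypothetical helper name.
run_parallel() {
    local dir pid pids="" status=0
    for dir in "$@"; do
        # Each job writes to its own log (e.g. 1.folder.log) so outputs do not mingle.
        exiftool -r -gps:all= -xmp:geotag= -overwrite_original -ext jpg "$dir" \
            > "${dir%/}.log" 2>&1 &
        pids="$pids $!"    # $! is the process ID of the job just started
    done
    # "wait PID" returns that job's exit status; remember any failure.
    for pid in $pids; do
        wait "$pid" || status=1
    done
    return "$status"
}

# Example call (folder names are assumptions):
# run_parallel 1.folder 2.folder 3.folder
```

The function returns non-zero if any of the exiftool processes reported an error, so a calling script can check the result before carrying on.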

On Windows, I don't know of a way to do this with the standard command line, but PowerShell might have ways.
Hayo Baan – Photography
Web: www.hayobaan.nl

Alan Clifford

Quote from: RossTP on October 18, 2016, 12:33:29 AM
Hi Phil,

... but once I've done that, how do I run multiple exiftool commands simultaneously on these folders? I've currently got three smaller folders, containing ±10,000 images in each.


You can open three terminal windows, cd to a different directory in each window, then type the appropriate exiftool command in each window.


RossTP

Thanks Phil, Hayo and Alan,

Really appreciate all the help. So in summary, I managed to split the image folder into smaller sub-folders of 10,000 images using this code (for Mac):

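#!/bin/bash
# Move the files in the current directory into numbered sub-folders
# (1.folder, 2.folder, ...) holding at most 10,000 files each.
x=0   # files moved into the current sub-folder
y=0   # number of the current sub-folder
for i in *
do
    [ -f "$i" ] || continue   # skip directories, including existing N.folder
    if [ "$x" -eq 10000 ]; then
        x=0
    fi
    if [ "$x" -eq 0 ]; then
        y=$((y + 1))
        mkdir -p "$y.folder"
        printf "."
    fi
    x=$((x + 1))
    mv -- "$i" "$y.folder"
done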


Then I opened up three terminal windows and ran the exiftool commands. A bit of a roundabout way of achieving the objective, but it works.

Thanks again.
Ross

Hayo Baan

From your original question, I gather you do this often? If so, I'd suggest changing the script so it automatically calls exiftool on the created directories. As I said, if you add an & at the end of the command, it runs in the background, so all directories are processed in parallel.

Alan Clifford

Quote from: Hayo Baan on October 19, 2016, 01:34:08 AM
From your original question, I gather you do this often? If so, I'd suggest changing the script so it automatically calls exiftool on the created directories. As I said, if you add an & at the end of the command, it runs in the background, so all directories are processed in parallel.

I don't disagree with the background '&' method, but I suggested the three terminals because it is conceptually easier for someone who is not at guru level with Unix-style terminal commands. Personally, I'd be happier with the three windows.

Phil Harvey

Since you are running a script anyway, an alternative would be to create separate lists of files to process instead of moving the files to separate directories.  Then you could use the exiftool -@ option to process files in each list.
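A rough sketch of the list-file approach might look like the following. The helper name process_lists, the list.N and log.N file names, and the temporary directory are all assumptions made for illustration; the exiftool arguments are the single-pass command that appears later in this thread. It assumes no file name contains a newline, since -@ reads one argument per line.

```shell
#!/bin/bash
# Sketch: split the JPEGs under a directory into N list files and run one
# exiftool per list via -@. process_lists and the list/log names are hypothetical.
process_lists() {
    local dir="$1" n="${2:-3}" lists i
    lists=$(mktemp -d) || return 1
    # Deal the paths round-robin into list.0 .. list.(n-1), one path per line.
    find "$dir" -type f -iname '*.jpg' | {
        i=0
        while IFS= read -r f; do
            printf '%s\n' "$f" >> "$lists/list.$((i % n))"
            i=$((i + 1))
        done
    }
    # Start one background exiftool per list, each with its own log.
    i=0
    while [ "$i" -lt "$n" ]; do
        if [ -f "$lists/list.$i" ]; then
            exiftool -all= -tagsfromfile @ -all:all --gps:all --xmp:geotag \
                -unsafe -icc_profile -overwrite_original \
                -@ "$lists/list.$i" > "$lists/log.$i" 2>&1 &
        fi
        i=$((i + 1))
    done
    wait    # block until every background exiftool has exited
    echo "Logs are in $lists"
}

# Example call (path is an assumption): three lists, three processes
# process_lists /Users/Ross/Desktop/images 3
```

Because find already selects the files, -r and -ext jpg are not needed here, and nothing has to be moved between directories.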

- Phil

Hayo Baan

Quote from: Phil Harvey on October 19, 2016, 07:09:56 AM
Since you are running a script anyway, an alternative would be to create separate lists of files to process instead of moving the files to separate directories.  Then you could use the exiftool -@ option to process files in each list.

Actually this is the best option, as it saves you from having to move the files at all. If you tell me the exiftool command you'd like to run on the files and how you specify the location of the files/dirs to process, I'll look into creating a little script for you to do this.

RossTP

Hi all,

Thanks very much for this assistance, I really do appreciate it.

Hayo – there are two commands that need to be run. First, I need to rewrite all the metadata using:

exiftool -r -all= -tagsfromfile @ -all:all -unsafe -icc_profile -overwrite_original -ext jpg .

Then I need to clear the GPS metadata using:

exiftool -r -gps:all= -xmp:geotag= -overwrite_original -ext jpg .

Let's assume that the folder (which can consist of 30,000 to 150,000 images) is on my desktop: /Users/Ross/Desktop/images

Thanks again in advance!

Phil Harvey

I'll just point out that these two commands may be done in a single operation:

exiftool -r -all= -tagsfromfile @ -all:all --gps:all --xmp:geotag -unsafe -icc_profile -overwrite_original -ext jpg .

- Phil

(2x the speed without any effort)

StarGeek

A little late to the thread, but I really think that disk I/O is going to be more of a bottleneck than CPU use.  I've tried running a bunch of commands simultaneously, and after about 4-5 commands things slow down on my old computer due to disk use while there's still plenty of CPU power available.
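If disk I/O is the limit, it may help to cap the number of simultaneous exiftool processes rather than start one per folder. The following is only a sketch: strip_gps_capped is a made-up helper name, the defaults of 4 jobs and 500 files per batch are guesses to tune by timing on your own machine, and the exiftool arguments are the single-pass command given earlier in the thread. It relies on the -P option of xargs, which both the GNU and the macOS versions provide.

```shell
#!/bin/bash
# Sketch: cap the number of simultaneous exiftool processes with xargs -P.
# strip_gps_capped is a hypothetical helper; jobs and batch are tuning assumptions.
strip_gps_capped() {
    local dir="$1" jobs="${2:-4}" batch="${3:-500}"
    # find emits NUL-terminated paths so names with spaces survive;
    # xargs runs at most $jobs exiftool processes at a time,
    # handing each one up to $batch files.
    find "$dir" -type f -iname '*.jpg' -print0 |
        xargs -0 -n "$batch" -P "$jobs" \
            exiftool -all= -tagsfromfile @ -all:all --gps:all --xmp:geotag \
                -unsafe -icc_profile -overwrite_original
}

# Example call (path from the thread; 4 jobs is only a starting point):
# strip_gps_capped /Users/Ross/Desktop/images 4
```

Running it under time with 1, 2, 4 and 8 jobs on a copy of the same folder should show where extra processes stop helping.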
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype