Image::ExifTool vs exiftool performance on multiple images

Started by raj, November 06, 2013, 05:44:39 AM


raj

Hi Phil,

Great set of tools! I've written a script to parse a directory of JPG files. It makes the following call in a for loop:

my $exiftool = Image::ExifTool->new;
for my $img (@images) { # IO::All objects
  # [..] get full path $filename from $img->file
  my $data = $exiftool->ImageInfo($filename, \@tags);
  # [..] do something with $data
}


I notice it starts off fast - several images per second - then slows down as the CPU hits 100%. It's taking about 90 secs for a directory of 60 images.

So then I tried using exiftool in a system call in batch mode and found it to be way faster (a few seconds to do the same job):

qx/$exiftool $img_dir $tags -s -j/;

It's a bit of a faff outputting to JSON and then using JSON::Parse to get a hash, but it is well worth it for the performance improvement. Is there any way to get the same performance directly from the Image::ExifTool object? It doesn't seem to accept a list of filenames. Apologies if this has been asked before; I couldn't find anything in a forum search.
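The JSON route described above can be sketched with JSON::PP (bundled with Perl since 5.14) in place of JSON::Parse. This is only an illustration: the sample string is made up and merely mimics the shape of exiftool -j output (an array with one object per file, each carrying a SourceFile entry), and the variable names are assumptions.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Made-up sample standing in for the output of the qx// call; real output
# would come from exiftool itself.
my $json = <<'END';
[
  { "SourceFile": "photos/IMG_2611.JPG", "ImageWidth": 4000, "ImageHeight": 3000 },
  { "SourceFile": "photos/IMG_2612.JPG", "ImageWidth": 3000, "ImageHeight": 4000 }
]
END

# decode_json returns an array reference holding one hash reference per file.
my $records = decode_json($json);

# Index the records by file name so each image can be looked up directly.
my %info_for = map { $_->{SourceFile} => $_ } @$records;

for my $file (sort keys %info_for) {
    my $rec = $info_for{$file};
    print "$file: $rec->{ImageWidth} x $rec->{ImageHeight}\n";
}
```

Keying the hash on SourceFile keeps the per-file lookup close to what the ImageInfo loop produces.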

Phil Harvey

Your script does this exactly the way the exiftool application does. The poor performance may be due to one of two things:

1) A different version of Perl with worse memory handling.

2) Your script allocates variables that are never freed, resulting in excessive memory usage over time.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

raj

OK, thanks for the quick reply.

Perl is 5.10.1 (Debian 6)

The full loop is:

my @images = io($img_dir)->all_files;
my $exiftool = Image::ExifTool->new;

my %exifdata;
for my $img (@images) { # warn $img; # stringifies to ->name;
   my $filename = $img->filename;
   my $data = $exiftool->ImageInfo($img_dir . '/' . $filename, \@tags);

   my $ImgWidth  = $data->{ImageWidth};
   my $ImgHeight = $data->{ImageHeight};

   my $orientation = $ImgWidth > $ImgHeight
      ? 'landscape' : 'portrait';
   push @{ $exifdata{$orientation} }, $data;
}


The only thing I can see that would cause memory to be retained is pushing the $data hashref into %exifdata in the loop, but that's pretty standard Perl, and I only request 15 elements in @tags, so each memory allocation should be tiny. I'm running on a 1 GB Linode, so I would not expect memory issues here, and monitoring top, I don't see any change in free or swap memory while the script runs. It seems to be more of a CPU utilisation issue. Creating a new Image::ExifTool object for each image is actually slower.

Timing each cycle of the loop using Time::HiRes shows what is happening:

IMG_2611.JPG: 0.122 sec
IMG_2612.JPG: 0.050 sec
IMG_2613.JPG: 0.056 sec
IMG_2614.JPG: 0.061 sec
IMG_2615.JPG: 0.065 sec
IMG_2616.JPG: 0.074 sec
IMG_2617.JPG: 0.085 sec
IMG_2618.JPG: 0.089 sec
IMG_2619.JPG: 0.094 sec
IMG_2620.JPG: 0.100 sec
IMG_2621.JPG: 0.108 sec
IMG_2622.JPG: 0.117 sec
IMG_2623.JPG: 0.138 sec
IMG_2624.JPG: 0.166 sec
IMG_2625.JPG: 0.179 sec
IMG_2626.JPG: 0.187 sec
IMG_2627.JPG: 0.194 sec
IMG_2628.JPG: 0.207 sec
IMG_2629.JPG: 0.259 sec
IMG_2630.JPG: 0.240 sec
IMG_2631.JPG: 0.247 sec
IMG_2632.JPG: 0.265 sec
IMG_2633.JPG: 0.292 sec
IMG_2634.JPG: 0.293 sec
IMG_2635.JPG: 0.322 sec
IMG_2636.JPG: 0.312 sec
IMG_2637.JPG: 0.335 sec
IMG_2638.JPG: 0.364 sec
IMG_2639.JPG: 0.397 sec
IMG_2640.JPG: 0.428 sec
IMG_2641.JPG: 0.453 sec
IMG_2642.JPG: 0.550 sec
IMG_2644.JPG: 0.539 sec
IMG_2647.JPG: 0.548 sec
IMG_2648.JPG: 0.602 sec
IMG_2649.JPG: 0.630 sec
IMG_2651.JPG: 0.672 sec
IMG_2652.JPG: 0.686 sec
IMG_2653.JPG: 0.680 sec
IMG_2654.JPG: 0.705 sec
IMG_2655.JPG: 0.746 sec
IMG_2656.JPG: 0.830 sec
IMG_2657.JPG: 1.001 sec
IMG_2660.JPG: 0.935 sec
IMG_2661.JPG: 0.946 sec
IMG_2662.JPG: 0.825 sec
IMG_2663.JPG: 0.965 sec
IMG_2664.JPG: 0.942 sec
IMG_2665.JPG: 1.041 sec
IMG_2666.JPG: 1.150 sec
IMG_2667.JPG: 1.421 sec
IMG_2668.JPG: 1.383 sec
IMG_2669.JPG: 1.271 sec
IMG_2670.JPG: 1.348 sec
IMG_2671.JPG: 1.450 sec
IMG_2672.JPG: 1.558 sec
IMG_2673.JPG: 2.251 sec
IMG_2674.JPG: 1.894 sec
IMG_2675.JPG: 1.882 sec
IMG_2676.JPG: 1.852 sec
IMG_2681.JPG: 2.188 sec
IMG_2682.JPG: 2.623 sec
IMG_2683.JPG: 2.324 sec
IMG_2684.JPG: 2.323 sec

That is something in the region of a 40-fold difference between the first few images and the final few. Have you ever observed or heard of this kind of behaviour before?
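Per-image timings like those above could be collected along these lines with Time::HiRes. This is a minimal sketch: process_image is a hypothetical stand-in for the ImageInfo call, and the file names are placeholders.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical stand-in for the per-image work (the ImageInfo call in the
# script above); swap in the real call when measuring.
sub process_image {
    my ($file) = @_;
    my $sum = 0;
    $sum += $_ for 1 .. 10_000;
    return $sum;
}

my @files = qw(IMG_2611.JPG IMG_2612.JPG IMG_2613.JPG);   # placeholder names
my @timings;

for my $file (@files) {
    my $t0 = [gettimeofday];
    process_image($file);
    my $elapsed = tv_interval($t0);      # seconds elapsed since $t0
    push @timings, [$file, $elapsed];
    printf "%s: %.3f sec\n", $file, $elapsed;
}
```

Timing each iteration separately, rather than the whole loop, is what exposes a cost that climbs from one image to the next.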

Phil Harvey

Ah.  That's the problem.  Take a look at the size of your @tags array for each iteration.

When ImageInfo() returns, @tags will be filled with the corresponding tag keys, not tag names.  You shouldn't re-use this array, because whatever it contains on entry is added to the list of requested tags for the next call.

Instead of passing the reference \@tags, pass the tags themselves: @tags

- Phil
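The effect of re-using the array can be illustrated without ExifTool. In the sketch below, fake_image_info is a hypothetical stand-in, not ExifTool code: it only imitates the list-reference behaviour described in this thread (entries count as requests on entry, and the list is overwritten with the returned keys on exit). The assumption that each call returns one key more than was requested is made up purely to show the growth; the real growth rate will differ.

```perl
use strict;
use warnings;

# Hypothetical stand-in, NOT ExifTool code. Assumption for illustration:
# every call returns one key more than was requested.
my $calls = 0;
sub fake_image_info {
    my ($file, @args) = @_;
    my @requested;
    my $out;                                   # list reference, if one was passed
    for my $arg (@args) {
        if (ref $arg eq 'ARRAY') { $out = $arg; push @requested, @$arg }
        else                     { push @requested, $arg }
    }
    my @keys = (@requested, 'Extra (' . ++$calls . ')');
    @$out = @keys if $out;                     # the caller's array is modified here
    return { map { $_ => 1 } @keys };
}

my @files = map { "IMG_$_.JPG" } 1 .. 5;       # placeholder file names

# Re-using one array by reference: it grows on every iteration.
my @reused = qw(ImageWidth ImageHeight);
my @sizes_by_ref;
for my $file (@files) {
    fake_image_info($file, \@reused);
    push @sizes_by_ref, scalar @reused;
}

# Passing the tags as a plain list: the caller's array never changes.
my @tags = qw(ImageWidth ImageHeight);
my @sizes_by_list;
for my $file (@files) {
    fake_image_info($file, @tags);
    push @sizes_by_list, scalar @tags;
}

print "by reference: @sizes_by_ref\n";
print "by list:      @sizes_by_list\n";
```

If the returned keys are wanted, copying the request list into a fresh array before each call and passing a reference to that copy would leave the original list intact.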

raj

That did it - many thanks :) Not sure why but my initial use of a tags array (on a very small dir of images) failed so I tried passing a ref to it as the 2nd arg and it worked. I didn't realise the tags array was growing on each cycle. Now it's so much faster. Thanks again.

mrbrahman

I had this exact problem today, and thanks to this thread was able to solve it.  :D

Being new to Perl, I'm not sure I understand what's happening with @tags vs \@tags. Can someone shed some light?

BTW, I think I picked up the syntax (to use \@tags) from the ImageInfo section of http://search.cpan.org/~exiftool/Image-ExifTool-10.80/lib/Image/ExifTool.pod. If that's not what is preferred, maybe it would be a good idea to update the doc at some point?

Thanks!

Hayo Baan

Quote from: mrbrahman on May 29, 2018, 11:44:43 PM
I had this exact problem today, and thanks to this thread was able to solve it.  :D

Being new to Perl, I'm not sure I understand what's happening with @tags vs \@tags. Can someone shed some light?

BTW, I think I picked up the syntax (to use \@tags) from the ImageInfo section of http://search.cpan.org/~exiftool/Image-ExifTool-10.80/lib/Image/ExifTool.pod. If that's not what is preferred, maybe it would be a good idea to update the doc at some point?

Thanks!

The difference between @tags and \@tags is that the former is the whole list, while the latter is just a reference to that list. This difference becomes important when calling functions. When you call a function with a list argument (e.g. @tags), the array is flattened into the argument list, and a function that copies its arguments (as most do) works on its own copy: nothing it adds to or removes from that copy is reflected outside of the function. A reference, on the other hand, does allow the function to change the original list.
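A small, self-contained sketch of that difference follows. The two subs and their names are made up for illustration and have nothing to do with ExifTool itself.

```perl
use strict;
use warnings;

# Receives the elements as a flat list and works on its own copy.
sub append_to_list {
    my @copy = @_;
    push @copy, 'added';
    return scalar @copy;
}

# Receives a reference and changes the array it points to.
sub append_via_ref {
    my ($ref) = @_;
    push @$ref, 'added';
    return scalar @$ref;
}

my @tags = qw(Copyright Description);

append_to_list(@tags);
my $after_list = scalar @tags;    # caller's array is unchanged

append_via_ref(\@tags);
my $after_ref = scalar @tags;     # caller's array gained an element

print "after list call: $after_list, after reference call: $after_ref\n";
```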

Re-reading that particular section in the documentation, I think you are right: it fails to mention that instead of a list reference, you can also provide a list (in which case the original tag list will not be modified).
(Usually you'd want the reference form, though, since that will tell you exactly which tags it found.)
Hayo Baan – Photography
Web: www.hayobaan.nl

Phil Harvey

Quote from: Hayo Baan on May 30, 2018, 04:51:46 AM
Re-reading that particular section in the documentation, I think you are right: it fails to mention that instead of a list reference, you can also provide a list (in which case the original tag list will not be modified).

This is the relevant excerpt from the documentation:

        The remaining scalar arguments are names of tags for requested information.

But then you have to know that passing an entire array (e.g. "@tags") is exactly the same as passing all (scalar) items separately.

Usually this is how it is done.  For example:

$exifTool->ImageInfo($file, 'Copyright', 'Description');

is the same as

my @tags = qw(Copyright Description);
$exifTool->ImageInfo($file, @tags);


But if you do this (pass as a reference):

my @tags = qw(Copyright Description);
$exifTool->ImageInfo($file, \@tags);


then the @tags list is updated with a list of the tag keys that were actually extracted.  Here is another way to do the same thing:

my @tags;
$exifTool->ImageInfo($file, 'Copyright', 'Description', \@tags);


Here @tags will also return a list of extracted tag keys.

The main purpose of passing a list reference is to receive a list of extracted tag keys:

        On return, this list is updated to contain an ordered list of tag keys for the returned information.

As a convenience it may also be used to pass in a list of requested tags:

        On entry, any elements in the list are added to the list of requested tags.

- Phil

Edit:  Done my edits now.  Hayo, you are very quick in responding!

Hayo Baan

Quote from: Phil Harvey on May 30, 2018, 07:27:02 AM
Quote from: Hayo Baan on May 30, 2018, 04:51:46 AM
Re-reading that particular section in the documentation, I think you are right: it fails to mention that instead of a list reference, you can also provide a list (in which case the original tag list will not be modified).

This is the relevant excerpt from the documentation:

The remaining scalar arguments are names of tags for requested information.

Ah, I overlooked that one! (and you clarify the difference between @arg and \@arg better than me too)