ExifTool Forum

ExifTool => The Image::ExifTool API => Topic started by: raj on November 06, 2013, 05:44:39 AM

Title: Image::ExifTool vs exiftool performance on multiple images
Post by: raj on November 06, 2013, 05:44:39 AM
Hi Phil,

Great set of tools !! I've written a script to parse a directory of jpg files. It makes the following call in a for loop:

my $exiftool = new Image::ExifTool;
for my $img (@images) { # IO::All objects
  # ... get full path $filename from $img->file
  my $data = $exiftool->ImageInfo($filename, \@tags);
  # ... do something with $data
}


I notice it starts off fast - several images per second - then slows down as the CPU hits 100%. It's taking about 90 secs for a directory of 60 images.

So then I tried using exiftool in a system call in batch mode and found it to be way faster (a few seconds to do the same job):

qx/$exiftool $img_dir $tags -s -j/;

It's a bit of a faff outputting to json then using JSON::Parse to get a hash, but is well worth it for the performance improvement. But is there any way to get the same performance directly from the Image::ExifTool object? It doesn't seem to accept a list of filenames. Apologies if this has been asked before - I couldn't find anything on a forum search.
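
For reference, the JSON route described above can be sketched roughly as follows. This is a minimal sketch, not the script from this post: the helper name index_by_source_file is hypothetical, and it uses JSON::PP (in the Perl core since 5.14) instead of JSON::Parse. It relies on exiftool -j printing a single JSON array with one object per file, each carrying a SourceFile entry.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical helper: turn the JSON text printed by `exiftool -j`
# into a hash keyed by each file's SourceFile entry.
sub index_by_source_file {
    my ($json_text) = @_;
    my $records = decode_json($json_text);    # array ref of per-file hashes
    my %by_file;
    for my $record (@$records) {
        $by_file{ $record->{SourceFile} } = $record;
    }
    return \%by_file;
}

# Possible usage, assuming exiftool is on the PATH and $img_dir is set:
#   my $json    = qx/exiftool -j -ImageWidth -ImageHeight "$img_dir"/;
#   my $by_file = index_by_source_file($json);
```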
Title: Re: Image::ExifTool vs exiftool performance on multiple images
Post by: Phil Harvey on November 06, 2013, 07:41:14 AM
Your script is exactly the way it is done in the exiftool application.  The poor performance of your script may be one of two things:

1) A different version of Perl with worse memory handling.

2) Your script allocates variables that are never freed, resulting in excessive memory usage over time.

- Phil
Title: Re: Image::ExifTool vs exiftool performance on multiple images
Post by: raj on November 06, 2013, 09:19:44 AM
OK, thanks for quick reply.

Perl is 5.10.1 (Debian 6)

The full loop is:

my @images = io($img_dir)->all_files;
my $exiftool = new Image::ExifTool;

my %exifdata;
for my $img(@images) { # warn $img; # stringifies to ->name;
   my $filename = $img->filename;
   my $data = $exiftool->ImageInfo($img_dir . '/' . $filename, \@tags);
   
   my $ImgWidth  = $data->{ImageWidth};
   my $ImgHeight = $data->{ImageHeight};
   
   my $orientation = $ImgWidth > $ImgHeight
      ? 'landscape' : 'portrait';
   push @{ $exifdata{$orientation} }, $data;
}


The only thing I can see that would cause memory to be retained is pushing the $data hashref into %exifdata in the loop, but that's pretty standard perl, and I only grep 15 elements in @tags so each memory allocation should be tiny. I'm running on a 1GB mem linode, so I would not expect memory issues here, and monitoring top I don't see any change in free or swap memory over the duration the script runs. It seems to be more of a CPU utilisation issue. Creating a new Image::ExifTool object for each image is actually slower.

Timing each cycle of the loop using Time::HiRes shows what is happening:

IMG_2611.JPG: 0.122 sec
IMG_2612.JPG: 0.050 sec
IMG_2613.JPG: 0.056 sec
IMG_2614.JPG: 0.061 sec
IMG_2615.JPG: 0.065 sec
IMG_2616.JPG: 0.074 sec
IMG_2617.JPG: 0.085 sec
IMG_2618.JPG: 0.089 sec
IMG_2619.JPG: 0.094 sec
IMG_2620.JPG: 0.100 sec
IMG_2621.JPG: 0.108 sec
IMG_2622.JPG: 0.117 sec
IMG_2623.JPG: 0.138 sec
IMG_2624.JPG: 0.166 sec
IMG_2625.JPG: 0.179 sec
IMG_2626.JPG: 0.187 sec
IMG_2627.JPG: 0.194 sec
IMG_2628.JPG: 0.207 sec
IMG_2629.JPG: 0.259 sec
IMG_2630.JPG: 0.240 sec
IMG_2631.JPG: 0.247 sec
IMG_2632.JPG: 0.265 sec
IMG_2633.JPG: 0.292 sec
IMG_2634.JPG: 0.293 sec
IMG_2635.JPG: 0.322 sec
IMG_2636.JPG: 0.312 sec
IMG_2637.JPG: 0.335 sec
IMG_2638.JPG: 0.364 sec
IMG_2639.JPG: 0.397 sec
IMG_2640.JPG: 0.428 sec
IMG_2641.JPG: 0.453 sec
IMG_2642.JPG: 0.550 sec
IMG_2644.JPG: 0.539 sec
IMG_2647.JPG: 0.548 sec
IMG_2648.JPG: 0.602 sec
IMG_2649.JPG: 0.630 sec
IMG_2651.JPG: 0.672 sec
IMG_2652.JPG: 0.686 sec
IMG_2653.JPG: 0.680 sec
IMG_2654.JPG: 0.705 sec
IMG_2655.JPG: 0.746 sec
IMG_2656.JPG: 0.830 sec
IMG_2657.JPG: 1.001 sec
IMG_2660.JPG: 0.935 sec
IMG_2661.JPG: 0.946 sec
IMG_2662.JPG: 0.825 sec
IMG_2663.JPG: 0.965 sec
IMG_2664.JPG: 0.942 sec
IMG_2665.JPG: 1.041 sec
IMG_2666.JPG: 1.150 sec
IMG_2667.JPG: 1.421 sec
IMG_2668.JPG: 1.383 sec
IMG_2669.JPG: 1.271 sec
IMG_2670.JPG: 1.348 sec
IMG_2671.JPG: 1.450 sec
IMG_2672.JPG: 1.558 sec
IMG_2673.JPG: 2.251 sec
IMG_2674.JPG: 1.894 sec
IMG_2675.JPG: 1.882 sec
IMG_2676.JPG: 1.852 sec
IMG_2681.JPG: 2.188 sec
IMG_2682.JPG: 2.623 sec
IMG_2683.JPG: 2.324 sec
IMG_2684.JPG: 2.323 sec

That is something in the region of a 40-fold difference between the first few images and the final few. Have you ever observed or heard of this kind of behaviour before?
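
A minimal sketch of how per-image timings like the ones above can be collected with Time::HiRes. The helper name timed and the stand-in workload are illustrative assumptions, not the code that produced the figures above.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical helper: run a code ref once and return (elapsed seconds, result).
sub timed {
    my ($code) = @_;
    my $start  = [gettimeofday];
    my $result = $code->();
    return (tv_interval($start), $result);
}

# Stand-in workload; in the loop above this would be the ImageInfo() call.
my ($elapsed, $sum) = timed(sub { my $t = 0; $t += $_ for 1 .. 1000; $t });
printf "%s: %.3f sec\n", 'example', $elapsed;
```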
Title: Re: Image::ExifTool vs exiftool performance on multiple images
Post by: Phil Harvey on November 06, 2013, 10:16:11 AM
Ah.  That's the problem.  Take a look at the size of your @tags array for each iteration.

When ImageInfo() returns, @tags will be filled with the corresponding tag keys, not tag names.  You shouldn't re-use this array.

Instead of passing the reference \@tags, pass the tags themselves: @tags
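
Applied to the loop above, the change could look something like this. It is only a sketch: the helper name group_by_orientation and its argument layout are assumptions made for illustration, and $exiftool is assumed to be an Image::ExifTool object.

```perl
use strict;
use warnings;

# Hypothetical helper: group metadata hashes by orientation.
# $files and $tags are array refs owned by the caller.
sub group_by_orientation {
    my ($exiftool, $files, $tags) = @_;
    my %exifdata;
    for my $file (@$files) {
        # Pass the tag names as a flat list (not a reference), so the
        # caller's tag array keeps its original contents and size.
        my $data = $exiftool->ImageInfo($file, @$tags);
        my ($width, $height) = @{$data}{qw(ImageWidth ImageHeight)};
        next unless defined $width && defined $height;    # no dimensions found
        my $orientation = $width > $height ? 'landscape' : 'portrait';
        push @{ $exifdata{$orientation} }, $data;
    }
    return \%exifdata;
}

# Possible usage, assuming Image::ExifTool is installed:
#   my $groups = group_by_orientation(Image::ExifTool->new, \@paths, \@tags);
```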

- Phil
Title: Re: Image::ExifTool vs exiftool performance on multiple images
Post by: raj on November 06, 2013, 10:41:05 AM
That did it - many thanks :) Not sure why but my initial use of a tags array (on a very small dir of images) failed so I tried passing a ref to it as the 2nd arg and it worked. I didn't realise the tags array was growing on each cycle. Now it's so much faster. Thanks again.
Title: Re: Image::ExifTool vs exiftool performance on multiple images
Post by: mrbrahman on May 29, 2018, 11:44:43 PM
I had this exact problem today, and thanks to this thread was able to solve it.  :D

Being new to Perl, I'm not sure I understand what's happening with @tags vs \@tags. Can someone shed some light?

BTW, I think I picked up the syntax (to use \@tags) from the ImageInfo section of http://search.cpan.org/~exiftool/Image-ExifTool-10.80/lib/Image/ExifTool.pod. If that's not what is preferred, maybe it's a good idea to update the doc at some point?

Thanks!
Title: Re: Image::ExifTool vs exiftool performance on multiple images
Post by: Hayo Baan on May 30, 2018, 04:51:46 AM
The difference between @tags and \@tags is that the former is the whole array, while the latter is just a reference to that array. This difference becomes important when calling functions. When you call a function with an array argument (e.g. @tags), the array is flattened into the argument list, so the function basically gets the elements rather than the array itself. It cannot add elements to or remove elements from the original array. A reference, on the other hand, does allow the function to change the original array.

Re-reading that particular section in the documentation, I think you are right: it fails to mention that instead of a list reference, you can also provide a list (in which case the original tag list will not be modified).
(Usually you'd like the reference way though, since that will tell you exactly what tags it found.)
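
A small plain-Perl illustration of the difference, independent of ExifTool. The function names takes_list and takes_ref are made up for this sketch.

```perl
use strict;
use warnings;

# Receives a flattened list; pushing onto its own copy
# cannot grow the caller's array.
sub takes_list {
    my @items = @_;          # copy of the argument values
    push @items, 'extra';
    return scalar @items;
}

# Receives a reference; changes go through to the caller's array.
sub takes_ref {
    my ($items) = @_;
    push @$items, 'extra';
    return scalar @$items;
}

my @tags = qw(ImageWidth ImageHeight);

takes_list(@tags);
print scalar(@tags), "\n";   # prints 2: the caller's array is unchanged

takes_ref(\@tags);
print scalar(@tags), "\n";   # prints 3: the function appended an element
```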
Title: Re: Image::ExifTool vs exiftool performance on multiple images
Post by: Phil Harvey on May 30, 2018, 07:27:02 AM
Quote from: Hayo Baan on May 30, 2018, 04:51:46 AM
Re-reading that particular section in the documentation, I think you are right: it fails to mention that instead of a list reference, you can also provide a list (in which case the original tag list will not be modified).

This is the relevant excerpt from the documentation:

        The remaining scalar arguments are names of tags for requested information.

But then you have to know that passing an entire array (e.g. "@tags") is exactly the same as passing all of its (scalar) items separately.

Usually this is how it is done.  For example:

$exifTool->ImageInfo($file, 'Copyright', 'Description');

is the same as

my @tags = qw(Copyright Description);
$exifTool->ImageInfo($file, @tags);


But if you do this (pass as a reference):

my @tags = qw(Copyright Description);
$exifTool->ImageInfo($file, \@tags);


then the @tags list is updated with a list of the tag keys that were actually extracted.  Here is another way to do the same thing:

my @tags;
$exifTool->ImageInfo($file, 'Copyright', 'Description', \@tags);


Here @tags will also be filled with the list of extracted tag keys.

The main purpose of passing a list reference is to receive a list of extracted tag keys:

        On return, this list is updated to contain an ordered list of tag keys for the returned information.

As a convenience it may also be used to pass in a list of requested tags:

        On entry, any elements in the list are added to the list of requested tags.
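
A sketch of using the returned tag keys while avoiding reuse of the same array between calls. The helper name describe_file is hypothetical, and $exifTool is assumed to be an Image::ExifTool object. Values that are themselves references (binary data, for example) would need extra handling that is left out here.

```perl
use strict;
use warnings;

# Hypothetical helper: return "key: value" lines in the order reported.
sub describe_file {
    my ($exifTool, $file, @requested) = @_;
    my @found;    # fresh, empty array on every call
    # Requested names go in as plain scalars; \@found receives the tag keys.
    my $info = $exifTool->ImageInfo($file, @requested, \@found);
    return map { "$_: $info->{$_}" } @found;
}

# Possible usage, assuming Image::ExifTool is installed:
#   print "$_\n" for describe_file(Image::ExifTool->new, $file, 'Copyright', 'Description');
```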

- Phil

Edit:  Done my edits now.  Hayo, you are very quick in responding!
Title: Re: Image::ExifTool vs exiftool performance on multiple images
Post by: Hayo Baan on May 30, 2018, 07:30:27 AM
Quote from: Phil Harvey on May 30, 2018, 07:27:02 AM
Quote from: Hayo Baan on May 30, 2018, 04:51:46 AM
Re-reading that particular section in the documentation, I think you are right: it fails to mention that instead of a list reference, you can also provide a list (in which case the original tag list will not be modified).

This is the relevant excerpt from the documentation:

The remaining scalar arguments are names of tags for requested information.

Ah, I overlooked that one! (and you clarify the difference between @arg and \@arg better than me too)