Memory Usage Issue

Started by gerlin, April 06, 2010, 11:31:56 PM


gerlin

I have a script that watches a directory and runs the following code when an image file is found.  Every time a new file is processed by the script (which runs constantly), the memory used by the script increases.  After several days a very large amount of memory is being used.

Here is the relevant code that is in the loop.  If I comment this whole section out, the memory growth stops, so I know it is related to my usage of ExifTool.

my $eTool = new Image::ExifTool;
$eTool->SetNewValue('EXIF:*' => );  # delete all EXIF
$eTool->SetNewValuesFromFile("$file", 'EXIF:*', Protected => 1); # Reimport EXIF from file
$eTool->SetNewValue(ImageDescription =>); # clear ImageDescription data
$eTool->WriteInfo("$file"); # write data back to file
undef $eTool;


I am probably just doing something dumb, but do I need to do something else to destroy the ExifTool object to free up memory after each loop?

I have tried the following just to see where the memory issue is, and the memory usage grows with this code as well, so I assume the issue is with the creation of a new object, not the actual manipulation of the data.

my $eTool = new Image::ExifTool;
#$eTool->SetNewValue('EXIF:*' => );  # delete all EXIF
#$eTool->SetNewValuesFromFile("$file", 'EXIF:*', Protected => 1); # Reimport EXIF from file
#$eTool->SetNewValue(ImageDescription =>); # clear ImageDescription data
#$eTool->WriteInfo("$file"); # write data back to file
undef $eTool;



Or, perhaps I should not keep creating a new ExifTool object and just clear it between loops? If so, what is the syntax to accomplish this?

FYI. The code is used to remove the "ImageDescription" data from the EXIF segment, let me know if there is a better way of doing this.

Any help is appreciated.

Lou

Phil Harvey

You can call SetNewValue() with no arguments to reset all values and re-use the ExifTool object.  However, I suspect you will have the same problem.  I'm not sure how Perl decides when to do garbage collection, but memory use will grow until it does.
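In case it helps, the reset-and-reuse pattern might look something like this (a sketch only; `next_image()` stands in for the directory-watching part, which isn't shown in the thread):

```perl
use Image::ExifTool;

# Create one ExifTool object outside the loop...
my $eTool = Image::ExifTool->new;

while (my $file = next_image()) {   # next_image() is a hypothetical watcher function
    # ...and reset it each pass instead of creating a new object
    $eTool->SetNewValue();                    # no arguments: clear all queued new values
    $eTool->SetNewValuesFromFile($file, 'EXIF:*', Protected => 1);
    $eTool->SetNewValue('ImageDescription');  # queue ImageDescription for deletion
    $eTool->WriteInfo($file);
}
```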

If you figure this out, let me know.  It hasn't been a problem for many people, but I have noticed that a long running script consumes a lot of memory.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Phil Harvey

Quote from: gerlin on April 06, 2010, 11:31:56 PM
FYI. The code is used to remove the "ImageDescription" data from the EXIF segment, let me know if there is a better way of doing this.

I should have looked at what you are trying to do.  If you just want to delete ImageDescription, you can execute this code once:

    my $eTool = new Image::ExifTool;
    $eTool->SetNewValue(ImageDescription); # clear ImageDescription data


then execute this code for each file:

    $eTool->WriteInfo($file); # delete ImageDescription from file


This will certainly help the situation because copying all tags takes a lot of memory if there is a lot of metadata in the image.   (Unless there was a specific reason that you wanted to rebuild the EXIF.)
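Putting those two fragments together, the whole thing might look like this (a sketch; how `@files` is populated by the watcher is assumed):

```perl
use strict;
use Image::ExifTool;

# Set up once: one ExifTool object with the ImageDescription deletion queued
my $eTool = Image::ExifTool->new;
$eTool->SetNewValue('ImageDescription');  # quoted so 'use strict' is happy

# Then write each file; the queued deletion is applied every time
my @files = @ARGV;                        # or however the watcher supplies them
foreach my $file (@files) {
    my $result = $eTool->WriteInfo($file);   # 1 = written, 2 = unchanged, 0 = error
    warn "Error writing $file: " . $eTool->GetValue('Error') . "\n" if $result == 0;
}
```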

- Phil


gerlin

Phil,

Thanks for your quick reply and your suggestions.

If I try this:

my $eTool = new Image::ExifTool;
$eTool->SetNewValue(ImageDescription); # clear ImageDescription data
$eTool->WriteInfo($file); # delete ImageDescription from file


I get the following message:

Bareword "ImageDescription" not allowed while "strict subs" in use at C:\Users\gerlin\Documents\SVNgerlin\PrPhotoHeader\prphotoheader.pl line 846.
Execution of C:\Users\gerlin\Documents\SVNgerlin\PrPhotoHeader\prphotoheader.pl aborted due to compilation errors.


I did not want to disable "strict", so I substituted the following:

my $eTool = new Image::ExifTool;
$eTool->SetNewValue(ImageDescription =>);
$eTool->WriteInfo($file); # delete ImageDescription from file


Can you confirm that this performs the same function?

As you thought, this does not stop the memory growth.  Neither did keeping a single ExifTool object and resetting it with SetNewValue().

Also as expected, despite much googling I have had no luck finding a way to make Perl give the memory back.

I tried forking the ExifTool code into another process; however, since I am running on Windows, the fork actually creates a thread, and memory seems to keep building up with that as well.

I did not really want to do this, but at this point I have put my "ImageDescription removal" ExifTool code in a separate script which I call for each image file.  That does solve the issue.
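A minimal sketch of that separate-script approach (the script name is hypothetical):

```perl
#!/usr/bin/perl
# strip_description.pl (hypothetical name): delete ImageDescription, then exit,
# so all of the memory goes back to the OS when the process ends.
use strict;
use Image::ExifTool;

my $file = shift or die "Usage: strip_description.pl FILE\n";
my $eTool = Image::ExifTool->new;
$eTool->SetNewValue('ImageDescription');   # queue the deletion
$eTool->WriteInfo($file);                  # rewrite the file without it
```

The watcher loop can then run it once per file, e.g. with `system($^X, 'strip_description.pl', $file)`.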

FYI: I am calling ExifTool elsewhere in my main looping script to extract tag information like this:

sub getXMP
{
    my $file = shift;
    my $xRef = shift;
   
    # Get xmp info for file
    my $exifTool = new Image::ExifTool;
    my $info = $exifTool->ImageInfo("$file");
   
    foreach my $tag ($exifTool->GetFoundTags('Group0')) {
       
        # Skip unless it's an XMP tag
        unless ($exifTool->GetGroup($tag) eq "XMP") { next }
       
        # Assign value
        my $val = $info->{$tag};
       
        # Unless there is value, skip
        unless (defined $val && $val ne "") { next }
       
        # Get real field name
        my $tname = Image::ExifTool::GetTagName($tag);
       
        # If value is an array reference, join the elements with '::'
        $val = join('::', @$val) if ref $val eq 'ARRAY';
       
        $xRef->{"$tname"} = "$val";
    }
}


So far as I have noticed, this seems to not cause any memory growth over time. So I guess it depends what ExifTool functions are being used.

I am using Perl 5.8.9 on Windows, with ExifTool 8.15 (the latest available via ActiveState PPM).

Thanks again for your help with this.

Lou



Phil Harvey

Sorry, yes.  I forgot to quote ImageDescription.  The following two commands are equivalent:

    $eTool->SetNewValue('ImageDescription');
    $eTool->SetNewValue(ImageDescription =>);


I did some more googling myself and ran some tests on OS X, and it seems to be a system-dependent thing.  On OS X, memory is reclaimed by the operating system periodically, but my googling suggests that this isn't the case on all operating systems.  I don't understand why just creating and deleting the ExifTool object in one section of your code causes memory to grow, while using an ExifTool object in the other section of your code doesn't.

Your solution of calling exiftool for each file adds an overhead that it would be nice to avoid.  You might try forking an exiftool process, letting it run for a while, then terminating it and restarting it periodically from the main script.
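A hypothetical sketch of that suggestion, doing the ExifTool work in a child process that exits after each batch so its memory is returned to the OS (`next_batch()` stands in for the watcher; note that on Windows, fork is emulated with a thread, so this may behave differently there):

```perl
use strict;
use Image::ExifTool;

my $BATCH_SIZE = 500;
while (my @batch = next_batch($BATCH_SIZE)) {  # next_batch() is a hypothetical helper
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {                           # child: process the batch, then exit
        my $eTool = Image::ExifTool->new;
        $eTool->SetNewValue('ImageDescription');
        $eTool->WriteInfo($_) for @batch;
        exit 0;
    }
    waitpid($pid, 0);                          # parent: wait before the next batch
}
```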

What sort of memory growth are we talking about?  In my tests on OS X I ran the exiftool application with "exiftool -imagedescription= DIR" on a directory containing two thousand images.  Attached is the plot of the memory usage during the run.  The memory usage jumped up periodically (probably when processing some of the larger TIFF images, some of which are close to 100MB) with a peak of 450MB, but it would usually drop back down although it seems that it gets stuck at certain levels.  After a thousand images it seemed to stabilize at about 200-250MB.

Also, what Perl are you using?  ActivePerl has had some known memory problems in certain versions.  I don't know of any issues with 5.8.9 specifically, but if I were you I'd repeat these tests with 5.10.

- Phil

gerlin

I did try forking the process, as I mentioned in my previous post (though I probably explained it poorly).  On Windows this actually creates a thread, not a new process.  My results were the same as when I ran it without the fork: the memory grew a little each time.  It does not look like the memory is cleaned up when the thread is done.

I have two instances of my script running.  Each one is processing around 5,000 jpg photos a day.  Since I made the change about 20 hours ago, the memory has been sitting at 14MB.  It was a little higher at some point yesterday, so there is some memory cleanup going on.

Prior to the new version (which calls the external EXIF-manipulation code), I was seeing memory usage over 1GB.  This was after the script had been running for a week or so, and at that point the memory usage was still growing.  As far as I could tell, it never stopped growing.  The reason I noticed the memory problem is that my script would exit after several days of use.  I have not found any reason for this (no Perl errors; it just stops).  I then noticed the memory usage and thought I should try to get it under control to see if it is the reason the script quits.

I am running ActivePerl 5.8.9.  I have thought about using 5.10, though have not tried it yet.

Lou
