Out of memory error when using stdin and stdout

Started by notmetadataguru, July 01, 2021, 11:50:30 AM


notmetadataguru

Hi,

I have a very large TIFF file that I'm writing metadata to using stdin and stdout. I have exiftool running within an AWS Lambda function. The function has 3GB of memory allocated to it and the TIFF file is 2GB in size. It's my understanding that when streaming a file into exiftool and then streaming it back out (to AWS S3 in my case), we should see minimal memory usage, since the data is read and written in chunks. However, we're seeing exiftool consume more memory than is available and return an "Out of Memory!" error.

These are the arguments that we're using:

-config /tmp/tags.config -v3 -j=/tmp/update_metadata.json -o - -
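
For context, the Lambda invokes exiftool roughly like this (a simplified sketch, not our exact code; the boto3 calls and bucket/key names are illustrative placeholders):

import subprocess
import threading

import boto3

s3 = boto3.client("s3")

# Hypothetical bucket/key names, for illustration only.
SRC_BUCKET, SRC_KEY = "my-bucket", "in/large.tif"
DST_BUCKET, DST_KEY = "my-bucket", "out/large.tif"

def handler(event, context):
    proc = subprocess.Popen(
        ["exiftool", "-config", "/tmp/tags.config", "-v3",
         "-j=/tmp/update_metadata.json", "-o", "-", "-"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )

    # Feed the S3 download into exiftool's stdin on a separate thread
    # so stdout can be drained concurrently (avoids a pipe deadlock).
    def feed():
        body = s3.get_object(Bucket=SRC_BUCKET, Key=SRC_KEY)["Body"]
        for chunk in body.iter_chunks(chunk_size=8 * 1024 * 1024):
            proc.stdin.write(chunk)
        proc.stdin.close()

    t = threading.Thread(target=feed)
    t.start()

    # Upload exiftool's stdout back to S3 as it is produced.
    s3.upload_fileobj(proc.stdout, DST_BUCKET, DST_KEY)

    t.join()
    proc.wait()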

Is there anything that we're doing incorrectly?

UPDATES:
Some additional observations I've made: exiftool buffers the entire file (2GB) into memory, then writes the metadata (at which point memory consumption doubles to 4GB), and finally pipes the data to stdout.

Performing the same operation on the same 2GB file, but pointing exiftool at the file on disk, I see low memory consumption (around 200MB).
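
In case it's useful to others, the on-disk variant can be done in Lambda by staging the file in /tmp (a minimal sketch, assuming boto3 and enough ephemeral /tmp storage for the file; bucket and key names are hypothetical):

import subprocess

import boto3

s3 = boto3.client("s3")

# Hypothetical bucket/key names, for illustration only.
SRC_BUCKET, SRC_KEY = "my-bucket", "in/large.tif"
DST_BUCKET, DST_KEY = "my-bucket", "out/large.tif"

def handler(event, context):
    local = "/tmp/large.tif"
    s3.download_file(SRC_BUCKET, SRC_KEY, local)

    # exiftool rewrites the file in place; -overwrite_original avoids
    # keeping a second "_original" backup copy in /tmp.
    subprocess.run(
        ["exiftool", "-config", "/tmp/tags.config",
         "-j=/tmp/update_metadata.json", "-overwrite_original", local],
        check=True,
    )

    s3.upload_file(local, DST_BUCKET, DST_KEY)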

Thanks in advance!

Phil Harvey

If the file format requires the ability to seek randomly within the file, then the entire file must be buffered in memory when rewriting.  TIFF is such a format.  There is no way around this.

- Phil