HEIC 'Out of memory' in fast-mode

Started by lon9man, September 27, 2024, 11:28:56 AM


lon9man

Hello Phil Harvey and everyone!

system type: Ubuntu 22.04.3 LTS (Jammy Jellyfish)
exiftool version: 12.96 (also tried latest)
command line: described further
console output: described further
image file: sent to you directly (philharvey66 at gmail.com). email subject "memory issue IMG_1388.HEIC"


trying to use exiftool to get metadata from image files.

wanted to use exiftool in fast-mode to prevent complete download, because files located in S3.

i have development-server (RAM 2 Gb, free 1 Gb).
i have several HEIC-files which throw the error "Out of memory!".
will show 1 file IMG_1388.HEIC - 2,1 MB (2 095 868 bytes).
tried different scenarios using PHP, Python in AWS Lambda, bash. results are same.

simplified examples in bash:
1. local file
exiftool -fast -all -s -json /home/user/heic/IMG_1388.HEIC
works normally

2. local file using stream (other ways of streaming in PHP/Python also throw the same error)
cat /home/user/heic/IMG_1388.HEIC | exiftool -fast -all -s -json -
Out of memory!

tried using CURL with and without pv-utility to change size of stream chunk (tried different chunk size).
curl -s http://xxxxxx.com/heic/IMG_1388.HEIC | pv -q -L 1024000  | exiftool -fast -all -s -json -
Out of memory!

after that i added to cli-command option -v2/-v3.
results in the attachments.

i noticed records with a huge size at the end of the output. for -v2:
  Unknown_0xa44e
  - Tag '\x00\x00\xa4N' at offset 0x1c200 (4248569128 bytes)

for -v3:
  Unknown_0xb3b6e000
  - Tag '\xb3\xb6\xe0\x00' at offset 0x1fefb3 (3343623727 bytes)
  Warning = Truncated '\xb3\xb6\xe0\x00' data at offset 0x1fefab
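
For context on those numbers: HEIC uses the QuickTime/ISO-BMFF box layout, where each box starts with a 4-byte big-endian size followed by a 4-byte type. The sketch below is not ExifTool's parser. It only shows how eight arbitrary bytes can decode into a "tag" with a multi-gigabyte length, and how comparing that length with the file size flags it as implausible. Treating 4248569128 as the raw size field is an assumption made for illustration, and `parse_box_header` and `is_plausible` are hypothetical names.

```python
import struct

FILE_SIZE = 2_095_868  # size of IMG_1388.HEIC as reported in the post


def parse_box_header(buf, offset=0):
    """Decode a 32-bit big-endian size and a 4-byte type from buf."""
    size, box_type = struct.unpack_from(">I4s", buf, offset)
    return size, box_type


def is_plausible(size, offset, file_size):
    """A box must cover its own 8-byte header and fit in the rest of the file.

    The special size values 0 (box runs to end of file) and 1 (64-bit size
    follows) are not handled in this sketch.
    """
    return 8 <= size <= file_size - offset


if __name__ == "__main__":
    # Illustrative header built from the numbers in the -v2 output above.
    header = struct.pack(">I4s", 4248569128, b"\x00\x00\xa4N")
    size, box_type = parse_box_header(header)
    print(box_type, size, is_plausible(size, 0x1C200, FILE_SIZE))
```

A reader that trusts such a size and tries to allocate or buffer that many bytes would plausibly run out of memory on a machine with 1 GB free, while a machine with 48 GB would survive it.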


on a desktop with 48 Gb RAM both examples work as expected. BUT it seems it should also work with less RAM in both cases (local file or fast-mode using a stream/URL).

thanks!


Phil Harvey

For HEIC format, ExifTool is likely buffering the entire file in memory when reading from a pipe.

Maybe there is something I can do about this.  I'll take a closer look when I get time.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).
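
A note on why a pipe behaves differently from a local file: a pipe can only be read forward, so a reader that needs data it has already passed must keep that data in memory. The sketch below is not ExifTool code. It only demonstrates the difference in seekability on Linux with Python's standard library, using a temporary file and an `os.pipe()` as stand-ins for the local file and the `cat ... |` stream; `seekability` is a hypothetical helper name.

```python
import os
import tempfile


def seekability():
    """Return (file_seekable, pipe_seekable) for a regular file and a pipe."""
    data = b"ftypheic" * 4  # placeholder bytes, not a real HEIC file

    # Regular file: the OS lets the reader jump to any offset.
    with tempfile.TemporaryFile() as f:
        f.write(data)
        f.seek(0)
        file_seekable = f.seekable()

    # Pipe: bytes arrive once, in order, and cannot be revisited.
    r_fd, w_fd = os.pipe()
    with os.fdopen(w_fd, "wb") as w:
        w.write(data)
    with os.fdopen(r_fd, "rb") as r:
        pipe_seekable = r.seekable()
        r.read()  # drain the pipe before closing
    return file_seekable, pipe_seekable


if __name__ == "__main__":
    print(seekability())
```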

Phil Harvey

It definitely looks like a bug in ExifTool.  I'll let you know when I know more.

- Phil

Phil Harvey

This was a tricky problem.  I should maybe not allow FastScan in HEIC files, but instead what I'll do is issue a "Seek error" warning and some of the metadata may not be returned if it required seeking backwards in the pipe.  Expect this patch in ExifTool 12.98.

To fix this properly would require a complete restructuring of the QuickTime-format file parsing code, which isn't going to happen.

- Phil
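
Since the patched behaviour is a warning rather than a failure, a caller piping data into ExifTool may want to detect it and fall back to a local file. A minimal sketch, assuming `-json` output (a JSON array with one object per file), and assuming the warning appears under a key named `Warning` with text containing "Seek error"; the exact key and wording are assumptions, the sample string is invented, and `has_seek_warning` is a hypothetical helper.

```python
import json


def has_seek_warning(exiftool_json):
    """Return True if any record carries a warning mentioning a seek error."""
    for record in json.loads(exiftool_json):
        for key, value in record.items():
            # The key may be plain "Warning" or group-prefixed, e.g. "ExifTool:Warning".
            if key.split(":")[-1] == "Warning" and "seek error" in str(value).lower():
                return True
    return False


if __name__ == "__main__":
    # Invented sample, not real ExifTool output.
    sample = '[{"SourceFile": "-", "Warning": "Seek error (sample text)"}]'
    print(has_seek_warning(sample))
```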

lon9man

hello Phil, thanks for the fastest reaction!

issue questions:
1. so it will work in version 12.98 without error, but will return metadata partially?
2. will it skip some concrete subset of possible tags OR it is random list of tags?
3. is it issue related strictly to HEIC or other file extensions also?

abstract questions:
a) i noticed that exiftool in fast-mode returns lower count of metadata than in basic-mode. is it true?
b) if a) == yes - then how exiftool understands that it can stop looping through file? and what subset of tags it potentially can skip (concrete or random)?
c) is it possible to predict needed memory usage for exiftool (for example AWS Lambda needs setting MemorySize. if it will be low - exiftool will fail, high - wasting resources/cost)? is it related to the fact which file-type or mode used (fast or basic)?

thanks!

StarGeek

Quote from: lon9man on October 03, 2024, 05:00:14 AM
a) i noticed that exiftool in fast-mode returns lower count of metadata than in basic-mode. is it true?

For the most part, yes, but it depends upon the command. The docs on the -fast option explain each level. But if you include a tag name that requires ignoring the -fast option, then that overrides it.

At the default level, which you are using, the docs say:
Quote
ExifTool will not scan to the end of a JPEG image to check for an AFCP or PreviewImage trailer, or past the first comment in GIF images or the audio/video data in WAV/AVI files to search for additional metadata
which implies that using -fast is only going to affect JPEGs, GIFs, WAVs, and AVIs. HEIC files are MP4-based (QuickTime format), so you would have to step up to -fast2 for a speed increase, and that might not extract all the data.
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype
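
One way to see what `-fast` leaves out for a particular file is to run ExifTool twice with `-json`, once with and once without the option, and compare the tag names. The sketch below only performs the comparison. The two JSON strings are invented samples, not real ExifTool output, and `missing_tags` is a hypothetical helper.

```python
import json


def missing_tags(full_json, fast_json):
    """Return tag names present in the full run but absent from the fast run.

    Both arguments are -json output for a single file (a one-element array).
    """
    full = json.loads(full_json)[0]
    fast = json.loads(fast_json)[0]
    return sorted(set(full) - set(fast))


if __name__ == "__main__":
    # Invented samples for illustration only.
    full = '[{"SourceFile": "a.heic", "FileType": "HEIC", "MediaDataSize": 1, "MediaDataOffset": 2}]'
    fast = '[{"SourceFile": "a.heic", "FileType": "HEIC"}]'
    print(missing_tags(full, fast))
```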

lon9man

StarGeek, thanks.

maybe you have some test image files with different file types, which have big set of different metadata already populated to use for tests?

StarGeek

You can find a jpeg in this post. It is zipped because there is so much text that the forum didn't recognize it as an image.

There is an ARGS file (see the -@ (Argfile) option) in that post as well, so you can create your own test files for any file type.

Phil Harvey

Quote from: lon9man on October 03, 2024, 05:00:14 AM
1. so it will work in version 12.98 without error, but will return metadata partially?

Yes.

Quote
2. will it skip some concrete subset of possible tags OR it is random list of tags?

I think this may depend on the structure of the specific HEIC file, but for your specific example only the MediaData tags are missing (which is to be expected anyway with the -fast option).

Quote
3. is it issue related strictly to HEIC or other file extensions also?

HEIC only.

Quote
a) i noticed that exiftool in fast-mode returns lower count of metadata than in basic-mode. is it true?

StarGeek answered this.

Quote
c) is it possible to predict needed memory usage for exiftool (for example AWS Lambda needs setting MemorySize.

No.

- Phil

lon9man

big thanks for your valuable and fastest help!
will check results slightly later