Exiftool script is killed before ending

Started by jgautier2, August 27, 2020, 02:24:34 AM


jgautier2

Hello,

I use a bash script (on Debian) with an exiftool command to find all files matching one condition in a very big directory: I want a list of images whose Credit tag is equal to "STRING".

The directory contains about 1 million jpeg files.

Here's my exiftool command :
exiftool -m -q -q -fast2 -if '$Credit eq "STRING"' -p '$filename' -ext jpg -r DIR > results.txt

The script does what is expected, but it seems to be killed before finishing: not all images are processed, and results.txt usually ends with a truncated filename.
Question 1: for my purpose, is this the right / best way to do it?
Question 2: why does this script stop before scanning the whole directory? Is there a way to fix that?

Thanks a lot for answering.

JG

StarGeek

There's nothing obviously wrong with your command.  You'll need to do some troubleshooting on what's happening.  While I haven't run exiftool on a million files, I have run much more complex commands on over 500,000 files in one go without any errors.

Try running the command directly on the command line without the -m (ignoreMinorErrors) option (so you can actually see any errors) and redirect both STDOUT and STDERR into a file with >results.txt 2>&1.  Then take a look at the end of the file to see what might be going on.

jgautier2

Thanks for your help.

I ran this command in my terminal:
exiftool -q -q -fast2 -if '$Credit eq "STRING"' -p '$filename' -ext jpg -r DIR > results.txt 2>&1

The command stopped before scanning the whole directory.
In results.txt, I just got the list of found files (no other message). The last line (the last filename) is truncated.
And in the terminal, just a line containing "Killed":

exiftool -q -q -fast2 -if '$filename !~ /MBERTRAND/' -if '$Credit eq "Martin\ Bertrand"' -p '$filename' -ext jpg -r /home/hansluca/hidef > corrupted_images8.txt 2>&1
Killed


JG

StarGeek

From some quick googling, it looks like the "Killed" response is the system killing the process because it's using up too many resources or is too CPU intensive.  An understandable result when running over so many files.  Is this on a shared server of some kind?

You'll have to do some Googling to figure it out exactly.  This isn't something caused by exiftool directly.

jgautier2

Yes, you're right (and I'm sorry: I should have googled this "Killed" myself...).

In different log files (kern.log, messages, syslog), I've found this kind of line :
Out of memory: Kill process 31282 (exiftool) score 814 or sacrifice child
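
For the record, something like this (assuming the standard Debian log locations) turns up the same message:

$ grep -i 'out of memory' /var/log/syslog
$ dmesg | grep -i 'killed process'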

It means that my exiftool command is using too much memory, so the kernel kills it via the "OOM Killer".
About the OOM Killer, you can read this, for example: https://lwn.net/Articles/317814/
There are some ways to avoid that, besides increasing RAM: https://backdrift.org/oom-killer-how-to-create-oom-exclusions-in-linux
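
Another way to avoid it is to keep exiftool's memory bounded by feeding it the files in fixed-size batches instead of letting it recurse over the whole tree. A rough sketch, assuming GNU find and xargs (the batch size of 10000 is arbitrary):

find DIR -type f -iname '*.jpg' -print0 |
  xargs -0 -n 10000 exiftool -m -q -q -fast2 -if '$Credit eq "STRING"' -p '$filename' > results.txt

Since find does the recursion and the extension filtering, -r and -ext are no longer needed; the price is one exiftool startup per batch.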

Thanks again for helping me to solve this (non exiftool) problem.

JG

dae65

Quote from: jgautier2 on August 29, 2020, 04:00:02 AM
Out of memory: Kill process 31282 (exiftool) score 814 or sacrifice child
This sounds like a memory leak in exiftool.

StarGeek

Well, he does say it's running on about a million files.  Might not be so much a leak as using a hell of a lot of memory to store all those file paths.

dae65

@StarGeek: Yes, it might be that too. A Perl array of 1 million 255-character items takes up ~300 MB of RAM, which is pretty much what my Firefox instance is using right now. Also, we don't know what else that bash script is doing.
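
For a rough check (a sketch, assuming the CPAN module Devel::Size is installed; note it really allocates the array, so it briefly uses that much memory), something like this measures such an array's footprint:

$ perl -MDevel::Size=total_size -E 'my @paths = ("x" x 255) x 1_000_000; say total_size(\@paths), " bytes"'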

@jgautier2: What do the following commands say, please?


$ average_path_length () {
    declare -i bytes count
    while IFS= read -r line; do
      count+=1
      bytes+=${#line}
    done < <(find "$@" -type f)
    echo "bytes: $bytes"
    echo "count: $count"
    echo "average: $((bytes / count))"
  }
$ average_path_length /home/hansluca/hidef

Phil Harvey

Storing all of the file names would take a lot of memory, which will contribute to the problem, but ExifTool makes it past this stage and scans a number of files before it gets killed.  I have noticed that the Perl garbage-collection memory management tends to result in a lot of memory usage.  I don't think it is due to ExifTool memory leaks, but it would be a good idea for me to check this again just to be sure.

- Phil

dae65

Perl's garbage collection system won't free up allocated memory when both hash A and hash B go out of scope but hash A holds a reference to hash B, and hash B holds a reference to hash A.
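
A minimal illustration (Scalar::Util is core Perl; the "peer" key is just for the example):

use Scalar::Util qw(weaken);

{
    my (%a, %b);
    $a{peer} = \%b;    # A holds a reference to B
    $b{peer} = \%a;    # B holds a reference to A: the reference counts
}                      # never reach zero, so neither hash is freed here

{
    my (%a, %b);
    $a{peer} = \%b;
    $b{peer} = \%a;
    weaken($b{peer});  # demote one link to a weak reference...
}                      # ...and the cycle is freed normally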

Phil Harvey

Yes.  That was the cause of an old (fixed) memory leak in ExifTool.  I had cleaned all of these up, but I should check again to be sure I haven't introduced new ones.

- Phil