Performance question

Started by sebutzu, June 17, 2025, 04:05:24 PM


sebutzu

I am comparing the performance of exiftool (with ImageDataHash computation) against a brute-force approach that reads the whole file and computes its MD5 (done in C#). In my case exiftool is at least 5 times slower, on multiple machines. I assume exiftool usually reads only the bytes it needs, at the cost of more disk seeks. For ImageDataHash, would it help to read the entire file into a buffer first and then do the rest of the processing on that buffer?
Another question: I am running this on Windows. Is there any reason to expect it to run much faster on Linux?
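
For reference, the brute-force baseline described above (read the whole file sequentially and hash every byte) might look roughly like this. The original was written in C#; this Python version is only an illustrative equivalent, and the name `md5_of_file` and the 1 MiB chunk size are assumptions, not details from the post.

```python
import hashlib

def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 of the whole file, read sequentially in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Large sequential reads keep the number of disk seeks low.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Note that this hashes every byte of the file, metadata included, while ImageDataHash covers only the image data, so only the timings of the two are comparable, not the hash values.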

StarGeek

What is the command you are using? Make sure you're not looping exiftool, running it once per file. See Common Mistake #3, "Over-scripting".
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype
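
To illustrate the "over-scripting" point: a calling program can start exiftool once for the whole batch instead of once per file. The sketch below passes the file list through an argfile with the `-@` option. It assumes `exiftool` is on the PATH; the helper names are hypothetical, and the options shown are a reduced set, not the full command from this thread.

```python
import os
import subprocess
import tempfile

def write_argfile(paths):
    """Write one path per line to a temporary UTF-8 argfile and return its name."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".args", delete=False, encoding="utf-8"
    ) as f:
        for p in paths:
            # One argument per line; no quoting is needed for spaces.
            f.write(p + "\n")
    return f.name

def build_batch_command(argfile_path, exiftool="exiftool"):
    """Build ONE exiftool invocation that reads its file list from an argfile."""
    # -j: JSON output, -ImageDataHash: request the hash,
    # -@ ARGFILE: read further arguments (here the file names) from ARGFILE.
    return [exiftool, "-j", "-ImageDataHash", "-@", argfile_path]

def run_batch(paths):
    """Start exiftool once for all files instead of once per file."""
    argfile = write_argfile(paths)
    try:
        result = subprocess.run(
            build_batch_command(argfile), capture_output=True, encoding="utf-8"
        )
        return result.stdout
    finally:
        os.remove(argfile)
```

Starting the process once matters because each launch of exiftool has a fixed startup cost that is otherwise paid for every file.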

Phil Harvey

ExifTool is extracting all the rest of the metadata as well. Adding -api ignoretags=all will help.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).
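
To see how much of the run time goes to extracting the other metadata, one could time the same file list with and without `-api ignoretags=all`. This is a rough sketch, assuming `exiftool` is on the PATH; the helper names are hypothetical, and the option sets are simplified versions of what is discussed in this thread.

```python
import subprocess
import time

def hash_only_command(files, exiftool="exiftool"):
    # With ignoretags=all, tags that were not explicitly requested are
    # skipped, so in this sketch only ImageDataHash is asked for.
    return [exiftool, "-api", "ignoretags=all", "-ImageDataHash", "-j", *files]

def all_tags_command(files, exiftool="exiftool"):
    # Extract everything else as well, in addition to the hash.
    return [exiftool, "-all", "-ImageDataHash", "-j", *files]

def time_command(argv):
    """Run a command, discard its output, and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=False
    )
    return time.perf_counter() - start
```

Comparing `time_command(hash_only_command(files))` with `time_command(all_tags_command(files))` on the same files gives a rough idea of what the hash alone costs. Repeating each measurement a few times helps, since the first run is usually slowed by a cold disk cache.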

sebutzu

I use a command like:
-struct -m -q -q -charset filename=UTF8 -d "%Y.%m.%d %H:%M:%S" -c "%.8f" -a -use mwg -api largefilesupport=1 -s -P -api structformat=JSONQ -j -G:0:1 -all -ImageDataHash -n  "C:\Work\test photos\2011-12-06 11.08.37 0001.jpg" "C:\Work\test photos\2014-02-13 18.26.34 0001.mp4" "C:\Work\test photos\2022-03-05 12.16.14 0001.jpg" "C:\Work\test photos\2022-03-05 12.16.16 0001.jpg" "C:\Work\test photos\2022-03-05 12.16.18 0001.jpg" "C:\Work\test photos\2022-03-05 12.16.19 0001.jpg" "C:\Work\test photos\2022-03-05 12.16.28 0001.jpg" "C:\Work\test photos\2022-03-05 18.25.58 0001.jpg" "C:\Work\test photos\2022-03-05 18.25.59 0001.jpg" "C:\Work\test photos\2022-03-05 18.26.03 0001.jpg"

Usually it is run with about 125 files.

I do need to extract all the other tags as well, so I don't want to use -api ignoretags=all.

I can imagine that computing ImageDataHash is slow (because it needs almost all of the file content), but I did not expect it to be about 5 times slower than a full MD5 hash.

Am I doing something wrong here?

Does exiftool use multiple threads?
Is there a way to speed this up?

Phil Harvey

You can run as many instances of ExifTool as you want simultaneously.

- Phil
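
Running several instances at once could be sketched as below: the file list is split into groups and each group is handled by its own exiftool process. This assumes `exiftool` is on the PATH; the function names, the reduced option set, and the default of 4 workers are illustrative assumptions.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def split_round_robin(items, n):
    """Distribute items over n lists: item i goes to list i % n."""
    return [items[i::n] for i in range(n)]

def run_exiftool(files, exiftool="exiftool"):
    """Run one exiftool process over a group of files and return its JSON text."""
    if not files:
        return "[]"  # nothing to do for an empty group
    argv = [exiftool, "-j", "-ImageDataHash", *files]
    return subprocess.run(argv, capture_output=True, encoding="utf-8").stdout

def run_parallel(files, workers=4):
    """Run several exiftool processes at once, one per group of files."""
    groups = split_round_robin(files, workers)
    # Threads suffice here: each one only waits for its own child process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_exiftool, groups))
```

Whether this helps depends on where the time goes. If the run is limited by CPU it may scale with the number of cores, but if the disk is the bottleneck, parallel readers may gain little.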