How many files is too many files?

Started by KSA, April 20, 2022, 05:05:09 PM

KSA

Hi there!

I'm pretty new to exiftool and I am trying to grab some exif metadata from around 970,000 TIFF files in 21440 folders and write it to one json file, that I then plan on parsing using OpenRefine. I used the following:

exiftool -filename -createdate -filecreatedate -filemodifydate -filesize# -r -j "path\path\path\files" >C:\Users\path\exifscrape.json

I went ahead and ran this starting at the topmost folder, optimistically hoping this would run for a couple hours and eventually work. I started it Tuesday, April 19, 2022, 11:51:22 AM and it looks like it is still running now, April 20, 2022, at 5:00:00 PM. My json file though has a last modified time of 5:57 AM this morning.

Do you think I asked too much of exiftool? Could it possibly still be running despite the date modified being several hours ago? I can't really create new folders and divvy up the folders into smaller chunks to run this in smaller batches. Any advice would be appreciated :)
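Batching does not necessarily require new folders: one option is to run the same command once per existing top-level subfolder and write one JSON file per batch. The following is a minimal Python sketch of that idea, not a tested recipe for this dataset. The helper names (`batch_commands`, `run_batches`), the output layout, and the `.part` suffix are illustrative assumptions; the exiftool options are the ones from the command above, and `exiftool` is assumed to be on the PATH.

```python
import subprocess
from pathlib import Path

# Same tags as the original command.
TAGS = ["-filename", "-createdate", "-filecreatedate", "-filemodifydate", "-filesize#"]

def batch_commands(root, out_dir):
    """Build one (command, output file) pair per top-level subfolder of root."""
    jobs = []
    for sub in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        out_file = Path(out_dir) / f"{sub.name}.json"
        jobs.append((["exiftool", *TAGS, "-r", "-j", str(sub)], out_file))
    return jobs

def run_batches(root, out_dir):
    """Run each batch in turn, skipping any whose output already exists."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd, out_file in batch_commands(root, out_dir):
        if out_file.exists():
            continue  # finished in an earlier run
        # Write to a temporary name so an interrupted batch is not mistaken for a finished one.
        part = out_file.with_name(out_file.name + ".part")
        with open(part, "wb") as fh:
            result = subprocess.run(cmd, stdout=fh)
        if result.returncode == 0:
            part.replace(out_file)
        else:
            # The .part file is kept for inspection.
            print(f"exiftool exited with {result.returncode} for {cmd[-1]}")
```

Files sitting directly in the top folder (not inside a subfolder) are not covered by this sketch and would need a separate non-recursive run. Because finished batches are skipped, the script can be restarted after a hang without redoing completed folders.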

StarGeek

I have run exiftool on about 100k files before and it took some time, but did complete.

TIFF files are often very big, and that will impact the speed. Basically, running exiftool should take only a little more time than it would to copy all the files to another drive. If all those files add up to a terabyte or more, it could take quite some time.

Since you seem to be on Windows, you might open the Task Manager and look for exiftool's listing there. Click "More details" if it isn't already expanded and look at the "Disk" column for exiftool. If it's still reading/writing, then I'd give it some more time.

If you want to restart, you might want to add the -fast option, specifically -fast2. Phil's post here seems to indicate that -fast2 might work well with TIFF files.

KSA

Thanks so much for your input. Just checked Task Manager and sadly exiftool isn't showing any Disk activity at all, so I guess it did get hung up. I will try again using -fast2, thanks for the idea!

Phil Harvey

What system are you running?  On Mac/Linux this shouldn't be a problem.  On Windows, the restricted memory may cause slow-downs like this when processing a large number of files.  Running the alternate version of ExifTool for Windows may help.

In general, you should expect a processing speed of at least 10 files/sec for normal files.  I typically get about 20 files/sec on my 7-year-old Mac system here when run on 20 MB TIFF-based files.

- Phil
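As a rough sanity check, the rates quoted above can be turned into an expected runtime for the roughly 970,000 files in the original question. This is simple arithmetic in Python, assuming a steady rate for the whole run; `estimated_hours` is just an illustrative helper name.

```python
# Rough runtime estimate from the figures quoted in this thread.
FILE_COUNT = 970_000  # approximate number of TIFF files in the original question

def estimated_hours(file_count: int, files_per_sec: float) -> float:
    """Return the expected wall-clock hours at a steady processing rate."""
    return file_count / files_per_sec / 3600

for rate in (10, 20):
    print(f"{rate} files/sec -> {estimated_hours(FILE_COUNT, rate):.1f} hours")
# 10 files/sec -> 26.9 hours
# 20 files/sec -> 13.5 hours
```

For comparison, the first run started at 11:51 AM and the JSON file was last modified at 5:57 AM the next morning, about 18 hours later, which falls inside this range.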

KSA

Hi Phil!

I'm on Windows. I just redid it using -fast as previously suggested and it got hung up again. It's interesting, the json files for my two attempts tapped out around the same size, 265 MB, I'm guessing because of the memory limit issue you mentioned. I'll give the Exiftool for Windows a try, thanks for the tip!

KSA

Breaking news: it appears to have actually worked without using the Windows version for Exiftool! My command prompt window never looked "done" (i.e. return to my username), but pulling the json file into OpenRefine and comparing against a directory of the files, it looks like all the files are there. Very exciting!
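The comparison described above can also be scripted. The following is a minimal Python sketch, assuming the JSON came from `exiftool -j` (each record then carries a `SourceFile` key) and that the `SourceFile` paths are either absolute or relative to the directory the script is run from. The function name `check_scrape` and the `*.tif*` pattern are illustrative assumptions; on a case-sensitive file system the pattern would miss names such as `.TIF`.

```python
import json
from pathlib import Path

def check_scrape(json_path, root, pattern="*.tif*"):
    """Compare an exiftool -j output file against the files under root.

    Returns (entries_in_json, files_on_disk, missing_paths).
    """
    # json.load raises ValueError if the array was never closed,
    # which is what a run killed mid-write would leave behind.
    with open(json_path, encoding="utf-8") as fh:
        records = json.load(fh)

    # Normalize both sides to absolute paths before comparing.
    seen = {Path(r["SourceFile"]).resolve() for r in records}
    on_disk = {p.resolve() for p in Path(root).rglob(pattern) if p.is_file()}
    missing = sorted(str(p) for p in on_disk - seen)
    return len(records), len(on_disk), missing
```

If the file loads without an error and `missing` comes back empty, the scrape is complete even if the command prompt never visibly returned. Loading a JSON file of a few hundred megabytes this way needs a comparable amount of free memory or more.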

Phil Harvey

Just to clarify: You were already using the Windows version of ExifTool ("exiftool(-k).exe" downloadable from the ExifTool home page)?  I suggested using the alternate Windows version.

- Phil