Writing IPTC Metadata to 500000+ Images

Started by wcams, March 07, 2014, 11:51:00 AM


wcams

Ahoy ahoy,

I'm posting this to make sure I have the right idea before I venture down a dangerous road...

I'm working for a small archive and have been asked to 'backfill' 500,000+ of our older images with our IPTC metadata. I have done some work with ExifTool for video metadata, and I found it relatively pleasant and efficient. I'm on something of a deadline, and csImage, which we started using in 2011 to process new images, is pretty slow (about 3 seconds to add all IPTC fields to a single image, regardless of size), so ExifTool seems the way to go. At least I hope it is far more efficient than 3s per image.

From what I've researched, the best way to go about my task is to correctly format all of the data I want to embed into these images into a document, CSV or JSON. I'm unfamiliar with JSON so I'll stick with CSV (unless you think JSON would be vastly better?). From there, I would have ExifTool reference the CSV (the -csv option, I believe, rather than -tagsFromFile) and write the data from it into the images. Conceptually simple, but is this the right thing to do? And if so, how long would it likely take? A few days? Weeks? Months? With csImage I calculated it would likely take 83 days. That's...not good.
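
For illustration, the sort of command I have in mind (the CSV name, image directory, and tag columns here are just placeholders):


exiftool -csv=metadata.csv C:\archive\images


Where metadata.csv would have a SourceFile column whose paths match the images given on the command line, plus one column per IPTC tag to write (Keywords, Caption-Abstract, etc.).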

Some other basics that might influence the answer:

- Windows environment.
- Some images may already have bits and pieces of IPTC metadata.
- Images are scattered in a file hierarchy on a single drive.

Thanks for your help.

edit:

Also, is it possible to read information from images that are scattered across lots of different folders and write it all to a single output with one ExifTool command? For example, as a test I'm trying to read the dimensions of a large number of images spread throughout our folder structure and write them to a text file. Ideally, I would feed ExifTool a CSV of the source files and it would return the height and width to a single text document. I can see doing this by writing a batch script that contains an individual command for each image, but that would mean launching ExifTool once per image, which seems like a waste. Am I missing something simple, or do I need to stick with the batch file idea?

Maybe...


exiftool -csv=source.csv > out.csv


Where the source contains headers SourceFile, ImageWidth, and ImageHeight.
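
Or, if I have that backwards (I think -csv=FILE is for importing values into images), maybe reading looks more like this, with files.txt just being a placeholder name for a plain list of image paths, one per line:


exiftool -csv -ImageWidth -ImageHeight -@ files.txt > out.csv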

Phil Harvey

Sorry for the delay in responding.  I have just returned from vacation.

Since you are on a deadline, you may have already found another solution.

If not, ExifTool will probably do what you want.  It isn't the fastest out there, but I average somewhere around 0.1 sec per image when writing on my system.

Using a CSV file for such a huge number of images may not be the best solution, but I am sure we could find something that would work.

- Phil

wcams

Thanks for the response.

The project was sidetracked for a few weeks so I haven't worked on it since a day or two after that post. Still open to suggestions on how to tackle it.

Yea, CSV never felt right for this. I've heard it's somewhat poor when you're dealing with > 100000 rows. What would you suggest instead? Or more broadly, what would you suggest I do for any of this?

Thanks in advance.

Phil Harvey

You can do this in many different ways, whatever suits you best.  The trick for speeding things up is to do it all with a single exiftool command, so use the -@ option combined with -execute to write different things to different images.  Then the simplest way to get the data into exiftool is to write the appropriate command-line arguments to the -@ argfile.
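
For example, an argfile might look something like this (substitute your own IPTC tags, values and file paths), with each image's arguments followed by -execute:


-IPTC:Keywords=archive
-IPTC:Caption-Abstract=Description for the first image
C:\images\folder1\img0001.tif
-execute
-IPTC:Keywords=archive
-IPTC:Caption-Abstract=Description for the second image
C:\images\folder2\img0002.tif
-execute


Then the whole batch runs with a single command:


exiftool -@ args.txt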

- Phil