
Normalizing dates

Started by Tom Cunningham, February 23, 2013, 11:30:29 PM


Tom Cunningham

I have a large set of JPEGs dating from the 1920s to current.  Pictures prior to 2000 are scanned with Epson SCAN via GIMP, then post-processed in Picasa.  Pictures during and after 2000 were taken with various models of digital cameras and may also be post-processed with Picasa (for face recognition, geotagging, etc.).  What I am trying to do is get as much content information into the metadata as I can so that a variety of software on various platforms will be able to display, manipulate, and manage the images.  So redundancy in the metadata is not an issue with me, if it means more packages will be able to handle these images.

I use a file-naming convention that looks like YYYY[-MM[-DD]] NNN XXXXXXXX.jpg, where YYYY-MM-DD is the date the picture was originally taken (with missing months or days filled with spaces), NNN is an arbitrary 3-digit sequence number, and XXXXXXXX is descriptive information (who, where).  I have used exiftool to duplicate the filename (without the extension) into exif:imagedescription, exif:usercomment, iptc:caption-abstract, and xmp:description.  I have also used ExifTool_config_convert_regions to duplicate region information for face data.

Now I am at a point where I want to normalize metadata dates, of which there are myriad, but I am really only interested in a subset.  The main thing I want to do is put the date the picture was taken (the date at the beginning of the filename) into the metadata so that its semantics are preserved, e.g. associate the date with a tag that means "this is the date the picture was originally taken," at least as interpreted by a preponderance of software packages.  If that means associating the date with more than one tag that's fine, as long as it doesn't become ambiguous.  The digital camera dates usually match the filename date, but the scanned photos have dates that record the scan, not when the picture was taken.  Still, it would be nice to preserve the scan date as well.  I'm not that concerned with modification dates, as those might change whenever editing software touches the file.  BTW, I have tried the nifty feature of setting a date using the exiftool filename parsing capability (-datetimeoriginal<filename) but the sequence number screws up the parsing, since exiftool thinks that is part of the date/time.

I have also read several posts about partial dates.  Since many of my pictures have only partial dates (no times except from digital cameras), one suggestion was to use zero-fill for missing fields in the date and '99'-fill for missing fields in the time, so as not to confuse software reading the metadata.  Is this a reasonable approach, or would that further confuse metadata-processing software?  Thanks.

Phil Harvey

There are a number of ways to do this.  XMP supports partial dates, so you could truncate the filename at the first space then write this directly to XMP-photoshop:DateCreated:

exiftool "-xmp:photoshop:datecreated<${filename;s/ .*//}" DIR

(the above quoting is for Windows.  Use single quotes instead if you are on Mac or Linux)

To fill the corresponding EXIF fields, I wouldn't suggest writing 99's to the time fields because this would give an invalid time.  Better to write 00's I think.  You can fill missing fields with 0's like this:

exiftool "-exif:datetimeoriginal<${filename;s/ .*//} 00000000" DIR

Technically, missing fields in EXIF date/time values may be filled with spaces.  You can do this if you want, but you must use the -n option in this case to avoid ExifTool's date/time parsing.
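
For anyone who wants to see what the two commands above are aiming for before writing to real files, here is a rough preview sketch.  It does not call exiftool or reproduce its date parser; it only mimics the truncate-at-first-space step with sed and the zero-fill idea with tr and cut.  The function names and sample filenames are made up for illustration, and it assumes months and days are always two digits.

```shell
#!/bin/sh
# Preview only: mimics the intent of the commands above, not ExifTool itself.
# date_prefix and zero_fill are hypothetical helper names.

# Keep everything before the first space, like s/ .*// in the commands above.
date_prefix() {
    printf '%s\n' "$1" | sed 's/ .*//'
}

# Expand YYYY, YYYY-MM or YYYY-MM-DD to "YYYY:MM:DD 00:00:00":
# pad with zeros on the right, drop the dashes, keep 14 digits, then format.
zero_fill() {
    printf '%s00000000000000\n' "$1" | tr -cd '0-9\n' | cut -c1-14 |
        sed 's/^\(....\)\(..\)\(..\)\(..\)\(..\)\(..\)$/\1:\2:\3 \4:\5:\6/'
}

# Sample filenames following the YYYY[-MM[-DD]] NNN XXXXXXXX.jpg convention.
for name in "1925 001 Grandpa farm.jpg" \
            "1987-06 014 Lake trip.jpg" \
            "2004-12-25 003 Xmas.jpg"; do
    printf '%-28s -> %s\n' "$name" "$(zero_fill "$(date_prefix "$name")")"
done
```

A year-only name such as "1925 001 Grandpa farm.jpg" comes out as 1925:00:00 00:00:00 in this sketch, which is the zero-filled form discussed above.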

Also, I would suggest taking a look at the MWG tags if you are looking for maximum compatibility with various applications.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Tom Cunningham

Phil, thanks for the pointer to the MWG tags, that's interesting.  I noticed you used DIR in your example, does that mean exiftool will work on an entire directory of files?  Right now I am using find in a shell script to isolate the JPEGs recursively through subdirectories.  Does exiftool have equivalent functionality?  Also, because of the output of find I'm using rather arcane shell syntax to strip off the leading path and sed to remove the extension.  Looks like with exiftool I could do:

    exiftool "-mwg:description<${filename;s/\.[Jj][Pp][Gg]//}" DIR

to set EXIF:ImageDescription, IPTC:Caption-Abstract, and XMP:Description in one shot, is that correct?  Looks like I would still have to add -exif:usercomment independently, though.

I'll probably use your nifty trick for zero-filling missing date and time fields.  8)  I'm still a little concerned about writing over dates included by scanning software, or even cameras, since that would be the "digitized" date and might be useful.  But I'll play with the MWG tags and see what happens with my various files.  Thanks again, and thanks for a great tool.

Phil Harvey

Quote from: Tom Cunningham on February 24, 2013, 04:38:20 PM
I noticed you used DIR in your example, does that mean exiftool will work on an entire directory of files?

Yes, including sub-directories if you add the -r option.

Quote: Right now I am using find in a shell script to isolate the JPEGs recursively through subdirectories.  Does exiftool have equivalent functionality?

Yes.  Add the -ext jpg option.

Quote: Looks like with exiftool I could do:

    exiftool "-mwg:description<${filename;s/\.[Jj][Pp][Gg]//}" DIR

Yes, or s/\.jpg$//i may be a bit easier to type (the trailing "i" makes the match case-insensitive, and I also included the "$" to match only at the end).
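
Putting the pieces from this reply together (-r, -ext jpg, and the anchored, case-insensitive substitution), a rough sketch for Mac/Linux might look like the following.  The function names are made up for illustration and the directory is passed as an argument.  The two sed helpers only approximate the Perl expressions, so the difference between the unanchored and anchored patterns can be previewed without touching any files.

```shell
#!/bin/sh
# Hypothetical helper names; only the exiftool options come from this thread.

# Recursively write the extension-less filename to MWG:Description for JPEGs.
# Single quotes keep the shell from expanding ${filename...} itself.
# Defined here but not called; run it as: write_descriptions DIR
write_descriptions() {
    exiftool -r -ext jpg '-mwg:description<${filename;s/\.jpg$//i}' "$1"
}

# sed approximation of s/\.[Jj][Pp][Gg]// (first match anywhere in the name).
strip_first_jpg() {
    printf '%s\n' "$1" | sed 's/\.[Jj][Pp][Gg]//'
}

# sed approximation of s/\.jpg$//i (match only at the end of the name).
strip_trailing_jpg() {
    printf '%s\n' "$1" | sed 's/\.[Jj][Pp][Gg]$//'
}

# A name containing ".jpg" twice shows where the two patterns differ.
strip_first_jpg    '1987-06 014 scan.jpg copy.JPG'   # prints: 1987-06 014 scan copy.JPG
strip_trailing_jpg '1987-06 014 scan.jpg copy.JPG'   # prints: 1987-06 014 scan.jpg copy
```

For ordinary names with a single extension both patterns give the same result; the "$" only matters when ".jpg" also appears earlier in the name.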

Quote: to set EXIF:ImageDescription, IPTC:Caption-Abstract, and XMP:Description in one shot, is that correct?

Sort of.  It will write EXIF and XMP, but only write IPTC if there was already IPTC in the file.

Quote: Thanks again, and thanks for a great tool.

You're welcome. :)

- Phil