Best Practices for Scanned Historical Slide Metadata

Started by Sr Mas Alto, March 03, 2023, 07:10:26 AM

Sr Mas Alto

Please review my best practice strategy and give me your thoughts on future-proofing, mapping to fields, technical evaluation, non-standard tags, how to simplify, etc.

I am scanning historical family slides. I want to add metadata in such a way that the self-documenting image can be used now and many years in the future. The EXIF standard is annoyingly not a standard implementation. For example, cloud image sharing apps such as Amazon Photos and Google Photos use different fields to display Date Taken. Adobe and Windows File Properties display yet another set of fields. https://exiftool.org/forum/index.php?topic=6589.msg32862#msg32862

It is unknown what metadata future cloud image apps may use but I hope more powerful metadata searching and sorting will be common.

My strategy is to use standard file formats. First scan to TIFF, save edited files to JPG, and then tag the data in many different fields so that the various current and future apps and humans may find them. I want to run an easily-edited single non-procedural exiftool command that will process sets of files. I am not a programmer so I found the detailed examples in the forum to be very helpful. Here is the functional data I wish to store.

Filename - Example is "1970-08-01 101300 Andy Riding Cinnamon Our Horse at 1255 Wren St.jpg"
  • Date and time - Date is estimated and time is a batch serial number that will be converted to HH:MM:SS so the images sort correctly.
  • Names of the people in the photo
  • Context of the photo
  • Location of the photo

Metadata I wish to store.
  • Date and time from the filename. https://exiftool.org/faq.html#Q5
  • Location - 80% of these photos are in a limited number of locations so I put the address in each file name. Then I manually enter the GPS X,Y coordinates in the exiftool command that will run for all files matching that address. https://exiftool.org/faq.html#Q14
  • Image Title - The filename without the date and time and extension. https://exiftool.org/forum/index.php?topic=4779.0
  • Image Sub-Group - "Scanned from the slide collection of Warren and Roberta Heyer 1958-1987"
  • Image Group - "Heyer-Miller Family 1952-1996"
  • Photographer - "Photo Credit - Warren Heyer"
  • Copyright - "(C) 2023 Andrew Heyer CC BY-NC-SA 555.555.5555" https://creativecommons.org/share-your-work/
  • Scanned by - "Image scanned by Andrew Heyer"
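As a rough illustration of the naming convention above, here is a minimal Python sketch that splits such a filename into the date/time and the title. The regular expression and the function name `parse_slide_name` are assumptions made for this example and are not part of exiftool. It also assumes the six-digit batch serial is always a valid HHMMSS value; a serial such as 109900 would be rejected.

```python
import re
from datetime import datetime

# Hypothetical pattern for the convention described above:
# "YYYY-MM-DD HHMMSS Title text.ext"
NAME_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}) (\d{6}) (.+)\.[^.]+$")


def parse_slide_name(filename):
    """Split a slide filename into its timestamp and its title."""
    match = NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected filename layout: {filename!r}")
    date_part, time_part, title = match.groups()
    # The batch serial is read as HHMMSS, so it must be a valid clock time.
    taken = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H%M%S")
    return taken, title


taken, title = parse_slide_name(
    "1970-08-01 101300 Andy Riding Cinnamon Our Horse at 1255 Wren St.jpg"
)
print(taken.strftime("%Y:%m:%d %H:%M:%S"))  # 1970:08:01 10:13:00
print(title)  # Andy Riding Cinnamon Our Horse at 1255 Wren St
```

Running a check like this over a folder before tagging is one way to catch filenames that don't follow the convention.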

I decided that I don't need to store keywords in the metadata since they can be included in the filename and title and are easily searchable that way. I can't think of any other metadata I need to add.

Now how should I map these functional requirements to the available fields? Are these the best fields to use? Note this command will update all files that match wildcard 1255 Wren St. For a different address I will manually change the GPS coordinates and wildcard and run the command again.

exiftool
"-EXIF:XPTitle<${filename;$_=substr($_,18);s/\.[^.]*$//}"
"-xmp-dc:title<${filename;$_=substr($_,18);s/\.[^.]*$//}"

-EXIF:XPsubject="Heyer-Miller Family 1952-1996"
-xmp-dc:description="Heyer-Miller Family 1952-1996"

-EXIF:XPComment="Scanned from the slide collection of Warren and Roberta Heyer"
-XMP-photoshop:Headline="Scanned from the slide collection of Warren and Roberta Heyer"
-XMP-iptcExt:Headline="Scanned from the slide collection of Warren and Roberta Heyer"

-exif:artist="Photo Credit - Warren Heyer"
-iptc:by-line="Photo Credit - Warren Heyer"
-xmp-dc:creator="Photo Credit - Warren Heyer"

-Xmp-dc:Contributor="Image scanned by Andrew Heyer"
-XMP-photoshop:CaptionWriter="Image scanned by Andrew Heyer"

"-alldates<filename"

-copyright="(C) 2023 Andrew Heyer CC BY-NC-SA 555.555.5555"
-xmp-dc:rights="(C) 2023 Andrew Heyer CC BY-NC-SA 555.555.5555"

-gpsposition="32.7187496, -117.0587441"
"*1255 Wren St.jpg"

I found this article useful in deciding which fields to use. Hopefully it is still current. https://exiftool.org/gui/articles/where_what.html

I really appreciate all the hard work that went into making exiftool such a powerful tool and maintaining such a useful forum. If I may make a suggestion, more detailed best-practice example posts would be welcome. Thank you!

StarGeek

Quote from: Sr Mas Alto on March 03, 2023, 07:10:26 AM
The EXIF standard is annoyingly not a standard implementation. For example, cloud image sharing apps such as Amazon Photos and Google Photos use different fields to display Date Taken.

While I haven't checked Amazon, Google Photos does read the main three EXIF time stamps and their XMP equivalents, in addition to seven other possible time stamps.  It is the most comprehensive with regards to date/time data of any program/site that doesn't give you access to all the individual tags.  It also does pretty well on the few other tags it reads and, IIRC, is one of the only apps that will read EXIF data in a PNG file.

Quote: Adobe and Windows File Properties display yet another set of fields. https://exiftool.org/forum/index.php?topic=6589.msg32862#msg32862

The problem is that a lot of programs try to provide a simpler interface, reading from multiple tags to display a piece of metadata.  Trying to figure out all the tags that each program simplifies is a lot of work.  I know, I've tried.

But Adobe follows the standards, gives you the best access to the various tags, and properly writes corresponding tags in different groups.  Even Bridge, which is free, allows writing of a lot of different metadata tags, though you may need to change some settings to display all of them.

Throw out Windows as a consideration.  They do not follow the standards and also write to their Windows specific EXIF:XP* tags (XPTitle, XPSubject, XPKeywords, etc) which are not supported by pretty much any program except Windows.  I think I've only found one program, other than tools like exiftool and exiv2, that would read any of the XP* tags.

Quote: It is unknown what metadata future cloud image apps may use but I hope more powerful metadata searching and sorting will be common.

To be honest, I believe this to be unlikely.  Cloud image storage sites are designed to make things easy and simple for the greatest number of people.  They are more likely to simplify things in the future rather than give greater control over the searches.

Flickr, and maybe SmugMug (which now owns Flickr), are exceptions because they were originally designed for photographers who want to know things like ISO, f-stop, etc.  Fun fact, Flickr uses exiftool on the back end.

There are a variety of self-hosted photo programs that can be installed on a web site.  These will usually give much greater control over displaying and searching through metadata.  I know there are several Google Photos clones that people started working on after Google changed their photo policies a few years ago, but I haven't checked them out yet.

Quote: My strategy is to use standard file formats. First scan to TIFF, save edited files to JPG

You might want to look at the NARA (National Archives and Records Administration) guidelines for digitizing (Google search) as a starting point, and make some changes from there. I believe they use uncompressed tiffs, but I would use either LZW or ZIP compression to save some space. Definitely don't use JPG compression in a tiff file.  Here's an example of Irfanview's tiff save options [image].


For archival purposes, I would use tiffs, especially if you're able to scan them to 16 bit depth. While it significantly increases the size, it allows for better fine tuning of the image with programs such as Lightroom.  Output developed images to jpegs, which can be used for quickly sharing or displaying the images.
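To put a number on the size increase from 16-bit scanning, here is a small back-of-the-envelope sketch. The pixel dimensions are an assumption chosen only for illustration (roughly what a high-resolution 35mm slide scan might produce), and the figures are for uncompressed RGB data, so LZW or ZIP files will differ.

```python
def uncompressed_bytes(width_px, height_px, channels=3, bits_per_channel=8):
    """Raw pixel data size in bytes, ignoring headers and compression."""
    return width_px * height_px * channels * bits_per_channel // 8


# Assumed scan dimensions, not taken from the thread.
width, height = 5400, 3600

size8 = uncompressed_bytes(width, height, bits_per_channel=8)
size16 = uncompressed_bytes(width, height, bits_per_channel=16)

print(f"{size8 / 1e6:.0f} MB vs {size16 / 1e6:.0f} MB")  # 58 MB vs 117 MB
```

Before compression, the 16-bit file is exactly twice the size of the 8-bit one.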

Quote: and then tag the data in many different fields so that the various current and future apps and humans may find them.

The best and most future-proof approach here is to follow the standards.  The IPTC Photo Metadata Standard would be the most important.  The EXIF standard would be less important for scanned images, but I would give priority to placing the date/time tags, copyright, and GPS tags in EXIF.  Most of the EXIF document is technical detail on the format of EXIF, but around page 32 is where it talks about the tags.  Related is the EXIF 2.31 standard for XMP.  While the Metadata Working Group appears to be defunct, their standard is still worth looking over.

Quote: I decided that I don't need to store keywords in the metadata since they can be included in the filename and title and are easily searchable that way. I can't think of any other metadata I need to add.

There are people who swear by this technique, including one very vocal person on reddit.  There's a 30-minute or so video by someone on this that gets linked regularly, though I can't remember the details.

But there is one very large problem with this, especially if you use Windows.  A single crowd scene with many identifiable people, such as a wedding, leads to extremely long filenames.  For example this image (Cannes Photocall for "The Jury") has only nine people in it.  Adding the event leads to a filename like this, using only "keywords":
The Jury Photocall, 75th Annual Cannes Film Festival, Ladj Ly, Vincent Lindon, Jasmine Trinca, Joachim Trier, Rebecca Hall, Deepika Padukone, Asghar Farhadi, Noomi Rapace, Jeff Nichols.jpg
That's roughly 190 characters long by itself.  And Windows has limitations on how long a complete filepath (directories plus filename) can be.  Google lists it as ~256 characters.  So in this case, there are only about 66 characters left for directories/subdirectories, as well as the other info you want to put into the title.
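The arithmetic above can be checked with a short sketch. The 256-character default is simply the approximate figure quoted in the post, and the function name `remaining_path_budget` is made up for this example. The figures in the post appear to be rounded, so the exact numbers printed here may differ slightly.

```python
def remaining_path_budget(filename, limit=256):
    """Characters left for drive and directories once the filename is counted.

    `limit` defaults to the approximate Windows figure quoted in the post.
    """
    return limit - len(filename)


name = (
    "The Jury Photocall, 75th Annual Cannes Film Festival, Ladj Ly, "
    "Vincent Lindon, Jasmine Trinca, Joachim Trier, Rebecca Hall, "
    "Deepika Padukone, Asghar Farhadi, Noomi Rapace, Jeff Nichols.jpg"
)

print("filename length:", len(name))
print("left for directories:", remaining_path_budget(name))
```

A negative result means the filename alone already exceeds the limit.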

Another limitation is that it doesn't allow for a hierarchical structure for keywords, though it is up to you if you want to use one.  My own example here is that I take a lot of pictures of cosplayers at comic conventions. I use a hierarchy to keep track of the various costumes and their sources.  Additionally, it allows me to add tags for other details, such as cosplayers playing characters of the opposite sex (gender bent), which are often among the most creative and interesting ones.  Here's an example of the hierarchy [image].
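To make the idea of hierarchical keywords concrete, here is a minimal sketch. The paths are hypothetical stand-ins, not the hierarchy from the post, and the "|" separator is assumed because it is the convention used by tags such as XMP-lr:HierarchicalSubject. The function shows how every level of each path can also be collected into a flat keyword list.

```python
# Hypothetical hierarchy paths, one string per full path.
paths = [
    "Cosplay|Source|Star Wars|Rey",
    "Cosplay|Detail|Gender bent",
]


def flat_keywords(hierarchical_paths, sep="|"):
    """Collect every level of each path as a flat, de-duplicated list."""
    seen = []
    for path in hierarchical_paths:
        for level in path.split(sep):
            level = level.strip()
            if level and level not in seen:
                seen.append(level)
    return seen


print(flat_keywords(paths))
# ['Cosplay', 'Source', 'Star Wars', 'Rey', 'Detail', 'Gender bent']
```

A filename can hold only the flat list; the parent/child relationships have nowhere to go.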


Quote: Now how should I map these functional requirements to the available fields? Are these the best fields to use?

As I mentioned above, throw out the XP* tags.  That's locking you into an outdated standard that nobody uses.

Quote: "-xmp-dc:title<${filename;$_=substr($_,18);s/\.[^.]*$//}"

Try to keep the tag names simple.  Exiftool will write them to the correct locations and it means a lot less bookkeeping for you.  In this case, write to just Title or include the main group name with XMP:Title.  Including the specific group name can lead to problems if you're not careful, especially in the case of EXIF tags, as their specific groups aren't very obvious, and especially if you base it upon faulty data.  For example, some cameras will write some EXIF data to incorrect specific locations and if you use that as the basis for future commands, it would end up being a lot of work to fix once you realize the mistake.  I rarely use anything beyond EXIF:TAGNAME/XMP:TAGNAME/IPTC:TAGNAME.

Also, take a look at the MWG tags.  Using them will write data to the correct locations in multiple groups.  For example, writing to MWG:Creator will correctly write to EXIF:Artist, IPTC:By-line, and XMP:Creator.
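As a sketch of what "writing to multiple groups" means, the table below lists the underlying tags for the four MWG tags relevant to this thread. Only the MWG:Creator row is stated in the post. The other rows are my recollection of the ExifTool MWG tag documentation and should be checked against it. The helper `explicit_args` is hypothetical and gives only a rough picture, since ExifTool's MWG module applies extra rules of its own (for example, about when IPTC is written), so the result is not an exact equivalent.

```python
# MWG tag -> underlying tags. Creator is from the post; the rest are
# assumptions based on the ExifTool MWG documentation.
MWG_TARGETS = {
    "MWG:Creator": ["EXIF:Artist", "IPTC:By-line", "XMP-dc:Creator"],
    "MWG:Description": [
        "EXIF:ImageDescription",
        "IPTC:Caption-Abstract",
        "XMP-dc:Description",
    ],
    "MWG:Copyright": ["EXIF:Copyright", "IPTC:CopyrightNotice", "XMP-dc:Rights"],
    "MWG:Keywords": ["IPTC:Keywords", "XMP-dc:Subject"],
}


def explicit_args(mwg_tag, value):
    """List the per-group exiftool arguments one MWG tag roughly stands for."""
    return [f"-{tag}={value}" for tag in MWG_TARGETS[mwg_tag]]


for arg in explicit_args("MWG:Creator", "Photo Credit - Warren Heyer"):
    print(arg)
```

Seen this way, one MWG argument replaces the three separate creator lines in the original command.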

Quote: I found this article useful in deciding which fields to use. Hopefully it is still current. https://exiftool.org/gui/articles/where_what.html

It's been a long time since I've seen that page. It's specific to Bogdan ExiftoolGUI, but giving it a quick look, it's probably still pretty relevant.  I'll have to double check when I get a chance.

One point which I agree heavily with is "Conclusion: Old IPTC is dead... time to move on".  Unless a specific part of your workflow needs the old IPTC/IIM tags, skip explicitly writing to them.  In my case, I have to go the extra step because I use Irfanview and even though it has been requested now for some 15+ years, they still haven't included the ability to read XMP data.  So I end up writing IPTC data in addition to the XMP data: IPTC so I can take a quick glance at the data, XMP because it's more future proof and most Digital Asset Management (DAM) programs will use it.

I'd also look into using a DAM program. It makes things a lot easier for most people.  And it will give you a lot more options for filtering metadata.  DigiKam is a good free and open source option.  Good paid options include Lightroom and IMatch, which uses exiftool on the back end and is what I use.
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

Phil Harvey

(That response would make a full chapter in a book!)

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Sr Mas Alto

Thank you, this is very helpful and gives me a lot to think about and research. I appreciate your insight.

Sr Mas Alto

Below is my updated command based on your recommendations.
  • I removed all Windows EXIF:XP* tags and now use hopefully more standard tags.
  • Vuescan does compress my TIFF files via LZW.
  • I will run this exiftool command against sets of my TIFF files and let the metadata flow to the edited JPGs via PhotoShop Elements. I will save/archive both the unedited TIFF and the smaller JPG files.
  • I used the much more powerful MWG tags to tag groups of fields.
  • Putting keywords/names in the filename has been manageable so I won't be using keywords for these files. I plan to use keywords for some artwork images in the future.
  • I did misuse the -MWG:Keywords fields by adding a single phrase.
  • I will look into a DAM which would be very helpful. At some point I will be handing off my archive to a future administrator, a nephew or granddaughter so the handoff factors into the DAM decision.

Thank you again for your advice. I feel that I now have a future-proof method for self-documenting my scanned images.

Example filename "1970-08-01 101300 Andy Riding Cinnamon Our Horse at 1255 Wren St.tif"

exiftool
"-XMP:title<${filename;$_=substr($_,18);s/\.[^.]*$//}"

-MWG:Description="Heyer-Miller Family 1952-1996"

-MWG:Keywords="Slide collection of Warren and Roberta Heyer"

-MWG:Creator="Photo Credit - Warren Heyer"

-XMP:Contributor="Image scanned by Andrew Heyer"
-XMP:CaptionWriter="Andrew Heyer"

"-alldates<filename"

-MWG:Copyright="(C) 2023 Andrew Heyer CC BY-NC-SA 555.555.5555"

-gpsposition="32.7187496, -117.0587441"
"*1255 Wren St.jpg"
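For anyone who would rather not edit the GPS coordinates and wildcard by hand for each address, here is a minimal Python sketch that builds the same argument list once per address. The `LOCATIONS` table, the function names, and the `.tif` default are assumptions made for this example. The tag arguments are copied from the command above, and nothing is executed unless `dry_run` is set to False with exiftool available on the PATH.

```python
import subprocess
from pathlib import Path

# Hypothetical lookup: address text as used in filenames -> (lat, lon).
LOCATIONS = {
    "1255 Wren St": (32.7187496, -117.0587441),
}

# Arguments shared by every file, taken from the command in the post.
SHARED_ARGS = [
    r"-XMP:Title<${filename;$_=substr($_,18);s/\.[^.]*$//}",
    "-MWG:Description=Heyer-Miller Family 1952-1996",
    "-MWG:Keywords=Slide collection of Warren and Roberta Heyer",
    "-MWG:Creator=Photo Credit - Warren Heyer",
    "-XMP:Contributor=Image scanned by Andrew Heyer",
    "-XMP:CaptionWriter=Andrew Heyer",
    "-AllDates<filename",
    "-MWG:Copyright=(C) 2023 Andrew Heyer CC BY-NC-SA 555.555.5555",
]


def build_command(address, files):
    """Return the exiftool argument list for one address and its files."""
    lat, lon = LOCATIONS[address]
    gps = f"-GPSPosition={lat}, {lon}"
    return ["exiftool", *SHARED_ARGS, gps, *map(str, files)]


def tag_folder(folder, extension=".tif", dry_run=True):
    """Build (and optionally run) one exiftool command per known address."""
    for address in LOCATIONS:
        files = sorted(Path(folder).glob(f"*{address}{extension}"))
        if not files:
            continue
        command = build_command(address, files)
        if dry_run:
            print(command)  # inspect before touching any files
        else:
            subprocess.run(command, check=True)
```

Calling `tag_folder("scans")` would only print the commands, where "scans" is a placeholder folder name. Because the arguments are passed as a list rather than through a shell, no extra quoting is needed around the `${filename;...}` expression.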


StarGeek

Quote from: Sr Mas Alto on March 07, 2023, 07:01:27 AM
I will look into a DAM which would be very helpful. At some point I will be handing off my archive to a future administrator, a nephew or granddaughter so the handoff factors into the DAM decision.

One thing to make sure of is that the DAM saves the data into the file.  Some DAMs instead save the data into their own database.  Additionally, some DAMs (*cough*ACDSee*cough*) save the data into their own program-specific tags. See Phil's comment on ACDSee tags.