Writing metadata for digitisation projects

ozbigben · February 09, 2011, 03:52:10 AM

Not so much a question as letting you know what I do with ExifTool. I work for a large university in their digitisation service and have built a FileMaker database that runs a number of command-line utilities for processing large volumes of files. I started looking at ExifTool as a way to get metadata into files before they went to archive, but have found a few other uses for it as well.

Firstly, we collect basic EXIF metadata from files on the multiple computers driving our scanners, as well as from the PDF files coming out of our OCR queues, and use this to determine which subsequent commands to generate.

The other main functions we have for ExifTool are sorting files (it creates subdirectories if they don't exist and doesn't overwrite existing files) and writing whatever publication metadata we can get our hands on into the TIFFs and PDFs (mostly IPTC and XMP-PRISM). I recently rearranged things to do both of these at the same time, with the database generating customised metadata for every file as an fmt file for use with the -@ option. One of the advantages of splitting tasks between database scripts and command-line utilities is that it is easy to improve performance by simply splitting the list of files into two batch files and running them simultaneously.
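
For anyone wanting to try something similar, here is a minimal sketch of the approach described above, written in Python rather than FileMaker. It assumes the argfile layout that -@ reads (one argument per line, with -execute between the per-file commands). The record fields, tag choices, and file names are made up for illustration and are not the actual database output.

```python
from pathlib import Path


def build_args(records):
    """Build argfile lines: one argument per line, -execute between files."""
    lines = []
    for i, rec in enumerate(records):
        if i:
            lines.append("-execute")  # separates one file's command from the next
        for tag, value in rec["tags"].items():
            # An argfile value must stay on one line, so collapse whitespace runs.
            clean = " ".join(str(value).split())
            lines.append(f"-{tag}={clean}")
        lines.append(f"-Directory={rec['dest']}")  # sort the file into dest
        lines.append(rec["path"])
    return lines


def split_batches(records, n=2):
    """Deal the records round-robin into n lists of near-equal size."""
    return [records[i::n] for i in range(n)]


def write_argfiles(records, out_dir, n=2):
    """Write one argfile per non-empty batch and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, batch in enumerate(split_batches(records, n), start=1):
        if not batch:
            continue
        path = out_dir / f"batch_{i}.args"  # hypothetical file name
        path.write_text("\n".join(build_args(batch)) + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

Each resulting file can then be run in its own console window, for example exiftool -@ batch_1.args in one and exiftool -@ batch_2.args in the other.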

Love ExifTool

Phil Harvey · February 09, 2011, 07:16:51 AM

Thanks for the post. ExifTool is very well suited to automated tasks such as yours. It is good to know that you are making such good use of it.

- Phil

ozbigben · February 22, 2011, 12:50:31 AM

Reading through the rest of these posts and learning lots. I'm currently looking at what metadata fields are applicable/practical to library catalogue records and publications in general... and what tags to use to write them.

It's all relatively straightforward in theory but it can get messy in practice. In particular I'm trying to do as little parsing of data from our catalogues as possible, since the metadata we get is usually collected by someone simply copying and pasting into a spreadsheet. To give an example, author names are typically entered as "surname, given names" (often with birth/death dates, aliases, etc.) and the best way to populate this without having it split into multiple authors is with -mwg:creator.

The mwg hierarchical keywords look good for our hierarchical subjects (after I strip out the commas) and the XMP:PRISM namespace has lots of publication-related fields that look useful. It's a pity our library catalogues usually don't contain the data nicely in separate fields.
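
The comma stripping mentioned above could look something like this rough Python sketch. It assumes the catalogue gives a subject as levels separated by "--" and that the cleaned levels are joined with "|"; both separators are guesses for illustration and should be changed to whatever the catalogue export and the target tag actually expect.

```python
def clean_subject(raw, src_sep="--", out_sep="|"):
    """Strip commas from each level of a hierarchical subject and rejoin it.

    src_sep and out_sep are assumed separators, not anything ExifTool mandates.
    """
    levels = []
    for level in raw.split(src_sep):
        # Replace commas with spaces, then collapse any repeated whitespace.
        cleaned = " ".join(level.replace(",", " ").split())
        if cleaned:  # drop empty levels left by stray separators
            levels.append(cleaned)
    return out_sep.join(levels)
```

For example, clean_subject("Australia -- History, Modern -- 20th century") gives "Australia|History Modern|20th century".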
