Looking for some advice on developing a process to take information out of a spreadsheet and embed it into JPG files as (notionally) custom XMP tags.
This pertains to my scanned film library of ~ 15000 jpgs originating from ~ 500 film rolls.
The JPGs are named in the basic format of YYYY-MM-Roll_ID-Neg#.jpg
The spreadsheet is basically a descriptive inventory of the film rolls (Roll_ID) (rows) containing such text parameters (columns) as:
Year
Time Frame (e.g. "April -May", "Jun 1 to xx", etc.)
Subject / Comments (up to ~ 512 characters, including special characters)
Film Type (e.g., slide or negative)
Camera
Slide/Negative Location (e.g., "Binder Volume 4")
Print Location (e.g., "Photo Box 1986-87", "Lost", etc.)
etc.
Where each spreadsheet row pertains to a specific Roll_ID.
The common denominator between the jpg filenames and the spreadsheet data rows being Roll_ID, which is generally an "alpha prefix + number + optional alpha suffix", e.g., "P47", "P315a", "P315b".
In a nutshell, what I want to do is leverage the common Roll_ID parameter to copy selective spreadsheet parameters into corresponding Roll_ID jpg file custom tags. Basic reasons for wanting to do this.
1. To get that spreadsheet data into the files, rather then just in spreadsheet limbo.
2. Some of it can potentially be leveraged later into fine-grained DateTimeOriginal, filecomments, geotagging, etc. (IF I really want to put that sort of extra effort into it.)
Note that for every Roll_ID in the spreadsheet, there can be up to 37 jpg files, corresponding to how many frames are on the roll. That "EXPANSION" from ~ 500 Roll_IDs to ~ 15,000 jpgs is a significant consideration in the process.
The experience base I'm coming from is.
Rather a whiz at Excel (actually OpenOffice Calc)
Pretty good at working with "regular" tags in ExifTool, IF/AND conditions, Date/Time manipulation
Done quite a bit of ExifTool CSV output/input with spreadsheet manipulation in the middle.
Novice at -Tagsfromfile (which I expect I may be using in this process)
Barely literate in Powershell scripts
Enough background (hopefully). I'll now try to explain my notional overview of the process in simple terms.
Top level steps - Ignoring the EXPANSION consideration
1. Prototype in a SANDBOX folder(s) with a small sample.
2. Install the ExifTool "example.config file as .Exiftool_config to allow to "add new EXIF, IPTC, XMP, PNG, MIE and Composite tags"
3. Take a iniitial small set of JPGs and create desired new XMP (or EXIF?) tags with dummy data in them.
4. Export those tags to CSV file, import to OpenOffice Calc.
5. Puzzle out how to most efficiently match existing spreadsheet data cells to tag cells and do it.
6. Save applicable spreadsheet data to another CSV for ExifTool input.
7. CSV import back to original jpgs.
8. Scale up to the entire set of files.
Will the new tags get imported even if they don't already exist in the file?
CONSIDERING THE EXPANSION PROBLEM:
BASELINE: As outlined above I'll need an ~15,000 row CSV out > spreadsheet > CSV in, and have to fill all those rows with ~ 500 sets of Roll_ID unique sets of data.
ALTERNATIVE 1: Variation of the BASELINE to create 'just' ~ 500 Roll_ID template.jpg files, one for each roll of film, and run those through the CSV out > spreadsheet > C in process. Then use those as the -TagsFromFile source back to the ~ 15,000 real jpgs.
BUT that brings me to the question of:
"How am I going to match the ROLL_ID of each real jpg file back to one of the ~ 500 template jpg files?"
My answer at the moment is:
"I don't know. I think it's probably beyond inherent ExifTool capabilities. Probably going to take some scripting working utilizing an array (or hashtable?) of ROLL_ID to Template filename association about 500 x 2 in size. Something I have no experience in."
ALTERNATIVE 2: I figure that some whiz at shell scripting could bypass a whole lot of the the steps by just taking an array of data (maybe 500 x 5) I could generate directly from the spreadsheet and calling ExifTool to write the corresponding tag data directly to corresponding filenames. But that ain't me.
BOTTOMLINE: Sanity Check / Suggestions / Advice anyone?
Quote"How am I going to match the ROLL_ID of each real jpg file back to one of the ~ 500 template jpg files?"
My answer at the moment is:
"I don't know. I think it's probably beyond inherent ExifTool capabilities. Probably going to take some scripting working utilizing an array (or hashtable?) of ROLL_ID to Template filename association about 500 x 2 in size. Something I have no experience in."
Duh, wake up brain. I think it's as simple (more of less) as parsing the ROLL_ID out of the target filename into a variable and parsing that variable into the -tagsfromfile source filename.