Some IPTC/XMP related questions in developing CSV workflow

Started by 3design, November 10, 2012, 02:55:05 PM


3design

Well that's something I didn't try. Thanks for following up on this; hopefully others will find it useful as well.

3design

Finally had a chance to sit down and almost finalize my workflow. I have one additional question which I haven't been able to solve so far. I'm using the following command to extract a specific list of tags, from a specific list of files, into a CSV:

exiftool -if "$filename =~ /^CONSTANT.*STRING.*\.tif$$/" -csv -f -a -s -G -charset UTF8 -Composite:DateTimeCreated -Composite:DateTimeOriginal -Composite:ImageSize -EXIF:Artist -EXIF:BitsPerSample -r . -ext tif > MASTER_TagList.csv

Even with the -f and -G switches, the group name is still not included for empty tags. So, for example, the resulting list of field headers for this command is:

SourceFile,DateTimeCreated,DateTimeOriginal,Composite:ImageSize,Artist,EXIF:BitsPerSample

Is there a way to also force group names for tags without content? IOW, in the above list, to force the field header EXIF:Artist instead of just Artist...

EDIT:
Another problematic situation which I think might be related.. When running the same command above, on brand new images which haven't yet been tagged:

exiftool -if "$filename =~ /^CONSTANT.*STRING.*\.tif$$/" -csv -f -a -s -G -charset UTF8 -IPTC:Headline -XMP:Headline -r . -ext tif > MASTER_TagList.csv

The resulting csv contains fields:
SourceFile,Headline

So in this instance there are two problems. The IPTC: and XMP: group names aren't being added to the field headers, because those fields are empty in the file. And, no matter what I've tried, the field is only being created once since the tag name is the same, even though I'm including the -a switch. Ideally, I'd like to see the following output in the CSV:
SourceFile,IPTC:Headline,XMP:Headline

Even if both headline tags are empty in the images themselves.

Phil Harvey

Sorry, but what you are asking is not possible in general.  ExifTool doesn't know the group names until it reads the file.  The groups are dynamic, and even though you specify a group when you request the tag, this is no guarantee that the group you print will be the same.  For example:

exiftool -exif:artist -G1 -f FILE

will likely return "IFD0:Artist", or maybe "ExifIFD:Artist" since some software erroneously stores Artist in the EXIF IFD.  If the Artist tag doesn't exist, ExifTool doesn't know the group, so it can't print it.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

3design

I see.. and using a command such as

exiftool -if "$filename =~ /^CONSTANT.*STRING.*\.tif$$/" -csv -f -a -s -G -charset UTF8 -1IPTC:Headline -2XMP:Headline -r . -ext tif > MASTER_TagList.csv

wouldn't work? I recall reading someplace that tag groups could be specified in that way. But I see how both tags being empty (as yet untagged) in the first place is causing the problem.

I had convinced myself that I'd be able to pull a consistent set of CSV field headers from all the files I'm working with, and it's looking like that isn't going to work, so it's a setback in creating a master csv to store everything in. Ideally I'd like to be able to just append new csv files into an existing csv that contains the master set of fields, as I read additional directories, without having to hunt down which columns exist and which don't and which need to be shifted and which need to be removed and which need to be added or renamed. I'm thinking there might not be a way to do that.
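One way to approach the "append into a master csv" idea outside of ExifTool is to post-process each new CSV so its rows are re-mapped onto a fixed master column list before appending. The following is only a rough Python sketch of that idea, not an ExifTool feature: the function name align_rows and the sample column list are made up for illustration, and the "-" fill value is an assumption chosen to match the placeholder that -f prints for missing tags.

```python
import csv
import io

def align_rows(new_csv_text, master_fields, fill="-"):
    """Re-map the rows of a new CSV onto the master column order.

    Columns missing from the new data are filled with `fill`.
    Columns not present in master_fields are silently dropped
    (extrasaction="ignore"); use "raise" to be warned instead.
    """
    reader = csv.DictReader(io.StringIO(new_csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=master_fields, restval=fill,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()

# Hypothetical master header and a new export with fewer, reordered columns
master = ["SourceFile", "IPTC:Headline", "XMP:Headline", "EXIF:Artist"]
new_data = "SourceFile,EXIF:Artist,XMP:Headline\n./a.tif,Jane,Sunset\n"
print(align_rows(new_data, master))
```

With this kind of step the column order in the master file never shifts, whatever header order a particular run produces. Whether to drop or reject unknown columns is a design choice; rejecting is safer if silently losing a tag would be a problem.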

Phil Harvey

You mean -0IPTC:Headline -0XMP:Headline.  Technically, yes, in this case ExifTool does have enough information to know the -G0 group names (aside from the case, which would have to be taken from the command line).  However, this is a very special case and would require dedicated code (and try to explain this one in the documentation!).  Also, I'm not sure if it makes sense to allow ExifTool to output arbitrary group names (i.e. -0SomeNon-ExistentGroup:Headline).

Instead, why not create a (small) dummy file with all of the information you want, then parse this along with the files you are interested in.  Then the output is consistent and all you have to do is delete one row.
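The "delete one row" step of the dummy-file approach can be scripted. This is a minimal Python sketch under stated assumptions: the template file name _template.tif is hypothetical, and the row is matched on the last path component of the SourceFile column, which is the first column in the CSV headers shown earlier in this thread.

```python
import csv
import io

TEMPLATE_NAME = "_template.tif"  # hypothetical name of the dummy file

def drop_template_row(csv_text, template_name=TEMPLATE_NAME):
    """Return the CSV text without the row(s) belonging to the template file."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = next(reader)
    writer.writerow(header)
    src = header.index("SourceFile")
    for row in reader:
        # Compare only the file name, so ./_template.tif and
        # sub\_template.tif are both recognised.
        name = row[src].replace("\\", "/").rsplit("/", 1)[-1]
        if name == template_name:
            continue
        writer.writerow(row)
    return out.getvalue()

sample = ("SourceFile,IPTC:Headline,XMP:Headline\n"
          "./_template.tif,dummy,dummy\n"
          "./img001.tif,-,-\n")
print(drop_template_row(sample))
```

The header row contributed by the template stays in place; only the template's data row is removed.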

- Phil

3design

Actually, that's essentially what I've done.. I created a template image in the top folder of any directory structure I read in. However, when reading-in different folder structures with images created over many years, additional tags are constantly popping up that don't exist in the template. It's a headache when a master csv file contains columns from A - EQ and you think you grabbed all the tags, then you read in a new folder and the resulting columns go from A - ES.. now you have an extra 2 columns in the new data.. what are they? Where are they? Now you have to hunt all 150 or so columns and find which ones are missing, then create them in the master csv so columns don't get skewed if you copy/paste the new data into the existing file.
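Hunting for the extra columns by eye can be avoided by comparing the two header rows directly. Here is a small Python sketch of that comparison; the function name header_diff and the sample headers are illustrative only, and it looks at nothing but the first line of each CSV.

```python
import csv
import io

def header_diff(master_csv_text, new_csv_text):
    """Compare the header rows of two CSV texts.

    Returns (added, missing): columns only in the new CSV, and
    columns only in the master, each in their original order.
    """
    master = next(csv.reader(io.StringIO(master_csv_text)))
    new = next(csv.reader(io.StringIO(new_csv_text)))
    added = [col for col in new if col not in master]
    missing = [col for col in master if col not in new]
    return added, missing

master_csv = "SourceFile,IPTC:Headline,XMP:Headline\n"
new_csv = "SourceFile,Headline,IPTC:Headline,XMP:Creator\n"
added, missing = header_diff(master_csv, new_csv)
print("new columns:", added)
print("absent columns:", missing)
```

Running a check like this on each new export before pasting it into the master file shows at once which columns need to be created or renamed.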

I've also tried running that template image file against an explicit command line containing all the tags I want to grab.. the command line is basically a huge paragraph of tags

exiftool -csv -f -G -charset UTF8 [MEGA LIST OF TAGS HERE] -r . -ext tif > MASTER_OUTPUT.csv

But if it attempts to read tags (i.e. IPTC:Headline and XMP:Headline) from the template *and* from brand-new untagged files, then instead of just:

IPTC:Headline,XMP:Headline

the resulting field headers in the csv become:

Headline,IPTC:Headline,XMP:Headline

because while it pulls the IPTC: and XMP: field headers from the template image, it also combines those same two nonexistent Headlines into one 'Headline' field for all the untagged files. That kind of situation just throws a wrench into streamlining the import of new data because additional columns in the new data keep popping up to skew the total # of columns.

I suppose I can just read all ~20,000 files in one run, so all the possible tags are aggregated at once, but the problem is new images are created on an almost daily basis, so the potential for "rogue" additional columns is always there. What I was hoping for was a way in which to read, for example, IPTC:Headline and XMP:Headline from an image and create field headers entitled IPTC:Headline and XMP:Headline in the csv (whether those tags are in the image or not) but if they're not in the image, *do not* create the plain old "Headline" field in the csv. I think that would be a step closer to creating a consistent output each time.

(sorry for the long post, trying to brainstorm a solution to this as I run through it)

Phil Harvey

Quote from: 3design on November 21, 2012, 04:11:50 PM
But if it attempts to read tags (i.e. IPTC:Headline and XMP:Headline) from the template *and* from brand-new untagged files, then instead of just:

IPTC:Headline,XMP:Headline

the resulting field headers in the csv become:

Headline,IPTC:Headline,XMP:Headline

This won't happen without the -f option.

- Phil