Some IPTC/XMP related questions in developing CSV workflow

Started by 3design, November 10, 2012, 02:55:05 PM


3design

Hi,
I've been trying to develop a workflow to pull image data (descriptions, keywords, etc) from text files, into a csv, which I could then apply to all the images by inserting the tags directly from the csv. I have a few questions on structure and procedure. Hopefully someone here can help.

I'm using the following command to pull pre-existing data from images in subfolders into a temp csv file:
exiftool -csv -f -iptc:all . -r *.tif > out.csv
but it's not restricting its output to only the TIF files and I'm not sure what I'm doing wrong. Something is wrong in my syntax.

Next, I've output separate IPTC and XMP CSVs so I could poke around at the available tags. I found that *all* of the XMP fields
SourceFile,XMPToolkit,CreatorWorkEmail,CreatorWorkURL,CaptionWriter,Instructions,TransmissionReference,Marked,WebStatement,UsageTerms,Creator,Title,Rights,Rating

also exist in the IPTC fields, but that the list of IPTC fields also contains another ~50-75 tags which don't exist in XMP.

SourceFile,ApplicationRecordVersion,Artist,BitDepth,BitsPerSample,By-line,CaptionWriter,CodedCharacterSet,ColorComponents,ColorType,Compression,Contact,CopyrightNotice,Creator,CreatorWorkEmail,CreatorWorkURL,Credit,CurrentIPTCDigest,DateCreated,DateTimeCreated,DateTimeOriginal,Directory,EditStatus,EncodingProcess,ExifByteOrder,ExifToolVersion,FileAccessDate,FileModifyDate,FileName,FilePermissions,FileSize,FileType,Filter,Headline,ImageHeight,ImageSize,ImageWidth,Instructions,Interlace,JFIFVersion,Marked,MIMEType,ObjectName,OriginalTransmissionReference,OriginatingProgram,PhotometricInterpretation,PixelsPerUnitX,PixelsPerUnitY,PixelUnits,PlanarConfiguration,Prefs,ProgramVersion,Rating,ReleaseDate,ReleaseTime,ResolutionUnit,Rights,RowsPerStrip,SamplesPerPixel,Software,Source,SpecialInstructions,StripByteCounts,StripOffsets,TimeCreated,Title,TransmissionReference,UsageTerms,WebStatement,Writer-Editor,XMPToolkit,XResolution,YCbCrSubSampling,YResolution

Does this mean that if I fill my CSV with all the IPTC fields that also exist as XMP fields, I can write both IPTC and XMP at the same time and they would be synchronized? Or do I need to maintain a separate CSV for IPTC and for XMP?

It seems to me, since I'm just beginning this process, that it's safe to output just the IPTC fields, then edit the CSV to populate all the fields I need, and then use the one IPTC CSV to populate both IPTC and XMP on the way back in... but I'm not sure how I would go about writing IPTC and XMP in one shot...

I'm using OpenOffice Calc to manipulate my CSVs. When OO exports the CSV, the fields are comma-delim but also wrapped in double quotes. Are those quotes going to be inserted into the metadata fields as quotes or will they be stripped and only the content between the quotes will be written to the fields?

Sorry for the long first post... I found exiftool last night and lost half a night of sleep not being able to stop reading about it. :) It seems like it will be immensely helpful with this workflow. I have approx 20k images to tag, and the text file > CSV > exiftool method might just save me a few weeks of work!

Phil Harvey

Quote from: 3design on November 10, 2012, 02:55:05 PM
exiftool -csv -f -iptc:all . -r *.tif > out.csv
but it's not restricting its output to only the TIF files

The *.tif gives you all ".tif" files in the current directory only, and -r . gives you all files in the current directory and sub-directories.  Drop the *.tif and add -ext tif to do what you want.
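
The difference in file selection can be sketched in Python, with `Path.glob` standing in for the shell's `*.tif` and a filtered `Path.rglob` standing in for `-r . -ext tif`. The folder layout and file names are made up for the illustration, and the sketch mimics only which files get picked up, not ExifTool itself.

```python
import tempfile
from pathlib import Path


def build_tree(root):
    """Create a hypothetical layout: one TIF at the top, one TIF and one JPG in a subfolder."""
    (root / "sub").mkdir()
    for rel in ("top.tif", "sub/nested.tif", "sub/other.jpg"):
        (root / rel).touch()


def shell_glob(root):
    # Like "*.tif" expanded by the shell: current directory only.
    return sorted(p.relative_to(root).as_posix() for p in root.glob("*.tif"))


def recursive_all(root):
    # Like "-r ." on its own: every file in the directory tree.
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


def recursive_by_ext(root, ext):
    # Like "-r . -ext tif": walk the tree, keep only files with that extension.
    wanted = "." + ext.lower()
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == wanted
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_tree(root)
    glob_only = shell_glob(root)
    everything = recursive_all(root)
    tif_only = recursive_by_ext(root, "tif")

print("*.tif        :", glob_only)
print("-r .         :", everything)
print("-r . -ext tif:", tif_only)
```

Combining `*.tif` with `-r .` amounts to the union of the first two lists, which is why the JPG in the subfolder still shows up.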

Quote:
Next, I've output separate IPTC and XMP CSVs so I could poke around at the available tags. I found that *all* of the XMP fields
SourceFile,XMPToolkit,CreatorWorkEmail,CreatorWorkURL,CaptionWriter,Instructions,TransmissionReference,Marked,WebStatement,UsageTerms,Creator,Title,Rights,Rating

also exist in the IPTC fields, but that the list of IPTC fields also contains another ~50-75 tags which don't exist in XMP.

I think you are confused about the difference between IPTC and XMP here.  The new IPTCCore and IPTCExt tags actually use XMP already.  ExifTool calls this XMP.  What ExifTool calls IPTC is the old IPTC-IIM format information.  Use -G with ExifTool to see where the tags are really stored.  If you really want to maintain synchronization with the old IPTC (IIM), then the "xmp2iptc.args" and "iptc2xmp.args" files in the full distribution may be useful to you.

Quote:
I'm using OpenOffice Calc to manipulate my CSVs. When OO exports the CSV, the fields are comma-delim but also wrapped in double quotes. Are those quotes going to be inserted into the metadata fields as quotes or will they be stripped and only the content between the quotes will be written to the fields?

If the fields are properly quoted, then the quotes will be stripped.
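
A minimal Python sketch of what "properly quoted" means, using the standard `csv` module as a stand-in for any conforming CSV parser (ExifTool's own parser is not shown here). The row contents are hypothetical.

```python
import csv
import io

# Hypothetical row: a path, a caption containing a comma and quotes, and a keyword.
row = ["./a.tif", 'Sunset, "golden hour"', "landscape"]

# Write the row the way a spreadsheet export typically does:
# every field wrapped in double quotes, embedded quotes doubled.
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_ALL).writerow(row)
exported = buf.getvalue()

# Read it back with a standard CSV parser: the wrapping quotes are
# removed and the doubled quotes collapse back to single ones.
parsed = next(csv.reader(io.StringIO(exported)))

print(exported.strip())
print(parsed)
```

The wrapping quotes are part of the file format, not of the field values, so only quotes that were doubled inside a field survive as literal quote characters.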

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

3design

Quote from: Phil Harvey on November 11, 2012, 07:32:45 AM
The *.tif gives you all ".tif" files in the current directory only, and -r . gives you all files in the current directory and sub-directories.  Drop the *.tif and add -ext tif to do what you want.

Hi Phil,
Thanks for clarifying the syntax. I'll give that a try this afternoon and then read up some more on the command line syntax so I can learn what I was doing wrong the first time.

Quote from: Phil Harvey on November 11, 2012, 07:32:45 AM
I think you are confused about the difference between IPTC and XMP here.  The new IPTCCore and IPTCExt tags actually use XMP already.  ExifTool calls this XMP.  What ExifTool calls IPTC is the old IPTC-IIM format information.  Use -G with ExifTool to see where the tags are really stored.  If you really want to maintain synchronization with the old IPTC (IIM), then the "xmp2iptc.args" and "iptc2xmp.args" files in the full distribution may be useful to you.

No doubt I'm confused, but learning. :) A few questions:

When I output metadata using -iptc:all > out.csv and the resulting CSV shows me approx 75 tags, those are the old IPTC-IIM tags only?

When I output metadata using -xmp:all > out.csv the resulting CSV shows me ~15 tags. Am I correct that those 15 tags do not represent the total amount of possible XMP tags, but only the tags that are present in at least 1 image from the set of images I read? If I want to add XMP tags which are not already filled in the files, it's simply a matter of adding a column to the CSV file with the correct IPTC or XMP field name? Does that column need to be in a specific order, or can I simply add it to the end of the columns?

This is the result I got when running this command: exiftool -csv -G -f -r . -ext tif > output.csv

SourceFile,Composite:DateTimeCreated,Composite:DateTimeOriginal,Composite:ImageSize,EXIF:Artist,EXIF:BitsPerSample,EXIF:Compression,
EXIF:ImageHeight,EXIF:ImageWidth,EXIF:PhotometricInterpretation,EXIF:PlanarConfiguration,EXIF:ResolutionUnit,EXIF:RowsPerStrip,EXIF:SamplesPerPixel,
EXIF:Software,EXIF:StripByteCounts,EXIF:StripOffsets,EXIF:XResolution,EXIF:YResolution,ExifTool:ExifToolVersion,File:CurrentIPTCDigest,File:Directory,
File:ExifByteOrder,File:FileAccessDate,File:FileModifyDate,File:FileName,File:FilePermissions,File:FileSize,File:FileType,File:MIMEType,IPTC:ApplicationRecordVersion,
IPTC:By-line,IPTC:CodedCharacterSet,IPTC:Contact,IPTC:CopyrightNotice,IPTC:Credit,IPTC:DateCreated,IPTC:EditStatus,IPTC:Headline,IPTC:ObjectName,
IPTC:OriginalTransmissionReference,IPTC:OriginatingProgram,IPTC:Prefs,IPTC:ProgramVersion,IPTC:ReleaseDate,IPTC:ReleaseTime,IPTC:Source,
IPTC:SpecialInstructions,IPTC:TimeCreated,IPTC:Writer-Editor,XMP:CaptionWriter,XMP:Creator,XMP:CreatorWorkEmail,XMP:CreatorWorkURL,XMP:Instructions,
XMP:Marked,XMP:Rating,XMP:Rights,XMP:Title,XMP:TransmissionReference,XMP:UsageTerms,XMP:WebStatement,XMP:XMPToolkit


So my existing images contain a mixture of tag types. Up to now I was using a combination of xnView and Photo Mechanic to apply tags. Photo Mechanic applies "IPTC/XMP" but now I'm unclear as to whether it's the old IPTC or the new.

Am I correct that if I simply maintain this same column structure / field headers in the CSV, I'll be safe in writing these tags to the images when using the CSV as the source?

Is there some list which shows a translation of XMP tags to IPTC tags? For example, the IPTC ObjectName tag doesn't exist in XMP.. Is there an easy way to find equivalent tag names?

Phil Harvey

Quote from: 3design on November 11, 2012, 12:00:27 PM
When I output metadata using -iptc:all > out.csv and the resulting CSV shows me approx 75 tags, those are the old IPTC-IIM tags only?

Yes.  Also add -a to be sure you get them all (see FAQ 3).

Quote:
When I output metadata using -xmp:all > out.csv the resulting CSV shows me ~15 tags. Am I correct that those 15 tags do not represent the total amount of possible XMP tags, but only the tags that are present in at least 1 image from the set of images I read?

Yes.  If you want to see all XMP tags, do this:

exiftool -list -xmp:all

You will get a list of 884 tag names. (I know, more than you bargained for.)

Quote:
If I want to add XMP tags which are not already filled in the files, it's simply a matter of adding a column to the CSV file with the correct IPTC or XMP field name? Does that column need to be in a specific order, or can I simply add it to the end of the columns?

Any order will do.

Quote:
Am I correct that if I simply maintain this same column structure / field headers in the CSV, I'll be safe in writing these tags to the images when using the CSV as the source?

It depends on what you mean by "safe".  Existing information will be overwritten.  But then ExifTool creates a "_original" backup for you, so in that sense you're always safe.

Quote:
Is there some list which shows a translation of XMP tags to IPTC tags? For example, the IPTC ObjectName tag doesn't exist in XMP.. Is there an easy way to find equivalent tag names?

The IPTCCore specification lists this in detail (I think).  Or you could look at what I am doing in iptc2xmp.args and xmp2iptc.args.

- Phil

3design

Hi Phil,

Thanks for the reply. I'm getting a somewhat better grasp on how this all works, but am having some difficulty with writing a test folder\subfolder from a csv I exported. I did a basic export using:

e:\img_output\INCOMING\TEST\exiftool -csv -G --filename --directory -charset type=UTF8 -r . -ext tif > testoutput.csv

I used the --filename and --directory switches after reading this passage in the documentation (http://www.exiftool.org/exiftool_pod.html):
Quote:
-args (-argFormat)
    Output information in the form of exiftool arguments, suitable for use with the -@ option when writing. May be combined with the -G option to include group names. This feature may be used to effectively copy tags between images, but allows the metadata to be altered by editing the intermediate file (out.args in this example):
        exiftool -args -G1 --filename --directory src.jpg > out.args
        exiftool -@ out.args dst.jpg
    Note: Be careful when copying information with this technique since it is easy to write tags which are normally considered "unsafe". For instance, the FileName and Directory tags are excluded in the example above to avoid renaming and moving the destination file. Also note that the second command above will produce warning messages for any tags which are not writable.

So that gave me a CSV file. I opened it up in a text editor and made some random changes, adding alphabetical and numerical sequences in a bunch of fields, just as a test. I then saved the CSV  and ran the following command:

E:\img_output\INCOMING\TEST>exiftool -csv=testoutput.csv -r e:\img_output\INCOMING\TEST

and I get a long string of:

No SourceFile 'e:/img_output/INCOMING/TEST/test_filename.tif' in imported CSV database

I'm not sure what the error is. I'm just generating a CSV, making a couple edits, and trying to write it straight back to the files. I read a few other threads where people had similar problems, so I tried including the full path, not including, etc etc, and always the same error. Is it normal that the forward slashes and backslashes seem to be inverted? Do I even need to be removing the --filename and --directory as I'm doing?

Phil Harvey

The -csv option is exactly reversible:

exiftool -csv -r . > out.csv

is the inverse of

exiftool -csv=out.csv -r .

If you are in the same directory and specify your file names in the same way then it should work.
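
Why the way the file names are specified matters can be sketched in Python. The sketch assumes the import is a plain string lookup on the SourceFile column; that assumption is made for illustration and fits the "No SourceFile ... in imported CSV database" message above, but it is not taken from ExifTool's source. The CSV contents are hypothetical.

```python
import csv
import io

# Hypothetical CSV, as exported from inside the image folder with "-r ."
exported_csv = "SourceFile,XMP:Title\n./sub/test_filename.tif,Old title\n"


def load_rows(text):
    """Index the CSV rows by their SourceFile string, exactly as written."""
    return {r["SourceFile"]: r for r in csv.DictReader(io.StringIO(text))}


def find_row(rows, path_as_given):
    # Assumption: plain string comparison, with no path normalization.
    return rows.get(path_as_given)


rows = load_rows(exported_csv)

# Same working directory and same "." argument as the export: found.
same_style = find_row(rows, "./sub/test_filename.tif")

# Full path on the import side: same file, different string, not found.
full_path = find_row(rows, "e:/img_output/INCOMING/TEST/sub/test_filename.tif")

print(same_style)
print(full_path)
```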

- Phil

3design

Thanks Phil, I modified the command line as you suggested and it properly wrote my 7 test files in 3 subdirectories.

Is it possible to specify a filename string using wildcards when reading to csv and writing from csv? For example, if I want to restrict the read/write to a subset of the tif files. Something like this: "constant_string* 01*.tif"
Notice the double wildcard and the space within the name.

Also, the cataloging that I'm working on requires a lot of custom csv fields for my own internal use (i.e. fields which are not necessarily image related in the obvious sense, but pertain more so to categories, merchant ID#'s, prices, related images, etc etc). I'd like to be able to keep these custom fields in the same csv file, using my own tags (i.e. internal_cat, internal_merchID, internal_price, etc etc) but I don't want to run the risk of having those fields written to my images. They're just for my own administrative use in the CSV file. I'll admit once more to my lack of understanding of IPTC/XMP tagging in general and ask, is it possible for completely custom fields to be written to images? How do I avoid doing so, if I'm writing the entire CSV to my directories of images?

Phil Harvey

Quote from: 3design on November 11, 2012, 10:20:41 PM
Is it possible to specify a filename string using wildcards when reading to csv and writing from csv? For example, if I want to restrict the read/write to a subset of the tif files. Something like this: "constant_string* 01*.tif"
Notice the double wildcard and the space within the name.

The shell globbing isn't powerful enough to handle this sort of thing in a directory hierarchy.  You can use ExifTool's -if option to do this, but there is a performance penalty because it will still read the metadata from all files:

exiftool -if '$filename =~ /^constant_string.*01.*\.tif$/' ...

Quote:
Also, the cataloging that I'm working on requires a lot of custom csv fields for my own internal use (i.e. fields which are not necessarily image related in the obvious sense, but pertain more so to categories, merchant ID#'s, prices, related images, etc etc). I'd like to be able to keep these custom fields in the same csv file, using my own tags (i.e. internal_cat, internal_merchID, internal_price, etc etc) but I don't want to run the risk of having those fields written to my images. They're just for my own administrative use in the CSV file. I'll admit once more to my lack of understanding of IPTC/XMP tagging in general and ask, is it possible for completely custom fields to be written to images? How do I avoid doing so, if I'm writing the entire CSV to my directories of images?

To write custom tags you need to first define them as a user-defined tag.  But if you get unlucky your tags could have the same name as existing writable tags.  To avoid this, just use a bogus group name for each of these tags (ie. "MyGroup:MyTag").  Then you can guarantee that they won't be written.
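
One way to apply the bogus-group idea to an existing CSV is to rename the administrative columns in the header before importing. This is a minimal Python sketch; the helper name `prefix_internal_columns`, the sample columns, and the sample row are hypothetical, and "MyGroup" is simply the placeholder group name from the reply above.

```python
import csv
import io

BOGUS_GROUP = "MyGroup"  # placeholder group name, not a real metadata group
INTERNAL = {"internal_cat", "internal_price"}  # hypothetical admin-only columns


def prefix_internal_columns(csv_text, internal=INTERNAL, group=BOGUS_GROUP):
    """Return the CSV text with 'group:' prepended to each internal column name."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header = [f"{group}:{name}" if name in internal else name for name in rows[0]]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)       # renamed header
    writer.writerows(rows[1:])    # data rows are left untouched
    return out.getvalue()


sample = (
    "SourceFile,XMP:Title,internal_cat,internal_price\n"
    "./a.tif,Sunset,landscape,12.50\n"
)
result = prefix_internal_columns(sample)
print(result)
```

Only the header row changes, so the spreadsheet stays usable for administrative work while the renamed columns no longer share a name with any writable tag.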

- Phil

3design

Quote from: Phil Harvey on November 12, 2012, 07:10:19 AM
The shell globbing isn't powerful enough to handle this sort of thing in a directory hierarchy.  You can use ExifTool's -if option to do this, but there is a performance penalty because it will still read the metadata from all files:
exiftool -if '$filename =~ /^constant_string.*01.*\.tif$/' ...

Is this only able to be run from the perl library version? I'm currently using the windows standalone and I must have tried 2 dozen variants of the command. I get "File not found: ~"

This is a more exact example of a filename structure I want to isolate:
constant1 constant2 - category sub category ABC123.tif

In this case, the terms "constant1 constant2" will be constant across hundreds of directories.. everything from the hyphen onwards will be variable except for the TIF extension.

This is the command I'm executing:
exiftool -if '$filename =~ /^constant1.constant2.*\.tif$/' -csv -G --filename --directory -r . -ext tif > test.csv

I've tried single quotes, double quotes, including only the first constant, removing the extension... everything I could think of.

Phil Harvey

Quote from: 3design on November 12, 2012, 09:36:37 AM
Is this only able to be run from the perl library version? I'm currently using the windows standalone and I must have tried 2 dozen variants of the command. I get "File not found: ~"

Sorry.  In Windows you must use double quotes, not single.  You say you've tried them, but it really should work like this in Windows:

exiftool  -if "$filename =~ /^constant_string.*01.*\.tif$/" ...

- Phil

3design

Hmm.. this definitely isn't working. I've lost count of how many variations I've tried now.

Also, as a test, I shortened the filename to just AAA.tif and it still doesn't find it, so I think it has to be something with that syntax in general.


Phil Harvey

I'll see if I can try this in Windows if I get a chance.

- Phil

3design

Thanks! I also tried it via the perl version under strawberry perl and it's the same result.
(Also encountered an error in the second step of the 'make' procedure when attempting to install the Image::ExifTool package)
This is the error:
D:\utility\EXIFtool>make
to undefined at D:/utility/StrawberryPerl/strawberry-perl-5.14.2.1-64bit-portable/perl/lib/ExtUtils/Install.pm line 1208
make: *** [pm_to_blib] Error 2

A quick question re custom tags and config file... Theoretically, if my tags in the CSV were some completely crazy string (i.e. 4398ew650e43e6re0:gibberish_image_customtag) then I wouldn't necessarily even need a config file, right? In other words, if the group name is something completely unique, like a trademarked name or a phonetic spelling from another language (just as examples), and the field name likewise, then the chances of that tag actually existing in any group are for all intents and purposes zero, and it would not get written anyway... So in that situation, is a config file unnecessary?

Phil Harvey

Re the "make" problem:  You can run exiftool directly.  It doesn't need to be built.

You only need the config file if you want to write custom tags.  Since you don't want to write them, you don't need a config file.  I just mentioned the config file earlier to try to make this point.

- Phil

Phil Harvey

Quote from: Phil Harvey on November 12, 2012, 03:03:25 PM
I'll see if I can try this in Windows if I get a chance.

Yes, it doesn't work in Windows.  When I add -v I see that the problem is "Search pattern not terminated".

By trial and error, it seems that doubling the "$" fixes this:

exiftool  -if "$filename =~ /^constant_string.*01.*\.tif$$/" ...

I tried this and it is also a problem on the Mac.  Ah, right.  ExifTool translates the "$/" to a newline, which messes up the search expression.  But "$$" is translated to a single "$".  Sorry for not realizing this sooner.
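
The escaping rule can be sketched in Python. The helper `escape_for_if` is hypothetical and handles only the `$/` case described here; Python's `re` stands in for Perl's regex engine, which behaves the same way for a pattern this simple. The file names are made up, apart from the one taken from the example earlier in the thread.

```python
import re


def escape_for_if(expression):
    """Double a '$' that directly precedes '/', so '$/' is not turned into a newline."""
    return expression.replace("$/", "$$/")


# The pattern itself, with a normal end-of-string anchor.
pattern = r"^constant1.constant2.*\.tif$"

# The -if argument, with the trailing '$' doubled before the closing '/'.
if_expression = escape_for_if("$filename =~ /" + pattern + "/")

# Check which file names the underlying pattern selects.
names = [
    "constant1 constant2 - category sub category ABC123.tif",
    "other ABC123.tif",
    "constant1 constant2 - x.jpg",
]
matches = [n for n in names if re.search(pattern, n)]

print(if_expression)
print(matches)
```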

- Phil