Support for Darwin Core XMP Metadata?

Started by Mac2, September 27, 2012, 09:01:19 AM


Phil Harvey

Hi Frank,

I've got the DarwinCore module added to my working version now (OK, so I was a bit quicker than a couple of days), and have been spending some time trying to tame the wild tag names to make them a bit more user-friendly.  By default, ExifTool generates the flattened tag names by combining the structure name with the field name, but in some cases this gives unwieldy names like ResourceRelationshipRelationshipEstablishedDate, so I have tweaked some of the names by hand to shorten them, and some to avoid conflicts with other ExifTool tags.
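The default naming rule described above can be sketched in a few lines of Python. This is only an illustration of the rule, not ExifTool's actual code (ExifTool is written in Perl), and the function name `flatten_name` is hypothetical.

```python
def flatten_name(struct_name: str, field_name: str) -> str:
    """Join a structure name and a field name, capitalizing the field's first letter."""
    return struct_name + field_name[:1].upper() + field_name[1:]


# The unwieldy example from the post above:
print(flatten_name("ResourceRelationship", "relationshipEstablishedDate"))
# prints: ResourceRelationshipRelationshipEstablishedDate
```

Names like this one are the ones that were shortened by hand; the sketch covers only the automatic part.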

I did discover one error in the sample you sent:  "occurenceRemarks" was misspelt, and should have been "occurrenceRemarks".  I found this error by chance, and haven't attempted to validate all of the property names.

Anyway, here is the ExifTool Darwin Core tag name documentation as it currently stands.  Let me know if you are unhappy at all with the liberties I have taken with the tag names.  To be clear: The ExifTool tag names are NOT what is written to the XMP file -- they are just the names that ExifTool uses when reading/writing this information.  (You may want to read the preamble to the XMP Tags documentation to understand more about how this documentation is organized.)

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Frank Bungartz

Hi Phil,
wow - that was quick!
Now, I will actually probably need a few days to check thoroughly. I had a quick glance but would like to make sure there are no major glitches ;-)
From what I did see, I do have one question:
You mentioned before that you configured most fields to be strings and few to be dates or time fields. When I put together the XMP using IDImager, I believe I also used string for almost all fields. There are some fields, however, where it makes sense to permit entry of several language options. An example is "dwc:taxon:vernacularName". A vernacular name of an animal could be "Eagle", "Aguila", "Adler" - each language has its own vernacular name of an animal. So, the field type is still string, but you have different alternatives that you can enter. In Idimager the property type would be "alternatives", and the data type "string".
There are a few other fields like that. I believe I sent you a summary of how the fields were defined?
Since I am no expert but simply used IDImager to set this up, I am not sure how important these details are... Do you need me to go through those field types too?

>>> I did discover one error in the sample you sent:  "occurenceRemarks" was misspelt, and should have been "occurrenceRemarks"

Good catch. I do hope there are no other spelling errors like that.

Will try to get back to you quickly...

Thanks so much,
Frank

Frank Bungartz

PS: One more thing - IDI basically distinguishes these three:

Property type
Data type
Edit type

Most fields I have defined as
Property type: variable
Data type: string
Edit type: default

Multi-language fields:
Property type: variable
Data type: string
Edit type: alternative

And very few fields are number fields, e.g. dwc:Event:year
Property type: variable
Data type: integer
Edit type: default

Fields that sometimes need fairly long text entries, like your "occurrenceRemarks", would however be defined as:
Property type: variable
Data type: string
Edit type: memo

The dates would be
Property type: variable
Data type: date
Edit type: default




Phil Harvey

Hi Frank,

Quote from: Frank Bungartz on January 29, 2013, 11:17:57 AM
There are some fields, however, where it makes sense to permit entry of several language options. An example is "dwc:taxon:vernacularName".

Yes.  In ExifTool these are denoted by "lang-alt" in the Writable column.
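For readers unfamiliar with the term, a lang-alt value is stored in XMP as an rdf:Alt list whose items carry an xml:lang attribute. The Python sketch below builds such a list for the vernacular names quoted above. It is a simplified illustration only: the helper `lang_alt` is hypothetical, the `DWC` namespace URI is an assumption, the language codes are chosen for the example, and the enclosing Taxon structure is left out.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML = "http://www.w3.org/XML/1998/namespace"
DWC = "http://rs.tdwg.org/dwc/index.htm"  # assumed namespace URI for the dwc prefix

ET.register_namespace("rdf", RDF)
ET.register_namespace("dwc", DWC)


def lang_alt(tag: str, values: dict) -> ET.Element:
    """Build <dwc:tag><rdf:Alt><rdf:li xml:lang=...>...</rdf:li>...</rdf:Alt></dwc:tag>."""
    prop = ET.Element(f"{{{DWC}}}{tag}")
    alt = ET.SubElement(prop, f"{{{RDF}}}Alt")
    for lang, text in values.items():
        # One rdf:li item per language alternative
        item = ET.SubElement(alt, f"{{{RDF}}}li", {f"{{{XML}}}lang": lang})
        item.text = text
    return prop


# Language codes here are illustrative; "x-default" marks the default alternative.
elem = lang_alt("vernacularName", {"x-default": "Eagle", "es": "Aguila", "de": "Adler"})
print(ET.tostring(elem, encoding="unicode"))
```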

All of these types are explained in the preamble to the XMP tags documentation as I mentioned.  I have already assigned date, integer and lang-alt types for all of the tags I thought needed them.  There is no distinction between a long and short string as you had in IDI.  In ExifTool the structures themselves ("struct" type) are also accessible, but in general it is usually easiest to interface with the flattened tags.
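As a rough summary of how the IDI field definitions listed earlier line up with the ExifTool Writable types mentioned here, the following Python sketch may help. The function `exiftool_writable` is hypothetical, and the correspondence is only what this thread implies, not an official conversion table.

```python
def exiftool_writable(data_type: str, edit_type: str) -> str:
    """Map an IDI (data type, edit type) pair to an ExifTool Writable type."""
    if edit_type == "alternative":
        # Multi-language fields become language alternatives
        return "lang-alt"
    if data_type in ("integer", "date"):
        return data_type
    # "default" and "memo" strings are not distinguished in ExifTool
    return "string"


for data_type, edit_type in [
    ("string", "default"),
    ("string", "alternative"),
    ("integer", "default"),
    ("string", "memo"),
    ("date", "default"),
]:
    print(data_type, edit_type, "->", exiftool_writable(data_type, edit_type))
```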

- Phil

Frank Bungartz

Hi Phil,
I started to look at this now.
So, let me see if I understand this correctly. Your automatically generated tag name would be "ResourceRelationshipRelationshipRemarks" because the structure tag name and the actual XMP tag get combined. That means, for the XMP tag "RelationshipRemarks" that belongs to "ResourceRelationship", you get this unwieldy name and therefore suggest to shorten it to "RelationshipRemarks" as the name displayed in ExifTool, though not changing how this is written as XMP, right?
I think that should be no problem, even for most things where there is no repetition, so "HighestBiostratigraphicZone" for "GeologicalContextHighestBiostratigraphicZone" should be alright too. I think anyone who would use the Darwin Core should understand "biostratigraphic" is a geological term.

What I do not understand though:
Sometimes you have not listed that repetition of the structure tag name in the Notes column.
For example:
EventTime    date_    (EventEventTime)
but
EventFieldNotes    string_    

and NOT
EventFieldNotes    string_     (EventEventFieldNotes)

OR

OccurrenceLifeStage    string_    
OccurrenceDetails    string_    (OccurrenceOccurrenceDetails)

Any reason why?

Thanks,
Frank

Phil Harvey

Hi Frank,

Quote from: Frank Bungartz on January 30, 2013, 06:46:49 PM
therefore suggest to shorten it to "RelationshipRemarks" as the name displayed in ExifTool, though not changing how this is written as XMP, right?

Correct.

Quote
Sometimes you have not listed that repetition of the structure tag name in the Notes column.

Right.  The combined name is only listed in the Notes if it differs from the Tag Name.

Quote
EventFieldNotes    string_

and NOT
EventFieldNotes    string_     (EventEventFieldNotes)

This is the "Event" structure "fieldNotes" field.  So there is no duplicate "Event" as there would be for the "eventTime" field for example.  Take a look at the field names in your structures:  Sometimes the structure name appears in the field name, and sometimes it doesn't.
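The Notes-column rule can be sketched in Python as follows. This is an illustration only, not how the ExifTool documentation is actually generated; the helpers `flatten_name` and `notes_entry` are hypothetical, and the field names "lifeStage" and "occurrenceDetails" are inferred from the tag names quoted in this thread.

```python
def flatten_name(struct_name: str, field_name: str) -> str:
    """Default flattened name: structure name + field name with its first letter capitalized."""
    return struct_name + field_name[:1].upper() + field_name[1:]


def notes_entry(tag_name: str, struct_name: str, field_name: str) -> str:
    """Return the combined name in parentheses only if it differs from the tag name."""
    combined = flatten_name(struct_name, field_name)
    return f"({combined})" if combined != tag_name else ""


# (ExifTool tag name, structure, field) for the four tags discussed above
tags = [
    ("EventTime", "Event", "eventTime"),
    ("EventFieldNotes", "Event", "fieldNotes"),
    ("OccurrenceLifeStage", "Occurrence", "lifeStage"),
    ("OccurrenceDetails", "Occurrence", "occurrenceDetails"),
]
for tag_name, struct_name, field_name in tags:
    print(tag_name, notes_entry(tag_name, struct_name, field_name))
```

Only EventTime and OccurrenceDetails get a Notes entry here, because their field names already start with the structure name and the tag names were shortened by hand.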

- Phil

Frank Bungartz

Hi Phil,
OK, I got it.
I think the list of XMP tags looks complete and I do think that for your naming conventions it is not a problem that the structure tags are not repeated by ExifTool. I have read through the list twice now and not caught anything that looks out of place.
One problem, of course, is that I am not a native speaker, so glitches like "occurenceRemarks" instead of "occurrenceRemarks" are easy for me to miss. I really do hope I have not overlooked anything else; there are so many fields. It also does not help that I have now read this stuff so many times that, even if some spelling errors were there, I probably would not catch them. You tend to overlook glitches in things that you have read often and are familiar with.
Still, since most of that stuff was copied from the Darwin Core website when I originally put it together for IDimager, I am quite confident this is ready to go now...
Sorry for the delay...
And THANKS again!!! I really do hope the DwC XMP will reach more followers now that ExifTool documents it.
Cheers,
Frank

PS: when is the new version with that documentation coming out? (Just asking because iMatch apparently supports XMP through ExifTool and thus, if ExifTool officially supports DwC XMP, iMatch should be able to read it - and I would really like to see if perhaps I could use iMatch as an alternative to IDimager, now that IDimager is a dead end...)

Phil Harvey

Hi Frank,

Great, thanks for going over this.

Quote from: Frank Bungartz on February 02, 2013, 10:50:09 PM
PS: when is the new version with that documentation coming out?

It is available now.

- Phil