Support for Darwin Core XMP Metadata?

Started by Mac2, September 27, 2012, 09:01:19 AM

Mac2

The Darwin Core is an XML schema used in science world-wide.

http://rs.tdwg.org/dwc/

and

http://rs.tdwg.org/dwc/terms/guides/xml/index.htm for an XML schema.

Apparently so far IDImager was the only software which supported this namespace in XMP as well. Since IDImager has been recently discontinued, there is now a real need to support XMP-dwc in other applications.

ExifTool can extract the Darwin Core data from XMP like any other unknown XMP data, but does not include it in the -listx schema output. My application uses the -listx output to set up reference tables in the database, and cannot work with or store tags "unknown" to ExifTool, even if ExifTool can extract the data from XMP as text. I need to integrate this somehow.

How could I make this namespace known to ExifTool so that ExifTool supports it on the same level as, for example, XMP-dc or XMP-photoshop?
What information would I need to request from Darwin Core people to add support for it to ExifTool?

I know I can declare "custom namespaces" (somehow, did not do that yet) but this is not an application-specific namespace so maybe we should integrate it tighter?
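
To make the -listx dependency concrete, here is a minimal Python sketch of how an application might turn that output into reference-table rows. The embedded XML is a hand-written excerpt that assumes the usual -listx layout (a taginfo root, table elements carrying g0/g1/g2 group attributes, and tag elements with name, type and writable attributes); the function name reference_rows is hypothetical and is not part of ExifTool.

```python
import xml.etree.ElementTree as ET

# Hand-written excerpt in the assumed style of `exiftool -listx` output.
SAMPLE_LISTX = """<taginfo>
<table name='XMP::dc' g0='XMP' g1='XMP-dc' g2='Other'>
 <tag id='title' name='Title' type='lang-alt' writable='true'>
  <desc lang='en'>Title</desc>
 </tag>
 <tag id='format' name='Format' type='string' writable='true' g2='Image'>
  <desc lang='en'>Format</desc>
 </tag>
</table>
</taginfo>
"""


def reference_rows(listx_xml):
    """Return (family 1 group, tag name, type, writable) for every <tag> element."""
    root = ET.fromstring(listx_xml)
    rows = []
    for table in root.iter("table"):
        for tag in table.iter("tag"):
            # A tag may override the table's family 1 group; otherwise inherit it.
            group1 = tag.get("g1", table.get("g1"))
            rows.append((group1, tag.get("name"), tag.get("type"),
                         tag.get("writable") == "true"))
    return rows


for row in reference_rows(SAMPLE_LISTX):
    print(row)
```

A namespace that ExifTool does not know about never shows up as a table in this output, which is why no rows for XMP-dwc can be generated this way.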


Phil Harvey

Adding this schema is certainly a possibility.  What I need is an XMP sample containing all of the available properties.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Mac2

Hi, Phil

this is also new to me. I have one sample image with a subset of the data in an XMP namespace.
I will try to get the full spec and more info and then come back to you.

Frank Bungartz

Dear Phil,
I have actually used the software IDImager to create that custom DarwinCore XMP namespace.
Photo Supreme (now marketed as the replacement for IDImager) can apparently still read the XMP but no longer permits customization or modifications.
With IDImager Pro now being discontinued I am quite desperate to find alternatives that are future-proof, supporting the Darwin Core XMP (quite unfortunately all information about the DarwinCore XMP was also even removed from the IDImager user forum and their WikiSite has been deleted !!! Not sure if that was intentional, but it is all the more reason for me to try finding alternatives...).

Thus I am highly interested in other software that at least reads, or better yet writes, that schema. I have been in touch with the developer of iMatch and he mentioned that he is using ExifTool for reading XMP. He even suggested he could get in touch with you and I sent him quite a bit of information about the DarwinCore. Not sure if you ever got this...
Attached that info...
One problem with the DarwinCore: it is very extensive, many, many fields. At the research station where I work we have essentially only used a subset of it, so I can send you images with the fields that we are regularly using, but it would take quite a bit of effort to create a photo that makes use of every single field. Still, if that is what is required, I am more than happy to work on this and send you an example...

Cheers,
Frank


Phil Harvey

Hi Frank,

I'll read your documents tomorrow.

Even without a full set of pre-defined Darwin core tags, ExifTool is future proof in this respect.  It can read any XMP as-is, and you only need to create user-defined tags for any undefined XMP tags that you want to write.

- Phil

Phil Harvey

Hi Frank,

Thanks for the documents, Mario didn't pass them along to me.

I am more than happy to create an ExifTool config file to allow writing of these properties.  I will need at minimum a small example with a cross-section of the properties to verify that I have the correct general formatting.  But a sample with all properties would be preferable, and would allow validation of all DarwinCore tags.

- Phil

Frank Bungartz

Hi Phil,
that's great. OK - I will work on generating a photo that has content for all XMP fields then and send you that.
(It will take a bit because I will need to make sure I add all fields - quite a few...)
Frank

Frank Bungartz

Hi Phil,
sorry for the long delay. Here comes an example with all DarwinCore XMP fields filled in. Essentially all nonsense data, i.e., the examples that are given on the Darwin Core website about what kind of data the fields should contain.
Let me know if you need anything more.
Thanks,
Frank

Phil Harvey

Hi Frank,

Here is the config file that gives you the ability to write these Darwin Core tags (I have also added it as an attachment):

#------------------------------------------------------------------------------
# File:         ExifTool_dwc.config  -->  ~/.ExifTool_config
#
# Description:  ExifTool configuration file to add Darwin Core XMP tags
#
# Usage:        exiftool -config ExifTool_dwc.config ...
#
# Revisions:    2013/01/28 - P. Harvey Created
#
# References:   http://rs.tdwg.org/dwc/index.htm
#------------------------------------------------------------------------------
use Image::ExifTool::XMP;

%Image::ExifTool::UserDefined = (
    'Image::ExifTool::XMP::Main' => {
        dwc => {
            SubDirectory => { TagTable => 'Image::ExifTool::UserDefined::dwc' },
        },
    },
);

%Image::ExifTool::UserDefined::dwc = (
    GROUPS        => { 0 => 'XMP', 1 => 'XMP-dwc', 2 => 'Other' },
    NAMESPACE     => { 'dwc' => 'http://rs.tdwg.org/dwc/index.htm' },
    WRITABLE      => 'string',
    Event => {
        Struct => {
            day                 => { Writable => 'integer' },
            earliestDate        => { %Image::ExifTool::XMP::dateTimeInfo },
            endDayOfYear        => { Writable => 'integer' },
            eventID             => { },
            eventRemarks        => { Writable => 'lang-alt' },
            eventTime           => { %Image::ExifTool::XMP::dateTimeInfo },
            fieldNotes          => { },
            fieldNumber         => { },
            habitat             => { },
            latestDate          => { %Image::ExifTool::XMP::dateTimeInfo },
            month               => { Writable => 'integer' },
            samplingEffort      => { },
            samplingProtocol    => { },
            startDayOfYear      => { Writable => 'integer' },
            verbatimEventDate   => { },
            year                => { Writable => 'integer' },
        },
    },
    GeologicalContext => {
        Struct => {
            bed                         => { },
            earliestAgeOrLowestStage    => { },
            earliestEonOrLowestEonothem => { },
            earliestEpochOrLowestSeries => { },
            earliestEraOrLowestErathem  => { },
            earliestPeriodOrLowestSystem=> { },
            formation                   => { },
            geologicalContextID         => { },
            group                       => { },
            highestBiostratigraphicZone => { },
            latestAgeOrHighestStage     => { },
            latestEonOrHighestEonothem  => { },
            latestEpochOrHighestSeries  => { },
            latestPeriodOrHighestSystem => { },
            lithostratigraphicTerms     => { },
            lowestBiostratigraphicZone  => { },
            member                      => { },
        },
    },
    Identification => {
        Struct => {
            dateIdentified              => { %Image::ExifTool::XMP::dateTimeInfo },
            identificationID            => { },
            identificationQualifier     => { },
            identificationReferences    => { },
            identificationRemarks       => { },
            identifiedBy                => { },
            typeStatus                  => { },
        },
    },
    MeasurementOrFact => {
        Struct => {
            measurementAccuracy         => { Format => 'real' },
            measurementDeterminedBy     => { },
            measurementDeterminedDate   => { %Image::ExifTool::XMP::dateTimeInfo },
            measurementID               => { },
            measurementMethod           => { },
            measurementRemarks          => { },
            measurementType             => { },
            measurementUnit             => { },
            measurementValue            => { },
        },
    },
    Occurrence => {
        Struct => {
            associatedMedia             => { },
            associatedOccurrences       => { },
            associatedReferences        => { },
            associatedSequences         => { },
            associatedTaxa              => { },
            behavior                    => { },
            catalogNumber               => { },
            disposition                 => { },
            establishmentMeans          => { },
            individualCount             => { },
            individualID                => { },
            lifeStage                   => { },
            occurenceRemarks            => { },
            occurrenceDetails           => { },
            occurrenceID                => { },
            occurrenceStatus            => { },
            otherCatalogNumbers         => { },
            preparations                => { },
            previousIdentifications     => { },
            recordNumber                => { },
            recordedBy                  => { },
            reproductiveCondition       => { },
            sex                         => { },
        },
    },
    Record => {
        Struct => {
            basisOfRecord               => { },
            collectionCode              => { },
            collectionID                => { },
            dataGeneralizations         => { },
            datasetID                   => { },
            datasetName                 => { },
            dynamicProperties           => { },
            informationWithheld         => { },
            institutionCode             => { },
            institutionID               => { },
            ownerInstitutionCode        => { },
        },
    },
    ResourceRelationship => {
        Struct => {
            relatedResourceID           => { },
            relationshipAccordingTo     => { },
            relationshipEstablishedDate => { %Image::ExifTool::XMP::dateTimeInfo },
            relationshipOfResource      => { },
            relationshipRemarks         => { },
            resourceID                  => { },
            resourceRelationshipID      => { },
        },
    },
    Taxon => {
        Struct => {
            acceptedNameUsage           => { },
            acceptedNameUsageID         => { },
            class                       => { },
            family                      => { },
            genus                       => { },
            higherClassification        => { },
            infraspecificEpithet        => { },
            kingdom                     => { },
            nameAccordingTo             => { },
            nameAccordingToID           => { },
            namePublishedIn             => { },
            namePublishedInID           => { },
            nomenclaturalCode           => { },
            nomenclaturalStatus         => { },
            order                       => { },
            originalNameUsage           => { },
            parentNameUsage             => { },
            parentNameUsageID           => { },
            phylum                      => { },
            scientificName              => { },
            scientificNameAuthorship    => { },
            scientificNameID            => { },
            specificEpithet             => { },
            subgenus                    => { },
            taxonConceptID              => { },
            taxonID                     => { },
            taxonRank                   => { },
            taxonRemarks                => { },
            taxonomicStatus             => { },
            verbatimTaxonRank           => { },
            vernacularName              => { Writable => 'lang-alt' },
        },
    },
    dctermsLocation => {
        Struct => {
            continent                   => { Groups => { 2 => 'Location' } },
            coordinatePrecision         => { Groups => { 2 => 'Location' }  },
            coordinateUncertaintyInMeters => { Groups => { 2 => 'Location' }  },
            country                     => { Groups => { 2 => 'Location' }  },
            countryCode                 => { Groups => { 2 => 'Location' }  },
            county                      => { Groups => { 2 => 'Location' }  },
            decimalLatitude             => { Groups => { 2 => 'Location' }  },
            decimalLongitude            => { Groups => { 2 => 'Location' }  },
            footprintSRS                => { Groups => { 2 => 'Location' }  },
            footprintSpatialFit         => { Groups => { 2 => 'Location' }  },
            footprintWKT                => { Groups => { 2 => 'Location' }  },
            geodeticDatum               => { Groups => { 2 => 'Location' }  },
            georeferenceProtocol        => { Groups => { 2 => 'Location' }  },
            georeferenceRemarks         => { Groups => { 2 => 'Location' }  },
            georeferenceSources         => { Groups => { 2 => 'Location' }  },
            georeferenceVerificationStatus => { Groups => { 2 => 'Location' }  },
            georeferencedBy             => { Groups => { 2 => 'Location' }  },
            higherGeography             => { Groups => { 2 => 'Location' }  },
            higherGeographyID           => { Groups => { 2 => 'Location' }  },
            island                      => { Groups => { 2 => 'Location' }  },
            islandGroup                 => { Groups => { 2 => 'Location' }  },
            locality                    => { Groups => { 2 => 'Location' }  },
            locationAccordingTo         => { Groups => { 2 => 'Location' }  },
            locationID                  => { Groups => { 2 => 'Location' }  },
            locationRemarks             => { Groups => { 2 => 'Location' }  },
            maximumDepthInMeters        => { Groups => { 2 => 'Location' }  },
            maximumDistanceAboveSurfaceInMeters => { Groups => { 2 => 'Location' }  },
            maximumElevationInMeters    => { Groups => { 2 => 'Location' }  },
            minimumDepthInMeters        => { Groups => { 2 => 'Location' }  },
            minimumDistanceAboveSurfaceInMeters => { Groups => { 2 => 'Location' }  },
            minimumElevationInMeters    => { Groups => { 2 => 'Location' }  },
            municipality                => { Groups => { 2 => 'Location' }  },
            pointRadiusSpatialFit       => { Groups => { 2 => 'Location' }  },
            stateProvince               => { Groups => { 2 => 'Location' }  },
            verbatimCoordinateSystem    => { Groups => { 2 => 'Location' }  },
            verbatimCoordinates         => { Groups => { 2 => 'Location' }  },
            verbatimDepth               => { Groups => { 2 => 'Location' }  },
            verbatimElevation           => { Groups => { 2 => 'Location' }  },
            verbatimLatitude            => { Groups => { 2 => 'Location' }  },
            verbatimLocality            => { Groups => { 2 => 'Location' }  },
            verbatimLongitude           => { Groups => { 2 => 'Location' }  },
            verbatimSRS                 => { Groups => { 2 => 'Location' }  },
            waterBody                   => { Groups => { 2 => 'Location' }  },
        },
    },
);

1;  #end


As an example, you can use this command to generate an output XMP containing all DWC tags:

exiftool -config ExifTool_dwc.config -xmp-dwc:all=2013 out.xmp

Note that I have left most of the tags as open text format, and have only restricted the formatting of a few tags (i.e. the date-format tags).
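
For anyone calling ExifTool from their own application, the following Python sketch builds a command line that writes individual Darwin Core values with this config file. It assumes that ExifTool generates flattened tag names for the structure fields (structure name plus field name, e.g. TaxonScientificName and EventYear); the helper build_write_command, the file name specimen.jpg and the values are placeholders for illustration only.

```python
def build_write_command(image, values, config="ExifTool_dwc.config"):
    """Build an exiftool argument list that writes XMP-dwc tags to one image.

    The -config option is placed first because ExifTool requires it to be
    the first option on the command line.
    """
    cmd = ["exiftool", "-config", config]
    for tag, value in values.items():
        # Assumed flattened tag names, e.g. Taxon + scientificName.
        cmd.append(f"-XMP-dwc:{tag}={value}")
    cmd.append(image)
    return cmd


# Placeholder values; pass the list to subprocess.run() to execute it.
example = build_write_command(
    "specimen.jpg",
    {"TaxonScientificName": "Ctenomys sociabilis", "EventYear": 2013},
)
print(example)
```

Passing the arguments as a list avoids shell quoting problems with values that contain spaces.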

Just one question:  Is the namespace URI really "http://rs.tdwg.org/dwc/index.htm"?  The normal convention would be to leave off the "index.htm".

- Phil

Frank Bungartz

Hi Phil,
Thanks a lot. Excuse my ignorance: Does that mean that people can only use the DWC if they load that config file into ExifTool or will it be part of future ExifTool releases?

>>> Just one question:  Is the namespace URI really "http://rs.tdwg.org/dwc/index.htm"?  The normal convention would be to leave off the "index.htm".

I must admit that I am not too familiar with the conventions. When I used IDImager to define the XMP for the DarwinCore I assumed that the full URL would be the URI. If the convention is to use "http://rs.tdwg.org/dwc/", I guess it had better be changed? However, I already have tons of images with DarwinCore XMP embedded and it would be near impossible to re-write that XMP to all those images...
I am really no expert in all this, but I would definitely prefer not to have to re-write those metadata.

Frank

Phil Harvey

Hi Frank,

Currently this means that people can only use DWC if they download and apply this config file.  I am open to including this in a future ExifTool release if there is enough demand, either by including the config file with the distribution, or by building definitions into the XMP module.  Maybe a significant benefit of the latter is that your tags would then be documented in the ExifTool XMP tag name documentation, which would give the Darwin Core namespace broader exposure.

But for now, I think it is appropriate to take this one step at a time.

If the config file is activated by renaming it to ".ExifTool_config" and placing it in your home directory, or the directory of the exiftool application, or the current directory, then you don't need to use the -config option on the command line.
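
As a rough illustration of the home-directory variant, this Python sketch copies a downloaded config file to ".ExifTool_config". The function name activate_config and its home parameter are hypothetical; the sketch refuses to overwrite an existing config, on the assumption that any existing user definitions should be merged by hand.

```python
import shutil
from pathlib import Path


def activate_config(config_path, home=None):
    """Copy a config file to <home>/.ExifTool_config and return the new path."""
    home_dir = Path(home) if home is not None else Path.home()
    dest = home_dir / ".ExifTool_config"
    if dest.exists():
        # Do not clobber definitions the user may already have.
        raise FileExistsError(f"{dest} already exists; merge the definitions by hand")
    shutil.copyfile(config_path, dest)
    return dest


# Example (not run here): activate_config("ExifTool_dwc.config")
```

After this, the -config option can be dropped from the command line.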

About the URI:  Definitely do not change it if this is what you have used already.  I just wanted to be sure that I had it right.

- Phil

Frank Bungartz

Hi Phil,

>>>Currently this means that people can only use DWC if they download and apply this config file.  I am open to including this in a future ExifTool release if there is enough demand, ...

The demand for the DwC will never be huge. By its very nature it is geared mostly towards specialists, i.e., scientists who work with collection data and photos of specimens in natural history collections. I strongly believe in XMP and think that it is a huge advantage if this kind of metadata can be embedded as XMP inside image files. That is why I am pushing this a bit. I am actually quite surprised that hardly anyone has yet discovered the advantage of embedding Darwin Core metadata into image files. Huge projects that manage tons of images of species (e.g., Encyclopedia of Life) use things like Flickr tags to store this kind of information...

This is a bit of a "vicious circle". If you say it should only be included "if there is enough demand", big projects like the Encyclopedia of Life, GBIF, Symbiota, etc. likely will continue to ignore DwC XMP. I think that would be really quite a shame, given its huge potential.

>>>...either by including the config file with the distribution, or by building definitions into the XMP module.  Maybe a significant benefit of the latter is that your tags would then be documented in the ExifTool XMP tag name documentation, which would give your namespace broader exposure.

I think building the namespace definitions into the XMP module would be fantastic. Like I said: I believe DwC XMP has a huge potential to become a de facto standard for image management of specimens, yet it won't likely be adopted, if there are no tools out there that support it - and ExifTool is THE most widely distributed XMP tool around...

Cheers,
Frank

Phil Harvey

Hi Frank,

Understood.

The barrier to inclusion in ExifTool is really a performance issue.  ExifTool is pure Perl, so the modules are compiled at run time, and each new namespace that I add incurs a slight performance penalty, which is more significant for namespaces with a large number of tags such as DwC.

But perhaps the solution is to break up the XMP module into smaller sections (currently it is in 3 parts), so only the required part(s) of the module get compiled.  If I can do this without a lot of work, I would be in a better position to include large and less common namespaces.  Let me think about this.
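
The idea of compiling only the required parts can be illustrated with a small Python analogy. This is not how ExifTool is implemented (ExifTool is Perl); the class LazyTables and the two builder functions are invented for the sketch, which only shows that a large table costs nothing until it is first requested.

```python
class LazyTables:
    """Build a namespace's tag table only the first time it is requested."""

    def __init__(self, builders):
        self._builders = builders  # namespace prefix -> zero-argument function
        self._loaded = {}          # namespace prefix -> table built so far

    def get(self, prefix):
        if prefix not in self._loaded:
            # Pay the construction cost once, on first use.
            self._loaded[prefix] = self._builders[prefix]()
        return self._loaded[prefix]

    def loaded(self):
        return sorted(self._loaded)


def build_dc():
    return {"title": {}, "format": {}}


def build_dwc():
    # Stand-in for a large table with many tags.
    return {"Taxon": {}, "Event": {}, "Occurrence": {}}


tables = LazyTables({"dc": build_dc, "dwc": build_dwc})
tables.get("dc")        # only the dc table is built here
print(tables.loaded())  # the dwc table has not been built yet
```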

- Phil

Frank Bungartz

Hi Phil,
OK, I understand the challenge. The DwC is indeed huge and like I said before, we essentially use only a small part of it at the Charles Darwin Foundation (CDF) in the Galapagos.

>>> But perhaps the solution is to break up the XMP module into smaller sections

You will have noticed by now that the DwC is divided into several XMP structures (Record, Event, Occurrence, Location, GeologicalContext, Identification, Taxon, ResourceRelationship, MeasurementOrFact). Of those, we (CDF) regularly use only some fields in Record, Event, Occurrence, Location, Identification, and Taxon.
So, perhaps ExifTool could read those XMP structures as separate sections?

Then again: even of this smaller set of DwC XMP structures most people will hardly use all fields. The documents that I sent you before summarize pretty well which fields inside those XMP structures we regularly fill in at CDF.
In the DwC XMP there is also some overlap with other XMP: fields like "country" or "province" in "dwc::dcterms::Location::country" store essentially the same information as "photoshop::country". So, in a way, these fields are duplicates and somewhat redundant.
Perhaps focusing on a "slimmed down" version of the most commonly used fields would be another option?

Cheers,
Frank

Phil Harvey

Hi Frank,

You've convinced me that this is worth the work...  I'll see about building the DwC tags into ExifTool in a separate module to avoid the performance penalty.  Give me a few days to work on this.

- Phil