How to create XMP inside DjVu?

Started by monday2000, August 18, 2010, 08:11:42 AM

monday2000

Hello, Phil.

I need your help in understanding how to create the XMP metadata inside DjVu files.

I came here after reading this discussion:

http://www.djvu.org/forum/phpbb/viewtopic.php?t=530

I know that the djvused program is now able to insert XMP into DjVu files.

The question is - how exactly? The djvused online help lacks details.

Could you please describe the details about how exactly to prepare the djvused script for the XMP insertion?

What XMP tags are allowed in DjVu? What is their max length? What is the syntax etc.?

Here is your DjVu-sample script:
Quote
select; remove-ant; remove-txt
# -------------------------
select 1
set-ant
(metadata
   (Author   "Phil Harvey")
   (Title   "DjVu Metadata Sample")
   (Subject   "ExifTool DjVu test image")
   (CreationDate   "2008-09-23T12:31:34-04:00")
   (ModDate   "2008-11-11T09:17:10-05:00")
   (Keywords   "ExifTool, Test, DjVu, XMP")
   (Producer   "djvused")
   (Trapped   "Unknown")
   (Creator   "ExifTool")
   (note   "Must escape double quotes (\") and backslashes (\\)")
   (url   "https://exiftool.org/") )
(xmp "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>\n\n
<rdf:Description rdf:about=''\n  xmlns:album=\"http://ns.adobe.com/album/1.0/\">\n
<album:Notes>Must escape double quotes (&quot;) and backslashes (\\)</album:Notes>\n
</rdf:Description>\n\n <rdf:Description rdf:about=''\n  xmlns:dc='http://purl.org/dc/elements/1.1/'>\n
<dc:creator>\n
<rdf:Seq>\n
<rdf:li>Phil Harvey</rdf:li>\n
</rdf:Seq>\n
</dc:creator>\n
<dc:description>\n
<rdf:Alt>\n
<rdf:li xml:lang='x-default'>ExifTool DjVu test image</rdf:li>\n
</rdf:Alt>\n
</dc:description>\n
<dc:rights>\n
<rdf:Alt>\n
<rdf:li xml:lang='x-default'>Copyright 2008 Phil Harvey</rdf:li>\n
</rdf:Alt>\n
</dc:rights>\n
<dc:subject>\n
<rdf:Bag>\n
<rdf:li>ExifTool</rdf:li>\n
<rdf:li>Test</rdf:li>\n
<rdf:li>DjVu</rdf:li>\n
<rdf:li>XMP</rdf:li>\n
</rdf:Bag>\n
</dc:subject>\n
<dc:title>\n
<rdf:Alt>\n
<rdf:li xml:lang='x-default'>DjVu Metadata Sample</rdf:li>\n
</rdf:Alt>\n
</dc:title>\n
</rdf:Description>\n\n
<rdf:Description rdf:about=''\n  xmlns:pdf='http://ns.adobe.com/pdf/1.3/'>\n
<pdf:Keywords>ExifTool, Test, DjVu, XMP</pdf:Keywords>\n
<pdf:Producer>djvused</pdf:Producer>\n
<pdf:Trapped>/Unknown</pdf:Trapped>\n
</rdf:Description>\n\n
<rdf:Description rdf:about=''\n  xmlns:xmp='http://ns.adobe.com/xap/1.0/'>\n
<xmp:CreateDate>2008-09-23T12:31:34-04:00</xmp:CreateDate>\n
<xmp:CreatorTool>ExifTool</xmp:CreatorTool>\n
<xmp:ModifyDate>2008-11-11T09:17:10-05:00</xmp:ModifyDate>\n
</rdf:Description>\n</rdf:RDF>")
.
So what is this stuff in the "xmp" directive, and what rules does it follow? I just don't know how to write my own XMP for DjVu. :o

Phil Harvey

What goes in the XMP directive is standard XMP as documented in the XMP specification.  The only differences are that the XMP should not use the "xpacket" or "x:xmpmeta" wrappers described in the XMP spec, and that any double quotes and backslashes must be escaped with a backslash.
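
The two rules above (no wrappers, backslash-escaped quotes and backslashes) can be illustrated with a minimal Python sketch. The function names are hypothetical and not part of ExifTool or djvused, and the wrapper stripping is a simple regex approach that assumes a well-formed packet. Line breaks are left untouched here; the sample script above writes them as \n.

```python
import re


def strip_xmp_wrappers(xmp: str) -> str:
    """Return the XMP without the xpacket and x:xmpmeta wrappers."""
    # Remove the <?xpacket begin ...?> and <?xpacket end ...?> instructions.
    xmp = re.sub(r"<\?xpacket.*?\?>", "", xmp, flags=re.DOTALL)
    # Remove the opening and closing x:xmpmeta tags but keep their content.
    xmp = re.sub(r"</?x:xmpmeta[^>]*>", "", xmp)
    return xmp.strip()


def escape_for_djvu(text: str) -> str:
    """Escape backslashes first, then double quotes, with a backslash."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def xmp_annotation(xmp: str) -> str:
    """Build the (xmp "...") annotation from a standard XMP packet."""
    return '(xmp "' + escape_for_djvu(strip_xmp_wrappers(xmp)) + '")'


if __name__ == "__main__":
    packet = (
        '<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>'
        '<x:xmpmeta xmlns:x="adobe:ns:meta/">'
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
        "</rdf:RDF>"
        "</x:xmpmeta>"
        '<?xpacket end="w"?>'
    )
    print(xmp_annotation(packet))
```

Backslashes are escaped before double quotes so that the backslashes added for the quotes are not doubled a second time.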

Any XMP tags mentioned in the XMP spec are allowed in DjVu.  You can use ExifTool to generate the XMP packet if you want by writing to a .xmp file.

I forget now how to use djvused to insert this into a DjVu image, but it shouldn't be too hard to figure out if you read the djvused documentation.
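
One possible way to do this, sketched in Python and modeled on the sample script quoted above (select a page, set-ant, the annotations, then a line holding a single period). The helper name and the file names are hypothetical; the command line in the comment relies on the -f (script file) and -s (save) options of djvused, which should be checked against the documentation of the installed version.

```python
def build_djvused_script(annotations: list[str], page: int = 1) -> str:
    """Assemble a djvused script that sets the annotations of one page."""
    lines = [f"select {page}", "set-ant"]
    lines.extend(annotations)
    # A line containing only a period ends the annotation data.
    lines.append(".")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    script = build_djvused_script(
        [
            '(metadata\n   (Title   "DjVu Metadata Sample") )',
            "(xmp \"<rdf:RDF xmlns:rdf="
            "'http://www.w3.org/1999/02/22-rdf-syntax-ns#'></rdf:RDF>\")",
        ]
    )
    print(script)
    # Save the output as, for example, meta.dsed (hypothetical name), then run:
    #   djvused input.djvu -f meta.dsed -s
```

The annotations passed in are assumed to be escaped already. Since set-ant writes the annotations of the selected page, it is safer to try this on a copy of the file first.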

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

monday2000

Phil, why does your DjVu sample script have two parts?
Part 1:
Quote
(metadata
   (Author   "Phil Harvey")
   (Title   "DjVu Metadata Sample")
   (Subject   "ExifTool DjVu test image")
   (CreationDate   "2008-09-23T12:31:34-04:00")
   (ModDate   "2008-11-11T09:17:10-05:00")
   (Keywords   "ExifTool, Test, DjVu, XMP")
   (Producer   "djvused")
   (Trapped   "Unknown")
   (Creator   "ExifTool")
   (note   "Must escape double quotes (\") and backslashes (\\)")
   (url   "https://exiftool.org/") )
Part 2:
Quote
(xmp "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>\n\n
<rdf:Description rdf:about=''\n  xmlns:album=\"http://ns.adobe.com/album/1.0/\">\n
<album:Notes>Must escape double quotes (&quot;) and backslashes (\\)</album:Notes>\n
</rdf:Description>\n\n <rdf:Description rdf:about=''\n  xmlns:dc='http://purl.org/dc/elements/1.1/'>\n
<dc:creator>\n
<rdf:Seq>\n
<rdf:li>Phil Harvey</rdf:li>\n
</rdf:Seq>\n
</dc:creator>\n
<dc:description>\n
<rdf:Alt>\n
<rdf:li xml:lang='x-default'>ExifTool DjVu test image</rdf:li>\n
</rdf:Alt>\n
</dc:description>\n
<dc:rights>\n
<rdf:Alt>\n
<rdf:li xml:lang='x-default'>Copyright 2008 Phil Harvey</rdf:li>\n
</rdf:Alt>\n
</dc:rights>\n
<dc:subject>\n
<rdf:Bag>\n
<rdf:li>ExifTool</rdf:li>\n
<rdf:li>Test</rdf:li>\n
<rdf:li>DjVu</rdf:li>\n
<rdf:li>XMP</rdf:li>\n
</rdf:Bag>\n
</dc:subject>\n
<dc:title>\n
<rdf:Alt>\n
<rdf:li xml:lang='x-default'>DjVu Metadata Sample</rdf:li>\n
</rdf:Alt>\n
</dc:title>\n
</rdf:Description>\n\n
<rdf:Description rdf:about=''\n  xmlns:pdf='http://ns.adobe.com/pdf/1.3/'>\n
<pdf:Keywords>ExifTool, Test, DjVu, XMP</pdf:Keywords>\n
<pdf:Producer>djvused</pdf:Producer>\n
<pdf:Trapped>/Unknown</pdf:Trapped>\n
</rdf:Description>\n\n
<rdf:Description rdf:about=''\n  xmlns:xmp='http://ns.adobe.com/xap/1.0/'>\n
<xmp:CreateDate>2008-09-23T12:31:34-04:00</xmp:CreateDate>\n
<xmp:CreatorTool>ExifTool</xmp:CreatorTool>\n
<xmp:ModifyDate>2008-11-11T09:17:10-05:00</xmp:ModifyDate>\n
</rdf:Description>\n</rdf:RDF>")
Why isn't part 2 alone sufficient?

Phil Harvey

Where did you find this script by the way?  I don't remember making it and I can't find it in the thread on the djvu forum.

You could just write XMP if you want.  The advantage of writing both types of metadata is for maximum compatibility with DjVu readers that don't recognize XMP.  Of course, XMP has many properties which have no native DjVu metadata counterpart, but it is recommended to synchronize the properties that do.
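
To illustrate the synchronization idea, here is a minimal Python sketch that generates both annotations from a single set of values. The key-to-property mapping is read off the sample script in this thread (for example Producer and pdf:Producer, Creator and xmp:CreatorTool) and only covers simple text properties; the function names are hypothetical.

```python
from xml.sax.saxutils import escape as xml_escape

PDF_NS = "http://ns.adobe.com/pdf/1.3/"
XMP_NS = "http://ns.adobe.com/xap/1.0/"

# DjVu metadata key -> (XMP prefix, namespace URI, XMP property),
# following the pairs that appear in the sample script.
SIMPLE_MAP = {
    "Keywords": ("pdf", PDF_NS, "Keywords"),
    "Producer": ("pdf", PDF_NS, "Producer"),
    "Creator": ("xmp", XMP_NS, "CreatorTool"),
    "CreationDate": ("xmp", XMP_NS, "CreateDate"),
    "ModDate": ("xmp", XMP_NS, "ModifyDate"),
}


def djvu_escape(text: str) -> str:
    """Backslash-escape backslashes and double quotes."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def metadata_annotation(fields: dict[str, str]) -> str:
    """Build the native (metadata ...) annotation."""
    entries = [f'   ({key} "{djvu_escape(value)}")' for key, value in fields.items()]
    return "(metadata\n" + "\n".join(entries) + " )"


def xmp_annotation(fields: dict[str, str]) -> str:
    """Build the (xmp "...") annotation for the keys that have a mapping."""
    parts = ["<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>"]
    for key, value in fields.items():
        if key not in SIMPLE_MAP:
            continue  # no XMP counterpart known in this sketch
        prefix, uri, prop = SIMPLE_MAP[key]
        parts.append(
            f"<rdf:Description rdf:about='' xmlns:{prefix}='{uri}'>"
            f"<{prefix}:{prop}>{xml_escape(value)}</{prefix}:{prop}>"
            "</rdf:Description>"
        )
    parts.append("</rdf:RDF>")
    return '(xmp "' + djvu_escape("".join(parts)) + '")'


if __name__ == "__main__":
    fields = {"Producer": "djvused", "Creator": "ExifTool"}
    print(metadata_annotation(fields))
    print(xmp_annotation(fields))
```

Values are XML-escaped before the DjVu escaping is applied, so characters such as & and < stay valid inside the XMP. Structured properties such as dc:title (rdf:Alt) or dc:creator (rdf:Seq) would need their own handling.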

- Phil

monday2000

Quote
Where did you find this script by the way?
I extracted it with djvused from your last DjVu sample file (and slightly edited): http://owl.phy.queensu.ca/~phil/djvu_with_xmp_final.djvu

What is "PDF DocInfo" which is mentioned in http://djvu.cvs.sourceforge.net/viewvc/djvu/djvulibre-3.5/doc/djvuchanges.txt ? Where to read about it?

And what is "BibTex bibliography system"? Is it covered by this article: http://en.wikipedia.org/wiki/BibTeX ?

And what is the maximum length of each field?

Quote
Of course, XMP has many properties which have no native DjVu metadata counterpart, but it is recommended to synchronize the properties that do.
So is XMP the most preferable (and widest-covering) option compared with "PDF DocInfo" and the "BibTex bibliography system"?

Will your ExifTool support editing/writing DjVu-metadata?

Do you know also some basic open-sourced C++ tool for managing PDF XMP?

Phil Harvey

Quote from: monday2000 on August 20, 2010, 01:20:16 AM
I extracted it with djvused from your last DjVu sample file (and slightly edited): http://owl.phy.queensu.ca/~phil/djvu_with_xmp_final.djvu

Ah, OK.  This makes sense.

Quote
What is "PDF DocInfo" which is mentioned in http://djvu.cvs.sourceforge.net/viewvc/djvu/djvulibre-3.5/doc/djvuchanges.txt ? Where to read about it?

You can find the official PDF specification here.  Look in the section titled "Document Information Dictionary".

Quote
And what is "BibTex bibliography system"? Is it covered by this article: http://en.wikipedia.org/wiki/BibTeX ?

I don't know about this one.

Quote
And what is the maximum length of each field?

Unlimited.

Quote
So is XMP the most preferable (and widest-covering) option compared with "PDF DocInfo" and the "BibTex bibliography system"?

I suppose this depends on what software you use to read the DjVu images.  I don't know how many support XMP at this time.

Quote
Will your ExifTool support editing/writing DjVu-metadata?

It is unlikely that I will add this feature in the near future since it is relatively low on the gain/pain scale.

Quote
Do you know also some basic open-sourced C++ tool for managing PDF XMP?

No, but I do know an open-source Perl tool. ;)

- Phil