PDF Metadata - Custom Config File - Request for Review/Advice

Started by JRocchio, August 09, 2023, 01:17:36 PM

JRocchio

Request: Could someone review the below custom fields definition and advise if it is A-OK or if it could benefit from some mods and tweaks?

Background: The custom field definitions are for PDF files that are sheet music files to be imported into the MobileSheets sheet music reader. I apply the metadata to the PDF file as part of a composing/transcribing process, at the end of which I export a PDF from MuseScore, then use exiftool to apply an arguments file that I have created, in a text editor, to match the score. I have been working with the MobileSheets developer to be sure I have the field names correct and we've agreed upon list item separator characters (comma/semicolon). The config file shown below is working fine as it is - I can successfully write the metadata into my PDFs, and the MobileSheets app reads it correctly upon import.

Reason for My Request: Before I spend months populating 100s of PDF docs using the below config file, and also sharing it with other MobileSheets users, I'd like a higher level of confidence that I really do have it right from both a reliability and 'best practices' perspective. The custom config file I have created is really just copied from some place on the internet, and I just revised it and added my own tags. I have studied the example config file provided here on the site but I haven't really been able to work out what-all is going on with the various entries. And they are all more complex than what I am using below, which leads me to think I am missing a few things that may be important.

Specific Questions --

  1. Why the two sections XMP::pdfx and PDF::Info? I could guess that the 'PDF::Info' section is populating the PDF's Info table whereas the XMP section is then also writing the tag into the XMP group. But honestly I don't know.

  2. If a field will contain a list of values I've specified { List => 'string' }; but for a single entry field I only have the empty { }. Seems like something should go in the brackets. I've tried a few things for that but also got some "parsing" errors reported by exiftool when I did. I assume empty { } means 'use defaults.' Even if true, is that good or should I be more specific in specifying something in them?

  3. Any issues in the below relative to any best practices?

  4. Any issues or recommendations relative to reliability - both over time and across platforms (I am on Linux with no access to a Windows or Mac machine so all I know is that the below is currently working on Linux just fine)?


My Custom Config File (.ExifTool_config) --

%Image::ExifTool::UserDefined = (
    'Image::ExifTool::XMP::pdfx' => {
        Books => { },
        Key => { },
        Album => { },
        Source => { },
        Duration => { },
    },
    'Image::ExifTool::PDF::Info' => {
        Books => { List => 'string' },
        Key => { List => 'string' },
        Album => { List => 'string' },
        Source => { List => 'string' },
        Duration => { },
    },
);
1;  #end

An Example Arguments File for A Score (210_TEST_Score_B_Books.pdfexifargs.txt) --

# NOTE: Below tags are custom tags put into the PDF document and
# depend on the existence of custom tag definitions in the
# file ~/.ExifTool_config. See notes concerning this in my Evernote
# note "PDF Meta-Data Properties"

# The -E option below tells exiftool to use HTML special characters.
# e.g., use for tags that are lists where a space would cause the
# unintended split of the tag's text into separate list items.

-E
-Books=Book Title1; Book Title2
-Duration=11:45
-Source=Source 1, Source 2
-Key=Dm
-Album=Album 1, Album 2
-Title=210 TEST Score B - Book & Album
-Author=Test, Author2
-Composer=Libre Office, composer2
-Subject=Generes, Subject with a space, 3rd Genere
-Keywords=MobileSheets Migration, Test
-overwrite_original
210_TEST_Score_B_Books.pdf

Command Used to Apply Above Arguments File --

exiftool -@ 210_TEST_Score_B_Books.pdfexifargs.txt


See attached file Example_MetadataSetByArgumentsFile.jpg for listing of metadata in 210_TEST_Score_B_Books.pdf after the above command is run.

Any suggested improvements appreciated - even (esp?, lol) those saying "You're all good, keep as-is."


Phil Harvey

Quote from: JRocchio on August 09, 2023, 01:17:36 PM
1. Why the two sections XMP::pdfx and PDF::Info? I could guess that the 'PDF::Info' section is populating the PDF's Info table whereas the XMP section is then also writing the tag into the XMP group. But honestly I don't know.

PDF::Info is the native PDF format for storing metadata.  I'm not sure what other software will read custom PDF Info tags.  Duplicating these in XMP makes sense, and probably has a better chance of being compatible with other software since XMP is designed to be extensible.
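
To make the two locations concrete, here is a minimal sketch that writes the same custom tag to each one by naming the group explicitly. It assumes the custom config from the first post is installed as ~/.ExifTool_config, so that Key is defined in both tables; the file name example_score.pdf is hypothetical.

```shell
pdf="example_score.pdf"   # hypothetical file name

# Run only when exiftool and the PDF are both available.
if command -v exiftool >/dev/null 2>&1 && [ -f "$pdf" ]; then
  # PDF:Key is stored in the native PDF Info dictionary;
  # XMP-pdfx:Key is stored in the pdfx namespace of the XMP packet.
  # Without -overwrite_original, exiftool keeps a "_original" backup copy.
  exiftool -PDF:Key=Dm -XMP-pdfx:Key=Dm "$pdf"
fi
```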

Quote:
2. If a field will contain a list of values I've specified { List => 'string' }; but for a single entry field I only have the empty { }.

Correct.  This is fine.

Quote:
3. Any issues in the below relative to any best practices?

I would think that non-breaking spaces could be a problem.  I would suggest using normal spaces.

You are writing Subject as a string, which is wrong.  See FAQ 17.  Adding -sep ", " to the command (on 2 separate lines and without the quotes in your argfile) will write as separate list items.
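
As a sketch of that suggestion, the snippet below generates an argfile in which -sep and its value sit on two separate lines, with no quotes around the value. The file names example_score.args and example_score.pdf are hypothetical, and printf is used only so that the trailing space on the separator line is explicit.

```shell
argfile="example_score.args"   # hypothetical argfile name
pdf="example_score.pdf"        # hypothetical PDF name

# One argument per line: "-sep" on line 1, its value ", " (comma + space,
# unquoted) on line 2, followed by the tags to write.
printf '%s\n' \
  '-sep' \
  ', ' \
  '-Subject=Genre 1, Subject with a space' \
  '-Keywords=MobileSheets Migration, Test' \
  '-overwrite_original' \
  "$pdf" > "$argfile"

# Apply the argfile only if exiftool and the PDF are both present.
if command -v exiftool >/dev/null 2>&1 && [ -f "$pdf" ]; then
  exiftool -@ "$argfile"
fi
```

With the separator set this way, ordinary spaces inside an item are left alone, and only the comma-plus-space sequence splits the value into list items.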

Quote:
4. Any issues or recommendations relative to reliability - both over time and across platforms (I am on Linux with no access to a Windows or Mac machine so all I know is that the below is currently working on Linux just fine)?

XMP is the future, and metadata in this format will most likely survive better over time.

- Phil

JRocchio

Thanks for reviewing Phil.

> See FAQ 17.  Adding -sep ", " <-- Ok, interesting, I will study that.

>XMP is the future<-- The PDF v XMP comments are very helpful. That was confusing to me.

StarGeek

Quote from: Phil Harvey on August 10, 2023, 01:49:25 PM
I would think that non-breaking spaces could be a problem.  I would suggest using normal spaces.

I very much agree with this. Subject&nbsp;with&nbsp;a&nbsp;space would be very hard to enter in any sort of GUI, for example Adobe Acrobat or Reader.  If someone attempts to write the "same" tag to a file, the GUI would enter it as Subject with a space, which will not match.

Quote:
You are writing Subject as a string, which is wrong.  See FAQ 17.  Adding -sep ", " to the command (on 2 separate lines and without the quotes in your argfile) will write as separate list items.

Subject in PDFs is especially problematic, as exiftool will write to both XMP-dc:Subject, which is a list type tag for keywords, and PDF:Subject, which is a simple string and I think it's normally for a short title or something similar, not for keywords.  When looking at a file's properties in Adobe Reader, any value in PDF:Subject shows up in the "Subject" box, while anything in XMP-dc:Subject is combined with (I think) PDF:Keywords and XMP-pdf:Keywords and displayed in the "Keywords" box.
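
One way to see where each copy ended up is to list the tags with their group names. This is only a sketch; the file name example_score.pdf is hypothetical.

```shell
pdf="example_score.pdf"   # hypothetical file name

# Run only when exiftool and the PDF are both available.
if command -v exiftool >/dev/null 2>&1 && [ -f "$pdf" ]; then
  # -a  keeps duplicate tags instead of showing only one copy
  # -G1 prefixes each tag with its specific group (e.g. PDF, XMP-dc, XMP-pdf)
  # -s  prints tag names rather than descriptions
  exiftool -a -G1 -s -Subject -Keywords "$pdf"
fi
```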

JRocchio

Quote from: StarGeek on August 10, 2023, 03:27:43 PM
Subject in PDFs is especially problematic,

I need a master's degree in metadata! lol. My use-case is setting metadata into PDFs that are musical scores. These PDFs get imported into tablet sheet music readers - forScore and/or MobileSheets. Both of these programs map the PDF 'Subject' field into their internal properties 'Tags' (forScore) and 'Keywords' (MobileSheets). A part of the challenge is making metadata that will work with both apps. Actually, the first part of the problem is making metadata entries in the PDFs that will work even with just one (I personally use MobileSheets, so I am currently focused on that one).

So I am experiencing that this whole challenge, for me, is also about figuring out what the apps on the other side of my equation are doing. The good news is that with MobileSheets I am having some dialog with the developer and so we are able to negotiate a bit about how the fields should work.

JRocchio

Quote from: Phil Harvey on August 10, 2023, 01:49:25 PM
You are writing Subject as a string, which is wrong.  See FAQ 17.  Adding -sep ", " to the command

Ok, I have this worked out now; and indeed it is so much more pleasant not to be using the HTML space code for spaces in list fields.  :)

Still haven't been able to make the 'SourceType' tag work, per a separate post I submitted on that.

My new arguments file --

-Title=210 TEST Score F – List Fields
-SourceType=SourceTypeField
-Author=Test
-Composer=Libre Office
-Artist=Artist 1, Artist 2
-Album=Album 1, Album 2
-Key=C#
-Subject=Generes, Subject with a space
-Keywords=MobileSheets Migration, Test
-overwrite_original
210_TEST_Score_F_ListFields.pdf

Applied to PDF docs with the (KDE launcher) command:
exiftool -sep ", " -@ %f

JRocchio

Quote from: JRocchio on August 10, 2023, 03:53:49 PM
Both of these programs map the PDF 'Subject' field into their internal properties 'Tags' (forScore) and 'Keywords' (MobileSheets).

Actually, for the record, I conflated the mapping. The Subject field gets mapped to 'Genre' in each of those apps. But in both cases the Genre property is a list as well, so your comments re PDF Subject still apply.