Leading semicolon and space as well as quotes in keywords

Started by nbsusa, June 06, 2016, 02:07:13 PM


nbsusa

With the leading semicolon:
[PDF]           Keywords                        : Document, 6426;, Report;, 1967, Aug, 23;, Environmental, Education, for, Urban, Schools, (An, Address, Delivered, at, the, 14th, Annual, National, Conservation, Education, Association, Conference, in, Springfield, Missouri), /, by, Edward, A., Ames, (Executive, Director, of, Wave, Hill, Center, for, Environmental, Studies).
[XMP-dc]        Subject                         : Document 6426; Report; 1967 Aug 23; Environmental Education for Urban Schools (An Address Delivered at the 14th Annual National Conservation Education Association Conference in Springfield Missouri) / by Edward A. Ames (Executive Director of Wave Hill Center for Environmental Studies).


Without leading semicolon:
[PDF]           Keywords                        : Document, 6426;, Report;, 1967, Aug, 23;, Environmental, Education, for, Urban, Schools, (An, Address, Delivered, at, the, 14th, Annual, National, Conservation, Education, Association, Conference, in, Springfield, Missouri), /, by, Edward, A., Ames, (Executive, Director, of, Wave, Hill, Center, for, Environmental, Studies).
[XMP-dc]        Subject                         : Document 6426, Report, 1967 Aug 23, Environmental Education for Urban Schools (An Address Delivered at the 14th Annual National Conservation Education Association Conference in Springfield Missouri) / by Edward A. Ames (Executive Director of Wave Hill Center for Environmental Studies).


I see commas in the output for the file without the leading semicolon, even though the PDF properties dialog shows semicolons.

Phil Harvey

This output is very different from the one I posted.  In the file with the leading semicolon, XMP-dc:Subject is stored incorrectly: it was written without the -sep "; " option, so it is one long string instead of separate items.
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).
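
For readers following along, here is a minimal sketch of what splitting on a separator means for a list-type tag such as XMP-dc:Subject. It is only a model written in plain shell and awk, not ExifTool's own code; the helper name split_items and the shortened sample value are made up for illustration.

```shell
# split_items STR SEP: print one list item per line, dividing STR at each
# literal occurrence of SEP. Illustrative model only, not ExifTool code.
split_items() {
    printf '%s\n' "$1" | awk -v sep="$2" '{
        rest = $0
        while ((i = index(rest, sep)) > 0) {
            print substr(rest, 1, i - 1)
            rest = substr(rest, i + length(sep))
        }
        print rest
    }'
}

# Shortened sample of the keywords discussed in this thread.
value='Document 6426; Report; 1967 Aug 23'

# Split at "; ": three separate items, one per line.
split_items "$value" '; '

# Split at a separator that never occurs in the value: one long item,
# which is the situation described for the file with the leading semicolon.
split_items "$value" ' | '
```

When writing with ExifTool itself, the splitting is requested by adding -sep "; " to the command that assigns XMP-dc:Subject.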

nbsusa

Sorry about that.

I just re-ran the command on the file with the leading semicolon, and this is all it outputs now:

[XMP-dc]        Subject                         : Document 6426, Report, 1967 Aug 23, Environmental Education for Urban Schools (An Address Delivered at the 14th Annual National Conservation Education Association Conference in Springfield Missouri) / by Edward A. Ames (Executive Director of Wave Hill Center for Environmental Studies).

Phil Harvey

OK, so what is the difference between this and the other?  You might have to compare the full ExifTool output.

- Phil

StarGeek

I believe the problem can be seen right there in the CSV file.  Keywords exists in the original file, but it is empty.

I went and downloaded Adobe Reader to see which tags it was reading.  It looks like the relevant fields are XMP-pdf:Keywords and XMP-dc:Subject.  Reader reads both of these tags and shows them as "Keywords".  Adding to the fun is the fact that XMP-pdf:Keywords appears to be a string, not a list, using semicolons as separators.

nbsusa, if you clear out XMP-pdf:Keywords first, you probably will remove the leading blank keyword.
exiftool -XMP-pdf:Keywords=
  see edit 2

edit:
Command I used to fill the tags:
exiftool -XMP-pdf:Subject="XMP-pdf:Subject" -XMP-pdf:Keywords="XMP-pdf:Keywords" -XMP-dc:Subject="XMP-dc:Subject" -PDF:Subject="PDF:Subject" -PDF:Keywords="PDF:Keywords"
And what Adobe reader showed


Edit2: Actually, it may not.  I cleared XMP-pdf:Keywords out, double checked, and even though the XMP-pdf:Keywords tag didn't exist, Reader showed the leading semicolon in Properties.  Clearing out XMP-dc:Subject and just using XMP-pdf:Keywords did not leave any leading or trailing semicolons.  The solution might be to just use XMP-pdf:Keywords.  And since that tag appears to be a string, not a list, it won't be affected by the -sep option.  You just have to separate the keywords with (SemicolonSpace).
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

nbsusa

I'm still not getting the desired result.

If I use -sep "; " -xmp-pdf:keywords="Document 6426; Report; 1967 Aug 23; Environmental Education for Urban Schools (An Address Delivered at the 14th Annual National Conservation Education Association Conference in Springfield Missouri) / by Edward A. Ames (Executive Director of Wave Hill Center for Environmental Studies)."

Then the PDF (in Reader) displays "Document 6426; Report; 1967 Aug 23; Environmental Education for Urban Schools (An Address Delivered at the 14th Annual National Conservation Education Association Conference in Springfield Missouri) / by Edward A. Ames (Executive Director of Wave Hill Center for Environmental Studies)."

Note the quotes around everything. This does not split the four separate keywords (I can see they are not split in Adobe Acrobat) and does not display them in Dublin Core Properties as separate keyword lists.

Maybe this is not possible to do with exiftool?

Phil Harvey

You should be using XMP-dc:Subject, not XMP-pdf:Keywords.

- Phil

nbsusa

Phil, when I use:
-sep "; " -XMP-dc:Subject="Document 6426; Report; 1967 Aug 23; Environmental Education for Urban Schools (An Address Delivered at the 14th Annual National Conservation Education Association Conference in Springfield Missouri) / by Edward A. Ames (Executive Director of Wave Hill Center for Environmental Studies)."

I get this in the PDF Keywords (viewing in Reader)

; Document 6426; Report; 1967 Aug 23; Environmental Education for Urban Schools (An Address Delivered at the 14th Annual National Conservation Education Association Conference in Springfield Missouri) / by Edward A. Ames (Executive Director of Wave Hill Center for Environmental Studies).

Back to the leading semicolon. However, the Dublin Core properly displays the 4 separate keyword lists.

Phil Harvey

Sorry.  I just re-read StarGeek's post in which he suggested using XMP-pdf:Keywords.  This tag is a simple string, not a list like XMP-dc:Subject.

StarGeek suggested using XMP-pdf:Keywords because it avoided the leading semicolon, but it doesn't look like this is the right solution if Acrobat puts quotes around a complex string like this.

I don't understand why you continue to have this problem.  ExifTool can write anything you want, so it should be easy to reproduce exactly what Acrobat writes.

You have worn me down, so I finally gave in and tested this with a trial version of Adobe Acrobat.  Here is what I did (and what we have been trying to get you to do all along):

1. Write the Keywords you want using Adobe Acrobat.  (For this test I wrote "test 1; test 2; test 3", without the quotes)

2. Use ExifTool to see what was written:

> exiftool ~/Desktop/a.pdf -G1 -a -subject -keywords
[XMP-dc]        Subject                         : test 1, test 2, test 3
[XMP-pdf]       Keywords                        : test 1; test 2; test 3
[PDF]           Keywords                        : test, 1;, test, 2;, test, 3


3. Use ExifTool to write the same thing to another file:

> exiftool ~/Desktop/b.pdf -xmp-dc:subject="test 1, test 2, test 3" -xmp-pdf:keywords="test 1; test 2; test 3" -pdf:keywords="test, 1;, test, 2;, test, 3" -sep ", "
    1 image files updated


4. Open the other file in Acrobat to verify that the Keywords appear as desired --> YES THEY DO!

The only trick was using -sep ", " to split the strings into separate list items when writing.

Note that even though it looks nice in Acrobat, it is wrong.  Acrobat did not split the PDF:Keywords into separate items properly.

- Phil
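
The three values in step 3 all derive from the same keyword list, so they can be built mechanically. The following is a rough sketch of that, assuming the patterns seen in step 2 hold in general (", " between items for XMP-dc:Subject, "; " for XMP-pdf:Keywords, and the same "; " string with every space turned into ", " for PDF:Keywords). The helper names join_with and build_args are hypothetical, and nothing here calls ExifTool.

```shell
# join_with SEP ITEM...: join the items into one string with SEP between them.
join_with() {
    sep=$1
    shift
    out=
    first=1
    for item in "$@"; do
        if [ "$first" -eq 1 ]; then
            out=$item
            first=0
        else
            out="$out$sep$item"
        fi
    done
    printf '%s\n' "$out"
}

# build_args ITEM...: print the three tag assignments from step 3, one per line.
# Assumption: the patterns observed in step 2 apply to any keyword list.
build_args() {
    dc=$(join_with ', ' "$@")
    xmp=$(join_with '; ' "$@")
    # PDF:Keywords as it appeared in step 2: every space becomes ", ".
    pdf=$(printf '%s\n' "$xmp" | sed 's/ /, /g')
    printf '%s\n' "-xmp-dc:subject=$dc" "-xmp-pdf:keywords=$xmp" "-pdf:keywords=$pdf"
}

build_args 'test 1' 'test 2' 'test 3'
```

Each printed line would be passed to exiftool as one quoted argument, together with -sep ", " and the file name, as in step 3. Because -sep ", " splits list-type values at every ", ", a keyword that itself contains a comma followed by a space would be divided into separate items, so this approach only suits keywords without that sequence.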

nbsusa

That worked like a champ! I apologize for not quite understanding everything you were asking me to do. I didn't realize I needed to do all 3. My fault and I appreciate everything everyone did. Thank you very much!

StarGeek

Adobe Reader completely ignored PDF:Keywords.  Annoying difference between two pieces of software by the same company.


Phil Harvey

I have no idea which of those 3 tags are important.  However, Adobe Acrobat wrote them all, so that's what I did too.

- Phil