Leading semicolon and space as well as quotes in keywords

Started by nbsusa, June 06, 2016, 02:07:13 PM

nbsusa

I am using exiftool to write some keywords into a PDF. I am using a simple command line and getting close to what I need. I have tried both -xmp:subject and -keywords with pretty much the same result.

This is what I am using right now in my Windows command line (using the most recent version of exiftool as of today):

exiftool -xmp:Subject="Document 6426; Report; 1967 Aug 23; Environmental Education for Urban Schools (An Address Delivered at the 14th Annual National Conservation Education Association Conference in Springfield, Missouri) / by Edward A. Ames (Executive Director of Wave Hill Center for Environmental Studies)." -sep ";" MPHS-CCR_6426.pdf -overwrite_original

When I view the PDF properties I see the following (note the leading semicolon and space, as well as the quotes beginning right before Environmental):

; Document 6426;  Report;  1967 Aug 23; " Environmental Education for Urban Schools (An Address Delivered at the 14th Annual National Conservation Education Association Conference in Springfield, Missouri) / by Edward A. Ames (Executive Director of Wave Hill Center for Environmental Studies)."

When viewing Advanced > Dublin Core > Subject, the keywords are correct.

How can I get this to display in the normal PDF properties without the leading semicolon and space, and without the quotes? When using -keywords I get quotes around everything and the Dublin Core keywords are not split up.

Hayo Baan

You get the leading spaces because you entered them yourself after the ; ;)
The solution is simple: don't enter them, or specify the separator as "; ".
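
A rough Python model of what Hayo describes, assuming -sep splits the value on the exact separator string and does not trim whitespace (consistent with the doubled spaces in Phil's output further down the thread). The helper split_keywords is hypothetical, for illustration only:

```python
from typing import List


def split_keywords(value: str, sep: str) -> List[str]:
    """Split a tag value on an exact separator string, no trimming (assumed model of -sep)."""
    return value.split(sep)


value = "Document 6426; Report; 1967 Aug 23"

# With -sep ";" the space after each semicolon stays on the next item.
print(split_keywords(value, ";"))   # ['Document 6426', ' Report', ' 1967 Aug 23']

# With -sep "; " the space is part of the separator, so the items come out clean.
print(split_keywords(value, "; "))  # ['Document 6426', 'Report', '1967 Aug 23']
```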

As for the leading semicolon (the list separator), I don't see this in the output of exiftool. The same goes for the quotes. In both cases my guess would be that the application you use to view the file info is causing this (in the case of the quotes, probably because there are "special" characters in the keyword). What application are you using to view the file info?
Hayo Baan – Photography
Web: www.hayobaan.nl

nbsusa

Thanks for the reply. I am using both Adobe Acrobat and Acrobat Reader to view the completed PDFs. Our customer requires the first semicolon (with the space following it) to be removed, as well as the quotes that show up. If I change my command line file to use -keywords= instead of -xmp:subject=, I do not get the leading semicolon, but I get quotes around the entire string (which should be 4 separate keyword fields).

StarGeek

You might want to try the command in FAQ 3.  I'm guessing that Adobe is reading the keywords from multiple tags and combining them.  Plus, there are PDF specific tags, like PDF:Keywords.

"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

Phil Harvey

...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

nbsusa

Not really sure where to go from here, as neither FAQ 3 nor FAQ 17 helped with the issue.

Phil Harvey

Did you use Adobe Acrobat to change the metadata as required by the customer, then did you use ExifTool to read this metadata?  You should then be able to write the metadata exactly like this using ExifTool.

- Phil

nbsusa

Same issue. I exported a CSV containing the info I hand-typed into the PDF Keywords field, modified the CSV, then imported it. I still get the leading semicolon. The quotes turned out to be caused by a comma within part of the text. But I need to get rid of the leading semicolon, and I have no idea why it keeps appearing.

Phil Harvey

Some concrete examples are necessary.  Can you post the modified CSV and the command you used to import it?

- Phil

nbsusa

I've attached the output I created using the following command line:

exiftool -csv -r MPHS-CCR_6426.pdf > out.csv

I've also attached my modified CSV and used the following command line to update the PDF. The only things I changed were to remove the Keywords field and modify the Subject field.

exiftool -sep ";" -csv=in.csv MPHS-CCR_6426.pdf -overwrite_original
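
For reference, a CSV for -csv import has a header row of tag names with SourceFile as the first column and one row per file. The Python sketch below writes such a file with the standard csv module, using the Subject column that was edited here. Joining the items with "; " assumes the import is run with the same string (-sep "; ") so ExifTool splits them back into separate list items; build_csv is a hypothetical helper name.

```python
import csv
import io
from typing import List


def build_csv(source_file: str, keywords: List[str], sep: str = "; ") -> str:
    """Return CSV text for `exiftool -csv=` import: a SourceFile column plus Subject.

    The keywords are joined with `sep`, on the assumption that the import command
    uses the same string (e.g. -sep "; ") so ExifTool splits them into list items.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["SourceFile", "Subject"])
    writer.writerow([source_file, sep.join(keywords)])
    return buf.getvalue()


print(build_csv("MPHS-CCR_6426.pdf", ["Document 6426", "Report", "1967 Aug 23"]))
```

Save the text as in.csv and import it with a matching separator, e.g. exiftool -sep "; " -csv=in.csv MPHS-CCR_6426.pdf -overwrite_original. A value containing a comma is quoted by the csv module, which is still valid CSV.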

Phil Harvey

Here's what I get (changing the SourceFile to a.pdf so I can test your CSV file):

% exiftool a.pdf -sep ";" -csv=/Users/phil/Desktop/in.csv
    1 image files updated

% ./exiftool a.pdf -keywords -subject -G1
[XMP-dc]        Subject                         : Document 77777,  Report,  1967 Aug 23,  Environmental Education for Urban Schools (An Address Delivered at the 14th Annual National Conservation Education Association Conference in Springfield Missouri) / by Edward A. Ames (Executive Director of Wave Hill Center for Environmental Studies).
[PDF]           Subject                         : Document 77777; Report; 1967 Aug 23; Environmental Education for Urban Schools (An Address Delivered at the 14th Annual National Conservation Education Association Conference in Springfield Missouri) / by Edward A. Ames (Executive Director of Wave Hill Center for Environmental Studies).


I don't see the leading semicolon that you mention.

The only problem I see is an extra space before all entries after the first in XMP:Subject.  You should use -sep "; " (semicolon+space) in the first command above to avoid this.

- Phil

nbsusa

Phil, it's when you physically open the PDF in Adobe Acrobat or Reader and view File/Properties and look at Keywords where you will see it. I initially had the -sep with "; " but a previous reply told me to take the space out.

Phil Harvey

I think the previous reply mentioned to either remove the space from after the semicolon in the tag value, or add a space in the -sep argument.

Could you post the ExifTool output as I have done (the second command in my last post) for two PDF files: one that shows the extra semicolon in Adobe Acrobat, and one that doesn't?

- Phil

nbsusa

Phil, I get the same result that you get when running that. Neither file shows the leading semicolon. The only way it shows up is when looking at the PDF properties in Acrobat. I have not looked in any other PDF viewer to see if it is there, because the client needs it to look a certain way in Acrobat. I'm attaching a screenshot.

Phil Harvey

Yes, but what is the difference in the metadata (comparing ExifTool outputs) between one that shows the leading semicolon in Acrobat and one that doesn't?  There must be a difference, and ExifTool will show you what it is.

nbsusa

With the leading semicolon:
[PDF]           Keywords                        : Document, 6426;, Report;, 1967, Aug, 23;, Environmental, Education, for, Urban, Schools, (An, Address, Delivered, at, the, 14th, Annual, National, Conservation, Education, Association, Conference, in, Springfield, Missouri), /, by, Edward, A., Ames, (Executive, Director, of, Wave, Hill, Center, for, Environmental, Studies).
[XMP-dc]        Subject                         : Document 6426; Report; 1967 Aug 23; Environmental Education for Urban Schools (An Address Delivered at the 14th Annual National Conservation Education Association Conference in Springfield Missouri) / by Edward A. Ames (Executive Director of Wave Hill Center for Environmental Studies).


Without leading semicolon:
[PDF]           Keywords                        : Document, 6426;, Report;, 1967, Aug, 23;, Environmental, Education, for, Urban, Schools, (An, Address, Delivered, at, the, 14th, Annual, National, Conservation, Education, Association, Conference, in, Springfield, Missouri), /, by, Edward, A., Ames, (Executive, Director, of, Wave, Hill, Center, for, Environmental, Studies).
[XMP-dc]        Subject                         : Document 6426, Report, 1967 Aug 23, Environmental Education for Urban Schools (An Address Delivered at the 14th Annual National Conservation Education Association Conference in Springfield Missouri) / by Edward A. Ames (Executive Director of Wave Hill Center for Environmental Studies).


I see commas in the one without the semicolon, even though the PDF properties show semicolons.

Phil Harvey

This output is very different from the one I posted. The XMP-dc:Subject is stored incorrectly in the one with the leading semicolon. It was written without the -sep "; " option, so it is one long string instead of separate items.
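
This is how to tell the two cases apart in ExifTool output: list-type tags are printed with their items joined by ", " by default, and -sep changes that join string when reading. So a properly split XMP-dc:Subject shows commas, while a single item keeps its original semicolons. A small Python model of that display rule (show_list is a made-up name for illustration):

```python
from typing import List


def show_list(items: List[str], sep: str = ", ") -> str:
    """Mimic how ExifTool prints a list-type tag: items joined with ", " unless -sep is given."""
    return sep.join(items)


# Written without a matching -sep: a single item that happens to contain semicolons.
one_item = ["Document 6426; Report; 1967 Aug 23"]
# Written with -sep "; ": three separate items.
three_items = ["Document 6426", "Report", "1967 Aug 23"]

print(show_list(one_item))            # Document 6426; Report; 1967 Aug 23
print(show_list(three_items))         # Document 6426, Report, 1967 Aug 23
print(show_list(three_items, " | "))  # Document 6426 | Report | 1967 Aug 23
```

In practice, adding something like -sep " | " to the read command makes the item boundaries unambiguous.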

nbsusa

Sorry about that.

I just re-ran the one with the leading semicolon, and this is all it outputs now:

[XMP-dc]        Subject                         : Document 6426, Report, 1967 Aug 23, Environmental Education for Urban Schools (An Address Delivered at the 14th Annual National Conservation Education Association Conference in Springfield Missouri) / by Edward A. Ames (Executive Director of Wave Hill Center for Environmental Studies).

Phil Harvey

OK, so what is the difference between this and the other?  You might have to compare the full ExifTool output.

- Phil

StarGeek

I believe the problem can be seen right there in the CSV file. The Keywords tag in the original file exists, but is empty.

I downloaded Adobe Reader to see which tags it was reading. It looks like the relevant fields are XMP-pdf:Keywords and XMP-dc:Subject. Reader reads both of these tags and shows them as "Keywords". Adding to the fun, XMP-pdf:Keywords appears to be a string, not a list, using semicolons as separators.

nbsusa, if you clear out XMP-pdf:Keywords first, you probably will remove the leading blank keyword.
exiftool -XMP-pdf:Keywords=
(See Edit 2.)

edit:
Command I used to fill the tags:
exiftool -XMP-pdf:Subject="XMP-pdf:Subject" -XMP-pdf:Keywords="XMP-pdf:Keywords" -XMP-dc:Subject="XMP-dc:Subject" -PDF:Subject="PDF:Subject" -PDF:Keywords="PDF:Keywords"
And what Adobe Reader showed:


Edit 2: Actually, it may not. I cleared XMP-pdf:Keywords out and double-checked, and even though the XMP-pdf:Keywords tag didn't exist, Reader showed the leading semicolon in Properties. Clearing out XMP-dc:Subject and just using XMP-pdf:Keywords did not leave any leading or trailing semicolons. The solution might be to just use XMP-pdf:Keywords. And since that tag appears to be a string, not a list, it won't be affected by the -sep option. You just have to separate the keywords with a semicolon followed by a space ("; ").

nbsusa

I'm still not getting the desired result.

If I use -sep "; " -xmp-pdf:keywords="Document 6426; Report; 1967 Aug 23; Environmental Education for Urban Schools (An Address Delivered at the 14th Annual National Conservation Education Association Conference in Springfield Missouri) / by Edward A. Ames (Executive Director of Wave Hill Center for Environmental Studies)."

Then the PDF (in Reader) displays "Document 6426; Report; 1967 Aug 23; Environmental Education for Urban Schools (An Address Delivered at the 14th Annual National Conservation Education Association Conference in Springfield Missouri) / by Edward A. Ames (Executive Director of Wave Hill Center for Environmental Studies)."

Note the quotes around everything. This does not split the four separate keywords (I can see they are not split in Adobe Acrobat) and does not display them in Dublin Core Properties as separate keyword lists.

Maybe this is not possible to do with exiftool?

Phil Harvey

You should be using XMP-dc:Subject, not XMP-pdf:Keywords.

- Phil

nbsusa

Phil, when I use:
-sep "; " -XMP-dc:Subject="Document 6426; Report; 1967 Aug 23; Environmental Education for Urban Schools (An Address Delivered at the 14th Annual National Conservation Education Association Conference in Springfield Missouri) / by Edward A. Ames (Executive Director of Wave Hill Center for Environmental Studies)."

I get this in the PDF Keywords (viewing in Reader)

; Document 6426; Report; 1967 Aug 23; Environmental Education for Urban Schools (An Address Delivered at the 14th Annual National Conservation Education Association Conference in Springfield Missouri) / by Edward A. Ames (Executive Director of Wave Hill Center for Environmental Studies).

Back to the leading semicolon. However, the Dublin Core properly displays the 4 separate keyword lists.

Phil Harvey

Sorry.  I just re-read StarGeek's post in which he suggested using XMP-pdf:Keywords.  This tag is a simple string, not a list like XMP-dc:Subject.

StarGeek suggested using XMP-pdf:Keywords because it avoided the leading semicolon, but it doesn't look like this is the right solution if Acrobat puts quotes around a complex string like this.

I don't understand why you continue to have this problem.  ExifTool can write anything you want, so it should be easy to reproduce exactly what Acrobat writes.

You have worn me down, so I finally gave in and tested this with a trial version of Adobe Acrobat.  Here is what I did (and what we have been trying to get you to do all along):

1. Write the Keywords you want using Adobe Acrobat.  (For this test I wrote "test 1; test 2; test 3", without the quotes)

2. Use ExifTool to see what was written:

> exiftool ~/Desktop/a.pdf -G1 -a -subject -keywords
[XMP-dc]        Subject                         : test 1, test 2, test 3
[XMP-pdf]       Keywords                        : test 1; test 2; test 3
[PDF]           Keywords                        : test, 1;, test, 2;, test, 3


3. Use ExifTool to write the same thing to another file:

> exiftool ~/Desktop/b.pdf -xmp-dc:subject="test 1, test 2, test 3" -xmp-pdf:keywords="test 1; test 2; test 3" -pdf:keywords="test, 1;, test, 2;, test, 3" -sep ", "
    1 image files updated


4. Open the other file in Acrobat to verify that the Keywords appear as desired --> YES THEY DO!

The only trick was using -sep ", " to split the strings into separate list items when writing.

Note that even though it looks nice in Acrobat, it is wrong.  Acrobat did not split the PDF:Keywords into separate items properly.
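
The same three-tag recipe can be generated for any keyword list. This Python sketch derives the values Acrobat wrote in the test above: XMP-dc:Subject joined with ", " (so -sep ", " splits it into items), XMP-pdf:Keywords as one "; "-separated string, and PDF:Keywords reproducing Acrobat's whitespace split of that string. The whitespace-split rule is inferred only from the test 1/2/3 output and the Document 6426 output earlier in the thread, not from Adobe documentation, and the function names are hypothetical. It only builds the argument list; it does not run ExifTool.

```python
from typing import Dict, List


def acrobat_style_values(keywords: List[str]) -> Dict[str, str]:
    """Build the three values Acrobat wrote in the test (pattern inferred, not documented).

    The values are meant to be written with -sep ", ", so a keyword that itself
    contains ", " would be split into two items; reject that up front.
    """
    for kw in keywords:
        if ", " in kw:
            raise ValueError(f"keyword contains the ', ' separator: {kw!r}")
    pdf_string = "; ".join(keywords)  # XMP-pdf:Keywords: one plain string
    return {
        # List-type tag: -sep ", " splits this back into one item per keyword.
        "xmp-dc:subject": ", ".join(keywords),
        # Plain string tag: not split by -sep.
        "xmp-pdf:keywords": pdf_string,
        # Mimics how Acrobat filled PDF:Keywords: the string split on whitespace.
        "pdf:keywords": ", ".join(pdf_string.split()),
    }


def exiftool_args(path: str, keywords: List[str]) -> List[str]:
    """Return an argument list mirroring the step 3 command; nothing is executed."""
    args = ["exiftool", path]
    for tag, value in acrobat_style_values(keywords).items():
        args.append(f"-{tag}={value}")
    args += ["-sep", ", "]
    return args


print(exiftool_args("b.pdf", ["test 1", "test 2", "test 3"]))
```

The ValueError guards against the earlier "Springfield, Missouri" case, which -sep ", " would otherwise split in two. To actually run it, the list could be passed to subprocess.run(args, check=True) on a machine with ExifTool installed; since no shell parses the arguments, the cmd versus Mac/Linux quoting differences largely do not apply.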

- Phil

nbsusa

That worked like a champ! I apologize for not quite understanding everything you were asking me to do. I didn't realize I needed to do all 3. My fault and I appreciate everything everyone did. Thank you very much!

StarGeek

Adobe Reader completely ignored PDF:Keywords. Annoying difference between two pieces of software by the same company.

Phil Harvey

I have no idea which of those 3 tags are important.  However, Adobe Acrobat wrote them all, so that's what I did too.

- Phil