Adding first keyword to pdf splits it into two keywords

Started by vicarage, June 24, 2023, 02:25:50 AM

vicarage

Linux Mint. Starting with a pdf and a jpg with no keywords.

exiftool -ver
12.63
exiftool -q -overwrite_original_in_place -Keywords+="Fred Bloggs" 1.jpg 1.pdf
exiftool -q -overwrite_original_in_place -Keywords+="Sally Ryde" 1.jpg 1.pdf
exiftool -Keywords 1.pdf 1.jpg
======== 1.pdf
Keywords                        : Fred, Bloggs, Sally Ryde
======== 1.jpg
Keywords                        : Fred Bloggs, Sally Ryde
    2 image files read

Note that the PDF has incorrectly split the first keyword, while the JPG shows the expected behaviour.

StarGeek

Interesting. I tested this with the earliest version I still have, 10.13, and got the same result. But it only happens when adding a single keyword.

Adding two with += works correctly
C:\>exiftool  -P -overwrite_original  -Keywords+="Fred Bloggs" -Keywords+="Sally Ryde"  Y:\!temp\test.pdf
    1 image files updated

C:\>exiftool -G1 -a -s -sep ## -Keywords Y:\!temp\test.pdf 
[PDF]          Keywords                        : Fred Bloggs##Sally Ryde

Adding only a single keyword with just = splits it in PDF:Keywords, but also creates XMP-pdf:Keywords. The += form doesn't write XMP-pdf:Keywords because that is a string tag, not a list, and += on a non-list tag is treated as a shift operation, which only applies to numeric tags.
C:\>exiftool -P -overwrite_original -Keywords="Fred Bloggs" Y:\!temp\test.pdf
    1 image files updated

C:\>exiftool -G1 -a -s -sep ## -Keywords Y:\!temp\test.pdf 
[XMP-pdf]      Keywords                        : Fred Bloggs
[PDF]          Keywords                        : Fred##Bloggs

Adding two directly using only the equal sign writes PDF:Keywords correctly (XMP-pdf:Keywords, being a single string, keeps only the last value).
C:\>exiftool  -P -overwrite_original  -Keywords="Fred Bloggs" -Keywords="Sally Ryde"  Y:\!temp\test.pdf
    1 image files updated

C:\>exiftool -G1 -a -s -sep ## -Keywords Y:\!temp\test.pdf 
[XMP-pdf]      Keywords                        : Sally Ryde
[PDF]          Keywords                        : Fred Bloggs##Sally Ryde
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

Phil Harvey

This is an unfortunate result of the lack of consistency in the PDF list format.  Sometimes lists are stored as space-delimited words, and sometimes comma-delimited.  When ExifTool reads PDF:Keywords and the value doesn't contain any commas, it assumes the list is space-delimited.
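
As a rough illustration of that read-side rule (a sketch only, not ExifTool's actual code; the function name split_pdf_keywords is made up for this example), the heuristic could be mimicked in plain shell like this:

```shell
# Hypothetical helper, not part of ExifTool: mimic the read-side heuristic.
# Prints one keyword per line for the stored string given as $1.
split_pdf_keywords() {
    case "$1" in
        *,*)
            # At least one comma: treat the string as comma-delimited
            # and trim the spaces around each item.
            printf '%s\n' "$1" | tr ',' '\n' | sed -e 's/^ *//' -e 's/ *$//' -e '/^$/d'
            ;;
        *)
            # No comma anywhere: assume space-delimited words.
            printf '%s\n' "$1" | tr -s ' ' '\n' | sed -e '/^$/d'
            ;;
    esac
}

split_pdf_keywords 'Fred Bloggs'
split_pdf_keywords 'Fred Bloggs, Sally Ryde'
```

Under this rule a stored string of "Fred Bloggs" comes back as two items, while "Fred Bloggs, Sally Ryde" comes back as the two intended keywords, which matches the outputs earlier in the thread.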

But note that PDF:Keywords is deprecated in PDF 2.0, so I would recommend not using it.  (ExifTool will give a minor error if you try to write this to a PDF 2.0 document.)

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

StarGeek

Ah, that's a real pain.  And that means you can't add keywords with commas.

C:\>exiftool -P -overwrite_original -all= -pdf:keywords="Smith, John" -pdf:keywords="Jane Doe" Y:\!temp\test.pdf
Warning: [minor] ExifTool PDF edits are reversible. Deleted tags may be recovered! - Y:/!temp/test.pdf
    1 image files updated

C:\>exiftool -G1 -a -s -sep ## -keywords Y:\!temp\test.pdf
[PDF]           Keywords                        : Smith##John##Jane Doe

C:\>exiftool -P -overwrite_original -all=  -pdf:keywords="Jane Doe" -pdf:keywords="Smith, John" Y:\!temp\test.pdf
Warning: [minor] ExifTool PDF edits are reversible. Deleted tags may be recovered! - Y:/!temp/test.pdf
    1 image files updated

C:\>exiftool -G1 -a -s -sep ## -keywords Y:\!temp\test.pdf
[PDF]           Keywords                        : Jane Doe##Smith##John

vicarage

That does seem unwise behaviour on exiftool's part. If it just returned what it saw, the user would see

Fred Bloggs
Fred Bloggs John Smith
Fred Bloggs, John Smith

and searching the text for "Fred Bloggs" or "Smith", typical use cases, would work. As it is

Fred, Bloggs, John Smith

needs a 2 pass search with both variants because the result is unpredictable, and you are asserting there are 3 keyword phrases when you don't really know that.

In append mode, why not get the phrase returned, and then submit it with ", John Smith" appended. If that's stored as "Fred Bloggs John Smith", you are no worse off.

I am working on this workaround

exiftool -Keywords+="Fred Bloggs," 1.pdf
exiftool -Keywords 1.pdf
Keywords                        : Fred Bloggs
exiftool -Keywords+="John Smith," 1.pdf
exiftool -Keywords 1.pdf
Keywords                        : Fred Bloggs, John Smith

which helps with documenting my own files, but I can't expect others to adopt it. Is there a downside to exiftool adding that comma?

Phil Harvey

I don't have time right now to think about the ramifications of adding an unnecessary comma, but I did check a couple of Adobe-generated PDF files to see how the PDF:Keywords were actually stored.  They are stored as a PDF string (strings in PDF are enclosed in round brackets).  Here are two examples copied straight from the PDF data:

/Keywords (adobe photoshop sdk api file formats psd tiff jpg jpeg nptc-naa)

/Keywords (Security feature user guide, digital signatures, security policies)

Note that your suggestion would break things for anyone trying to add/delete list items from the first example above.

- Phil

Phil Harvey

I found an old StackOverflow thread on this topic.  I like the top answer.

Also, you can disable the ExifTool List-behaviour of this tag by setting the API NoPDFList option if that helps.
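
For reference, a minimal sketch of using that option from the command line. The wrapper name read_raw_pdf_keywords is hypothetical, and this only covers reading; how the option interacts with += when writing is not confirmed here.

```shell
# Hypothetical wrapper: print PDF:Keywords as the single stored string,
# with ExifTool's list splitting turned off via the NoPDFList API option.
# -s3 prints the value only, without the tag name.
read_raw_pdf_keywords() {
    exiftool -api NoPDFList=1 -s3 -PDF:Keywords "$1"
}
```

Calling read_raw_pdf_keywords 1.pdf should then show the keywords string as stored in the file rather than the split list.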

- Phil

vicarage