Removing PDF Subject Tag with Exiftool

Started by mrgou, January 22, 2020, 05:00:56 PM


mrgou

Hi

Can someone help me understand why the following command will overwrite the Title tag, but not delete the subject tag of my PDF file?

exiftool -Title="My Title" -Subject= myfile.pdf

Oddly, the value of the Subject field still shows in Acrobat Reader DC and PDF Architect, but not when extracting metadata with ExifTool:

exiftool -a myfile.pdf

Upon inspection of the PDF file in a text editor, I can still see this towards the end of the file:

/Title()
/Subject(Original_value)


The desired values only show in a later %BeginExifToolUpdate section, which, I presume, other PDF applications don't take into consideration.

Thanks!

R.

StarGeek

See the 3rd paragraph under PDF tags.

So using a text editor to look through the file will find the original data, as mentioned in that link.

I do find it odd that Acrobat would see that data, as exiftool is following Adobe's rules for incremental updates.
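To see the incremental-update behaviour without opening the file in a text editor, something like the following sketch may help. It assumes a grep that supports -a (GNU or BSD) and that the ExifTool section begins with the %BeginExifToolUpdate marker mentioned above; the function name subject_entries is made up for illustration.

```shell
# Report each /Subject entry in a PDF and whether it sits in the original
# body or in the section that ExifTool appended (assumption: that section
# starts at the first %BeginExifToolUpdate line).
subject_entries() {
    # -a makes grep treat the binary PDF as text; -n prefixes line numbers
    marker=$(grep -a -n '%BeginExifToolUpdate' "$1" | head -n 1 | cut -d: -f1)
    grep -a -n '/Subject' "$1" | while IFS=: read -r line text; do
        if [ -n "$marker" ] && [ "$line" -gt "$marker" ]; then
            echo "update: $text"
        else
            echo "original: $text"
        fi
    done
}
```

Calling subject_entries myfile.pdf on a file where the Subject was deleted should list only "original:" lines, since the old value stays in the file body and is merely superseded by the appended update.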
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

mrgou

Quote: I do find it odd that Acrobat would see that data, as exiftool is following Adobe's rules for incremental updates.

Agreed. Even after linearizing with qpdf, the result is the same. As Acrobat Reader is a reference implementation, I'm not sure whether this should be considered a bug somewhere. Anyway, PDF Architect shows the same values.

I actually noticed that reprocessing the file through GhostScript's ps2pdf sets the expected blank value in the subject field.

Phil Harvey

This is unsettling.  It has been tested previously with Adobe products and worked as specified at that time.  I don't have time to re-test it now, but I'll look into this with a current version of Adobe Reader when I get a chance.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

mrgou

Here's a test scenario for your consideration:


  • I produced the attached PDF file with PDF Creator, setting original values:

  • Then, I used exiftool: exiftool -Title="New title" -Subject= blank.pdf. Metadata is not as expected in Acrobat Reader DC:

  • However, if I use ps2pdf: ps2pdf blank.pdf blank-reprocessed.pdf, I end up with the expected output:

Acrobat Reader version:


I hope this helps.

StarGeek

What happens if you use this
exiftool -Description= myfile.pdf

Using exiftool to look at the data for your example, it shows that PDF Creator fills both PDF:Subject and XMP:Description with your "Original subject".

I should have looked at my notes on Adobe Reader.  My previous research, though a couple of years old, showed that Adobe Reader will fill the "Subject" field with data from these tags:
PDF:Subject
XMP-dc:Description
XMP-pdf:Subject
XMP-xmp:Description


The last two are probably pretty rare, but all of these would be cleared with
exiftool -subject= -description= myfile.pdf
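Before clearing anything, it may be useful to check which of those four tags a given file actually contains. A minimal sketch, using exiftool's -a (allow duplicates) and -G1 (show specific group names) options; the wrapper name show_subject_sources is only for illustration:

```shell
# List the tags that, per the notes above, Adobe Reader may use for "Subject".
# -a keeps duplicate tags; -G1 prefixes each value with its group name.
show_subject_sources() {
    exiftool -a -G1 \
        -PDF:Subject \
        -XMP-dc:Description \
        -XMP-pdf:Subject \
        -XMP-xmp:Description \
        "$1"
}
```

Running show_subject_sources myfile.pdf before and after the delete command should show whether any of the four tags is still present.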

mrgou

Yes, removing Description gets the expected results :-)

Note that I initially had the issue from a file that wasn't produced by PDF Creator, so I suppose that this way of setting the metadata isn't uncommon.

Thanks!

Phil Harvey

Glad you figured it out.  I had just tried this myself with a different PDF file and couldn't reproduce these results.

Thanks StarGeek.

- Phil