Unable to remove metadata from PDF, even temporarily

Started by bertalanimre, February 03, 2017, 03:54:45 AM


bertalanimre

Hey Forum!

I know that the changes ExifTool makes to a PDF are reversible. Please keep in mind that my main goal is not (yet) to erase data, just to modify it.

According to the manual, this should be a working command I assume:
exiftool -args -extractEmbedded -all:Creator= Cleaned/out.pdf

The output is:
    0 image files updated
    1 image files unchanged


If I run the command with -v2 and grep the output for "Creator", I get the following:
Deleting PostScript:Creator
Deleting PDF:Creator
Deleting XMP-iptcExt:Creator
Deleting XMP-pdf:Creator
Deleting XMP-dc:Creator

However, if I then list the detailed information in the file and grep for "Creator" again, my creator is still named in the file:
Without v2:
-Creator=John Doe
With v2:
  | | | CreatorTool = Adobe InDesign CS6 (Macintosh)
  | | | - Tag 'x:xmpmeta/rdf:RDF/rdf:Description/xmp:CreatorTool'


Even if I use -all:all=, the information still remains in that PDF. Could it be write-protected? How can I erase the information without damaging the table of contents?

How is this even possible? Did I execute the command wrong? Please give me a hint or something.

Thanks in advance!

Bert

bertalanimre

Wuhu, found it!

It was Ghostscript that helped me out. Although it is not as good as ExifTool with PDFs, it does remove the embedded metadata without harming the table of contents. After this, I can simply remove all the metadata with ExifTool and make it permanent with a qpdf linearization.

Phil: If you are interested, I can give you the whole process as a bash script file I'm working on at the moment. It is not a big deal but might give you some good thoughts. :)

Phil Harvey

Quote from: bertalanimre on February 03, 2017, 03:54:45 AM
According to the manual, this should be a working command I assume:
exiftool -args -extractEmbedded -all:Creator= Cleaned/out.pdf
Without v2:
-Creator=John Doe
With v2:
  | | | CreatorTool = Adobe InDesign CS6 (Macintosh)
  | | | - Tag 'x:xmpmeta/rdf:RDF/rdf:Description/xmp:CreatorTool'


Of course "CreatorTool" is still there because you just deleted "Creator".  The -v2 option won't show tags which have been deleted (even though they still remain in dead sections of the PDF file).
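
For anyone reading along, the distinction can be seen with a small sketch. ExifTool accepts wildcards in tag names when reading, so listing everything that matches *Creator* shows which tags would have to be cleared by name. The two function names below are hypothetical helpers made up for illustration, and the file argument is whatever PDF you are working on.

```shell
#!/bin/sh
# Hypothetical helpers; the function names are illustrative only.

# List every tag whose name contains "Creator", with its group (-G1),
# keeping duplicate tag names (-a) and printing short tag names (-s).
list_creator_tags() {
    exiftool -a -G1 -s '-*Creator*' "$1"
}

# Clear both Creator and CreatorTool in one pass.
# Like any ExifTool edit to a PDF, this hides the old values from readers;
# it does not physically erase them from the file.
clear_creator_tags() {
    exiftool -Creator= -CreatorTool= "$1"
}
```

Something like `list_creator_tags Cleaned/out.pdf` before and after `clear_creator_tags Cleaned/out.pdf` should make it obvious whether CreatorTool was the tag still showing up.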

Quote from: bertalanimre on February 03, 2017, 05:59:32 AM
Phil: If you are interested, I can give you the whole process as a bash script file I'm working on at the moment. It is not a big deal but might give you some good thoughts. :)

It isn't useful for me, but it would be of use to others, so posting it here would be great.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

bertalanimre

OK, so the following set of commands deletes not just the plain metadata but also the embedded metadata in PDF files, which ExifTool alone normally won't delete.

Requirements:

  • Exiftool 10.35
  • Qpdf 6.0.0
  • GhostScript 9.20


Commands that can be put into a bash script if you wish:
gs -q -sOutputFile=temp.pdf -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dPDFSETTINGS=/prepress input.pdf
exiftool -args -extractEmbedded -all= temp.pdf 2> /dev/null
exiftool -args -extractEmbedded -XMPToolkit= ./temp.pdf 2> /dev/null
rm -rf input.pdf
qpdf --linearize temp.pdf input.pdf
find ./ -name '*.pdf_original' -type f -exec rm -rf {} \;
rm -rf temp.pdf


I hope it helps somebody. I've spent the last 2 days getting this together in a quite complicated script cleaning PDFs and InDesign files :)

Phil Harvey

Just a few comments:

You can avoid the "find" line by adding -overwrite_original to your ExifTool commands.  Also, the -args and -extractEmbedded options do nothing when writing.  As well, I don't see what use -XMPToolkit= is since you already removed all XMP with -all= earlier.  So I think this single ExifTool command will do what you want:

exiftool -all= -overwrite_original temp.pdf 2> /dev/null

- Phil

bertalanimre

Cheers for the ideas, Phil. I'll implement them in the next version.

However, XMPToolkit does get written back into the file. To be honest, I left out one line of my own script, where I add a title and an author with ExifTool. After that, XMPToolkit is filled in again, showing that the file was modified with ExifTool. That is why I have that last step in there.

But cheers again. You are a great help to all of us. :) And I thank you for that!

Phil Harvey

OK then, the command would be:

exiftool -all= -xmptoolkit= -author="author name" -title="some title" -overwrite_original temp.pdf 2> /dev/null

Still, no need for multiple commands.
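
Putting the pieces of this thread together, the whole clean-up might be sketched roughly as below. This is only a sketch under assumptions: the function name clean_pdf, its three arguments, and the .tmp.pdf / .clean.pdf file names are made up for illustration. The gs, exiftool and qpdf invocations are the ones posted above, with the input replaced only after every step has succeeded rather than deleted up front.

```shell
#!/bin/sh
# Sketch of the clean-up discussed in this thread.
# clean_pdf and its arguments are hypothetical names, not part of any tool.
# Usage: clean_pdf input.pdf "author name" "some title"
clean_pdf() {
    src=$1
    author=$2
    title=$3
    tmp="${src%.pdf}.tmp.pdf"     # Ghostscript output, then edited by ExifTool
    out="${src%.pdf}.clean.pdf"   # linearized result, swapped in only on success

    # 1. Rewrite the PDF with Ghostscript, which per this thread drops the
    #    embedded metadata while keeping the table of contents.
    gs -q -sOutputFile="$tmp" -sDEVICE=pdfwrite -dNOPAUSE -dBATCH \
       -dPDFSETTINGS=/prepress "$src" || return 1

    # 2. Wipe all metadata and set the new author and title in one command.
    #    -overwrite_original avoids leaving a *.pdf_original backup behind.
    exiftool -all= -xmptoolkit= -author="$author" -title="$title" \
       -overwrite_original "$tmp" 2> /dev/null || return 1

    # 3. Linearize with qpdf to make the ExifTool change permanent.
    qpdf --linearize "$tmp" "$out" || return 1

    # 4. Replace the input only now, then drop the temporary file.
    mv -- "$out" "$src" && rm -f -- "$tmp"
}
```

Any step that fails makes the function return non-zero and leaves the original input untouched, which seemed safer than removing input.pdf before qpdf has run.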

- Phil

soozie

Phil,

I deleted my pdf metadata Creator and Creator Tool by using:

exiftool -creator= xxx.PDF

Is this thread saying they are not actually deleted but rather still stored somewhere in the PDF?  I am using Windows so I am guessing some of this conversation doesn't apply to Windows.  Is that correct?

Thanks

Quote from: Phil Harvey on February 03, 2017, 11:26:32 AM
OK then, the command would be:

exiftool -all= -xmptoolkit= -author="author name" -title="some title" -overwrite_original temp.pdf 2> /dev/null

Still, no need for multiple commands.

- Phil

Phil Harvey

ExifTool does the same thing on all platforms.  Windows is not special.

I think that reading the start of the PDF tags documentation will answer your question.
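
As a rough way to see this for yourself, here is a small sketch. The PDF tags documentation describes ExifTool's PDF edits as incremental updates that can be undone by deleting the PDF-update pseudo-group, which is why the old values are still in the file. The helper names raw_hits and undo_exiftool_edits are hypothetical, and the file and search string are whatever applies in your case.

```shell
#!/bin/sh
# Hypothetical helper names, for illustration only.

# Count the lines of the raw file that still contain a given string.
# -a treats the binary PDF as text, -F matches the string literally.
# A count of 0 is not proof of removal: values inside compressed streams
# or stored in another encoding will not match.
raw_hits() {
    grep -a -c -F -- "$2" "$1"
}

# Undo all ExifTool edits to a PDF by deleting the PDF-update pseudo-group,
# which brings the previously "deleted" metadata back.
undo_exiftool_edits() {
    exiftool -PDF-update:all= "$1"
}
```

For example, if `raw_hits xxx.PDF 'Adobe InDesign'` still prints a non-zero count after `exiftool -creator= xxx.PDF`, the old value is sitting in a dead section of the file. This works the same way on Windows, provided a grep is available there.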

- Phil