Removing all MetaData from PDF files

Started by craisin, February 21, 2012, 11:29:54 AM

Phil Harvey

Actually, thinking about this now, I realize that it may be sufficient, and maybe not too difficult, to just zero out the old data.  The file still wouldn't be linearized, but at least the old metadata would be gone.  I'll look into this.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Phil Harvey

It turns out there are 3 problems with this:

1. It is harder than I had hoped to simply zero out the existing metadata.

2. The solution wouldn't be complete because there could already be unused objects containing old metadata in the original PDF, and ExifTool wouldn't be able to zero out these.

3. It has been advertised that ExifTool PDF edits are reversible, and some users may be relying on this feature.

- Phil
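
The point about old data staying in the file can be checked from outside ExifTool. A PDF edited by incremental update normally gains one extra `%%EOF` marker for each appended revision, so counting those markers gives a rough hint that earlier revisions (and their metadata) may still be present. The sketch below is only a heuristic and is not part of ExifTool; the function names are made up for illustration, and a linearized file can legitimately contain two markers.

```python
from pathlib import Path


def count_eof_markers(data: bytes) -> int:
    """Count the %%EOF markers in the raw bytes of a PDF."""
    return data.count(b"%%EOF")


def may_have_old_revisions(path: str) -> bool:
    """True if the file holds more than one %%EOF marker (heuristic only)."""
    return count_eof_markers(Path(path).read_bytes()) > 1
```

A result of `True` does not prove that old metadata is recoverable, only that the file is worth rewriting with a PDF utility.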

IWTA

Phil
Removing metadata from PDF files (not just hiding it) is an urgent problem. PDF is one of the most common formats for exchanging documents, and there is currently no alternative for deleting metadata from PDF files, so this needs to be solved somehow. Hiding metadata is not a solution, and I am not sure it is needed at all. Hiding metadata from yourself makes no sense, and it does not hide anything from other people.
I think the right approach is to delete (rather than "hide") the metadata as far as this format allows, and to display a message before or after deletion saying that deleting metadata in this format is irreversible and possibly incomplete.
If some people do need the "hiding" behaviour (which I doubt), it could be provided through an additional command.

Phil Harvey

The technique is to run ExifTool, then re-linearize the PDF with a PDF utility.

Hayo Baan

Quote from: Phil Harvey on August 14, 2019, 07:32:49 AM
The technique is to run ExifTool, then re-linearize the PDF with a PDF utility.

IWTA, to re-linearise after removing the metadata you can use, for example, qpdf (on a Mac it is easy to install with Homebrew: brew install qpdf).

For example:
# Strip metadata from example.pdf and re-linearise:
exiftool -all= -overwrite_original example.pdf
mv example.pdf temp.pdf
qpdf --linearize temp.pdf example.pdf
Hayo Baan – Photography
Web: www.hayobaan.nl
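
The two commands above can also be wrapped in a small script so that the same steps run wherever both tools are installed. This is a minimal Python sketch, not something taken from the ExifTool or qpdf documentation: the names `build_commands` and `strip_pdf` and the temporary file name are hypothetical, and it assumes `exiftool` and `qpdf` are on the PATH. It uses only the command-line options already shown above.

```python
import os
import subprocess
import sys
from pathlib import Path


def build_commands(pdf: Path, tmp: Path) -> list[list[str]]:
    """Return the exiftool and qpdf command lines for one file."""
    return [
        ["exiftool", "-all=", "-overwrite_original", str(pdf)],
        ["qpdf", "--linearize", str(pdf), str(tmp)],
    ]


def strip_pdf(pdf: Path, run=subprocess.run) -> None:
    """Strip metadata from pdf, then replace it with qpdf's rewritten copy."""
    # Hypothetical temporary name, placed next to the original file.
    tmp = pdf.with_name(pdf.stem + ".linearized.tmp")
    for cmd in build_commands(pdf, tmp):
        # check=True stops at the first step that exits non-zero.
        # Assumption: a qpdf run that only emits warnings also exits
        # non-zero and is treated as a failure here.
        run(cmd, check=True)
    os.replace(tmp, pdf)  # overwrite the original with the rewritten file


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        strip_pdf(Path(arg))
```

Saved under any name, it could be run as `python strip_pdf.py example.pdf`. `os.replace` stands in for `mv`, so the script does not depend on a Unix shell.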

IWTA

Phil
Would it be possible to add the linearization code from qpdf to ExifTool? Then, to remove metadata from a PDF file completely, ExifTool would first "hide" the metadata (a function that is already implemented) and immediately re-linearize the file automatically. That would allow metadata to be deleted completely with the single, familiar ExifTool removal command, without third-party applications. Is this possible?

Phil Harvey


IWTA

So using ExifTool together with qpdf doesn't guarantee complete removal of the metadata?

Hayo Baan

If you follow the procedure I outlined in my previous post, all metadata should be gone, completely and irreversibly.

StarGeek

Then there's also the option of using the OS Print to PDF driver.
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

IWTA

Quote from: Hayo Baan on August 14, 2019, 08:01:23 AM
# Strip metadata from example.pdf and re-linearise:
exiftool -all= -overwrite_original example.pdf
mv example.pdf temp.pdf
qpdf --linearize temp.pdf example.pdf

I found a .bat file on the network with the following contents:
exiftool -overwrite_original -all:all="" %1
qpdf --pages %1 1-z -- --empty output.pdf
REM qpdf --linearize %1 output.pdf
move output.pdf %1
REM pause

I made a shortcut to this .bat file, and when I need to clear a document of metadata I drag the PDF file onto the icon and the metadata is deleted. But traces remain after this processing: if you open the processed file in a text editor, you can still see which program created the document, and the DocumentID, OriginalDocumentID and LastModified values also remain.
I created a .bat file from the commands you proposed and dragged a document onto its icon, but the file was never processed. :(
Maybe the commands you proposed need to be adjusted so that they can be used as a .bat file?
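
The text-editor check described above can be automated. The following Python sketch searches the raw bytes of a file for a few metadata names; it is an illustration, not an ExifTool or qpdf feature. `DocumentID` and `LastModified` come from the post above, while `/Producer` and `/Creator` are common PDF Info dictionary keys added as an assumption about where the creating program is usually named. Anything stored inside a compressed stream is invisible to a scan like this, so an empty result is a hint rather than a guarantee.

```python
from pathlib import Path

# Names to search for (illustrative, not an exhaustive list).
MARKERS = (b"DocumentID", b"LastModified", b"/Producer", b"/Creator")


def find_leftovers(data: bytes) -> dict[str, int]:
    """Return {marker: occurrences} for every marker present in the bytes.

    Note that b"DocumentID" also matches inside "OriginalDocumentID".
    """
    counts = {m.decode("ascii"): data.count(m) for m in MARKERS}
    return {name: n for name, n in counts.items() if n}


def scan_file(path: str) -> dict[str, int]:
    """Read a PDF as raw bytes and report any leftover markers."""
    return find_leftovers(Path(path).read_bytes())
```

Running `scan_file` on a file before and after processing shows whether the counts actually dropped.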


IWTA

Quote from: StarGeek on August 14, 2019, 03:26:31 PM
Then there's also the option of using the OS Print to PDF driver.
What good PDF printer would you recommend to use to remove metadata from a file so that the contents of the document do not change?

StarGeek

Quote from: IWTA on August 15, 2019, 07:55:38 AM
What good PDF printer would you recommend to use to remove metadata from a file so that the contents of the document do not change?

To be honest, it is not something I've tried myself, as I don't create PDFs.  But it is something I've read about on many occasions.

Windows 10 has one built in.  I'm pretty sure the Mac also has one, though I don't know since when (I found a web page that says at least 10 years).  No idea about Linux.

If you're on an earlier version of Windows, looking around I find ones by Bullzip and CutePDF.  Foxit Reader supposedly includes one as well.

obetz

Quote from: StarGeek on August 15, 2019, 11:33:36 AM
If you're on an earlier version of Windows, looking around I find ones by Bullzip and CutePDF.  Foxit Reader supposedly includes one as well.

https://pdf24.org/ PDF24 Creator is widespread and feature-rich. Don't know about their motivation (business model?).

Foxit became insecure bloatware over the years, similar to Adobe.

CutePDF has been known to install third-party software but currently it might be clean (don't know)

Oliver

StarGeek

Quote from: obetz on August 15, 2019, 01:59:15 PM
Foxit became insecure bloatware over the years, similar to Adobe.

Yeah, I had heard that as well.  Which is why I wasn't sure as I no longer have it installed.

Quote
CutePDF has been known to install third-party software but currently it might be clean (don't know)

Ah, yes, that was mentioned in the post I read.  But there is a Ninite installer and a Chocolatey installer for it, so that can help avoid the 3rd party stuff.