Removing all MetaData from PDF files

Started by craisin, February 21, 2012, 11:29:54 AM


craisin


I am new to all this (I must say it all looks absolutely terrific!)

I want to REMOVE the metadata and hidden data from PDF files while still retaining the originals.

What command line do I write to do this?
I have tried running it already, but the original files seem to stay there and no new files are written.

Many thanks

Chris (craisin)
Australia

Phil Harvey

Hi Chris,

The command

exiftool -all= some.pdf

will "remove" all metadata from the file "some.pdf".  A copy of the unmodified file is saved as "some.pdf_original".  To overwrite the file without keeping this backup, add -overwrite_original.
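For anyone scripting this, here is a minimal Python sketch of the two command forms just described, assuming the exiftool executable is on your PATH. The helper names (strip_command, backup_name, strip) are hypothetical; only the -all= and -overwrite_original options and the "_original" backup naming come from the post.

```python
import subprocess
from typing import List


def strip_command(pdf: str, keep_backup: bool = True) -> List[str]:
    """Build the exiftool arguments that clear all writable metadata tags."""
    args = ["exiftool", "-all="]
    if not keep_backup:
        # Write over the file in place instead of leaving a FILE_original backup
        args.append("-overwrite_original")
    args.append(pdf)
    return args


def backup_name(pdf: str) -> str:
    """Name under which exiftool keeps the unmodified file by default."""
    return pdf + "_original"


def strip(pdf: str, keep_backup: bool = True) -> None:
    # Assumes exiftool is on PATH; raises CalledProcessError if it fails
    subprocess.run(strip_command(pdf, keep_backup), check=True)
```

Calling strip("some.pdf") corresponds to the command above and should leave "some.pdf_original" next to the edited file.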

I used quotes around "remove" because for PDF files the metadata is only removed from the document information dictionary, and not actually deleted from the file.  This allows the information to be recovered later with this command:

exiftool -pdf-update:all= some.pdf

You can read about this in the PDF Tags documentation.
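To see why the "removed" data can come back, a rough Python check like the one below may help. It assumes, per the PDF Tags documentation mentioned above, that ExifTool appends its changes as an incremental update rather than rewriting the whole file. In the PDF format each such update normally ends with its own %%EOF marker, so a file edited this way usually shows one more marker than before, while the old text is still sitting in the earlier section. The function names are hypothetical, and the byte search is only a heuristic: it cannot see text stored in compressed streams, and linearized files may contain an extra marker.

```python
import sys


def count_eof_markers(data: bytes) -> int:
    """Rough revision count: each incremental update normally ends with its own %%EOF."""
    return data.count(b"%%EOF")


def still_contains(data: bytes, needle: bytes) -> bool:
    """True if needle appears in the raw bytes; text inside compressed streams is not seen."""
    return needle in data


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_pdf.py some.pdf "Old Author Name"
    path, old_text = sys.argv[1], sys.argv[2]
    with open(path, "rb") as f:
        raw = f.read()
    print("%%EOF markers:", count_eof_markers(raw))
    print("old text still present:", still_contains(raw, old_text.encode("latin-1")))
```

Running it on a file before and after "exiftool -all=" should show the marker count going up while the old text (if it was stored uncompressed) is still found.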

It sounds from your description as if you would like to delete this information permanently from the file.  You should be able to do this by using Acrobat Distiller to linearize the PDF after removing the metadata with ExifTool, but I haven't actually tested this myself.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

craisin

Thanks for such a quick and comprehensive response.

I DO want to remove the data permanently, so I will investigate the linearization (what a word!) after running the files through ExifTool.

Actually, would you need to do that, or would linearization remove it anyway?

Cheers
Chris

Phil Harvey

Hi Chris,

Simply linearizing the PDF will not remove the metadata.  I'm not sure how Acrobat Distiller handles metadata, but unless it has a special option to remove it, you would need something like ExifTool to do this for you first.

- Phil

Phil Harvey

I just had another idea.

On Mac OS when you print a file you can set the output to a PDF file, and the output PDF retains no meta information from the original (I believe).  So you could also remove the metadata using a technique like this, but note that the print conversion may affect the look of the document since it could change the layout for printing.

- Phil

craisin

I tried running Distiller, but it says it does not process PDF files (it appears to only process PostScript files).  This is using the Distiller from Acrobat 5.0 (full package).

Any ideas?

Cheers
Chris



Phil Harvey

Hi Chris,

I had thought that the distiller could linearize a PDF, but it seems I may be wrong about this.

I did a quick search but couldn't find a definitive answer, though I'd be surprised if Acrobat Pro couldn't do this.

- Phil

Edit: Searching adobe.com, I can't find any reference to Acrobat Pro being able to do this, so maybe it can't either.  But you don't need to linearize the PDF... any optimization, or simply re-saving the PDF after removing the metadata with ExifTool, may be enough.

craisin

Thanks for that.

Is there a command line switch in ExifTool that could be used to save the file under a new name, so that the removal of the Metadata is assured?

Perhaps something like the following?:

                    exiftool -all= "c:\data\test.pdf" > "c:\data\test.new.pdf"

Or perhaps if I extract a SPECIFIC metadata tag from the file in conjunction with the "-all=" switch it will write the file out (would that output file then need to be saved elsewhere to ensure the actual Metadata has gone?)

Also - I do not understand what the "DIR" seems to be doing in a lot of your command line operations. Can you please clarify that for me?

Many thanks

Chris

Phil Harvey

Hi Chris,

You can set the output file name to anything you want with the -o option:

exiftool -all= FILE -o OUTFILE

But I'm not sure what you are hoping to accomplish.  The changes to PDF files are always reversible with -pdf-update:all=.

When I use all capitals in a command, it symbolizes something that you must type in based on your specific situation.  FILE is any file name (such as "flowers.jpg").  DIR is any directory name (such as "c:\pictures").  However, exiftool accepts both file and/or directory names for most commands.
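The -o form above can be scripted the same way.  This is a minimal Python sketch assuming exiftool is on your PATH; the function names are hypothetical, and the argument order simply mirrors the command above.  It also shows why the ">" redirection tried earlier does not help: redirection captures only what exiftool prints to the console, not the rewritten file.

```python
import subprocess
from typing import List


def strip_to_new_file_command(infile: str, outfile: str) -> List[str]:
    """Mirror of `exiftool -all= FILE -o OUTFILE`: infile itself is left unchanged."""
    return ["exiftool", "-all=", infile, "-o", outfile]


def strip_to_new_file(infile: str, outfile: str) -> None:
    # Assumes exiftool is on PATH. The output name is passed with -o instead of
    # shell redirection, which would only capture exiftool's console messages.
    subprocess.run(strip_to_new_file_command(infile, outfile), check=True)
```

As noted above, a copy written this way is still reversible with -pdf-update:all=, so -o changes where the result goes, not how thoroughly the metadata is removed.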

- Phil

craisin

Thanks Phil,

It was just that you said that if I saved the file after "removing" the Metadata then it would permanently remove that information, but from your comments that appears not to be the case.

Since on a Mac you can print to a PDF file, I will try the same on my Windows system using a PDF printer to see what happens.

Thanks again for your help.

Cheers
Chris

Phil Harvey

Quote from: craisin on February 22, 2012, 12:16:50 AM
It was just that you said that if I saved the file after "removing" the Metadata then it would permanently remove that information, but it appears to not be the case by your comments.

Sorry for not being clear.  I meant saved by any PDF editor, such as Acrobat.

- Phil

craisin

OK - gotcha.

So if I open the file using Acrobat and then save it, the Metadata will really be removed?

Cheers
Chris

Phil Harvey

Hi Simon,

It should be.  But I don't have Acrobat so I can't verify this.

- Phil

craisin

Thanks Phil - I am Chris, but I suppose the message was meant for me LOL

You are (after all) very busy at your end, I suppose... Thanks again for your help.

Cheers
Chris

Phil Harvey

Ooops. Sorry Chris.  Yes, the message was for you.

- Phil