ExifTool Forum

ExifTool => Newbies => Topic started by: mackey on September 08, 2021, 11:38:09 AM

Title: Batch removing metadata from 300+ PDFs in subfolders
Post by: mackey on September 08, 2021, 11:38:09 AM
Hi, I am in the process of organizing a company's entire WooCommerce product catalog in preparation for uploading to their new website. I found that the PDF schematics attached to most products have some metadata that we would like to strip (Author, Title, sometimes the file location of the original file on the computer it was created on C:\etc. etc.). The PDFs (~320 of them) are spread across a bunch of different subfolders. Is there a way to strip the metadata from every PDF under the root folder in one go?

I've tried doing this with Acrobat and for some reason the quality of the PDF drawings deteriorated greatly. Any help would be appreciated!
Title: Re: Batch removing metadata from 300+ PDFs in subfolders
Post by: StarGeek on September 08, 2021, 12:22:53 PM
The simplest command would be
exiftool -All= -ext pdf /path/to/pdfs/

But exiftool's edits to PDFs are reversible (see the PDF tags page (https://exiftool.org/TagNames/PDF.html)), so the files would have to be linearized afterwards to permanently remove the data, which isn't as easy to do in batch.  That link gives a command that can be used with qpdf (http://qpdf.sourceforge.net/), but that program only processes one file at a time and writes a new output file rather than editing the original in place.
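
If you do want to batch the qpdf step, a small shell loop is one way to work around the one-file-at-a-time limit. This is only a rough sketch, not something from the exiftool docs: the linearize_tree function name, the QPDF override variable and the .linearized.tmp suffix are all made up for illustration, and it assumes a POSIX shell with find and no newlines in the file names. Try it on a copy of the folder first.

```shell
# Sketch: linearize every PDF under a root folder after exiftool has run.
# linearize_tree and the QPDF variable are hypothetical names, not qpdf features.
linearize_tree() {
    root="$1"
    # Find PDFs at any depth; the temp suffix below does not match *.pdf,
    # so temp files are never picked up by this search.
    find "$root" -type f -iname '*.pdf' | while IFS= read -r pdf; do
        tmp="${pdf%.*}.linearized.tmp"
        # qpdf writes a separate output file; swap it in only on success.
        # Any non-zero exit status is treated as a failure here.
        if "${QPDF:-qpdf}" --linearize "$pdf" "$tmp"; then
            mv -- "$tmp" "$pdf"
        else
            rm -f -- "$tmp"
            echo "failed: $pdf" >&2
        fi
    done
}
```

You would then call it as linearize_tree /path/to/pdfs/ once the exiftool command has finished.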

I would think that there should be an option somewhere on Acrobat to tell it not to re-compress the images, but I don't have access to it to double check.
Title: Re: Batch removing metadata from 300+ PDFs in subfolders
Post by: mackey on September 10, 2021, 11:38:19 AM
Maybe I can get away with just running exiftool. I'm not sure how many people will care enough to try to reverse the process to see the metadata (there's nothing sensitive, I just wanted to clean the files up). If I run exiftool like you said, will it find all the PDFs in the subfolders as well?
Title: Re: Batch removing metadata from 300+ PDFs in subfolders
Post by: StarGeek on September 10, 2021, 12:00:15 PM
Add the -r (-recurse) option (https://exiftool.org/exiftool_pod.html#r-.--recurse) to recurse into subdirectories.  Don't use a wildcard such as *.pdf, as that will block recursion (see the above link and Common Mistake #2 (https://exiftool.org/mistakes.html#M2)).  Just pass a directory name, and the -ext (-extension) option (https://exiftool.org/exiftool_pod.html#ext-EXT---ext-EXT--extension) will limit processing to PDFs.

The above command will create backup files. Add the -overwrite_original option (https://exiftool.org/exiftool_pod.html#overwrite_original) to suppress this.
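
Putting the options from this thread together, the full command would look something like the sketch below. The strip_pdf_metadata wrapper and the EXIFTOOL variable are just illustrative names, not part of exiftool, and /path/to/pdfs/ is a placeholder for your root folder. Since -overwrite_original leaves no backups, it is probably worth running this on a copy of the folder first.

```shell
# Sketch: strip all metadata from every PDF under a root folder.
# strip_pdf_metadata and EXIFTOOL are hypothetical names for illustration.
strip_pdf_metadata() {
    root="$1"
    # -All=                 delete all writable tags
    # -r                    recurse into subdirectories
    # -ext pdf              only process files with a .pdf extension
    # -overwrite_original   do not keep the *_original backup files
    # Pass the directory itself, not a *.pdf wildcard, so recursion works.
    "${EXIFTOOL:-exiftool}" -All= -r -ext pdf -overwrite_original "$root"
}
```

Usage would be strip_pdf_metadata /path/to/pdfs/ which is the same as typing the exiftool line directly with that path.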