How to Permanently Remove All Embedded XMP Metadata from PDF (incl. XMPToolkit)

Started by Darkestone, May 08, 2025, 08:16:06 PM

Darkestone

Hi all,

I'm trying to remove all embedded XMP metadata from a PDF file, including tags like:

[XMP-x] XMPToolkit
[XMP-xmpMM] DocumentID
[XMP-xmp] CreatorTool
and any other [XMP-*] blocks
Here's what I've done so far:

exiftool -all= -overwrite_original example.pdf
Then I re-add only the clean, minimal metadata I want:

exiftool \
-Producer="Example Producer" \
-PDFVersion=1.4 \
-CreateDate="2025:01:01 00:00:00-00:00" \
-ModifyDate="2025:01:01 00:00:00-00:00" \
-Title="Example Document" \
-PageCount=5 \
-overwrite_original \
example.pdf
I've also flattened the file using Ghostscript:

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dNOPAUSE -dQUIET -dBATCH \
-sOutputFile="flattened.pdf" "example.pdf"
Issue: Despite this, when I run:

exiftool -a -G1 -s flattened.pdf
I still see XMPToolkit: Image::ExifTool 13.29 and other XMP tags (like DocumentID, MetadataDate, etc.).

Is there a reliable method to completely and permanently strip all XMP metadata so none of these tags appear in the output, even with full ExifTool flags?

Thanks for any help you can offer!

StarGeek

To remove just the XMP metadata, you would run:
exiftool -XMP:All= /path/to/files/

The thing to remember is that exiftool uses the incremental update function of PDFs to update the files (see "Incremental Updates in PDF files", Debenu Foxit). This does not remove any previous data and such changes are reversible.

Does ghostscript properly deal with incremental updates? You may have to re-linearize the file with something like qpdf to make the metadata changes permanent. See the PDF Tags page.

I've been dealing with PDFs lately, and the XMP-xmpMM tags are ones I always remove.
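
A rough way to see whether a PDF still carries incremental updates is to count its end-of-file markers: each incremental save appends a new section that ends in %%EOF, so more than one marker suggests older revisions are still in the file. This is only a sketch. The function name count_pdf_revisions is made up for illustration, and a linearized file can legitimately contain two markers, so treat the count as a hint rather than proof.

```shell
# Hypothetical helper: count the %%EOF markers in a PDF.
# A count above 1 hints at incremental updates (or a linearized file).
count_pdf_revisions() {
    # -a makes grep treat the binary file as text;
    # -c prints the number of lines that contain the marker.
    grep -a -c '%%EOF' "$1"
}
```

Running count_pdf_revisions example.pdf before and after the qpdf step should give some idea of whether the old revisions were dropped.
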
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

StarGeek

It turns out I had Ghostscript on my computer (used by some other program). Ghostscript is creating XMP data in the file: some is copied from the corresponding PDF tags, and some is newly created.

In my test file, these tags were created by Ghostscript:
XMP-x:XMPToolkit
XMP-pdf:Producer
XMP-xmp:ModifyDate
XMP-xmp:CreateDate
XMP-xmp:CreatorTool
XMP-xmpMM:DocumentID
XMP-dc:Format

And Ghostscript copied these tags from the PDF tags of the same name:
XMP-dc:Title
XMP-dc:Creator

I would suggest running your ghostscript command first, then exiftool to remove the metadata you don't want, then qpdf to make the changes permanent.
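
The order suggested above (Ghostscript, then ExifTool, then qpdf) could be wrapped up roughly like this. It is a sketch only: clean_pdf and the temporary file name are invented for illustration, and the Ghostscript options are the ones from the first post, not a recommendation.

```shell
# Hypothetical wrapper: clean_pdf INPUT OUTPUT
clean_pdf() {
    in="$1"
    out="$2"
    tmp="${out}.tmp.pdf"   # assumed scratch file next to the output

    # 1. Rewrite the file with Ghostscript (this step itself adds XMP).
    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dNOPAUSE -dQUIET -dBATCH \
        -sOutputFile="$tmp" "$in" || return 1

    # 2. Remove the XMP that Ghostscript just wrote.
    exiftool -XMP:All= -overwrite_original "$tmp" || return 1

    # 3. Rewrite with qpdf so the pre-edit revision is not carried along.
    status=0
    qpdf --linearize "$tmp" "$out" || status=$?
    rm -f "$tmp"

    # qpdf exits with 3 when it wrote the file but issued warnings.
    [ "$status" -eq 0 ] || [ "$status" -eq 3 ]
}
```

Usage would be something like clean_pdf original.pdf clean_output.pdf. Any tags you want to set would go into the ExifTool step, before qpdf runs.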

Darkestone


Hi @StarGeek,

Thanks so much for your help and breakdown — it's been super valuable.

I followed your full recommended flow using Ghostscript, ExifTool, and QPDF in this order:

Flattened the PDF:
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/prepress \
-dDetectDuplicateImages=true -dNOPAUSE -dQUIET -dBATCH \
-sOutputFile="flattened.pdf" "original.pdf"
Stripped all XMP metadata:
exiftool -XMP:All= -overwrite_original "flattened.pdf"
Re-added clean metadata:
exiftool \
-Producer="Example Producer" \
-PDFVersion=1.4 \
-CreateDate="2025:01:01 00:00:00-00:00" \
-ModifyDate="2025:01:01 00:00:00-00:00" \
-Title="Example Document" \
-PageCount=7 \
-overwrite_original "flattened.pdf"
Linearized with QPDF:
qpdf --linearize "flattened.pdf" "clean_output.pdf"
✅ Result
File opens correctly.
Shows all 7 pages.
Metadata like Producer, CreateDate, PDFVersion are intact.
❌ But the issue remains:
I still see the following when I run:

exiftool -a -G1 -s clean_output.pdf
[XMP-x]         XMPToolkit                      : Image::ExifTool 13.29
[XMP-dc]        Title                           : Example Document
[XMP-pdf]       PDFVersion                      : 1.4
[XMP-xmp]       CreateDate                      : 2025:01:01 00:00:00
[XMP-xmpMM]     DocumentID                      : uuid:...
So even after following the Ghostscript → ExifTool → QPDF workflow, those [XMP-*] blocks persist — especially the XMPToolkit, which is the main red flag I'm trying to remove.

Is there anything else I might be missing to fully remove these XMP traces — or a way to confirm if QPDF is locking them in?

Happy to provide a sample if helpful.

Environment Info:

OS: macOS Sonoma 14.4.1 (Apple Silicon)
ExifTool: 13.29
Ghostscript: 10.0.5
QPDF: 11.6.1
File Type: PDF v1.4
Thanks again for your time and guidance!

— Darkestone


StarGeek

Some of that is getting added back in when you set tags that have the same name in the PDF group and the XMP group. I don't know how the XMP-xmpMM tag is surviving, though; it should be stripped.

Try removing the XMP data in the same command, placed after the tags you are setting. Also, specify that you want to write PDF tags:
exiftool -PDF:Producer="Example Producer" -PDF:PDFVersion=1.4 -PDF:CreateDate="2025:01:01 00:00:00-00:00" -PDF:ModifyDate="2025:01:01 00:00:00-00:00" -PDF:Title="Example Document"  -overwrite_original -XMP:All= "flattened.pdf"

I removed PageCount because it is not a writable PDF tag. Writing to PageCount in a PDF sets the XMP-prism:PageCount tag.

If the XMP-xmpMM tags still survive, I would have to see an example file, because it's not behavior that I can reproduce here.
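
To check the result, one option is to ask ExifTool for XMP tags only and see whether anything comes back. A minimal sketch, assuming an empty listing means no XMP in the current revision of the file; pdf_has_no_xmp is a made-up name.

```shell
# Hypothetical check: succeeds only when ExifTool lists no XMP tags.
pdf_has_no_xmp() {
    # -XMP:all limits the listing to XMP; -a keeps duplicates; -G1 shows groups.
    found=$(exiftool -a -G1 -s -XMP:all "$1")
    if [ -n "$found" ]; then
        printf '%s\n' "$found"   # show which tags survived
        return 1
    fi
    return 0
}
```

For example: pdf_has_no_xmp clean_output.pdf && echo "no XMP found". This only looks at what ExifTool reads from the file as it stands, so it says nothing about older revisions that qpdf has not yet rewritten away.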