I only want to keep the metadata that's necessary, the rest I want deleted

Started by meta, January 25, 2025, 02:07:09 AM

meta

Hi,

I need to upload a document but don't want its metadata to reveal anything about me or my computer. The file in question is a PNG. One of the things I would not like to reveal is the name of my computer, but if there is anything else I can eliminate from the metadata, that would be nice as well. I want to reduce the metadata as much as possible. Would the process be the same for PDF uploads?

What commands would I need to run to remove what you're suggesting? Thank you in advance.

My machine runs Debian Linux. I have ExifTool installed. Here is the output for the file:

exiftool /home/mycomputersname/Pictures/'sample file.png'
ExifTool Version Number         : 12.57
File Name                       : sample file.png
Directory                       : /home/mycomputersname/Pictures
File Size                       : 106 kB
File Modification Date/Time     : 2025:01:23 19:16:58+11:00
File Access Date/Time           : 2025:01:23 19:16:58+11:00
File Inode Change Date/Time     : 2025:01:23 19:16:58+11:00
File Permissions                : -rw-r--r--
File Type                       : PNG
File Type Extension             : png
MIME Type                       : image/png
Image Width                     : 868
Image Height                    : 438
Bit Depth                       : 8
Color Type                      : RGB with Alpha
Compression                     : Deflate/Inflate
Filter                          : Adaptive
Interlace                       : Noninterlaced
Image Size                      : 868x438
Megapixels                      : 0.380


I did come across this note about JPEG in the FAQ (see below), but my question relates to PNG and PDF.

Quote: 32. "How do I safely delete all metadata from a file?"

    First of all, all metadata shouldn't be removed from some file types (such as RAW images) because this information is necessary for display of the image. JPEG is the most popular image format and most suited to erasing all metadata because the image and metadata are well separated in this format. However, even with JPEG images care should be taken because the metadata may contain color space information which should be maintained to preserve the color rendition.

    Here is a command that may be used to safely delete all metadata from .JPG images in a directory:

    exiftool -ext jpg -all= --icc_profile:all -tagsfromfile @ -colorspacetags DIR

    This command deletes all metadata except the ICC Profile if it exists, then copies back any EXIF color space tags (adding any mandatory EXIF tags using default values if necessary).

greybeard

I can answer part of your question: you don't need to worry about the computer name. In your example it's part of the directory name and is not stored as metadata in the image.

You can see this by changing to the Pictures directory and running the command:

exiftool 'sample file.png'

You will no longer see the computer name

Phil Harvey

Run this command on the file:

exiftool -G1 FILE

... anything in the "System" group is part of your file system and not metadata inside the file itself.

- Phil
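
As a rough sketch of the check described above: the wrapper below lists every tag with its family 1 group but leaves out the System group, so only metadata stored inside the file itself remains. The function name `show_embedded_metadata` is made up for illustration; the ExifTool options are standard ones, though the exact output will depend on your version and file.

```shell
# Hypothetical helper: list only the metadata embedded in the file itself.
#   -G1           prefix each tag with its family 1 group, e.g. [PNG]
#   -a            allow duplicate tags to be shown
#   -s            print tag names instead of descriptions
#   --System:all  exclude the System group (file name, directory, dates, permissions)
show_embedded_metadata() {
    exiftool -G1 -a -s --System:all "$1"
}

# Usage:
#   show_embedded_metadata 'sample file.png'
```

Anything this still prints travels with the file when it is uploaded; the directory path does not.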

StarGeek

Quote from: meta on January 25, 2025, 02:07:09 AM
I did come across this note about JPEG in the FAQ (see below), but my question relates to PNG and PDF

The command listed in the FAQ works for PNGs as well.
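
A minimal sketch of that adaptation, assuming the only change needed is swapping the extension filter from jpg to png. The function name `strip_png_metadata` is hypothetical; the options are the ones from the FAQ command.

```shell
# Hypothetical helper: apply the FAQ's JPEG command to the PNG files in a directory.
#   -ext png            process only files with a .png extension
#   -all=               delete all writable metadata
#   --icc_profile:all   but keep the ICC profile, if one exists
#   -tagsfromfile @     then copy back from the same file
#   -colorspacetags     only the color space tags
strip_png_metadata() {
    exiftool -ext png -all= --icc_profile:all -tagsfromfile @ -colorspacetags "$1"
}

# Usage:
#   strip_png_metadata /path/to/DIR
```

By default ExifTool keeps a backup of each edited file with `_original` appended to the name. That backup still contains the old metadata, so upload the edited file rather than the backup.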

PDFs are a different story. From the PDF Tags page:
Quote: It uses an incremental update technique that has the advantages of being both fast and reversible. If ExifTool was used to modify a PDF file, the original may be recovered by deleting the PDF-update pseudo-group (with -PDF-update:all= on the command line).
...
All metadata edits are reversible. While this would normally be considered an advantage, it is a potential security problem because old information is never actually deleted from the file. (However, after running ExifTool the old information may be removed permanently using the "qpdf" utility with this command: "qpdf --linearize in.pdf out.pdf".)

If you want to remove all metadata from a PDF, there are better tools out there. At the very simplest, you can use the OS's "Print to PDF" printer option to "reprint" the PDF into a new file. The new file will not have any metadata, but you will no longer be able to copy any text from the file, as that has also been removed.
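
For anyone who prefers the ExifTool plus qpdf route from the quoted PDF Tags page, the two steps might be chained roughly like this. The function name `scrub_pdf_metadata` and the file names in the usage line are placeholders; the two commands themselves come from the quote above.

```shell
# Hypothetical helper: delete PDF metadata with ExifTool, then rewrite the file with qpdf.
#   $1 = input PDF (edited in place by ExifTool)
#   $2 = cleaned output PDF
scrub_pdf_metadata() {
    # Step 1: incremental update that marks the metadata as deleted.
    # The old data is still inside the file at this point.
    exiftool -all= "$1" || return 1
    # Step 2: rewrite the file so the superseded data is dropped for good.
    qpdf --linearize "$1" "$2"
}

# Usage:
#   scrub_pdf_metadata in.pdf out.pdf
```

It is worth running exiftool on the output file afterwards to confirm what is left before uploading it.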
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

meta

Quote from: Phil Harvey on January 25, 2025, 08:44:03 AM
Run this command on the file:

exiftool -G1 FILE

... anything in the "System" group is part of your file system and not metadata inside the file itself.

- Phil
Thanks, I ran that command and can see the groups [System] and [PNG] in brackets. Since the name of my computer appears only under [System], that solves my problem.

There would be no need to reduce the metadata under [PNG] as it doesn't reveal anything of substance. For an upload to occur some metadata is required - is that correct?

I have another, similar question. I need to scan a document and then submit it. Which file format would carry the least metadata when scanning documents from a Linux machine? What's the least invasive file format for a scan that needs to be submitted from the get-go? Does it matter which scanner I am using? I am using a Canon.

StarGeek's suggestion is one solution to my scanning and uploading question, but I was wondering whether there is an alternative as well?
Quote from: StarGeek on January 25, 2025, 10:13:15 AM
use the OS's "Print to PDF" printer option to "reprint" the PDF into a new file. The new file will not have any metadata,

StarGeek

Quote from: meta on February 01, 2025, 03:17:56 AM
I have another, similar question. I need to scan a document and then submit it. Which file format would carry the least metadata when scanning documents from a Linux machine? What's the least invasive file format for a scan that needs to be submitted from the get-go? Does it matter which scanner I am using? I am using a Canon.

Technically? A BMP, because that format doesn't allow any embedded metadata whatsoever. But it's something you should test out with your scanner and the scanning program. I don't believe that most scanners embed any really identifiable data in the files they create. Maybe the Make/Model, but there shouldn't be much more than that.

I can tell you that VueScan doesn't embed any data beyond what is necessary in the files it creates. It's a paid program, but it is multi-platform, an excellent choice, and worth it if you scan a lot of documents.