Remove metadata from multiple different types of files simultaneously

Started by 216ann, August 22, 2021, 02:33:23 AM

216ann

Hi, I am a total newbie.
I broke up with my BF and he has given me permission to get my files off his Win 7 laptop.
I am getting frustrated with the <R Click > Properties > Details > Remove all properties and personal data > Make a copy with all possible properties removed> because I have to do it for thousands of different types of files (jpg, pdf, doc, video, etc).
Even though you can select several files at once, the folders' contents don't get sanitized.
Furthermore, I have to manually delete all the original files and I get the annoying -Copy after each and every file.
Plus I don't trust the inbuilt Win7 function to delete all the metadata.
And when I transfer to the external HD and then to my Linux desktop, won't that collect metadata?

Is there a way using Exiftool or any other FOSS or similar free but reliable method to completely remove all the metadata off all files (like an entire data partition?) that I am trying to copy over to my external HD? 
I am going to use Linux and so I would be satisfied with a way to remove before I copy, after I copy to external HD, or even after I have already copied to my Linux desktop.
And I heard that there was an "undo" delete type function with exiftool...how do I make sure the metadata is deleted permanently?
I tried posting on Superuser.com and on Reddit r/Windows 7 but either my post gets deleted or no one responds.
I presume it is because I had never before posted on those sites and they think I am a spamming bot.
I am not doing anything illegal, I only want to erase all traces of my BF (as owner/author) and his computer (Win7, where, when, etc) off MY files.
Please let me know.  Thank you. 

StarGeek

Quote from: 216ann on August 22, 2021, 02:33:23 AM
Is there a way using Exiftool or any other FOSS or similar free but reliable method to completely remove all the metadata off all files (like an entire data partition?) that I am trying to copy over to my external HD?

Exiftool can remove data from any file type that has Write support on the Supported File Types table, but there are some caveats.  For example, you mention Doc files, which exiftool cannot edit.  Also, any edit that exiftool makes to a PDF file can be reversed unless the file is re-linearized (see the text at the top of the PDF tags page).  Finally, exiftool will normally create a backup of any file it edits, but that can be suppressed by adding the -overwrite_original option.

But doing this will remove all data, including the things like the date/time that the image/video was taken. And if you used RAW files of any type (.NEF, .CR2/3, .ARW, etc), it will destroy the image.

The bigger question is why do you need to remove the data from files that you are removing from the ex's computer?  That just means you have files that have no useful metadata on your own computer, which they supposedly won't have access to anymore anyway.

I would think you would have more of a problem with the ex running a recovery program to get back the files you removed. The actual contents of the files would be more sensitive than the metadata.

I would suggest moving all your files into a single directory on the computer, then quickly copy (not move) that entire directory over to your computer, and then use a program that securely erases the files.  This is usually done by overwriting the original file with 0s, often multiple times, before actually deleting the file.  That way, if the ex tries to run a recovery program, the file they get back has been changed to all 0s and is for all intents and purposes, useless.  A quick Google search pulls up this page which has a list of multiple programs which will securely erase files.
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

216ann

@StarGeek, Thanks for your info.
It may sound weird to you but I just don't want to have anything to do with him.
I don't want anything to remind me of him.
I don't want anyone seeing the files (like a new BF or friend) to know about my prior association with him.
I learned he does some sketchy things with his computer and I don't want to be linked to that.
For example, if I upload a file or submit a file somewhere, I don't want it and me to be linked to my ex. 
He has had plenty of opportunity to copy or misuse my files that had been on his computer so I am not as worried about that and certainly can't do anything about that now even if I forced him to wipe and reset the computer.
So I am much more concerned about making sure none of him or his computer is linked to any of my files, I appreciate your concerns though. 

Is there a list of the type of files which exif will not work on (eg, doc).
Is there a list of RAW file types which would be destroyed. 
What is the cmd that I have to use to not only eliminate the metadata off an entire directory? Or at least a file with subfolders? 
I was especially confused about that re-linearized thing you talked about.
How would I do that or how would one recover the PDF's deleted metadata?
I read the page: https://exiftool.org/TagNames/PDF.html
But it is a bit confusing: Would I just add "qpdf --linearize in.pdf out.pdf" to the cmd line that you are going to teach me?
And to reverse/recover (if I didn't do that qpdf), would I just separately delete the PDF-update pseudo-group (with -PDF-update:all= on the command line)?  So I just type the folder path and -PDF-update:all=   cmd?
Sorry about the basic questions, I am very, very new to this.  Thanks. 

StarGeek

Quote from: 216ann on August 22, 2021, 06:23:10 AM
Is there a list of the type of files which exif will not work on (eg, doc).

See the Supported file types table I linked above.  Anything with a W in the Support column can be written and data removed.
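Exiftool can also print the writable types itself with `exiftool -listwf`. Below is a minimal sketch of checking a single extension against that list. It assumes exiftool is installed and on the PATH; `can_write` is a made-up helper name, not part of exiftool.

```shell
# Hypothetical helper: succeed if exiftool lists the given extension as writable.
# "exiftool -listwf" prints a heading line followed by the writable extensions
# in upper case, separated by spaces.
can_write() {
    ext=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    # Put every word on its own line, then look for an exact match.
    exiftool -listwf | tr -s ' \t' '\n' | grep -qx "$ext"
}

# Example:
#   can_write jpg && echo "exiftool can strip JPG files"
#   can_write doc || echo "DOC files need a different tool"
```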

QuoteIs there a list of RAW file types which would be destroyed.

Not specifically, but odds are if you don't know what RAW file types there might be, then there probably aren't any.  RAW files are those that come from digital SLR cameras and have to be processed in a program like Lightroom.  So unless the ex was carrying around a separate, more traditional looking camera, not a camera phone, you probably don't have to worry about it.

QuoteWhat is the cmd that I have to use to not only eliminate the metadata off an entire directory? Or at least a file with subfolders?

Put all the files and directories in a single directory.  Open up a CMD window (see this post).  Then type in (or copy/paste) this:
exiftool -overwrite_original -r -All=
Hit the space bar (very important)
Then drag the directory containing the files you want to strip onto the CMD window.  This pastes the directory's full path after the command.

Then hit enter. 

Exiftool will then go through every file in the directory you dropped onto the CMD window, and all of its subdirectories, and remove as much data as it can.

Ooops, just realized you said linux.  I'm pretty sure the process would be similar, though I don't know how to open a bash/shell on linux.  But the command would be exactly the same.
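On Linux the same steps could be sketched in a terminal roughly as follows. This assumes exiftool is installed and on the PATH; the function name and the example path are placeholders for illustration only.

```shell
# Strip all writable metadata from every file under a directory, in place.
strip_metadata() {
    dir="$1"
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    # -overwrite_original  do not keep *_original backup copies
    # -r                   recurse into subdirectories
    # -All=                delete every tag exiftool is able to delete
    exiftool -overwrite_original -r -All= "$dir"
}

# Example (hypothetical path):
#   strip_metadata "$HOME/my_files"
# Afterwards, list whatever is still present, grouped by metadata type:
#   exiftool -r -a -G1 -s "$HOME/my_files"
```

Running it on a copy of the files first is the safer choice, since -overwrite_original leaves no backup to fall back on.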

QuoteI was especially confused about that re-linearized thing you talked about.
How would I do that or how would one recover the PDF's deleted metadata?
I read the page: https://exiftool.org/TagNames/PDF.html
But it is a bit confusing: Would I just add "qpdf --linearize in.pdf out.pdf" to the cmd line that you are going to teach me?
And to reverse/recover (if I didn't do that qpdf), would I just separately delete the PDF-update pseudo-group (with -PDF-update:all= on the command line)?  So I just type the folder path and -PDF-update:all=   cmd?

It would require downloading the qpdf program.  And you would run it separately.  Unfortunately, it's not a program that will process multiple files, only one file at a time, creating a new file each time i.e. it reads the "in.pdf" file and creates the "out.pdf" file.  You change in.pdf/out.pdf to the actual filenames you want to use.
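Because qpdf handles one file at a time, a small shell loop can feed it each PDF in turn. The following is only a sketch, assuming qpdf is installed and a Linux-style shell with find is available; `linearize_pdfs` is a made-up helper name.

```shell
# Linearize every PDF under a directory with qpdf, one file at a time.
# Each original is replaced only if qpdf reports success.
linearize_pdfs() {
    find "$1" -type f -iname '*.pdf' -exec sh -c '
        for f in "$@"; do
            tmp="$f.linearized.tmp"
            if qpdf --linearize "$f" "$tmp"; then
                # Success: swap the new file in place of the original.
                mv -- "$tmp" "$f"
            else
                # qpdf also exits non-zero on warnings; such files are
                # left untouched here so they can be checked by hand.
                rm -f -- "$tmp"
                echo "qpdf failed, left unchanged: $f" >&2
            fi
        done
    ' sh {} +
}

# Example (hypothetical path):
#   linearize_pdfs "$HOME/my_files"
```

Run this after exiftool has stripped the PDFs, so that the linearized copy no longer carries the old metadata.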

But there's also a very good chance that you really don't need to do this for PDF files, as very few people ever bother to change metadata in PDFs.  Run exiftool on them and see if anything actually needs to be removed.  It also might be easier to look for a program that will do this for PDFs in batch.  For example, I found this page which has some Windows programs that will do so.  On Linux, most of the answers I could find are for command line programs, such as the above mentioned qpdf or PDFTk
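Checking what the PDFs actually contain could look something like this. It is a sketch that assumes exiftool is on the PATH; the helper name is invented, and the tags listed are just the common PDF fields most likely to identify a person or computer.

```shell
# Show identifying fields for every PDF under a directory, changing nothing.
show_pdf_metadata() {
    # -r        recurse into subdirectories
    # -ext pdf  only look at PDF files
    exiftool -r -ext pdf -Author -Creator -Producer -Title \
        -CreateDate -ModifyDate "$1"
}

# Example (hypothetical path):
#   show_pdf_metadata "$HOME/my_files"
```

If nothing of interest shows up for a file, there is little reason to put it through exiftool and qpdf at all.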

216ann

@StarGeek
Thanks so much for your help. 
I even had trouble registering because I didn't know that "documentation" meant one of the hypertexted files to find the word in the challenges.
So I will have to ask my friend for help.

Don't I have to download the exiftool or does it work natively in Win7?
How is this exiftool different from the R Click > Properties method? 

Can we run the cmd by dragging the external HD's directory over to the cmd box or must it be on the laptop HD?
If we sanitize, then we copy over to the external HD or we copy over to Linux, isn't that going to create metadata by itself (eg, now "computer" is the brand/model of external HD or whatever)? 
Would it be wiser to erase the metadata on the end target computer?
Is this easier on Linux or harder?
Is bash/shell the same as terminal using sudo or similar?

So if we have H.265 files then we don't have to worry only if we have RAW type files?
How would one sanitize multiple RAW files safely?

Does the R Click method of Win7 erase the pdf metadata or not?
When I run that cmd, will exiftool simply ignore the pdf?
Or will there be errors or a cutoff?
So after I run the cmd I have to search through all the folders and subfolders to find all the pdfs? 
The main info I want to get rid of is author, owner, computer, date created, and other metadata which can be used to track or link my files with my ex. 
If pdfs don't have that type of data then I guess that would be ok but I can't imagine that they wouldn't. 
How do I just have the output pdf overwrite the input one?
A Quick look at the batch pdf programs you linked to seems to say that they leave the pdf metadata reversible.  It also is unclear whether they will handle folders which have a mix of pdfs and other types of files.  I also have odf and other pdf like files.  Would those require a separate special metadata remover? 
Is there a free or FOSS type program that will take care or automate most of what you are discussing (eg, just drag into a sandbox type program which will sanitize all files including pdfs and non pdfs safely and securely?)?

Sorry about the interrogation.  I don't want to screw up my files.  I already spent over a hundred hours trying to delete the metadata manually and want to minimize the additional time I waste.  And I definitely don't want to think that I deleted all the metadata and then find out the hard way that I didn't. 
Thanks again for your help.  I appreciate your time because you are saving me a lot of it. 

StarGeek

Quote from: 216ann on August 22, 2021, 05:21:54 PM
Don't I have to download the exiftool or does it work natively in Win7?

Yes, you will also have to download exiftool.  On Windows you can use the standalone Windows executable, which you just need to rename to remove the (-k) and drop it in a directory that Windows normally checks for programs to run, i.e. a directory that is part of the PATH env variable.  See Installing Exiftool-Windows.  There is also the Alternative exiftool for Windows which uses an installer and sets up the PATH variable for you.

QuoteHow is this exiftool different from the R Click > Properties method?

I've never really used the RightClick option, so I don't know how complete it might be.  I believe that it will work on Doc files, maybe also PDFs, so you probably would want to use it for those file types.  But otherwise exiftool has powerful batch abilities built in and the command I listed above will clear data from most images and MP4/Mov files in one shot.

QuoteCan we run the cmd by dragging the external HD's directory over to the cmd box

Yes, any directory can be used.

QuoteIf we sanitize, then we copy over to the external HD or we copy over to Linux, isn't that going to create metadata by itself (eg, now "computer" is the brand/model of external HD or whatever)?

No.  Copying files does not edit them and put metadata in the files.  There may be some file system data, but that type of data does not survive being uploaded through the net and it's unlikely to have identifiable data.
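One way to convince yourself that a copy leaves the file contents alone is to compare the copy with the original byte for byte. A minimal sketch using standard Linux tools; the helper name and the paths in the example are placeholders.

```shell
# Compare two directory trees byte for byte.
# Succeeds only if every file exists on both sides with identical contents.
verify_copy() {
    if diff -r "$1" "$2" >/dev/null; then
        echo "identical: $2"
    else
        echo "differs: $2" >&2
        return 1
    fi
}

# Example (hypothetical paths):
#   cp -R /media/external/my_files "$HOME/my_files"
#   verify_copy /media/external/my_files "$HOME/my_files"
```

Identical contents mean any metadata stored inside the files is identical too, so nothing was added by the copy.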

QuoteWould it be wiser to erase the metadata on the end target computer?

It would be quicker to copy the files, securely erase the originals, and then you can erase the metadata at your leisure.

QuoteIs this easier on Linux or harder?

I wouldn't know.

QuoteIs bash/shell the same as terminal using sudo or similar?

Yes.

QuoteSo if we have H.265 files then we don't have to worry only if we have RAW type files?

You don't have to worry about MP4/Mov files.  Exiftool will strip metadata from those.  MKV files, which would be rare in these circumstances, would be a different story.

QuoteHow would one sanitize multiple RAW files safely?

That depends, but I wouldn't worry about it unless you come across a RAW file.  See this wikipedia page for a list of RAW file types.
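A quick way to find out whether any RAW files are present before running the removal command is to search by extension. This sketch only checks the extensions mentioned earlier in this thread (NEF, CR2, CR3, ARW), so the list is not exhaustive, and `list_raw_files` is a made-up helper name.

```shell
# List files whose extension marks them as one of a few common RAW types.
list_raw_files() {
    find "$1" -type f \( -iname '*.nef' -o -iname '*.cr2' \
        -o -iname '*.cr3' -o -iname '*.arw' \)
}

# Example (hypothetical path):
#   list_raw_files "$HOME/my_files"
# If anything is listed, those types can be skipped during removal with
# exiftool's --ext option, for example:
#   exiftool -overwrite_original -r -All= --ext nef --ext cr2 "$HOME/my_files"
```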

QuoteDoes the R Click method of Win7 erase the pdf metadata or not?

I do not know.

QuoteWhen I run that cmd, will exiftool simply ignore the pdf?

Exiftool will basically hide the data, which could be recovered by someone with some expertise.
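To illustrate how little expertise that recovery takes, the PDF tags page describes undoing exiftool's edit by deleting the PDF-update pseudo-group. A sketch of that step, assuming exiftool is on the PATH and the PDF has not been linearized since; the helper name is invented.

```shell
# Undo exiftool's own edits to a PDF, bringing back the metadata it had hidden.
# This only works while the file still contains exiftool's incremental update,
# i.e. before the PDF has been re-linearized or otherwise rewritten.
restore_pdf_metadata() {
    exiftool -PDF-update:all= "$1"
}

# Example (hypothetical file name):
#   restore_pdf_metadata document.pdf
```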

QuoteIf pdfs don't have that type of data then I guess that would be ok but I can't imagine that they wouldn't.

PDFs downloaded from the net won't have any personally identifiable data.  PDFs created by, for example, exporting from a doc file in MS Office, might.

QuoteHow do I just have the output pdf overwrite the input one?

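Older versions of qpdf cannot do this, which is pretty common for command line programs.  Newer versions (9.1 and later, I believe) have a --replace-input option that you use in place of the output filename to overwrite the input file.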

QuoteWould those require a separate special metadata remover?

I don't know.  That link was just a quick google search.  You may have to do a deeper dive to find what you need.

QuoteIs there a free or FOSS type program that will take care or automate most of what you are discussing (eg, just drag into a sandbox type program which will sanitize all files including pdfs and non pdfs safely and securely?)?

Very rarely would there be a single program to do this, as different file types are created in completely different ways.  That said, for Linux I did see some mentions of the Metadata Anonymisation Toolkit, though I don't have any experience with it.