PDF Batch Metadata Editor

Started by MrMills, December 01, 2019, 01:05:59 PM

MrMills

Hi - I scan a lot of old magazines (which I keep) to PDF for easy reading, and use Calibre to catalogue and read them. I like to keep the metadata in the files correct because that's what Calibre uses to catalogue the files. I've been looking for a batch PDF metadata editor, but there doesn't seem to be one anywhere, so I'm trying to write one myself in Python, using PyExifTool as a wrapper around the ExifTool application.

I am no expert on programming, so I'm finding this a challenge. So far, I can read and display the metadata from files which are listed in a CSV file.

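import csv

import exiftool

# Read the file paths from the first column of the CSV file.
with open('metadata.csv', newline='') as f:
    files = list(csv.reader(f))
filenames = [row[0] for row in files if row]

print(filenames)

with exiftool.ExifTool() as et:
    metadata = et.get_metadata_batch(filenames)
    # Show every tag of the first file only.
    for key, value in metadata[0].items():
        print(key, ":", value)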


I would like to use the CSV file to hold the 'Title', 'Author', 'Created with' and 'Produced by' values, which I can then apply to the PDF files in a batch.

Ideally, I would like to use a GUI for this but that's a long way off for me!

Can anyone help me to achieve this goal?

Thank you so much for a great tool.

StarGeek

This will apply to using exiftool on the command line, as I have no experience with PyExifTool.

Part of the problem is figuring out what the actual tag names are, because different programs call the tags different things.  Author and Title are pretty straightforward, but "Created with" and "Produced by", not so much.  My guess is that those are the Creator and Producer tags.  See PDF Tags for the full list of PDF tags.

The best thing to do is to find a file that has all the tags you are looking for and then run exiftool -a -G1 -s File.pdf on it (see FAQ #3). That will list all the tags, and you can look for the tag names that match the data.
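
If you would rather do that lookup from Python, here is a rough sketch that runs the same command through the standard subprocess module and searches the values for some text you already know is in the file. The function names list_tags and find_tags are made up for this example, and the parsing assumes each output line looks like "[Group] TagName : Value", so treat it as a starting point rather than a finished tool.

```python
import subprocess


def list_tags(pdf_path):
    """Run `exiftool -a -G1 -s` on one file and return the text output.

    Assumes the exiftool executable is on the PATH.
    """
    result = subprocess.run(
        ["exiftool", "-a", "-G1", "-s", pdf_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def find_tags(output, needle):
    """Return (group, tag, value) for every line whose value contains needle.

    Assumes lines of the form "[Group]   TagName   : Value".
    """
    matches = []
    for line in output.splitlines():
        left, sep, value = line.partition(": ")
        if not sep or not left.startswith("["):
            continue  # not a tag line
        group, _, tag = left.partition("]")
        if needle.lower() in value.lower():
            matches.append((group.strip("[ "), tag.strip(), value.strip()))
    return matches


# Example use (hypothetical file name):
#   for group, tag, value in find_tags(list_tags("File.pdf"), "acrobat"):
#       print(group, tag, value)
```

Searching for a value you recognise (for example the program name shown under "Produced by") should point you at the tag name to use as the CSV column header.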

To create the CSV file, the first line needs to be the names of all the tags you want to write, with the first column named "SourceFile".  That column needs to be filled with the file paths/names, which can be absolute or relative to the current directory.  As an example:
SourceFile,Title,Author,Creator,Producer
/path/to/File.pdf,Great Title,Great Writer,MrMills,Adobe Acrobat
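
Since you are already working in Python, a CSV in this layout could be generated with the standard csv module, which also takes care of quoting titles that contain commas. This is only a sketch: the helper names build_csv_text and write_metadata_csv are invented for the example, and the column list assumes that Creator and Producer really are the tags you want.

```python
import csv
import io

# Column order for the CSV; "SourceFile" must come first.
# Creator and Producer are assumed to be the right tag names.
FIELDS = ["SourceFile", "Title", "Author", "Creator", "Producer"]


def build_csv_text(rows):
    """Return CSV text with a header line and one line per dict in rows."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def write_metadata_csv(rows, csv_path):
    """Write the CSV text to csv_path."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        f.write(build_csv_text(rows))


# Example use, with the same placeholder values as the sample above:
#   write_metadata_csv([{
#       "SourceFile": "/path/to/File.pdf",
#       "Title": "Great Title",
#       "Author": "Great Writer",
#       "Creator": "MrMills",
#       "Producer": "Adobe Acrobat",
#   }], "metadata.csv")
```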


To copy the data from the CSV file to the PDF files, you would then run
exiftool -csv=/path/to/File.CSV /path/to/PDFDir
This command creates backup files, which can be suppressed with the -overwrite_original option.  You can recurse into subdirectories with the -r (recurse) option.  See the -csv option and FAQ #26 for more details.
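
The same command can be launched from a Python script without going through PyExifTool, by calling the exiftool executable with the standard subprocess module. The sketch below assumes exiftool is on the PATH; the function names build_csv_import_command and apply_csv, and the recurse and keep_backups parameters, are invented for this example.

```python
import subprocess


def build_csv_import_command(csv_path, target_dir, recurse=False, keep_backups=True):
    """Build the argument list for `exiftool -csv=... DIR`."""
    cmd = ["exiftool", f"-csv={csv_path}"]
    if recurse:
        cmd.append("-r")  # also process subdirectories
    if not keep_backups:
        cmd.append("-overwrite_original")  # do not keep *_original backups
    cmd.append(target_dir)
    return cmd


def apply_csv(csv_path, target_dir, **options):
    """Run the import and return exiftool's summary text."""
    cmd = build_csv_import_command(csv_path, target_dir, **options)
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout + result.stderr


# Example use (placeholder paths from the command above):
#   print(apply_csv("/path/to/File.CSV", "/path/to/PDFDir"))
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces. It is worth trying this on copies of a few files, with backups left on, before running it over the whole collection.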
