Hi -- completely new to exiftool (and coding, for that matter). Trying to accomplish batch changing metadata for a bunch of pdf files through a shell script. Went through some of the posts and was able to cobble something together but only get halfway. Could someone please help me -- I'm trying to also add to the script so that it will change the Author metadata of the PDF to the folder name it is currently housed in.
Really appreciate any help.
Here is what I have so far:
#!/bin/bash
# Save IFS so it can be restored at the end
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")
for i in *.pdf
do
exiftool '-PDF:Title<${filename;s/\..*?$//}' '-XMP-dc:Title<${filename;s/\..*?$//}' "$i"
done
# Restore IFS
IFS=$SAVEIFS
It is much more efficient if you give the list of files to ExifTool, rather than looping in a script. This also simplifies your script to a single command:
exiftool '-PDF:Title<${filename;s/\..*?$//}' '-XMP-dc:Title<${filename;s/\..*?$//}' *.pdf
Second, I would recommend using -ext pdf . instead of *.pdf for a number of reasons (https://exiftool.org/mistakes.html#M2).
For the Author, the argument would be something like this: '-author<${directory;s(.*/)()}'
but for this to work you'll have to specify the directory by name instead of just (".") in the command (and/or add -r to recurse subdirectories).
- Phil
Edit: Fixed link
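The s(.*/)() part of the Author argument is a Perl substitution that deletes everything up to and including the last slash in the Directory value. A rough sketch of its effect, emulated here with sed rather than run through ExifTool (strip_to_last_slash and the example paths are hypothetical, for illustration only):

```shell
# Emulate the effect of ${directory;s(.*/)()} with sed.
# strip_to_last_slash is a hypothetical helper, not part of ExifTool.
strip_to_last_slash() {
    printf '%s\n' "$1" | sed 's|.*/||'
}

# Directory given by name: the last path component survives.
named=$(strip_to_last_slash "ResearchJournal/Research1")

# Directory given as ".": there is no slash to strip, so "." is all that is left.
dot=$(strip_to_last_slash ".")

echo "$named"   # Research1
echo "$dot"     # .
```

This is why the directory has to be named in the command: with just "." there is no folder name in the Directory value for the substitution to extract.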
I'm totally new to coding, so definitely think the process I came up with is not the most efficient.
I'm putting the script into Hazel as an embedded script so that it can do this automatically. The files get sorted to different folders so I was hoping there was a way to write the command in such a way that it draws from the specific folder name for the pdf (which will be different for each pdf). Is it possible to code it so that the directory name that gets input into the author field is variable/relative for each pdf file?
thanks for the quick reply, and patience for someone who is new to this
Quote from: Hiryu on January 15, 2016, 12:19:23 PM
I was hoping there was a way to write the command in such a way that it draws from the specific folder name for the pdf (which will be different for each pdf). Is it possible to code it so that the directory name that gets input into the author field is variable/relative for each pdf file?
I tried to explain how to do this in my last post. Let me know if there is something you don't understand.
- Phil
Quote from: Phil Harvey on January 15, 2016, 11:53:58 AM
Second, I would recommend using -ext pdf . instead of *.pdf for a number of reasons (http://exiftool%20'-PDF:Title<$%7Bfilename;s/%5C..*?$//%7D'%20'-XMP-dc:Title<$%7Bfilename;s/%5C..*?$//%7D'%20$i).
The link isn't working, looks like a copy/paste error. The kind that I do all too often.
Right. Thanks. It is fixed now.
- Phil
sorry -- I know you explained it in the last post, but I'm completely clueless about recurse -r. Is that what you're saying would make the command use variable folder names instead of having to hard code the actual name?
Is this how I would do it (for a hard coded folder name since I don't understand how to use -r) -- where ResearchJournal is the name of the directory?
#!/bin/bash
# Save IFS so it can be restored at the end
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")
for i in *.pdf
do
exiftool '-PDF:Title<${filename;s/\..*?$//}' '-XMP-dc:Title<${filename;s/\..*?$//}' "$i"
exiftool '-author<${ResearchJournal;s(.*/)()}'
done
# Restore IFS
IFS=$SAVEIFS
You specify a root directory name in the command, and use -r to cause subdirectories to be processed. The argument I gave takes the subdirectory name and writes it to the Author field for each file.
Also read common mistake number 3 in the link I posted (which is fixed now).
- Phil
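Put together, the advice so far might look like the sketch below: one ExifTool call, a named root directory, -r to recurse, and -ext pdf instead of *.pdf. The function name tag_pdfs is hypothetical, and the root path is an assumption based on the path given later in the thread; adjust it to the real location.

```shell
# tag_pdfs is a hypothetical wrapper name, not an ExifTool feature.
# $1 is the root directory, given by name so that Directory contains
# the folder names the Author substitution needs.
tag_pdfs() {
    exiftool -r -ext pdf \
        '-PDF:Title<${filename;s/\..*?$//}' \
        '-XMP-dc:Title<${filename;s/\..*?$//}' \
        '-author<${directory;s(.*/)()}' \
        "$1"
}

# Assumed root directory; change to match your setup.
root="/Users/HX/Dropbox/ResearchJournal"

# Only run when exiftool is installed and the directory exists.
if command -v exiftool >/dev/null 2>&1 && [ -d "$root" ]; then
    tag_pdfs "$root"
else
    echo "skipping: exiftool or $root not found" >&2
fi
```

With this layout, PDFs in Research1 and Research2 would each get their own folder name as Author, while any PDF sitting directly in the root would get ResearchJournal.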
OK, thanks -- yeah, I tried the link when you first responded and didn't get through.
In the back of my mind, I always knew that I was probably over-scripting, because I'm not exactly sure of what I'm doing and am just trying to learn. I've been trying different things, but nothing like getting advice from an expert. In your expert opinion, how would you write the command so that it isn't over scripted? Would you mind writing it out and showing me so that I can learn? Thanks -- not that I'm not willing to experiment and learn from trial and error, but thought I'd go to the source.
So let's say I have a directory called ResearchJournal, and then two subdirectories within that are Research1 and Research2 (respectively). Hazel ends up sorting it so that Research1 pdfs all go into Research1 subdirectory and Research2 pdfs go into Research2 subdirectory.
I'm running the script at the Research1 subdirectory level -- I want the pdfs that go into this subdirectory to have their metadata changed. The Title metadata should be an exact replica of the filename (Research1 Jan 2016.pdf) minus the extension of course. Then the author metadata should be just Research1.
I want to write it so that I can just use this for the Research 2 subdirectory as well, without hard coding the subdirectory name.
The title part works, now just figuring out the author part. I'm all for elegance and efficiency in the shell script, but lack the knowledge, so please if you can teach me, willing to learn. Thanks!
The exact command line depends on things you haven't told me:
1. Do you want to scan multiple directories at once?
2. If so, do you need to specify them separately, or do you want to process all subdirectories within a given directory?
3. What is the current working directory when you run the command?
4. What is the full path of the directory(s) containing the files you want to process?
- Phil
1. Not scanning multiple directories - isolated within 1 directory
2. No need to specify
3. Current working directory is Research1 (which is a subdirectory of ResearchJournal)
4. Dropbox/ResearchJournal/Research1/research12.pdf
So far, I am able to get the title metadata of research12.pdf to be research12
Now I want to change the author metadata to Research1
Hoping to write it so that I can just reuse the command on the following as well, without having to hardcode "Research2". I would run the script on the following folder separately from the first.
Dropbox/ResearchJournal/Research2/results12.pdf
I tried the hardcoding, and that works
I used exiftool -author=Research2 $i
But then I would have to change that for every folder (Plans are that there will be a lot of subdirectories in the future -- Research1, Research2, Research 3, Data1, Data 2, etc.)
Would using basename be possible in this situation? I tried playing around with that but couldn't figure out the correct syntax.
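On the basename question: one possible approach, sketched here in plain shell, is to take the folder part of the file's path with dirname and then keep its last component with basename. The file path is the example from this post with an assumed /Users/HX prefix, and the final exiftool call reuses the -author=VALUE form already tried above.

```shell
# Example file path (the /Users/HX prefix is an assumption).
file="/Users/HX/Dropbox/ResearchJournal/Research2/results12.pdf"

# dirname drops the filename, basename keeps the last folder name.
folder=$(basename "$(dirname "$file")")
echo "$folder"   # Research2

# Write the folder name as Author, only if exiftool and the file exist.
if command -v exiftool >/dev/null 2>&1 && [ -f "$file" ]; then
    exiftool "-author=$folder" "$file"
fi
```

Because the folder name is derived from the path, nothing needs to be changed when the same lines are used for Research1, Data1, and so on.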
Just one file? OK, here is the command from the information you have given:
exiftool '-PDF:Title<${filename;s/\..*?$//}' '-XMP-dc:Title<${filename;s/\..*?$//}' '-author<${directory;s(.*/)()}' Dropbox/ResearchJournal/Research1/research12.pdf
But of course this won't work because you say the working directory is Research1 but you haven't given either an absolute path or a path relative to Research1. But I hope you get the idea.
- Phil
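To illustrate the relative-path problem: a path that does not start with "/" is resolved against the current working directory. A small sketch in plain shell, assuming the working directory is the Research1 path quoted further down the thread:

```shell
# Assumed working directory (Research1).
wd="/Users/HX/Dropbox/ResearchJournal/Research1"

# The path used in the command above does not start with "/",
# so it is looked up underneath the working directory.
rel="Dropbox/ResearchJournal/Research1/research12.pdf"
resolved="$wd/$rel"
echo "$resolved"

# A path that is actually relative to Research1 is just the filename.
rel_ok="research12.pdf"
resolved_ok="$wd/$rel_ok"
echo "$resolved_ok"
```

The first result points at a Dropbox folder nested inside Research1, which does not exist, so the file is not found.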
As you surmised, that didn't work. Sorry, what do you mean by relative paths? I was hoping there would be a way to write the command so that it pulls the folder name in which the file (research12.pdf) is being held (which in this case would be Research1 folder)
The workflow I imagine is this
/Users/HX/Dropbox/ResearchJournal/Research1/
A pdf gets dropped into Research1 which makes the path for the file /Users/HX/Dropbox/ResearchJournal/Research1/research12.pdf
Hazel runs the script on the Research1 folder, writes research12 to the Title metadata (since that's the name of the pdf) and writes Research1 to the Author metadata (since that's the name of the folder holding research12.pdf). I'm trying to get it to be relative to the location of the file in question, because if I have to type in an absolute path each time, then I might as well just use the simpler -author=name of folder
Hopefully that explains my situation more, sorry if I was unclear before.
I tried this in the terminal
echo "${PWD##*/}"
this gives me the exact folder name where the pdf (that I want to process) is residing.
so how would I integrate that and make author as the result of ${PWD##*/}? That would be exactly what I want. Please help! Thanks -- and thank you for all your help thus far.
OK, the exact command is:
exiftool '-PDF:Title<${filename;s/\..*?$//}' '-XMP-dc:Title<${filename;s/\..*?$//}' '-author<${directory;s(.*/)()}' /Users/HX/Dropbox/ResearchJournal/Research1/research12.pdf
With the absolute path (starts with a "/"), the working directory doesn't matter, so this command should work from any directory.
- Phil
P.S. I've moved this thread to the Newbies forum because it fits in best there.
Hey Phil -- I just tried it and nothing appeared in the author metadata tag when I open up the pdf in preview and look at tools -- show inspector (after running the command, of course)
Is there any way to set author to "${PWD##*/}" (the current directory of the pdf to which the script is being applied)?
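One way this might be done, as a sketch only: let the shell expand ${PWD##*/} and pass the result with the -author=VALUE form that already worked with a hard-coded name. This assumes the script's working directory really is the folder holding the PDFs, which may not hold when Hazel runs it. The function names current_folder and tag_here are hypothetical.

```shell
# Last component of the current working directory, e.g. Research1.
current_folder() {
    printf '%s\n' "${PWD##*/}"
}

# Set Title from each filename and Author from the current folder name
# for every PDF in the working directory.
tag_here() {
    exiftool -ext pdf \
        '-PDF:Title<${filename;s/\..*?$//}' \
        '-XMP-dc:Title<${filename;s/\..*?$//}' \
        "-author=$(current_folder)" \
        .
}

# Usage: cd into the folder to process, then call tag_here.
```

The Author argument is in double quotes so the shell expands it before ExifTool sees it, while the Title arguments stay in single quotes so that ExifTool, not the shell, interprets ${filename...}.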