Appending folder name to author metadata for PDF

Started by Hiryu, January 15, 2016, 11:50:26 AM


Hiryu

Hi -- completely new to exiftool (and coding, for that matter).  I'm trying to batch-change metadata for a bunch of PDF files through a shell script.  I went through some of the posts and was able to cobble something together, but only got halfway.  Could someone please help me?  I'd also like the script to change the Author metadata of each PDF to the name of the folder it is currently housed in.

Really appreciate any help.

Here is what I have so far:


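#!/bin/bash

# Save IFS so it can be restored below, then split only on newlines
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")

for i in $(ls *.pdf)
do
  # Set both Title tags to the file name minus its extension
  exiftool '-PDF:Title<${filename;s/\..*?$//}' '-XMP-dc:Title<${filename;s/\..*?$//}' "$i"
done

# Restore IFS
IFS=$SAVEIFS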


Phil Harvey

It is much more efficient if you give the list of files to ExifTool, rather than looping in a script.  This also simplifies your script to a single command:

exiftool '-PDF:Title<${filename;s/\..*?$//}' '-XMP-dc:Title<${filename;s/\..*?$//}' *.pdf

Second, I would recommend using -ext pdf . instead of *.pdf for a number of reasons.

For the Author, the argument would be something like this: '-author<${directory;s(.*/)()}'

but for this to work you'll have to specify the directory by name instead of just "." in the command (and/or add -r to recurse subdirectories).
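As a rough sketch of how these pieces might fit together: ROOT below is a placeholder folder name chosen for illustration, and the `command -v` guard is only there so the script does nothing when ExifTool is missing or the folder does not exist. Treat it as a starting point under those assumptions rather than a tested recipe.

```shell
#!/bin/bash
# Sketch only: ROOT is a placeholder folder name; replace it with your own.
ROOT="ResearchJournal"

# Single quotes keep the shell from expanding ${...}; ExifTool expands it itself.
args=(
  '-PDF:Title<${filename;s/\..*?$//}'
  '-XMP-dc:Title<${filename;s/\..*?$//}'
  '-author<${directory;s(.*/)()}'   # last component of each file's directory
  -ext pdf                          # only process files with the .pdf extension
  -r                                # recurse into subdirectories of ROOT
  "$ROOT"
)

# Run only if ExifTool is installed and ROOT exists.
if command -v exiftool >/dev/null 2>&1 && [ -d "$ROOT" ]; then
  exiftool "${args[@]}"
fi
```

By default ExifTool keeps a backup of each file it rewrites, with `_original` appended to the name, so a trial run on a copy of the folder is easy to undo.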

- Phil

Edit: Fixed link
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Hiryu

I'm totally new to coding, so definitely think the process I came up with is not the most efficient.

I'm putting the script into Hazel as an embedded script so that it can do this automatically.  The files get sorted to different folders so I was hoping there was a way to write the command in such a way that it draws from the specific folder name for the pdf (which will be different for each pdf).  Is it possible to code it so that the directory name that gets input into the author field is variable/relative for each pdf file?

thanks for the quick reply, and patience for someone who is new to this

Phil Harvey

Quote from: Hiryu on January 15, 2016, 12:19:23 PM
I was hoping there was a way to write the command in such a way that it draws from the specific folder name for the pdf (which will be different for each pdf).  Is it possible to code it so that the directory name that gets input into the author field is variable/relative for each pdf file?

I tried to explain how to do this in my last post.  Let me know if there is something you don't understand.

- Phil

StarGeek

Quote from: Phil Harvey on January 15, 2016, 11:53:58 AM
Second, I would recommend using -ext pdf . instead of *.pdf for a number of reasons.

The link isn't working, looks like a copy/paste error.  The kind that I do all too often.
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

Phil Harvey


Hiryu

sorry -- I know you explained it in the last post, but I'm completely clueless about recurse -r.  Is that what you're saying would make the command use variable folder names instead of having to hard code the actual name?

Is this how I would do it (for a hard coded folder name since I don't understand how to use -r) -- where ResearchJournal is the name of the directory?

#!/bin/bash

# Save IFS so it can be restored below, then split only on newlines
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")

for i in $(ls *.pdf)
do
 
  exiftool '-PDF:Title<${filename;s/\..*?$//}' '-XMP-dc:Title<${filename;s/\..*?$//}' "$i"
  exiftool  '-author<${ResearchJournal;s(.*/)()}'
 
done

# Restore IFS
IFS=$SAVEIFS

Phil Harvey

You specify a root directory name in the command, and use -r to cause subdirectories to be processed.  The argument I gave takes the subdirectory name and writes it to the Author field for each file.
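To make the effect of that argument concrete, here is a small bash sketch that mimics the `s(.*/)()` substitution outside ExifTool. The helper name `last_component` and the sample paths are made up for illustration; the real work is done by ExifTool on its own Directory value.

```shell
#!/bin/bash
# Hypothetical helper: mimic the Perl substitution s(.*/)() by deleting
# everything up to and including the last "/" in a directory string.
last_component() {
  printf '%s\n' "${1##*/}"
}

# Directory values for different ways of naming the folder on the command line:
last_component "."                                # prints "." (no folder name to use)
last_component "ResearchJournal/Research1"        # prints "Research1"
last_component "ResearchJournal/Research2"        # prints "Research2"
```

This is why "." alone is not enough: the directory string has to contain the folder name before the substitution can extract it.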

Also read common mistake number 3 in the link I posted (which is fixed now).

- Phil

Hiryu

OK, thanks -- yeah, I tried the link when you first responded and didn't get through.

In the back of my mind, I always knew that I was probably over-scripting, because I'm not exactly sure of what I'm doing and am just trying to learn.  I've been trying different things, but nothing like getting advice from an expert.  In your expert opinion, how would you write the command so that it isn't over scripted?  Would you mind writing it out and showing me so that I can learn?  Thanks -- not that I'm not willing to experiment and learn from trial and error, but thought I'd go to the source.

So let's say I have a directory called ResearchJournal, and then two subdirectories within that are Research1 and Research2 (respectively).  Hazel ends up sorting it so that Research1 pdfs all go into Research1 subdirectory and Research2 pdfs go into Research2 subdirectory.

I'm running the script at the Research1 subdirectory level -- I want the pdfs that go into this subdirectory to have their metadata changed.  The Title metadata should be an exact replica of the filename (Research1 Jan 2016.pdf) minus the extension of course.  Then the author metadata should be just Research1.

I want to write it so that I can just use this for the Research2 subdirectory as well, without hard coding the subdirectory name.

The title part works, now just figuring out the author part.  I'm all for elegance and efficiency in the shell script, but lack the knowledge, so please if you can teach me, willing to learn.  Thanks!

Phil Harvey

The exact command line depends on things you haven't told me:

1. Do you want to scan multiple directories at once?

2. If so, do you need to specify them separately, or do you want to process all subdirectories within a given directory?

3. What is the current working directory when you run the command?

4. What is the full path of the directory(s) containing the files you want to process?

- Phil

Hiryu

1. Not scanning multiple directories - isolated within 1 directory

2. No need to specify

3. Current working directory is Research1 (which is a subdirectory of ResearchJournal)

4. Dropbox/ResearchJournal/Research1/research12.pdf

So far, I am able to get the title metadata of research12.pdf to be research12
Now I want to change the author metadata to Research1

Hoping to write it so that I can just reuse the command on the following as well, without having to hardcode "Research2", I would run the script on the following folder separately from the first.

Dropbox/ResearchJournal/Research2/results12.pdf


Hiryu

I tried the hardcoding, and that works

I used exiftool -author=Research2 $i

But then I would have to change that for every folder (Plans are that there will be a lot of subdirectories in the future -- Research1, Research2, Research 3, Data1, Data 2, etc.)

Hiryu

Would using basename be possible in this situation?  I tried playing around with that but couldn't figure out the correct syntax.
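For what it's worth, one possible shape for the basename idea is sketched below. It is not something confirmed in this thread, and it assumes the script runs with the PDF folder (for example Research1) as its working directory. The shell works out the folder name and hands ExifTool a literal value, so double quotes are used on purpose here.

```shell
#!/bin/bash
# Sketch: take the folder name from the shell instead of from ExifTool.
# Assumes the working directory is the folder that holds the PDFs.
author=$(basename "$PWD")          # last component of the current directory
echo "Author will be set to: $author"

# Double quotes so the shell expands $author before ExifTool sees the argument.
cmd=(exiftool "-author=$author" -ext pdf .)

# Run only if ExifTool is installed.
if command -v exiftool >/dev/null 2>&1; then
  "${cmd[@]}"
fi
```

Because nothing in the script names the folder, the same file could be reused unchanged in Research2 or any other subdirectory.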

Phil Harvey

Just one file?  OK, here is the command from the information you have given:

exiftool '-PDF:Title<${filename;s/\..*?$//}' '-XMP-dc:Title<${filename;s/\..*?$//}' '-author<${directory;s(.*/)()}' Dropbox/ResearchJournal/Research1/research12.pdf

But of course this won't work because you say the working directory is Research1 but you haven't given either an absolute path or a path relative to Research1.  But I hope you get the idea.
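One way to supply an absolute path without typing it, assuming the working directory really is Research1 as described, is to pass the shell's `$PWD` as the directory argument. This is a sketch under that assumption; the guard around the final call is only there so nothing happens when ExifTool is not installed.

```shell
#!/bin/bash
# Sketch: "$PWD" expands to the absolute path of the working directory,
# e.g. .../ResearchJournal/Research1, so ${directory} contains the folder
# name and s(.*/)() can reduce it to "Research1". With "." it would stay ".".
args=(
  '-PDF:Title<${filename;s/\..*?$//}'
  '-XMP-dc:Title<${filename;s/\..*?$//}'
  '-author<${directory;s(.*/)()}'
  -ext pdf
  "$PWD"
)

# Run only if ExifTool is installed.
if command -v exiftool >/dev/null 2>&1; then
  exiftool "${args[@]}"
fi
```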

- Phil

Hiryu

As you surmised, that didn't work.  Sorry, what do you mean by relative paths?  I was hoping there would be a way to write the command so that it pulls the folder name in which the file (research12.pdf) is being held (which in this case would be Research1 folder)

The workflow I imagine is this

/Users/HX/Dropbox/ResearchJournal/Research1/

A pdf gets dropped into Research1 which makes the path for the file /Users/HX/Dropbox/ResearchJournal/Research1/research12.pdf
Hazel runs the script on the Research1 folder and sets the Title metadata to research12 (since that's the name of the pdf) and the Author metadata to Research1 (since that's the name of the folder holding research12.pdf).  I'm trying to get it to be relative to the location of the file in question, because if I have to type in an absolute path each time, then I might as well just use the simpler -author=name of folder

Hopefully that explains my situation more, sorry if I was unclear before.
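To spell out the workflow above in code: the sketch below derives both values from a file's full path. It assumes the calling tool passes that path to the script as the first argument ($1), which is an assumption about the Hazel setup rather than something established in this thread. The helper names `pdf_title` and `pdf_author` are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers: derive the metadata values from a file path.
pdf_title() {
  local name
  name=$(basename "$1")           # e.g. research12.pdf
  printf '%s\n' "${name%%.*}"     # drop everything from the first dot
}

pdf_author() {
  basename "$(dirname "$1")"      # name of the folder holding the file
}

# Assumption: the PDF's full path arrives as the first argument.
file=${1:-}

# Write the values only if a real file was given and ExifTool is installed.
if [ -f "$file" ] && command -v exiftool >/dev/null 2>&1; then
  title=$(pdf_title "$file")
  author=$(pdf_author "$file")
  exiftool "-PDF:Title=$title" "-XMP-dc:Title=$title" "-author=$author" "$file"
fi
```

For the path /Users/HX/Dropbox/ResearchJournal/Research1/research12.pdf, the helpers give research12 as the title and Research1 as the author, with no folder name hard-coded anywhere.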