ExifTool Forum

General => Metadata => Topic started by: Coro on August 27, 2021, 07:52:31 PM

Title: Hierarchical Keywords structure ideas (parent / child options)
Post by: Coro on August 27, 2021, 07:52:31 PM
I have just started the daunting process of preparing to tag our personal family photo collection of some 75k photos!

I am not sure which way to go about naming the final tag in the branch, so as to avoid having duplicates, so would be interested to hear what results others have had in the long term with various methods...

I started off following advice that it would be a good idea to also include the parent keywords in the flattened keyword fields (tags)...
So although the HierarchicalSubject tag contains "People|Family|Our Family|Child1", the general (non-hierarchical) keyword tags also contain all the parents: "People, Family, Our Family, Child1".
This was a recommended method, as you could then search for the keyword "Our Family" and all the children would be included.
Let's call this "Idea A", to make it easy for anyone replying to a particular point here.
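
For anyone who wants to script Idea A, here is a minimal Python sketch of flattening hierarchical paths into a keyword list that includes every parent. The function name flatten_with_parents is made up for illustration, and the code only manipulates strings; writing the result back into the files is left to whatever tool you use.

```python
def flatten_with_parents(hierarchical_subjects, sep="|"):
    """Return a flat keyword list containing every level of every path.

    Each keyword is kept only once, in first-seen order, so a parent shared
    by several paths does not appear twice in the flattened list.
    """
    seen = set()
    flat = []
    for path in hierarchical_subjects:
        for level in path.split(sep):
            level = level.strip()
            if level and level not in seen:
                seen.add(level)
                flat.append(level)
    return flat


print(flatten_with_parents(["People|Family|Our Family|Child1"]))
# ['People', 'Family', 'Our Family', 'Child1']
```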

Idea B:
I decided not to include parent keywords, as they seemed to clutter the keyword tags (some software also duplicated the parent tags, so they appeared twice in the flattened list), and it would appear that most software these days can read the HierarchicalSubject tag and "rebuild" the hierarchy from there anyway (I am saving tags within the files with future compatibility in mind).
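
To make Idea B concrete, the sketch below keeps only the last level of each path as the flat keyword, and shows roughly how the full tree can be rebuilt from the hierarchical paths alone. This is only an assumption about what such a "rebuild" could look like, not how any particular program does it; leaf_keywords and build_tree are hypothetical names.

```python
def leaf_keywords(hierarchical_subjects, sep="|"):
    """Idea B: use only the final level of each path as the flat keyword."""
    return [path.split(sep)[-1].strip() for path in hierarchical_subjects]


def build_tree(hierarchical_subjects, sep="|"):
    """Rebuild a nested dict (level name -> children) from the paths."""
    tree = {}
    for path in hierarchical_subjects:
        node = tree
        for level in path.split(sep):
            # Descend into the child, creating it if it does not exist yet.
            node = node.setdefault(level.strip(), {})
    return tree


paths = ["People|Family|Our Family|Child1", "People|Friends|Smiths|Peter"]
print(leaf_keywords(paths))  # ['Child1', 'Peter']
print(build_tree(paths))
```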

I am open to advice on Idea A vs Idea B. However, assuming I go with Idea B, I have 2 theories about how to name the final tag in a branch. Here are 2 situations where this becomes an issue...

Idea C
Occasions|Birthdays|Child1|1st Birthday
Occasions|Birthdays|Child1|2nd Birthday
Occasions|Birthdays|Child2|1st Birthday
Occasions|Birthdays|Child2|2nd Birthday

People|Friends|Smiths|Peter
People|Friends|Williams|Peter
(Smith and Williams are the surnames)

Pros: Looks tidy/easy to read when navigating the tag tree
Cons: Only the final tag in the branch gets included in the flattened keyword tags, so there will be lots of identical keywords that actually refer to different things (e.g. "Peter" for both Peter Smith and Peter Williams)
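
One way to see how big this problem is in an existing keyword tree is to group the paths by their final tag and list the ones that collide. The following is a rough sketch under the assumption that the paths are available as plain strings; find_leaf_collisions is a made-up helper name.

```python
from collections import defaultdict


def find_leaf_collisions(hierarchical_subjects, sep="|"):
    """Map each final tag to its paths, keeping only tags shared by 2+ different paths."""
    by_leaf = defaultdict(list)
    for path in hierarchical_subjects:
        leaf = path.split(sep)[-1].strip()
        if path not in by_leaf[leaf]:  # ignore exact repeats of the same path
            by_leaf[leaf].append(path)
    return {leaf: paths for leaf, paths in by_leaf.items() if len(paths) > 1}


paths = [
    "Occasions|Birthdays|Child1|1st Birthday",
    "Occasions|Birthdays|Child2|1st Birthday",
    "People|Friends|Smiths|Peter",
    "People|Friends|Williams|Peter",
    "People|Family|Our Family|Child1",
]
for leaf, clashing in find_leaf_collisions(paths).items():
    print(leaf, "->", clashing)
```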

Idea D
Occasions|Birthdays|Child1|1st Birthday Child1
Occasions|Birthdays|Child1|2nd Birthday Child1
Occasions|Birthdays|Child2|1st Birthday Child2
Occasions|Birthdays|Child2|2nd Birthday Child2

People|Friends|Smiths|Peter Smith
People|Friends|Williams|Peter Williams

Pros: Final tag is 100% unique and does not need the parent tag to be included.
Cons: Tag tree starts looking a bit convoluted
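
As a rough illustration of Idea D, the sketch below appends the parent's name to any final tag that would otherwise be shared between branches. It is a mechanical approximation only: it would produce "Peter Smiths" rather than the hand-written "Peter Smith", and it assumes each path appears only once in the input. make_leaves_unique is a hypothetical name.

```python
from collections import Counter


def make_leaves_unique(hierarchical_subjects, sep="|"):
    """Append the parent name to every final tag that occurs in more than one path."""
    split_paths = [[lvl.strip() for lvl in p.split(sep)] for p in hierarchical_subjects]
    leaf_counts = Counter(levels[-1] for levels in split_paths)
    result = []
    for levels in split_paths:
        # Only rename when the leaf is ambiguous and a parent actually exists.
        if leaf_counts[levels[-1]] > 1 and len(levels) > 1:
            levels = levels[:-1] + [f"{levels[-1]} {levels[-2]}"]
        result.append(sep.join(levels))
    return result


print(make_leaves_unique([
    "Occasions|Birthdays|Child1|1st Birthday",
    "Occasions|Birthdays|Child2|1st Birthday",
]))
# ['Occasions|Birthdays|Child1|1st Birthday Child1', 'Occasions|Birthdays|Child2|1st Birthday Child2']
```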


So is Idea D even necessary, given what I mentioned above (most software these days can read the HierarchicalSubject tag and "rebuild" the hierarchy from there anyway)?
There are a few specific different questions here, but I am also trying to look at the big picture as a whole, and envisage any issues I might run into down the track.

Another thought... does having spaces in your tags often cause issues, or not really? E.g. should I really be using "Peter_Smith"?

Title: Re: Hierarchical Keywords structure ideas (parent / child options)
Post by: StarGeek on August 28, 2021, 10:53:26 AM
When it comes down to it, the driving force should be the program you are using and how it handles the data.  I've had to alter my hierarchy a few times over the years because of this (no problem thanks to exiftool and its batch editing ability).

Personally, I pretty much use idea D.  The major driving force was the fact that I go to comic conventions and can sometimes take over a thousand pictures per day, and it helps to separate them a bit.  So I started off splitting them by day.  But that ended up with duplicated "Day 1", "Day 2", etc, as you mentioned.  So now I set the highest level to the full name and abbreviate or shorten it lower down the list.  As an example, I hit the San Diego Comic-Con every year, so my hierarchy is
Comic-Con International->Comic-Con <YEAR>->CC <YEAR> Day 1
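
A naming scheme like this is easy to generate in a script. The sketch below is only an illustration: convention_path is a made-up helper, the year 2019 is an arbitrary example value, and joining the levels with "|" is an assumption based on the HierarchicalSubject examples earlier in the thread.

```python
def convention_path(year, day, sep="|"):
    """Build the path: full name at the top, shorter but still unique names below."""
    levels = [
        "Comic-Con International",
        f"Comic-Con {year}",
        f"CC {year} Day {day}",
    ]
    return sep.join(levels)


print(convention_path(2019, 1))
# Comic-Con International|Comic-Con 2019|CC 2019 Day 1
```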

Also, even though I fill out Keywords/Subject with the leaf entry, I really don't use it.  The program I use to parse the data is IMatch and it is very good at displaying the hierarchy.  The base keywords are only looked at when using a program that doesn't show the HierarchicalSubject, in this case IrfanView, which I use to view the images.  I don't include every branch of the hierarchy because I feel they are less important for a quick check and would clutter the keywords.

And I have no problem with spaces in the keywords, as I find it easier to read.  Unlike filenames, where I always replace spaces with underscores, as I do a lot of things from the command line and spaces add extra steps.

But you do have to see how the programs you are using deal with things, as there can sometimes be unexpected results.  For example, in the old Picasa software, you couldn't use a comma in a keyword, as Picasa would separate keywords on that comma.  So, for example, using "Smith, John" as a keyword in Picasa would end up as two separate keywords, "Smith" and "John".
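
If you want to check a keyword list for this kind of problem before committing to a naming scheme, a small script can flag the risky entries. This is a sketch only: both function names are hypothetical, and naive_comma_split merely imitates the splitting behaviour described above, it is not Picasa's actual code.

```python
def keywords_with_delimiters(keywords, delimiters=(",",)):
    """Return the keywords containing a character some programs may split on."""
    return [kw for kw in keywords if any(d in kw for d in delimiters)]


def naive_comma_split(keyword):
    """Imitate a program that treats every comma as a keyword separator."""
    return [part.strip() for part in keyword.split(",") if part.strip()]


print(keywords_with_delimiters(["Smith, John", "Peter Smith", "Day 1"]))
# ['Smith, John']
print(naive_comma_split("Smith, John"))
# ['Smith', 'John']
```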