Removing HierarchicalSubject based upon the final keyword

Started by StarGeek, March 30, 2016, 06:05:52 PM

StarGeek

A few weeks back I talked myself into a more complex command than was needed.  This post is about the original command that trapped me, and I'm interested to see if anyone can figure out a simpler way.

The command is designed to remove an item from Keywords, Subject, and HierarchicalSubject.  I use it with a text replacement program and the %c is replaced with whatever text is in the clipboard.

exiftool -P -overwrite_original -keywords-="%c" -subject-="%c" -sep "##" "-HierarchicalSubject<${HierarchicalSubject;my $needle=quotemeta('%c');$_= join('##',grep(!/(^|\|)${needle}$/, split/##/))}"
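The Perl expression in the command above splits the ##-joined list, drops every entry whose last pipe-separated component equals the keyword, and joins the rest back together. Here is a minimal Python sketch of that same logic, meant only for trying the matching rule without touching any image files. The function name and the sample entries are made up for illustration and are not part of ExifTool.

```python
import re

def remove_leaf_entries(joined, keyword, sep="##"):
    """Drop entries whose final pipe-separated component equals keyword."""
    # re.escape plays the role of Perl's quotemeta: the keyword is matched literally.
    needle = re.escape(keyword)
    # Same shape as the Perl test /(^|\|)${needle}$/: the keyword must start the
    # entry or follow a pipe, and it must run to the end of the entry.
    leaf = re.compile(r"(^|\|)" + needle + r"$")
    return sep.join(e for e in joined.split(sep) if not leaf.search(e))

# Illustrative data: two entries end in the keyword, one only starts with it.
sample = "Places|Spain|Special F/X##Special F/X##Special F/X|Other##Word"
print(remove_leaf_entries(sample, "Special F/X"))  # Special F/X|Other##Word
```

Because each entry is tested on its own after the split, no leading or trailing separators are left behind.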

I tried for a long time to remove a HierarchicalSubject item with regex but I couldn't come up with a useful expression.  It would always remove more than I wanted or leave trailing or leading separators.

Currently, there are two problems.  First, double or single quotes in the keyword break the command under Windows.  Those keywords would require more explicit commands.  It's not too big of a problem for me, as I rarely use double quotes in keywords, and that just leaves single quotes.

Second, it'll rewrite the HierarchicalSubject even if it doesn't need to be rewritten.  There are only two fixes I can think of for this right now.

One would be to split it into two with an -execute after the keyword and subject removal, and use an -if to limit the second command to files with a matching HierarchicalSubject.  But that gets blocked by keywords with regex meta characters like dots and slashes, and it requires two passes through the files.

The other would be to expand the HierarchicalSubject code to save the current HierarchicalSubject and check whether anything changed.  That makes things more complex, but might be the better solution.  It would change the argument to something like this:

"-HierarchicalSubject<${HierarchicalSubject;my $orig=$_;my $needle=quotemeta('%c');$_= join('##',grep(!/(^|\|)${needle}$/, split/##/));$_=undef if ($orig eq $_)}"

I'm just thinking that through right now and haven't tested it much, though it seems to work.
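The second idea, signalling "no change" so that nothing gets written, can be sketched outside ExifTool. In this rough Python version, None stands in for setting $_ to undef in the Perl expression. The function name and the sample lists are hypothetical, and the leaf is found with plain string operations rather than a regex.

```python
def strip_leaf(entries, keyword):
    """Return entries without those ending in keyword, or None if nothing changed."""
    # rsplit("|", 1)[-1] is the last component of the hierarchy (the leaf).
    kept = [e for e in entries if e.rsplit("|", 1)[-1] != keyword]
    # None stands in for Perl's `$_ = undef`, i.e. "leave the tag alone".
    return None if len(kept) == len(entries) else kept

print(strip_leaf(["A|B|Target", "A|Other"], "Target"))     # ['A|Other']
print(strip_leaf(["A|Target|Leaf", "A|Other"], "Target"))  # None
```

Comparing lengths is enough here because filtering can only remove entries, never alter them.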

Anyone have any better ideas?
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

Phil Harvey

Hi StarGeek,

I think there must be a simpler way, but I don't know the exact format of the items in HierarchicalSubject.  Can you give me a few examples, including edge cases, for me to play with?

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

StarGeek

From what I understand, each entry in the list is just a string that has each branch of the hierarchy separated by a pipe character.   Here's a test case, where I would want to remove each of the "Target Keyword" hierarchies, including the last one which has no hierarchy.
Hierarchical Subject            : Top Level|Mid Level|Target Keyword, Top Level|Another Mid Level|Another Keyword, Another Top Level|Target Keyword, Another Top Level|Another Mid Level|YA Keyword, Target Keyword

(easier to read list)
Top Level|Mid Level|Target Keyword
Top Level|Another Mid Level|Another Keyword
Another Top Level|Target Keyword
Another Top Level|Another Mid Level|YA Keyword
Target Keyword


In actuality, multiple duplicate final keywords (leaf keywords seems to be the term) are probably rare, but I was trying to account for any possibility.

Phil Harvey

Thanks.  OK this seems to work for me:

exiftool -sep ## -p "${hierarchicalsubject;s/(##|^)(.*?\|)?\QTarget Keyword\E(\|.*?)?(##|$)/##/g ? s/(^##|##$)//g : ($_ = undef)}" FILE

The \Q \E around Target Keyword will escape any meta characters, like you were doing with quotemeta before.

It should return nothing if the target keyword isn't found, thus not changing files that don't contain the target keyword when copying this tag.

One problem I can see is that HierarchicalSubject will be set to an empty string if it contained only items with the target keyword.

But I'm not sure if this will help with your problem of quotes in the keyword.  If not, then somehow you'll have to escape them in your %c input before using it in the ExifTool command line.
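To experiment with the substitution above without any image files, here is a rough Python rendering of it. It assumes re.escape is a fair stand-in for \Q...\E and that None can represent the undef result; the function name is invented for this sketch. The sample data is the test case posted earlier in the thread.

```python
import re

def remove_keyword(joined, keyword):
    """Python rendering of the substitution above; None mirrors `$_ = undef`."""
    pattern = r"(##|^)(.*?\|)?" + re.escape(keyword) + r"(\|.*?)?(##|$)"
    # Replace each matching entry (plus its neighbouring separators) with one "##".
    result, count = re.subn(pattern, "##", joined)
    if count == 0:
        return None  # keyword not found: signal "nothing to write"
    # Second pass: trim the separator left at either end.
    return re.sub(r"(^##|##$)", "", result)

entries = [
    "Top Level|Mid Level|Target Keyword",
    "Top Level|Another Mid Level|Another Keyword",
    "Another Top Level|Target Keyword",
    "Another Top Level|Another Mid Level|YA Keyword",
    "Target Keyword",
]
print(remove_keyword("##".join(entries), "Target Keyword"))
print(repr(remove_keyword("Target Keyword", "Target Keyword")))  # '' (empty string)
print(remove_keyword("Top Level|Other", "Target Keyword"))       # None
```

The second call shows the empty-string case mentioned above: when every entry contains the target keyword, the result is an empty string rather than None.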

- Phil

StarGeek

Quote from: Phil Harvey on March 31, 2016, 07:46:14 AM
The \Q \E around Target Keyword will escape any meta characters, like you were doing with quotemeta before.

The reason I went with quotemeta() was because \Q \E didn't work with slash, backslash, and pipe characters, at least on Windows.

This was my original attempt at a command to search for an exact Subject entry:
c:\>exiftool -subject X:\!temp\Test3.jpg
Subject                         : Special F/X, Random, Word

c:\>exiftool -sep "##" -if "$subject=~/(##|^)(\QSpecial F/X\E)($|##)/" -subject X:\!temp\Test3.jpg
    1 files failed condition

c:\>exiftool -sep "##" -if "$subject=~/(##|^)(\QRandom\E)($|##)/" -subject X:\!temp\Test3.jpg
Subject                         : Special F/X##Random##Word


And then my command to replace a Subject entry (case insensitive):
c:\>exiftool -subject X:\!temp\Test3.jpg
Subject                         : Special F/X, Random, Word

c:\>exiftool -sep "##" -P -overwrite_original "-subject<${subject;s/(##|^)(?:\QSpecial F/X\E)(?=($|##))/$1Special FX/ig}" X:\!temp\Test3.jpg
Warning: Unmatched ( in regex; marked by <-- HERE in m/(##|^)( <-- HERE ?:Special\ F/ for Subject - X:/!temp/Test3.jpg
    1 image files updated

c:\>exiftool -subject X:\!temp\Test3.jpg
Subject                         : Special F/X, Random, Word

c:\>exiftool -sep "##" -P -overwrite_original "-subject<${subject;s/(##|^)(?:\QRandom\E)(?=($|##))/$1Special FX/ig}" X:\!temp\Test3.jpg
    1 image files updated

c:\>exiftool -subject X:\!temp\Test3.jpg
Subject                         : Special F/X, Special FX, Word
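The warning above shows the pattern cut off at "Special\ F/", which suggests the / in the keyword is being read as the closing delimiter of the s/// expression before \Q...\E gets a chance to escape it. One way to sidestep escaping altogether is to compare whole list items as plain strings. A rough Python sketch of that idea follows; the function name is made up for illustration, and this is not an ExifTool feature.

```python
def replace_item(items, old, new):
    """Replace list items equal to old (ignoring case) with new."""
    # casefold() gives a case-insensitive comparison, like the /i flag above.
    target = old.casefold()
    return [new if item.casefold() == target else item for item in items]

subject = ["Special F/X", "Random", "Word"]
print(replace_item(subject, "special f/x", "Special FX"))
# ['Special FX', 'Random', 'Word']
```

Since no pattern is compiled, slashes, backslashes and pipes in the keyword need no special handling; only whole items that match are replaced.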


StarGeek

Quote from: Phil Harvey on March 31, 2016, 07:46:14 AM
Thanks.  OK this seems to work for me:

exiftool -sep ## -p "${hierarchicalsubject;s/(##|^)(.*?\|)?\QTarget Keyword\E(\|.*?)?(##|$)/##/g ? s/(^##|##$)//g : ($_ = undef)}" FILE

This command made me realize there was another test case to add: cases where the target keyword is in the middle of the hierarchy, not just the final keyword.  For example, if Error Test|Target Keyword|Leaf Keyword is part of the hierarchy, it is also removed, which isn't what I want.  But it looks like that can be fixed by removing the (\|.*?)?.

The addition of s/(^##|##$)//g fixes the problem I was having with leading and trailing separators.  I was always trying for a single regex that wouldn't leave those behind.

I'll still have to go with quotemeta because Windows  >:(  And I can deal with quote problems on an individual basis.

So this is what I'll start testing (with the (\|.*?)? removed, as described above)
"-HierarchicalSubject<${HierarchicalSubject;my $needle=quotemeta('%c');s/(##|^)(.*?\|)?${needle}(##|$)/##/g ? s/(^##|##$)//g : ($_ = undef)}"
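Before relying on the leaf-only pattern (the one with the trailing (\|.*?)? group dropped), it may be worth comparing it against a plain split-and-compare on a few edge cases. The sketch below does that in Python, assuming re.escape behaves like quotemeta; both function names and all sample strings are invented for the comparison. Whether the Perl one-liner behaves identically has not been checked here.

```python
import re

def regex_remove(joined, keyword):
    """Leaf-only regex: the trailing optional group is dropped."""
    pattern = r"(##|^)(.*?\|)?" + re.escape(keyword) + r"(##|$)"
    result, count = re.subn(pattern, "##", joined)
    return re.sub(r"(^##|##$)", "", result) if count else None

def split_remove(joined, keyword):
    """Reference behaviour: compare the last component of each entry."""
    entries = joined.split("##")
    kept = [e for e in entries if e.rsplit("|", 1)[-1] != keyword]
    return "##".join(kept) if len(kept) != len(entries) else None

cases = [
    "Error Test|Target Keyword|Leaf Keyword",  # keyword mid-hierarchy
    "A|Target Keyword##B",                     # plain leaf match
    "A|Target Keyword##B|Target Keyword##C",   # two adjacent matches
    "A|B##C|Target Keyword",                   # match only in a later entry
]
for case in cases:
    # Left column: regex result; right column: split-and-compare result.
    print(repr(regex_remove(case, "Target Keyword")),
          repr(split_remove(case, "Target Keyword")))
```

In this Python rendering the two agree on the first two cases but not the last two. With adjacent matches, the first match consumes the separator the second one needs, so B|Target Keyword survives. In the last case, .*? is free to run across the ## separator, so the unrelated A|B entry is removed as well.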

Many thanks

Phil Harvey

Glad I could help.  You might try quoting '%c' differently to see if it helps with your single quote problem:

"-HierarchicalSubject<${HierarchicalSubject;my $needle=quotemeta(q(%c));s/(##|^)(.*?\|)?${needle}(##|$)/##/g ? s/(^##|##$)//g : ($_ = undef)}"

However, double quotes (and maybe backslashes) will still likely cause problems.

- Phil