Regex with optional match does not work

Started by brightwolf, May 11, 2021, 04:38:40 PM


brightwolf

Hi there, I am using exiftool to recursively go through my directories and photo files and set the subject tag based on the words found in the directory name and set the person in image tag based on the words found in the photo filename.

The command is:
exiftool -r -m -addtagsfromfile @ '-subject<${directory;s(.*Photo.*?/)()}' '-personinimage<${filename;s(\S*\..*)()}' -api listsplit='[ /]' -overwrite_original DIR
This command will yield the keywords Holidays, Egypt, 1997 and the person in image Pyramids, Me, Wife for photo file /Users/me/Photos/Holidays/Egypt 1997/Pyramids Me Wife IMG_1099.JPG
(see my earlier post: https://exiftool.org/forum/index.php?topic=12435.msg67286#msg67286)
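To make the two expressions in that command easier to follow, here is a minimal Python sketch of what they do to the example path. It is an emulation, not exiftool itself: `perl_subst_once` and `list_split` are hypothetical helper names, Python's `re` module stands in for Perl's regex engine, and dropping empty list items is an assumption about how the `listsplit` result ends up.

```python
import re

def perl_subst_once(pattern, replacement, text):
    # Emulates Perl's s(pattern)(replacement) without /g: first match only.
    return re.sub(pattern, replacement, text, count=1)

def list_split(text, separator=r"[ /]"):
    # Approximates -api listsplit='[ /]'; dropping empty items is an assumption.
    return [item for item in re.split(separator, text) if item]

directory = "/Users/me/Photos/Holidays/Egypt 1997"
filename = "Pyramids Me Wife IMG_1099.JPG"

# s(.*Photo.*?/)() removes everything up to and including the "Photos/" folder.
subject = list_split(perl_subst_once(r".*Photo.*?/", "", directory))
# s(\S*\..*)() removes the first run of non-space characters that contains a dot.
persons = list_split(perl_subst_once(r"\S*\..*", "", filename))

print(subject)  # ['Holidays', 'Egypt', '1997']
print(persons)  # ['Pyramids', 'Me', 'Wife']
```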

I now would like to enhance this command to have it match words between brackets as keywords, while still matching words that are not between brackets as person in image.

So the first step to achieve this is to have the regex in the command ignore any word between brackets (that is just before the original filename) but at the same time have it take the other words into account.
I have rewritten my command line to this:
exiftool -r -m -addtagsfromfile @ '-subject<${directory;s(.*Photo.*?/)()}' '-personinimage<${filename;s(\s*(\(.*\))?\s*\S*\..*)()}' -api listsplit='[ /]' -overwrite_original DIR

This command should yield the keywords: Holidays, Egypt, 1997; and the persons in image: Me, Wife for photo file /Users/me/Photos/Holidays/Egypt 1997/Me Wife (Pyramids) IMG_1099.JPG
(note that, once this works, I will have to further enhance the command to have it match the words between brackets once more to add them to the keywords too)
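As a rough cross-check of the modified filename expression, the following Python sketch applies the same pattern to a filename with and without a bracketed word. `persons_from_filename` is a hypothetical name, and Python's `re` is only assumed to behave like Perl for this particular pattern.

```python
import re

# Filename-stripping pattern from the modified command, as a Python raw string.
STRIP_TAIL = r"\s*(\(.*\))?\s*\S*\..*"

def persons_from_filename(filename):
    # First match only, like Perl's s(...)() without /g.
    remaining = re.sub(STRIP_TAIL, "", filename, count=1)
    # Split on space or slash, as listsplit='[ /]' would.
    return [word for word in re.split(r"[ /]", remaining) if word]

print(persons_from_filename("Me Wife (Pyramids) IMG_1099.JPG"))  # ['Me', 'Wife']
print(persons_from_filename("Me Wife IMG_1099.JPG"))             # ['Me', 'Wife']
```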

I have tested the regex on https://www.regexplanet.com/advanced/perl/index.html and it seems to work correctly: I see that the words Me and Wife are extracted to $.
However, it does not work in exiftool. When I use this new command, it still extracts the keywords correctly but the personinimage tag remains empty.
I must have made a mistake somewhere, but where?

What is wrong about the regex used for the personinimage tag?
Photographer. Hobbyist. Using iMac to manage photos, with Photo Mechanic Plus, Pixelmator, Luminar, and exiftool. Longing back to Aperture sometimes.

StarGeek

It seems to work for me
C:\>exiftool -P -overwrite_original -api listsplit="[ /]" -addtagsfromfile @ "-subject<${directory;s(.*Photo.*?/)()}" "-personinimage<${filename;s(\s*(\(.*\))?\s*\S*\..*)()}" "Y:\!temp\aaaa\Photos\Holidays\Egypt 1997\Me Wife (Pyramids) IMG_1099.JPG"
    1 image files updated

C:\>exiftool -G1 -a -s -xmp:all "Y:\!temp\aaaa\Photos\Holidays\Egypt 1997\Me Wife (Pyramids) IMG_1099.JPG"
[XMP-x]         XMPToolkit                      : Image::ExifTool 12.25
[XMP-iptcExt]   PersonInImage                   : Me, Wife
[XMP-dc]        Subject                         : Holidays, Egypt, 1997


The only change I would make would be to use BaseName instead of Filename if you're using ver 12.22+.  That would simplify the regex, since you no longer need to worry about the extension.  Saving the text between the parentheses will require a separate tag copy.  Something like
'-Subject<${BaseName;m/\((.*?)\)/;$_=$1}'

C:\>exiftool -P -overwrite_original -api listsplit="[ /]" -addtagsfromfile @ "-subject<${directory;s(.*Photo.*?/)()}" "-personinimage<${Basename;s(\s*(\(.*\))?\s*\S*$)()}" "-Subject<${BaseName;m/\((.*?)\)/;$_=$1}" "Y:\!temp\aaaa\Photos\Holidays\Egypt 1997\Me Wife (Pyramids) IMG_1099.JPG"
    1 image files updated

C:\>exiftool -G1 -a -s -xmp:all "Y:\!temp\aaaa\Photos\Holidays\Egypt 1997\Me Wife (Pyramids) IMG_1099.JPG"
[XMP-x]         XMPToolkit                      : Image::ExifTool 12.25
[XMP-iptcExt]   PersonInImage                   : Me, Wife
[XMP-dc]        Subject                         : Holidays, Egypt, 1997, Pyramids
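A small Python sketch of the two BaseName expressions used above, assuming BaseName is simply the filename without its extension. The function names are hypothetical and the patterns are transcribed from the command line.

```python
import re

def persons_from_basename(basename):
    # s(\s*(\(.*\))?\s*\S*$)() : drop the optional "(...)" part and the last word.
    remaining = re.sub(r"\s*(\(.*\))?\s*\S*$", "", basename, count=1)
    return [word for word in remaining.split(" ") if word]

def bracketed_word(basename):
    # m/\((.*?)\)/ followed by $_=$1 : text inside the first pair of parentheses.
    # This sketch only covers names that do contain parentheses.
    return re.search(r"\((.*?)\)", basename).group(1)

basename = "Me Wife (Pyramids) IMG_1099"
print(persons_from_basename(basename))  # ['Me', 'Wife']
print(bracketed_word(basename))         # Pyramids
```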
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

brightwolf

Hi Stargeek, thanks for your answer. On my iMac, it really does not work. But when I tried out the same command on my MacBook, indeed it worked. Both iMac and MacBook are using the same exiftool version (12.24). At this point I am not sure why there is a difference in outcome between the two. Is that a known issue? Anyway, for now I can proceed on my MacBook and will also take your other comments into account in further enhancing my command.

brightwolf

Could it be the perl version? I have compared both of my computers, since they are on different versions of macOS too.

iMac --> v5.18.4 (OSX: Mojave)
MacBook --> 5.28.2 (OSX: Big Sur)

StarGeek

I would think that regex would be in the Perl core and shouldn't change so much between versions.

Try comparing the output of
exiftool -ver -v

Beyond that, there's not much more I can help with as I don't use a Mac.

brightwolf

I did that and found that the same modules are installed but all versions are different between the two.
A quick check in the perl version history tells me that in perl 5.26 the following was changed in relation to regex: New regular expression modifiers and capture groups
I think that the parentheses ( and ) are part of the capture groups and may indeed influence the outcome of this regex on my iMac which has perl 5.18.

StarGeek

Yeah, this part (\(.*\))? does capture (Pyramids), but that shouldn't change anything, since you don't do anything with the capture. 

Maybe change it to a non-capture grouping
(?:\(.*\))?
and see if that helps.
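For illustration, a Python sketch comparing a capturing group with the non-capturing form `(?:...)` in this pattern. Python's `re` is used here only as a stand-in for Perl; the point is that the substitution result is the same and only the number of capture groups differs.

```python
import re

filename = "Me Wife (Pyramids) IMG_1099.JPG"

capturing = re.compile(r"\s*(\(.*\))?\s*\S*\..*")
non_capturing = re.compile(r"\s*(?:\(.*\))?\s*\S*\..*")

# Number of capture groups defined by each pattern.
print(capturing.groups, non_capturing.groups)  # 1 0

# Both forms remove the same text when the capture is not used afterwards.
result_capturing = capturing.sub("", filename, count=1)
result_non_capturing = non_capturing.sub("", filename, count=1)
print(result_capturing)      # Me Wife
print(result_non_capturing)  # Me Wife
```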

Luuk2005

This was very difficult to understand, because people always say 'keywords' instead of $Subject, which I find confusing.
Also, I don't know anything about the newer regex version being used, because I'm still on exiftool version 12.11.
So now I think I finally understand: you sometimes want to match (words) inside $Filename for $Subject?

But in all of the commands, $Subject only takes words from $Directory, so (Pyramids) could never be matched.
Also, I'm guessing that $PersonInImage should always exclude (words) inside $Filename?

If that guess is right, here are some regexes to experiment with, which still work for me on the older version...
  -Subject'<${Directory;s|.*Photo.*?/||}${Filename;s|^[^()]*$||;s|[^()]*?\((.+?)\)[^()]*| $1|g}'
  -PersonInImage'<${Filename;s| *\(.*?\) *| |g;s/ \S*\.[^.]*$//}'
Windows8.1-64bit, exiftool-v12.11(standalone), sed-v4.0.7

brightwolf

@Luuk2005 Initially I indeed meant "Keywords"; however, Phil pointed out that Subject is more often used nowadays, so I switched to that. My intention, in any case, is to extract the words from the folder path, use those words as (ahem) keywords, and put them in the Subject XMP tag. The words found in the filename should go into the PersonInImage XMP tag, except that any word between parentheses in the filename should be added as a keyword to the Subject XMP tag as well.

For example:
- /Holidays/Egypt 1997/Me Wife IMG_1099.JPG --> must yield Subject: Holidays, Egypt, 1997; and PersonInImage: Me, Wife
- /Holidays/Egypt 1997/Me Wife (Pyramids) IMG_1099.JPG --> must yield Subject: Holidays, Egypt, 1997, Pyramids; and PersonInImage: Me, Wife

Furthermore, I got exiftool running with perl version 5.32.1 now so I can work around the regex problems.

@Stargeek Thank you for proposing an additional tag copy. It works, but only if the photo file does have a word between parentheses. If it does not, it copies the text from the additional tag copy command literally to the Subject XMP tag. Like this:
- /Holidays/Egypt 1997/Me Wife IMG_1099.JPG --> will yield Subject: Holidays, Egypt, 1997, m, \((.*?)\), ;$_=$1; and PersonInImage: Me, Wife
- /Holidays/Egypt 1997/Me Wife (Pyramids) IMG_1099.JPG --> will yield Subject: Holidays, Egypt, 1997, Pyramids; and PersonInImage: Me, Wife

I am trying a lot of different things, but cannot work around this problem.. Do you have any suggestion?

StarGeek

Quote from: Luuk2005 on May 14, 2021, 02:32:45 PM
This was very difficult to understand, because people always saying 'keywords' instead of $Subject, so its very confusing to me

There's the more generic term "Keywords" or "Tags" which is what programs like Lightroom or Windows would display when entering data.  I tend to call this a Property, which I take from the fact that image metadata can be found under the Properties window.

Then there are the actual tags, Keywords and Subject.  I always mark these up so that they're more distinct.

Most people who don't get into the minute details of metadata are just going to use "Keywords", as that is what it will say on the program they use to input the data.

StarGeek

Quote from: brightwolf on May 18, 2021, 03:57:14 PM
@Stargeek Thank you for proposing an additional tag copy. It works, but only if the photo file does have a word between parentheses. If it does not, it copies the text from the additional tag copy command literally to the Subject XMP tag. Like this:

Ooops, you did make the parenthesis enclosed word optional in your original post, but I forgot about it.

Offhand I can offer this
'-Subject<${BaseName;$_=(m/\((.*?)\)/) ? $1 :undef}'
You will get a minor warning, "Advanced formatting expression returned undef", for files that don't have the parentheses, but that can be ignored, or you can suppress it with the -m (-ignoreMinorErrors) option.
C:\>exiftool -P -overwrite_original -subject= "-Subject<${BaseName;$_=(m/\((.*?)\)/) ? $1 :undef}" Y:\!temp\aaa
Warning: [minor] Advanced formatting expression returned undef for 'BaseName' - Y:/!temp/aaa/Me Wife IMG_1099.jpg
Warning: No writable tags set from Y:/!temp/aaa/Me Wife IMG_1099.jpg
    1 directories scanned
    1 image files updated
    1 image files unchanged

C:\>exiftool -g1 -a -s -subject Y:/!temp/aaa/
======== Y:/!temp/aaa/Me Wife (Pyramids) IMG_1099.JPG
---- XMP-dc ----
Subject                         : Pyramids
======== Y:/!temp/aaa/Me Wife IMG_1099.jpg
    1 directories scanned
    2 image files read
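The conditional expression can be pictured with this Python sketch, in which `None` stands in for Perl's `undef`. `subject_from_basename` is a hypothetical name, and how exiftool treats the undefined result (skipping the tag with a minor warning) is taken from the output above rather than modelled here.

```python
import re

def subject_from_basename(basename):
    # Mirrors: $_ = (m/\((.*?)\)/) ? $1 : undef
    match = re.search(r"\((.*?)\)", basename)
    # Return the captured word on a match; otherwise None (standing in for undef).
    return match.group(1) if match else None

print(subject_from_basename("Me Wife (Pyramids) IMG_1099"))  # Pyramids
print(subject_from_basename("Me Wife IMG_1099"))             # None
```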

Luuk2005

Greetings everyone! All I can say is that with exiftool v12.11, the two regexes I gave behave as described below.
Unfortunately, I don't want to update because I'm afraid the new regex version might break some of my expressions.
With v12.11, this was my command line (except with double quotes on Windows):


exiftool -r -m -overwrite_original -AddTagsFromFile @ -api listsplit='[ /]' -if '$Directory=~/Photo/' -Subject'<${Directory;s|.*Photo.*?/||}${Filename;s|^[^()]*$||;s|[^()]*?\((.+?)\)[^()]*| $1|g}' -PersonInImage'<${Filename;s| *\(.*?\) *| |g;s/ \S*\.[^.]*$//}' '.'


So for pathnames ending like...
Photos/Holidays/Egypt 1997/Me Wife IMG_1099.jpg
                 Subject: Holidays, Egypt, 1997
     PersonInImage: Me, Wife

Photos/Holidays/Egypt 1997/Me Wife (Pyramids) IMG_1099.jpg
                 Subject: Holidays, Egypt, 1997, Pyramids
     PersonInImage: Me, Wife

Photos/Holidays/Egypt 1997/Me (reading) Wife (eating) Joe (Pyramids) IMG_1099.jpg
                 Subject: Holidays, Egypt, 1997, reading, eating, Pyramids
     PersonInImage: Me, Wife, Joe
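The three results above can be reproduced with a Python sketch of the two expressions. This is an emulation under assumptions: the function names are hypothetical, Python's `re` stands in for Perl, and the final split with empty items dropped approximates `listsplit='[ /]'`.

```python
import re

def subject_words(directory, filename):
    dir_part = re.sub(r".*Photo.*?/", "", directory, count=1)
    # s|^[^()]*$|| : blank the filename entirely when it has no parentheses.
    name_part = re.sub(r"^[^()]*$", "", filename, count=1)
    # s|[^()]*?\((.+?)\)[^()]*| $1|g : keep only the words inside parentheses.
    name_part = re.sub(r"[^()]*?\((.+?)\)[^()]*", r" \1", name_part)
    return [w for w in re.split(r"[ /]", dir_part + name_part) if w]

def person_words(filename):
    # s| *\(.*?\) *| |g : replace each "(...)" and its surrounding spaces by one space.
    name = re.sub(r" *\(.*?\) *", " ", filename)
    # s/ \S*\.[^.]*$// : drop the trailing " IMG_1099.jpg" style token.
    name = re.sub(r" \S*\.[^.]*$", "", name, count=1)
    return [w for w in re.split(r"[ /]", name) if w]

directory = "Photos/Holidays/Egypt 1997"
name = "Me (reading) Wife (eating) Joe (Pyramids) IMG_1099.jpg"
print(subject_words(directory, name))
# ['Holidays', 'Egypt', '1997', 'reading', 'eating', 'Pyramids']
print(person_words(name))
# ['Me', 'Wife', 'Joe']
```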

brightwolf

Hi Luuk, thanks for your contribution! And indeed, your regex works real well and offers the added benefit that all words between parentheses are added as subject fields (keywords), no matter their position in the photo filename. With my solution, the words between parentheses had to be just before the basename; if they were not, other words following after them would be omitted.

With regard to the perl version, I did not update the version but installed a later version alongside the default version. Then, I start up exiftool as follows:
/usr/local/bin/perl5.32.1 exiftool ...

brightwolf

@Luuk2005 I am now testing the regex in the exiftool command more extensively and have come across a problem. For the personinimage tag, if the filename contains a person's name (for example: Me Wife IMG_1099.JPG), it works. But if there is no name in the filename (for example: IMG_1099.JPG), the filename itself (IMG_1099.JPG) gets added as the person in the image. How could I extend the regex to solve that problem? The match should be optional: if there is no name, nothing should be written. Any help appreciated!

Luuk2005

Yes, I was going to suggest changing Space into Space* to make the space optional, but \s* also works, nice work!
If you get more trouble from $PersonInImage, you can always use the -p option for troubleshooting, like...
exiftool -p '$Filename  -------  ${Filename;s| *\(.*?\) *| |g;s/ *\S*\.[^.]*$//}'  DIR
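To show why the optional space matters, this Python sketch previews both variants of the tail pattern on a filename with names and on a bare `IMG_1099.JPG`. `preview` is a hypothetical helper that imitates what the `-p` line prints after the dashes; it is not exiftool output.

```python
import re

REQUIRED_SPACE = r" \S*\.[^.]*$"   # original: needs a space before the last token
OPTIONAL_SPACE = r" *\S*\.[^.]*$"  # revised: the leading space is optional

def preview(filename, tail_pattern):
    # s| *\(.*?\) *| |g, then strip the trailing "IMG_1099.JPG" style token.
    name = re.sub(r" *\(.*?\) *", " ", filename)
    return re.sub(tail_pattern, "", name, count=1)

for filename in ["Me Wife IMG_1099.JPG", "IMG_1099.JPG"]:
    print(repr(filename),
          repr(preview(filename, REQUIRED_SPACE)),
          repr(preview(filename, OPTIONAL_SPACE)))
# 'Me Wife IMG_1099.JPG' 'Me Wife' 'Me Wife'
# 'IMG_1099.JPG' 'IMG_1099.JPG' ''
```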

For example, if your filenames can contain "bad words" like Pyramids or Egypt without parentheses, those should never go into $PersonInImage.
You could build a 'bad-words list' and then test $PersonInImage like...
exiftool -p '$Filename  ------- ${Filename; s| *\(.*?\) *| |g; s/ *(Pyramids|Egypt|MoreBadWords) */ /g; s/ *\S*\.[^.]*$//; s/(^ *| *$)//}'  DIR

It might be better to build a "good words" list instead, but really I'm just trying to show how I usually troubleshoot.
I'm usually not smart enough to anticipate the problems ahead of time, but the -p option reveals them for me.
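A possible Python sketch of the bad-words idea, following the same four substitutions in order. `BAD_WORDS` and `persons_without_bad_words` are hypothetical names, and the word list is only an example.

```python
import re

BAD_WORDS = ["Pyramids", "Egypt"]  # hypothetical list; extend as needed

def persons_without_bad_words(filename):
    # s| *\(.*?\) *| |g : remove bracketed words.
    name = re.sub(r" *\(.*?\) *", " ", filename)
    # s/ *(Pyramids|Egypt|...) */ /g : remove unbracketed bad words.
    bad = "|".join(re.escape(word) for word in BAD_WORDS)
    name = re.sub(rf" *({bad}) *", " ", name)
    # s/ *\S*\.[^.]*$// : drop the trailing "IMG_1099.JPG" style token.
    name = re.sub(r" *\S*\.[^.]*$", "", name, count=1)
    # s/(^ *| *$)// : no /g, so only the first match (leading spaces) is stripped.
    return re.sub(r"(^ *| *$)", "", name, count=1)

print(repr(persons_without_bad_words("Me Pyramids Wife IMG_1099.JPG")))  # 'Me Wife'
print(repr(persons_without_bad_words("Pyramids Me IMG_1099.JPG")))       # 'Me'
print(repr(persons_without_bad_words("Egypt IMG_1099.JPG")))             # ''
```

Note that the pattern has no word boundaries, so a listed word would also match inside a longer word such as "Egyptian".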