Regex with optional match does not work

Started by brightwolf, May 11, 2021, 04:38:40 PM

brightwolf

Hi Luuk, actually my solution was not really working performance-wise, so I am back at your solution. There's only one problem left: when there's no additional word in the filename, the filename itself gets copied to the PersonInImage tag.

For example:
Holidays/Egypt 1997/Me Wife IMG_1099.JPG --> subject=Holidays, Egypt, 1997 personinimage=Me, Wife
Holidays/Egypt 1997/Me Wife (Pyramids) IMG_1099.JPG --> subject=Holidays, Egypt, 1997, Pyramids personinimage=Me, Wife
Holidays/Egypt 1997/IMG_1099.JPG --> subject=Holidays, Egypt, 1997 personinimage=IMG_1099.JPG

When run from my iMac it works with the \s+ addition, but that's veeeeeeery slow since the files are on my NAS.
When run from the NAS it behaves the way I showed in the example.

How could I work around this problem?
[EDIT] Regex is driving me crazy. It *does* seem to work, also on my NAS. I must have made some other (unintended) change.
So this is the final, working command (credits and big thanks to luuk2005):
exiftool -r -m -overwrite_original -AddTagsFromFile @ -api listsplit='[ /]' -if '$Directory=~/Photo/' -Subject'<${Directory;s|.*Photo.*?/||}${Filename;s|^[^()]*$||;s|[^()]*?\((.+?)\)[^()]*| $1|g}' -PersonInImage'<${Filename;s| *\(.*?\) *| |g;s/\s*\S*\.[^.]*$//}' DIR

Also, I wanted to ask: where can I find more information about this regex formatting? I understand parts of it, but still cannot grasp the use of s| and |g; and //. No clue. Much appreciated!
Photographer. Hobbyist. Using iMac to manage photos, with Photo Mechanic Plus, Pixelmator, Luminar, and exiftool. Longing back to Aperture sometimes.

Luuk2005

Yes, I was going to say that your regex with \s* works perfectly to make the space optional.
So I'm guessing that PersonInImage=IMG_1099.JPG was left over from another experiment.
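
To see why the optional space matters, here is a small sketch in Python. It is only a stand-in: exiftool runs these expressions as Perl, and re.sub is used here because the patterns behave the same for these cases. The function name person_part is made up for the illustration.

```python
import re

REQUIRED_SPACE = r"\s+\S*\.[^.]*$"   # needs at least one space before the last word
OPTIONAL_SPACE = r"\s*\S*\.[^.]*$"   # the space before the last word may be missing

def person_part(filename, tail_pattern):
    # Replace "(...)" groups with one space, like s| *\(.*?\) *| |g
    without_parens = re.sub(r" *\(.*?\) *", " ", filename)
    # Strip the final "word.ext" piece, like s/\s*\S*\.[^.]*$//
    return re.sub(tail_pattern, "", without_parens, count=1)

print(repr(person_part("Me Wife IMG_1099.JPG", OPTIONAL_SPACE)))  # 'Me Wife'
print(repr(person_part("IMG_1099.JPG", REQUIRED_SPACE)))          # 'IMG_1099.JPG'
print(repr(person_part("IMG_1099.JPG", OPTIONAL_SPACE)))          # ''
```

With \s+ the bare filename has no space to match, so nothing is removed and the whole filename is left over. With \s* the match can start at the very beginning of the string.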

I don't know where the regex documentation for this is, but searching for 'perl' will probably give the best explanations.
For regex substitutions, these are some of the different formats that exiftool seems to accept...
  s(match)(replace)modifiers;
  s/match/replace/modifiers;
  s|match|replace|modifiers;   (the | can be swapped for most other punctuation characters!)

I learned most of my regex from experimenting with sed.exe, so I always prefer the s///; format.
But sometimes I use s|||; when I need to put / in my match, because it is easier on the eyes than using \/.

The 'g' modifier means "global", which here just means "all": it replaces every match instead of only the first one.
There is also 'i' for "case-insensitive" and 'r' for "non-destructive": the substitution returns the changed string and leaves $_ unmodified.
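
As a rough illustration of the modifiers, here is how they map onto Python's re.sub. This is only an analogy, not what exiftool runs: Python has no delimiters, so the / versus | choice does not arise there, and re.sub always returns a new string, which is similar in spirit to Perl's 'r'.

```python
import re

text = "egypt Egypt egypt"

first_only = re.sub(r"egypt", "X", text, count=1)            # like s/egypt/X/
every_match = re.sub(r"egypt", "X", text)                    # like s/egypt/X/g
any_case = re.sub(r"egypt", "X", text, flags=re.IGNORECASE)  # like s/egypt/X/gi

print(first_only)   # X Egypt egypt
print(every_match)  # X Egypt X
print(any_case)     # X X X
print(text)         # egypt Egypt egypt (the original is never modified)
```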
Windows8.1-64bit,  exiftool-v12.92(standalone),  sed-v4.0.7

brightwolf

Regarding this command and its regexes, I have one more question: how could I enhance it to ignore words between square brackets?

For example:
/Users/me/Photos/Holidays/Egypt 1997 [comment]/Me Wife (Pyramids) IMG_1099.JPG
Would yield Subject: Holidays, Egypt, 1997, Pyramids; and PersonInImage: Me, Wife; But not: comment

I have tried various versions of the command, including s|\[.*/]*| |; and s|^[^\[\]]*$| |; and s|\[.*\]| $1|g; but I cannot get it to work: the comment keeps on appearing as a keyword.

Any suggestions to point me in the right direction?

Luuk2005

Your first expression was good, except for the typo with / instead of \ right before the ], so otherwise it would probably work!
So if you don't want to match [words] in $Directory, I would make the second s///; like... s/ *\[.*?\] */ /g;
This replaces each of the [words] with just one space.

But if you have [word1] [word2] or a [lastword], you can add some final s///;'s like...  s/ +/ /g; s/ $//g;
These turn many spaces into one space, and remove the trailing space if $Directory ends with some [lastword].

So to preview the 'keywords' coming from $Directory, you could experiment like...
-p '$Directory --- ${Directory; s|.*Photo.*?/||; s/ *\[.*?\] */ /g; s|[/ ]+| |g; s/ $//g}'
(The third s||| also replaces "/" with a space, to make it easier to read)
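
The same four steps can be tried outside exiftool with a small Python sketch. It assumes the patterns behave the same in Python's re as in Perl for these inputs; the function name directory_keywords is made up, and the final split on a space stands in for listsplit=' '.

```python
import re

def directory_keywords(directory):
    # 1. Drop everything up to and including the "Photo..." folder
    s = re.sub(r".*Photo.*?/", "", directory, count=1)
    # 2. Replace every [bracketed] part with one space
    s = re.sub(r" *\[.*?\] *", " ", s)
    # 3. Turn runs of "/" and spaces into a single space
    s = re.sub(r"[/ ]+", " ", s)
    # 4. Remove a trailing space left behind by a final [word]
    s = re.sub(r" $", "", s)
    return s.split(" ") if s else []

print(directory_keywords("/Users/me/Photos/Holidays/Egypt 1997 [comment]"))
# ['Holidays', 'Egypt', '1997']
```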

So if that looks ok, then a command like...
exiftool -r -m -overwrite_original -AddTagsFromFile @ -api listsplit=' ' -if '$Directory=~/Photo/' 
-Subject'<${Directory;s|.*Photo.*?/||;tr|/| |;s/ *\[.*?\] */ /g;s/ +/ /g;s/ $//}${Filename;s|^[^()]*$||;s|[^()]*?\((.+?)\)[^()]*| $1|g}' 
-PersonInImage'<${Filename;s| *\(.*?\) *| |g;s/ *\S*\.[^.]*$//}' DIR


Photos/Holidays/[xxx]Egypt[xxx]1997 [xx1] [xx2]/Deep/Deepest/abc.jpg
        Subject: Holidays, Egypt, 1997, Deep, Deepest
  PersonInImage:

Photos/Holidays/[xxx]Egypt[xxx]1997 [xx1] [xx2]/Me (eating) Wife (reading) Joe (Pyramids) IMG_1099.jpg
        Subject: Holidays, Egypt, 1997, eating, reading, Pyramids
  PersonInImage: Me, Wife, Joe

Photos/Holidays/[xxx]Egypt[xxx]1997 [xx1] [xx2]/Me Wife (Pyramids) IMG_1099.jpg
        Subject: Holidays, Egypt, 1997, Pyramids
  PersonInImage: Me, Wife

Photos/Holidays/[xxx]Egypt[xxx]1997 [xx1] [xx2]/Pyramids Me Wife IMG_1099.jpg
        Subject: Holidays, Egypt, 1997
  PersonInImage: Pyramids, Me, Wife
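
The filename half of these results can be reproduced with a Python sketch of the two ${Filename;...} expressions. This is only an approximation for experimenting, assuming Python's re handles these patterns like Perl does; the function names are made up, and split() stands in for listsplit.

```python
import re

def subject_words_from_filename(filename):
    # No parentheses at all: contribute nothing, like s|^[^()]*$||
    s = re.sub(r"^[^()]*$", "", filename)
    # Keep only the text inside each (...), like s|[^()]*?\((.+?)\)[^()]*| $1|g
    s = re.sub(r"[^()]*?\((.+?)\)[^()]*", r" \1", s)
    return s.split()

def person_words_from_filename(filename):
    # Replace each (...) with a space, like s| *\(.*?\) *| |g
    s = re.sub(r" *\(.*?\) *", " ", filename)
    # Drop the last word together with the extension, like s/ *\S*\.[^.]*$//
    s = re.sub(r" *\S*\.[^.]*$", "", s, count=1)
    return s.split()

name = "Me (eating) Wife (reading) Joe (Pyramids) IMG_1099.jpg"
print(subject_words_from_filename(name))  # ['eating', 'reading', 'Pyramids']
print(person_words_from_filename(name))   # ['Me', 'Wife', 'Joe']
```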

Luuk2005

I forgot to include an s/// for $PersonInImage to remove 'BadWords' from the filename (like 'Pyramids' in the last example).
The first post has "Pyramids Me Wife IMG_1099.JPG", so I'm guessing that $PersonInImage should never match Pyramids?

So if needed, this is one way to remove 'BadWords' from the filename for $PersonInImage...
exiftool -r -m -overwrite_original -AddTagsFromFile @ -api listsplit=' ' -if '$Directory=~/Photo/' 
-Subject'<${Directory;s|.*Photo.*?/||;tr|/| |;s/ *\[.*?\] */ /g;s/ +/ /g;s/ $//}${Filename;s|^[^()]*$||;s|[^()]*?\((.+?)\)[^()]*| $1|g}' 
-PersonInImage'<${Filename;s| *\(.*?\) *| |g;s/\b(Pyramids|Egypt|BadWords)\b//g;s/ +/ /g;s/(^ | $)//g;s/ *\S*\.[^.]*$//}'  DIR

So then it works like....
Photos/Holidays/[xxx]Egypt[xxx]1997 [xx1] [xx2]/Deep/Deeper/Deepest/aaaa.jpg
        Subject: Holidays, Egypt, 1997, Deep, Deeper, Deepest
  PersonInImage:

Photos/Holidays/[xxx]Egypt[xxx]1997 [xx1] [xx2]/Me (eating) Wife (reading) Joe (Pyramids) IMG_1099.jpg
        Subject: Holidays, Egypt, 1997, eating, reading, Pyramids
  PersonInImage: Me, Wife, Joe

Photos/Holidays/[xxx]Egypt[xxx]1997 [xx1] [xx2]/Me Wife (Pyramids) IMG_1099.jpg
        Subject: Holidays, Egypt, 1997, Pyramids
  PersonInImage: Me, Wife

Photos/Holidays/[xxx]Egypt[xxx]1997 [xx1] [xx2]/Pyramids Me Wife IMG_1099.jpg
        Subject: Holidays, Egypt, 1997
  PersonInImage: Me, Wife
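
The 'BadWords' step on its own can be sketched in Python like this. The word list and the name drop_bad_words are only examples, and the pattern is built from a list instead of being typed by hand; it is not the exact expression exiftool runs.

```python
import re

BAD_WORDS = ["Pyramids", "Egypt"]  # example list, adjust to taste

def drop_bad_words(text, bad_words=BAD_WORDS):
    # \b keeps the match to whole words, like s/\b(Pyramids|Egypt)\b//g
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in bad_words) + r")\b"
    s = re.sub(pattern, "", text)
    # Collapse the doubled spaces left behind, like s/ +/ /g
    s = re.sub(r" +", " ", s)
    # Trim a leading or trailing space, like s/(^ | $)//g
    return re.sub(r"(^ | $)", "", s)

print(drop_bad_words("Pyramids Me Wife"))  # Me Wife
```

Because of the \b anchors, a longer word that merely contains a bad word (for example "Pyramidsx") is left alone.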

There might be much better ways to do this using other Perl commands, but all I really know is s/// and tr///.
I know there are Perl functions like split(), but I'm not good enough to depend on them, so I always use s/// instead.
This uses a whole lot of s///; so I'm thinking there might be better ways, especially for readability.

brightwolf

Thanks very much, Luuk2005! Your suggestion works like a charm.

My final command is now:
exiftool -r -m -overwrite_original -AddTagsFromFile @ -api listsplit='[ /]' -if '$Directory=~/Photos/' -Subject'<${Directory;s|.*Photos.*?/||;tr|/| |;s/ *\[.*?\] */ /g;s/ +/ /g;s/ $//}${Filename;s|^[^()]*$||;s|[^()]*?\((.+?)\)[^()]*| $1|g}' -PersonInImage'<${Filename;s| *\(.*?\) *| |g;s/ *\[.*?\] */ /g;s/\s*\S*\.[^.]*$//}' DIR

Luuk2005

Nice work! I didn't even realize that you needed to remove [words] along with (words) inside the filename for $PersonInImage.
And my last -PersonInImage could never remove a trailing space with $, because I forgot about the file extension!
So really, with [words] also removed for $PersonInImage, the -PersonInImage should change like...

exiftool -r -m -overwrite_original -AddTagsFromFile @ -api listsplit=' ' -if '$Directory=~/Photo/'
-Subject'<${Directory;s|.*Photo.*?/||;tr|/| |;s/ *\[.*?\] */ /g;s/ +/ /g;s/ $//}${Filename;s|^[^()]*$||;s|[^()]*?\((.+?)\)[^()]*| $1|g}'
-PersonInImage'<${Filename;s/\.[^.]*$//;s| *\(.*?\) *| |g;s/ *\[.*?\] */ /g;s/\b(Pyramids|Egypt|BadWords)\b//g;s/ +/ /g;s/(^ | $)//g;s/ *\S*$//}' DIR

The first s/// removes the extension, which lets $ work properly, and the last s/// no longer has to worry about the extension.
The s/ +/ /g; and s/(^ | $)//g; substitutions fix any trouble coming from "[word1] [word2]" or [words] at the beginning or end of a filename.
So for example, with a filename like "IMG_1099(xxx).JPG", nothing can end up as a 'keyword' for $PersonInImage.
Also I forgot to describe that tr|/| |; converts all "/" --> space, so then listsplit only needs the space.
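
For anyone who wants to test the order of the steps without touching real files, here is a Python sketch of that last -PersonInImage chain. It assumes Python's re treats these patterns like Perl does; person_names is a made-up name and the bad-word list is the placeholder one from the command.

```python
import re

def person_names(filename):
    s = re.sub(r"\.[^.]*$", "", filename, count=1)       # remove the extension first
    s = re.sub(r" *\(.*?\) *", " ", s)                   # (words) -> one space
    s = re.sub(r" *\[.*?\] *", " ", s)                   # [words] -> one space
    s = re.sub(r"\b(Pyramids|Egypt|BadWords)\b", "", s)  # placeholder bad words
    s = re.sub(r" +", " ", s)                            # many spaces -> one
    s = re.sub(r"(^ | $)", "", s)                        # trim both ends
    s = re.sub(r" *\S*$", "", s, count=1)                # drop the last word (IMG_1099)
    return s.split()

print(person_names("[draft] Me Wife (Pyramids) IMG_1099.jpg"))  # ['Me', 'Wife']
print(person_names("IMG_1099(xxx).JPG"))                        # []
```

Since the extension is gone before the trimming step, the trailing-space cleanup can actually match, and the final step only has to remove the last remaining word.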