Unicode matching for .mp4 'Keyword' tag values in a 'Title Case' substitution

Started by John_Smith, December 10, 2022, 09:52:28 PM


John_Smith

I am trying to do a simple 'Title Case' substitution of values in the 'Keyword' tag of an .mp4 file:

exiftool -v -P in.mp4 "-Keyword<${Keyword; s/(\w+)/\u\L$1/ug}"

It works fine for ASCII/English words, but not for Unicode characters such as ěščřžýáíéúů. These are not matched by \w, so the next ASCII character following them in a word is capitalized when it should not be.

It seems that ExifTool does not honor the /u modifier for Unicode rules when pattern matching (as defined here: https://perldoc.perl.org/perlre#/u).

So my question is:
Is it possible to make ExifTool honor the /u modifier?
If not, is there some other way to do the desired 'Title Case' substitution of words containing non-English characters?

(I am on Windows 7, default cmd/powershell code page cp852, ExifTool 12.52)

Thank you

StarGeek

This will probably have to wait until Phil gets back, as he's currently away until sometime next week.

Phil Harvey

From the Notes section of the ImageInfo documentation:

ExifTool returns all values as byte strings of encoded characters. Perl wide characters are not used.

Try this to use Perl wide characters:

exiftool -v -P in.mp4 "-Keyword<${Keyword; use Encode; $_=decode('utf8',$_); s/(\w+)/\u\L$1/ug; $_=encode('utf8',$_)}"

Note that it isn't strictly necessary to convert back to a byte string afterwards as I have done (since ExifTool should do this), but it is better safe than sorry.
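
For anyone who wants to see the effect outside ExifTool, here is a minimal standalone Perl sketch of the same decode, substitute, encode sequence. It is only an illustration: title_case_utf8 is a hypothetical helper name (not part of ExifTool), the sample string is made up, and it assumes the value arrives as UTF-8 bytes, as the documentation note quoted above describes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of ExifTool): title-case a UTF-8 byte string.
sub title_case_utf8 {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);   # bytes -> Perl wide characters
    $chars =~ s/(\w+)/\u\L$1/g;            # \w and case mapping now see whole characters
    return encode('UTF-8', $chars);        # wide characters -> bytes again
}

# "žlutý kůň" written out as raw UTF-8 bytes (an assumed sample value).
my $bytes = "\xc5\xbelut\xc3\xbd k\xc5\xaf\xc5\x88";

# Same substitution applied directly to the bytes: \w only matches the
# ASCII letters, so the words are split at every accented character.
(my $naive = $bytes) =~ s/(\w+)/\u\L$1/g;

print "on bytes:   $naive\n";                      # bytes of "žLutý Kůň"
print "on decoded: ", title_case_utf8($bytes), "\n";  # bytes of "Žlutý Kůň"
```

The first line of output reproduces the symptom from the original post (a stray capital after an accented letter), and the second shows the result once the value has been decoded before matching.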

- Phil