Unicode matching for 'Title Case' substitution of .mp4 'Keyword' tag values

Started by John_Smith, December 10, 2022, 09:52:28 PM


John_Smith

I am trying to do a simple 'Title Case' substitution of values in the 'Keyword' tag of an .mp4 file:

exiftool -v -P in.mp4 "-Keyword<${Keyword; s/(\w+)/\u\L$1/ug}"
It works fine for ASCII/English letters and words,
but not for Unicode special characters such as ěščřžýáíéúů. (These are not matched by \w, so the next ASCII character following them in a word is capitalized when it should not be.)

It seems that ExifTool does not honor the /u modifier for Unicode rules when pattern matching (as defined here: https://perldoc.perl.org/perlre#/u).

So my question is:
Is it possible to make ExifTool honor the /u modifier?
And if not, is there some other way to do the desired 'Title Case' substitution of words containing non-English special characters?

(I am on Windows 7, default cmd/powershell code page cp852, ExifTool 12.52)

Thank you

StarGeek

This will probably have to wait until Phil gets back, as he's currently away until sometime next week.

Phil Harvey

From the Notes section of the ImageInfo documentation:

ExifTool returns all values as byte strings of encoded characters. Perl wide characters are not used.

Try this to use Perl wide characters:

exiftool -v -P in.mp4 "-Keyword<${Keyword; use Encode; $_=decode('utf8',$_); s/(\w+)/\u\L$1/ug; $_=encode('utf8',$_)}"
Note that it isn't strictly necessary to convert back to a byte string afterwards as I have done (since ExifTool should do this), but it is better safe than sorry.
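
To illustrate why decoding matters, here is a minimal sketch of the same effect in Python rather than Perl. It is only an analogy, not ExifTool's implementation: the function names and the sample string are made up for illustration. A regex run over raw UTF-8 bytes sees only ASCII word characters, so it reproduces the symptom from the original post, while decoding first gives the expected result.

```python
import re

def title_case_bytes(raw: bytes) -> bytes:
    # Hypothetical helper: operates on the undecoded UTF-8 bytes.
    # In a bytes pattern, \w matches only ASCII [A-Za-z0-9_], so the bytes
    # of an accented letter split a word into separate ASCII runs.
    return re.sub(rb"\w+", lambda m: m.group(0).capitalize(), raw)

def title_case_text(raw: bytes) -> bytes:
    # Hypothetical helper: decode, substitute on characters, re-encode.
    text = raw.decode("utf-8")
    fixed = re.sub(r"\w+", lambda m: m.group(0).capitalize(), text)
    return fixed.encode("utf-8")

# Sample keyword value as it would arrive in byte form (made-up example).
sample = "žlutý kůň".encode("utf-8")

wrong = title_case_bytes(sample).decode("utf-8")  # "žLutý Kůň"
right = title_case_text(sample).decode("utf-8")   # "Žlutý Kůň"
```

In the byte version the "l" after "ž" is capitalized because the match starts at the first ASCII letter, which is the behaviour described in the question. The decode, substitute, encode sequence mirrors the structure of the ExifTool command above.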

- Phil