Robust Invalid Character Cleaner

Started by Stephen Marsh, August 02, 2018, 11:55:51 PM


Stephen Marsh

I am trying to build a robust, "bulletproof", cross-platform tag cleaner for use in situations such as:

exiftool '-filename<${pseudotag1}_TextString_${pseudotag2}.%e' -r 'path to top level folder or file'


I am aware of ${pseudotag1;}, however that does not go far enough. I only need alphanumeric characters, hyphens, and underscores to be retained; everything else can go.

The source tag may contain:

!@#$%^&*(){}[]'":;<>?~`|\/=+-,.

I have tried a number of regular expressions, but some of them caused problems: a new directory was created under my user account and the files were moved there.

The best that I have come up with is:

exiftool '-filename<${pseudotag1;s/[^A-z\s\d][\\\^]?//g}_TextString_${pseudotag2;s/[^A-z\s\d][\\\^]?//g}.%e' -r 'path to top level folder or file'


Which results in a filename of:

PseudoTag1[]`_TextString_PseudoTag2[]`.tiff

I can't figure out how to remove the unwanted []` in a single regex command.
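
A likely reason the []` characters survive: in ASCII, the range A-z covers not only the letters but also the six characters that sit between Z and a ([ \ ] ^ _ `), so the negated class [^A-z\s\d] never matches them. The Python sketch below mimics the substitution with Python's re module. It is an analogue for illustration only, not ExifTool's own Perl engine, and the sample string and variable names are hypothetical. It compares the pattern from the post with an explicit whitelist class:

```python
import re

# Hypothetical tag value: a name followed by the punctuation listed in the post.
sample = "PseudoTag1" + "!@#$%^&*(){}[]'\":;<>?~`|\\/=+-,."

# Same pattern as in the post. A-z spans ASCII 65..122, which includes [ \ ] ^ _ `
loose = re.sub(r"[^A-z\s\d][\\\^]?", "", sample)

# Explicit whitelist: letters, digits, underscore, hyphen (hyphen last, so it is literal).
strict = re.sub(r"[^A-Za-z0-9_-]", "", sample)

print(loose)   # PseudoTag1[]`
print(strict)  # PseudoTag1-
```

In this sample the \ and ^ disappear from the first result only because the optional [\\\^]? consumes them when they directly follow a removed character.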

Stephen Marsh

OK, some searching and experimentation got me this:

${pseudotag;s/\W+//g}

I am happy to keep underscores and alphanumeric characters.

Phil Harvey

Hi Stephen,

I would have suggested this:

tr/-_0-9a-zA-Z//dc

What you have done will give similar results, except that it isn't as efficient and hyphens will also be removed.

- Phil
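
The difference between the two approaches can be sketched in Python. This is an analogue for illustration only: ExifTool evaluates these expressions in Perl, and the function names and sample value here are hypothetical. The first function drops every non-word character, hyphens included, as s/\W+//g does; the second mimics tr/-_0-9a-zA-Z//dc by deleting every character outside the listed set:

```python
import re
import string

# Characters kept by tr/-_0-9a-zA-Z//dc: hyphen, underscore, digits, ASCII letters.
ALLOWED = set("-_" + string.digits + string.ascii_letters)

def clean_with_regex(value: str) -> str:
    """Analogue of s/\\W+//g, restricted to ASCII word characters."""
    return re.sub(r"\W+", "", value, flags=re.ASCII)

def clean_with_tr(value: str) -> str:
    """Analogue of tr/-_0-9a-zA-Z//dc: delete the complement of the allowed set."""
    return "".join(ch for ch in value if ch in ALLOWED)

sample = "IMG-001_final (copy)!.v2"  # hypothetical tag value
print(clean_with_regex(sample))  # IMG001_finalcopyv2
print(clean_with_tr(sample))     # IMG-001_finalcopyv2
```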

Stephen Marsh

Thank you Phil, of course your suggestion is much better!

I had a few false starts, however I finally got the syntax right:

${pseudotag;tr/-_0-9a-zA-Z//dc}

Stephen Marsh

For future reference, a related thread with very useful information on this topic can be found here:

https://exiftool.org/forum/index.php/topic,9857.0.html