How to include multiple regular expressions in a ValueConv expression?

Started by CWCorrea, June 20, 2012, 03:48:37 PM


CWCorrea

Hello Phil,

First, thank you very much for such a fine application as ExifTool. I can no longer imagine my life without it. Second, please bear with me as I'm totally new to Perl (I actually started learning about regular expressions and Perl just because of ExifTool), and third, please excuse the long post.

Now, this is what I would like to do:

I have several thousand pictures taken with different cameras and I would like to sort them by camera make and model. I reviewed the ExifTool documentation and several examples in the ExifTool Forum and I understand how to do it. Because the Model tag in several pictures contains characters that are not valid for my filesystem (btw, I use OS X Snow Leopard), I decided to create a user-defined tag called MyModel with a regex that deletes any character that is not valid for me (I used an example that you gave us in the forum).

%Image::ExifTool::UserDefined = (
    'Image::ExifTool::Composite' => {
        MyModel => {
            Require => 'Model',
            ValueConv => '$val =~ s/[^A-Za-z0-9\-\_\.\,\(\)\ ]//g; $val',
        },
    },
);
1;  #end


Using the MyModel tag works well, but I don't like the end result so I decided to find a way to process the Model tag contents to suit my taste and needs. For this, I created a list of unique models with the following command:

exiftool -s -r -T -Model . | sort -u > models.txt

Then I wrote a small Perl script to apply some regular expressions to the list of camera models:

#!/usr/bin/perl
#
# Cleans camera model names read from the files given on the command line
# (one model name per line) and prints the cleaned names
#

use strict;
use warnings;

foreach my $InFile (@ARGV) {
    open(my $fh, '<', $InFile) or die "Cannot open $InFile: $!";
    while (my $model = <$fh>) {
        chomp($model);
        #
        # Remove non-printable characters
        #
        $model =~ s/[^[:print:]]+//g;
        #
        # Remove excess horizontal and vertical whitespace
        # e.g.: "DC200      (V01.00)" gets transformed into "DC200 (V01.00)"
        #
        $model =~ s/[\h\v]+/ /g;
        #
        # Remove whitespace from the start and end of the string
        # e.g.: " PDC 5350" gets transformed into "PDC 5350"
        #
        $model =~ s/^\s+//;
        $model =~ s/\s+$//;
        #
        # Replace slash and underscore characters with hyphen-minus
        #
        $model =~ tr{/_}{--};
        #
        # Remove characters other than [A-Z],[a-z],[0-9],'-','.',',','(',')' and space
        # e.g.: "PENTAX *ist DL" gets transformed into "PENTAX ist DL"
        #
        $model =~ s/[^A-Za-z0-9\-\.\,\(\)\ ]//g;
        #
        # If the $model variable is empty after these transformations, give it the value "Unknown model"
        #
        if ($model eq '') {
            $model = 'Unknown model';
        }
        #
        # For some Hewlett Packard cameras, clean strange characters after (Vdd.dd)
        # e.g.: "HP Photosmart M22 (V01.00) +ëÕKÄ" gets transformed into "HP Photosmart M22 (V01.00)"
        #
        $model =~ s{(\(V\d\d\.\d\d\)).*}{$1};

        print "$model\n";
    }
    # Close each input file before moving on to the next one
    close($fh) or die $!;
}


Finally I got the results I was looking for, but now I have seven regular expressions and one conditional statement to work with.

My question is: how can I include multiple regular expressions and the conditional in a ValueConv expression in the MyModel user defined tag?

I guess that the solution is to use a code reference like ValueConv => sub { }, just like the one used in the BigImage tag example in the sample .ExifTool_config file. Unfortunately, I still have not found a way to correctly include my Perl code in the code reference for ValueConv.

I would appreciate any suggestion on this. Thank you!

Kind regards,

Christian W. Correa

Phil Harvey

Hi Christian,

Looking good so far.  I admit the user-defined tag documentation isn't very well organized.

The easiest way is like this:

ValueConv => q{
    my $model = $val[0];
    # place arbitrary regular expressions and other Perl code here
    ...
},


Using a code reference is also possible:

ValueConv => sub {
    my $val = shift;
    my $model = $$val[0];
    # place your code here
    ...
},


however, with the 2nd method the formatted values are not immediately accessible. (Although this shouldn't matter for you because there is no print conversion for the Model tag.)
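
For anyone who prefers the code-reference form, here is a minimal sketch of the second method filled in with the regular expressions from the first post. It assumes, as in the snippet above, that the sub receives a reference to the list of values of the Require'd tags as its first argument. It has not been run against any particular ExifTool version, so treat it as a starting point rather than a tested config.

```perl
%Image::ExifTool::UserDefined = (
    'Image::ExifTool::Composite' => {
        MyModel => {
            Require => 'Model',
            # Code-reference form: the first argument is taken to be a
            # reference to the list of Require'd tag values (only Model here)
            ValueConv => sub {
                my $val = shift;
                my $model = $$val[0];
                $model =~ s/[^[:print:]]+//g;              # non-printable characters
                $model =~ s/[\h\v]+/ /g;                   # runs of whitespace -> one space
                $model =~ s/^\s+//;                        # leading whitespace
                $model =~ s/\s+$//;                        # trailing whitespace
                $model =~ tr{/_}{--};                      # slash, underscore -> hyphen
                $model =~ s/[^A-Za-z0-9\-\.\,\(\)\ ]//g;   # anything outside the allowed set
                $model = 'Unknown model' if $model eq '';
                $model =~ s{(\(V\d\d\.\d\d\)).*}{$1};      # drop text after "(Vdd.dd)"
                return $model;
            },
        },
    },
);
1;  #end
```

Unlike the string form, the sub must return the value explicitly (or leave it as the last evaluated expression), which is why the sketch ends with return $model.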

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

CWCorrea

Hello Phil,

Thank you for your prompt answer. I feel like I've discovered a gem!

Here is my implementation of the MyModel tag using your first method. It works really well, and maybe others will find it useful. I included my comments so newbies like me can understand what each regular expression does:

%Image::ExifTool::UserDefined = (
    'Image::ExifTool::Composite' => {
        #
        # Cleans Model text
        #
        MyModel => {
            Require => 'Model',
            ValueConv => q{
                my $model = $val[0];
                #
                # Remove non-printable characters
                #
                $model =~ s/[^[:print:]]+//g;
                #
                # Remove excess horizontal and vertical whitespace
                # e.g.: "DC200      (V01.00)" gets transformed into "DC200 (V01.00)"
                #
                $model =~ s/[\h\v]+/ /g;
                #
                # Remove whitespace from the start and end of the string
                # e.g.: " PDC 5350" gets transformed into "PDC 5350"
                #
                $model =~ s/^\s+//;
                $model =~ s/\s+$//;
                #
                # Replace slash and underscore characters with hyphen-minus
                #
                $model =~ tr{/_}{--};
                #
                # Removes characters different from [A-Z],[a-z],[0-9],'-','.',',','(',')' and space
                # e.g.: "PENTAX *ist DL" gets transformed into "PENTAX ist DL"
                #
                $model =~ s/[^A-Za-z0-9\-\.\,\(\)\ ]//g;
                #
                # If after transformation the $model variable is empty give it the value "Unknown model"
                #
                if ($model eq '') {
                    $model = 'Unknown model';
                }
                #
                # For some Hewlett Packard cameras, cleans strange characters after (Vdd.dd)
                # e.g.: "HP Photosmart M22 (V01.00) +ëÕKÄ" gets transformed into "HP Photosmart M22 (V01.00)"
                #
                $model =~ s{(\(V\d\d\.\d\d\)).*}{$1};
                return $model;
            },
        },
    },
);

1;  #end
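
A quick way to check a chain like this before relying on it in the config file is to wrap the same steps in a small stand-alone script and compare the output with what you expect. The sketch below does that. The name clean_model is made up for this check, the first three test pairs come from the comments in the config, and the remaining inputs are invented for illustration (plain ASCII junk stands in for the strange HP characters).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same cleaning chain as the ValueConv expression, wrapped in a sub
# (clean_model is a hypothetical name used only for this check)
sub clean_model {
    my ($model) = @_;
    $model =~ s/[^[:print:]]+//g;              # non-printable characters
    $model =~ s/[\h\v]+/ /g;                   # runs of whitespace -> one space
    $model =~ s/^\s+//;                        # leading whitespace
    $model =~ s/\s+$//;                        # trailing whitespace
    $model =~ tr{/_}{--};                      # slash, underscore -> hyphen
    $model =~ s/[^A-Za-z0-9\-\.\,\(\)\ ]//g;   # anything outside the allowed set
    $model = 'Unknown model' if $model eq '';
    $model =~ s{(\(V\d\d\.\d\d\)).*}{$1};      # drop text after "(Vdd.dd)"
    return $model;
}

# Each entry is [ input, expected output ]
my @cases = (
    [ 'DC200      (V01.00)',             'DC200 (V01.00)' ],
    [ ' PDC 5350',                       'PDC 5350' ],
    [ 'PENTAX *ist DL',                  'PENTAX ist DL' ],
    [ 'HP Photosmart M22 (V01.00) +#K@', 'HP Photosmart M22 (V01.00)' ],
    [ 'A/B_C',                           'A-B-C' ],
    [ '***',                             'Unknown model' ],
);

my $failed = 0;
for my $case (@cases) {
    my ($in, $want) = @$case;
    my $got = clean_model($in);
    my $ok  = $got eq $want;
    $failed++ unless $ok;
    printf "%-4s '%s' -> '%s'\n", ($ok ? 'ok' : 'FAIL'), $in, $got;
}
print $failed ? "$failed case(s) failed\n" : "all cases passed\n";
```

Model names from your own models.txt can be added to the list in the same way, so any regex you change later can be rechecked against the names that matter to you.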


On a related topic, I think it would be interesting to have a section in the forum where users can share and discuss code snippets and recipes to do interesting things with ExifTool. That would be a great way to empower users so we can learn by example and help increase the knowledge of the ExifTool user community.


Christian W.

Phil Harvey

Hi Christian,

Glad that worked.  Thanks for posting your config file.

We tried having a "Solutions" board in this forum, which is close to what you suggested, but nobody ever posted there.  What you suggest is maybe better suited to a Wiki, but organizing the Wiki in a useful way would take some effort.

- Phil