Using regex with Exiftool

Started by philbond87, July 22, 2020, 10:05:24 PM


philbond87

I don't think this is possible with Exiftool but I want to be sure – Exiftool is so comprehensive I don't want to assume anything.

What I want to do is take a nested folder structure of image files, create a string from parts of each filename (using regex), and write that string to a tag.
I know that Exiftool will crawl recursively, but I would be surprised if it could create a unique string for each file based on the filename and then let me apply it to a tag of that file.

The reason I ask is that I have already written an application that does this – digs through the directories, creates the unique string then makes a call to Exiftool to apply that to the tag. With a great number of large files this can take quite a long time. If it could all be done within a single Exiftool command it would be significantly faster.

Thanks!

StarGeek

You're a bit light on details, but I don't see anything there that exiftool can't do.

As an example, assume we have a filename that has descriptive text at the beginning and then the date, but reversed (Day Month Year).  So, for a filename of Uncle John's third wedding-22-07-2020.jpg, you could use this (assuming Windows):
"-Keywords+<${Filename;m/(.*)-(\d\d)-(\d\d)-(\d{4})/;$_=\"$4-$3-$2 at $1\"}"

Example:
C:\>exiftool -g1 -a -s -keywords "Y:\!temp\cccc\Uncle John's third wedding-22-07-2020.jpg"
---- IPTC ----
Keywords                        : Original Keyword

C:\>exiftool -P -overwrite_original "-Keywords+<${Filename;m/(.*)-(\d\d)-(\d\d)-(\d{4})/;$_=\"$4-$3-$2 at $1\"}" "Y:\!temp\cccc\Uncle John's third wedding-22-07-2020.jpg"
    1 image files updated

C:\>exiftool -g1 -a -s -keywords "Y:\!temp\cccc\Uncle John's third wedding-22-07-2020.jpg"
---- IPTC ----
Keywords                        : Original Keyword, 2020-07-22 at Uncle John's third wedding


Add the -r (-recurse) option and let it run.  Though I'm not sure what the results would be if there wasn't a match in this case.
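
For anyone who wants to check the capture groups outside exiftool, here is a minimal Python sketch of the same pattern. It is only an illustration: the function name keyword_from_filename is made up for this example, and returning None for a non-matching name is a choice made for the sketch. It says nothing about what exiftool itself does when there is no match.

```python
import re

# Same pattern as the exiftool expression: description, then DD-MM-YYYY.
PATTERN = re.compile(r"(.*)-(\d\d)-(\d\d)-(\d{4})")


def keyword_from_filename(filename):
    """Return 'YYYY-MM-DD at description', or None if the name does not match.

    Hypothetical helper for illustration; not part of exiftool.
    """
    match = PATTERN.search(filename)
    if match is None:
        # Assumption of this sketch: the caller decides how to handle
        # filenames that do not contain a DD-MM-YYYY date.
        return None
    description, day, month, year = match.groups()
    # Mirrors $_="$4-$3-$2 at $1" from the exiftool command.
    return f"{year}-{month}-{day} at {description}"


print(keyword_from_filename("Uncle John's third wedding-22-07-2020.jpg"))
print(keyword_from_filename("no-date-here.jpg"))
```

The first call reproduces the keyword written in the example above; the second shows a name with no date falling through to None.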
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

philbond87

Thank you for that, @StarGeek.
How would that differ for MacOS?
I've tried to apply it to my specific case but I'm not understanding the use of multiple variables that correspond to the four different capture groups.

I'm trying to capture six digits, two or more upper case letters between underscores then another six digit string preceded by an underscore. What I've come up with is:

'-comment<${filename;m/(d{6})(\_[A-Z]{2,}_)(\d{6})/;$_=$1$2$3}' FILE

Each of those alone does find the strings I'm searching for in the filenames; however, my syntax is clearly not right when I combine them.

Thanks

Phil Harvey

Can you give an example file name and what you want in the comment?

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

philbond87

Hi Phil,

A filename might be:

123456_some-arbitrary-words_ABC_987654.jpg

What I would like to end up with in the -comment, from that filename, is:

123456ABC987654

(I realize that, the way I built the regex expressions, the captures include underscores; I was thinking I could strip them out with tr/_//d at the end.)

Phil Harvey

The problem is that your regex doesn't allow for the arbitrary words.  Try this:

'-comment<${filename;m/(d{6}).*_([A-Z]{2,})_.*(\d{6})/;$_=$1$2$3}'

I have also moved the underscores out of the capture for $2.

- Phil

Phil Harvey

Ooops.  There were a couple of other problems in your expression.  This should work:

'-comment<${filename;m/(\d{6}).*_([A-Z]{2,})_.*(\d{6})/;$_="$1$2$3"}'
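
The effect of the two fixes can be checked with a small Python sketch. This is an illustration only, on the assumption that Python's re module treats this particular pattern the same way Perl does; build_id is a hypothetical name, not anything from exiftool.

```python
import re

# Corrected pattern: six digits, anything, _UPPERCASE_, anything, six digits.
FIXED = r"(\d{6}).*_([A-Z]{2,})_.*(\d{6})"
# Earlier attempt with the backslash missing: "d{6}" means six literal d's.
MISSING_BACKSLASH = r"(d{6}).*_([A-Z]{2,})_.*(\d{6})"


def build_id(pattern, filename):
    """Join the three captures, or return None when the pattern does not match.

    Hypothetical helper; equivalent in spirit to $_="$1$2$3".
    """
    match = re.search(pattern, filename)
    return "".join(match.groups()) if match else None


name = "123456_some-arbitrary-words_ABC_987654.jpg"
print(build_id(FIXED, name))
print(build_id(MISSING_BACKSLASH, name))
```

With the sample filename, the corrected pattern gives 123456ABC987654, while the version without the backslash finds nothing because the name contains no run of six letter d's.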

philbond87

Thanks Phil.

I see that you've added double quotes around the $ arguments. Was there something else I was missing?

However with the file:
123456_some-arbitrary-words_JN_987654.HEIC

using the code:
'-comment<${filename;m/(\d{6}).*_([A-Z]{2,})_.*(\d{6})/;$_="$1$2$3"}'

I'm getting:
0 image files updated
1 image files unchanged

StarGeek

The Comment tag is a JPG only tag.  It doesn't exist in an HEIC.

I suspect this might be a case where you need to figure out the exact tag you want to write, as many programs give different names for the actual tag.  See FAQ #3.

philbond87

Understood, thanks.
I will choose another tag.

I just found another issue related to what I'm trying to do.
In the second sequence of six digits I want to allow for one (or more) of a few specific letters in front of the digits.

What I've come up with is (for the jpg example):

'-comment<${filename;m/(\d{6}).*_([A-Z]{2,})_.*([abcDEF]*\d{6})/;$_="$1$2$3"}'

however for the filename:

123456_some-arbitrary-words_XY_a987654.jpg

it's not pulling the 'a' out of the second sequence of six numbers, preceded by the 'a', and just returning

123456XY987654

StarGeek

Quote from: philbond87 on July 23, 2020, 12:43:25 PM
it's not pulling the 'a' out of the second sequence of six numbers, preceded by the 'a', and just returning

That's because .* is "Greedy" and you want it to be "Lazy".  Try adding a question mark:
_.*?([abcDEF]*\d{6})

Is there actually the possibility of arbitrary characters between the capital-letter group and the final group, or is it always just a single underscore?  Maybe this instead:
([A-Z]{2,})_([abcDEF]*\d{6})
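
The difference between the three variants can be seen in a short Python sketch. It is an illustration only, assuming Python's re module backtracks the same way as Perl for these patterns; join_groups is a hypothetical helper name.

```python
import re

NAME = "123456_some-arbitrary-words_XY_a987654.jpg"

# Greedy: the second .* swallows the optional letter before the digits.
GREEDY = r"(\d{6}).*_([A-Z]{2,})_.*([abcDEF]*\d{6})"
# Lazy: .*? takes as little as possible, so the letter stays in the capture.
LAZY = r"(\d{6}).*_([A-Z]{2,})_.*?([abcDEF]*\d{6})"
# No wildcard at all between the uppercase group and the final group.
NO_WILDCARD = r"(\d{6}).*_([A-Z]{2,})_([abcDEF]*\d{6})"


def join_groups(pattern, name):
    """Concatenate the three captures, or return None on no match."""
    match = re.search(pattern, name)
    return "".join(match.groups()) if match else None


for label, pattern in (("greedy", GREEDY), ("lazy", LAZY), ("no wildcard", NO_WILDCARD)):
    print(label, join_groups(pattern, NAME))
```

The greedy pattern yields 123456XY987654, because [abcDEF]* is allowed to match nothing and the preceding .* grabs the 'a' first. The lazy and no-wildcard patterns both yield 123456XYa987654.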

philbond87

That worked a charm. Thank you both very much.
(and no, no arbitrary characters between the uppercase letter group and final sequence group.)

Phil Harvey

Quote from: philbond87 on July 23, 2020, 12:11:56 PM
I see that you've added double quotes around the $ arguments. Was there something else I was missing?

There was also a missing "\" for the first "\d".

- Phil

philbond87

Quote from: Phil Harvey
There was also a missing "\" for the first "\d"

Ah, right.

So it's working as I would like from the terminal.
When I call it in a shell from within my application the regex isn't working as it did in the terminal – I suspect it's because of the placement and type of quotes.

In my app the command string I send to the shell instance is this:
theCommand = "usr/local/bin/exiftool -m -progress: -overwrite_original_in_place -r '-transmissionreference<${filename;m/(\d{6}).*_([A-Z]{2,})_.*?([iwOSEV]*\d{6})/;$_'$1$2$3'}' " + theFolder.ShellPath

Do I need to escape the internal quotes in some way?

StarGeek

Quote from: philbond87 on July 23, 2020, 03:55:12 PM
$_'$1$2$3'}'

The quoting here is incorrect and the equal sign is missing.  There need to be double quotes around $1$2$3 so that Perl will interpolate them as variables and replace them with the contents of those variables.  See Strings in Perl: quoted, interpolated and escaped.

I would guess that the double quotes would need to be escaped with backslashes, though you would have to check the docs for the language you're using.

Alternatively, you could use the concatenation operator without quotes.
$_=$1.$2.$3
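
One way to sidestep shell quoting from inside an application is to pass exiftool its arguments as a list rather than as one command string. The Python sketch below is only an illustration of that idea, not the poster's actual application code: the function name build_exiftool_args, the install path /usr/local/bin/exiftool, and the folder /path/to/images are all assumptions. It uses the concatenation form so that no inner quotes are needed.

```python
import shlex

# Assumed install location; adjust to wherever exiftool actually lives.
EXIFTOOL = "/usr/local/bin/exiftool"


def build_exiftool_args(folder):
    """Return the argument list for the command discussed in this thread.

    Hypothetical helper. Each list element reaches exiftool as one argument,
    so the $ signs and braces need no shell escaping.
    """
    tag_expr = (
        r"-transmissionreference<${filename;"
        r"m/(\d{6}).*_([A-Z]{2,})_.*?([iwOSEV]*\d{6})/;"
        r"$_=$1.$2.$3}"  # Perl concatenation, so no inner quotes are required
    )
    return [
        EXIFTOOL,
        "-m",
        "-progress:",
        "-overwrite_original_in_place",
        "-r",
        tag_expr,
        folder,
    ]


args = build_exiftool_args("/path/to/images")
# shlex.join shows how the same command would have to be quoted for a POSIX shell.
print(shlex.join(args))
# To actually run it (requires exiftool at the assumed path):
#   import subprocess
#   subprocess.run(args, check=True)
```

If the command has to go through a shell as a single string, the printed output shows one correctly quoted form for a POSIX shell; other languages will have their own escaping rules.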