Batch write truncated filename to IPTC caption-abstract

Started by grole, December 22, 2016, 06:08:09 PM

grole

Hello,

I've already learned a command to batch write the file name of a jpg to the caption (or caption-abstract as it appears to be called) field:

exiftool "-iptc:caption-abstract<filename" FILE

What I now need is a way to only write part of the filename.

I have a number of photos with 5 digit filenames such as "12345.jpg". Some of these files have an additional caption in the filename, such as "12345-caption text.jpg".

What I want is, of course, only the "caption text" and not the rest of the filename. So what I need is a way to truncate the first 5 digits, or everything before and including the "-", plus the extension. Those files without text would have nothing written, or just blank.

Thanks in advance!


StarGeek

Try this
exiftool -if "$filename=~/-/" "-Caption-Abstract<${filename;s/^\d{5}-//;s/(.*)\.[^.]+$/$1/}" FILE

The if checks to see if there is a dash in the filename.  If there is, that means the file has text to be written to Caption-Abstract.  Then, it will take the filename, remove the five digits and the dash from the beginning, remove everything from the last dot to the end, and copy the rest to Caption-Abstract.
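
For anyone who wants to try the logic outside of exiftool, here is a rough Python sketch of the same two steps. It is only an illustration of the regex logic, not how exiftool works internally; the function name caption_from_filename is made up for this example, and Python writes the captured group as \1 where Perl uses $1.

```python
import re

def caption_from_filename(filename):
    """Return the caption text from a name like '12345-caption text.jpg'.

    Returns None when the name has no dash, mirroring the -if check.
    (Hypothetical helper for illustration only.)
    """
    if "-" not in filename:                       # same idea as -if "$filename=~/-/"
        return None
    text = re.sub(r"^\d{5}-", "", filename)       # s/^\d{5}-//  : drop 5 digits and the dash
    text = re.sub(r"(.*)\.[^.]+$", r"\1", text)   # s/(.*)\.[^.]+$/$1/ : drop the extension
    return text

print(caption_from_filename("12345-caption text.jpg"))
print(caption_from_filename("12345.jpg"))
```

Files without a dash return None here, which corresponds to exiftool skipping them because the -if condition fails.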
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

grole

That's great, thanks! Just what I'm looking for.

Didn't quite work yet though. I get the following warning:

Warning: [minor] Tag '1' not defined - f:temp/02465-Workshop about 1986.jpg
Warning: No writable tags set from f:temp/02465-Workshop about 1986.jpg


Command as entered:

exiftool -if "$filename=~/-/" "-Caption-Abstract<${filename;s/^\d{5}-//;s/(.*)\.[^.]+$/$1/}" f:temp -overwrite_original

StarGeek

Ah, sorry, my mistake.  The $/ near the end is interpreted as a newline.  Usually I remember that.  There are two ways to fix it: either double the dollar sign or put it in parentheses.  Either of these should work.

exiftool -if "$filename=~/-/" "-Caption-Abstract<${filename;s/^\d{5}-//;s/(.*)\.[^.]+$$/$1/}" f:temp -overwrite_original
exiftool -if "$filename=~/-/" "-Caption-Abstract<${filename;s/^\d{5}-//;s/(.*)\.[^.]+($)/$1/}" f:temp -overwrite_original

grole

Hmmm no dice :(. Still the same warning. I tried both $$ and ($), same result.

I'd troubleshoot it myself, but I'm not familiar with this syntax at all.

Thanks!

StarGeek

What version of exiftool? Run exiftool -ver to find out.
I'm assuming this is on Windows?

I just double checked, and my original command and the command with the parentheses worked properly (the double $ didn't).

C:\>exiftool -if "$filename=~/-/" "-Caption-Abstract<${filename;s/^\d{5}-//;s/(.*)\.[^.]+$/$1/}" "X:\!temp\12345-Test-Copy.jpg"
    1 image files updated

C:\>exiftool -caption-abstract "X:\!temp\12345-Test-Copy.jpg"
Caption-Abstract                : Test-Copy


The syntax is exiftool advanced formatting, which in this case is Perl regex substitution.

Phil Harvey

To explain the $/ confusion:

Expressions which allow tag names with a preceding dollar sign (eg. arguments of -if and -p, and the STR of -TAG<STR) interpret $/ as a newline.  The advanced formatting expression doesn't allow tag names, so no pre-processing is done by ExifTool and $/ isn't special (ie. the expression is standard Perl).

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

grole

It worked! I was using 8.74 on Windows 7. I upgraded to 10.37 and it still wouldn't work; it said the file xxx could not be read. The files were on a USB HDD, so I moved them onto a local disk, and then it worked.

That will help me out for now, but I may need to modify the command for other situations. If I've understood it correctly, I could probably change the trigger of the IF condition by simply changing the first hyphen? For example if it was a blank instead. In that case I'd want to also remove the first x digits, let's say 7 this time, plus the blank. Would this work, or what is the syntax for a blank?

-if "$filename=~/' '/" "-Caption-Abstract<${filename;s/^\d{7}' '//;s/(.*)\.[^.]+($)/$1/}"

Thanks again!

StarGeek

Close.  No need for the single quotes.  A space is just a space.

-if "$filename=~/ /" "-Caption-Abstract<${filename;s/^\d{7} //;s/(.*)\.[^.]+($)/$1/}"

To break it down, there are two substitutions, separated by the semicolon.  Perl regex substitutions take the pattern following the first slash and replace whatever it matches with what's after the second slash (though there are other delimiter characters you can use besides slashes).
s/^\d{7} //
^ - Matches the start of the string
\d - Any digit character (0-9)
{7} - The previous token, exactly 7 times.  In this case, 7 of any digit.
Then a single space.  Not as important for a filename, but there is also \s, which matches any whitespace character: space, tab, new line, carriage return, and form feed.

The second substitution is more complex
s/(.*)\.[^.]+($)/$1/
( ) - The parens indicate we're going to capture the match that is between them.
.* - The dot matches any single character (except a new line, but there's a modifier for that).  The asterisk means the previous token is matched 0 or more times, as many times as possible.  This pattern is sorta like a simple asterisk wildcard in Windows, but much more greedy.  For example, if you had a string "ABCDABCDA" and used A.*A as the pattern, the .* would match "BCDABCD", not "BCD".
\. - Escapes the dot so it's matched literally.
[^.]+ - Brackets define a character class.  The caret inside the brackets negates it, so we don't want to match the characters in the brackets.  In this case the brackets will match any non-dot character.  The + is similar to the asterisk, except it will match 1 or more characters.  Together with the escaped dot, this is what strips the extension off.
$ - The dollar sign matches the end of the string, the opposite of the caret.
$1 - In the second half, $1 takes the value that was captured by the first set of parens in the first half and uses it as the replacement.
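
If it helps to experiment, the greedy-matching example and the space-prefixed pattern above can be tried out in a few lines of Python. This is just a sketch for testing the patterns, not part of exiftool: the helper name strip_prefix_and_extension and the sample filename are made up, the lazy .*? pattern is added only for contrast, and Python uses \1 in the replacement where Perl uses $1.

```python
import re

# Greedy: .* grabs as much as possible between the first and last "A".
greedy = re.search(r"A(.*)A", "ABCDABCDA").group(1)

# Lazy: .*? stops at the earliest "A" it can (added here for contrast).
lazy = re.search(r"A(.*?)A", "ABCDABCDA").group(1)

def strip_prefix_and_extension(filename, digits=7):
    """Remove a leading run of digits plus one space, then the extension.

    Hypothetical helper; 'digits' is the number of leading digits to drop.
    """
    no_prefix = re.sub(r"^\d{%d} " % digits, "", filename)   # s/^\d{7} //
    return re.sub(r"(.*)\.[^.]+($)", r"\1", no_prefix)       # s/(.*)\.[^.]+($)/$1/

print(greedy)
print(lazy)
print(strip_prefix_and_extension("1234567 Summer picnic.jpg"))
```

Because .* is greedy, only the last dot and what follows it is removed, so a name with several dots keeps everything before the final extension.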

Regex is a complex subject.  The tutorial site I recommend is Regular-Expressions.info and you can test patterns out at Regex101.com.

grole

Thanks for the explanation. It confirms that I'd be in way over my head with Perl! However, I'll hang onto this for future reference.

Thanks, and Happy Holidays!