Help refining command

Started by philbond87, December 01, 2020, 08:17:14 AM


philbond87

I am constructing a command that uses a regex to take parts of a filename to form a string that I will insert into an XMP tag.
The filename will have a four-digit string, followed by an arbitrary number of characters, an underscore, two or more uppercase letters, an underscore, optionally one or more specific letters ('a', 'b', or 'c'), then four digits (e.g. 1234_some-text_XYZ_ab1234).
What I want it to do is construct a string from:

  • the first four numbers
  • the two or more capital letters
  • the last four numbers (removing the letters that may appear before those four digits).
The resulting string (for the example above) should be:
1234XYZ1234

Here is the command that I have thus far:
'-transmissionreference<${filename;m/(\d{4}).*_([A-Z]{2,})_?([abc]*\d{4})/;$_=$1.$2.($3=~ s/a|b|c//gr)}'

It almost works...but if my string has an 'a', 'b' or 'c' in the last group (before the final 4 digits) it removes the a, b or c and everything before that, leaving only the last four digits.

The message I get from the ExifTool command is:
Warning: Use of uninitialized value $2 in concatenation (.) or string for 'filename'


I would really appreciate any help in correctly constructing this command.
Thank you!

Phil Harvey

Ah.  Your regex to remove abc from $3 resets all of the $1,$2,$3 variables.

Maybe try this:

'-transmissionreference<${filename;m/(\d{4}).*_([A-Z]{2,})_?[abc]*(\d{4})/;$_=$1.$2.$3}'

...but are you sure you want the "?" after the second "_" ?

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

philbond87

Thanks Phil. That does work.

However, the reason I was trying to extract the characters from the last group is because of one additional requirement (that I failed to add):
In addition to 'a', 'b' and/or 'c' there may also be a 'z'... which I do want to keep.

So:
1234_some-additional-characters_XYZ_abz1234

Should yield:
1234_XYZ_z1234

Phil Harvey

OK, so this:

'-transmissionreference<${filename;m/(\d{4}).*_([A-Z]{2,})_?[abc]*(z?\d{4})/;$_=$1.$2.$3}'

- Phil

philbond87

Thanks Phil.

Adding z? to the last group didn't work but adding [z?] sort of did.
However if the 'z' character doesn't appear at the end of the other optional letters it doesn't seem to be found.

Phil Harvey

If the z doesn't come last in the letters then my suggestion won't work.

I guess I don't know enough about the possibilities to give you a complete answer.

- Phil

philbond87

I really appreciate the help you've provided. It has been tremendous help!

I apologize for not providing enough clarification at the outset, though. The letters that prefix the last 4 digits – including the 'z' – may appear in any order. I just want to keep the 'z' and remove the rest.

I will continue experimenting.
Thanks, Phil

Luuk2005

Greetings Philbond87. It's hard to understand, because I don't know whether you want to keep the underscores. The file extensions are also unknown, so I just made the regex accept any 2-4 characters in the extension. I also think you might want to add -if '$Filename =~ /(\d{4})_[^_]+_([A-Z]{2,})_(.*\d{4})\.[^.]{2,4}$$/' if there can be filenames in other formats.

Please test on the command line first, then modify it to what you really need, because it keeps not only z but everything other than [abc]:
exiftool -fast4 -p '$Filename     ${Filename; s/^(\d{4})_[^_]+_([A-Z]{2,})_(.*\d{4})\.[^.]{2,4}$/\1\2\3/; $_=$1.$2.$3=~s/[abc]//gr}' . 
If you prefer the m// method:
exiftool -fast4 -p '$Filename     ${Filename; m/^(\d{4})_[^_]+_([A-Z]{2,})_(.*\d{4})\.[^.]{2,4}$/; $_=$1.$2.$3=~s/[abc]//gr}' .

Edit:
If there can only be [abcz] characters between the last _ and the last four digits, improve the match by changing .* into [abcz]*
If there can be any characters between the first four digits and the first _, improve the match by changing ^(\d{4})_ into ^(\d{4})[^_]*_
Windows8.1-64bit,  exiftool-v12.92(standalone),  sed-v4.0.7

philbond87

Hello Luuk2005,

Thank you for looking at this. I really appreciate the assistance.

Correct, a file name will always have:

  • four digits, followed by an underscore
  • an arbitrary number of characters, ending with an underscore
  • two or more uppercase letters
  • an underscore followed by 'a' and/or 'b' and/or 'c' and/or 'z', in any order (I want to remove all of those letters, if they appear, except for a 'z')
So a sample file name might look like this:
1234_any-arbitrary_underscores_and-dashes_XY_azbc5678
And with that example I would like the resulting string to be:
1234XYz5678

The current expression I am using is:
'-transmissionreference<${filename;s/^(\d{4})[^_]*_([A-Z]{2,})_([abcz]*\d{4}) $/\1\2\3/; $_=$1.$2.$3=~s/[abc]//gr}'

I'm getting the following message:
Warning: Use of uninitialized value $3 in concatenation (.) or string for 'filename'

StarGeek

Try this
'-transmissionreference<${filename;m/^(\d{4})_.*_([A-Z]{2,})_([abcz]*\d{4})/;$_=$1.$2.($3=~tr/abc//rd)}'

Example
C:\>exiftool -p "${filename;m/^(\d{4})_.*_([A-Z]{2,})_([abcz]*\d{4})/;$_=$1.$2.($3=~tr/abc//rd)}"   Y:\!temp\ccccc\1234_any-arbitrary_underscores_and-dashes_XY_azbc5678.jpg
1234XYz5678
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

philbond87

StarGeek,

Thank you – that seems to have done it.

Was the issue that tr/// returns the altered string and s/// simply returns the number of substitutions, if the characters are found?

Very much obliged.

StarGeek

Quote from: philbond87 on December 06, 2020, 01:49:14 PM
Was the issue that tr/// returns the altered string and s/// simply returns the number of substitutions, if the characters are found?

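There were other problems in the whole thing, but one thing to know is that you can't do an in-place substitution (or use the tr transliteration operator in place) on a capture group variable, because it is read-only. With the r option, both s/// and tr/// return the altered string rather than a count; the difference is that s/// runs a new regex match, which resets $1, $2 and $3 as Phil described, while tr/// does not. An example of the read-only error in straight Perl: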
C:\>perl -e "$1=10;print $1"
Modification of a read-only value attempted at -e line 1.


For tr/abc//rd, the d option means it will delete characters and the r option will return a new string and leave the original untouched.

Luuk2005

Greetings Philbond87. Sorry for not replying earlier (we have baby kittens!). Both of my regexes assumed the names look like the first examples.
I didn't realize that there can be many underscores inside the "arbitrary number of characters, ending with an underscore".
So both of my regexes must change [^_]+ into .+, like StarGeek already fixed for you.

After that, I think it's OK to use either $3=~tr/abc//rd or $3=~s/[abc]//gr?
I'm on Windows so maybe it's different, but I often have a big problem when using either tr///r or s///r ...

With ${Tag; s/whatever/$1$2$3$4/; $_=$1.$2.($3=~s///r).$4;   the $4 never comes through, because the inner s/// resets the groups like Phil says.
So instead I just do this:     $_=$1.$2.$4.($3=~s///r);   with another s/// to move $3 later.
Maybe there are better ways, but I'm just learning and haven't discovered any yet.

So both my $Tags could be like ....
'${Filename; s/^(\d{4})_.+_([A-Z]{2,})_(.*\d{4})\.[^.]{2,4}$/\1\2\3/; $_=$1.$2.($3=~s/[abc]//gr)}'
'${Filename; m/^(\d{4})_.+_([A-Z]{2,})_(.*\d{4})\.[^.]{2,4}$/; $_=$1.$2.($3=~s/[abc]//gr)}'

The \.[^.]{2,4}$ makes sure that names end with _Uppers_4Digits, but not _Uppers_4DigitsAnyText.
I wish I had Linux to test ($3=~s///r) instead of $3=~s///r; so far I have only needed () to do math.
Sorry if there was any confusion.