Regex substitution and IPTC truncation

Started by chuft-captain, December 06, 2017, 12:05:38 PM


chuft-captain

Hi,

Regex Substitution
This tool was recommended to me, and it certainly looks capable of doing just about anything you can imagine with metadata. This is my first time using ExifTool; I want to make some specific, targeted edits to IPTC data in a folder of images.
I'm using the IPTC:SpecialInstructions field to store this data.

I want to replace a particular section of this TAG data when it is found to match the pattern: "&rf=<num>", where <num> is a sequence of 1 or more digits.

If this data were in Notepad++, it would be an easy task to do the replacement using a very simple regular expression, something like: "&rf=[0-9]*"
I'm not sure of the exact syntax for regex in ExifTool, however I think the following command will pretty much do the tag modification I need to do (minus the necessary regex):

exiftool -api "Filter=s/&rf=<num>/&rf=119703/" -tagsfromfile @ -iptc:SpecialInstructions *.jpg

All I need to do is work out the correct regex syntax to put where "<num>" is (assuming you can just plug in some regex here).
If someone can help me out with the syntax of the regex implementation in ExifTool, it would be greatly appreciated.

IPTC truncation
Here's where I've hit a brick wall: like many other EXIF and photo tools (including the Nikon ViewNX software), ExifTool truncates certain TAGS when they exceed a maximum length imposed by the IPTC specification.
Warning: [Minor] IPTC:SpecialInstructions exceeds length limit (truncated)
As this data is not for use in cameras or photographic equipment, I'm not concerned that its length (300-400 chars) exceeds the specification.
I can create and maintain this data in IrfanView (the only software I've discovered to date that does not enforce this character limit and truncate), but whenever I perform any operation on these images with other software (ViewNX, ExifTool), the IPTC data is truncated, which effectively corrupts my metadata.

EDIT: I should also mention that I tried an export/edit in Notepad++/import approach:
exiftool -ICCProfileName -all -csv *.jpg > FILENAME.csv
... edit ...
exiftool -ICCProfileName -all -csv+=FILENAME.csv *.jpg
but on import, the fields were once again truncated, and therefore (at least for my purpose) corrupted.

Is there any particular reason why ExifTool truncates? It seems to me that this is a limit that is probably based on some historical camera hardware limit, and is either no longer relevant, or would be automatically truncated if necessary by the hardware in question if the hardware couldn't handle the extra length.
I have no problem with the issuing of a warning message, but IMO there's no value in actually truncating the data during editing, as the specified limit probably only matters when the image hits some actual hardware (like a camera) at which time it would almost certainly be automatically truncated by the hardware in any case.

I don't see the logic of truncating this data when it's stored on a computer, and may never actually end up on a camera/hardware device? Fine to warn about it, but IMO leave the truncation up to the hardware.

Is there currently a way around this truncation issue in ExifTool? ... some sort of override ... or is a software change required ?

Any thoughts ?

EXIFTOOL Documentation: https://exiftool.org/exiftool_pod.html

StarGeek

Quote from: chuft-captain on December 06, 2017, 12:05:38 PM
I'm not sure of the exact syntax for regex in ExifTool, however I think the following command will pretty much do the tag modification I need to do (minus the necessary regex):

exiftool -api "Filter=s/&rf=<num>/&rf=119703/" -tagsfromfile @ -iptc:SpecialInstructions *.jpg

All I need to do is work out what the correct syntax for the regex required where "<num>" is (assuming you can just plug in some regex here).

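It looks like you pretty much nailed it. To replace numbers, you can use either the same [0-9]* you used in Notepad++ or a simple \d* (which also works in Notepad++). Note that * also matches zero digits; if you want to require at least one digit, use [0-9]+ or \d+ instead.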
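For anyone who wants to try the pattern before touching any files, here is a minimal Python sketch of the same substitution. It is only an illustration: ExifTool evaluates the Filter expression as Perl, and Python's re module is used here as a stand-in. The helper name replace_rf and the sample strings are made up for the example.

```python
import re

def replace_rf(value: str, new_id: str = "119703", pattern: str = r"&rf=\d+") -> str:
    """Replace '&rf=<digits>' with '&rf=<new_id>' (hypothetical helper)."""
    # A function is used as the replacement so new_id is inserted literally.
    return re.sub(pattern, lambda m: "&rf=" + new_id, value)

# \d+ needs at least one digit after "&rf="
print(replace_rf("id=7&rf=42&lm=9"))                    # id=7&rf=119703&lm=9
print(replace_rf("id=7&rf=&lm=9"))                      # id=7&rf=&lm=9 (no match)
# \d* also matches a bare "&rf=" with no digits
print(replace_rf("id=7&rf=&lm=9", pattern=r"&rf=\d*"))  # id=7&rf=119703&lm=9
```

The only difference between the two patterns is whether an "&rf=" with no number after it counts as a match.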

If you would like to know more (insert "Starship Troopers" meme here), you can search on Perl regex.  In the past I've found regular-expressions.info to be a good regex tutorial site.

Quote: ExifTool truncates certain TAGS when they exceed a maximum length imposed by the IPTC specification.

Add -m to tell exiftool to ignore this limitation.

Quote: Is there any particular reason why ExifTool truncates?

I would assume it's because that's what the standard calls for.  Phil has a pretty strong stand on keeping to the standards but is flexible enough to provide workarounds.  Exiftool will happily read any length and with the -m, write it as well.

Quote: It seems to me that this is a limit that is probably based on some historical camera hardware limit, and is either no longer relevant, or would be automatically truncated if necessary by the hardware in question if the hardware couldn't handle the extra length.

If I recall correctly, IPTC was developed in the early 1990s for use with news organizations. Cameras probably weren't part of the process; most likely the computers in use at the time had a bigger influence. Even now, I don't think very many cameras deal with IPTC. Even XMP doesn't appear in many camera images as far as I know. EXIF is pretty much the camera standard.
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

chuft-captain

StarGeek,

Thanks very much for your help. (The -m is a godsend!) ... it does appear that Phil has thought of every scenario and contingency!  :D

The final result:
exiftool -overwrite_original -m -api "Filter=s/&rf=\d+/&rf=119703/" -tagsfromfile @ -iptc:SpecialInstructions *.jpg
and this works a treat!

I realized I'll probably use this more than once, so the only other thing that I might do is automate it a little ... In the very small amount of research I've done so far I came across a technique which I think involves renaming the .EXE with selected switches in ()'s. eg.
exiftool(-k).exe
... which I guess might translate to something like the following for my scenario:
exiftool(-overwrite_original -m -api "Filter=s/&rf=\d+/&rf=119703/" -tagsfromfile @ -iptc:SpecialInstructions).exe

Please excuse my ignorance, as my experience of ExifTool is only hours old, so I'm not exactly sure how this paradigm works and what the limits are, but in any case I would probably replace the "&rf=119703" with a parameter, so that I can have the flexibility of varying the replacement string on different executions.

Or alternatively... just create a custom windows shortcut, and specify the switches, parameters and paths in the usual way in the TARGET field. I guess that would also work.

Cheers!
CC

Phil Harvey

See part ii of the Running in Windows section of the ExifTool home page for limitations of arguments in the .exe file.  Basically, you can include anything but these characters:  /\?*:|"<>

But you can get around this limitation with a Windows shortcut as you mentioned.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

chuft-captain

Thanks Phil,

This looks like a very powerful (and potentially dangerous  ;D) tool.
I think I'll start using ExifTool to generate default TAG values for my images by writing a batch file containing something like:
exiftool.exe -overwrite_original -m -By-lineTitle="mysite.com" -CopyrightNotice="(C)2017 myname" -Caption-Abstract="Default Caption Text" -Author="mysite.com" -Copyright="(C)2017 myname" -Caption="Default Caption Text" %1
... which I can execute for example with:
iptc_init *.jpg
or
iptc_init .

I might even add in a "-r" switch.  ;D

A couple of other questions come to mind:
1. I thought it might be a good idea to have the caption field default to the filename prefix (eg. foo.jpg would get "foo" as a default caption.) ... How do you do this?
2. It seems that the code above would do a brute force overwrite of existing TAGS. I don't want to overwrite certain TAGS if they already have a non-null value. eg. the Caption TAG is one field that I would not want to overwrite if I've already customized its value on some images. (ie. overwrite it only if empty).

Sorry for the extra questions, I'm sure the answers are somewhere in the documentation, but there's just SO MUCH documentation for a newbie!  ;)

Regards
CC


StarGeek

Quote from: chuft-captain on December 07, 2017, 03:34:01 AM
2. It seems that the code above would do a brute force overwrite of existing TAGS. I don't want to overwrite certain TAGS if they already have a non-null value. eg. the Caption TAG is one field that I would not want to overwrite if I've already customized its value on some images. (ie. overwrite it only if empty).

Take a look at the -wm option. -wm cg is probably what you want.

chuft-captain

Thanks StarGeek,

I'll give that a try. One caveat, perhaps: it looks like it's a global mode that will apply to all TAGs.
How do I restrict it to just certain tags ... will I have to call ExifTool in 2 passes (1st pass with "-wm cg" for selected fields, and then a 2nd pass for brute force overwrites of all other fields?)

Cheers
CC

chuft-captain

OK,
so moving the caption field mods into another batch file and running it as a second pass through the files works, although it's a little ugly.

I also found an apparent solution to getting the filename prefix as a caption, when I found the special config file here

After installing that, I then modified my caption batch file to be this:
echo off
SET filespec=%1
if (%filespec%)==() set filespec=.
echo. filespec: %filespec%

echo on
exiftool.exe -wm cg -overwrite_original -m "-Caption-Abstract<basename" "-Caption<basename" %filespec%
which worked perfectly until ....

... I realized that before running this new BAT file, I also wanted to clear out all the old "default captions" (which happened to be set as the string "Caption Text HERE"), so I ran the following to try to clear them all out (without affecting any other customized captions).
echo off
SET filespec=%1
if (%filespec%)==() set filespec=.
echo. filespec: %filespec%

echo on
exiftool.exe -overwrite_original -m -api "Filter=s/Caption Text HERE//" -tagsfromfile @ -iptc:Caption-Abstract %filespec%

This clears them out all right, but the gotcha is that for some reason, the filename prefix script from above will no longer update those cleared-out caption tags (I suspect because, although the caption tags are now empty, the "-wm cg" option somehow sees them as having been modified):
1 directories scanned
    0 image files updated
  353 image files unchanged

CC

Phil Harvey

The problem is that you set Caption-Abstract to an empty string instead of deleting it.  Since the tag still exists, -wm cg won't write it.

You should have done this to delete the caption:

exiftool -caption-abstract-="Caption Text Here" FILE

But now that the caption-abstract is empty, use this to delete it:

exiftool -caption-abstract-= FILE

Then your other command should work.
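The empty-versus-deleted distinction can be pictured with a toy model. This is a sketch under stated assumptions, not ExifTool's implementation: a plain dict stands in for a file's metadata, and both function names are invented for the illustration.

```python
def write_create_only(tags: dict, name: str, value: str) -> bool:
    """Toy model of a create-only write: succeed only if the tag is absent."""
    if name in tags:            # an empty string still counts as existing
        return False
    tags[name] = value
    return True

def delete_if_equals(tags: dict, name: str, value: str) -> bool:
    """Toy model of a conditional delete: remove the tag only if it has this value."""
    if name in tags and tags[name] == value:
        del tags[name]
        return True
    return False

tags = {"Caption-Abstract": ""}     # emptied by the filter, but still present
print(write_create_only(tags, "Caption-Abstract", "foo"))   # False
print(delete_if_equals(tags, "Caption-Abstract", ""))       # True
print(write_create_only(tags, "Caption-Abstract", "foo"))   # True
```

In the model, the create-only write starts working again only once the empty entry has actually been removed.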

- Phil

chuft-captain

Thanks Phil,

I thought that was probably the reason!

I'm still finding my way around the syntactic and semantic conventions of your tool by trial and error a bit. There's quite a bit to it!!!

So in the spirit of trial and error, my first idea was something like:
exiftool.exe -api "Filter=s/Caption Text HERE/$filename/" -tagsfromfile @ -iptc:Caption-Abstract %filespec%
... that didn't work, so next was:
exiftool.exe -api "Filter=s/Caption Text HERE/<basename/" -tagsfromfile @ -iptc:Caption-Abstract %filespec%
... and that didn't work.  :o

It was then that I decided to replace with empty strings ... which got me into strife with the -wm cg

The good news is that, in the full knowledge that I'm a newbie and definitely working by the seat of my pants, I realized that there would be stuff-ups, so while I'm learning I'm working only on a duplicate folder of the original images (I may be green with ExifTool, but I'm not THAT green!  ;D).
So in this case I decided just to restore from the originals and start from scratch again (no big deal), which enabled me to use your first suggestion:
exiftool -caption-abstract-="Caption Text Here" FILE

With the wisdom of hindsight, I think it will be easier in the long run to just default the caption tag to some standard string such as "Caption Text Here" rather than the "basename" idea. This will make it easier to make bulk changes to that value if necessary, and defaulting the caption to the filename is not really that useful in any case.

Thanks for all the help. Very much appreciated.
CC

chuft-captain

Thank you very much Phil and StarGeek for the advice.

Final version:
exiftool.exe -preserve -overwrite_original -m -api "Filter=s/&rf=\d*/&rf=%num1%/;s/&lm=\d*/&lm=%num2%/" -tagsfromfile @ -iptc:SpecialInstructions %filespec%
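As a rough picture of what the two chained substitutions do to a value, here is a small Python sketch. It is a stand-in only (the Filter itself is Perl); the function name update_ids, its rf and lm parameters (playing the role of the batch variables %num1% and %num2%), and the sample string are all hypothetical.

```python
import re

def update_ids(value: str, rf: str, lm: str) -> str:
    """Apply the two substitutions one after the other (hypothetical helper)."""
    # First expression: rewrite the number after "&rf="
    value = re.sub(r"&rf=\d*", lambda m: "&rf=" + rf, value)
    # Second expression: rewrite the number after "&lm=" in the result
    value = re.sub(r"&lm=\d*", lambda m: "&lm=" + lm, value)
    return value

print(update_ids("x=1&rf=5&lm=6&z=2", "119703", "88"))   # x=1&rf=119703&lm=88&z=2
```

Each substitution only touches its own key, so the rest of the value passes through unchanged.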

One question more: What if I want the filter to search for a certain pattern, but not replace it?

ie. List all images which contain a certain value in a specified tag (or perhaps any tag).

eg. Extract Caption-Abstract of all images where the Caption tag has the value: AUTHOR_BYLINE

I tried:
exiftool -P -Caption-Abstract -if "$Caption eq "AUTHOR_BYLINE"" .
and
exiftool -P -Caption-Abstract -if "$Caption eq AUTHOR_BYLINE" .

In each case:
1 directories scanned
  353 files failed condition
    0 image files read



CC

StarGeek

Quote from: chuft-captain on December 09, 2017, 09:21:25 PM

One question more: What if I want the filter to search for a certain pattern, but not replace it?

ie. List all images which contain a certain value in a specified tag (or perhaps any tag).

To search for a value in a specific tag, using your SpecialInstructions example:
exiftool -if "$IPTC:SpecialInstructions=~/8675309/" -IPTC:SpecialInstructions FileOrDir

I don't think it's possible to list only tags that contained a certain value. 

StarGeek

Ninja Edit!

In the case of Caption eq AUTHOR_BYLINE, eq doesn't search for part of a string; the whole value has to match exactly.  Use the regex match as above.

For a case-sensitive match (I removed the -P option since this is a read, not a write operation):
exiftool -Caption-Abstract -if "$Caption=~/AUTHOR_BYLINE/" .

To make the search case insensitive, add an i after the last slash of the regex /AUTHOR_BYLINE/i

chuft-captain

 ;D

Oops. I realized that the reason it's not working is because I was looking in the wrong tag. It should have been:
exiftool -if "$By-Line=~/AUTHOR_BYLINE/i" .

My bad!!!  :-\

BTW.
What's the difference between "eq" and "=~" operators?


Ninja Edit #2:

I noticed that the documentation uses a combination of single and double quotes:
exiftool -shutterspeed -if '$make eq "Canon"' dir

I'm assuming the use of single quotes is a Linux syntax, and that Windows uses double quotes instead.

Thanks once again for the help!  ;D

StarGeek

Quote from: chuft-captain on December 09, 2017, 10:34:12 PM
What's the difference between "eq" and "=~" operators?

'eq' is an exact match.  'a' equals 'a'.  It doesn't equal 'ab', 'aaa', 'á' or 'A'.  If they are not exactly the same, they don't match.

I tend to think of '=~' as the regex operator but technically it's called the Binding Operator and can do a few other things.  In this case, it's being used for a regex match. 
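The difference can be sketched in Python, used here purely as a stand-in for the Perl operators: == plays the role of eq, and re.search plays the role of =~ with a regex. The tag value is made up.

```python
import re

value = "Photo by AUTHOR_BYLINE, 2017"   # made-up tag value

# Like eq: true only if the whole string is identical
exact = (value == "AUTHOR_BYLINE")

# Like =~ /AUTHOR_BYLINE/: true if the pattern occurs anywhere in the string
contains = re.search(r"AUTHOR_BYLINE", value) is not None

# Like =~ /author_byline/i: the same search, ignoring case
nocase = re.search(r"author_byline", value, re.IGNORECASE) is not None

print(exact, contains, nocase)   # False True True
```

So an exact comparison fails as soon as the tag holds anything beyond the search text, while the regex search still finds it.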