Uncalled-for character prepended to IPTC:Caption-Abstract

Started by mazeckenrode, July 30, 2020, 01:09:00 PM

mazeckenrode

My apologies if this isn't a bug, but any other explanation for it is eluding me, so far.

In the process of further testing and utilizing a monster of an ExifTool command line, assembled with much help from this topic thread, I've found that ExifTool seems to be prepending a character (displayed as a ? question mark in a JSON exported by ExifTool, and in the Description field of Directory Opus' Set Metadata panel) to the string I'm trying to have it write to IPTC:Caption-Abstract. My command line returns the message:

Warning: Some character(s) could not be encoded in Latin

The same string is also being written to multiple other tags, but IPTC:Caption-Abstract is the only tag exhibiting the extra prepended character.

This defines IPTC:Caption-Abstract as string[0,2000].

I've also read and noted FAQ #10 regarding character encoding, especially as it pertains to IPTC tags, though as far as I can see, there are no characters in my string-to-be-written that don't exist in Latin1/cp1252.

Full ExifTool command line used as follows:


ExifTool -m "-EXIF:ImageDescription<${EXIF:ImageDescription;my $temp=$1 if $self->GetValue('FileName')=~m/ - 0*(\d+)(?!.* - ).*\.(?:jpe?g|png|tiff?)$/;s/\d+(\/\d+)/$temp$1/}" "-IPTC:Caption-Abstract<${EXIF:ImageDescription;my $temp=$1 if $self->GetValue('FileName')=~m/ - 0*(\d+)(?!.* - ).*\.(?:jpe?g|png|tiff?)$/;s/\d+(\/\d+)/$temp$1/}" "-XMP-dc:Description<${EXIF:ImageDescription;my $temp=$1 if $self->GetValue('FileName')=~m/ - 0*(\d+)(?!.* - ).*\.(?:jpe?g|png|tiff?)$/;s/\d+(\/\d+)/$temp$1/}" "-EXIF:XPSubject<${EXIF:XPSubject;my $temp=$1 if $self->GetValue('FileName')=~m/ - 0*(\d+)(?!.* - ).*\.(?:jpe?g|png|tiff?)$/;s/\d+(\/\d+)/$temp$1/}" "-XMP-dc:Subject<${EXIF:XPSubject;my $temp=$1 if $self->GetValue('FileName')=~m/ - 0*(\d+)(?!.* - ).*\.(?:jpe?g|png|tiff?)$/;s/\d+(\/\d+)/$temp$1/}" "-EXIF:XPTitle<${XMP:Title;my $temp=$1 if $self->GetValue('FileName')=~m/ - 0*(\d+)(?!.* - ).*\.(?:jpe?g|png|tiff?)$/;s/\d+(\/\d+)/$temp$1/}" "-IPTC:ObjectName<${IPTC:ObjectName;my $temp=$1 if $self->GetValue('FileName')=~m/ - 0*(\d+)(?!.* - ).*\.(?:jpe?g|png|tiff?)$/;s/\d+(\/\d+)/$temp$1/}" "-XMP:Title<${XMP:Title;my $temp=$1 if $self->GetValue('FileName')=~m/ - 0*(\d+)(?!.* - ).*\.(?:jpe?g|png|tiff?)$/;s/\d+(\/\d+)/$temp$1/}" "-XMP-dc:Title<${XMP:Title;my $temp=$1 if $self->GetValue('FileName')=~m/ - 0*(\d+)(?!.* - ).*\.(?:jpe?g|png|tiff?)$/;s/\d+(\/\d+)/$temp$1/}" "-EXIF:XPComment<${EXIF:XPComment;my $temp=$1 if $self->GetValue('FileName')=~m/ - 0*(\d+)(?!.* - ).*\.(?:jpe?g|png|tiff?)$/;s/\d+(\/\d+)/$temp$1/}" "-EXIF:UserComment<${EXIF:XPComment;my $temp=$1 if $self->GetValue('FileName')=~m/ - 0*(\d+)(?!.* - ).*\.(?:jpe?g|png|tiff?)$/;s/\d+(\/\d+)/$temp$1/}" "-XMP:UserComment<${XMP:UserComment;my $temp=$1 if $self->GetValue('FileName')=~m/ - 0*(\d+)(?!.* - ).*\.(?:jpe?g|png|tiff?)$/;s/\d+(\/\d+)/$temp$1/}" "-EXIF:XPKeywords<${IPTC:Keywords;my $temp=$1 if $self->GetValue('FileName')=~m/ - 0*(\d+)(?!.* - ).*\.(?:jpe?g|png|tiff?)$/;s/(Page_?)\d+/$1$temp/}" 
"-IPTC:Keywords<${IPTC:Keywords;my $temp=$1 if $self->GetValue('FileName')=~m/ - 0*(\d+)(?!.* - ).*\.(?:jpe?g|png|tiff?)$/;s/(Page_?)\d+/$1$temp/}" "-XMP:Subject<${XMP:Subject;my $temp=$1 if $self->GetValue('FileName')=~m/ - 0*(\d+)(?!.* - ).*\.(?:jpe?g|png|tiff?)$/;s/(Page_?)\d+/$1$temp/}" "-EXIF:DateTimeOriginal-<0:0:${EXIF:XPComment;my $temp=$1 if $self->GetValue('FileName')=~m/ - 0*(\d+)(?!.* - ).*\.(?:jpe?g|png|tiff?)$/;m/\d+\/(\d+)/;$_=$1-$temp}" "-XMP:DateTimeOriginal-<0:0:${EXIF:XPComment;my $temp=$1 if $self->GetValue('FileName')=~m/ - 0*(\d+)(?!.* - ).*\.(?:jpe?g|png|tiff?)$/;m/\d+\/(\d+)/;$_=$1-$temp}" "-IPTC:TimeCreated-<0:0:${EXIF:XPComment;my $temp=$1 if $self->GetValue('FileName')=~m/ - 0*(\d+)(?!.* - ).*\.(?:jpe?g|png|tiff?)$/;m/\d+\/(\d+)/;$_=$1-$temp}" .


(Am I pushing it?)
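
For anyone trying to read that command: most of its arguments apply the same two-step Perl expression. It pulls the page number from the end of the file name, then swaps it into the "N/total" part of the tag value. Below is a rough Python rendering of that logic, only a sketch and not what ExifTool runs internally. The function name renumber and the "03" file name are made up for illustration, and unlike the Perl version it leaves the value alone when the file name doesn't match.

```python
import re

# Python rendering of the two Perl patterns used in the command line.
PAGE_FROM_NAME = re.compile(r" - 0*(\d+)(?!.* - ).*\.(?:jpe?g|png|tiff?)$")
PAGE_OF_TOTAL = re.compile(r"\d+(/\d+)")


def renumber(value: str, filename: str) -> str:
    """Replace the page number in 'N/TOTAL' with the one found in filename."""
    found = PAGE_FROM_NAME.search(filename)
    if found is None:
        return value  # assumption for this sketch: no match, no change
    page = found.group(1)  # leading zeros are dropped by the 0* in the pattern
    # Only the first 'N/TOTAL' is rewritten, like Perl's s/// without /g.
    return PAGE_OF_TOTAL.sub(lambda m: page + m.group(1), value, count=1)


# Hypothetical file name, modelled on the ones in the attached archive.
name = "2020-03-04 17;00;00 - ABCD - XX-1234567 (Vehicles+Drivers) - 03 [Page].png"
print(renumber("documents, p 1/10 [cover page]", name))
# prints: documents, p 3/10 [cover page]
```

The negative lookahead (?!.* - ) is what makes the pattern pick the last " - " separated field of the file name rather than an earlier one.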

Relevant files in attached archive. "...(Original).png" is the file prior to processing by the above command line, but after having had multiple tags written to it by Directory Opus (not including IPTC:Caption-Abstract, which Directory Opus doesn't write, at least not in any of my tests so far), and a separate ExifTool command line to write EXIF/XMP:DateTimeOriginal, IPTC:DateCreated and IPTC:TimeCreated from the filename.

Attached: "IPTC Caption-Abstract Extra Char.7z" (5,081)

Contents:

"IPTC Caption-Abstract Extra Char\"
  "2020-03-04 17;00;00 - ABCD - XX-1234567 (Vehicles+Drivers) - 01 [Cover].png" (6,386) [1 x 1 x 1]
  "2020-03-04 17;00;00 - ABCD - XX-1234567 (Vehicles+Drivers) - 01 [Cover] (Original).png" (4,849) [1 x 1 x 1]

mazeckenrode

Addendum to post above:

ExifTool v12.0
Windows 7 x64 & Windows 10 x64

Phil Harvey

What encoding is your console using?  Did you try adding -L to the command (assuming your console encoding is Latin)?

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

mazeckenrode

My command line is actually being launched by Directory Opus, and I assumed (apparently mistakenly) that the console opened by it would by default be Latin1/cp1252. Turns out it's DOSLatinUS/cp437, so my bad.

Before I thought to look into the console's actual encoding, I did try -L, which resulted in the same warning as previously, but with a different stray character instead of ? prepended to the string for IPTC:Caption-Abstract.

Next I tried also placing the console command chcp 1252 immediately before my ExifTool command line. Result: No warning this time, but the same stray character (not ?) still prepended to the string for IPTC:Caption-Abstract.

String I'm expecting ExifTool to write to IPTC:Caption-Abstract: Insurance policy XX-1234567 [vehicles/drivers] documents, p 1/10 [cover page]; Mailed by ABCD (Alpha Bravo Charly Delta, Inc), PO Box 0000, New York, NY, 4 Mar 2020

String actually written by ExifTool, without using -L and/or chcp 1252, as represented in exported JSON: ?Insurance policy XX-1234567 [vehicles/drivers] documents, p 1/10 [cover page]; Mailed by ABCD (Alpha Bravo Charly Delta, Inc), PO Box 0000, New York, NY, 4 Mar 2020

String actually written by ExifTool, using -L and/or chcp 1252, as represented in exported JSON: Insurance policy XX-1234567 [vehicles/drivers] documents, p 1/10 [cover page]; Mailed by ABCD (Alpha Bravo Charly Delta, Inc), PO Box 0000, New York, NY, 4 Mar 2020

ALL of the characters in my expected string-to-be-written above are represented in both DOSLatinUS/cp437 AND Latin1/cp1252, as far as I can tell. The first character, I, certainly is. Where are the mystery characters coming from, and why do they only show up in IPTC:Caption-Abstract? The correct string gets successfully written to IPTC:ObjectName and a handful of similarly-purposed EXIF and XMP tags.

Phil Harvey

There is no difference in the way ExifTool writes Caption-Abstract compared with ObjectName.  You must have hidden characters in your command or something.  I suggest taking a step back and writing a very simple test value to both.

- Phil

mazeckenrode

Quote from: Phil Harvey on July 30, 2020, 04:04:21 PM
There is no difference in the way ExifTool writes Caption-Abstract compared with ObjectName. You must have hidden characters in your command or something.

It seems like there could be a difference in how metadata is extracted from EXIF:ImageDescription, operated on by the regex code I'm using, and then written to IPTC:Caption-Abstract. The same regex code is used on a multitude of tags in my full command line, literally copied and pasted for each tag I want to operate on, with only the names of the target and source tags changed as necessary. For all the other tags my command line writes, this code works, but not for IPTC:Caption-Abstract as written to with a string adapted from EXIF:ImageDescription. I tried issuing the offending segment by itself:

ExifTool -m "-IPTC:Caption-Abstract<${EXIF:ImageDescription;my $temp=$1 if $self->GetValue('FileName')=~m/ - 0*(\d+)(?!.* - ).*\.(?:jpe?g|png|tiff?)$/;s/\d+(\/\d+)/$temp$1/}" .

Same result: the message Warning: Some character(s) could not be encoded in Latin, and either ? or another stray character prepended to IPTC:Caption-Abstract.

On the other hand, if I use a different source tag to extract from, such as IPTC:ObjectName or EXIF:XPSubject, they both work.

I also noted that using EXIF:ImageDescription as both target and source, as my full command line also does, does not result in any wonky string, and I took the liberty of testing other combinations with EXIF:ImageDescription as the source:


Target tag         Result
EXIF:XPSubject     Works
IPTC:ObjectName    Same result as for IPTC:Caption-Abstract
XMP:Title          Works
EXIF:XPComment     Works
XMP:UserComment    Works

So it looks like it's something between EXIF:ImageDescription and various IPTC tags, maybe?

By the way, it turns out I was initially correct after all: the console windows opened by Directory Opus do indeed use Latin1/cp1252 by default.

Quote
I suggest taking a step back and writing a very simple test value to both.

Writing a string directly to IPTC:Caption-Abstract works just fine.

Phil Harvey

Quote from: mazeckenrode on July 30, 2020, 08:42:11 PM
ExifTool -m "-IPTC:Caption-Abstract<${EXIF:ImageDescription;my $temp=$1 if $self->GetValue('FileName')=~m/ - 0*(\d+)(?!.* - ).*\.(?:jpe?g|png|tiff?)$/;s/\d+(\/\d+)/$temp$1/}" .

This command works for me in tcsh on Mac if I swap the quotes and escape the exclamation point with a backslash.

Do you see the problem if you start with a simple value like "test" in ImageDescription?  This is what I was using.

- Phil

mazeckenrode

Quote from: Phil Harvey on July 31, 2020, 08:01:57 AM
Do you see the problem if you start with a simple value like "test" in ImageDescription?

It happens, and is going to keep happening, no matter what value I use, but I'm certain I found the culprit, and no longer suspect ExifTool. The difference is (and I think you'll agree that this has become a recurring theme with me) that my workflow includes initially using Directory Opus to write various metadata, then using ExifTool to perform adjustments. You were right about there being a hidden character, but I failed to identify it previously because it's zero-width and exists only (as far as I've found) in EXIF:ImageDescription as written by DOpus: U+FEFF (byte order mark). Not sure what, if anything, I'm going to do about it. But sorry for the false alarm.

I do wish Notepad++'s feature for showing all characters included showing BOM.
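
In the absence of an editor that shows it, one way to expose characters like this is to list every format or control character in a string along with its code point. The following is a minimal Python sketch of that idea; invisible_chars is a made-up helper name, and the sample string is just the start of my caption with a BOM in front.

```python
import unicodedata


def invisible_chars(text: str) -> list[tuple[int, str, str]]:
    """Return (index, 'U+XXXX', name) for each format or control character."""
    hits = []
    for index, char in enumerate(text):
        # Cf = format characters (BOM, zero-width joiners), Cc = control codes.
        if unicodedata.category(char) in ("Cf", "Cc"):
            name = unicodedata.name(char, "<unnamed>")
            hits.append((index, f"U+{ord(char):04X}", name))
    return hits


sample = "\ufeffInsurance policy XX-1234567"
for hit in invisible_chars(sample):
    print(hit)
# prints: (0, 'U+FEFF', 'ZERO WIDTH NO-BREAK SPACE')
```

Note that tabs and line breaks are control codes too, so they would also be reported by this check.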

StarGeek

Heh, the idea that it might be a BOM crossed my mind, but I didn't get around to posting it.  It's a problem I've come across while web scraping stuff.  So many pages have BOMs embedded in the middle of a paragraph or even a sentence sometimes.  It becomes a real pain when you use that BOM embedded data to name a file.
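
A small Python sketch of the kind of cleanup this calls for, assuming the only offender is U+FEFF; clean_for_filename is a hypothetical name, and real scraped text may need more characters added to the table.

```python
# Translation table that deletes every U+FEFF, wherever it appears.
STRIP_BOM = {0xFEFF: None}


def clean_for_filename(scraped: str) -> str:
    """Drop embedded BOMs and surrounding whitespace before naming a file."""
    return scraped.translate(STRIP_BOM).strip()


print(clean_for_filename("\ufeffChapter\ufeff One "))
# prints: Chapter One
```

Using translate rather than stripping only the start of the string matters here because the BOMs can sit in the middle of the text.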
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

Phil Harvey

Yeah, I was thinking about the possibility of a BOM in there too...  And I can see it now in the ImageDescription of the original file that was uploaded.  (A UTF-8 BOM)  Doh.
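
To make the symptoms concrete, here is a small Python sketch of the encodings involved. It is only an illustration, not ExifTool's code. It assumes, per FAQ #10, that IPTC strings default to Latin1/cp1252 unless IPTC:CodedCharacterSet says otherwise, that XMP is UTF-8, and that the EXIF XP* tags are stored as UTF-16; can_encode is a made-up helper.

```python
BOM = "\ufeff"
value = BOM + "test"

# A UTF-8 BOM is the three bytes EF BB BF.
assert BOM.encode("utf-8") == b"\xef\xbb\xbf"


def can_encode(text: str, encoding: str) -> bool:
    """True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for encoding in ("cp1252", "utf-8", "utf-16-le"):
    print(encoding, can_encode(value, encoding))
# prints: cp1252 False, then utf-8 True, then utf-16-le True

# Python's own substitute for an unencodable character is also "?".
print(value.encode("cp1252", errors="replace"))  # prints: b'?test'

# The same three bytes misread as cp1252 show up as three visible characters.
print(b"\xef\xbb\xbf".decode("cp1252"))  # prints: ï»¿
```

That would be consistent with the Latin warning appearing only for the IPTC targets, while the XMP and XP* targets accepted the value with the BOM still in it.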

- Phil