Question to: Output of Unknown tags

Started by herb, February 05, 2019, 08:08:27 AM


herb

Hello Phil,

I use ExifTool 11.26 on a Windows 7 system.

A typical command to display the tags of an image file looks like this:
exiftool.exe -t -a -H -charset exif=utf8 -m -u -sort -g0:0 -All:all <imagefile>

My application expects the output of ExifTool to be UTF-8.
But when option -u is used, my application sometimes receives an "invalid utf8" string.
At the moment I think this "invalid utf8" comes from some binary data that my application tries to handle as surrogate characters.
But I have to investigate further.
Now I ask myself what happens, for example, to a line-feed character inside such a binary value.

My questions therefore are:
- how does ExifTool put the value of such an unknown tag into the output string?
  I know that long output is shortened, and I guess some binary data is changed to a dot, to "0" or to a "numeric string", and some is not.
- is there a difference when option -n (unconverted) is also used?
  I guess: no
- is there a difference when a filter function is used in addition?
  In my case I use -api Filter=ReplaceNL($_) in order to, for example, change a line feed to the printable characters "\n".
  I think the filter function is applied as well.

Depending on the answers, I will have to decide on a good way to avoid such "invalid utf8" strings, or how to handle them in my application.

Thanks in advance for your comments and clarifications.

Best regards
Herb

Phil Harvey

Hi Herb,

There is no guarantee that ExifTool will output valid UTF-8 (in fact, in many cases it doesn't).  You must do UTF-8 validation yourself if you require this.

- Phil
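
For readers wondering what "do UTF-8 validation yourself" could look like on the application side, here is a minimal Python sketch. It is not part of ExifTool and not taken from this thread: the function names `decode_line` and `decode_output` are made up for illustration, and it assumes the application reads ExifTool's stdout as raw bytes (for example via `subprocess.run(..., capture_output=True)`) and that output lines are separated by a line feed. Each line is validated on its own, so a single unknown tag with binary content does not invalidate the rest of the output.

```python
def decode_line(raw: bytes) -> tuple[str, bool]:
    """Decode one output line; return (text, was_valid_utf8)."""
    try:
        # Strict decoding raises on any invalid UTF-8 byte sequence.
        return raw.decode("utf-8"), True
    except UnicodeDecodeError:
        # Fall back: invalid bytes become U+FFFD so the line stays usable.
        return raw.decode("utf-8", errors="replace"), False


def decode_output(raw: bytes) -> list[tuple[str, bool]]:
    """Split raw tool output on line feeds and validate each line separately."""
    lines = raw.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # drop the empty piece after the final line feed
    # Strip a trailing carriage return (Windows line endings) before decoding.
    return [decode_line(line.rstrip(b"\r")) for line in lines]
```

Called on the captured bytes, `decode_output` returns one `(text, was_valid)` pair per line, and the application can then skip, log or display the flagged lines as it sees fit. Replacing invalid bytes with U+FFFD is just one possible policy; rejecting the line or hex-encoding the value would be equally reasonable.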