ExifTool Forum

ExifTool => Developers => Topic started by: herb on February 05, 2019, 08:08:27 AM

Title: Question to: Output of Unknown tags
Post by: herb on February 05, 2019, 08:08:27 AM
Hello Phil,

I use ExifTool 11.26 on a Windows 7 system.

A typical command I use to display the tags of an image file looks like this:
exiftool.exe -t -a -H -charset exif=utf8 -m -u -sort -g0:0 -All:all <imagefile>

My application expects the output of ExifTool to be UTF-8.
But when the -u option is used, my application sometimes receives an "invalid utf8" string.
At the moment I think this "invalid utf8" comes from binary data that my application tries to handle as surrogate characters.
But I still have to investigate this further.
Now I ask myself what happens, e.g., to a line-feed character inside such a binary value.

My questions therefore are:
- How does ExifTool put the value of such an unknown tag into the output string?
  I know that long output is shortened, and I guess that some binary data is changed to dots, to "0", or to a "numeric string", and some is not.
- Is there a difference when the -n option (unconverted) is also used?
  I guess: no.
- Is there a difference when a filter function is used in addition?
  In my case I use -api Filter=ReplaceNL($_) to change, e.g., a line feed into the printable characters "\n".
  I think the filter function is applied as well.

Depending on the answers, I will have to decide how best to avoid such "invalid utf8" strings or how to handle them in my application.

Thanks in advance for your comments and clarifications.

Best regards
Herb
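
On the capturing side, one option is to read the ExifTool output as raw bytes and decode it only afterwards, so that an invalid sequence cannot fail inside the pipe-reading layer. The following is a minimal Python sketch, not a description of how the application in question works. It assumes an executable named exiftool is reachable on the PATH; the helper names build_exiftool_args and run_exiftool are hypothetical, and the options are the ones from the command quoted in the post.

```python
import subprocess


def build_exiftool_args(image_path, exe="exiftool"):
    """Build the argument list for the command quoted in the post.

    'exe' is an assumption: pass "exiftool.exe" or a full path on Windows.
    """
    return [
        exe,
        "-t", "-a", "-H",
        "-charset", "exif=utf8",
        "-m", "-u", "-sort",
        "-g0:0",
        "-All:all",
        image_path,
    ]


def run_exiftool(image_path, exe="exiftool"):
    """Run ExifTool and return its standard output as undecoded bytes."""
    # No text=True / encoding=...: stdout stays 'bytes', so nothing is
    # decoded (and nothing can raise a decoding error) at this stage.
    result = subprocess.run(
        build_exiftool_args(image_path, exe),
        capture_output=True,
        check=False,
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.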
Title: Re: Question to: Output of Unknown tags
Post by: Phil Harvey on February 08, 2019, 01:08:16 PM
Hi Herb,

There is no guarantee that ExifTool will output valid UTF-8 (in fact, in many cases it doesn't).  You must do the UTF-8 validation yourself if you require this.

- Phil
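
The validation suggested in the reply could be done on the application side roughly as follows. This is only a sketch in Python under stated assumptions: the output has been captured as raw bytes, and the function names decode_line and decode_output are hypothetical. Each line is decoded strictly first; only if that fails are the offending bytes replaced with U+FFFD, and the line is flagged so the caller can decide what to do with it.

```python
def decode_line(raw):
    """Decode one output line; return (text, was_valid_utf8)."""
    try:
        # Strict decoding raises on any invalid sequence, including
        # UTF-8-encoded surrogate code points.
        return raw.decode("utf-8"), True
    except UnicodeDecodeError:
        # Fallback: keep the line, but mark the bad bytes with U+FFFD.
        return raw.decode("utf-8", errors="replace"), False


def decode_output(raw_output):
    """Split captured output into lines and validate each one separately."""
    lines = []
    for raw in raw_output.split(b"\n"):
        raw = raw.rstrip(b"\r")  # tolerate Windows line endings
        if raw:
            lines.append(decode_line(raw))
    return lines
```

Validating line by line keeps one bad value from invalidating the rest of the output. Note that the replacement is lossy, so a flagged value should not be treated as the tag's real content.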