JSON output and invalid user comment

Started by jphautin, February 29, 2016, 12:57:42 PM


jphautin

Hello,

First, thanks for the great tool.

I am using the JSON output in a PHP class to extract metadata, with the following command:
exiftool -f -struct -json <file location>

Most of the time everything works fine, but sometimes the user comment is not UTF-8 compliant.
In my own photo collection, this is mostly due to my old Samsung phones (S2, S3), which add a strange user comment (I never wrote that!):
"UserComment": "\u0012�\u000F;",
You can find an example in the attachment.
But sometimes the cause is an invalid encoding or something else.

I do not think exiftool can find the correct encoding and that is fine.
The issue is that the generated JSON is not UTF-8 compliant, so I cannot read it (I tried PHP, Python, JS, and Java).
Is there a way to encode the value in base64 (with a specific option), or to remove it when it is not correctly encoded, so that we can be sure the generated JSON is UTF-8 compliant?

regards,



Phil Harvey

I did a bit of research, and as far as I can tell that sequence is valid UTF-8.  Those control characters are valid I think.

However, you may be able to work around the problem by adding the -b option to your command.  In this particular case it will cause the value to be hex-encoded since one of the characters (the one that is converted to a question mark) is not valid UTF-8.

If you can find a reference that says U+0012 or U+000F are not valid UTF-8 characters, then maybe a change to ExifTool would be appropriate, but here is a reference that seems to indicate they are valid.
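For anyone who wants to check this locally, here is a minimal Python sketch. It shows that U+0012 and U+000F round-trip through UTF-8 without error, and it locates the first byte that does not decode. The helper name first_invalid_utf8_offset and the sample bytes (a stray 0xE9) are assumptions made up for illustration; they are not taken from the attached file.

```python
import json

# The two control characters from the UserComment, as a Python string.
controls = "\u0012\u000F"

# Each encodes to a single byte and decodes back unchanged, so both are valid UTF-8.
encoded = controls.encode("utf-8")
assert encoded == b"\x12\x0f"
assert encoded.decode("utf-8") == controls

# JSON requires control characters to be escaped; json.dumps does that.
print(json.dumps(controls))


def first_invalid_utf8_offset(data: bytes):
    """Return the offset of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


# Hypothetical raw value with a stray 0xE9 byte (an assumption, not the real sample).
sample = b"\x12\xe9\x0f;"
print(first_invalid_utf8_offset(sample))
```

If the helper returns None for the raw output, the bytes are valid UTF-8 and the problem lies elsewhere.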

- Phil

Edit:  Hold on.  I get a question mark in the middle (U+003F), but you get something else.  Maybe this is the problem and not the other characters.  What command are you using, and what version of ExifTool?  This is what I get with ExifTool 10.11:

> exiftool ~/Desktop/20141231_193051.jpg -usercomment -json
[{
  "SourceFile": "/Users/phil/Desktop/20141231_193051.jpg",
  "UserComment": "\u0012?\u000F;"
}]


And the question mark here is a simple ASCII question mark character (U+003F).
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

jphautin

Thanks for the help.

I was on CentOS 6 with an old version (8.5x), and I exported the output I had to a file.
Switching to a Debian machine (version 9.7x) shows a '?' character instead.

I am sorry for the inconvenience.

I will redo the process on this machine.

Regards,

Phil Harvey

Yes.  This was a 4-year-old bug:

    Jan. 18, 2012 - Version 8.76
      - Patched -json output to filter out invalid UTF-8 characters
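
If upgrading is not possible, a similar filter can be applied on the reading side. The following is only a sketch in Python, not what ExifTool does internally: it assumes the raw bytes of the -json output have already been captured, and it replaces any undecodable byte with U+FFFD before parsing. The function name parse_exiftool_json and the sample bytes are hypothetical.

```python
import json


def parse_exiftool_json(raw: bytes):
    """Parse bytes captured from `exiftool -json`, tolerating invalid UTF-8."""
    # errors="replace" turns each undecodable byte into U+FFFD,
    # so the resulting text is always a valid string for json.loads.
    text = raw.decode("utf-8", errors="replace")
    return json.loads(text)


# Hypothetical output from an old version: a stray 0xE9 byte inside the value.
raw = b'[{"SourceFile": "a.jpg", "UserComment": "\\u0012\xe9\\u000F;"}]'
records = parse_exiftool_json(raw)
print(repr(records[0]["UserComment"]))
```

The replacement character marks where data was lost, which may be preferable to silently dropping the byte.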


- Phil