Author Topic: JSON output and invalid user comment

jphautin

  • Newbie
  • *
  • Posts: 2
JSON output and invalid user comment
« on: February 29, 2016, 12:57:42 PM »
Hello,

First, thanks for the great tool.

I am using the JSON output, running the following command from a PHP class to extract metadata:
exiftool -f -struct -json <file location>

Most of the time everything works fine, but sometimes the user comment is not UTF-8 compliant.
In my own photo collection, this is mostly due to my old Samsung phones (S2, S3), which add a strange user comment (I never wrote that!):
"UserComment": "\u0012�\u000F;",
You can find an example in the attachment.
But sometimes the cause is an invalid encoding or something else.

I do not think exiftool can find the correct encoding, and that is fine.
The issue is that the generated JSON is not UTF-8 compliant, so I cannot read it (I tried PHP, Python, JS and Java).
Is there a way to encode the value in base64 (with a specific option), or to remove it when it is not correctly encoded, so that we can be sure the generated JSON is valid UTF-8?

regards,



Phil Harvey

  • ExifTool Author
  • Administrator
  • ExifTool Freak
  • *****
  • Posts: 17049
    • ExifTool Home Page
Re: JSON output and invalid user comment
« Reply #1 on: February 29, 2016, 01:42:27 PM »
I did a bit of research, and as far as I can tell that sequence is valid UTF-8.  Those control characters are valid I think.

However, you may be able to work around the problem by adding the -b option to your command.  In this particular case it will cause the value to be hex-encoded since one of the characters (the one that is converted to a question mark) is not valid UTF-8.

If you can find a reference that says U+0012 or U+000F are not valid UTF-8 characters, then maybe a change to ExifTool would be appropriate, but here is a reference that seems to indicate they are valid.

- Phil

Edit:  Hold on.  I get a question mark in the middle (U+003F), but you get something else.  Maybe this is the problem and not the other characters.  What command are you using, and what version of ExifTool?  This is what I get with ExifTool 10.11:

Code:
> exiftool ~/Desktop/20141231_193051.jpg -usercomment -json
[{
  "SourceFile": "/Users/phil/Desktop/20141231_193051.jpg",
  "UserComment": "\u0012?\u000F;"
}]

And the question mark here is a simple ASCII question mark character (U+003F).
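
The validity question raised in this reply can be checked with a short Python sketch. It is only an illustration: the string is copied from the output above, while the byte `0xFF` is a hypothetical stand-in for whatever invalid byte the old output contained (the thread does not say which byte it was).

```python
import json

# The comment as it appears in the ExifTool 10.11 output quoted above.
comment = "\u0012?\u000F;"

# U+0012 and U+000F are ordinary code points, so strict UTF-8 encoding succeeds.
encoded = comment.encode("utf-8")

# JSON requires control characters inside strings to be escaped;
# json.dumps does this, and json.loads restores the original string.
as_json = json.dumps({"UserComment": comment})
round_trip = json.loads(as_json)["UserComment"]

# Hypothetical raw bytes with a lone 0xFF in place of the question mark.
# 0xFF never occurs in well-formed UTF-8, so strict decoding fails.
bad_bytes = b"\x12\xff\x0f;"
try:
    bad_bytes.decode("utf-8")
    bad_is_valid = True
except UnicodeDecodeError:
    bad_is_valid = False
```

Under these assumptions the control characters round-trip through JSON without trouble, and only the stray byte makes the data undecodable.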

jphautin

  • Newbie
  • *
  • Posts: 2
Re: JSON output and invalid user comment
« Reply #2 on: February 29, 2016, 02:20:51 PM »
Thanks for the help.

I was on CentOS 6 with an old version (8.5x), and what I posted was that version's output exported to a file.
After switching to a Debian machine (version 9.7x), the output shows a '?' character instead.

I am sorry for the inconvenience.

I will redo the process on this machine.

Regards,

Phil Harvey

  • ExifTool Author
  • Administrator
  • ExifTool Freak
  • *****
  • Posts: 17049
    • ExifTool Home Page
Re: JSON output and invalid user comment
« Reply #3 on: March 01, 2016, 09:09:30 AM »
Yes.  This was a 4-year-old bug:

    Jan. 18, 2012 - Version 8.76
      - Patched -json output to filter out invalid UTF-8 characters


- Phil
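
For anyone who cannot upgrade past the version mentioned above, a reader-side workaround might look like the following Python sketch. It is not ExifTool's own filtering, and the helper name `parse_exiftool_json` and the sample bytes are assumptions made for illustration: the idea is to decode the raw output leniently before handing it to the JSON parser.

```python
import json

def parse_exiftool_json(raw: bytes):
    """Parse `exiftool -json` output, tolerating invalid UTF-8 bytes.

    Hypothetical helper: each undecodable byte becomes U+FFFD
    (the Unicode replacement character) instead of raising an error.
    """
    text = raw.decode("utf-8", errors="replace")
    return json.loads(text)

# Simulated output from an old ExifTool build; 0xFF is an assumed invalid byte.
# In real use the bytes might come from something like:
#   subprocess.run(["exiftool", "-f", "-struct", "-json", path],
#                  capture_output=True).stdout
raw = b'[{"SourceFile": "a.jpg", "UserComment": "\\u0012\xff\\u000F;"}]'

records = parse_exiftool_json(raw)
user_comment = records[0]["UserComment"]
```

The trade-off is that the original byte is lost, which is acceptable here because the comment was never meaningful text in the first place.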