Hello,
First, thanks for the great tool.
I am using the JSON output in a PHP class to extract metadata with the following command:
exiftool -f -struct -json <file location>
Most of the time everything works fine, but sometimes the user comment is not UTF-8 compliant.
In my own photo collection, this is mostly due to my old Samsung phones (S2, S3), which add a strange user comment (I never wrote that!):
"UserComment": "\u0012�\u000F;",
You can find an example in the attachment.
But sometimes it is invalid encoding or something else.
I do not think exiftool can find the correct encoding and that is fine.
The issue is that the generated JSON is not UTF-8 compliant and I cannot read it (I tried PHP, Python, JS, and Java).
Is there a way to encode the value in base64 (with a specific option), or to remove it when it is not correctly encoded, so that we can be sure the generated JSON is UTF-8 compliant?
regards,
I did a bit of research, and as far as I can tell that sequence is valid UTF-8. Those control characters are valid I think.
However, you may be able to work around the problem by adding the -b option to your command. In this particular case it will cause the value to be hex-encoded since one of the characters (the one that is converted to a question mark) is not valid UTF-8.
If you can find a reference that says U+0012 or U+000F are not valid UTF-8 characters, then maybe a change to ExifTool would be appropriate, but here is a reference (http://www.fileformat.info/info/charset/UTF-8/list.htm) that seems to indicate they are valid.
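For anyone who wants to check this locally, here is a minimal Python sketch (not part of ExifTool; the helper name is_valid_utf8 is made up for illustration). It suggests that the bytes for U+0012 and U+000F decode as strict UTF-8, while a stray byte such as 0xFF does not. The 0xFF byte is only an assumed stand-in for whatever invalid byte the old output contained.

```python
import json


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8."""
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# U+0012 and U+000F are C0 control characters, one byte each in UTF-8.
print(is_valid_utf8(b"\x12?\x0f;"))

# Assumed example of a bad byte: 0xFF never appears in well-formed UTF-8.
print(is_valid_utf8(b"\x12\xff\x0f;"))

# JSON does not allow raw control characters inside strings, so an encoder
# escapes them, and the value survives a round trip unchanged.
encoded = json.dumps({"UserComment": "\x12?\x0f;"})
decoded = json.loads(encoded)
print(encoded)
```

So the control characters themselves should not stop a JSON parser, as long as they are escaped in the output.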
- Phil
Edit: Hold on. I get a question mark in the middle (U+003F), but you get something else. Maybe this is the problem and not the other characters. What command are you using, and what version of ExifTool? This is what I get with ExifTool 10.11:
> exiftool ~/Desktop/20141231_193051.jpg -usercomment -json
[{
"SourceFile": "/Users/phil/Desktop/20141231_193051.jpg",
"UserComment": "\u0012?\u000F;"
}]
And the question mark here is a simple ASCII question mark character (U+003F).
Thanks for the help.
I was on CentOS 6 with an old version (8.5x), and I exported what I had to a file.
Switching to a Debian machine (version 9.7x) shows a '?' character instead.
I am sorry for the inconvenience.
I will redo the process on this machine.
Regards,
Yes. This was a 4-year-old bug:
Jan. 18, 2012 - Version 8.76
- Patched -json output to filter out invalid UTF-8 characters
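If someone is stuck on a version older than 8.76, a consumer-side workaround might look like the Python sketch below. It is not what the ExifTool patch does internally. It only assumes that the output is otherwise well-formed JSON with a few stray invalid bytes, and the helper name load_json_lenient and the sample data are made up for illustration.

```python
import json


def load_json_lenient(raw: bytes):
    """Parse JSON bytes, replacing invalid UTF-8 sequences with U+FFFD."""
    # errors="replace" swaps each undecodable byte for the Unicode
    # replacement character instead of raising UnicodeDecodeError.
    text = raw.decode("utf-8", errors="replace")
    return json.loads(text)


# Hypothetical sample of old-style output: 0xFF is not valid UTF-8.
sample = b'[{"SourceFile": "a.jpg", "UserComment": "\\u0012\xff\\u000F;"}]'

records = load_json_lenient(sample)
print(repr(records[0]["UserComment"]))
```

In practice the bytes would come from the exiftool process output (for example the stdout of subprocess.run with capture_output=True) rather than from a literal.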
- Phil