decoding 'num' format

Started by jwag, April 16, 2019, 09:55:50 PM

jwag

I recently ran into the following output from exiftool:
    "AFPointsInFocus1D": {
      "desc": "AF Points In Focus 1D",
      "num": "J\u0000\u0000\u0000\u0004",
      "val": "C6 (C6)"
    },

and this caused my Python DB encoder to freak out about an invalid Unicode constant.

Just how should I interpret the "num" field?

This is the command line:
exiftool -groupHeadings -All -json -long

Phil Harvey

Use the -v3 option without -json to see what the field actually contains.  The numerical value is a Unicode representation of a binary value.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

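A minimal Python sketch of reading such a "num" value back as bytes. It assumes, following Phil's description, that each character in the JSON string stands for one byte, so every code point is below 256. The helper name `num_to_bytes` is hypothetical and not part of ExifTool.

```python
import json

# The "num" field exactly as it appeared in the -json output above.
doc = json.loads(r'{"num": "J\u0000\u0000\u0000\u0004", "val": "C6 (C6)"}')

def num_to_bytes(num: str) -> bytes:
    """Hypothetical helper: map each character to one byte.

    Assumes every code point is in the range 0-255. The latin-1 codec maps
    those code points one-to-one onto byte values and raises
    UnicodeEncodeError for anything larger.
    """
    return num.encode("latin-1")

raw = num_to_bytes(doc["num"])
print(raw.hex(" "))  # 4a 00 00 00 04
```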
jwag

thanks -
AFPointsInFocus1D = J.
  | | |     - Tag 0x0094 (8 bytes, undef[8]):
  | | |         0ad0: 4a 00 00 00 04 00 00 00

jwag

Ok - working on this some more - the JSON produced is:

"num": "J\u0000\u0000\u0000\u0004",

but the raw value is:

0ad0: 4a 00 00 00 04 00 00 00

So it isn't really the same, right? The binary value is 8 bytes, whereas the 'encoded' value is 5 bytes: the final 3 (null) bytes are left off. That sort of makes sense for strings, but doesn't make sense if this is trying to represent the 'binary' value. This means one can't actually recreate the actual binary value...
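To make the mismatch concrete, here is a small Python check. It is only a sketch: it assumes the one-character-per-byte reading of "num" and parses the -v3 hex dump line by hand.

```python
import json

# Hex dump line copied from the -v3 output (offset, then the 8 data bytes).
dump_line = "0ad0: 4a 00 00 00 04 00 00 00"
raw_bytes = bytes.fromhex(dump_line.split(":", 1)[1])  # fromhex ignores the spaces

# "num" value from the -json output, decoded one character per byte.
num = json.loads(r'"J\u0000\u0000\u0000\u0004"')
num_bytes = num.encode("latin-1")

print(len(raw_bytes), len(num_bytes))   # 8 5
# The JSON value matches the start of the raw data...
print(raw_bytes.startswith(num_bytes))  # True
# ...but the three trailing null bytes are missing, so the round trip loses data.
print(raw_bytes == num_bytes)           # False
```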

Phil Harvey

You're right of course.  The default ExifTool JSON routines don't handle this "num" value very well.  To do this properly I should probably use ASCII-hex as an intermediate "num" value, but I don't know if this would be worth the trouble (considering there are probably more tags like this and in general nobody is interested in the "num" values of these anyway).

- Phil

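The ASCII-hex idea can be sketched in Python as follows. This is a hypothetical output format used for illustration, not something ExifTool currently emits.

```python
import json

raw = bytes([0x4A, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00])  # the 8 bytes from -v3

# Hypothetical encoding: store "num" as an ASCII-hex string instead of raw characters.
encoded = json.dumps({"num": raw.hex()})
print(encoded)  # {"num": "4a00000004000000"}

# Decoding restores all 8 bytes, trailing nulls included.
decoded = bytes.fromhex(json.loads(encoded)["num"])
assert decoded == raw
```

The hex string is plain ASCII, so it survives any JSON consumer, at the cost of doubling the size of the value.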
jwag

Fair enough - I am not that interested either  8) - until it broke my DB encoder. Easy enough to work around.

Thanks

Phil Harvey

I'm interested to know why your encoder thinks that "\u0000" is an invalid Unicode character.

As far as I know this is valid (for example).

- Phil

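For reference, a quick check with Python's standard `json` module (used here only as one example of a conforming parser) shows that it parses and re-emits the escape without error. The JSON specification (RFC 8259) allows any code point, including U+0000, to be written as a `\u` escape inside a string.

```python
import json

text = r'{"num": "J\u0000\u0000\u0000\u0004"}'
value = json.loads(text)["num"]
print(len(value))        # 5
print("\x00" in value)   # True

# Control characters are escaped again on output, so the value round-trips.
print(json.dumps(value))  # "J\u0000\u0000\u0000\u0004"
```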
jwag

This appears to be a limitation/convention of PostgreSQL, introduced a few years back:

Here is a Stack Overflow question that describes precisely what I saw:

https://stackoverflow.com/questions/31671634/handling-unicode-sequences-in-postgresql

From the changelog:
"The json type did not have the storage-ambiguity problem, but it did have the problem of inconsistent de-escaped textual output. Therefore \u0000 will now also be rejected in json values when conversion to de-escaped form is required. This change does not break the ability to store \u0000 in json columns so long as no processing is done on the values. This is exactly parallel to the cases in which non-ASCII Unicode escapes are allowed when the database encoding is not UTF8."

My thought is that if you really want to make arbitrary binary into a json acceptable 'string' - using something like base64 encoding is the simplest and universally understood.
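A minimal Python sketch of the base64 approach, using the standard `base64` module. Putting the result in a field named "num" is an assumption for illustration, not an existing ExifTool option.

```python
import base64
import json

raw = bytes.fromhex("4a00000004000000")  # the 8 bytes from the -v3 dump

# base64 turns arbitrary binary into plain ASCII, which is safe in any JSON string.
doc = {"num": base64.b64encode(raw).decode("ascii")}
text = json.dumps(doc)
print(text)  # {"num": "SgAAAAQAAAA="}

# Decoding gives back all 8 bytes, trailing nulls included.
restored = base64.b64decode(json.loads(text)["num"])
assert restored == raw
```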

Phil Harvey

Thanks for the explanation.

Quote from: jwag on April 17, 2019, 08:03:30 PM
My thought is that if you really want to make arbitrary binary into a json acceptable 'string' - using something like base64 encoding is the simplest and universally understood.

Right.  In fact, this does happen to the binary values that make it to the final converted "val" (with the -b option).  But the "num" values aren't necessarily designed to be human readable.  I should maybe rethink this though, and reformat these tags.  It probably isn't a large number of tags that do this, but going through the 22000+ pre-defined tags to figure out which ones will take a bit of time.  Alternatively, I could just sanitize all output JSON strings to remove the nulls.  This would be quick and dirty, but may be the thing to do.

- Phil
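Until the output changes, the same null removal can be done on the consumer side before the data reaches PostgreSQL. A minimal Python sketch; the helper name `strip_nuls` is hypothetical. Note that it is lossy: once the nulls are discarded, the original binary value can no longer be recovered from "num".

```python
import json
from typing import Any

def strip_nuls(obj: Any) -> Any:
    """Hypothetical helper: recursively remove U+0000 from every string
    (keys and values) in parsed JSON. The removed null bytes are lost."""
    if isinstance(obj, str):
        return obj.replace("\x00", "")
    if isinstance(obj, list):
        return [strip_nuls(item) for item in obj]
    if isinstance(obj, dict):
        return {strip_nuls(k): strip_nuls(v) for k, v in obj.items()}
    return obj  # numbers, booleans and None pass through unchanged

tags = json.loads(
    r'{"AFPointsInFocus1D": {"num": "J\u0000\u0000\u0000\u0004", "val": "C6 (C6)"}}'
)
clean = strip_nuls(tags)
print(json.dumps(clean))  # {"AFPointsInFocus1D": {"num": "J\u0004", "val": "C6 (C6)"}}
```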