XML files format strings as floats when json flag is used

Started by ytreister, October 14, 2021, 08:51:29 AM

ytreister

Given the following simple XML file:

<?xml version="1.0"?>
<item>
    <a id="1E500"></a>
    <b id="1A500"></b>
    <c id="1E"> </c>
    <d id="1E1"> </d>
</item>

If I use the JSON output flag exiftool -j sample.xml, I get:

[{
"ItemAId": 1E500,
"ItemBId": "1A500",
"ItemCId": "1E",
"ItemDID": 1E1
}]

When using defaults, the output contains:

Item A Id       : 1E500
Item B Id       : 1A500
Item C Id       : 1E
Item D Id       : 1E1

I feel like the fact that JSON output interprets certain strings to floats is problematic. I attached what happens when I read the exiftool output into Python.  Item D gets set to 10.0 and Item A to inf in the Python dictionary.

ytreister

Forgot to mention in original post that this was running exiftool version 12.25 on Linux OS.

As a temporary workaround for my Python ingest of the exiftool output:

import json

with open('output.json', 'r') as f:
    j = json.load(f, parse_float=str)

The parse_float=str option makes json.load return each float token as its original string, so I do not have this problem.
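
For anyone who wants to reproduce this without the attachment, here is a minimal sketch. The JSON text is copied from the exiftool -j output in the first post; the variable names (raw, default, as_text) are only illustrative.

```python
import json

# JSON text as printed by `exiftool -j sample.xml` in the first post
raw = '[{"ItemAId": 1E500, "ItemBId": "1A500", "ItemCId": "1E", "ItemDID": 1E1}]'

# Default parsing: the unquoted 1E500 and 1E1 are read as floats
default = json.loads(raw)[0]
print(default["ItemAId"])  # inf (1E500 overflows a double)
print(default["ItemDID"])  # 10.0

# With parse_float=str, each float token keeps its original text
as_text = json.loads(raw, parse_float=str)[0]
print(as_text["ItemAId"])  # 1E500
print(as_text["ItemDID"])  # 1E1
```

Values that exiftool already quoted, such as "1A500" and "1E", are strings in both cases.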

StarGeek

Quote from: ytreister on October 14, 2021, 08:51:29 AM
I feel like the fact that JSON output interprets certain strings to floats is problematic.

How would you differentiate between 1E500 as a string and 1E500 as scientific notation, e.g. 1x10^500?

ytreister

Quote from: StarGeek on October 14, 2021, 10:53:46 AM
Quote from: ytreister on October 14, 2021, 08:51:29 AM
I feel like the fact that JSON output interprets certain strings to floats is problematic.

How would you differentiate between 1E500 as a string and 1E500 as scientific notation, e.g. 1x10^500?
You don't. Exiftool should just keep it as a string, and whoever parses the exiftool output can decide if they want to convert the string to a float, etc.  The output could represent many things: int, float, hexadecimal, base64, string, etc.

Phil Harvey

The same problem would apply to integers.  They are strings too, but more useful in JSON if they are integers.  I think if I quoted everything then some people would be unhappy.  This should probably be an option, but I don't think it is worth creating a new option just for this.

- Phil
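
On the parsing side, the same approach can cover integers as well. The sketch below is only an illustration of what a Python consumer could do, not an exiftool feature: it keeps every numeric token as text using parse_float and parse_int from Python's json module, then converts only the fields the caller knows to be numeric. The "Count" key and the numeric_keys set are hypothetical and do not come from the sample file.

```python
import json

# Hypothetical sample: "Count" is a made-up key, added to show an integer token
raw = '[{"ItemAId": 1E500, "ItemDID": 1E1, "Count": 42}]'

# Keep every numeric token (float or integer) as its original text
tags = json.loads(raw, parse_float=str, parse_int=str)[0]
print(tags)  # {'ItemAId': '1E500', 'ItemDID': '1E1', 'Count': '42'}

# Convert only the fields known to be numeric (assumption: chosen by the caller)
numeric_keys = {"Count"}
for key in numeric_keys:
    tags[key] = int(tags[key])

print(tags["Count"] + 1)  # 43
```

This leaves the decision about each value with whoever reads the output, at the cost of an explicit conversion step for the fields that really are numbers.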