Difference in output between text, JSON and XML format

Started by Mac2, September 14, 2021, 07:35:03 AM

Mac2

I have a JPG file (and an NRW) produced by a Nikon COOLPIX P330.
The file contains data in the [Nikon] location field.

When I extract the data with exiftool -G1 -Nikon:location I get "Chiesa di San Leonardo".

When I extract the data with exiftool -G1 -Nikon:location -json I get:

[{
  "SourceFile": "E:/data/download/community/11746/DSCN9362.JPG",
  "Nikon:Location": "Chiesa di San Leonardo"
}]


But when I use XML output with exiftool -G1 -Nikon:location -X I get:

<Nikon:Location rdf:datatype='http://www.w3.org/2001/XMLSchema#base64Binary'>
Q2hpZXNhIGRpIFNhbiBMZW9uYXJkbwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==
</Nikon:Location>


which is unexpected. My application uses XML output to import ExifTool data.
Why does XML use Base64-encoding but the normal and JSON formats do not?
When I decode the Base64 stream, I get

Chiesa di San Leonardo\0\0\0\0\0\0\0\0\0....

so there seems to be an issue which makes ExifTool switch to Base64-encoding for XML output only?


Phil Harvey

Trailing nulls are removed from the JSON output unless the -b option is used.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Mac2

Phil, thank you for looking into this.
Maybe I did not explain it right. When I extract the tag value from the file, the XML data returned is a binary blob, while the JSON and plain text output is correct:

C:\exiftool.exe -G1 -Nikon:location file.jpg
[Nikon]         Location                        : Chiesa di San Leonardo

C:\exiftool.exe -G1 -Nikon:location -json file.jpg
[{
  "SourceFile": "E:/data/download/community/11746/DSCN9362.jpg",
  "Nikon:Location": "Chiesa di San Leonardo"
}]

C:\exiftool.exe -G1 -Nikon:location -X file.jpg
<?xml version='1.0' encoding='UTF-8'?>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>

<rdf:Description rdf:about='E:/data/download/community/11746/DSCN9362.jpg'
  xmlns:et='http://ns.exiftool.org/1.0/' et:toolkit='Image::ExifTool 12.26'
  xmlns:Nikon='http://ns.exiftool.org/MakerNotes/Nikon/1.0/'>
<Nikon:Location rdf:datatype='http://www.w3.org/2001/XMLSchema#base64Binary'>
Q2hpZXNhIGRpIFNhbiBMZW9uYXJkbwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==
</Nikon:Location>
</rdf:Description>
</rdf:RDF>


How can I make ExifTool return the same value for the XML output it does for JSON and plain text?
When I add -b to the -X, there is no difference. My application uses XML to ingest data from ExifTool.


Phil Harvey

Perhaps I didn't explain clearly.  The JSON output removes trailing nulls by default.  The XML output doesn't.  You can make both outputs the same by adding -b to the JSON command, but that isn't the direction you want.  I can't say offhand why the XML preserves trailing nulls, but I'm sure there's a historical reason for this.  There is no way to change this currently.

- Phil

Edit:  Perhaps you could use the -api filter option to remove the nulls if you are only interested in textual tag values.
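
A minimal Python sketch of how the -api filter idea might be wired into an application. It assumes ExifTool's Filter API option (a Perl expression applied to each returned value) takes effect before the XML writer decides whether to Base64-encode the value; that has not been confirmed in this thread. The helper names build_exiftool_args and read_location_xml and the executable name "exiftool" are hypothetical.

```python
import subprocess


def build_exiftool_args(path, exiftool="exiftool"):
    """Build an argument list for XML output with trailing NULs filtered.

    Assumption: the Filter expression runs on each tag value before the
    XML output is written. The expression itself is Perl, not Python.
    """
    return [
        exiftool,
        "-G1",
        "-Nikon:Location",
        "-X",
        "-api", r"Filter=s/\0+$//",  # Perl substitution: drop trailing NUL bytes
        path,
    ]


def read_location_xml(path):
    """Run ExifTool and return its XML output as text (requires exiftool on PATH)."""
    result = subprocess.run(
        build_exiftool_args(path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with the backslash and dollar sign in the filter expression.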

Mac2

As far as I understand this, the Nikon:location tag contains some 0s or binary data after the actual text (maybe padding?)

For the plain text and JSON output, ExifTool somehow deals with this and produces the clear text output.
For the XML output, ExifTool sees that the tag value contains non-printable characters and decides to output it as a Base64-encoded blob.
Which is what my application sees and ignores - since it has no clue what to do with the data.

The only way I see to deal with this, for now, is to add a special case in my application for this tag.
Decoding the Base64, stripping all non-printable characters and importing the rest. This should do.
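
The special case described above could look roughly like this in Python. This is only a sketch: it assumes the padding consists of trailing NUL bytes (as Phil describes) and that the text is UTF-8, neither of which is guaranteed for every tag. The function name decode_padded_text is made up for illustration, and the sample value is rebuilt from the known text rather than copied from the ExifTool output.

```python
import base64


def decode_padded_text(b64_text, encoding="utf-8"):
    """Decode a Base64 tag value and drop trailing NUL padding."""
    # In its default (non-validating) mode, b64decode skips characters
    # outside the Base64 alphabet, so embedded line breaks are tolerated.
    raw = base64.b64decode(b64_text)
    # Assumption: the padding is NUL bytes at the end of the value only.
    return raw.rstrip(b"\x00").decode(encoding, errors="replace")


# Sample resembling the value in this thread: text plus NUL padding,
# wrapped over several lines the way the XML output wraps it.
padded = b"Chiesa di San Leonardo" + b"\x00" * 48
sample = base64.encodebytes(padded).decode("ascii")

print(decode_padded_text(sample))  # prints: Chiesa di San Leonardo
```

Stripping only trailing NULs is a narrower rule than removing every non-printable character; it leaves legitimate content such as accented letters or embedded line breaks alone.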

Mac2

I've added a special case for this tag so the user is able to see the data.

I noticed that the tag value in the XML starts with a \n (line feed), which the Base64-decoding lib I use does not accept as a valid character.
That is why it rejected the data initially. I now strip all characters outside the Base64 alphabet (a-z, A-Z, 0-9, +, /, =) from the value returned by ExifTool in the XML before handing it over to the Base64 decoder. This works well.
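
For anyone who would rather handle this generically than per tag, here is a hedged Python sketch that walks the -X output and decodes any element marked with the base64Binary datatype. It assumes every such value is NUL-padded UTF-8 text, which will not hold for genuinely binary tags, so a real importer would need a check before treating the result as text. The function name tag_values and the sample document are illustrative only.

```python
import base64
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BASE64_TYPE = "http://www.w3.org/2001/XMLSchema#base64Binary"


def tag_values(xml_text, encoding="utf-8"):
    """Return {qualified tag name: text} for the tags in ExifTool -X output."""
    values = {}
    root = ET.fromstring(xml_text)
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        for elem in desc:
            text = elem.text or ""
            if elem.get(f"{{{RDF_NS}}}datatype") == BASE64_TYPE:
                # Drop line feeds and anything else outside the Base64 alphabet.
                compact = re.sub(r"[^A-Za-z0-9+/=]", "", text)
                raw = base64.b64decode(compact, validate=True)
                # Assumption: the value is text with trailing NUL padding.
                text = raw.rstrip(b"\x00").decode(encoding, errors="replace")
            values[elem.tag] = text
    return values


# Small stand-in for the XML shown earlier; the payload is regenerated
# from the known text instead of being copied from the real output.
payload = base64.encodebytes(b"Chiesa di San Leonardo" + b"\x00" * 48).decode("ascii")
sample_xml = (
    "<?xml version='1.0' encoding='UTF-8'?>\n"
    f"<rdf:RDF xmlns:rdf='{RDF_NS}'>\n"
    "<rdf:Description rdf:about='file.jpg'\n"
    "  xmlns:Nikon='http://ns.exiftool.org/MakerNotes/Nikon/1.0/'>\n"
    f"<Nikon:Location rdf:datatype='{BASE64_TYPE}'>\n{payload}</Nikon:Location>\n"
    "</rdf:Description>\n"
    "</rdf:RDF>\n"
)

values = tag_values(sample_xml)
print(values)
```

The keys come back in ElementTree's {namespace}name form, so Nikon:Location appears under the Nikon MakerNotes namespace URI rather than under the Nikon: prefix.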

I just assume that the difference in the returned value between plain text/JSON and XML has historical reasons and is as it is.