ExifTool Forum

ExifTool => Developers => Topic started by: Mac2 on February 07, 2024, 10:11:14 AM

Title: XML parser error for images containing [JSON] / [JUMBF] tags
Post by: Mac2 on February 07, 2024, 10:11:14 AM
When I import data from ExifTool into my application (IMatch) and the files contain CAI/C2PA data, the import fails because the XML produced by ExifTool is invalid.

I've used the official sample images from the project's GitHub: https://github.com/c2pa-org/public-testfiles/tree/main/image/jpeg

Both the Microsoft XML parser (Windows) and the XML-Parser/Validator in Visual Studio Code report this node:

<JSON:Author>
 <rdf:Description et:id='author' et:table='JSON::Main'>
  <et:desc>Author</et:desc>
  <et:prt rdf:parseType='Resource'>
   <JSON:@type>Person</JSON:@type>
   <JSON:name>Adobe make_test</JSON:name>
  </et:prt>
 </rdf:Description>
</JSON:Author>

and complain about the JSON:@type element name with "Element or attribute do not match QName production: QName::=(NCName':')?NCName".

When I replace the node name with <JSON:at-type></JSON:at-type> before parsing it with MSXML, the error is gone and my software can ingest the data as usual.
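
For readers who want to try the same workaround, here is a minimal sketch in Python. It is not the code IMatch uses. The helper names (patch_at_names, parses), the sample document, and the JSON namespace URI are made up for illustration. It rewrites a leading "@" in prefixed element names to "at-" before the text is handed to a parser:

```python
import re
import xml.etree.ElementTree as ET

# Tiny stand-in for the ExifTool output; the JSON namespace URI is a placeholder.
RAW = (
    "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'"
    " xmlns:JSON='http://example.invalid/JSON'>\n"
    " <JSON:@type>Person</JSON:@type>\n"
    " <JSON:name>Adobe make_test</JSON:name>\n"
    "</rdf:RDF>"
)

def patch_at_names(xml_text):
    """Rewrite <prefix:@name> and </prefix:@name> to <prefix:at-name>."""
    # Only touches '@' directly after the 'prefix:' of an opening or closing tag.
    return re.sub(r"(</?[A-Za-z_][\w.-]*:)@", r"\1at-", xml_text)

def parses(xml_text):
    """Return True if the text is accepted by the standard-library XML parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

if __name__ == "__main__":
    print("raw parses:    ", parses(RAW))
    print("patched parses:", parses(patch_at_names(RAW)))
```

This only covers the "@" case and is a stopgap on the consumer side; other invalid characters in element names would need their own handling.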
Title: Re: XML parser error for images containing [JSON] / [JUMBF] tags
Post by: Phil Harvey on February 07, 2024, 11:11:44 AM
I am having trouble trying to reproduce this.

Which specific sample image, and what ExifTool command line did you use?

- Phil

Edit: Ah, OK.  I can see it now by adding the -struct option.
Title: Re: XML parser error for images containing [JSON] / [JUMBF] tags
Post by: Phil Harvey on February 07, 2024, 11:55:15 AM
Got it.  I'll add strict XML attribute name validation for structure elements in the next release (12.77), which should fix the issue.

- Phil
Title: Re: XML parser error for images containing [JSON] / [JUMBF] tags
Post by: Mac2 on February 07, 2024, 12:12:37 PM
Sorry, I should have included the parameters I use to extract the data, which do indeed include -struct.

Thanks for looking into this. It's not urgent, but I expect to see more and more images with embedded CAI data in the future.
Title: Re: XML parser error for images containing [JSON] / [JUMBF] tags
Post by: Mac2 on February 07, 2024, 12:47:15 PM
There are more XML errors.
The data in the attached JPG file fails to load, with the error message
<CBOR:actions[1].action rdf:parseType='Resource'>
'A name contained an invalid character.'
Every node name with [1] is rejected as illegal.
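
To see why such names are rejected, the local part of an element name can be checked against the NCName rule quoted by the parser. The sketch below is a simplified, ASCII-only approximation of that rule (the real XML production also allows many non-ASCII characters), and the function name is hypothetical:

```python
import re

# ASCII-only approximation of the XML NCName production:
# first character is a letter or '_', the rest are letters, digits, '_', '.', or '-'.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*\Z")

def is_valid_local_name(name):
    """Return True if name looks like a valid (ASCII) XML local name."""
    return bool(NCNAME_ASCII.match(name))

if __name__ == "__main__":
    for candidate in ("actions[1].action", "@type", "name", "at-type"):
        print(candidate, is_valid_local_name(candidate))
```

The square brackets in "actions[1].action" and the "@" in "@type" fall outside the allowed character set, while the dot and hyphen are fine as long as they are not the first character.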

I have created this JPG image for testing purposes with Stable Diffusion and used Photoshop to save it with Content Credentials enabled.

These are the parameters used to extract the metadata:

-overwrite_original
-charset
FILENAME=UTF8
-tagsfromfile
c:\images\001 copy.jpg
-all:all
-api
struct=2
-use
MWG
--preview:all
-@
v:\exiftool\arg_files\exif2xmp.args
--Exif:rating
-@
v:\exiftool\arg_files\iptc2xmp.args
-@
v:\exiftool\arg_files\gps2xmp.args
C:\temp\imt8B23A0F7-3976-422C-A096-2AA8F83C5D26.xmp
-execute


Title: Re: XML parser error for images containing [JSON] / [JUMBF] tags
Post by: Phil Harvey on February 07, 2024, 02:42:55 PM
Thanks.  The patch will force all structure field names to conform with the XML specification, removing all invalid characters.  The only exception is that I will allow "xml" as the first 3 letters in a field name, which may not be strictly allowed by the spec, but I've tested and it still passes my XML validator (https://www.w3schools.com/xml/xml_validator.asp).

- phil
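
As a rough illustration of the idea described in the post above (removing characters that are not valid in an XML name), here is a Python sketch. It is not ExifTool's actual code, which is written in Perl and not shown in this thread. The function name is hypothetical, the character set is an ASCII-only simplification, and prefixing an underscore when the result would start with an invalid character is purely an assumption of this sketch:

```python
import re

# Characters outside this ASCII subset of XML name characters are dropped.
_INVALID_CHARS = re.compile(r"[^A-Za-z0-9_.-]")

def sanitize_field_name(name):
    """Return name with characters that are invalid in an XML name removed."""
    cleaned = _INVALID_CHARS.sub("", name)
    # A name must start with a letter or '_'. Prefixing '_' is an assumption
    # of this sketch, not something stated in the thread.
    if not cleaned or not re.match(r"[A-Za-z_]", cleaned):
        cleaned = "_" + cleaned
    return cleaned

if __name__ == "__main__":
    for field in ("@type", "actions[1].action", "xmlData"):
        print(field, "->", sanitize_field_name(field))
```

Names beginning with "xml" are passed through unchanged here, matching the exception mentioned in the post.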
Title: Re: XML parser error for images containing [JSON] / [JUMBF] tags
Post by: Mac2 on February 08, 2024, 03:09:45 AM
Excellent. Thank you.
Hopefully the Microsoft XML parser will accept these node names too.