When I import data from ExifTool into my application (IMatch) and the files contain CAI/C2PA data, the import fails because the XML produced by ExifTool is invalid.
I've used the official sample images from the project's GitHub: https://github.com/c2pa-org/public-testfiles/tree/main/image/jpeg
Both the Microsoft XML parser (Windows) and the XML-Parser/Validator in Visual Studio Code report this node:
<JSON:Author>
  <rdf:Description et:id='author' et:table='JSON::Main'>
    <et:desc>Author</et:desc>
    <et:prt rdf:parseType='Resource'>
      <JSON:@type>Person</JSON:@type>
      <JSON:name>Adobe make_test</JSON:name>
    </et:prt>
  </rdf:Description>
</JSON:Author>
Both reject JSON:@type with the error "Element or attribute do not match QName production: QName::=(NCName':')?NCName".
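To check element names against that production before handing the XML to a parser, a minimal Python sketch could look like this. It is an ASCII-only approximation of the NCName rule from the Namespaces in XML spec (the real rule also allows many non-ASCII letters), and `is_qname` is a hypothetical helper, not part of ExifTool or IMatch.

```python
import re

# ASCII-only approximation of NCName: a letter or '_' followed by
# letters, digits, '.', '_' or '-'. Non-ASCII name characters are ignored here.
NCNAME = r"[A-Za-z_][A-Za-z0-9._\-]*"
QNAME_RE = re.compile(rf"(?:{NCNAME}:)?{NCNAME}")

def is_qname(name: str) -> bool:
    """Return True if name matches QName ::= (NCName ':')? NCName (ASCII subset)."""
    return QNAME_RE.fullmatch(name) is not None

for name in ["JSON:Author", "JSON:@type", "JSON:at-type"]:
    print(f"{name!r}: {'valid' if is_qname(name) else 'invalid'}")
```

The '@' is what breaks JSON:@type: it is not a legal name character, so neither the prefixed nor the unprefixed reading of the name matches.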
When I replace the node name with <JSON:at-type></JSON:at-type> before parsing it with MSXML, the error is gone and my software can ingest the data as usual.
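The same rename can be done as a text pre-processing step before parsing. This is a minimal Python sketch, assuming the only problem is an '@' directly after the namespace prefix of an element name; `rewrite_at_names` is hypothetical, and the namespace URI in the sample is a placeholder, not the one ExifTool actually writes.

```python
import re
import xml.etree.ElementTree as ET

# Matches "<PREFIX:@" or "</PREFIX:@" in start and end tags only.
TAG_AT_RE = re.compile(r"(</?[A-Za-z_][\w.\-]*:)@")

def rewrite_at_names(xml_text: str) -> str:
    """Hypothetical workaround: turn <JSON:@type> into <JSON:at-type>."""
    return TAG_AT_RE.sub(r"\1at-", xml_text)

# Placeholder namespace URI, used only so the fragment parses on its own.
sample = """<root xmlns:JSON='urn:example:json'>
<JSON:Author><JSON:@type>Person</JSON:@type><JSON:name>Adobe make_test</JSON:name></JSON:Author>
</root>"""

try:
    ET.fromstring(sample)
except ET.ParseError as err:
    print("original fails:", err)

root = ET.fromstring(rewrite_at_names(sample))
print(root.find("{urn:example:json}Author/{urn:example:json}at-type").text)
```

Python's expat-based parser rejects the original for the same reason MSXML does; after the rename, the at-type element parses and its text is "Person".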
I am having trouble trying to reproduce this.
Which specific sample image, and what ExifTool command line did you use?
- Phil
Edit: Ah, OK. I can see it now when I add the -struct option.
Got it. I'll add strict XML attribute name validation for structure elements in the next release (12.77), which should fix the issue.
- Phil
Sorry, I should have included the parameters I use to extract the data, which do indeed use -struct.
Thanks for looking into this. It's not urgent, but I expect to see more and more images with embedded CAI data in the future.
There are more XML errors.
The data in the attached JPG file fails to load with the error message 'A name contained an invalid character.' for this node:
<CBOR:actions[1].action rdf:parseType='Resource'>
Every node name containing [1] is rejected as invalid.
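To see exactly which characters trip the parser, a small Python sketch can list them. It uses an ASCII-only approximation of the XML name-character rules (letters, digits, '.', '_', '-'), checks only the part after the namespace prefix, and `invalid_name_chars` is a hypothetical helper.

```python
import re

# ASCII-only approximation of an XML name character (':' excluded,
# since it only separates the namespace prefix from the local name).
NAME_CHAR = re.compile(r"[A-Za-z0-9._\-]")

def invalid_name_chars(qname: str) -> list[str]:
    """List characters in the local part of qname that are not legal in an XML name."""
    local = qname.split(":", 1)[-1]
    return [ch for ch in local if not NAME_CHAR.fullmatch(ch)]

print(invalid_name_chars("CBOR:actions[1].action"))
print(invalid_name_chars("CBOR:actions1.action"))
```

For the node above, only the square brackets are reported; the digit and the dot are allowed inside a name.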
I have created this JPG image for testing purposes with Stable Diffusion and used Photoshop to save it with Content Credentials enabled.
These are the parameters used to extract the metadata:
-overwrite_original
-charset
FILENAME=UTF8
-tagsfromfile
c:\images\001 copy.jpg
-all:all
-api
struct=2
-use
MWG
--preview:all
-@
v:\exiftool\arg_files\exif2xmp.args
--Exif:rating
-@
v:\exiftool\arg_files\iptc2xmp.args
-@
v:\exiftool\arg_files\gps2xmp.args
C:\temp\imt8B23A0F7-3976-422C-A096-2AA8F83C5D26.xmp
-execute
Thanks. The patch will force all structure field names to conform to the XML specification by removing all invalid characters. The only exception is that I will allow "xml" as the first three letters of a field name. This may not be strictly allowed by the spec, but I've tested it and it still passes my XML validator (https://www.w3schools.com/xml/xml_validator.asp).
- phil
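The sanitizing rule described above (drop invalid characters, but leave names beginning with "xml" alone) could be sketched in Python roughly as follows. This is not ExifTool's actual code, which is written in Perl; `sanitize_field_name`, the ASCII-only character classes, and the "_" fallback for names with no valid characters left are all assumptions for illustration.

```python
import re

# ASCII-only approximations of XML name rules; the real spec also allows
# many non-ASCII characters. ':' is excluded because it marks a prefix.
INVALID_CHARS = re.compile(r"[^A-Za-z0-9._\-]")
INVALID_START = re.compile(r"^[^A-Za-z_]+")

def sanitize_field_name(name: str) -> str:
    """Hypothetical helper: remove characters that may not appear in an XML name.

    Names starting with "xml" are deliberately not rejected, even though the
    XML spec reserves that prefix, mirroring the exception described above.
    """
    cleaned = INVALID_CHARS.sub("", name)
    cleaned = INVALID_START.sub("", cleaned)  # a name must start with a letter or '_'
    return cleaned or "_"  # hypothetical fallback when nothing valid remains

for raw in ["@type", "actions[1].action", "xmlData", "3dModel"]:
    print(raw, "->", sanitize_field_name(raw))
```

With this rule, @type becomes type and actions[1].action becomes actions1.action. Note that removing characters can make two distinct field names collide, which a real implementation may need to handle.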
Excellent. Thank you.
Hopefully the Microsoft XML parser will accept these node names too.