ExifTool Forum

ExifTool => Bug Reports / Feature Requests => Topic started by: blue-j on August 05, 2024, 02:45:07 PM

Title: XML Parsing
Post by: blue-j on August 05, 2024, 02:45:07 PM
I am always learning, and often wrong.  I'm curious if parsing XML-based metadata might be much easier if a DTD or XSD were provided for each supported namespace?  I see these:

    https://metacpan.org/dist/XML-Validator-Schema

    https://metacpan.org/pod/XML::LibXML::Schema

    https://xerces.apache.org/xerces-p/

From my amateur viewpoint, they look promising.  With schema-based parsing, the only XML that gets recognized and parsed is that which has a schema document; everything else is ignored/unparsed.  Thoughts?

- J
Title: Re: XML Parsing
Post by: Phil Harvey on August 05, 2024, 08:06:35 PM
I haven't considered using XSD.  Most of the XML that ExifTool parses is proprietary anyway, so I would have to generate the XSD myself, and write the code to interpret the XML based on the XSD.  It just doesn't sound like much fun.

- Phil
Title: Re: XML Parsing
Post by: blue-j on August 07, 2024, 04:08:16 PM
Fair!  I have nothing but gratitude for your work.  : )

Upon more research, I've discovered there are a number of mature, respected utilities for converting multiple XML documents into a single XSD schema very quickly.  I'm installing a few now to test.  I don't have access to a Windows machine (macOS at home, Ubuntu on servers), but XMLSpy (https://www.altova.com/xmlspy-xml-editor) looks nice:

https://www.altova.com/blog/generating-a-schema-from-multiple-xml-instances/
https://www.altova.com/xmlspy-xml-editor

Not cheap though.  I'm currently testing Apache XMLBeans libraries (https://xmlbeans.apache.org/) first, and I see that Microsoft also has a very well-regarded tool (https://learn.microsoft.com/en-us/dotnet/standard/serialization/xml-schema-definition-tool-xsd-exe) that I've read can work with Mono on macOS.  Will keep you apprised!

- J


Title: Re: XML Parsing
Post by: blue-j on August 07, 2024, 05:56:11 PM
Wow.  OK.  so, i installed Apache Ant (https://ant.apache.org/) and ensured i had JDK 1.8, then installed Apache XMLBeans (https://xmlbeans.apache.org/).  i then used the command line tool inst2xsd to assess a folder of XML documents and emit a schema.  (i am leaving out all the PATH party).  (i also installed Apache Log4j (https://logging.apache.org/log4j/2.x/) for logging, which is optional.)

Because i was testing with Capture One Settings (.cos) files, and inst2xsd (https://xmlbeans.apache.org/guide/Tools.html#inst2xsd) only processes files with the .xml extension, i wrote a command that appends .xml to each file name, runs inst2xsd, and then renames the files back.  this bash command only works on the current directory, and uses the defaults:

XML_DIR=$(pwd); for file in "$XML_DIR"/*; do mv "$file" "$file.xml"; done && inst2xsd "$XML_DIR"/*.xml; for file in "$XML_DIR"/*.xml; do mv "$file" "${file%.xml}"; done
seems to work without any issues.  this was somewhat helpful: link (https://www.infoworld.com/article/2162353/generate-xml-schemas-from-xml-with-inst2xsd.html)
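
A variant of the same idea, sketched under a few assumptions: instead of renaming the originals in place, it copies them into a temporary directory under a .xml name and points inst2xsd at the copies, so an interrupted run cannot leave the .cos files renamed. The helper names stage_as_xml and infer_schema are hypothetical (they are not part of XMLBeans), and the only inst2xsd option used is -outDir. Writing the schema to a separate output directory is also meant to keep a previously generated schema file from being picked up as an input instance on a later run, assuming inst2xsd writes into the current directory by default.

```shell
#!/usr/bin/env bash
# Sketch only: stage copies of the input files under a ".xml" name so that
# inst2xsd will accept them, leaving the originals untouched.

# stage_as_xml SRC_DIR STAGE_DIR   (hypothetical helper)
# Copies every regular file in SRC_DIR into STAGE_DIR with ".xml" appended.
stage_as_xml() {
  local src_dir=$1 stage_dir=$2 file
  mkdir -p "$stage_dir"
  for file in "$src_dir"/*; do
    [ -f "$file" ] || continue                 # skip directories and unmatched globs
    cp "$file" "$stage_dir/${file##*/}.xml"    # ${file##*/} strips the directory part
  done
}

# infer_schema SRC_DIR OUT_DIR [extra inst2xsd options...]   (hypothetical helper)
# Stages the files, runs inst2xsd on the copies, then removes the staging area.
infer_schema() {
  local src_dir=$1 out_dir=$2 stage_dir status=0
  shift 2
  stage_dir=$(mktemp -d) || return 1
  mkdir -p "$out_dir"
  stage_as_xml "$src_dir" "$stage_dir"
  inst2xsd "$@" -outDir "$out_dir" "$stage_dir"/*.xml || status=$?
  rm -rf "$stage_dir"                          # clean up even if inst2xsd failed
  return "$status"
}
```

Called as `infer_schema . ./xsd-out`, this should leave the files in the current directory as they were; extra options such as `-validate` can be appended after the two directories and are passed through to inst2xsd unchanged.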

i then validated:

XML_DIR=$(pwd); for file in "$XML_DIR"/*; do mv "$file" "$file.xml"; done && inst2xsd -validate "$XML_DIR"/*.xml; for file in "$XML_DIR"/*.xml; do mv "$file" "${file%.xml}"; done
and achieved total joy, as far as I can tell?

the entire installation and test took a couple hours. i'll keep researching.

- J