Difference in XML output between 11.54 and 11.55

Started by StarGeek, August 03, 2019, 11:21:30 AM


StarGeek

I'm not sure if this is an actual bug or just a normal change, but I thought I'd bring it up.

The XML output using this command
exiftool -G -X -D -t -q -q -l FILE_NAME > output.xml
changes between exiftool versions 11.54 and 11.55 (originally posted as 11.53 to 11.54).  Three items in the Composite group of my test file change from
<Composite:ImageSize>
  <rdf:Description et:id='ImageSize' et:table='Composite'>
   <et:desc>Image Size</et:desc>
   <et:prt>100x100</et:prt>
   <et:val>100 100</et:val>
  </rdf:Description>
</Composite:ImageSize>
<Composite:Megapixels>
  <rdf:Description et:id='Megapixels' et:table='Composite'>
   <et:desc>Megapixels</et:desc>
   <et:prt>0.010</et:prt>
   <et:val>0.01</et:val>
  </rdf:Description>
</Composite:Megapixels>
<Composite:SubSecDateTimeOriginal>
  <rdf:Description et:id='SubSecDateTimeOriginal' et:table='Composite'>
   <et:desc>Date/Time Original</et:desc>
   <et:prt>2019:07:30 11:54:18-07:00</et:prt>
  </rdf:Description>
</Composite:SubSecDateTimeOriginal>


to this, with Exif:: added to the tag ID, even though these tags aren't part of the EXIF group.
<Composite:ImageSize>
  <rdf:Description et:id='Exif::ImageSize' et:table='Composite'>
   <et:desc>Image Size</et:desc>
   <et:prt>100x100</et:prt>
   <et:val>100 100</et:val>
  </rdf:Description>
</Composite:ImageSize>
<Composite:Megapixels>
  <rdf:Description et:id='Exif::Megapixels' et:table='Composite'>
   <et:desc>Megapixels</et:desc>
   <et:prt>0.010</et:prt>
   <et:val>0.01</et:val>
  </rdf:Description>
</Composite:Megapixels>
<Composite:SubSecDateTimeOriginal>
  <rdf:Description et:id='Exif::SubSecDateTimeOriginal' et:table='Composite'>
   <et:desc>Date/Time Original</et:desc>
   <et:prt>2019:07:30 11:54:18-07:00</et:prt>
  </rdf:Description>
</Composite:SubSecDateTimeOriginal>


The file used is attached and has only some minimal data.
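
For anyone who wants to repeat the comparison, here is a minimal sketch of pulling the et:id values out of two -X outputs and listing the ones that differ. It uses Python's standard xml.etree.ElementTree. The namespace URIs and the wrapper element are placeholders invented for this sketch (the fragments above don't show the real declarations), and the helper name tag_ids is hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs, made up for this sketch (not ExifTool's real ones).
WRAP = ("<root xmlns:rdf='urn:example:rdf' xmlns:et='urn:example:et' "
        "xmlns:Composite='urn:example:Composite'>{}</root>")

OLD = """<Composite:ImageSize>
  <rdf:Description et:id='ImageSize' et:table='Composite'>
   <et:prt>100x100</et:prt>
  </rdf:Description>
</Composite:ImageSize>"""

# Same fragment with the 11.55-style ID.
NEW = OLD.replace("et:id='ImageSize'", "et:id='Exif::ImageSize'")

def tag_ids(fragment):
    """Return {tag name: et:id} for each tag element in the fragment."""
    root = ET.fromstring(WRAP.format(fragment))
    ids = {}
    for elem in root:
        desc = elem.find("{urn:example:rdf}Description")
        if desc is not None:
            name = elem.tag.split("}")[1]  # drop the "{namespace}" part
            ids[name] = desc.get("{urn:example:et}id")
    return ids

old_ids, new_ids = tag_ids(OLD), tag_ids(NEW)
changed = {name: (old_ids[name], new_ids.get(name))
           for name in old_ids if old_ids[name] != new_ids.get(name)}
print(changed)
```

With real files you would read the two output.xml files instead of the inline strings and use the namespace URIs declared at the top of the actual output.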
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

Phil Harvey

Thanks.  I'll look into this and post back here when I know more.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Phil Harvey

Hi StarGeek,

I've looked into this.  The relevant change was made in 11.55, not 11.54:

July 12, 2019 - Version 11.55
  - Changed internal handling of Composite tag ID's to include module name


So this side-effect makes sense.

True, the Composite tag isn't part of the "EXIF" group, but this tag was defined in the "Exif" module so its ID reflects this now.  (aside: I have no idea why I capitalized "EXIF" as a group name whereas the module name I used was mixed-case "Exif" -- this has caused me a headache or two with failed tests on case-sensitive filesystems when I get the module name wrong.)

I hope this doesn't cause too much confusion, but the change helps avoid conflicts between same-named Composite tags defined in different modules.

- Phil

StarGeek

Quote from: Phil Harvey on August 06, 2019, 08:06:51 AM
I've looked into this.  The relevant change was made in 11.55, not 11.54:

Yep, my mistake.  My testing was done between 11.54 and 11.55, but I got the versions wrong while posting.  Editing previous posts and titles.

The reason I brought it up is that this update causes a strange problem in IMatch 5, and this was the only difference between the two files.

Phil Harvey

Can you provide any details about the "strange problem"?

- Phil

StarGeek

IMatch doesn't read location data (not GPS), such as State, City, Location, etc., when exiftool is updated to 11.55 or later (including 11.61), though it seems to read all the other data correctly.  The et:id change appears to be the only difference in the XML output between the two versions, which is why I'm at a loss to explain the problem, as the location data is exactly the same.

Here's the thread on the Photools site.

Phil Harvey

Odd, yes.

Thanks for the link, and thanks for looking into this.

I can't imagine why the location data would be affected.  (aside: I could see the Composite tags being affected, but only if IMatch somehow maintains a separate database of ExifTool tags (i.e. the output of the -listx command), because this is the only place where the "et:id" tag ID's in the -X output would be useful.)

There seems to be something else going on here that we aren't seeing.

- Phil

Mac2

Hi, I've just found this thread. Sorry, I've already created my own thread reporting this:

https://exiftool.org/forum/index.php/topic,10364.msg54565.html#msg54565

IMatch indeed uses a database and indexes tag data extracted from files via the tag id (and other things).
IMatch also uses internal lookup tables to quickly find tag data "by numbers" and these lookup tables use hard-coded tag ids ("Country" instead of "MWG::Country").

The changed id for the standard MWG tags breaks this mechanism and affects all databases currently in use (once they upgrade to an ExifTool version newer than 11.54).
I always ship the latest ExifTool version available when I release a new update so currently only users who manually update ExifTool are affected. I need to find a solution for this before I can ship an update.

I did not find anything in the release notes about this change of ids. I'm sure there is a good reason for it?

Phil Harvey

In an earlier post I pointed out the relevant entry in the change log:

July 12, 2019 - Version 11.55
  - Changed internal handling of Composite tag ID's to include module name


Before I added the module name to the Composite tag ID's, same-named Composite tags got a number added to them to avoid conflicts.  Unfortunately this number was different depending on the order in which the modules were loaded, so it was not possible to know which tag was which.  Now with the module name added, the ID's are unique and constant.
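
To illustrate the difference, here is a toy model of the two ID schemes. It is only a sketch, not ExifTool's actual code: the function names, the module names (ModuleA etc.) and the tag name SomeTag are all made up, and the "-N" suffix format is an assumption of the model.

```python
def suffixed_ids(load_order, name):
    """Old-style scheme (toy model): the first module to define the tag gets
    the bare name, later ones get a numeric suffix based on load position."""
    ids = {}
    for position, module in enumerate(load_order, start=1):
        ids[module] = name if position == 1 else f"{name}-{position}"
    return ids

def prefixed_ids(load_order, name):
    """New-style scheme (toy model): the ID is the module name plus the tag name."""
    return {module: f"{module}::{name}" for module in load_order}

# Two hypothetical load orders for the same three modules.
order_a = ["ModuleA", "ModuleB", "ModuleC"]
order_b = ["ModuleC", "ModuleA", "ModuleB"]

print(suffixed_ids(order_a, "SomeTag"))
print(suffixed_ids(order_b, "SomeTag"))
print(prefixed_ids(order_a, "SomeTag") == prefixed_ids(order_b, "SomeTag"))
```

In the suffixed scheme the ID a given module ends up with depends on the load order, while the prefixed IDs are the same for both orders.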

Since the MWG tags are implemented as Composite tags, this change affected their ID's.  Note that the "et:id" entries are internal ExifTool tag ID's, and only meant as a lookup in the -listx output.  True, these are the physical ID numbers/strings for most tags, but Composite tags don't have a physical ID since they are based on multiple source tags.

- Phil

Mac2

This is all true and correct. This is not really a problem on the ExifTool side, just a bit inconvenient on my side.

For explanation: My software works as follows:

It imports the output of -listx into a set of database tables for tags, groups, names, and data.
Data extracted from files is stored and linked to the corresponding tag via a numerical id.

When a new ExifTool version is installed, the output of -listx is analyzed and compared with the data already in the database. New tags are added to the database.
This is standard procedure and does not affect existing data in the database. A typical user manages between 25 and 100 million tag values in a database...

The problem with the Composite id change is that it breaks the connection between the old and new entries for a tag, for example "Composite.Country", after the new ExifTool version is installed.
This is because the -listx output now contains some Composite tags with ids that include the "MWG::" prefix. These will be added to the database as new tags, without any connection to the "old" Composite country tags.
The Composite tag with the id "MWG::Country" is not the same as the Composite tag with the id "Country".

From then on, the data for new files processed with ExifTool will be linked to Composite MWG::* and the old data will still be linked to the old Composite tags. And that's a problem of course.

To handle this, I need to add a database migration step for my next product version which first renames the existing Composite tag entries in the database so they use the new id format introduced with ET newer than 11.54. And then I can run the normal ingest/merge process for the new listx output.
This "folds" the old data and the new data together in existing databases. I also need to update some explicit references to tag names I use in my software for performance reasons.
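
A minimal sketch of what such a rename step could look like, using Python's sqlite3 with an in-memory database. The schema (tables tags and tag_values, columns tag_key, grp, tag_id) and the sample rows are invented for illustration and are not the real product schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags (tag_key INTEGER PRIMARY KEY, grp TEXT, tag_id TEXT);
CREATE TABLE tag_values (tag_key INTEGER REFERENCES tags(tag_key), value TEXT);
INSERT INTO tags VALUES (1, 'Composite', 'Country'), (2, 'XMP', 'Country');
INSERT INTO tag_values VALUES (1, 'Canada'), (2, 'Canada');
""")

# Old id -> new id, for Composite tags only (hypothetical mapping).
renames = {"Country": "MWG::Country"}

# Rename in a single transaction so a failure leaves the database untouched.
with conn:
    for old_id, new_id in renames.items():
        conn.execute(
            "UPDATE tags SET tag_id = ? WHERE grp = 'Composite' AND tag_id = ?",
            (new_id, old_id),
        )

rows = conn.execute(
    "SELECT tag_key, grp, tag_id FROM tags ORDER BY tag_key"
).fetchall()
print(rows)
```

Because the values reference the numerical tag_key and not the id string, the existing data stays attached to the renamed tag, and the later -listx merge finds the tag under its new id without creating a duplicate.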

This is not hard to do, but I have to do it before shipping with an updated ExifTool version. Otherwise there will be data loss.
If I'm not mistaken, this affects only these tags (new id format):

MWG::City
MWG::Copyright
MWG::Country
MWG::CreateDate
MWG::Creator
MWG::DateTimeOriginal
MWG::Description
MWG::Keywords
MWG::Location
MWG::ModifyDate
MWG::Orientation
MWG::Rating
MWG::State


Is this correct?

Phil Harvey

Thanks for the explanation, and sorry for the trouble, but this was a loophole in the Composite tag logic that really needed filling.

This is the complete list of Composite MWG tags:

> exiftool -list -MWG:all
Available MWG tags:
  City Copyright Country CreateDate Creator DateTimeOriginal Description
  Keywords Location ModifyDate Orientation Rating State


So your list looks complete.  However, not only MWG Composite tags are affected.  Other Composite tag ID's will now be prefixed by a module name as well:

> exiftool -list -Composite:all
Available Composite tags:
  AdvancedSceneMode Aperture AudioBitrate AutoFocus AvgBitrate BaseName
  BigImage BlueBalance CDDBDiscPlayTime CDDBDiscTracks CFAPattern
  CircleOfConfusion ConditionalFEC DOF DateCreated DateTimeCreated
  DateTimeOriginal DepthMapTiff DigitalCreationDateTime DigitalZoom DriveMode
  Duration ExtenderStatus FOV FileExtension FileNumber FileTypeDescription
  Flash FlashType FocalLength35efl FocusDistance FocusDistance2 GPSAltitude
  GPSAltitudeRef GPSDateTime GPSDestLatitude GPSDestLatitudeRef
  GPSDestLongitude GPSDestLongitudeRef GPSLatitude GPSLatitudeRef GPSLongitude
  GPSLongitudeRef GPSPosition HyperfocalDistance IDCPreviewImage ISO
  ImageHeight ImageSize ImageWidth Lens Lens35efl LensID LensSpec LightValue
  MPImage Megapixels OriginalDecisionData PeakSpectralSensitivity
  PhysicalImageSize PreviewImage PreviewImageSize RedBalance RedEyeReduction
  RicohPitch RicohRoll Rotation RunTimeSincePowerUp ScaleFactor35efl
  ShootingMode ShutterCurtainHack ShutterSpeed SubSecCreateDate
  SubSecDateTimeOriginal SubSecModifyDate ThumbnailTIFF VolumeSize WB_RGBLevels
  WB_RGGBLevels ZoomedPreviewImage


(Note that the Composite list above doesn't include MWG tags since MWG wasn't mentioned on the command line).

- Phil

Mac2

Ouch. That's way more complex than I had anticipated.

Quote: Other Composite tag ID's will now be prefixed by a module name as well:

Is there a rule to figure out which module name is used? Some kind of table I can query, or a command line?
I need to update each tag id in the database in an automated way. And this must be safe because it runs on users' computers, with live databases.

Or maybe a simple algorithm like

1. Run ExifTool to get the -listx output.
2. For each tag in the Composite group:
   a. If the tag id has a prefix, strip it (e.g. "MWG::City" => "City").
   b. Look up the tag by this id in the database.
   c. If found, update the id of the tag in the database with the prefix used in the -listx output: "City" => "MWG::City".

would do...

Phil Harvey

Your algorithm sounds potentially dangerous.  You should only update tags which belong to the "Composite" table (i.e. table name='Composite' in the -listx output).  For example, there is also an XMP tag with an ID of "City".
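
As a rough sketch of that restriction, the mapping from old to new IDs could be built from the -listx output while skipping every table except "Composite". The XML below is a hand-written, heavily simplified stand-in for real -listx output (the table and tag attributes are cut down for illustration), and the function name composite_renames is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified stand-in for -listx output (not real ExifTool output).
LISTX_SAMPLE = """<taginfo>
<table name='Composite'>
 <tag id='MWG::City' name='City'/>
 <tag id='Exif::ImageSize' name='ImageSize'/>
</table>
<table name='XMP::photoshop'>
 <tag id='City' name='City'/>
</table>
</taginfo>"""

def composite_renames(listx_xml):
    """Map bare (pre-11.55) Composite IDs to the module-prefixed IDs."""
    renames = {}
    root = ET.fromstring(listx_xml)
    for table in root.iter("table"):
        if table.get("name") != "Composite":
            continue  # never touch tags outside the Composite table
        for tag in table.iter("tag"):
            new_id = tag.get("id")
            module, sep, bare = new_id.partition("::")
            if sep:  # only IDs that actually carry a module prefix
                renames[bare] = new_id
    return renames

renames = composite_renames(LISTX_SAMPLE)
print(renames)
```

The XMP "City" entry never enters the mapping, and the mapping should likewise only be applied to database rows that belong to the Composite table.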

- Phil

Edit:  But I still don't understand why a different tag ID is causing you problems.  Could you use the names of the Composite tags instead of the IDs?  The ID for a Composite tag is arbitrary, and has no physical significance.

Mac2

I would restrict this to the Composite group of course.
I have used "tag keys" made of the group\tag id\tag name for 10 years and there are thousands of databases out there. This scheme is basically unchangeable (or changeable only with a massive amount of work).
I did not anticipate that the tag ids may change when I designed this many years ago. Especially not the composite tags for standard EXIF/MWG data.

I looked at various output variants of listx but I did not find one that tells me where the prefix comes from...
For example, "Nikon::LensID". Is there a way to know where the Nikon:: prefix comes from, for this tag?

Phil Harvey

The "Nikon::" indicates that this Composite tag was defined in the Nikon.pm module.  But I guess I don't know what you are asking.  Here, for example, is a comparison of the ID's for the Composite LensID tag between 11.54 and 11.55:

> Image-ExifTool-11.54/exiftool -listx -composite:all | grep LensID
<tag id='LensID' name='LensID' type='?' writable='false' g2='Camera'>
<tag id='LensID-2' name='LensID' type='?' writable='false'>
<tag id='LensID-3' name='LensID' type='?' writable='false' g2='Camera'>
<tag id='LensID-4' name='LensID' type='?' writable='false' g2='Camera'>

> Image-ExifTool-11.55/exiftool -listx -composite:all | grep LensID
<tag id='Exif::LensID' name='LensID' type='?' writable='false' g2='Camera'>
<tag id='Nikon::LensID' name='LensID' type='?' writable='false' g2='Camera'>
<tag id='Ricoh::LensID' name='LensID' type='?' writable='false' g2='Camera'>
<tag id='XMP::LensID' name='LensID' type='?' writable='false'>


How did you deal with the "-2", "-3", etc suffixes of 11.54?   (Note that these suffixes would change depending on the order the modules were loaded, which is the reason for the change.)
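
One consequence for any "strip the prefix" migration: a name defined in several modules can't be mapped back to a single old ID. A small sketch that flags such names, using the 11.55 IDs shown in this thread (the function name ambiguous_names is hypothetical):

```python
from collections import defaultdict

# 11.55-style Composite IDs taken from the listings in this thread.
new_ids = ["Exif::LensID", "Nikon::LensID", "Ricoh::LensID", "XMP::LensID",
           "Exif::ImageSize"]

def ambiguous_names(ids):
    """Return {tag name: [modules]} for names defined in more than one module."""
    by_name = defaultdict(list)
    for tag_id in ids:
        module, _, name = tag_id.partition("::")
        by_name[name].append(module)
    return {name: mods for name, mods in by_name.items() if len(mods) > 1}

print(ambiguous_names(new_ids))
```

Here only LensID is flagged. For such names the old "-2", "-3" IDs depended on load order, so there is no reliable automatic mapping from them to the new IDs.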

- Phil