Difference in XML output between 11.54 and 11.55

Started by StarGeek, August 03, 2019, 11:21:30 AM


StarGeek

I'm not sure if this is an actual bug or just a normal change, but I thought I'd bring it up.

The XML output using this command
exiftool -G -X -D -t -q -q -l FILE_NAME > output.xml
changes from exiftool version 11.54 to 11.55.  Three items in my test file in the Composite group change from
<Composite:ImageSize>
  <rdf:Description et:id='ImageSize' et:table='Composite'>
   <et:desc>Image Size</et:desc>
   <et:prt>100x100</et:prt>
   <et:val>100 100</et:val>
  </rdf:Description>
</Composite:ImageSize>
<Composite:Megapixels>
  <rdf:Description et:id='Megapixels' et:table='Composite'>
   <et:desc>Megapixels</et:desc>
   <et:prt>0.010</et:prt>
   <et:val>0.01</et:val>
  </rdf:Description>
</Composite:Megapixels>
<Composite:SubSecDateTimeOriginal>
  <rdf:Description et:id='SubSecDateTimeOriginal' et:table='Composite'>
   <et:desc>Date/Time Original</et:desc>
   <et:prt>2019:07:30 11:54:18-07:00</et:prt>
  </rdf:Description>
</Composite:SubSecDateTimeOriginal>


to this, adding Exif:: to the tag ID (et:id), even though the tags aren't part of the EXIF group.
<Composite:ImageSize>
  <rdf:Description et:id='Exif::ImageSize' et:table='Composite'>
   <et:desc>Image Size</et:desc>
   <et:prt>100x100</et:prt>
   <et:val>100 100</et:val>
  </rdf:Description>
</Composite:ImageSize>
<Composite:Megapixels>
  <rdf:Description et:id='Exif::Megapixels' et:table='Composite'>
   <et:desc>Megapixels</et:desc>
   <et:prt>0.010</et:prt>
   <et:val>0.01</et:val>
  </rdf:Description>
</Composite:Megapixels>
<Composite:SubSecDateTimeOriginal>
  <rdf:Description et:id='Exif::SubSecDateTimeOriginal' et:table='Composite'>
   <et:desc>Date/Time Original</et:desc>
   <et:prt>2019:07:30 11:54:18-07:00</et:prt>
  </rdf:Description>
</Composite:SubSecDateTimeOriginal>


The file used is attached and has only some minimal data.
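
(Editorial sketch, not part of the original post.) One way to spot this kind of change is to collect the et:id values of the Composite entries from two -X outputs and compare them. The snippet below is only an illustration: the function name composite_ids is made up, and it uses a regular expression instead of an XML parser because the fragments quoted above omit the namespace declarations.

```python
import re

# Matches the rdf:Description line of an "exiftool -X -l" entry and captures
# the et:id and et:table attribute values (single-quoted, as in the output above).
DESC_RE = re.compile(r"<rdf:Description et:id='([^']*)' et:table='([^']*)'>")

def composite_ids(xml_text):
    """Return the set of et:id values whose et:table is 'Composite'."""
    return {tag_id for tag_id, table in DESC_RE.findall(xml_text)
            if table == "Composite"}

# Shortened copies of the two fragments quoted in the post.
old_xml = """<Composite:ImageSize>
  <rdf:Description et:id='ImageSize' et:table='Composite'>
</Composite:ImageSize>"""
new_xml = """<Composite:ImageSize>
  <rdf:Description et:id='Exif::ImageSize' et:table='Composite'>
</Composite:ImageSize>"""

# IDs present in only one of the two outputs.
changed = sorted(composite_ids(old_xml) ^ composite_ids(new_xml))
print(changed)
```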
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

Phil Harvey

Thanks.  I'll look into this and post back here when I know more.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Phil Harvey

Hi StarGeek,

I've looked into this.  The relevant change was made in 11.55, not 11.54:

July 12, 2019 - Version 11.55
  - Changed internal handling of Composite tag ID's to include module name


So this side-effect makes sense.

True the Composite tag isn't part of the "EXIF" group, but this tag was defined in the "Exif" module so its ID reflects this now.  (aside: I have no idea why I capitalized "EXIF" as a group name whereas the module name I used was mixed-case "Exif" -- this has caused me a headache or two with failed tests on case-sensitive filesystems when I get the module name wrong.)

I hope this doesn't cause too much confusion, but the change helps avoid conflicts between same-named Composite tags defined in different modules.

- Phil

StarGeek

Quote from: Phil Harvey on August 06, 2019, 08:06:51 AM
I've looked into this.  The relevant change was made in 11.55, not 11.54:

Yep, my mistake during posting.  My testing was done between 11.54 and 11.55.  Editing previous posts and titles.

The reason I brought it up is because this update causes a strange problem in IMatch 5 and this was the only diff between the two files.

Phil Harvey

Can you provide any details about the "strange problem"?

- Phil

StarGeek

IMatch doesn't read location data (not GPS), such as State, City, Location, etc., when exiftool is updated to 11.55 or later (including 11.61), though it seems to read all the other data correctly.  This appears to be the only difference in the XML output between the two versions, which is why I'm at a loss to explain the problem, as the location data is exactly the same.

Here's the thread on the Photools site.

Phil Harvey

Odd, yes.

Thanks for the link, and thanks for looking into this.

I can't imagine why the location data would be affected.  (aside: I could see the Composite tags being affected, but only if IMatch somehow maintains a separate database of ExifTool tags (i.e. the output of the -listx command), because this is the only place where the "et:id" tag ID's in the -X output would be useful.)

There seems to be something else going on here that we aren't seeing.

- Phil

Mac2

Hi, I've just found this thread. Sorry. I've already created my own thread reporting this:

https://exiftool.org/forum/index.php/topic,10364.msg54565.html#msg54565

IMatch indeed uses a database and indexes tag data extracted from files via the tag id (and other things).
IMatch also uses internal lookup tables to quickly find tag data "by numbers" and these lookup tables use hard-coded tag ids ("Country" instead of "MWG::Country").

The changed id for the standard MWG tags breaks this mechanism. And affects all databases currently in use (when they upgrade to an ExifTool version newer than 11.54).
I always ship the latest ExifTool version available when I release a new update so currently only users who manually update ExifTool are affected. I need to find a solution for this before I can ship an update.

I did not find anything in the release notes about this change of ids. I'm sure there is a good reason for this?

Phil Harvey

In an earlier post I pointed out the relevant entry in the change log:

July 12, 2019 - Version 11.55
  - Changed internal handling of Composite tag ID's to include module name


Before I added the module name to the Composite tag ID's, same-named Composite tags got a number added to them to avoid conflicts.  Unfortunately this number was different depending on the order in which the modules were loaded, so it was not possible to know which tag was which.  Now with the module name added, the ID's are unique and constant.
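
(Editorial sketch, not part of the original post.) The difference between the two schemes can be shown with a toy model. This is not ExifTool's actual Perl code; the function names and the three-module load order are assumptions chosen to mirror the description above.

```python
def old_style_ids(load_order):
    """Toy model of the pre-11.55 scheme: a repeated name gets a -N suffix
    that depends on how many same-named tags were loaded before it."""
    seen = {}
    ids = {}
    for module, name in load_order:
        seen[name] = seen.get(name, 0) + 1
        n = seen[name]
        ids[(module, name)] = name if n == 1 else f"{name}-{n}"
    return ids

def new_style_ids(load_order):
    """Toy model of the 11.55 scheme: the defining module is part of the ID."""
    return {(module, name): f"{module}::{name}" for module, name in load_order}

tags = [("Exif", "LensID"), ("Nikon", "LensID"), ("Ricoh", "LensID")]
a = old_style_ids(tags)                   # modules loaded Exif, Nikon, Ricoh
b = old_style_ids(list(reversed(tags)))   # modules loaded Ricoh, Nikon, Exif

# The same tag gets a different old-style ID depending on load order...
print(a[("Exif", "LensID")], b[("Exif", "LensID")])
# ...while the new-style IDs do not depend on the order at all.
print(new_style_ids(tags) == new_style_ids(list(reversed(tags))))
```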

Since the MWG tags are implemented as Composite tags, this change affected their ID's.  Note that the "et:id" entries are internal ExifTool tag ID's, and only meant as a lookup in the -listx output.  True, these are the physical ID numbers/strings for most tags, but Composite tags don't have a physical ID since they are based on multiple source tags.

- Phil

Mac2

This is all true and correct. This is not really a problem on the ExifTool side, just a bit inconvenient on my side.

For explanation: My software works as follows:

It imports the output of -listx into a set of database tables, for tags, groups, names. And data.
Data extracted from files is stored and linked to the corresponding tag via a numerical id.

When a new ExifTool version is installed, the output of listx is analyzed and compared with the data already in the database. New tags are added to the database.
This is standard procedure and does not affect existing data in the database. A typical user manages between 25 and 100 million tag values in a database...

The problem with the Composite id change is that it breaks the connection between existing data and its tag, for example "Composite.Country", after the new ExifTool version is installed.
The listx output now contains some Composite tags with ids that include the "MWG::" prefix. These will be added as new tags to the database, without any connection to the "old" Composite country tags.
The Composite tag with the id "MWG::Country" is not the same as the Composite tag with the id "Country".

From then on, the data for new files processed with ExifTool will be linked to Composite MWG::* and the old data is still linked to the old Composite tags. And that's a problem of course.

To handle this, I need to add a database migration step for my next product version which first renames the existing Composite tag entries in the database so they use the new id format introduced with ET newer than 11.54. And then I can run the normal ingest/merge process for the new listx output.
This "folds" the old data and the new data together in existing databases. I also need to update some explicit references to tag names I use in my software for performance reasons.

This is not hard to do, but I have to do it before shipping with an updated ExifTool version. Otherwise there will be data loss.
If I'm not mistaken, this affects only these tags (new id format):

MWG::City
MWG::Copyright
MWG::Country
MWG::CreateDate
MWG::Creator
MWG::DateTimeOriginal
MWG::Description
MWG::Keywords
MWG::Location
MWG::ModifyDate
MWG::Orientation
MWG::Rating
MWG::State


Is this correct?

Phil Harvey

Thanks for the explanation, and sorry for the trouble, but this was a loophole in the Composite tag logic that really needed filling.

This is the complete list of Composite MWG tags:

> exiftool -list -MWG:all
Available MWG tags:
  City Copyright Country CreateDate Creator DateTimeOriginal Description
  Keywords Location ModifyDate Orientation Rating State


So your list looks complete.  However, not only MWG Composite tags are affected.  Other Composite tag ID's will now be prefixed by a module name as well:

> exiftool -list -Composite:all
Available Composite tags:
  AdvancedSceneMode Aperture AudioBitrate AutoFocus AvgBitrate BaseName
  BigImage BlueBalance CDDBDiscPlayTime CDDBDiscTracks CFAPattern
  CircleOfConfusion ConditionalFEC DOF DateCreated DateTimeCreated
  DateTimeOriginal DepthMapTiff DigitalCreationDateTime DigitalZoom DriveMode
  Duration ExtenderStatus FOV FileExtension FileNumber FileTypeDescription
  Flash FlashType FocalLength35efl FocusDistance FocusDistance2 GPSAltitude
  GPSAltitudeRef GPSDateTime GPSDestLatitude GPSDestLatitudeRef
  GPSDestLongitude GPSDestLongitudeRef GPSLatitude GPSLatitudeRef GPSLongitude
  GPSLongitudeRef GPSPosition HyperfocalDistance IDCPreviewImage ISO
  ImageHeight ImageSize ImageWidth Lens Lens35efl LensID LensSpec LightValue
  MPImage Megapixels OriginalDecisionData PeakSpectralSensitivity
  PhysicalImageSize PreviewImage PreviewImageSize RedBalance RedEyeReduction
  RicohPitch RicohRoll Rotation RunTimeSincePowerUp ScaleFactor35efl
  ShootingMode ShutterCurtainHack ShutterSpeed SubSecCreateDate
  SubSecDateTimeOriginal SubSecModifyDate ThumbnailTIFF VolumeSize WB_RGBLevels
  WB_RGGBLevels ZoomedPreviewImage


(Note that the Composite list above doesn't include MWG tags since MWG wasn't mentioned on the command line).

- Phil

Mac2

Ouch. That's way more complex than I had anticipated.

Quote:
Other Composite tag ID's will now be prefixed by a module name as well:

Is there a rule to figure out which module name is used? Some kind of table I can query or a command line?
I need to update each tag id in the database in an automated way. And this must be safe because it runs on users computers, with live databases.

Or maybe a simple algorithm like

Run ExifTool to get listx output
For all tags in the Composite group
If the tag id has a prefix, strip it (e.g. "MWG::City" => "City")
Lookup the tag by this id in the database.
If found, update the id of the tag in the database with the prefix used in the listx output: "City" => "MWG::City"

would do...

Phil Harvey

Your algorithm sounds potentially dangerous.  You should only update tags which belong to the "Composite" table (ie. table name='Composite' in the -listx output).   For example, there is also an XMP tag with an ID of "City".
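
(Editorial sketch, not part of the original post.) The proposed algorithm, restricted to the Composite table as advised here, might look roughly like this in Python. The LISTX sample is a hand-written, heavily trimmed stand-in for real -listx output (only the name and id attributes are kept), and build_bridge is a made-up name. IDs that more than one module shares after stripping the prefix are reported separately instead of being mapped.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hand-written stand-in for -listx output; real output has many more
# tables, tags and attributes.
LISTX = """<taginfo>
<table name='Composite'>
 <tag id='MWG::City' name='City'/>
 <tag id='Exif::LensID' name='LensID'/>
 <tag id='Nikon::LensID' name='LensID'/>
</table>
<table name='XMP::photoshop'>
 <tag id='City' name='City'/>
</table>
</taginfo>"""

def build_bridge(listx_xml):
    """Map prefix-free Composite IDs to the new prefixed IDs.

    Only the table named 'Composite' is examined, so the XMP tag with
    the ID 'City' is never touched."""
    root = ET.fromstring(listx_xml)
    candidates = defaultdict(list)
    for table in root.iter("table"):
        if table.get("name") != "Composite":
            continue
        for tag in table.iter("tag"):
            new_id = tag.get("id") or ""
            if "::" not in new_id:
                continue
            old_id = new_id.split("::", 1)[1]   # "MWG::City" -> "City"
            candidates[old_id].append(new_id)
    bridge = {old: new[0] for old, new in candidates.items() if len(new) == 1}
    ambiguous = {old: new for old, new in candidates.items() if len(new) > 1}
    return bridge, ambiguous

bridge, ambiguous = build_bridge(LISTX)
print(bridge)      # safe one-to-one renames
print(ambiguous)   # stripped IDs claimed by several modules; needs a manual decision
```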

- Phil

Edit:  But I still don't understand why a different tag ID is causing you problems.  Could you use the names of the Composite tags instead of the IDs?  The ID for a Composite tag is arbitrary, and has no physical significance.

Mac2

I would restrict this to the Composite group of course.
I have used "tag keys" made of the group\tag id\tag name for 10 years and there are thousands of databases out there. This scheme is basically unchangeable (or only with a massive amount of work).
I did not anticipate that the tag ids may change when I designed this many years ago. Especially not the composite tags for standard EXIF/MWG data.

I looked at various output variants of listx but I did not find one that tells me where the prefix comes from...
For example, "Nikon::LensID". Is there a way to know where the Nikon:: prefix comes from, for this tag?

Phil Harvey

The "Nikon::" indicates that this Composite tag was defined in the Nikon.pm module.  But I guess I don't know what you are asking.  Here, for example, is a comparison of the ID's for the Composite LensID tag between 11.54 and 11.55:

> Image-ExifTool-11.54/exiftool -listx -composite:all | grep LensID
<tag id='LensID' name='LensID' type='?' writable='false' g2='Camera'>
<tag id='LensID-2' name='LensID' type='?' writable='false'>
<tag id='LensID-3' name='LensID' type='?' writable='false' g2='Camera'>
<tag id='LensID-4' name='LensID' type='?' writable='false' g2='Camera'>

> Image-ExifTool-11.55/exiftool -listx -composite:all | grep LensID
<tag id='Exif::LensID' name='LensID' type='?' writable='false' g2='Camera'>
<tag id='Nikon::LensID' name='LensID' type='?' writable='false' g2='Camera'>
<tag id='Ricoh::LensID' name='LensID' type='?' writable='false' g2='Camera'>
<tag id='XMP::LensID' name='LensID' type='?' writable='false'>


How did you deal with the "-2", "-3", etc suffixes of 11.54?   (Note that these suffixes would change depending on the order the modules were loaded, which is the reason for the change.)

- Phil

Mac2

Quote:
How did you deal with the "-2", "-3", etc suffixes of 11.54?   (Note that these suffixes would change depending on the order the modules were loaded, which is the reason for the change.)

Rather simple. My database stores the data under a tag "key" composed of the group name, the id and the tag name, to make each data value uniquely identifiable.

When I import data for a file, each value delivered by ExifTool is checked.
If the tag key (group|id|tag) exists in the database, the data is linked to the existing tag. Otherwise a new tag record is created and the data is linked to this tag.

Since ExifTool now delivers data with the key

Composite|Nikon::LensID|LensID

instead of

Composite|LensID|LensID,

my database creates a new tag, without knowing that this is just a new name for an existing tag.
Data for new files will be associated with the new tag. Data for files processed previously will remain associated to the original tag. Although both are the same, just with different ids.
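
(Editorial sketch, not part of the original post.) The lookup described above can be modelled in a few lines. The function name link_value and the in-memory dict and list are assumptions standing in for the real database; the point is only to show why a changed id produces a second tag record.

```python
def link_value(tags, data, group, tag_id, name, value):
    """Store value under the key group|id|name, creating the tag if needed."""
    key = f"{group}|{tag_id}|{name}"
    if key not in tags:
        tags[key] = len(tags) + 1       # numeric id for a newly seen tag
    data.append((tags[key], value))
    return tags[key]

tags, data = {}, []
# A file processed with 11.54, then another one processed with 11.55.
old = link_value(tags, data, "Composite", "LensID", "LensID", "lens A")
new = link_value(tags, data, "Composite", "Nikon::LensID", "LensID", "lens B")

# The same logical tag now has two unrelated records.
print(old != new, sorted(tags))
```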

I need to 'fold' the old tag and new tag together in the databases. Basically linking all data associated with "Composite|City|City" to "Composite|MWG::City|City" and all will be well.

This can be done with a simple update statement for the database.
I only need to find a way to know how the id has changed. Or I need to make a manual "old tag key => new tag key" bridging table in the code.
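
(Editorial sketch, not part of the original post.) With a bridging table in hand, the "fold" could look something like the following. The two-table schema, the column names and the function name fold are all hypothetical; the real IMatch database layout is not described in this thread. SQLite is used only because it ships with Python.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tag  (id INTEGER PRIMARY KEY, tag_key TEXT UNIQUE);
CREATE TABLE data (file TEXT, tag_id INTEGER REFERENCES tag(id), value TEXT);
INSERT INTO tag  VALUES (1, 'Composite|City|City'), (2, 'Composite|MWG::City|City');
INSERT INTO data VALUES ('a.jpg', 1, 'Berlin'), ('b.jpg', 2, 'Ottawa');
""")

# Hypothetical bridging table: old tag key => new tag key.
bridge = {"Composite|City|City": "Composite|MWG::City|City"}

def fold(con, bridge):
    with con:  # one transaction: commit on success, roll back on error
        for old_key, new_key in bridge.items():
            old = con.execute("SELECT id FROM tag WHERE tag_key = ?", (old_key,)).fetchone()
            if old is None:
                continue                # nothing stored under the old key
            new = con.execute("SELECT id FROM tag WHERE tag_key = ?", (new_key,)).fetchone()
            if new is None:
                # New tag not ingested yet: renaming the key is enough.
                con.execute("UPDATE tag SET tag_key = ? WHERE id = ?", (new_key, old[0]))
            else:
                # Both exist: relink the old data, then drop the old tag.
                con.execute("UPDATE data SET tag_id = ? WHERE tag_id = ?", (new[0], old[0]))
                con.execute("DELETE FROM tag WHERE id = ?", (old[0],))

fold(con, bridge)
rows = con.execute("SELECT file, tag_id FROM data ORDER BY file").fetchall()
print(rows)
```

Handling both branches matters here because a database may already have been opened with a newer ExifTool, in which case the new tag record exists alongside the old one.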

I also have to deal with user presets, templates, layouts etc. which use the old tag keys, e.g. in variables. A variable like {File.MD.Composite.City.City} will no longer work. Or at least not anymore for databases migrated to the new ExifTool version. I need to communicate this to my users so they know that they have to check.

Not many users use Composite tags directly, though.

Luckily I have a "short keys" mechanism in place already which allows users to write {File.MD.city} and IMatch internally maps this to the actual tag. So I can change this in a central place.
But this does not work everywhere and I'm still figuring out how many things in my software this tag id change breaks...



Phil Harvey

I see.  For your purposes then it sounds like you don't care, for example, which Composite LensID tag was generated.  A quick patch for you could be to set the id to the tag name for all Composite tags when parsing the -listx or -X output.  There should only be 2 places in your code which would need to be changed.
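
(Editorial sketch, not part of the original post.) In code, this quick patch amounts to one small normalization step applied wherever -listx or -X output is parsed. The function name normalize_id is made up for illustration.

```python
def normalize_id(table, tag_id, name):
    """For Composite tags use the tag name as the ID; leave other tables alone."""
    return name if table == "Composite" else tag_id

print(normalize_id("Composite", "Nikon::LensID", "LensID"))   # 11.55-style ID
print(normalize_id("Composite", "LensID-2", "LensID"))        # 11.54-style ID
print(normalize_id("XMP::photoshop", "City", "City"))         # not Composite: unchanged
```

The trade-off is that all same-named Composite tags collapse into one, so it is no longer possible to tell which module generated a value.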

- Phil

Mac2

I did ponder this. It would be an easy solution.
But then the 'real' tag ids used by ExifTool and what my software uses would no longer match. And I try to keep this layer as close to ExifTool as possible. This is better, long-term.

I will create an old id => new id bridging table manually from the listx outputs and then write some code which uses it to migrate existing databases on first open. From then on my software will only use the new tag ids throughout.

Most of the work for this is done by running

exiftool -s1 -listx -composite:all

for both versions and a diff, I guess.
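
(Editorial sketch, not part of the original post.) Such a diff can also be done with Python's difflib once the two listings have been saved. The lines below are copied from the LensID example earlier in this thread rather than read from files, and the variable names are assumptions.

```python
import difflib
import re

# Two of the LensID lines from each version, as quoted earlier in the thread.
old_lines = [
    "<tag id='LensID' name='LensID' type='?' writable='false' g2='Camera'>",
    "<tag id='LensID-2' name='LensID' type='?' writable='false'>",
]
new_lines = [
    "<tag id='Exif::LensID' name='LensID' type='?' writable='false' g2='Camera'>",
    "<tag id='XMP::LensID' name='LensID' type='?' writable='false'>",
]

diff = list(difflib.unified_diff(old_lines, new_lines, "11.54", "11.55", lineterm=""))
for line in diff:
    print(line)

# Pull the IDs out of the removed and added lines as raw material for a
# bridging table; pairing them up is still a manual decision.
ID_RE = re.compile(r"id='([^']*)'")
removed = [ID_RE.search(l).group(1) for l in diff if l.startswith("-<tag")]
added = [ID_RE.search(l).group(1) for l in diff if l.startswith("+<tag")]
print(removed, added)
```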

I will also use the results of this to write some documentation for my users, to help them figure out whether they are affected by the name changes and how to fix it.

Phil Harvey

Sounds good.

Quote from: Mac2 on August 12, 2019, 12:35:25 PM
Most of the work for this is done by running

exiftool -s1 -listx -composite:all

for both versions and a diff, I guess.

Be sure to add -use MWG to this command to load the optional MWG Composite tags.

- Phil

Mac2

Right. Thanks for reminding me about the -MWG. I use them in my code normally.

Just to be sure: The tags with the ids
id            name
Duration-2    Duration
Duration-3    Duration


would also become AIFF::Duration-2 and AIFF::Duration-3, and similar for other tags which are numbered that way?

Phil Harvey

Quote from: Mac2 on August 12, 2019, 12:53:08 PM
Right. Thanks for reminding me about the -MWG. I use them in my code normally.

Just to be sure: The tags with the ids
id            name
Duration-2    Duration
Duration-3    Duration


would also become AIFF::Duration-2 and AIFF::Duration-3, and similar for other tags which are numbered that way?

No.  This is what happens:

> Image-ExifTool-11.54/exiftool -listx -composite:all -s1 | grep Duration
<tag id='Duration' name='Duration' type='?' writable='false'/>
<tag id='Duration-2' name='Duration' type='?' writable='false'/>
<tag id='Duration-3' name='Duration' type='?' writable='false' g2='Audio'/>
<tag id='Duration-4' name='Duration' type='?' writable='false' g2='Video'/>
<tag id='Duration-5' name='Duration' type='?' writable='false'/>
<tag id='Duration-6' name='Duration' type='?' writable='false'/>
<tag id='Duration-7' name='Duration' type='?' writable='false'/>

> Image-ExifTool-11.55/exiftool -listx -composite:all -s1 | grep Duration
<tag id='AIFF::Duration' name='Duration' type='?' writable='false'/>
<tag id='APE::Duration' name='Duration' type='?' writable='false' g2='Audio'/>
<tag id='FLAC::Duration' name='Duration' type='?' writable='false'/>
<tag id='MPEG::Duration' name='Duration' type='?' writable='false' g2='Video'/>
<tag id='RIFF::Duration' name='Duration' type='?' writable='false'/>
<tag id='RIFF::Duration2' name='Duration' type='?' writable='false'/>
<tag id='Vorbis::Duration' name='Duration' type='?' writable='false'/>


But your guess is as good as mine as to which Duration tag from the old version maps into which one from the new version because as I said the suffix numbers depend on the order in which the modules were loaded.

- Phil

Mac2

Ah, that's what these numbers mean...Mhm.

I use exiftool.exe for Windows.
Is there a determined sequence in which these 'modules' are loaded?
Or does ExifTool load them in the order in which it encounters files of a specific format?
Then it would be random and no way to tell which module has created Duration-2 or GPSAltitude-2  ...

Phil Harvey

Quote from: Mac2 on August 12, 2019, 01:12:12 PM
Or does ExifTool load them in the order in which it encounters files of a specific format?
Then it would be random and no way to tell which module has created Duration-2 or GPSAltitude-2  ...

Exactly.  That's the reason that this was fixed.

- Phil

Phil Harvey

Oh.  I just realized that earlier you may have been trying to point out that there are two RIFF Duration tags (in version 11.55 and later).  One of them has an ID of RIFF::Duration2, but in this case the suffix is permanent, and not generated by the loader.

- Phil

Mac2

Such things just happen.

I'll bridge all existing Composite tag ids to the new format in my next release.
I'll leave the dynamic ids (with the -nn suffixes) alone. I doubt that any user has used such tags suffixes for anything.
The old data will be still accessible with the original tag id (with suffix) if really needed.

Then a prominent entry in the release notes and an explicit popup message in the software. That should do it.

Phil Harvey

Sorry about this, but ExifTool 12.03 has another change to the Composite tag ID's which may affect you.  I've changed the "::" to "-" in all of the Composite tag ID's to make them conform more easily to group names since they are now exposed via the new family 7 groups.
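
(Editorial sketch, not part of the original post.) Going only by the sentence above, an 11.55-style Composite ID can be turned into its 12.03 form with a plain string replacement. The function name is made up, and the resulting IDs have not been checked against real 12.03 -listx output.

```python
def to_12_03_id(tag_id):
    """Rewrite an 11.55-style Composite ID (Module::Name) as Module-Name."""
    return tag_id.replace("::", "-")

converted = {old: to_12_03_id(old)
             for old in ("MWG::City", "Nikon::LensID", "RIFF::Duration2")}
for old, new in converted.items():
    print(old, "=>", new)
```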

- Phil

Mac2

Oh, no! That's bad news indeed... :)

I'm still getting bug reports caused by the old name changes. So many users have used Composite tags in their layouts, statistics, apps, exports, metadata templates and whatnot.
And these all break when ExifTool changes tag names. And it sometimes takes weeks or months before a user recognizes that something no longer works, and then opens a bug report.

I have a set of standard tags which I maintain in a central configuration file. Basically these tags map short names like "title" to a corresponding ExifTool tag name. Mostly XMP, but also some Composite tags.
As long as users use these standard tags, a renamed Composite tag only requires me to update the configuration file and then ship an update. No problem.
But when they use composite tags directly, things will break. And that's just what users do. I did so too, in the past. Never anticipated tag names could change. My error.

I perfectly understand that you want to 'cleanup' ExifTool or make changes to incorporate new things.

My application just allows users to do amazing things with metadata delivered by ExifTool, much more than can be done in other applications. Unfortunately, that also means that the user base may be hit harder by breaking changes.

Thanks for the warning. When I have a free slot I will download 12.03 and implement all required changes in my software.

Phil Harvey

The tag names are not changing.  It is the tag ID's.  The users should be interacting with the tag names.

But if I understand correctly you are using the tag ID's from the -listx output in some sort of database, and so I am giving you this heads up.

- Phil

Edit.  I see you explained all this above.