Feature request : hex output and vary limits

Started by greybeard, November 02, 2024, 09:02:58 AM


greybeard

Following on from some of my recent posts where I'm trying to extract as much information as possible from a metadata tag using exiftool - I'd like to make the following feature requests:

A) add an extra parameter (similar to the existing -n) which displays the original hex format of the tag value (similar to the way the hex output is displayed with the verbose output).

This is particularly important with unknown tags as it's often much easier to decode a hex string than a decimal value - especially for those fields that have more than one value.

An example would be a 4-byte int32u field which currently shows as 408948800 but whose meaning becomes more obvious when shown as 0x18601040, i.e. it contains two values for height and width combined within a single tag.
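To make the arithmetic concrete, here is a small Python sketch that splits that value into its two 16-bit halves. Which half is width and which is height is not stated above, so the names high and low are deliberately neutral.

```python
value = 408948800  # the int32u value quoted above

# Hex makes the two packed 16-bit fields visible.
print(hex(value))  # 0x18601040

high = value >> 16     # upper 16 bits (0x1860)
low = value & 0xFFFF   # lower 16 bits (0x1040)
print(high, low)       # 6240 4160
```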

Ideally this would also include the byte length, field type and number of component fields (again the same as with the verbose output) but I'd be happy with just the hex output.

I've spent a lot of time trying to match the verbose output to other outputs but there are too many inconsistencies and I haven't been able to create an algorithm to transform one to the other which works in all cases.

B) provide some way of changing the string limits

Strings are frequently chopped (or snipped) - the limit varies (such as 60, 96 and 2048 characters) but there doesn't appear to be any way for the end user to control this limit.

Mostly this doesn't matter - most tag values are below the limit and many of the longer values are essentially useless (such as an image).

But there are many fields - such as those related to AF - that have multiple components: each component may be short, but the combination exceeds the limit.

To be meaningful it's necessary to access the complete string.

Phil Harvey

A) I could perhaps add this to the -json -l output for selected tags (eg. EXIF and maker notes).  Would that work for you?

B) You can set this via the API LimitLongValues option in output formats other than -v and -htmldump.  For -v and -htmldump, adding more -v's generally does this.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

greybeard

Quote from: Phil Harvey on November 04, 2024, 04:00:35 PM
A) I could perhaps add this to the -json -l output for selected tags (eg. EXIF and maker notes).  Would that work for you?

B) You can set this via the API LimitLongValues option in output formats other than -v and -htmldump.  For -v and -htmldump, adding more -v's generally does this.

- Phil

Thanks.

A) I'll think about this.

The existing Exiftool JSON output is tough to deal with using a strongly typed language.

The value data types are used inconsistently - I find it easier to work with other Exiftool output formats. 

B) Got it - thanks. Adding more -vs to the verbose output has advantages and disadvantages as it adds other stuff I don't need (like those tens of thousands of JPEG RST lines).

Phil Harvey

A) You can use the API StructFormat=JSONQ option to quote all JSON values.  This allows you to read everything as a string, which should bypass some problems of a strongly typed language.

- Phil

greybeard

Quote from: Phil Harvey on November 05, 2024, 08:12:36 AM
A) You can use the API StructFormat=JSONQ option to quote all JSON values.  This allows you to read everything as a string, which should bypass the problem of a strongly typed language.

- Phil

Thanks a lot - I obviously need to look a lot closer at what is possible using various API options.

So yes - having the hex output as part of json would be very helpful - thanks.

Am I right in thinking I need to use the -g5 option to uniquely identify tags?

(I've been looking at Canon CR3 samples and the maker notes seem to be very complex, with multiple duplicate hex ID values).

And am I right in thinking that the only two JSON data types possible when using API StructFormat=JSONQ are "String" and "Array of Strings"?

Phil Harvey

Quote from: greybeard on November 05, 2024, 08:26:27 AM
Am I right in thinking I need to use the -g5 option to uniquely identify tags?

That won't necessarily do it.  Do you just want tags to be unique in the output (in which case -G4 would do, but I should tidy this up for -j and remove the "Unknown:" for the primary tag), or do you also want the groups to match the same location in other files (in which case -G5:6 might do, but then there is still a chance of duplicates in the output if, say, the file has 2 EXIF segments)?

Quote:
And am I right in thinking that the only two JSON data types possible when using API StructFormat=JSONQ are "String" and "Array of Strings"?

Yes, as long as the -struct option isn't used.  But you may reduce this to only "String" type by using the -sep option to stringify arrays.
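For anyone reading such output from a script, here is a rough Python sketch of handling the two remaining types uniformly. The JSON sample is synthetic (not real exiftool output), and the tag name Unknown_0x1234 and the helper to_text are made up for illustration; the only thing taken from the thread is that each value is a string or an array of strings.

```python
import json

# Synthetic sample, NOT real exiftool output: it only mimics the shape
# described above (every value is a string or an array of strings).
sample = '[{"SourceFile": "a.jpg", "ISO": "200", "Unknown_0x1234": ["1", "2", "3"]}]'

def to_text(value, sep=","):
    """Collapse a string or a list of strings into a single string."""
    if isinstance(value, list):
        return sep.join(value)
    return value

records = json.loads(sample)
flat = [{tag: to_text(v) for tag, v in rec.items()} for rec in records]
print(flat[0]["Unknown_0x1234"])  # 1,2,3
```

Joining in the script is an alternative to -sep when the arrays are still wanted elsewhere in the same run.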

- Phil

greybeard

As far as the duplicates go - I want to be able to match between files from the same camera model. Mostly for unknown tags.

The objective is to simplify and automate the process of figuring out the purpose of unknown tags.

A use case example would be to take a batch of image files that have been taken with different settings from the same camera - the unknown tags in each image are then compared. Values that are all the same - especially spaces, zeroes or nulls - can be ignored, and then the unknown tags with different values examined.
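A minimal Python sketch of that comparison step, assuming the tag values have already been loaded into one dictionary per file. The file names, tag names and values below are invented for illustration, and varying_tags is a hypothetical helper, not part of exiftool.

```python
def varying_tags(files):
    """Return {tag: [value per file]} for tags whose values are not all equal.

    files maps a file name to a {tag: value} dict; a tag missing from a
    file is reported as None for that file.
    """
    all_tags = set()
    for tags in files.values():
        all_tags.update(tags)

    result = {}
    for tag in sorted(all_tags):
        values = [tags.get(tag) for tags in files.values()]
        if len(set(values)) > 1:  # constant tags (all zeroes, etc.) drop out here
            result[tag] = values
    return result

# Hypothetical data for two shots from the same camera.
files = {
    "shot1.raf": {"Fuji_0x1001": "0x00000000", "Fuji_0x1002": "0x18601040"},
    "shot2.raf": {"Fuji_0x1001": "0x00000000", "Fuji_0x1002": "0x0c300820"},
}
diff = varying_tags(files)
print(diff)  # {'Fuji_0x1002': ['0x18601040', '0x0c300820']}
```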

Depending on the data type for the unknown tag the values will be shown in a variety of formats such as a number or hex representation.

For some data types (such as 32-bit integers) they can also be shown as separate 16-bit or 8-bit integers - signed and unsigned.

If, for example, the 32-bit tag has been used to hold height and width values it should be fairly easy to identify by looking at the separate 16-bit values.
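As a sketch of those alternative views, the Python snippet below reinterprets one 32-bit value with the standard struct module. Big-endian packing is an assumption made here so that the pieces follow the order of the hex digits; the byte order in a real file may differ. The helper name views and the sample value 0xFFFE0002 are made up for illustration.

```python
import struct

def views(value):
    """Show an unsigned 32-bit value as 16-bit and 8-bit pieces."""
    raw = struct.pack(">I", value)  # 4 bytes, big-endian (assumed)
    return {
        "hex": raw.hex(),
        "u16": struct.unpack(">2H", raw),  # two unsigned 16-bit integers
        "s16": struct.unpack(">2h", raw),  # two signed 16-bit integers
        "u8": struct.unpack(">4B", raw),   # four unsigned bytes
        "s8": struct.unpack(">4b", raw),   # four signed bytes
    }

v = views(0xFFFE0002)
print(v["u16"])  # (65534, 2)
print(v["s16"])  # (-2, 2)
```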

This is the sort of thing that I do now with Fujifilm image files, but it involves a lot of manual work: creating output files, running diffs and converting tag values between different formats.

This process will evolve over time.

greybeard

The addition of hex values to the json output in version 13.02 is much appreciated.

It may be a case of "be careful what you wish for" as there are examples where it creates very large outputs - where preview images are converted to hex - and I couldn't find a way of limiting the size of the new hex output (LimitLongValues does not appear to limit the hex output).

I've been running the following parameters to extract the info I need:

-G1 -l -u -H -json -api LimitLongValues=0 -api StructFormat=JSONQ -api SaveBin=1 -sep ","

Perhaps I will have to restrict this type of output to specific tags - for example just unknown tags - and I'm proposing the following to restrict to unknown tags:

-G1 -l -u -H -json -api LimitLongValues=0 -api StructFormat=JSONQ -api SaveBin=1 -sep "," -"*_0x*"

I will spend more time on working through different image samples - but again the prompt response to the feature request is much appreciated.

Phil Harvey

I could apply LimitLongValues to the hex output if you wanted, and you could set a reasonable upper limit.

- Phil

greybeard

Yes please - I'll have to experiment with where to set it so that I filter out those preview images but keep the (potentially) long AF tags.

Does my method of filtering unknown tags make sense to you?

Phil Harvey

That filter should pick up all unknown EXIF/TIFF and maker note tags.

I'll apply LimitLongValues to the hex strings.

- Phil