Determine the maximum size (e.g., string length) of a specific tag

Started by Hayo Baan, August 15, 2014, 05:29:28 AM


Hayo Baan

Is there a way to determine the size limits that apply to a given tag? For instance, most (all?) IPTC tags have a limit on how much data they can hold. ExifTool (unlike some other software) normally checks this limit and truncates the text being written if needed. This is all very good and proper, but I would like a way to tell beforehand whether a write will fail or be truncated, so I can take this into account in my code and, e.g., do my own string trimming.

The (format) data is present in the ExifTool code, but I have not been able to find an easy way to get to it from just the (fully qualified) tag name. Is there a way to do this?
Hayo Baan – Photography
Web: www.hayobaan.nl

Phil Harvey

Hi Hayo,

On the command line, you can get the maximum string lengths from the -listx -iptc:all output.

But I have provided no way to extract this information via the API.  If you want, you could use the IgnoreMinorErrors option to avoid the truncation and write any length you want.  While this is against the IPTC specification, any reasonable software should be able to deal with this.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Hayo Baan

Hi Phil,

Hmm, I don't want to use IgnoreMinorErrors. I actually (also) need to know the max length before writing as I want to be able to compare strings from different tags to see if they're the same. Here I need to take the max length into account as some software does chop strings at max length as per spec.

Note that by digging around in the code, I did find a possible solution using undocumented functions and features. Certainly doable this way, but not straightforward and perhaps not future proof, as you may change your approach... Another solution would be to create a list of limits (e.g., from the output of the -listx command you gave), but that wouldn't be ideal either. Since we are only dealing with a small number of tags, though, it is perhaps still the easiest/safest way (besides, the specs for those "old" IPTC tags are never going to change anyway).
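A minimal sketch of the hand-maintained list approach, applied to the comparison problem described above. The table name %MAX_LENGTH and the helper same_value are hypothetical, the two limits shown should be checked against the -listx output before being relied on, and lengths are counted in characters here, which may not match how a given application counts them.

```perl
use strict;
use warnings;

# Hypothetical, hand-maintained limits (verify against `exiftool -listx -iptc:all`).
my %MAX_LENGTH = (
    'IPTC:ObjectName' => 64,
    'IPTC:By-line'    => 32,
);

# Compare two tag values, ignoring anything beyond the tighter of the
# two tags' limits. Tags missing from the table are treated as unlimited.
sub same_value {
    my ($tag_a, $value_a, $tag_b, $value_b) = @_;
    my @limits = sort { $a <=> $b }
                 grep { defined }
                 map  { $MAX_LENGTH{$_} } $tag_a, $tag_b;
    if (@limits) {
        my $limit = $limits[0];
        $value_a = substr($value_a, 0, $limit);
        $value_b = substr($value_b, 0, $limit);
    }
    return $value_a eq $value_b;
}

# An 80-character title versus the same title chopped at 64 characters
my $title = 'x' x 80;
print same_value('IPTC:ObjectName', substr($title, 0, 64),
                 'XMP-dc:Title',    $title) ? "same\n" : "different\n";
```

Clipping both values to the tighter limit means a string chopped by other software still compares equal to its untruncated counterpart.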

But still, it would be nice if there was a way in Exiftool to determine the exact type and format of a tag  ;)

Cheers,
Hayo

Hayo Baan

Hmm, going through the list of all tags from -listx, there are actually quite a number of tags that impose size restrictions. The output also says nothing about any minimum number of entries/characters. For my currently limited needs this is not a problem, but I would like to be prepared for the future too ;)

So I think I will go for a solution involving the (currently undocumented) use of Image::ExifTool::TagLookup::FindTagInfo and the (also undocumented?) use of GetGroup to determine the exact tag to use from the FindTagInfo result. Fun ;D

Enjoy your vacation!

Cheers,
Hayo

Phil Harvey

Hi Hayo,

You're right about no minimum size in the -listx output.  It would be very difficult to design a consistent interface to reveal details like this via the API because the various types of metadata have very different restrictions, and the ExifTool internals are often different.

If you want to use undocumented features like FindTagInfo(), you can get this information as you mention (note that this function only works for writable tags).  I am not likely to change this in the future, so it should be future proof for the standard metadata types (EXIF, IPTC, XMP).  Once you have a reference to the tagInfo structure, you can check the Format element for IPTC tags to see the size range (e.g., for ObjectListReference, the Format is 'string[4,68]').  The problem is that the format of Format is different for different metadata types.  For binary data, a string looks like 'string[8]' because it must have a fixed length.  For EXIF ASCII values, it looks like 'string' because the length is variable.  There are also special Format types that are valid only for certain types of metadata.  So you can't use this technique for a tag of an arbitrary metadata type.
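The three Format shapes mentioned above can be pulled apart with a small regex. This is only a sketch: parse_format_range is a hypothetical helper, not part of ExifTool, and it handles just the 'type', 'type[N]' and 'type[M,N]' forms. A single number is reported as the maximum only, since whether it means a fixed length depends on the metadata type.

```perl
use strict;
use warnings;

# Split a Format string into (type, min, max).
# min and/or max are undef when the string carries no such information.
sub parse_format_range {
    my ($format) = @_;
    return unless defined $format;
    if ($format =~ /^(\w+)\[(?:(\d+),)?(\d+)\]$/) {
        return ($1, $2, $3);            # $2 is undef for the 'type[N]' form
    }
    (my $type = $format) =~ s/\[.*\]$//; # drop any other bracketed suffix
    return ($type, undef, undef);
}

for my $fmt ('string[4,68]', 'string[8]', 'string') {
    my ($type, $min, $max) = parse_format_range($fmt);
    printf "%-14s type=%s min=%s max=%s\n",
        $fmt, $type, $min // 'none', $max // 'none';
}
```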

- Phil

Hayo Baan

Hi Phil,

Quote from: Phil Harvey on August 16, 2014, 08:15:48 AM
You're right about no minimum size in the -listx output.  It would be very difficult to design a consistent interface to reveal details like this via the API because the various types of metadata have very different restrictions, and the ExifTool internals are often different.

Yeah, I'm already impressed by how you were able to capture all those different formats and constraints in the first place (long live the flexibility of Perl!), so I'm not surprised things can be difficult to make generic. In this case, however, how about implementing a "mincount" value (in case there is one), similar to the "count" value? Wouldn't that be still generic (and probably fairly easy to implement)?

Quote from: Phil Harvey on August 16, 2014, 08:15:48 AM
If you want to use undocumented features like FindTagInfo(), you can get this information as you mention (note that this function only works for writable tags).

Ah, but how does -listx do it then? If I interpret its output correctly, it does list a count for tags that are not writable (at least they say writable=false). Incidentally, is there a function that tells me if a tag is writable?

Quote from: Phil Harvey on August 16, 2014, 08:15:48 AM
I am not likely to change this in the future, so it should be future proof for the standard metadata types (EXIF, IPTC, XMP).  Once you have a reference to the tagInfo structure, you can check the Format element for IPTC tags to see the size range (ie. for ObjectListReference, the Format is 'string[4,68]').  The problem is that the format of Format is different for different metadata types.  For binary data, a string looks like 'string[8]' because it must have a fixed length.  For EXIF ASCII values, it looks like 'string' because the length is variable.  There are also special Format types that are valid only for certain types of metadata.  So you can't use this technique for a tag of an arbitrary metadata type.

Yep, already found the Format element and the different forms it could come in. It's not too difficult to get the size restrictions off its specs though :)

Thanks again,
Hayo

Phil Harvey

Quote from: Hayo Baan on August 16, 2014, 09:50:14 AM
Ah, but how does the -listx do it then?

It iterates through all tags in a table (using TagTableKeys() then GetTagInfoList()), then it does this (from TagInfoXML.pm):

my $format = $$tagInfo{Writable} || $$table{WRITABLE};
$format = $$tagInfo{Format} || $$table{FORMAT} if not defined $format or $format eq '1';
$format = 'struct' if $$tagInfo{Struct};
if (defined $format) {
    $format =~ s/\[.*\$.*\]//;   # remove expressions from format
} elsif ($isBinary) {
    $format = 'int8u';
} else {
    $format = '?';
}
my $count = '';
if ($format =~ s/\[.*?(\d*)\]$//) {
    $count = " count='$1'" if length $1;
} elsif ($$tagInfo{Count} and $$tagInfo{Count} > 1) {
    $count = " count='$$tagInfo{Count}'";
}


- Phil

Hayo Baan

Quote from: Phil Harvey on August 16, 2014, 02:23:19 PM
It iterates through all tags in a table (using TagTableKeys() then GetTagInfoList()), then it does this (from TagInfoXML.pm):
...

Right, from the code in TagInfoXML, I could definitely develop some code that would let me determine everything I would (ever) need to know about each tag :)

For now, however, I have implemented the max "length" function using this simple piece of code (self is an object of a class describing a specific tag in a file).
sub maxLength($) {
    my $self = shift;

    my $maxlength;
    # Determine tag info from the tag name and use it to determine the max length
    # (FindTagInfo is the undocumented Image::ExifTool::TagLookup function)
    for my $ti (FindTagInfo($self->tagNameClean)) {
        # Pick the tagInfo whose family 0:1 group matches this tag's group
        if ($self->exifTool->GetGroup($ti, "0:1") eq $self->groupName) {
            my $fmt = $ti->{Format} // "undef";
            # Capture N from 'type[N]' or 'type[M,N]'
            ($maxlength) = $fmt =~ /\[(?:\d+,)?(\d+)\]/;
            # Progress::debug("Found format $fmt" . (defined $maxlength ? " maxlength=$maxlength" : "") . " for " . $self->fullTagNameClean);
            last;
        }
    }
    return $maxlength;
}


I see in your code that you first look at Writable and then at Format (and also that you use an all-upper-case name for the table-level values). Would I need to do that too?

Cheers,
Hayo

Phil Harvey

The tagInfo's Writable element (and its default value set by the table's WRITABLE) does need to be checked.  Sometimes these may be different than the stored tagInfo Format (and default table FORMAT).  You can get hold of a reference to the table from the tagInfo Table element if you are starting from FindTagInfo().
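To make the precedence concrete, here is a small sketch run against mock data. The hashes below are invented for illustration and only imitate the shape of a tagInfo and its table; effective_format is a hypothetical helper name.

```perl
use strict;
use warnings;

# Tag-level Writable first, then the table's WRITABLE; fall back to
# Format/FORMAT when Writable is unset or is just the flag '1'.
sub effective_format {
    my ($tagInfo) = @_;
    my $table  = $tagInfo->{Table} || {};
    my $format = $tagInfo->{Writable} || $table->{WRITABLE};
    $format = $tagInfo->{Format} || $table->{FORMAT}
        if not defined $format or $format eq '1';
    return $format;
}

# Mock structures (not real ExifTool tables)
my %table     = (WRITABLE => 'string', FORMAT => 'int8u');
my %tag_own   = (Writable => 'int16u', Table => \%table);
my %tag_table = (Table => \%table);
my %tag_flag  = (Writable => 1, Format => 'string[0,32]',
                 Table => { WRITABLE => 1 });

print effective_format(\%tag_own),   "\n";   # tag-level Writable wins
print effective_format(\%tag_table), "\n";   # falls back to table WRITABLE
print effective_format(\%tag_flag),  "\n";   # Writable is only a flag: use Format
```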

- Phil

Hayo Baan

Hi Phil,

Ah right, so it was a bit more complex than I expected, but I think I have everything covered now and wrote my own "tagFormatCount" function (which maxLength then uses to just return the count). This will allow me to extend things if I ever need to.

# Return the format and count of a value source
# Arguments:
#   Object reference
# Returns:
#   List of (format, count); count is undef if none is specified.
#   Empty list if no matching tag info is found.
sub tagFormatCount($) {
    my $self = shift;

    # Determine tag info from tagname and use it to determine its format
    # Code mimicked from exiftool sources
    for my $ti (FindTagInfo($self->tagNameClean)) {
        if ($self->exifTool->GetGroup($ti, "0:1") eq $self->groupName) {
            my $format = $ti->{Writable} || $ti->{Table}->{WRITABLE};
            $format = $ti->{Format} || $ti->{Table}->{FORMAT} if not defined $format or $format eq '1';
            $format = 'struct' if $ti->{Struct};
            if (defined $format) {
                $format =~ s/\[.*\$.*\]//; # remove expressions from format
            } elsif (($ti->{Table}->{PROCESS_PROC} and
                      $ti->{Table}->{PROCESS_PROC} eq \&Image::ExifTool::ProcessBinaryData)) {
                $format = 'int8u';
            } else {
                $format = '?';
            }
            my $count;
            if ($format =~ s/\[.*?(\d+)\]$//) {
                $count = $1;
            } elsif (defined $ti->{Count}) {
                $count = $ti->{Count};
            }
            return ($format, $count);
        }
    }
    return;
}


Thanks again for all your input :)