ExifTool Forum

ExifTool => The "exiftool" Application => Topic started by: martinrwilson on September 16, 2011, 05:55:55 AM

Title: Extracting binary data from multiple XMP elements
Post by: martinrwilson on September 16, 2011, 05:55:55 AM
Hi,
I have an InDesign (.indd) document whose XMP data contains multiple elements with the same tag (<rdf:li> elements within an <rdf:Seq> element), each containing binary data.

I actually only want the data in the first element, which does seem to be what happens if I refer to the element by tag name, e.g.
exiftool -xmp:pageimage -b test.indd > thumbnail.jpg

So, my question is:
- What does ExifTool do with multiple elements that contain binary data with the same tag name? It seems to just output the first, which is what I want - is this the case and, if so, will this continue to be the case in the future? (Is there a safer way to get the first?)

I found some information in this post but it's not clear from this what does happen in the case of multiple matching elements:
https://exiftool.org/forum/index.php/topic,2105.msg9233.html#msg9233

Any help will be much appreciated!
Thanks,
Martin
Title: Re: Extracting binary data from multiple XMP elements
Post by: Phil Harvey on September 16, 2011, 07:44:13 AM
Hi Martin,

You're quite right, I thought this was documented but I can't find where.  I will add it to the -b documentation.  However, the post you linked explains it well:

When extracting lists with the -b option, all list items from a single tag are extracted separated by newlines. (I think this is what you are calling "multiple matching elements".) This behaviour will not change.  Currently you can't address individual list items from the command line.  I have toyed with the idea of adding this feature, but the way I have done things this would be more difficult than it seems.

- Phil
Title: Re: Extracting binary data from multiple XMP elements
Post by: martinrwilson on September 16, 2011, 08:14:20 AM
Thanks for your response.
Just to be clear - my observation that only the first image (in this case) is being output is wrong - in fact all the images (from data matching the tag) will be output, separated by newlines.
So I'm probably ending up with the bytes from two JPEGs in one file (which, surprisingly, displays fine as the first image).
FYI, the relevant XMP data is shown below, showing the two images (in <xmpGImg:image> elements)

<xmp:PageInfo>
    <rdf:Seq>
        <rdf:li rdf:parseType="Resource">
            <xmpTPg:PageNumber>1</xmpTPg:PageNumber>
            <xmpGImg:format>JPEG</xmpGImg:format>
            <xmpGImg:width>256</xmpGImg:width>
            <xmpGImg:height>256</xmpGImg:height>
            <xmpGImg:image>[lots of bytes making up the image]</xmpGImg:image>
        </rdf:li>
        <rdf:li rdf:parseType="Resource">
            <xmpTPg:PageNumber>2</xmpTPg:PageNumber>
            <xmpGImg:format>JPEG</xmpGImg:format>
            <xmpGImg:width>256</xmpGImg:width>
            <xmpGImg:height>256</xmpGImg:height>
            <xmpGImg:image>[lots of bytes making up the image]</xmpGImg:image>
        </rdf:li>
    </rdf:Seq>
</xmp:PageInfo>

Many thanks,
Martin
Title: Re: Extracting binary data from multiple XMP elements
Post by: martinrwilson on September 16, 2011, 08:23:37 AM
Ok, I've just verified that this is the case.
Thanks for your help.
Regards,
Martin
Title: Re: Extracting binary data from multiple XMP elements
Post by: Phil Harvey on September 16, 2011, 09:54:04 AM
Hi Martin,

Yes, you can add any random data to the end of a JPEG image without causing problems.  In this case it isn't random data, but a newline followed by the other images in the list.

However, exiftool may also be used to remove any JPEG trailer:

exiftool -trailer:all= a.jpg

So doing this on the -b output effectively gives you the first JPEG from the list.

- Phil
Title: Re: Extracting binary data from multiple XMP elements
Post by: martinrwilson on September 16, 2011, 03:43:55 PM
Good to know - thanks Phil!
Title: Re: Extracting binary data from multiple XMP elements
Post by: jbverschoor on November 10, 2011, 01:01:17 PM
I'm kind of stuck :-)

I'd like to extract all the images :-)
Ideally, I'd first extract "Page Image Page Number"
And then extract each page in the list, or just a single page
Is this already possible?

Page Image Page Number          : 1, 2
Page Image Format               : JPEG, JPEG
Page Image Width                : 256, 256
Page Image Height               : 256, 256
Page Image                      : (Binary data 8544 bytes, use -b option to extract), (Binary data 6012 bytes, use -b option to extract)
Title: Re: Extracting binary data from multiple XMP elements
Post by: Phil Harvey on November 10, 2011, 01:10:03 PM
Unfortunately the command-line application doesn't have a feature to allow a single item to be extracted from a list, so the best you can do is to write them all to a single file:

exiftool -pageimage -b SOURCEFILE > out.jpg

But then you would have to split up the output jpg to recover the individual pages.  The file would be split at the "ff d9 0a ff d8" (hex) pattern.
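As a rough illustration of that splitting step, here is a minimal Python sketch. It is not part of ExifTool; the names split_jpegs and write_pages and the output file naming are made up for this example. It assumes the combined file holds JPEGs joined by a single newline, so each boundary is the byte sequence ff d9 0a ff d8, and that this sequence does not also occur inside one of the images.

```python
from typing import List

# JPEG end-of-image marker, the newline written between list items,
# then the start-of-image marker of the next JPEG.
JPEG_BOUNDARY = b"\xff\xd9\n\xff\xd8"


def split_jpegs(data: bytes) -> List[bytes]:
    """Split concatenated -b output into individual JPEG byte strings."""
    images = []
    start = 0
    while True:
        idx = data.find(JPEG_BOUNDARY, start)
        if idx == -1:
            # No further boundary: the rest of the data is the last image.
            images.append(data[start:])
            break
        # Keep the 2-byte end-of-image marker with the current image.
        images.append(data[start:idx + 2])
        # Skip the separating newline; the next image starts at ff d8.
        start = idx + 3
    return images


def write_pages(combined_path: str, prefix: str = "page") -> int:
    """Hypothetical helper: write page1.jpg, page2.jpg, ... and return the count."""
    with open(combined_path, "rb") as f:
        pages = split_jpegs(f.read())
    for number, page in enumerate(pages, start=1):
        with open("%s%d.jpg" % (prefix, number), "wb") as out:
            out.write(page)
    return len(pages)
```

Calling write_pages("out.jpg") on the file produced by the command above would then write one file per page in the current directory.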

An alternative is to write a simple Perl script using the ExifTool API to do what you want.  This is trivial, and I can help you with this if you have Perl installed.
Title: Re: Extracting binary data from multiple XMP elements
Post by: Phil Harvey on November 10, 2011, 01:20:03 PM
I have a solution:

Since this has come up in the past, I will add a -listItem option to extract a specific item from a list.  Then you could do what you want with this:

exiftool -pageimage -b -listitem 0 SOURCEFILE > page1.jpg
exiftool -pageimage -b -listitem 1 SOURCEFILE > page2.jpg

This feature will appear in ExifTool 8.70.

- Phil