Indeterminate output from INDD file using -tagsfromfile

Started by lambart, December 01, 2017, 09:09:39 PM


lambart

Hello, we're having some trouble extracting XMP from an INDD file.
It seems that multiple runs of exiftool generate different output, alternating between two vastly different blobs, as shown by this example:


$ exiftool -tagsfromfile 8.5x11-FranchiseFlyer.indd foo.xmp
    1 image files created
$ exiftool -tagsfromfile 8.5x11-FranchiseFlyer.indd foo2.xmp
    1 image files created
$ ls -al *.xmp
-rw-rw-r-- 1 eric eric 661444 Dec  1 15:49 foo2.xmp
-rw-rw-r-- 1 eric eric 209659 Dec  1 15:49 foo.xmp


I can keep running the command over and over... sometimes it extracts the ~200K blob, other times it extracts the ~650K blob.
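
One way to confirm this kind of run-to-run variation is to hash the output of repeated runs and compare. The helper below is only a sketch: `is_deterministic` is a hypothetical name, not part of ExifTool, and it assumes the command writes its result to stdout.

```shell
# Run a command several times and report whether its stdout is identical each time.
# Usage: is_deterministic RUNS COMMAND [ARGS...]
# (hypothetical helper, not an ExifTool feature)
is_deterministic() {
    runs=$1
    shift
    first=""
    i=0
    while [ "$i" -lt "$runs" ]; do
        # cksum prints a CRC and a byte count for the command's output
        sum=$("$@" | cksum)
        if [ "$i" -eq 0 ]; then
            first=$sum
        elif [ "$sum" != "$first" ]; then
            echo "differs on run $((i + 1))"
            return 1
        fi
        i=$((i + 1))
    done
    echo "identical across $runs runs"
    return 0
}
```

For the case above, something like `is_deterministic 10 sh -c 'rm -f tmp.xmp; exiftool -q -tagsfromfile 8.5x11-FranchiseFlyer.indd tmp.xmp; cat tmp.xmp'` should work, where `tmp.xmp` is a scratch file name chosen for illustration. The file is removed first so that each run creates it fresh.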

This INDD file contains two page images, and a page info image:


$ exiftool 8.5x11-FranchiseFlyer.indd  | grep Binary
Page Image                      : (Binary data 21217 bytes, use -b option to extract), (Binary data 21044 bytes, use -b option to extract)
Page Info Image                 : (Binary data 184025 bytes, use -b option to extract)


It seems that the XMP blob written to the file either contains the two page images, OR the page info image.

This is the main issue that I'd like to report, and which is causing us problems (FWIW we're only interested in the page images). But here is some other potentially interesting stuff I discovered:

When I used some undocumented features found here on the forum, I'm able to get all 3 images.... although for the "page info image", the "-b" option isn't generating a binary file; as you can see, it's a base64-encoded image that needs to be decoded to reveal the actual binary JPEG data.


$ exiftool 8.5x11-FranchiseFlyer.indd -listitem 0 -pageimage -b -m > pageimage0.jpg
$ exiftool 8.5x11-FranchiseFlyer.indd -listitem 1 -pageimage -b -m > pageimage1.jpg
$ exiftool 8.5x11-FranchiseFlyer.indd -listitem 0 -pageinfoimage -b -m > pageinfoimage.jpg
$ file page*.jpg
pageimage0.jpg:    JPEG image data, JFIF standard 1.02, resolution (DPI), density 72x72, segment length 16, baseline, precision 8, 197x256, frames 3
pageimage1.jpg:    JPEG image data, JFIF standard 1.02, resolution (DPI), density 72x72, segment length 16, baseline, precision 8, 198x256, frames 3
pageinfoimage.jpg: ASCII text
$ base64 -d pageinfoimage.jpg > pageinfoimage-decoded.jpg
$ file page*.jpg
pageimage0.jpg:            JPEG image data, JFIF standard 1.02, resolution (DPI), density 72x72, segment length 16, baseline, precision 8, 197x256, frames 3
pageimage1.jpg:            JPEG image data, JFIF standard 1.02, resolution (DPI), density 72x72, segment length 16, baseline, precision 8, 198x256, frames 3
pageinfoimage-decoded.jpg: JPEG image data, JFIF standard 1.02, resolution (DPI), density 72x72, segment length 16, baseline, precision 8, 612x792, frames 3
pageinfoimage.jpg:         ASCII text
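
The manual decode step above could be wrapped in a small helper that decodes only when the extracted file is base64 text. This is a sketch under assumptions: `decode_if_base64` is a hypothetical name, and it assumes the base64 text starts at the first byte of the file with no leading whitespace. It relies on the fact that JPEG data starts with the bytes FF D8 FF, which base64 encodes as "/9j/".

```shell
# Write a decoded copy of SRC to DST if SRC is base64-encoded JPEG data,
# otherwise copy SRC to DST unchanged.
# Usage: decode_if_base64 SRC DST   (hypothetical helper)
decode_if_base64() {
    src=$1
    dst=$2
    # Raw JPEG starts with bytes FF D8 FF; base64 of those bytes is "/9j/"
    magic=$(head -c 4 "$src")
    if [ "$magic" = "/9j/" ]; then
        base64 -d "$src" > "$dst"
    else
        cp "$src" "$dst"
    fi
}
```

With the files above, `decode_if_base64 pageinfoimage.jpg pageinfoimage-decoded.jpg` would decode the page info image, while the two page images would simply be copied.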


But to reiterate: we're only interested in getting the raw XMP blob here (from which we use another library to extract the images). I can't share the (37.5MB) INDD file publicly but will send it directly to Phil Harvey upon request.

Thanks,
Eric

Phil Harvey

Hi Eric,

Thanks for this report.  Send the file to me and I'll take a look (philharvey66 at gmail.com)

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Phil Harvey

I got the file, thanks.

Here is the source of the problem:

> exiftool ~/Desktop/8.5x11-FranchiseFlyer-DisneyInfinity2-CWtest.indd -'page*image' -G1:2 -a
[XMP-xmp:Preview] Page Image                    : (Binary data 21217 bytes, use -b option to extract), (Binary data 21044 bytes, use -b option to extract)
[XMP-CWLI:Unknown] Page Info Image              : (Binary data 191746 bytes, use -b option to extract)
[XMP-CWLI:Unknown] Page Info Image              : (Binary data 184025 bytes, use -b option to extract)


PageInfoImage is not decoded from base64 because it is an unknown tag.  A user-defined tag would have to be created to decode this.

I still have to figure out why ExifTool is sometimes copying the PageInfoImage, but it is certainly related to the fact that PageImage has a tag ID of "PageInfoImage" (see here).  I'll post back when I know more.

But if you are interested in getting the binary XMP blob instead of rewriting all metadata as XMP, maybe you should do this:

exiftool -tagsfromfile 8.5x11-FranchiseFlyer.indd -xmp foo.xmp

This will cause the XMP to be extracted as a block.  Without the -xmp, ExifTool writes the tags individually, so it may move information around to same-named tags in other namespaces, and you will miss the unknown tags.

- Phil

Phil Harvey

Ah, this is why the XMP-CWLI PageInfoImage is written to XMP-xmp...  XMP is copied as structures by default, and the two structures are named the same (both called "PageInfo"):

> exiftool ~/Desktop/8.5x11-FranchiseFlyer-DisneyInfinity2-CWtest.indd -struct "-page*" -G1:2
[XMP-xmp:Image] Page Info                       : [{Format=JPEG,Height=256,Image=(Binary data 21217 bytes|, use -b option to extract),PageNumber=1,Width=256},{Format=JPEG,Height=256,Image=(Binary data 21044 bytes|, use -b option to extract),PageNumber=2,Width=256}]
[XMP-CWLI:Unknown] Page Info                    : [{Format=JPEG,Image=(Binary data 191746 bytes|, use -b option to extract)},{Format=JPEG,Image=(Binary data 184025 bytes|, use -b option to extract)}]


The reason that sometimes one structure is copied and sometimes the other is because the order of extracted structures was indeterminate.  I will fix this in ExifTool 10.68.  Thanks for pointing out this inconsistency.

- Phil

lambart

Quote from: Phil Harvey on December 05, 2017, 08:39:51 AM
The reason that sometimes one structure is copied and sometimes the other is because the order of extracted structures was indeterminate.  I will fix this in ExifTool 10.68.  Thanks for pointing out this inconsistency.

Sounds great, Phil. Happy to help. I realize now that I neglected to mention any version number in my post, but I'd seen the problem with ExifTool 10.10, 10.20, and 10.67.

Anyway, it's pretty clear the file is mangled, but I'm glad it'll allow you to make ExifTool even better.

Thanks so much for your great tool and prompt response.

Eric