ExifTool crash

Started by gquerret, December 06, 2011, 05:36:54 AM


gquerret

Hello,


Following my previous thread on the out-of-memory crash, I can confirm that the latest version is working correctly, thanks.
We are currently analyzing several gigabytes of files, and we've found an InDesign file which always crashes ExifTool.

The file can be accessed at http://riverside-software.fr/00000446.indd (230 MB)

ExifTool version used: 8.69 (reproduced on 8.65)
OS: Win 2003, 32-bit
Command line: %PERL%\bin\perl.exe exiftool/ExifTool 00000446.indd
Output: out-of-memory exception. The process grows to 1 GB of "Private Data" (as shown in Process Explorer) and crashes.
Expected output/behavior: no crash :-) Or at least I'd like to know if there's an option to prevent it from crashing (excluding some tags, for example), or if the file itself is screwed up.

Edit: PH - Added link to previous thread

gquerret

Sorry, file isn't fully uploaded yet, I'll send another message when it's done.


Gilles QUERRET

gquerret

File is uploaded.

Gilles QUERRET

Phil Harvey

Hi Gilles,

All I have to say is:  HOLY CRAP!

I have complained about Adobe's use of XMP for information other than useful metadata, but this is ABSOLUTELY RIDICULOUS!

The XMP in this INDD file is 107531307 bytes long!  (YES, 107 MB!)

That's totally insane.  Dumb, dumb, dumb.  XML is absolutely the wrong format for storage of large amounts of binary data like this.

But I can't blame this entirely on Adobe.  The Adobe part of this XMP is only 221 kB in size.  The bulk of the problem is due to metadata in the www.northplains.com namespace.

Even with this huge XMP, ExifTool has no problems parsing this file on my old Mac Mini (with 2 GB of RAM), but it requires 2.3 GB of virtual memory and about a minute to do it.   Unfortunately the memory handling of ActivePerl for Windows is not as good, and it tends to barf on large allocations like this.

I'm not sure what the best solution is.  Perhaps a minor warning if the XMP is excessively large (> 20 MB ?), which would allow the XMP to be processed with the -m option.
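
The size check floated above could look roughly like the following. This is only a minimal sketch in Python, not ExifTool's actual Perl code: the function name, the `ignore_minor` flag (standing in for the -m option) and the decimal reading of "20 MB" are all assumptions made for illustration.

```python
# Hypothetical sketch of a size guard; not ExifTool's real implementation.
MAX_XMP_BYTES = 20 * 1000 * 1000  # the 20 MB threshold floated above (decimal MB assumed)


def should_process_xmp(xmp_size, ignore_minor=False, limit=MAX_XMP_BYTES):
    """Return (process, warning).

    XMP at or under the limit is always processed. Larger XMP is skipped
    with a minor warning, unless the caller has chosen to ignore minor
    warnings (the role the -m option would play).
    """
    if xmp_size <= limit or ignore_minor:
        return True, None
    return False, "[minor] XMP is excessively large (%d bytes)" % xmp_size


# The XMP block from this thread, with and without the override.
print(should_process_xmp(107531307))
print(should_process_xmp(107531307, ignore_minor=True))
```

With a guard like this, the default run is protected against huge allocations, while anyone who really wants the data can still ask for it explicitly.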

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

gquerret

Hi Phil,


Thanks for your answer. I absolutely agree with you about the misuse of XMP for this kind of data. I guess that ExifTool is using DOM to parse the XML data, and this produces the out-of-memory problems?
I have no control over the files ExifTool receives in our application. All I can say is that I'm pretty sure binary data larger than 20 MB will be discarded by the user. So your suggestion of discarding binary metadata larger than a specified parameter would be good (at least for my use case).
And one more thing (not directly related to ExifTool): what are the good alternatives to ActivePerl under Windows? We're using Perl exclusively for ExifTool, so there are no constraints on other Perl modules. And as the server will be migrated to Win64, do 64-bit implementations solve these out-of-memory problems?

Best regards,

Gilles QUERRET

Phil Harvey

Hi Gilles,

I'm not using XML DOM for parsing the XMP, but I break the XMP elements down into individual strings from which the values are extracted then stored separately.  This isn't particularly memory efficient, and from this test it looks as if it requires about 20x more memory than the size of the original XMP, which isn't unreasonable.
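
The "about 20x" figure can be checked from the numbers already quoted in this thread. The snippet below is just back-of-the-envelope arithmetic in Python; it assumes decimal units (1 GB = 10^9 bytes) and treats the 1 GB ceiling seen under ActivePerl as a hard limit, which is an approximation.

```python
# Figures quoted in the thread; decimal units are an assumption.
xmp_bytes = 107_531_307        # size of the XMP block in the INDD file
observed_vm_bytes = 2.3e9      # virtual memory used on the Mac Mini
process_ceiling_bytes = 1.0e9  # approximate point where ActivePerl gave up

# Memory used per byte of XMP with the split-into-strings approach.
overhead = observed_vm_bytes / xmp_bytes

# Largest XMP that would fit under the ceiling at that overhead.
largest_safe_xmp = process_ceiling_bytes / overhead

print("overhead factor: %.1f" % overhead)
print("largest XMP under a 1 GB ceiling: %.0f MB" % (largest_safe_xmp / 1e6))
```

At this overhead, anything much beyond a few tens of megabytes of XMP would exhaust a process limited to about 1 GB, which is consistent with the crash on the 107 MB block.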

If I add the patch I was thinking about, XMP over 20 MB in size would be ignored entirely.  Not just the binary data parts.  Is this OK?

I don't have any experience with other Windows versions of Perl, but I haven't seen any problems like this when running Perl under Cygwin in Windows.

- Phil

gquerret

Hi Phil,


Would it be possible to have this size as a parameter?
And do you know if the same problem could apply to other metadata types, or is this restricted to XMP? My knowledge of metadata is too limited to know if there are other formats storing data as XML.
By the way, I'll try to set up Cygwin and run ExifTool under it to see if memory is handled correctly.

Gilles

Phil Harvey

I don't want to make the size a parameter because this would require adding another option to exiftool.

But I will think about this.  Maybe I can improve the memory handling of XMP, or change it to handle the large binary data elements differently to avoid this problem.

- Phil

Phil Harvey

Hi Gilles,

I have been playing around with optimizing the memory requirements of the ExifTool XMP processing algorithm.  I need more testing to make sure I haven't broken something, but I now have a version running which leaves the original XMP as one large block instead of breaking it up into individual elements before extracting the values.  This version runs on your test file using 10x less memory (230 MB virtual), so it may solve the problem with the apparent 1 GB memory constraint of ActivePerl provided the XMP size stays below about 400 MB or so.
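
The general idea of scanning one large block in place, rather than first breaking it into many separate strings, can be illustrated as follows. This is a toy sketch in Python and says nothing about how the real Perl code is written: the regular expression only handles simple leaf elements, and the function names and sample data are made up for the example.

```python
import re

# Toy pattern for leaf elements such as <ns:Tag>value</ns:Tag>.
# Real XMP (RDF/XML) needs a far more careful parser than this.
ELEMENT = re.compile(r"<([\w:.-]+)>([^<]*)</\1>")


def extract_all_at_once(xmp):
    """Split-first approach: every (tag, value) copy exists before any is used."""
    return ELEMENT.findall(xmp)


def extract_in_place(xmp):
    """Scan the original block and hand out one (tag, value) pair at a time."""
    for match in ELEMENT.finditer(xmp):
        yield match.group(1), match.group(2)


# Hypothetical sample; the np: element stands in for a large binary payload.
sample = "<x:meta><dc:title>Report</dc:title><np:blob>QUJD</np:blob></x:meta>"
for tag, value in extract_in_place(sample):
    print(tag, len(value))
```

Both functions find the same elements; the difference is that the generator only ever holds the original block plus the one value currently being handled, so peak memory stays close to the size of the XMP itself.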

I'll do some more testing and let you know tomorrow how it went.

- Phil

Phil Harvey

Great news.  Testing is going really well so far.  On Windows, the new version of ExifTool running under ActivePerl uses 300 MB of memory to process your test file.

So my current plan is to implement this change, which should solve the problem you observed.

For additional protection, I will issue a minor warning and not process XMP larger than 300 MB in INDD files unless the -m option is used.

These changes will appear in ExifTool 8.72.

- Phil

gquerret

Impressive!
Just to be sure, you'd like to limit XMP processing only in INDD files? I guess I may receive other file types with huge XMP sections (not to say this is good usage of XMP, but I'm not in control of what is being sent to this system).


Gilles

Phil Harvey

Hi Gilles,

Yes there could potentially be other formats with this problem, but I'll tackle those as we find them.  I could add a test in the XMP module, but by the time the processing gets to this stage the XMP data has already been read into memory.  So to stop it from being read in the first place, the InDesign module had to be patched.
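
Why the check has to live in the file-format reader can be seen with a small sketch. The Python below uses a made-up container layout (a 4-byte big-endian length followed by the payload), which is NOT the real InDesign structure; the function name, the `ignore_minor` flag and the decimal 300 MB limit are likewise assumptions for illustration.

```python
import io
import struct

LIMIT = 300 * 1000 * 1000  # 300 MB, decimal units assumed


def read_block(stream, limit=LIMIT, ignore_minor=False):
    """Read one length-prefixed block, skipping oversized ones unread.

    Hypothetical layout: 4-byte big-endian length, then the payload.
    Returns (payload, warning); payload is None when nothing was loaded.
    """
    header = stream.read(4)
    if len(header) < 4:
        return None, "truncated header"
    (size,) = struct.unpack(">I", header)
    if size > limit and not ignore_minor:
        # Seek past the payload so it is never allocated in memory.
        stream.seek(size, io.SEEK_CUR)
        return None, "[minor] block too large (%d bytes), skipped" % size
    return stream.read(size), None


# Two tiny blocks and a toy limit of 4 bytes to exercise both paths.
demo = io.BytesIO(struct.pack(">I", 5) + b"hello" + struct.pack(">I", 3) + b"abc")
print(read_block(demo, limit=4))  # first block exceeds the toy limit
print(read_block(demo, limit=4))  # second block fits
```

The size is known from the header before the payload is touched, so the reader can decide to skip it; a check placed after `stream.read(size)` would come too late, because the memory would already have been allocated.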

However, ExifTool also reads XMP from JPG, JP2, TIFF, GIF, EPS, PDF, PSD, INX, PNG, DJVU, SVG, PGF, MIFF, XCF, CRW, DNG and a variety of proprietary TIFF-based RAW images, as well as MOV, AVI, ASF, WMV, FLV, SWF and MP4 videos, and WMA and audio formats supporting ID3v2 information, and patching all of these isn't something I want to do.

- Phil

gquerret

Hi Phil,


I've tested 8.72 on this file, and it works for me. I guess we'll have other screwed-up INDD files, so I'll keep you posted. Thanks a lot!


Gilles QUERRET