PDF document takes 3 minutes to process

Started by felixge, May 06, 2013, 05:21:45 AM


felixge

System: Ubuntu 12.04
Exiftool: 9.28
Command line: exiftool -v 34749_ePrint_2.pdf
Output: see output.txt

Exiftool takes a very long time to process the attached PDF. I suspect this is because the PDF contains a lot of history events and similar data. I don't need this data and tried to exclude it via '--History*' and '-x History*', but that only seems to remove the history events from the output; all of the processing still happens.

Many thanks for any advice in advance!

Phil Harvey

Thanks for the bug report and sample.  I can reproduce this on my system here.

The problem is that this document contains a very large encrypted metadata stream.  Unfortunately, the entire stream must be decrypted to extract any of the information.  Complain to Adobe about this (and about storing editing information inside the metadata, which is stupid).  I have had to implement the AES decryption myself (in Perl, which is slow) because I couldn't find a standard library to do this.  From my AES module documentation:

        BUGS

        This code is blindingly slow.  But in truth, slowing down processing is the
        main purpose of encryption, so this really can't be considered a bug.


I have no idea why PDF information is encrypted like this when there is no password protection.  Again, stupid Adobe.

So the bottom line is that I don't think there is anything I can do about this.

- Phil

4allportal-ite

Hello everyone,
Is it possible to disable the encrypted metadata or skip decryption for these blocks? This is actually a big problem for us.
Thanks

Phil Harvey

If you skip the encrypted metadata stream then it is likely that ExifTool won't extract any metadata.

- Phil

StarGeek

Using the PDF file from the first post, exiftool took 44 seconds to list the data on my system, which has a pretty good CPU.

I then ran this command with qpdf:
qpdf 34749_ePrint_2.pdf --decrypt 34749_ePrint_2-decrypted.pdf

This took less than a second (0.10s) to convert.

Then, running exiftool on the decrypted file took less than a second (0.86s) to list all the data.

It might be worth looking into decrypting any non-password-protected files first. Is there a reason to encrypt the data without a password in the first place?

For qpdf, there is a --replace-input option which will skip the creation of a separate output file.
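
The two steps above could be wrapped in a small shell helper, roughly as sketched below. This is only an illustration and it assumes that qpdf and exiftool are both on the PATH and that the PDF has no user password. The function names decrypted_name and read_pdf_metadata are made up for this example; they are not part of either tool.

```shell
#!/bin/sh
# Sketch only: decrypt a PDF that has no user password with qpdf,
# then run exiftool on the decrypted copy instead of the original.
# Assumes qpdf and exiftool are installed and on the PATH.

# Hypothetical helper: derive the output name, e.g. foo.pdf -> foo-decrypted.pdf
decrypted_name() {
    printf '%s-decrypted.pdf\n' "${1%.pdf}"
}

# Hypothetical helper: decrypt first, then list the metadata.
read_pdf_metadata() {
    src=$1
    out=$(decrypted_name "$src")

    qpdf --decrypt "$src" "$out"
    status=$?
    # qpdf exits with 0 on success and 3 when it finished with warnings;
    # anything else is treated as a failure here.
    if [ "$status" -ne 0 ] && [ "$status" -ne 3 ]; then
        echo "qpdf failed with exit status $status" >&2
        return "$status"
    fi

    exiftool "$out"
}

# Example call (commented out so that sourcing this file runs nothing):
# read_pdf_metadata 34749_ePrint_2.pdf
```

Note that this writes a separate decrypted copy next to the original, so the file name reported by exiftool will be that of the copy.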