PDF file size doubles on download after processing by exiftool

Started by AmanRaj, April 22, 2024, 03:19:11 AM


AmanRaj

Hi,

I uploaded an 8 MB PDF, and when I download it from the portal it becomes 16 MB. Downloading from the portal involves exiftool processing. I checked the metadata of both the original and the downloaded file. The only differences I can see are that an IPTC metadata description is added in the downloaded file and that the Linearized attribute value changes to 'No' in the downloaded file. I tried removing the description with an exiftool command, but it had no impact on the size. The only parameter I suspect of causing this problem is Linearized.
Can you please help me understand how the size is being affected? I'm attaching the original PDF here.


StarGeek

Exiftool uses the PDF Incremental Update ability to edit PDFs. It doesn't have the ability to remove previous data in the file, only to update the index with the new data.

As such, exiftool edits to PDFs will always increase the file size, and such edits are reversible. To make the edits permanent and remove the previous data completely, you have to run a PDF-specific program such as qpdf to re-linearize the file.

See the docs at the top of the PDF Tags page for more details.
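
As a rough illustration of the incremental-update behaviour described above, here is a minimal Python sketch that looks for %%EOF markers in a PDF's raw bytes. Each incremental update appends a new section ending in its own %%EOF, so the distance between the last two markers approximates how much the latest update added. The function names are hypothetical, and this is only a heuristic: a linearized PDF typically already contains two %%EOF markers, and the byte sequence could in principle also occur inside stream data.

```python
import re
from typing import List


def eof_offsets(data: bytes) -> List[int]:
    """Return the end offset of every %%EOF marker found in the raw bytes."""
    return [m.end() for m in re.finditer(rb"%%EOF", data)]


def appended_bytes(data: bytes) -> int:
    """Approximate size of the last appended section.

    This is the distance between the last two %%EOF markers, or 0 when
    fewer than two markers are present.
    """
    ends = eof_offsets(data)
    if len(ends) < 2:
        return 0
    return ends[-1] - ends[-2]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python eof_scan.py some_file.pdf
    with open(sys.argv[1], "rb") as fh:
        raw = fh.read()
    print("%%EOF markers:", len(eof_offsets(raw)))
    print("bytes in last appended section:", appended_bytes(raw))
```

Comparing the output for the original and the downloaded file should show whether a large section was appended to the end of the file.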
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

Phil Harvey

It is unlikely that 8 MB of metadata was added by the portal; however, I can't tell without seeing the modified file.

I'm saying that if your goal is to reduce the file size back down to 8 MB, then I wouldn't think that ExifTool is the answer.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

AmanRaj

Hi Phil,

I have attached the downloaded file as well, which is 16 MB. Please have a look and see if you can find anything.

-Aman

StarGeek

I ran my qpdf BAT file, which re-linearizes the file as well as unpacking and repacking everything, and here are the results:

C:\>fixpdfs Y:\!temp\x\z\
"Y:\!temp\x\z\"
Processing "MFL-05397-PL_Download.pdf" in "Y:\!temp\x\z\"
qpdf --stream-data=compress --object-streams=generate --linearize --replace-input "Y:\!temp\x\z\MFL-05397-PL_Download.pdf"
WARNING: Y:\!temp\x\z\MFL-05397-PL_Download.pdf (offset 210890): input stream is complete but output may still be valid
WARNING: Y:\!temp\x\z\MFL-05397-PL_Download.pdf (offset 210890): input stream is complete but output may still be valid
WARNING: Y:\!temp\x\z\MFL-05397-PL_Download.pdf (offset 210890): input stream is complete but output may still be valid
qpdf: there are warnings; original file kept in Y:\!temp\x\z\MFL-05397-PL_Download.pdf.~qpdf-orig
qpdf: operation succeeded with warnings; resulting file may have some problems
0

C:\>exiftool -ext * -filesize Y:\!temp\x\z
======== Y:/!temp/x/z/MFL-05397-PL_Download.pdf
File Size                       : 8.5 MB
======== Y:/!temp/x/z/MFL-05397-PL_Download.pdf.~qpdf-orig
File Size                       : 16 MB
    1 directories scanned
    2 image files read

The final result is that it's back down to ~8MB.

Because of the warnings, qpdf doesn't delete the original as it normally would, but instead keeps it with the .~qpdf-orig suffix.
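
For anyone who wants to run the same step from a script rather than a BAT file, here is a minimal Python sketch that builds the qpdf command shown in the output above. The function names are hypothetical, qpdf is assumed to be installed and on the PATH, and the comment about exit code 3 reflects my understanding of qpdf's documented exit codes rather than anything shown in this thread.

```python
import subprocess
from typing import List


def qpdf_relinearize_args(pdf_path: str) -> List[str]:
    """Build the qpdf argument list used in the console output above."""
    return [
        "qpdf",
        "--stream-data=compress",
        "--object-streams=generate",
        "--linearize",
        "--replace-input",
        pdf_path,
    ]


def relinearize(pdf_path: str) -> int:
    """Run qpdf on one file in place and return its exit code.

    Assumption: qpdf returns 0 on success and 3 when it succeeded with
    warnings; check the qpdf documentation for your version.
    """
    result = subprocess.run(qpdf_relinearize_args(pdf_path))
    return result.returncode
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces or characters such as "!".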

AmanRaj

Thanks for the update.
But the question now is, since the increased file size is a concern for some customers: is this something the exiftool team should fix, or do we have to find a way on our side to keep Linearized the same as in the original file?
I doubt because Linearized attribute can be updated from our end.

-Aman

Phil Harvey

OK.  It is the metadata that is causing the size issue.  The XMP is almost 8 MB in size.  This is due to a canto:CantoMetadata property in the XMP which presumably contains either editing information or an entire second copy of the document.  It really isn't good to store editing information in XMP, but Adobe themselves do this (which I have complained to them about), so it isn't something new.
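
One way to check the size of the XMP for yourself is to scan the raw file for XMP packet wrappers. The following is a minimal Python sketch with a hypothetical function name. It assumes the XMP is stored uncompressed in the file (commonly the case for PDF metadata streams, but not guaranteed) and simply measures everything between each "<?xpacket begin" and the matching "<?xpacket end" instruction.

```python
import re
from typing import List

# Matches one XMP packet: the begin processing instruction, the payload,
# and the end processing instruction. DOTALL lets "." span line breaks.
XPACKET = re.compile(
    rb"<\?xpacket begin=.*?\?>.*?<\?xpacket end=.*?\?>",
    re.DOTALL,
)


def xmp_packet_sizes(data: bytes) -> List[int]:
    """Return the size in bytes of each uncompressed XMP packet found."""
    return [len(m.group(0)) for m in XPACKET.finditer(data)]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python xmp_scan.py some_file.pdf
    with open(sys.argv[1], "rb") as fh:
        sizes = xmp_packet_sizes(fh.read())
    for i, size in enumerate(sizes, start=1):
        print(f"XMP packet {i}: {size / 1_000_000:.2f} MB")
```

If the downloaded file reports two packets of several megabytes each where the original reports one, that would be consistent with an incremental update appending a rewritten copy of the XMP while leaving the old copy in place.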

- Phil

StarGeek

Quote from: AmanRaj on April 24, 2024, 04:27:13 AM
is this something the exiftool team should fix

This isn't something exiftool can "fix".  As I said in my first post, exiftool uses the PDF format's ability to have incremental updates. This is unlikely to change, as I doubt Phil wants to turn exiftool into a full-fledged PDF editor.

Quote
we have to find a way on our side to keep Linearized the same as in the original file.
I doubt because Linearized attribute can be updated from our end.

You'll have to use a different program that is specific to editing PDFs.  Loading it into Adobe Acrobat and resaving with File > Save as Other > Optimized PDF will do it.  The command I show above for qpdf will do it. There are plenty of other PDF editing programs that you can use.

AmanRaj

For the original files, the XMP Toolkit is Adobe XMP Core 9.1-c001 79.675d0f7.
I looked at multiple files and found that for the downloaded files, where we are seeing the increased file size, the XMP Toolkit is Image::ExifTool 11.70.

StarGeek

I can only repeat what has already been said:

Quote from: StarGeek on April 22, 2024, 10:32:02 AM
Exiftool uses the PDF Incremental Update ability to edit PDFs. It doesn't have the ability to remove previous data in the file, only to update the index with the new data.

As such, exiftool edits to PDFs will always increase the file size, and such edits are reversible. To make the edits permanent and remove the previous data completely, you have to run a PDF-specific program such as qpdf to re-linearize the file.

Quote from: Phil Harvey on April 23, 2024, 09:45:18 AM
I'm saying that if your goal is to reduce the file size back down to 8 MB, then I wouldn't think that ExifTool is the answer.