ExifTool Forum

ExifTool => The Image::ExifTool API => Topic started by: danisowa on January 15, 2013, 08:18:52 AM

Title: delete a tag that is available multiple times
Post by: danisowa on January 15, 2013, 08:18:52 AM
Is there a way to remove a tag that appears more than once in a PDF file?

I have a PDF document with the tags:
Author
Author (1)
Author (2)
Author (3)
Author (4)
Author (5)

I want to remove them and set only the Author field to my new value.

I have tried several things, but without success :-(

Title: Re: delete a tag that is available multiple times
Post by: Phil Harvey on January 15, 2013, 08:27:07 AM
Where are these tags stored (extract with -a -G1)?

Without seeing the file I can only suggest things to try, but it is possible that this may take 2 commands:

exiftool -author= FILE

exiftool -author="some author" FILE

But if you want to email me the file (philharvey66 at gmail.com), I may be able to help more.

- Phil
Title: Re: delete a tag that is available multiple times
Post by: danisowa on January 15, 2013, 08:28:56 AM
I'm working with the Perl API...
Title: Re: delete a tag that is available multiple times
Post by: danisowa on January 15, 2013, 08:42:41 AM
I've extracted the tags on the command line.

All Author values are stored in [PDF].

The command-line commands

exiftool -author= FILE

exiftool -author="some author" FILE

have no effect on the file :-/
Title: Re: delete a tag that is available multiple times
Post by: Phil Harvey on January 15, 2013, 08:45:05 AM
Interesting.  Could you send me the file?

- Phil
Title: Re: delete a tag that is available multiple times
Post by: Phil Harvey on January 16, 2013, 07:30:06 AM
Thanks for the sample.

Here is what I get:

> exiftool ~/Desktop/test.pdf -author -a -G1
[PDF]           Author                          : all,,DCFR,,Vx
[PDF]           Author                          : all,,DCFR,,Vx
[PDF]           Author                          : all,,DCFR,,Vx

> exiftool ~/Desktop/test.pdf -author=me
    1 image files updated

> exiftool ~/Desktop/test.pdf -author -a -G1
[PDF]           Author                          : me
[XMP-pdf]       Author                          : me
[PDF]           Author                          : all,,DCFR,,Vx
[PDF]           Author                          : all,,DCFR,,Vx


So ExifTool reports 3 Author tags, but only changes one of them.

Looking more closely at the PDF (using the ExifTool -v option) I can see that it has been modified twice, apparently using Hewlett Packard MFP software.  I do not believe that it was updated correctly because the Info dictionary is duplicated each time instead of being replaced.  This results in 3 copies of the Author tag, but ExifTool will edit only the first one.

I don't think that multiple Info dictionaries are allowed by the PDF specification, so I can't fault ExifTool's behaviour here.

I tried rewriting the PDF using Adobe Bridge, and it fixed the duplicate Info dictionary problem.  After this, writing/deleting the Author tag with ExifTool behaves as one would expect:

  use Image::ExifTool;
  my $exifTool = Image::ExifTool->new;
  # delete Author (setting a tag with no value deletes it)
  $exifTool->SetNewValue('Author');
  $exifTool->WriteInfo('test.pdf');


- Phil
Title: Re: delete a tag that is available multiple times
Post by: danisowa on January 16, 2013, 09:14:24 AM
Hi Phil,

thanks for your answer.

I have tried to reproduce the "wrong" PDF.

I was able to reproduce it by doing the following:
Create a new PDF and set the author to test
Save the document
Open the document with Acrobat Pro
Change the author to test1
Save the PDF

Then I have two entries for the author, and ExifTool only changes the first one.

To me that means Acrobat Pro produces a PDF that does not conform to the PDF standard, right?

I was able to remove the duplicate entries by saving the PDF in reduced size (with Acrobat Pro).
Title: Re: delete a tag that is available multiple times
Post by: Phil Harvey on January 16, 2013, 09:49:47 AM
So Acrobat Pro behaves differently than Bridge.  Odd.

But if Acrobat writes it like this, it must be OK.  (Adobe defines the standard.)

So ExifTool must be wrong by displaying information from the other Info dictionaries.

Could you send me a copy of the PDF after you set Author to "test" using Acrobat Pro?  (I don't have Acrobat Pro myself.)

Thanks.

- Phil
Title: Re: delete a tag that is available multiple times
Post by: Phil Harvey on January 17, 2013, 08:23:28 AM
Thanks for the sample.

I re-read the PDF 1.7 specification, and all I can say is that Adobe sucks.  It is clear from the specification that any modified object in an incremental PDF update should have the same object and generation number as before:

Page 63:
Together, the combination of an object number and a generation number uniquely identifies an indirect object. The object retains the same object number and generation number throughout its existence, even if its value is modified.

Page 99:
Because updates are appended to PDF files, a file can have several copies of an object with the same object identifier (object number and generation number). This can occur, for example, if a text annotation (see Section 8.4, "Annotations") is changed several times and the file is saved between changes. Because the text annotation object is not deleted, it retains the same object number and generation number as before.

Also, this is how it is done in the examples in appendix G.6 (page 1075) when modifying text annotations.

But for some reason Acrobat Pro is creating a new Info object instead of replacing the old one (in this case, the new Info object/generation number is 21/0, and the old Info object is 4/0).  Grrrr...  It really seems to me as if they are ignoring their own specification here. :(

The effect is that the old Info object remains visible since it still exists in the cross reference table as a valid entry.
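
To make this effect concrete, here is a toy model in Python. It is only a sketch, not ExifTool code and not a real PDF parser: `Revision`, `merge_xref` and `live_info_dicts` are made-up names, and it assumes a reader that follows the /Info reference of every trailer in the update chain (an assumption for illustration, not a statement about ExifTool internals). The object numbers 4/0 and 21/0 and the author values are the ones mentioned in this thread.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    """One incremental update: the objects it writes and its trailer /Info reference."""
    objects: dict    # (object number, generation) -> dictionary contents
    info_ref: tuple  # object id that this revision's trailer /Info points to


def merge_xref(revisions):
    """Combine cross-reference sections, oldest first. A later entry replaces
    an earlier one only if it has the same object id."""
    table = {}
    for rev in revisions:
        table.update(rev.objects)
    return table


def live_info_dicts(revisions):
    """Info dictionaries found by following every trailer's /Info reference
    (newest trailer first) and looking each one up in the merged table."""
    table = merge_xref(revisions)
    refs = []
    for rev in reversed(revisions):
        if rev.info_ref not in refs:
            refs.append(rev.info_ref)
    return [table[ref] for ref in refs]


original = Revision({(4, 0): {"Author": "test"}}, info_ref=(4, 0))

# Update as the specification describes it: same object id, new value.
per_spec = [original, Revision({(4, 0): {"Author": "test1"}}, info_ref=(4, 0))]

# Update as observed in this file: a new Info object 21/0, old 4/0 left in place.
observed = [original, Revision({(21, 0): {"Author": "test1"}}, info_ref=(21, 0))]

print(live_info_dicts(per_spec))   # one Info dictionary
print(live_info_dicts(observed))   # two Info dictionaries, old one still reachable
```

In the first case the old value is shadowed because the object id is reused; in the second case nothing ever overrides object 4/0, so both Author values stay reachable.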

I am convinced that Acrobat Pro is updating the PDF Info dictionary incorrectly, and will submit a bug report.  For this, I need to know the version of Acrobat Pro that you are using, and what system you are running.

The bottom line is that I don't want to patch ExifTool to read only the most recent Info dictionary.  But what I will do is change the priority of the tags in ExifTool so that tags in this Info dictionary take precedence.  This will at least display only the most recently written value of a tag when the -a option is not used.
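
The priority idea can be sketched roughly like this in Python. This is a hypothetical illustration, not ExifTool's implementation: `pick_values` and its data layout are made up, and the list of Info dictionaries is assumed to be ordered newest first. The `show_duplicates` flag plays the role of the -a option.

```python
def pick_values(info_dicts, show_duplicates=False):
    """info_dicts: list of Info dictionaries, newest first (assumed ordering).

    Returns (tag, value) pairs. Without show_duplicates, only the value from
    the newest dictionary that contains a tag is kept; older copies are skipped.
    """
    result = []
    seen = set()
    for info in info_dicts:
        for tag, value in info.items():
            if show_duplicates or tag not in seen:
                result.append((tag, value))
            seen.add(tag)
    return result


# Newest Info dictionary first, then the stale one left behind by the update.
dicts = [{"Author": "test1"}, {"Author": "test"}]

print(pick_values(dicts))                        # newest value only
print(pick_values(dicts, show_duplicates=True))  # every copy, newest first
```

With this ordering the stale copies are still reported when duplicates are requested, but the default view shows only the most recently written value.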

- Phil

Edit: Thanks, I got the Acrobat Pro version via email, and have submitted the bug report to Adobe.