fixing PDF "Root object" "not found at offset" errors with i7j RUPS

Started by traycerb, October 13, 2023, 04:35:59 PM


traycerb

I wanted to provide a solution to those, like me, who still encounter the following error when scanning a PDF: Root Object not found at offset
My specific error was: Root object (29 0 obj) not found at offset 14142663
Looking through some previous reports, the cause is either a problem with the program that created the .pdf or with exiftool's parsing of the metadata.

In any case, it prevented a batch file I was using from retrieving and updating the metadata on a pdf.  My workaround was to use i7j-rups, a Java PDF metadata analyzer/editor, to delete all the metadata (using the red 'X' in the lower pane in the attached screenshot).

I have no idea what the actual issue is with the .pdf itself (and the data is sensitive enough that I can't share the pdf itself), so perhaps my solution is imperfect, naive, or will cause more problems down the line, but it allowed exiftool to scan without errors, which in turn allowed my .bat to complete and that was good enough.

Phil Harvey

Basically, any utility that will rewrite the PDF should fix the Root offset problem.  It is possible there is a blank line or something at that offset, which ExifTool wouldn't skip but other utilities might.  If you could tell me the first 12 bytes or so at offset 14142663 in the original file it would help diagnose this.

- Phil

Edit: Actually, I checked the code and ExifTool will skip blank lines, so the problem is something else (some other type of unexpected white space?), but it would help to know what bytes are at that address.
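For anyone who needs to pull the bytes at a reported offset without a hex editor, here is a minimal Python sketch. The helper names (bytes_at, as_hex) and the file name in the usage comment are placeholders for illustration; they are not part of ExifTool.

```python
def bytes_at(path, offset, count=12):
    """Return up to `count` raw bytes starting at byte `offset` of the file."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(count)


def as_hex(data):
    """Format bytes as space-separated uppercase hex pairs, e.g. '32 38 20'."""
    return " ".join(f"{b:02X}" for b in data)


# Usage (the file name is a placeholder; the offset comes from the error message):
# print(as_hex(bytes_at("original.pdf", 14142663)))
```

The offset is taken as a decimal byte position from the start of the file, which is how it appears in the error message.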

traycerb

Quote from: Phil Harvey on October 13, 2023, 08:49:51 PM
Basically, any utility that will rewrite the PDF should fix the Root offset problem.

Understood.  It may be very basic from your standpoint, but I wasn't sure what would trigger that rewrite in the typical freeware PDF programs I use (Foxit, Adobe Reader), so I wanted to lay out a method for myself and others.

Quote from: Phil Harvey on October 13, 2023, 08:49:51 PM
It is possible there is a blank line or something at that offset, which ExifTool wouldn't skip but other utilities might.  If you could tell me the first 12 bytes or so at offset 14142663 in the original file it would help diagnose this.

sure:
Offset(d) 14142663
32 38 20 30 20 6F 62 6A 0D 3C 3C 0D 2F 54 79 70 65 20 2F 50 61 67 65 0D 2F

attached a screenshot with a little bit more of that area for context.

Phil Harvey

Thanks.

The problem is that the Root object is "29 0", and that offset points to object "28 0".  I don't know how other software would deal with this.

- Phil
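The mismatch can be reproduced from the posted bytes alone. The sketch below decodes the hex dump and parses the object header that actually sits at the offset; object_header is a hypothetical helper written for this illustration, not ExifTool's own parsing code.

```python
import re

# Bytes posted above from offset 14142663 of the original file.
DUMP = "32 38 20 30 20 6F 62 6A 0D 3C 3C 0D 2F 54 79 70 65 20 2F 50 61 67 65 0D 2F"


def object_header(data):
    """Parse a leading 'N G obj' header; return (N, G), or None if absent."""
    m = re.match(rb"\s*(\d+)\s+(\d+)\s+obj\b", data)
    return (int(m.group(1)), int(m.group(2))) if m else None


raw = bytes.fromhex(DUMP)   # fromhex ignores the spaces between hex pairs
found = object_header(raw)
expected = (29, 0)          # the Root object named in the error message

print(raw)
print(found, "matches" if found == expected else "does not match", expected)
```

The decoded text starts with "28 0 obj", followed by a dictionary whose type is /Page, so the offset recorded for the Root object lands on a page object instead.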

traycerb

Quote from: Phil Harvey on October 14, 2023, 06:33:35 AM
The problem is that the Root object is "29 0", and that offset points to object "28 0".  I don't know how other software would deal with this.

Just FYI, it opens fine in all the other apps I tried (LibreOffice was the only open source one), but exiftool may be more correct in flagging the error.
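How those apps cope with the bad offset is not established in this thread. One thing a lenient reader could do, and this is an assumption rather than a description of any particular app, is ignore the recorded offset and scan the file for the object header. A minimal sketch of that scan, where find_object_offsets and the file name are hypothetical:

```python
import re


def find_object_offsets(data, obj_num, gen=0):
    """Return byte offsets where an 'obj_num gen obj' header appears in data."""
    # The lookbehind stops "29 0 obj" from matching inside "129 0 obj".
    pattern = re.compile(rb"(?<![0-9])%d\s+%d\s+obj\b" % (obj_num, gen))
    return [m.start() for m in pattern.finditer(data)]


# Usage (the file name is a placeholder):
# from pathlib import Path
# data = Path("original.pdf").read_bytes()
# print(find_object_offsets(data, 29))   # compare with the 14142663 in the error
```

Treat the results as candidates only: the same byte pattern can occur by chance inside binary stream data, and a PDF saved with incremental updates can legitimately contain more than one copy of an object.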

in any case, it's a minor issue and the workaround is satisfactory. thanks.