REQ: Detect if a PDF has bookmarks

Started by StarGeek, April 22, 2023, 02:50:11 PM


StarGeek

I've been working with some PDFs and learning how to edit them. One thing I haven't been able to find an easy answer for is how to tell whether a PDF already has bookmarks or whether I need to add them. Right now, I'm stuck with using this command:
qpdf --json file.pdf | jq ".outlines" | grep -Poi "Title\": "

It uses qpdf to dump all the data in JSON format, then jq (aka sed for JSON data) to extract the "outlines" structure, then grep to see if there are any title entries. Finally, I check the return code: 0 = bookmarks, 1 = no bookmarks. A bit much when I'm just looking for a True/False answer.

If this wouldn't be an easy add, then this can be ignored.

Edit: I just figured out I could use pdftk to remove a step:
pdftk file.pdf dump_data | grep "BookmarkBegin"
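
For a plain True/False answer, the pdftk variant could be wrapped in a small shell function. This is only a sketch: the function name dump_has_bookmarks is made up for illustration, and it assumes that pdftk's dump_data output contains one BookmarkBegin line per bookmark, as the grep above implies.

```shell
# Hypothetical helper (the name is not part of pdftk): reads the output of
# `pdftk FILE dump_data` on stdin and succeeds (exit status 0) if at least
# one bookmark record is present.
dump_has_bookmarks() {
    # -q prints nothing; only the exit status is of interest
    grep -q '^BookmarkBegin'
}

# Usage, assuming pdftk is installed:
#   if pdftk file.pdf dump_data | dump_has_bookmarks; then echo True; else echo False; fi
```

Keeping the check separate from the pdftk call means it can be tried on saved dump_data output without a PDF at hand.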
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

Phil Harvey

Can you upload a sample with a bookmark so I can take a look?

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

StarGeek


Phil Harvey

Hi StarGeek,

Try this outlines.config file:

%Image::ExifTool::UserDefined = (
    'Image::ExifTool::PDF::Root' => {
        Outlines => {
            SubDirectory => { TagTable => 'Image::ExifTool::UserDefined::Outlines' },
        },
    },
);

%Image::ExifTool::UserDefined::Outlines = (
    Count => { Name => 'NumOutlines' },
);

1; #end

And this command:

exiftool -config outlines.config -numoutlines FILE

I think that the PDF contains bookmarks if NumOutlines exists.  It works for the files you sent, but you should probably do more testing.
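
To turn that into a yes/no test in a script, something like the following might work. It is a sketch under assumptions: both function names are hypothetical, outlines.config is expected in the current directory, and a count of 0 is treated as "no bookmarks", which goes beyond the statement above that the tag merely has to exist.

```shell
# Hypothetical helper: interpret the value printed for NumOutlines.
# An empty string means the tag was absent. Treating 0 as "no bookmarks"
# is an assumption, not something the config itself guarantees.
outline_count_means_bookmarks() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # missing or non-numeric value
    esac
    [ "$1" -gt 0 ]
}

# Hypothetical wrapper around the command above; -s3 prints only the value.
pdf_has_bookmarks() {
    outline_count_means_bookmarks \
        "$(exiftool -config outlines.config -s3 -NumOutlines "$1")"
}

# Usage, assuming exiftool is on the PATH:
#   if pdf_has_bookmarks FILE; then echo True; else echo False; fi
```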

- Phil

sevy

Hello,

I use ExifTool to get information about various collections of PDF files (title, author, keywords/subject, number of pages). I recently added -pagemode.
When bookmarks are available, the value is "UseOutlines". When the value is "UseNone" or empty, there are no bookmarks.


StarGeek

Quote from: Phil Harvey on April 22, 2023, 10:30:37 PM
It works for the files you sent, but you should probably do more testing.

A quick check looks good.  Many thanks.

StarGeek

Quote from: sevy on April 23, 2023, 01:04:13 AM
When bookmarks are available, the value is "UseOutlines". When the value is "UseNone" or empty, there are no bookmarks.

A quick check shows that I have PDFs where this isn't the case. Several files have bookmarks (and the above config works on them) but do not have a PageMode tag.

I ran this on my Calibre library, which has 926 PDFs. Checking for files that have a NumOutlines tag from the above config and either lack a PageMode tag or have $PageMode!~/UseOutlines/i returns 435 files. Of those, 60 come back with a NumOutlines of 0 (I'll have to edit the config to return undefined instead of 0). That leaves 375 PDFs that have a usable NumOutlines but do not have a matching PageMode.
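
A tally like this could be reproduced roughly as follows. It is a sketch, not the exact command used for the numbers above: the function name tally_pagemode_mismatches is hypothetical, and it assumes tab-separated input with file name, NumOutlines and PageMode columns, where a missing tag shows up as "-" (which is how exiftool's -T option prints missing tags).

```shell
# Hypothetical helper: reads tab-separated lines "FileName<TAB>NumOutlines<TAB>PageMode"
# on stdin and counts files whose PageMode does not agree with NumOutlines.
tally_pagemode_mismatches() {
    awk -F '\t' '
        $2 == "-"                   { next }          # no NumOutlines tag at all
        tolower($3) ~ /useoutlines/ { next }          # PageMode agrees, not a mismatch
        ($2 + 0) == 0               { zero++; next }  # tag present but count is 0
                                    { usable++ }      # usable count, no matching PageMode
        END { printf "zero=%d usable=%d\n", zero, usable }
    '
}

# Usage, assuming exiftool is installed and DIR holds the PDFs:
#   exiftool -config outlines.config -T -filename -numoutlines -pagemode -r -ext pdf DIR \
#       | tally_pagemode_mismatches
```

Splitting out the zero-count files keeps them from inflating the number of PDFs that really have bookmarks without a matching PageMode.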

I still have to take a closer look at some of these to make sure, but I do not think PageMode gives an accurate result regarding the existence of bookmarks.

sevy

Thanks for pointing that out. I will have to adapt my workflow.