Are there any good tools with GUI for pdf/djvu tagging and searching by tags?

Started by jbionic, July 24, 2023, 08:56:38 AM


jbionic

Hi there,

I ran a search before posting here, but quickly found that although questions about PDF tagging come up quite often, the context is usually a bit different.

I've long been using FastPhotoTagger for jpg/png/mp4 tagging and searching. While the tool suffices for my personal needs, someone asked me if I could recommend an equally good tool for pdf/djvu tagging, and that stumped me, because I'm accustomed to using a simple folder structure to group the few PDF files that I have, while the person asking seems to own a much larger collection.

My own addition to the question: is there any way to assign tags to individual pages of a PDF file rather than to the whole file? I'd expect such page-level tags to be quickly searchable too, with or without the same tool.

Thanks in advance.

Phil Harvey

I don't know about other tools.  ExifTool only writes top-level metadata, not page-level.  It writes PDF files, but doesn't currently have the ability to write DjVu files.

- Phil
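
For anyone wanting to try the file-level route, here is a minimal Python sketch that calls ExifTool to write keywords into a PDF's top-level metadata. It assumes `exiftool` is on your PATH and that repeating `-PDF:Keywords=` once per tag adds each one as a separate keyword; the function names and the example file name are made up for illustration. It does nothing for DjVu, which ExifTool can't write.

```python
import shutil
import subprocess


def build_keyword_command(pdf_path, keywords, exiftool="exiftool"):
    """Return the argument list for writing file-level keywords to a PDF."""
    # -overwrite_original skips the "_original" backup copy; drop it to keep backups.
    args = [exiftool, "-overwrite_original"]
    # Assumption: one -PDF:Keywords= argument per tag, so each is stored as its own keyword.
    args += [f"-PDF:Keywords={kw}" for kw in keywords]
    args.append(str(pdf_path))
    return args


def tag_pdf(pdf_path, keywords):
    """Run ExifTool on one PDF; raises if ExifTool is missing or reports an error."""
    if shutil.which("exiftool") is None:
        raise RuntimeError("exiftool not found on PATH")
    subprocess.run(build_keyword_command(pdf_path, keywords), check=True)
```

Usage would be something like `tag_pdf("paper.pdf", ["metadata", "semantic web"])`. Passing the arguments as a list avoids shell quoting problems with tags that contain spaces.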

StarGeek

I simply dump all my PDFs (and epub, mobi, etc) into Calibre.  While I haven't checked, I'm pretty sure it can write data into PDFs.  I know it has plug-ins to do so for epub/mobi files.  And for actual books, it can download metadata from sites like Amazon and Goodreads.

While normally I want to actually embed metadata in the files, ebooks aren't like images where you can still do duplicate checking regardless of metadata changes.  So I tend to leave the metadata in Calibre.  At least until I can figure out a way to check for duplicates better.  I'm currently playing around with creating a preview sheet of the first 10 pages and then using Czkawka on the command line to detect similar previews.  Then a more detailed manual review for any matches.

I haven't heard of anything to tag only certain pages.  Unless you mean add annotations.  Quite a few PDF readers can add annotations.
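
If page-level tags are really needed, one workaround is to keep them outside the files in a small sidecar index. The Python sketch below is only an illustration of that idea, not a feature of ExifTool, Calibre, or any PDF reader: the JSON layout, the function names, and the file names are all hypothetical. Since nothing is written into the documents, it works the same for PDF and DjVu.

```python
import json
from pathlib import Path


def add_page_tag(index, file_name, page, tag):
    """Record a tag for one page of one file (layout: file -> page -> list of tags)."""
    tags = index.setdefault(file_name, {}).setdefault(str(page), [])
    if tag not in tags:
        tags.append(tag)


def find_pages(index, tag):
    """Return sorted (file_name, page) pairs that carry the given tag."""
    hits = []
    for file_name, pages in index.items():
        for page, tags in pages.items():
            if tag in tags:
                hits.append((file_name, int(page)))
    return sorted(hits)


def save_index(index, path):
    """Write the index as UTF-8 JSON so non-Latin tags stay readable."""
    Path(path).write_text(json.dumps(index, indent=2, ensure_ascii=False), encoding="utf-8")


def load_index(path):
    """Load a saved index, or start with an empty one if the file does not exist."""
    p = Path(path)
    return json.loads(p.read_text(encoding="utf-8")) if p.exists() else {}
```

The obvious downside is that the tags are lost if the files are renamed or moved without updating the index, which is the usual trade-off of sidecar metadata.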
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

jbionic

Thanks, guys. Just to clarify, the question came up during a discussion about papers published at academia.edu.

The site has all publications grouped by topics: https://www.academia.edu/topics

One can even manage to get regular RSS updates (by using sophisticated workarounds) on most recent papers published within topics of interest, such as, for instance,
https://www.academia.edu/Documents/in/Metadata/MostRecent
https://www.academia.edu/Documents/in/Semantic_Web/MostRecent

The person who asked me the question has a collection of articles downloaded as PDFs from different topics of academia.edu. So he kinda thought about how to tag the papers (part of his collection is in DjVu and comes from a different source) in order to make searching easier later.
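
To make the "search later" part concrete, here is a rough Python sketch that builds a tag-to-files lookup from ExifTool's JSON output. It assumes the JSON was produced by something like `exiftool -json -r -ext pdf -PDF:Keywords DIR`, where each record has a `SourceFile` entry and an optional `Keywords` entry that may be a single string or a list. The function name and sample data are made up for illustration.

```python
import json
from collections import defaultdict


def index_by_keyword(exiftool_json):
    """Map each lower-cased keyword to the set of files that carry it."""
    index = defaultdict(set)
    for record in json.loads(exiftool_json):
        value = record.get("Keywords", [])
        # A single keyword may come back as a bare value instead of a list.
        if not isinstance(value, list):
            value = [value]
        for keyword in value:
            key = str(keyword).strip().lower()
            if key:
                index[key].add(record["SourceFile"])
    return index


sample = """[
  {"SourceFile": "a.pdf", "Keywords": ["Metadata", "Semantic Web"]},
  {"SourceFile": "b.pdf", "Keywords": "metadata"},
  {"SourceFile": "c.pdf"}
]"""
lookup = index_by_keyword(sample)
print(sorted(lookup["metadata"]))
```

Lower-casing the keys makes the search case-insensitive, which is usually what you want for hand-typed tags.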

StarGeek

Calibre still sounds like the best option to me.

Calibre will read the metadata of the file upon import and extract info such as authors, publishers, date published, etc.  These show up in the tag browser on the left.  If something is incorrect, you can drag/drop them into the right place.  I think it will read Keywords/Subject for keyword type tags.


The configure button at the bottom of that image is where you can create your own categories to sort and search on.

The one downside is that it imports everything and has its own directory structure based upon author and book title, so you can't keep your own directory structure. And the metadata is held in an index, though there are plugins for epub/mobi to write new/changed metadata to the files. I'm not sure about PDFs as I haven't looked into it, but I would assume that there is something to allow that.

It is free, open source, expandable via plugins, and cross platform (Windows, Mac, Linux).

Metadata editor window


I was about to say the only other thing I would need is full-text indexing of PDFs, but surprise, surprise, that was actually added in ver 6.0!  Though the index will obviously take up extra disk space.

jbionic

Quote from: StarGeek on July 25, 2023, 10:49:03 AM
Calibre still sounds like the best option to me.

Thanks a lot, StarGeek. I've heard of Calibre before; someone mentioned it on a Russian discussion board. Though I never got around to trying it 'coz my personal collection ain't that big, and I'm somehow accustomed to screenshotting and tagging the most interesting bits from the ebooks that I read instead of collecting whole ebooks.

Can Calibre metatag ebooks in other languages (esp. Russian/Spanish) too? My apologies if you are not in the know of such details..   

StarGeek

Quote from: jbionic on July 25, 2023, 02:53:07 PM
Can Calibre metatag ebooks in other languages (esp. Russian/Spanish) too? My apologies if you are not in the know of such details..



;)
(hopefully, that's spelled right, one of the only three Russian phrases I remember from when I attempted, and failed, to learn Russian)

Edit: One neat thing is that the top-level grouping of the various tag entries is dynamic.  So the more entries there are in a group, the more likely it will get split off into its own group.

So if I get enough Q and R tags, those will get separated, as would X, Y, Z, and Я.

I'm pretty sure that there's a setting to control this, in case you want a straight flat listing.