Day 2 of my induction/introduction to this software.
I have a large number of PDFs (>=10,000) whose metadata I would like to search to find whether, and where, a search "string" exists, and then report a) the filename and b) the name(s) of the tag(s) that contain the string, with their content, and/or c) all the tags for the files that match. Is this feasible?
The type of question I am trying to answer is something like:
"Which files refer to 'Ennerdale' in their metadata, which tags do so, and what is their current content/context that relates to 'Ennerdale'?"
On a Mac I would do the following:
exiftool -a -G1 -s -ext pdf . | egrep "Ennerdale|====="
Thank you. I should have stated that I am running on a Windows platform.
I will have to look up what the impact on the code you provide would be to get the same when running Windows.
What is "egrep"?
Any pointers would be most welcome.
Quote from: Athlete on June 06, 2025, 07:21:12 AM
Thank you. I should have stated that I am running on a Windows platform.
I will have to look up what the impact on the code you provide would be to get the same when running Windows.
What is "egrep"?
Any pointers would be most welcome.
I don't know about Windows - egrep is a command that runs on MacOS or Linux and allows you to filter output using a regular expression.
So I am piping the complete metadata from all of the pdf files to the egrep filter, which displays the name of every pdf file (regardless of whether there is a match) followed by the tag name and value for each tag that contains the "Ennerdale" string.
I would add -r to the command if the pdf files were contained within multiple sub-folders.
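For readers on Windows, where egrep is not available, the same filtering can be sketched in Python. This is only a minimal sketch and not part of ExifTool: the names filter_exiftool_lines, scan_folder and the values_only flag are made up for illustration. It assumes the default text output, in which each file starts with a line beginning "========" and each tag line looks like "[Group] TagName : value".

```python
import re
import subprocess


def filter_exiftool_lines(lines, pattern, values_only=False):
    """Keep file-header lines plus tag lines that match `pattern`.

    Mimics `egrep "pattern|====="`. With values_only=True (a hypothetical
    extra), only the text after the first ":" is searched, so matches in
    tag or group names are ignored.
    """
    regex = re.compile(pattern)
    kept = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("========"):
            # Header line that exiftool prints before each file's tags
            kept.append(line)
            continue
        target = line
        if values_only:
            # Group and tag names contain no ":", so the first one
            # separates the tag name from its value
            _, sep, value = line.partition(":")
            target = value if sep else ""
        if regex.search(target):
            kept.append(line)
    return kept


def scan_folder(folder, pattern):
    """Run exiftool with the options from the command above and filter its output."""
    result = subprocess.run(
        ["exiftool", "-a", "-G1", "-s", "-ext", "pdf", folder],
        capture_output=True, text=True, encoding="utf-8", errors="replace",
    )
    return filter_exiftool_lines(result.stdout.splitlines(), pattern)
```

Calling scan_folder(".", "Ennerdale") and printing each returned line should give roughly what the egrep pipeline prints, provided exiftool is on the PATH.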
Thank you. As far as I am aware "egrep" is not a command/function within Windows.
Unfortunately, it's not possible to do exactly as you say. There's no option to print only the tags that match a pattern. You would still end up printing most or all of the tags.
Try this (remove the i from "/Ennerdale/i" if you want a case-sensitive match)
exiftool -G1 -a -s -ext pdf -if "$All:All=~/Ennerdale/i" -PDF:All -XMP:All /path/to/files/
This will print out all PDF and XMP tags in a pdf if exiftool finds "Ennerdale" (case-insensitive) in any of the tags. From there, you would have to search the output to find exactly which tag contains "Ennerdale".
Exiftool's processing of PDF can be slow sometimes, especially when the PDF file is encrypted.
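One way to do that last step (finding exactly which tags contain the string) is to post-process exiftool's JSON output. The following is a rough sketch under some assumptions: read_metadata and matching_tags are hypothetical helper names, the JSON is assumed to be a list of objects with a "SourceFile" key plus "Group:Tag" keys (which is what -j -G1 produces as far as I know), and tags that end up with identical JSON names may be suppressed by exiftool in this mode.

```python
import json
import re
import subprocess


def read_metadata(folder):
    """Return exiftool's JSON output for the PDFs in `folder` as a list of dicts."""
    result = subprocess.run(
        ["exiftool", "-j", "-G1", "-ext", "pdf", folder],
        capture_output=True, text=True, encoding="utf-8", errors="replace",
    )
    # exiftool prints nothing when no file was read, so guard the parse
    return json.loads(result.stdout) if result.stdout.strip() else []


def matching_tags(records, pattern, ignore_case=True):
    """Map each file name to the {tag: value} pairs whose value matches `pattern`."""
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    hits = {}
    for record in records:
        found = {
            tag: value
            for tag, value in record.items()
            # Values may be numbers or lists, so compare their text form
            if tag != "SourceFile" and regex.search(str(value))
        }
        if found:
            hits[record.get("SourceFile", "?")] = found
    return hits
```

With this, matching_tags(read_metadata("/path/to/files/"), "Ennerdale") would list only the files and tags whose values mention the search term, instead of every PDF and XMP tag.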
Thank you. Can you point me in the direction of understanding the $All:All component?
Is it simply that you are searching across all tags?
Can that be refined to search just the PDF ones by using $PDF:All syntax?
Having searched around to try and answer my original post, I came across the term filter as an API option. Does this functionality mimic that of the -if option?
Quote from: Athlete on June 06, 2025, 12:44:31 PM
Thank you. Can you point me in the direction of understanding the $All:All component.
From the docs on the -TAG option (https://exiftool.org/exiftool_pod.html#Tag-operations):
Quote: A special tag name of All may be used to indicate all meta information (ie. -All)
Quote: Is it simply that you are searching across all tags?
Yes. It is doing a Perl RegEx (Regular Expression) against all the tags.
Quote: Can that be refined to search just the PDF ones by using $PDF:All syntax?
It doesn't look like it. It would work with XMP tags (and you want to include XMP tags because that is the more up-to-date PDF standard), but XMP:All is handled a bit differently than PDF:All.
Example:
C:\>exiftool -G1 -a -s -if "$all:all=~/Adobe Acrobat/i" -pdf:all test.pdf
[PDF] PDFVersion : 1.6
[PDF] Linearized : Yes
[PDF] CreateDate : 2011:04:07 20:51:13-05:00
[PDF] Creator : Adobe Acrobat 10.0
[PDF] ModifyDate : 2012:09:06 13:46:21-05:00
[PDF] Producer : Adobe Acrobat 10.0 Paper Capture Plug-in
[PDF] PageCount : 66
C:\>exiftool -G1 -a -s -if "$pdf:all=~/Adobe Acrobat/i" -pdf:all test.pdf
1 files failed condition
Quote: Having searched around to try and answer my original post, I came across the term filter as an API option. Does this functionality mimic that of the -if option?
No. The -api Filter option (https://exiftool.org/ExifTool.html#Filter) applies a bit of Perl code to all the tags in the file. A common use of this would be to replace some characters or words with others in multiple tags. For example, you might want to replace all Carriage Returns/Line Feeds with Line Feeds if the output is on a Linux/Mac system.
There's a way to use it with the -if option (https://exiftool.org/exiftool_pod.html#if-NUM-EXPR) to find files that match things, but it doesn't work with PDF:All.
Quote from: StarGeek on June 06, 2025, 02:01:05 PM
XMP:All is handled a bit differently than PDF:All
I don't understand this statement. In a -if expression, any $GROUP:all variable should evaluate to 1 if any tag exists in that group. From the -p option documentation:
When "All" is used as a tag name, a
value of 1 is returned if any tag exists in the specified group,
or 0 otherwise (unless the "All" group is also specified, in which
case the values of all matching tags are joined).
So you may return the values of all PDF tags as you wanted by using $all:pdf:all.
- Phil
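To make the documented behaviour concrete, here is a toy Python model of it. It is not ExifTool code: expand_all and condition_matches are invented names, and the ", " separator used when joining values is an assumption made for illustration. It only shows why a regular expression against $pdf:all fails (the variable is just 1 or 0) while the same expression against $all:pdf:all can succeed.

```python
import re


def expand_all(tags, variable, separator=", "):
    """Toy model of "$GROUP:all" versus "$all:GROUP:all".

    `tags` maps (group, name) to a value. Without an "all" group the result
    is "1" or "0"; with it, the matching values are joined (separator assumed).
    """
    parts = variable.lower().split(":")
    if parts[-1] != "all":
        raise ValueError("this toy only models variables ending in ':all'")
    groups = parts[:-1]
    join_values = "all" in groups
    wanted = [g for g in groups if g != "all"]
    values = [
        str(value)
        for (group, _name), value in tags.items()
        if all(g == group.lower() for g in wanted)
    ]
    if join_values:
        return separator.join(values)
    return "1" if values else "0"


def condition_matches(tags, variable, pattern):
    """Model of an -if test such as "$pdf:all=~/pattern/i"."""
    return re.search(pattern, expand_all(tags, variable), re.IGNORECASE) is not None
```

With a PDF Creator value of "Adobe Acrobat 10.0", condition_matches(tags, "pdf:all", "Adobe Acrobat") is False because the variable expands to "1", while the same call with "all:pdf:all" is True.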
Quote from: Phil Harvey on June 06, 2025, 03:13:16 PM
Quote from: StarGeek on June 06, 2025, 02:01:05 PM
XMP:All is handled a bit differently than PDF:All
I don't understand this statement. In a -if expression, any $GROUP:all variable should evaluate to 1 if any tag exists in that group.
Sorry, you're correct. I thought I had used something that allowed a comparison against the XMP data in bulk, but I can't figure out what I may have done, so I must have been wrong.
Quote: So you may return the values of all PDF tags as you wanted by using $all:pdf:all.
So this -if option would work?
-if "$All:xmp:all=~/Search Term/i or $all:pdf:all=~/Search Term/i"
Yes.
- Phil
I think I figured out what led me to my error. I was trying a lot of things out and one thing I tried was
-api "filter=s/Adobe Acrobat//i" -if "$XMP# ne $XMP"
That would change the value from Binary data XXXX bytes to Binary data YYYY bytes and register as true. But since there isn't a similar tag for PDF, that was failing. I just forgot that I wasn't using XMP:All.
Ah. Interesting way to do this. Using $all:xmp:all=~/Adobe Acrobat/ is maybe a bit more straightforward, but note that this was a fairly recent feature:
Jan. 23, 2024 - Version 12.74
- Enhanced tag name strings (eg. -if and -p option arguments) to allow values
of multiple matching tags to be concatenated when a group name of "All" is
specified
- Phil
Sorry for the late reply of "Thanks" as I have been offline for a couple of days.
I appreciate the answers and an insight to your "coding" but that was way beyond my level of understanding.
Should you be able to include the ability "to print only the tags that match a pattern" as a future enhancement that would be an asset for me.
This is fascinating - I'll make a note of this technique.
Much better than my first response - which doesn't strictly work because it also searches for matching strings in the tag and group names.
I should have realized earlier, but if you need to search across all files and all tags on a regular basis, you should start using a Digital Asset Management (DAM) program, such as Lightroom (paid), Darktable (https://www.darktable.org/), or DigiKam (https://www.digikam.org/) (both free).
These programs will create a database of all the metadata in the files, making it significantly faster to search through all the metadata. They will still save the data in the file or in an XMP sidecar file, so you're not locked into using only that program.
Exiftool only knows what is in the file it is currently processing and has to read all the files again if you need to do another search. Once you start getting into thousands, tens of thousands, or more files, exiftool is the less optimal solution.
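The database idea can be sketched in a few lines of Python with the standard sqlite3 module. This is only an illustration of the principle, not how any of the DAM programs above work internally: build_index and search_index are hypothetical names, and the records are assumed to be dicts with a "SourceFile" key plus "Group:Tag" keys, as exiftool's -j -G1 output provides.

```python
import sqlite3


def build_index(records, path=":memory:"):
    """Store one row per (file, tag, value) so later searches skip re-reading files."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS tags (file TEXT, tag TEXT, value TEXT)")
    rows = [
        (record.get("SourceFile", "?"), tag, str(value))
        for record in records
        for tag, value in record.items()
        if tag != "SourceFile"
    ]
    conn.executemany("INSERT INTO tags VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def search_index(conn, text):
    """Return (file, tag, value) rows whose value contains `text`.

    SQLite's LIKE is case-insensitive for ASCII letters by default;
    "%" and "_" inside `text` act as wildcards.
    """
    query = "SELECT file, tag, value FROM tags WHERE value LIKE ? ORDER BY file, tag"
    return conn.execute(query, (f"%{text}%",)).fetchall()
```

The files are read once to build the index; each later search is a database query. The trade-off is that the index goes stale when files change and has to be rebuilt or updated.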
Quote from: StarGeek on June 08, 2025, 08:55:28 AM
I should have realized earlier, but if you need to search across all files and all tags on a regular basis, you should start using a Digital Asset Management (DAM) program, such as Lightroom (paid), Darktable (https://www.darktable.org/), or DigiKam (https://www.digikam.org/) (both free).
These programs will create a database of all the metadata in the files, making it significantly faster to search through all the metadata. They will still save the data in the file or in an XMP sidecar file, so you're not locked into using only that program.
Exiftool only knows what is in the file it is currently processing and has to read all the files again if you need to do another search. Once you start getting into thousands, tens of thousands, or more files, exiftool is the less optimal solution.
Possibly so - I'm not familiar with Darktable or Digikam but Lightroom doesn't have access to the same range of metadata as Exiftool.
Out of curiosity I ran a search of 11,000 images on external drives (with tens of thousands of other non-image files on those drives) and it took less than 40 seconds.
And it didn't involve importing into a DAM.
The search was:
-r -FileName -if '$All:All=~/Animal Face/i' -ext jpg -ext raf
So it was a full-text scan of all tags from all 11,000 jpg and raf files.
Not a sub-second response, I agree, but when you want to search for obscure maker-specific metadata, Exiftool is the only way to go.
What kind of drives?
I definitely can't get that kind of speed from my drives, which are mostly 5200 rpm drives.
Imatch (Windows only), created by Mac2 on these forums, uses exiftool on the back end, and you can tell it to index anything that exiftool can read.
Quote from: StarGeek on June 08, 2025, 02:12:03 PM
What kind of drives?
I definitely can't get that kind of speed from my drives, which are mostly 5200 rpm drives.
Imatch (Windows only), created by Mac2 on these forums, uses exiftool on the back end, and you can tell it to index anything that exiftool can read.
These were Samsung T7 SSDs - I've been migrating my spinning disks off to backup duty - and as prices come down I'll be moving more to the faster Thunderbolt SSDs.
Imatch looks interesting - I wonder if there is something similar for the Mac.