Extracting only keywords from lot of source files

Started by Jom, December 18, 2019, 12:17:42 AM

Jom

Hi there.
How can I extract only the keywords from a lot of source files into one text file, without the tag names and filenames? I just want a list of keywords with duplicates removed.

StarGeek

Exiftool can't do this by itself because it processes each file independently of the others.

The basic command to get all the keywords listed one after the other would be (see third paragraph under the -sep option)
exiftool -keywords -sep "\n" -sep "\n" -b /path/to/files

On Linux (and Mac), you could do this to remove duplicates:
exiftool -keywords -sep "\n" -sep "\n" -b /path/to/files | sort | uniq

You can install GNU utilities for Win32 to get access to various unix commands on Windows, including sort and uniq.

PowerShell has cmdlets that can replicate this behavior (see this SuperUser answer), but remember that PS will corrupt binary output when using redirection or piping.

And using some sort of script is always an option.  For example, any time I need to get a unique list of items, I copy the list to the clipboard and run a small AutoIt3 script that grabs the clipboard, removes duplicates, and puts the results back on the clipboard.
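
As one example of the script route, here is a minimal Python sketch that removes duplicates from a one-keyword-per-line list. It is not the AutoIt3 script mentioned above; the function name unique_lines and the sample data are made up for illustration. Unlike sort | uniq, it keeps the keywords in the order they first appear, and the comparison is case-sensitive.

```python
def unique_lines(text: str) -> list[str]:
    """Return the non-empty lines of text, dropping repeats but keeping first-seen order."""
    seen = set()
    unique = []
    for line in text.splitlines():
        keyword = line.strip()
        # Skip blank lines and anything already collected.
        if keyword and keyword not in seen:
            seen.add(keyword)
            unique.append(keyword)
    return unique


# Hypothetical sample standing in for the exiftool output (one keyword per line).
sample = "drill\nbit\nwood\ndrill\ntools\nbit\n"
print("\n".join(unique_lines(sample)))
```

To use it on real data, the sample string would be replaced with the contents of the file the exiftool command wrote.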

Jom

I haven't been able to get your suggestion to work; however, the requirement to remove duplicates can be dropped.
The main thing is that all keywords end up in one file.
The list of keywords doesn't need to be formatted; the keywords should stay in their original form, as they are stored in the metadata.

Phil Harvey

This command will put all Keywords into a .txt file (one keyword per line):

exiftool -keywords -sep "\n" -sep "\n" DIR > out.txt

If you want to also remove duplicates, on Mac/Linux you can do this:

exiftool -keywords -sep "\n" -sep "\n" DIR | sort | uniq > out.txt

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Jom

exiftool -keywords -sep "\n" -sep ", " . > out.txt
Error: File is empty - ./out.txt


But the file is not empty:

======== ./20191106_135630_CanonEOS600D_163066096287_100_0766.jpg
Keywords                        : drill, bit, wood, tool, hole, power, metal, work, macro, industry, home, carpentry, background, diy, steel, set, workshop, woodworking, twist, spiral, screw, repair, woodworker, woodwork, twisted, timber, technology, make, instrument, equipment, drillbit, different, cutting, craft, black, closeup, size, drilling, isolated, white


Why does it say that the file is empty?
How do I get a clean list without the file name and tag name?
Just like this:

drill, bit, wood, tool, hole, power, metal, work, macro, industry, home, carpentry, background, diy, steel, set, workshop, woodworking, twist, spiral, screw, repair, woodworker, woodwork, twisted, timber, technology, make, instrument, equipment, drillbit, different, cutting, craft, black, closeup, size, drilling, isolated, white

Phil Harvey

You can ignore the "File is empty" message.  It was empty when ExifTool ran on the file, because it was being written as ExifTool was running.
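
The ordering behind this message can be reproduced without ExifTool. The following is only an illustrative Python sketch: the function name child_sees_output_file is invented, and a small Python child process stands in for exiftool. It redirects the child's output to out.txt inside a temporary folder and has the child list that folder, showing that the redirect target already exists (still empty) when the child starts.

```python
import os
import subprocess
import sys
import tempfile


def child_sees_output_file() -> bool:
    """Return True if a child process finds its own redirect target in its working folder."""
    with tempfile.TemporaryDirectory() as workdir:
        out_path = os.path.join(workdir, "out.txt")
        # Opening the file for writing creates it (empty) before the child starts,
        # which mirrors what a shell does for "command > out.txt".
        with open(out_path, "w") as out_file:
            subprocess.run(
                [sys.executable, "-c", "import os; print(sorted(os.listdir('.')))"],
                cwd=workdir,
                stdout=out_file,
                check=True,
            )
        # The child's directory listing was written into out.txt itself.
        with open(out_path) as result:
            return "out.txt" in result.read()


print(child_sees_output_file())  # prints True
```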

Oops.  I forgot the -b option.  Add it to get a clean output file (the --ext txt part excludes .txt files, so out.txt itself is no longer processed):

exiftool -keywords -sep "\n" -sep "\n" -b --ext txt DIR > out.txt
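
For anyone who prefers to drive this from a script (for example on Windows, where the output redirection and PowerShell caveats apply), here is a rough Python sketch of the same command. It assumes exiftool is on the PATH; build_command and write_keywords are hypothetical helper names. The arguments are the ones from the command above, passed as a list so that no shell quoting is involved.

```python
import subprocess


def build_command(directory: str) -> list[str]:
    """Build the argument list for the command above (raw strings keep \\n as backslash + n)."""
    return ["exiftool", "-keywords", "-sep", r"\n", "-sep", r"\n",
            "-b", "--ext", "txt", directory]


def write_keywords(directory: str, out_path: str) -> int:
    """Run exiftool (assumed to be installed) and write one keyword per line to out_path."""
    result = subprocess.run(
        build_command(directory),
        capture_output=True,
        text=True,
        encoding="utf-8",
    )
    # Drop blank lines left behind by the separator/terminator pair.
    keywords = [line for line in result.stdout.splitlines() if line.strip()]
    # The output file is only created here, after exiftool has finished.
    with open(out_path, "w", encoding="utf-8") as out_file:
        for keyword in keywords:
            out_file.write(keyword + "\n")
    return len(keywords)
```

A call would look like write_keywords("DIR", "out.txt"). Because the file is written only after exiftool exits, the "File is empty" message should not come up on a first run with this approach.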

- Phil

Jom

This code does what I need:

exiftool -keywords -b -sep ", " -sep "," . > keywords.txt

drill, bit, wood, tool, hole, power, metal, work, macro, industry, home, carpentry, background, diy, steel, set, workshop, woodworking, twist, spiral, screw, repair, woodworker, woodwork, twisted, timber, technology, make, instrument, equipment, drillbit, different, cutting, craft, black, closeup, size, drilling, isolated, white,drill, bit, wood, tools, hole, power, metal, work, macro, industry, home, carpentry, background, diy, steel, set, workshop, woodworking, twist, spiral,

Thanks Phil, thanks StarGeek.

I suppose the last comma can't be removed?


Phil Harvey

The final terminator is specified by the second -sep option.  There is no way around this.
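
Outside of ExifTool, the trailing terminator can be cleaned up afterwards. This is a small Python sketch of one way to do that, not an ExifTool feature; clean_keyword_list is a made-up name. It assumes that no keyword contains a comma itself, since such a keyword would be split in two.

```python
def clean_keyword_list(raw: str) -> str:
    """Normalize a comma-separated keyword dump and drop the trailing comma."""
    # Split on every comma, whether it was an item separator or a terminator.
    keywords = [item.strip() for item in raw.split(",")]
    # The final terminator leaves an empty last item; discard empties.
    keywords = [item for item in keywords if item]
    return ", ".join(keywords)


# Shortened, hypothetical sample in the same shape as the keywords.txt output.
sample = "drill, bit, wood,drill, bit, tools,"
print(clean_keyword_list(sample))  # prints: drill, bit, wood, drill, bit, tools
```

Duplicates are kept on purpose here, matching the requirement stated earlier in the thread.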

- Phil
