Batch copy Description to Keywords with stop words

Started by Rusam, November 13, 2020, 11:21:17 AM


Rusam

Dear Phil and other program developers!

Can the exiftool.exe program on Windows do the following?

My .jpg or .mov files have an IPTC Description:

Quote: This is a beautiful elephant against the background of savvana

Is it possible to form a command line that copies the Description to Keywords? The words should be separated by commas, and blacklisted words should be removed.

That is, the following should appear in the IPTC keywords:

Quote: beautiful, elephant, against, background, savvana

The words "This, is, a, of" are blacklisted and should not be inserted.

This should work in batches for the entire directory.

Thanks in advance!

StarGeek

For images, it's possible, but the blacklist might get really messy.  For video, the problem is figuring out where the keywords would go.  That depends upon what program you're using to view the keywords.

For images, your command would be something like
exiftool -sep "," "-Subject<${Description;s/word1|word2|word3|etc|wordN//g;s/ +/,/g}" /path/to/files/

Separate the blacklisted words with the pipe | character.  This works as-is for alphanumeric words; certain punctuation characters, such as ()[]!+^${}, have special meaning in this context and would need to be escaped.
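As a rough illustration of why punctuation needs care in the blacklist, here is a minimal Python sketch (not exiftool itself) of the two substitutions in the command above. The function name and the sample text are made up for the example, and `re.escape` stands in for escaping the punctuation by hand.

```python
import re

def description_to_keywords(description, blacklist):
    """Mimic s/word1|word2//g;s/ +/,/g, then split on the comma separator."""
    # Join the entries with | and escape regex metacharacters such as . ( )
    pattern = "|".join(re.escape(entry) for entry in blacklist)
    stripped = re.sub(pattern, "", description)
    # Turn each run of spaces into a single comma
    joined = re.sub(r" +", ",", stripped)
    # Split on the separator and drop empty pieces
    return [word for word in joined.split(",") if word]

print(description_to_keywords("Cream cake (large).", [".", "(", ")"]))
# ['Cream', 'cake', 'large']
```

Without the escaping, an entry such as `(` would be read as the start of a group rather than as a literal character.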

I assumed in this that you meant IPTC Core, not IPTC IIM/Legacy tags.  If you meant the latter, then it would be Keywords/Caption-Abstract instead of Subject/Description.

For videos, it would be similar, but it would also depend on what you would be viewing the keywords with.  If you are using Windows Properties->Details->Tags, then it isn't possible, as Windows uses the Microsoft:Category tag, which pretty much no software other than Windows is able to write.
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

Rusam

StarGeek, you are great and this program is great!
Everything works!

To remove the dot, I escaped it with a backslash:
| \. |

It is important for me to remove the quotes: "
Nothing helps yet.

Backup files appear: _original
They are not needed. :)

StarGeek

Quote from: Rusam on November 13, 2020, 01:19:37 PM
It is important for me to remove the quotes: "
Nothing helps yet.
Are the quotes in the original description?  If so, then try adding \" as a blacklist word, e.g. word1|word2|\"|word3.  Quotes are tricky on the Windows command line.

If the quotes are not in the original description, note that most programs display quotes around keywords that contain spaces or similar characters.

Quote: Backup files appear: _original
They are not needed. :)
Add the -overwrite_original option.

Rusam

Quote from: StarGeek on November 13, 2020, 01:31:47 PM

Are the quotes in the original description?  If so, then try adding \" as a blacklist word, e.g.  word1|word2|\"|word3  Quotes are tricky on the Windows command line.
I tried both forward slashes and backslashes. Nothing works.

Quote from: StarGeek on November 13, 2020, 01:31:47 PM
Add the -overwrite_original option.
Thank you!


Rusam


One more thing:

If the blacklist contains "the", the script mangles words that contain it. For example, "clothes" becomes "clos".
Is it possible to make the script match only whole words rather than parts of words?

StarGeek

Ah, yes, sorry.  Try this
"-Subject<${Description;s/\b(?:word1|word2|word3|etc|wordN)\b//g;s/ +/,/g}"

That will only cut on word boundaries.  There will still be some exceptions, such as contractions.  For example, blacklisting John will change John's into 's.
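The effect of the word boundaries can be sketched in Python, whose `re` module treats `\b` and `(?:...)` the same way for this purpose. This is only an illustration of the regex behaviour, not exiftool code, and `remove_words` is a name invented for the example.

```python
import re

def remove_words(text, blacklist, whole_words=True):
    """Delete blacklisted words, optionally only where they stand alone."""
    alternation = "|".join(re.escape(word) for word in blacklist)
    # \b(?:...)\b restricts matches to whole words
    pattern = rf"\b(?:{alternation})\b" if whole_words else alternation
    return re.sub(pattern, "", text)

print(repr(remove_words("the clothes", ["the"], whole_words=False)))  # ' clos'
print(repr(remove_words("the clothes", ["the"])))                     # ' clothes'
print(repr(remove_words("John's hat", ["John"])))                     # "'s hat"
```

The last line shows the contraction exception: the apostrophe counts as a word boundary, so "John" is still removed from "John's".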

Quote from: StarGeek on November 13, 2020, 01:31:47 PM
Are the quotes in the original description?

Rusam



Quote from: StarGeek on November 13, 2020, 01:31:47 PM
Are the quotes in the original description?
Yes, I'm sorry. Here's an example:
Quote: "Scarlet Rose" cream cake.
I need it like this:
Scarlet, Rose, cream, cake
But it turns out like this:
Quote: "Scarlet, Rose", cream, cake.

StarGeek

As I said, double quotes are tricky to get passed through on the command line.  But you can use hex notation (had to look this up).  Add \x22 as a blacklisted word.
C:\>exiftool -g1 -a -s -description -subject y:\!temp\Test4.jpg
---- XMP-dc ----
Description                     : "Scarlet Rose" cream cake.

C:\>exiftool -P -overwrite_original "-subject<${Description;s/\x22|\.//g;s/ +/,/g}" -sep "," y:\!temp\Test4.jpg
    1 image files updated

C:\>exiftool -g1 -a -s -description -subject y:\!temp\Test4.jpg
---- XMP-dc ----
Description                     : "Scarlet Rose" cream cake.
Subject                         : Scarlet, Rose, cream, cake
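For readers who want to see what the `\x22` entry does, here is a small Python sketch of the same two substitutions. Python's `re` module accepts the same hex escape; the variable names are only for illustration.

```python
import re

description = '"Scarlet Rose" cream cake.'

# \x22 is the hex escape for the double-quote character, so the pattern
# needs no literal " that the Windows command line could mangle
cleaned = re.sub(r"\x22|\.", "", description)

# Turn runs of spaces into commas, then split on the comma separator
keywords = re.sub(r" +", ",", cleaned).split(",")
print(keywords)  # ['Scarlet', 'Rose', 'cream', 'cake']
```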

Rusam

Quote from: StarGeek on November 13, 2020, 02:18:08 PM
Add \x22 as a blacklisted word.

Thanks a lot, StarGeek, it works!

I used to use a PHP file to copy words from the description into the keywords, but that involved a lot of manual copy-and-paste work, one file at a time.
exiftool has great possibilities!

StarGeek

There's a problem with your second command.  It makes a single, long keyword instead of multiple keywords (see FAQ #17).

So in the example above, you get one keyword of
Scarlet, Rose, cream, cake
instead of four separate keywords
Scarlet
Rose
cream
cake

For the second command, add -sep ", "

To double-check the tag, try using different characters for the -sep option when listing it:
C:\>exiftool -g1 -a -s -sep "##" -subject y:\!temp\Test4.jpg
---- XMP-dc ----
Subject                         : Scarlet##Rose##cream##cake
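The difference between one long keyword and four separate ones is easy to miss because both display the same way with the default separator. A minimal Python sketch of the idea, using plain lists to stand in for the list-type tag (an analogy, not how exiftool stores the data):

```python
# One list item that happens to contain commas
one_keyword = ["Scarlet, Rose, cream, cake"]
# Four separate list items
four_keywords = ["Scarlet", "Rose", "cream", "cake"]

def show(keywords, sep):
    """Join list items for display, like listing a tag with -sep."""
    return sep.join(keywords)

print(show(one_keyword, ", "))    # Scarlet, Rose, cream, cake
print(show(four_keywords, ", "))  # Scarlet, Rose, cream, cake
print(show(one_keyword, "##"))    # Scarlet, Rose, cream, cake
print(show(four_keywords, "##"))  # Scarlet##Rose##cream##cake
```

Only the unusual separator reveals where the real item boundaries are.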


Rusam

Quote from: StarGeek on November 13, 2020, 02:30:46 PM

For the second command, add -sep ", "


When I added -sep ", ", I got:
Quote: ScarletRosecreamcake

But with the code I wrote above, everything is fine:
Quote: Scarlet, Rose, cream, cake

Rusam

StarGeek, you are a great specialist. I have re-read all threads with the same problem but couldn't find a solution anywhere.

I can explain with pictures. I've simplified everything for the experiment.

Drive D: contains the directory m.

There are two files in it: 1.jpg and meta.csv.



Here is the content of the meta.csv file:
filename, keywords, description, title, content warnings, country, poster timecode
"1.jpg", "Women, Beautiful, Fashion, People, Caucasian Ethnicity", "", "", "", "", ""

I want to transfer the keywords from the CSV file into the IPTC metadata of the images.
I run the command:
exiftool.exe -csv=D:\m\meta.csv --Subject d:\m\

But nothing happens. The error is: No SourceFile 'd:/m/1.jpg' in imported CSV database.



I tried both uppercase and lowercase drive letters, but nothing works. :(

StarGeek

Quote from: Rusam on November 13, 2020, 03:05:12 PM
But nothing happens. The error is: No SourceFile 'd:/m/1.jpg' in imported CSV database.

From the docs on the -csv option
   A special "SourceFile" column specifies the files associated with each row of information

Your example CSV has a "filename" column, not a "SourceFile" column.

Also take note that the "SourceFile" column must match the file path fairly exactly.  If that column contains only file names, with no path, then the current directory must be the one containing the files.

Also, as mentioned in the docs, you're going to need the -sep option to make sure the Keywords are split up into separate keywords as I mentioned above.

Another thing to watch for is that you're mixing groups.  Keywords is an older IPTC IIM/Legacy tag, while Description is a newer XMP tag, part of the IPTC Core.  The Subject tag is the place for keywords in the IPTC Core schema.
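Since the CSV in question comes with a "filename" column, one way to avoid editing it by hand is a small script that renames that column to "SourceFile" and prepends the directory. The Python sketch below is only one possible approach; the function name, the "d:/m" directory and the trimmed-down sample are assumptions for illustration. It also assumes the input has a space after each comma, as in the sample above, which `skipinitialspace=True` handles.

```python
import csv
import io

def add_sourcefile_column(csv_text, directory):
    """Rename the 'filename' column to 'SourceFile' and prefix each file with a path."""
    # skipinitialspace lets '"1.jpg", "Women, ..."' parse as quoted fields
    rows = list(csv.reader(io.StringIO(csv_text), skipinitialspace=True))
    header = ["SourceFile" if name == "filename" else name for name in rows[0]]
    index = header.index("SourceFile")

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for row in rows[1:]:
        if not row:
            continue  # skip blank lines
        row[index] = directory.rstrip("/") + "/" + row[index]
        writer.writerow(row)
    return out.getvalue()

sample = 'filename, keywords\n"1.jpg", "Women, Beautiful"\n'
print(add_sourcefile_column(sample, "d:/m"))
```

The other column names are left untouched here; whether they match the tags you want to write is a separate question.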

Rusam

Thanks a lot, StarGeek.

We'll have to edit large csv files every time.

StarGeek

From the docs on the -csv option
   Special feature: -csv+=CSVFILE may be used to add items to existing lists. This affects only list-type tags

Rusam

Quote from: StarGeek on November 13, 2020, 04:41:19 PM
From the docs on the -csv option
   Special feature: -csv+=CSVFILE may be used to add items to existing lists. This affects only list-type tags

Alas, it doesn't work, and neither does -j.
Either way, the existing keywords are removed and replaced with those from the CSV.

StarGeek

Does your CSV file have Keywords or Subject?  See my previous note about mixing groups.

It works here
C:\>type test.csv
Sourcefile,Subject
y:\!temp\Test4.jpg,"CSV Test1,CSV Test2"

C:\>exiftool -g1 -a -s -Subject y:\!temp\Test4.jpg
---- XMP-dc ----
Subject                         : Original 1, Original 2, Original 3

C:\>exiftool -P -overwrite_original -sep , -csv+=test.csv y:\!temp\Test4.jpg
    1 image files updated

C:\>exiftool -g1 -a -s -Subject y:\!temp\Test4.jpg
---- XMP-dc ----
Subject                         : Original 1, Original 2, Original 3, CSV Test1, CSV Test2
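The difference between replacing a list and adding to it, as seen in the output above, can be modelled with a few lines of Python. This is only a conceptual sketch of the behaviour shown in the transcript, not exiftool's implementation; `import_list_value` and its parameters are invented for the example.

```python
def import_list_value(existing, csv_field, sep=",", append=False):
    """Model importing a list-type tag from a CSV field, replacing or appending."""
    # Split the CSV field into separate items on the -sep string
    new_items = csv_field.split(sep)
    # Plain import replaces the list; the += form adds to what is there
    return existing + new_items if append else new_items

original = ["Original 1", "Original 2", "Original 3"]
print(import_list_value(original, "CSV Test1,CSV Test2"))
# ['CSV Test1', 'CSV Test2']
print(import_list_value(original, "CSV Test1,CSV Test2", append=True))
# ['Original 1', 'Original 2', 'Original 3', 'CSV Test1', 'CSV Test2']
```

Note that the append only helps when the CSV column names the same tag as the existing list, which is the point about mixing groups.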

Rusam

I wrote my code above - it's three lines.

The first two lines add keywords from the Description.

Yes, keywords appear.

But when the third line runs (with -csv+=), the keywords from the Description disappear and keywords from the CSV appear in their place.

Rusam

Excuse me, please. I think I am beginning to understand. I need Keywords, not Subject. I will try changing it now.

Rusam

I redid the commands and everything works!


I thank you so much!

StarGeek

Try
"-Subject<$Description,$Subject"

Though if you're worrying about the order the keywords are in, you might be better off rethinking things and seeing if there's another tag that would work better.  Programs that deal with keywords generally don't care about the order (some will re-order the keywords anyway), and you'll end up doing a lot more work than is really needed.

Rusam

Thank you, StarGeek.
I have been reading the forum. There is a lot of delightful material here!
I discovered NoDups - it's a fairy tale!
I discovered $_= lc
But this does not work at all: -api 'Filter=tr/[A-Z]/[a-z]/. It is not important, though.
But I still couldn't get rid of the doubled commas: keyword1, keyword2, , keyword3, keyword4, keyword5, keyword6
If I try to delete the doubled comma, nothing happens: "-Subject<${Subject;s/, ,//g;s/ +/ /g}"
If I specify a single comma, it removes all commas in Subject: "-Subject<${Subject;s/,//g;s/ +/ /g}"
I have not found a command on the forum to replace two commas with one comma - that would be great!

I read https://exiftool.org/forum/index.php?topic=8495.0 and https://exiftool.org/forum/index.php?topic=8265.0 - It did not help.

StarGeek

Quote from: Rusam on January 02, 2021, 05:15:52 PM
But this does not work at all -api 'Filter=tr/[A-Z]/[a-z]/.

The tr operator doesn't take regex, just a character range. That's a mistake I keep making. So 'Filter=tr/A-Z/a-z/' would work.  But lc() is a better option.

Quote: But I still couldn't get rid of the doubled commas: keyword1, keyword2, , keyword3, keyword4, keyword5, keyword6

Remember, the commas are separators.  They don't actually exist in the tag unless you put them in a keyword.  You most likely have an empty or spaces-only entry.  Try changing the separator with the -sep option to see where the separations are.  Something like -sep '##'

Try
exiftool -Subject-= /path/to/files/
to remove empty 0-length keywords.
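To see why an empty entry shows up as a doubled comma, here is a short Python sketch with a plain list standing in for the keyword list. The sample keywords are shortened from the post above, and the filter at the end is a hypothetical cleanup that also drops spaces-only entries, which goes slightly beyond removing 0-length ones.

```python
keywords = ["keyword1", "keyword2", "", "keyword3"]

# The empty third item is invisible except for the extra separator
print(", ".join(keywords))  # keyword1, keyword2, , keyword3
print("##".join(keywords))  # keyword1##keyword2####keyword3

# Drop empty and spaces-only items from the list itself
cleaned = [k for k in keywords if k.strip()]
print(", ".join(cleaned))   # keyword1, keyword2, keyword3
```

Editing the commas in the joined text cannot fix this, because the commas are added only when the list is displayed.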


Rusam

Greetings, dear developers!

Can the program cope with such a problem?

There are files with different numbers of keywords, and I only need 10 keywords. That is, I need to remove all keywords after the tenth so that only the first ten remain.

StarGeek

Based upon this previous post, try
exiftool -sep ### "-Subject<${Subject;my @a=split /###/,$val;my @b=splice(@a,0,10);$_=join '###',@b}" /path/to/files/
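The logic inside that expression is split, keep the first ten, and join again. A minimal Python sketch of the same steps, assuming the keywords arrive as one string joined with "###"; the function name and the sample keywords are made up for the example.

```python
def keep_first(joined, limit=10, sep="###"):
    """Split on the separator, keep the first `limit` items, and re-join."""
    items = joined.split(sep)
    return sep.join(items[:limit])

# Twelve sample keywords joined the way -sep ### would present them
sample = "###".join(f"kw{i}" for i in range(1, 13))
trimmed = keep_first(sample)
print(len(trimmed.split("###")))  # 10
```

A list shorter than the limit passes through unchanged, since slicing past the end is harmless.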

Rusam

Thank you, StarGeek!
You and Phil Harvey are great masters! And exiftool is a magical tool that can do a lot!