Remove everything before a specific word in a text

Started by johnicepick, February 02, 2024, 07:39:51 AM


johnicepick

Hello everyone,

I would like to know how I can remove the text before a specific word. I know how to delete what comes after a word, but not what comes before it.
In this case I want to delete everything before "Dutch Eredivisie".

The description reads like:

ROTTERDAM, NETHERLANDS - JANUARY 28: Referee Allard Lindhout during the Dutch Eredivisie match between Feyenoord and FC Twente at Stadion Feyenoord on January 28, 2024 in Rotterdam, Netherlands. (Photo by John Doe)

I use this code to get rid of everything after and including "match". That works.
 exiftool -k -overwrite_original "-Description<${Description;s/\match.*?$//;}" c:\FTP-Root\ftp-databak\Description-Test1
After I applied the code, the description looks like this:
ROTTERDAM, NETHERLANDS - JANUARY 28: Referee Allard Lindhout during the Dutch Eredivisie
Which is good. Now I want to get rid of the text before "Dutch Eredivisie".

But I can't figure out what the code would look like if I want to get rid of everything before (and including) the words "during the".

Why this procedure, you might ask?
I have a customer who only delivers the full description, while all other IPTC fields are empty. However, the database requires that words like "Dutch Eredivisie" also appear in one specific IPTC field ("Supp Cat 1", to be precise).
I guess that extracting keywords like "Dutch Eredivisie" would be too complex for ExifTool. That's why I use the workaround of deleting everything before and after, so that only "Dutch Eredivisie" remains.

Seeking your advice,
Thank you very much!

I got the code to delete everything after (and including) that word from here:
https://exiftool.org/forum/index.php?topic=9681.0
Proof again that you should never delete your forum :)

Phil Harvey

"-description<${description;s/.*match/match/}"

The question here is what you want to happen if "match" occurs more than once in the description. The code above will remove everything before the last occurrence. To remove everything before the first occurrence, do this:

"-description<${description;s/.*?match/match/}"

Also, your code to remove everything after "match" had an unnecessary "\", and the terminating "$" is redundant. And you didn't need the terminating semicolon:

"-Description<${Description;s/match.*//}"

- Phil
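
The difference between the greedy `.*` and the lazy `.*?` can be tried outside ExifTool. Below is a minimal Python sketch, assuming Python's `re` module behaves like Perl's regex engine for these simple patterns. The sample text with two occurrences of "match" is made up for illustration and does not come from the thread.

```python
import re

# Hypothetical text containing "match" twice (invented for this illustration).
text = "Warm-up before the match and interviews after the match at the stadium"

# Greedy ".*" consumes as much as possible, so everything
# before the LAST "match" is removed.
greedy = re.sub(r".*match", "match", text, count=1)

# Lazy ".*?" consumes as little as possible, so only the text
# before the FIRST "match" is removed.
lazy = re.sub(r".*?match", "match", text, count=1)

print(greedy)  # match at the stadium
print(lazy)    # match and interviews after the match at the stadium
```

`count=1` mirrors a Perl `s///` without the `/g` flag, which replaces only the first match.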

StarGeek

Quote from: Phil Harvey on February 02, 2024, 08:20:56 AM
also, your code to remove everything after "match" had an unnecessary "\" and the terminating "$" is redundant.

I had a feeling that this came from something I posted, and following the links leads to this post of mine from 5¾ years ago. I have a tendency to be overly careful and to overengineer my regex, because it is so easy to get unexpected results.

johnicepick

Thank you again Phil!

The descriptions are usually standardized and look the same, so encountering a second "match" is unlikely.
On the other hand: photographers / agencies should also deliver all the IPTC fields pre-filled  :o

Normally all the other IPTC fields contain the information used to create the description. These other IPTC fields are also necessary to make sure that the photos end up in the right folder in the database and appear correctly on the stock photo sites.
However, the customer delivers the photos to us with those fields empty and is unwilling to fill them in, "unless you cover us in gold", to quote them literally. To automate this for hundreds of images I have to derive the information from the description, which is basically a kind of reverse engineering. The alternative would have been to copy and paste the information manually for each picture, hundreds of them.
I've already expressed my... opinion about that matter to my colleagues.

And thank you StarGeek,
The link I provided above leads to the post you're referring to. If I had read it properly, I would have removed the unnecessary "\" and the other stuff that Phil also mentioned.
Thank you guys, you saved me so much time and nerves!
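
For reference, the two removals discussed in this thread can be combined so that only the competition name remains. Below is a minimal Python sketch, assuming that every description follows the same "during the ... match" layout as the example in the first post. The variable names are hypothetical.

```python
import re

# The sample description from the first post.
description = (
    "ROTTERDAM, NETHERLANDS - JANUARY 28: Referee Allard Lindhout during the "
    "Dutch Eredivisie match between Feyenoord and FC Twente at Stadion Feyenoord "
    "on January 28, 2024 in Rotterdam, Netherlands. (Photo by John Doe)"
)

# Step 1: remove everything up to and including the first "during the ".
after_prefix = re.sub(r".*?during the ", "", description, count=1)

# Step 2: remove " match" and everything after it.
competition = re.sub(r" match.*", "", after_prefix, count=1)

print(competition)  # Dutch Eredivisie
```

In ExifTool, the same two substitutions can presumably be chained inside one expression, for example `"-SupplementalCategories<${Description;s/.*?during the //;s/ match.*//}"`. This is untested here, and the choice of `SupplementalCategories` as the tag behind "Supp Cat 1" is an assumption. A description that lacks "during the" or "match" would pass through partly or wholly unchanged, so it is worth checking the result on a few files first.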