Search and remove Tags based on Regular Expressions

Started by Kugelblitz, June 18, 2018, 05:26:20 AM


Kugelblitz

Hello,

This is my first post, and I am new to ExifTool.

I have a lot of travel photos in various image formats, mainly JPG, CR2, DNG and PNG, and I tracked the tours with a GPS logger.
I used the software GeoSetter (https://www.geosetter.de) to match the images to the locations from the GPS logger and to save all kinds of location-based metadata for the GPS coordinates, such as country, city and district, in the EXIF/IPTC tags.
It also added the geotags for Flickr that I was using back then (geotagged; geo:lat=xx.xxxxxxxx; geo:lon=xx.xxxxxxxx;).

Now I have about 25,000 geotagged images, and all those geo:lat and geo:lon keywords clutter tag browsing and editing in other programs such as Lightroom, Picasa or Diffractor. There are just too many of them.
The only issue I have is that GeoSetter adds the tags geo:lat, geo:lon and geotagged, which seem redundant because the GPS information is already in the files. I also don't need the Flickr geotags anymore, since I no longer use Flickr.

So I would like to use ExifTool to get rid of the geotags in the keywords section of the EXIF/IPTC metadata.

Something like:
exiftool -keywords-="geotagged" -xmp:subject-=geotagged -xmp:subject-=geo:lat= -xmp:subject-=geo:lon=  d:\pictures


I have more or less figured out the regular expression to match the tags:
geo\:lat\=[0-9]{1,2}\.[0-9]{1,8};geo\:lon\=[0-9]{1,2}\.[0-9]{1,8};geo\:lat\=\-[0-9]{1,2}\.[0-9]{1,8};geo\:lon\=\-[0-9]{1,2}\.[0-9]{1,8};geotagged;

But I have no idea how to write it so that it works in ExifTool as a batch search-and-remove for all files in all subfolders.
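As written, the pattern above only matches if all five pieces appear in sequence, joined by semicolons, whereas each keyword is a separate list item. Here is a minimal Python sketch (an illustration of the regex logic only, not ExifTool syntax) of a pattern that matches one whole keyword at a time; `GEO_KEYWORD` and `is_geo_keyword` are hypothetical names. Note that a longitude can need three integer digits (up to 180), which `[0-9]{1,2}` would miss.

```python
import re

# Hypothetical per-keyword pattern: "geo:lat=" or "geo:lon=" followed by an
# optionally negative decimal number, or the bare keyword "geotagged".
# Latitude needs at most 2 integer digits (|lat| <= 90), longitude up to 3.
GEO_KEYWORD = re.compile(
    r"geo:lat=-?[0-9]{1,2}\.[0-9]{1,8}"
    r"|geo:lon=-?[0-9]{1,3}\.[0-9]{1,8}"
    r"|geotagged"
)


def is_geo_keyword(keyword: str) -> bool:
    """Return True if one whole keyword looks like a Flickr-style geotag."""
    # fullmatch requires the entire keyword to match, not just a prefix.
    return GEO_KEYWORD.fullmatch(keyword) is not None
```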

Thank you for reading this post and thank you for your help.

Phil Harvey

That's a long explanation.

To remove the "geo:lat=...", "geo:lon=..." and "geotagged" from the XMP subject, you could do this:

exiftool -sep xxx "-subject<${subject;s/(^|xxx)(geo:lat=|geo:lon=|geotagged).*?(xxx|$)/xxx/;s/(^xxx|xxx$)//}" DIR

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).
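Why processing the list as one string gets fiddly can be illustrated with a rough Python emulation. This is a sketch under assumptions, not how ExifTool works internally: it assumes `-sep xxx` joins the list items with "xxx" before the expression runs and splits the result on "xxx" when writing, and it mirrors Perl's `s///` without `/g` (only the first match is replaced) with `count=1`. The function name `single_string_filter` is hypothetical.

```python
import re

SEP = "xxx"  # the separator given with -sep
# The first expression from the command above, written as a Python regex.
PATTERN = r"(^|xxx)(geo:lat=|geo:lon=|geotagged).*?(xxx|$)"


def single_string_filter(items, replace_all=False):
    """Emulate the single-string substitution on a list joined with SEP.

    Perl's s/// without /g replaces only the first match (count=1 here);
    replace_all=True emulates adding /g.
    """
    joined = SEP.join(items)
    joined = re.sub(PATTERN, SEP, joined, count=0 if replace_all else 1)
    # Second expression: strip one leading or trailing separator.
    joined = re.sub(r"(^xxx|xxx$)", "", joined, count=1)
    return joined.split(SEP)
```

In this emulation, the sample Subject list shown later in the thread loses only `geo:lat=...` in one pass, and even with `/g` emulated `geo:lon=...` survives, because adjacent items share a single "xxx" that the previous match already consumed.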

StarGeek

This would work as well, would it not (with version 10.87+)?

exiftool -sep xxx "-subject<${subject@;$_=undef if /^(geo:lat=|geo:lon=|geotagged).*?/}" DIR

I'm just trying to get the hang of the @ option.
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

Phil Harvey

Yes. That's a better solution. Handling the edge cases was a bit fiddly when the list was processed as a single string. You can even simplify it a bit further:

exiftool -sep xxx "-subject<${subject@;$_=undef if /^(geo:lat=|geo:lon=|geotagged)/}" DIR

(the ".*?" was not needed)
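Unlike the single-string version, the `@` form runs the expression once per list item, and an item whose `$_` is set to undef is dropped. A minimal Python analogue of that per-item logic, using the Subject values from the sample file shown later in this thread (the names `GEO_PREFIX` and `per_item_filter` are hypothetical):

```python
import re

# Same alternation as the exiftool expression, anchored at the item start.
GEO_PREFIX = re.compile(r"^(geo:lat=|geo:lon=|geotagged)")


def per_item_filter(items):
    """Keep only the list items that do not start with a geo prefix.

    Mirrors ${subject@;$_=undef if /^(...)/}: the test runs on each item
    separately, so there are no separator edge cases to handle.
    """
    return [item for item in items if not GEO_PREFIX.match(item)]
```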

- Phil

Kugelblitz

Hello Phil,
thank you very much for your reply.

I tried the code you provided and it did something: it looks like it rearranged the geo:lat, geo:lon and geotagged tags but did not remove them.

I have attached a sample image to this reply. Maybe you can check it yourself and find the right code faster than if we go back and forth.

Thank you very much, Phil


Phil Harvey

% exiftool CIMG0461.jpg -subject
Subject                         : Deutschland, geo:lat=50.32272323, geo:lon=6.93208420, geotagged, Müllenbach, Rheinland-Pfalz
% exiftool CIMG0461.jpg -sep xxx '-subject<${subject@;$_=undef if /^(geo:lat=|geo:lon=|geotagged)/}'
    1 image files updated
% exiftool CIMG0461.jpg -subject
Subject                         : Deutschland, Müllenbach, Rheinland-Pfalz


(I'm on Mac, so I use single quotes)

StarGeek

Quote from: Kugelblitz on June 18, 2018, 10:57:13 AM
I tried the code you provided and it did something: it looks like it rearranged the geo:lat, geo:lon and geotagged tags but did not remove them.

Phil's command only removes the keywords from the Subject tag. Your file also has them in the Keywords tag, so that tag needs to be added to the command.

exiftool -sep xxx "-subject<${subject@;$_=undef if /^(geo:lat=|geo:lon=|geotagged)/}" "-Keywords<${Keywords@;$_=undef if /^(geo:lat=|geo:lon=|geotagged)/}" DIR

It didn't rearrange the tags; the Keywords tag had them in a different order than the Subject tag. So whatever program you used to look at them ended up reading them differently.
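Since the same expression is repeated for every tag, one option is to generate the arguments. The sketch below (hypothetical helper `build_exiftool_args`, Python standard library only) builds the argument list for any number of list tags. Passing such a list to `subprocess.run` bypasses the command shell, so no special quoting around the `$` signs is needed. `-r` (recurse into subdirectories) and `-overwrite_original` are standard exiftool options, included here as optional flags.

```python
GEO_PATTERN = "^(geo:lat=|geo:lon=|geotagged)"


def build_exiftool_args(tags, directory, pattern=GEO_PATTERN,
                        recurse=False, overwrite_original=False):
    """Build exiftool arguments that drop matching items from each list tag."""
    args = ["exiftool", "-sep", "xxx"]
    for tag in tags:
        # Produces e.g. -subject<${subject@;$_=undef if /^(...)/}
        # ({{ and }} are literal braces in an f-string).
        args.append(f"-{tag}<${{{tag}@;$_=undef if /{pattern}/}}")
    if recurse:
        args.append("-r")  # process subdirectories too
    if overwrite_original:
        args.append("-overwrite_original")  # don't keep *_original backups
    args.append(directory)
    return args
```

For example, `subprocess.run(build_exiftool_args(["subject", "Keywords"], "DIR"), check=True)` would run the command above, assuming exiftool is on the PATH.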

Kugelblitz

Hello Phil,
Hello StarGeek,

Thank you for your replies.

I tried StarGeek's code and it worked perfectly.
Thank you very much.

I decided to remove only the geo:lat and geo:lon tags and keep the geotagged tag, so I can filter all images that have GPS coordinates.

How can I run it on a folder including all subfolders that contain the pictures? With "-r", if I recall correctly?
exiftool -sep xxx "-subject<${subject@;$_=undef if /^(geo:lat=|geo:lon=)/}" "-Keywords<${Keywords@;$_=undef if /^(geo:lat=|geo:lon=)/}" -r DIR

I noticed the original JPGs are still there, named "CIMG0461.jpg_original".
How can those be deleted automatically too?

exiftool -sep xxx "-subject<${subject@;$_=undef if /^(geo:lat=|geo:lon=)/}" "-Keywords<${Keywords@;$_=undef if /^(geo:lat=|geo:lon=)/}" -r -overwrite_original d:\geotagged

Thanks again, that was really helpful of you guys.

Cheers


Phil Harvey

Quote from: Kugelblitz on June 18, 2018, 02:03:53 PM
How can I run it on a folder including all subfolders that contain the pictures? With "-r", if I recall correctly?

Yes.

Quote
I noticed the original JPGs are still there, named "CIMG0461.jpg_original".
How can those be deleted automatically too?

-overwrite_original

- Phil

Kugelblitz

SUCCESS

OK, the process took a little more than two days, but it has finished now, without any crashes or anything like that.

5549 directories scanned
95869 image files updated
98375 image files unchanged
  215 files weren't updated due to errors

I am not sure about the 215 files with errors. I was not watching the whole time and did not log the process, but I copied some of the warning messages when I saw them:

Warning: Invalid PrintIM header - DIR

Warning: [minor] Error reading PreviewImage from file - DIR

Warning: [Minor] IPTC:Keywords exceeds length limit (truncated) - DIR

Warning: [minor] Fixed incorrect URI for xmlns:MicrosoftPhoto - DIR

Warning: [minor] Advanced formatting expression returned undef for 'subject' - DIR

Warning: Bad NikonScanIFD SubDirectory start - DIR

Warning: Can't read MakerNotes data. Ignored. - DIR

Error: [minor] Bad MakerNotes offset for NEFBitDepth - DIR

Warning: [minor] Tag 'subject' not defined - DIR

I guess that one just means there are no geotags in the Subject (tags) metadata of the picture.

Is there a way to log the error messages apart from the "[minor] Tag 'subject' not defined" ones? Or maybe it is easier to log everything and then just delete all lines containing the "Tag 'subject' not defined" warning.

Thank you for your help.
Cheers




Phil Harvey

The files that had errors will be the ones which didn't get their file modification date/time updated to the time when you ran the command.
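Following that idea, here is a minimal Python sketch that lists files whose modification time is older than the moment the run started (the helper name `files_not_modified_since` is hypothetical, and you must supply the start time yourself). Keep in mind that files ExifTool reported as unchanged were not rewritten either, so they will show up too; this narrows the search rather than isolating the 215 failures.

```python
import os


def files_not_modified_since(root, cutoff):
    """Return sorted paths under root whose mtime is earlier than cutoff.

    cutoff is a POSIX timestamp in seconds, e.g. when the exiftool run started.
    """
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:
                stale.append(path)
    return sorted(stale)
```

The cutoff can come from a `datetime` for the run's start, e.g. `start.timestamp()`.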

To see only the errors and suppress all other output, you could add -q -q to the command.

To log the warnings/errors, you may be able to add 2>error_log.txt to the end of the command, depending on what command shell you are using.
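If you capture everything with `2>error_log.txt`, a short Python sketch can then drop the lines you don't care about, as asked above. It assumes one warning or error per line and treats "Tag 'subject' not defined" as noise; the names `IGNORED` and `filter_log` are hypothetical, and more texts can be added to the tuple.

```python
# Texts treated as harmless noise; extend this tuple as needed.
IGNORED = ("Tag 'subject' not defined",)


def filter_log(lines, ignored=IGNORED):
    """Return the log lines that contain none of the ignored texts."""
    return [line for line in lines if not any(text in line for text in ignored)]
```

For example: `with open("error_log.txt", encoding="utf-8", errors="replace") as fh: print("".join(filter_log(fh)), end="")`.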

- Phil

StarGeek

Quote from: Kugelblitz on June 20, 2018, 05:36:03 PM
Warning: [minor] Fixed incorrect URI for xmlns:MicrosoftPhoto - DIR

This can be safely ignored. Microsoft is inconsistent with its own standard, and ExifTool will fix this if the tag is rewritten.

Quote
Warning: [minor] Advanced formatting expression returned undef for 'subject' - DIR
Warning: [minor] Tag 'subject' not defined - DIR

These are probably cases where, as you guessed, there weren't geo keywords to change.

Quote
Warning: [Minor] IPTC:Keywords exceeds length limit (truncated) - DIR

This one might require some fixing. According to the specification, the IPTC:Keywords tag has a limited length, but most software pretty much ignores that limit. In this case the keyword got truncated. Where a long keyword exists, you might notice in one of your digital asset management (DAM) programs that there is now an additional truncated version of it.
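To spot keywords at risk of truncation, a minimal Python check could look like this. It assumes the IPTC IIM limit of 64 bytes per Keywords entry and UTF-8 encoding (the actual byte count depends on the IPTC character set in use); `overlong_keywords` is a hypothetical helper. Non-ASCII characters such as "ü" take two bytes in UTF-8, so a keyword can exceed the limit with fewer than 64 characters.

```python
# IPTC IIM allows at most 64 bytes per Keywords entry (dataset 2:25).
IPTC_KEYWORD_MAX_BYTES = 64


def overlong_keywords(keywords, encoding="utf-8"):
    """Return the keywords whose encoded length exceeds the IPTC limit."""
    return [kw for kw in keywords
            if len(kw.encode(encoding)) > IPTC_KEYWORD_MAX_BYTES]
```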

Quote
Warning: Bad NikonScanIFD SubDirectory start - DIR
Warning: Can't read MakerNotes data. Ignored. - DIR
Error: [minor] Bad MakerNotes offset for NEFBitDepth - DIR

There are cases where the MakerNotes might have been messed up in some way. Picasa, for example, tends to treat Nikon MakerNotes badly, though in my experience it usually just deletes them.