Shell script for Google IPTC metadata how to

Started by Velo145, January 11, 2024, 10:41:04 PM


Velo145

Hi,

I'm new to UNIX command lines in Terminal and new to Exiftool, but after hours with Adobe Creative Suite and metadata software, I wish I had started here first.

I found this post and want to clarify how to use it:

Quote from: Meta Monkey on August 17, 2021, 12:27:09 AM
Just summing up this thread to save others the leg work in future.

A good summary of Google Images and IPTC metadata.
Quick guide to IPTC Photo Metadata and Google Images
https://iptc.org/standards/photo-metadata/quick-guide-to-iptc-photo-metadata-and-google-images/

Use this tool to test your images for that metadata ( Check "Metadata relevant for photos shown as result by search engines" for Option A or Option B ).
Get IPTC Photo Metadata
https://getpmd.iptc.org/getiptcpmd.html

And last but not least, here are the corresponding fields for ExifTool, and a nice little bash script to populate these fields in your images (just replace the ALLCAPS values with your own).

ExifTool Google Image IPTC metadata Tags

Copyright Notice: -xmp-dc:rights
Creator: -xmp-dc:Creator
Credit Line: -xmp-photoshop:Credit
Web Statement of Rights: -xmp-xmprights:WebStatement
Licensor URL: -xmp:LicensorURL

Links to relevant Tag sections:
https://exiftool.org/TagNames/XMP.html#dc
https://exiftool.org/TagNames/XMP.html#photoshop
https://exiftool.org/TagNames/XMP.html#xmpRights
https://exiftool.org/TagNames/XMP.html#Licensor

Bash Script for populating fields

#!/bin/bash
# IPTC Metadata for Google Images
# Run ./exiftool-iptc-google.sh

# Target directory
dir="/PATH/TO/FILES"

# Target filename
filename="*.jpg *.png"

# Tags ( https://exiftool.org/TagNames/XMP.html ) - eval used to execute the multiline string as a shell command.
tags="-xmp-dc:rights='Copyright © STATEMENT' \
-xmp-dc:Creator='CREATOR NAME' \
-xmp-photoshop:Credit='CREDITS' \
-xmp-xmprights:WebStatement='HTTPS://LICENSORURL.COM/WEBSTATEMENT' \
-xmp:LicensorURL='HTTPS://LICENSORURL.COM' \
"
# read -i (pre-filled default value) requires bash 4 or later.
read -p "Enter working directory: " -i "$dir" -e dir
cd "$dir" || exit 1
printf 'Done: Working directory set to %s\n' "$dir"

read -p "Enter filename/s: " -i "$filename" -e filename
# eval: eval [arg ...] Execute arguments as a shell command.
# Combine ARGs into a single string, use the result as input to the shell, and execute the resulting commands.
eval exiftool -overwrite_original $tags $filename

Final food for thought - it might be handy to highlight these "Google Image" fields with their own page or section on the site, I know I spent longer than I'd have liked to tracking them down...

Anyways, thanks to all for the handy pointers here, and as a first post, I must say thanks Phil for the BIGLY TREMENDOUS ExifTool!!! It's frekkin rad! 8)

Do I copy the text of the shell script to my text editor, edit my details, and then save as a .sh file on desktop? What command do I then use in Terminal to execute it? Or do I paste the code into Terminal?

Also, how many images in a directory would be reasonable to run this on at one time? Could I run it on 30,000-40,000 images at one time? Working off a 2012 MacBook Pro Retina, Mojave, 8 GB Ram, about 100GB free space.

 :D Thanks.

StarGeek

Quote from: Velo145 on January 11, 2024, 10:41:04 PM
Do I copy the text of the shell script to my text editor, edit my details, and then save as a .sh file on desktop? What command do I then use in Terminal to execute it? Or do I paste the code into Terminal?

Pretty much yes, though I don't know what the standard location would be for saving scripts. Copy/paste it into a text editor (not a word processor) and save it with whatever name you want.  You would have to make sure that it has executable permission, and you would run it by typing in the name you saved it as.
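The steps described above can be sketched roughly as follows. This is only an illustration: the script name hello-demo.sh and the temporary folder are made up for the example, and the real script from the quoted post would take their place.

```shell
#!/bin/bash
# Minimal sketch: save a script, mark it executable, and run it.
# "hello-demo.sh" and the temporary folder are placeholders for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# 1. Save the script text to a file (normally done in a text editor).
cat > hello-demo.sh <<'EOF'
#!/bin/bash
echo "script ran"
EOF

# 2. Give the file executable permission.
chmod +x hello-demo.sh

# 3. Run it by path; "./" tells the shell to look in the current folder.
output=$(./hello-demo.sh)
echo "$output"
```

The "./" prefix matters because Terminal does not normally search the current folder when a bare command name is typed.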

Quote:
Also, how many images in a directory would be reasonable to run this on at one time? Could I run it on 30,000-40,000 images at one time?

I can't answer anything about the script, but there would be no problem on the exiftool side.  I've run exiftool commands that processed a couple of hundred thousand images in one shot before.

I strongly advise testing it out before running it on a large batch of files.  Use exiftool with the command in FAQ #3 to check the data before and after running it to make sure it's not overwriting important data that is already in the file.
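One way to do that check is sketched below. It assumes the FAQ #3 command is exiftool -a -G1 -s FILE, and the names sample.jpg, test-copy.jpg, before.txt and after.txt are placeholders chosen for the example, not anything from the script.

```shell
#!/bin/bash
# Sketch: snapshot the metadata before and after a change, then compare.
# Assumes the FAQ #3 command is "exiftool -a -G1 -s FILE".
# "sample.jpg" and the report file names are placeholders.

snapshot() {
  # $1 = image file, $2 = text report to write
  exiftool -a -G1 -s "$1" > "$2"
}

compare_reports() {
  # Prints the lines that differ; diff exits with 1 when the reports differ.
  diff "$1" "$2"
}

if command -v exiftool >/dev/null 2>&1 && [ -f sample.jpg ]; then
  cp sample.jpg test-copy.jpg          # work on a copy, never the original
  snapshot test-copy.jpg before.txt
  exiftool -overwrite_original -xmp-dc:Creator='CREATOR NAME' test-copy.jpg
  snapshot test-copy.jpg after.txt
  compare_reports before.txt after.txt || true
fi
```

Lines marked "<" in the output were present only before the change and lines marked ">" only after it, so an unexpected "<" line is a sign that something was overwritten or removed.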
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

Velo145

Thanks for the reply StarGeek.

My only previous experience with a shell script was a zsh script that I saved to my desktop, and I had to use the command chmod +x ./SCRIPT-NAME.zsh to initialize or activate it (pardon the inaccurate description of what the command did). I wasn't sure if all shell scripts that are run from Terminal have to be "activated" before use. Does anyone know about this particular kind of script?

Thanks for pointing me to the commands in FAQ #3. Upon running them, it appears that I have multiple tags for some of the same meta data fields (if I'm saying that correctly). This confirms why I am seeing a [NOTsync1] on multiple of my IPTC fields when I check one of my images through the online IPTC Get Photo Metadata tool.

The majority of the images I am changing metadata on are PNG. The XML IPTC metadata was added from Adobe Illustrator (CS6), then they were exported using a batch saving javascript (that utilized the now defunct Adobe ImageReady) that erased most of the metadata, and then XML IPTC metadata was added in again via Adobe Bridge. Thus, the mess of meta data I currently have.

Here's my plan, which I'm open to suggestions about:

Make my own backups, then remove all meta data from the png files:

exiftool -all= -overwrite_original /to/my/directory
Then, use this script to add back in the IPTC fields for Google to read (copyright, etc.).

Finally, these will likely get converted to Webp files using the Google Webp codec via a shell script or via online (wordpress plug-in) tool. It seems to make the most sense to me to do the metadata work before webp. Any thoughts?

From what I tested on the png files, running the exiftool -all= command leaves some basic exif data in them, including size, compression, etc. They can still be read by web browsers and opened in applications. But I'll test them all along the way.

As a greenhorn, any feedback on my process above is welcome.

Again, thank you Phil for this tool and all who add to it. Indispensable.

StarGeek

Quote from: Velo145 on January 12, 2024, 04:12:10 PM
My only experience with a shell script before was with a zsh script that saved to my desktop and had to use the command chmod +x ./SCRIPT-NAME.zsh...
I wasn't sure if all shell scripts that are run from Terminal have to be "activated" before using.

I'm pretty sure that any script needs +x in order to run under Mac/Linux.

Quote:
The majority of the images I am changing metadata on are PNG. The XML IPTC metadata was added from Adobe Illustrator (CS6), then they were exported using a batch saving javascript (that utilized the now defunct Adobe ImageReady) that erased most of the metadata, and then XML IPTC metadata was added in again via Adobe Bridge. Thus, the mess of meta data I currently have.

I believe you mean XMP. XMP is similar to XML, but not quite the same.

The IPTC standard uses tags in the XMP namespace, though not all of them are actually in the IPTC specific groups, XMP-iptcCore/XMP-iptcExt. Some of the tags in the spec will be in Dublin Core XMP-dc, Photoshop XMP-photoshop, and a few other subgroups.

Note that all of these appear in the XMP group.  There is an older IPTC standard called IPTC IIM.  This group is what exiftool calls the IPTC group.  The reason you are getting the NOTsync1 warning on that website is probably because there have been changes to one, but not the other, so, as it says, they are no longer in sync.

If possible, you would want to avoid using the old IPTC standard and keep with the XMP tags, but sometimes it isn't possible depending upon your software.

Quote:
Make my own backups, then remove all meta data from the png files:

Personally, I'm against removing data in bulk like this. It's better to keep the data that is there and edit it to fix anything that needs fixing. There are command files, called Args files, that exiftool can use to properly sync data between XMP, IPTC, and EXIF as needed.  They can be found on GitHub. You would save the raw text of these files and the instructions for using them are in the files.

Another problem with deleting all data is that it will also delete color space data, like ICC_Profile, and you would end up altering the colors of the image. If you are going to delete all the data, you want to make sure and keep the color data with a command such as
exiftool -All= --ICC_Profile -TagsFromFile @ -ColorSpaceTags /path/to/files/

Quote:
Finally, these will likely get converted to Webp files using the Google Webp codec via a shell script or via online (wordpress plug-in) tool. It seems to make the most sense to me to do the metadata work before webp. Any thoughts?

I would check whether these conversions copy the metadata to the new file. It's pretty common for programs not to do so. I believe ImageMagick does now, though it didn't for quite some time.
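A rough way to make that check is sketched here. The names source.png and converted.webp are placeholders, the conversion itself is assumed to have been done already with whatever tool is in use, and the copy step assumes the installed ExifTool version is able to write WebP files.

```shell
#!/bin/bash
# Sketch: check whether a converted file kept its XMP, and copy it over if not.
# "source.png" and "converted.webp" are placeholder names. Writing to WebP is
# assumed to be supported by the installed ExifTool version.

has_xmp() {
  # Succeeds if exiftool reports at least one XMP tag in the file.
  [ -n "$(exiftool -s -XMP:all "$1" 2>/dev/null)" ]
}

if command -v exiftool >/dev/null 2>&1 && [ -f source.png ] && [ -f converted.webp ]; then
  if has_xmp converted.webp; then
    echo "XMP survived the conversion"
  else
    echo "XMP missing; copying it from the source file"
    exiftool -overwrite_original -tagsfromfile source.png -XMP:all converted.webp
  fi
fi
```

Trying this on one or two files first shows quickly whether the converter can be trusted or whether a copy step is needed after every conversion.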

Quote:
From what I tested on the png files, running the exiftool -all= command leaves some basic exif data for them, including size, compression, etc. They are still being read by web browser and can be opened in applications.

These are probably not EXIF, IPTC, or XMP tags.  Run the FAQ #3 command, but use -G instead of -G1.  You'll find that the remaining data will all be part of the File or System group, meaning they are properties of the file, such as image width/height and file size, and are not data that is embedded in the file.  Two exceptions to this would be Adobe APP14 tags (which can affect the colors if removed), and color space tags if you keep them as per the above command.

Phil Harvey

Quote from: StarGeek on January 12, 2024, 05:08:24 PM
I'm pretty sure that any script needs +x in order to run under Mac/Linux.

Unless you run it like this:  zsh script
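A small sketch of the difference, using bash rather than zsh since the script in this thread is a bash script. The file name no-exec-demo.sh is made up for the example.

```shell
#!/bin/bash
# Sketch: a script without +x can still run if it is handed to an interpreter.
# "no-exec-demo.sh" is a placeholder name.
demo_dir=$(mktemp -d)
printf '%s\n' 'echo "ran via interpreter"' > "$demo_dir/no-exec-demo.sh"
chmod -x "$demo_dir/no-exec-demo.sh"

# Calling the file directly fails because the executable bit is missing.
if "$demo_dir/no-exec-demo.sh" 2>/dev/null; then
  direct_result="ran"
else
  direct_result="refused"
fi

# Passing the file to the shell as an argument works without +x.
interp_output=$(bash "$demo_dir/no-exec-demo.sh")

echo "direct call: $direct_result"
echo "via bash:    $interp_output"
```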

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Velo145

Yes, I meant XMP, not XML - thanks for the catch StarGeek.

The reason I like the idea of deleting all exif data possible is for file size - these are for the web, so user experience and Google rankings are being considered. Saving 20k per image may seem trivial, but the website products each have 200+ images per product, as they are variable products. So every byte matters.
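For scale, 20 kB across 200 images comes to roughly 4 MB per product. The actual saving can be measured rather than estimated with something like the sketch below; the function name total_png_bytes and the folder names in the usage comment are made up for the example.

```shell
#!/bin/bash
# Sketch: total size in bytes of the PNG files in a folder, so the saving
# from stripping metadata can be measured rather than estimated.
# The function name and the folder names below are placeholders.

total_png_bytes() {
  # $1 = folder; sums the byte counts of every *.png directly inside it.
  local total=0 size f
  for f in "$1"/*.png; do
    [ -f "$f" ] || continue
    size=$(wc -c < "$f")
    total=$((total + size))
  done
  echo "$total"
}

# Usage (placeholder folders):
#   before=$(total_png_bytes originals)
#   after=$(total_png_bytes stripped)
#   echo "saved $((before - after)) bytes"
```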

Of course I don't want to delete anything that's necessary, including color profiles.

The PNGs are 8-bit and saved as indexed color. I read somewhere that indexed-color PNGs cannot hold a color profile, but when I look at these with Adobe's XMP metadata panel in Bridge, Illustrator, or Photoshop, it shows that the color profile sRGB IEC61966-2.1 is connected with the image. The Preview app on Mac also shows the sRGB IEC61966-2.1 color profile that I originally set in Illustrator. I still see that in the Adobe and Preview apps after deleting "all" exif data from the images with Exiftool as well:

exiftool -all=
After running the above command, I don't see the color profile in the Exiftool data (below) when running exiftool -a -G1 -s (or just -G), unless it is "hidden" inside the third-to-last line: (Binary data 384 bytes, use -b option to extract).

I will try the commands you suggested that exclude color profiles to see if anything looks different.

[ExifTool]      ExifToolVersion                 : 12.70
[System]        FileName                        : Name-of-my-file.png
[System]        Directory                       : .
[System]        FileSize                        : 41 kB
[System]        FileModifyDate                  : 2024:01:12 09:29:59-10:00
[System]        FileAccessDate                  : 2024:01:12 16:56:13-10:00
[System]        FileInodeChangeDate             : 2024:01:12 16:56:11-10:00
[System]        FilePermissions                 : -rw-r--r--
[File]          FileType                        : PNG
[File]          FileTypeExtension               : png
[File]          MIMEType                        : image/png
[PNG]           ImageWidth                      : 1200
[PNG]           ImageHeight                     : 1200
[PNG]           BitDepth                        : 8
[PNG]           ColorType                       : Palette
[PNG]           Compression                     : Deflate/Inflate
[PNG]           Filter                          : Adaptive
[PNG]           Interlace                       : Noninterlaced
[PNG]           Palette                         : (Binary data 384 bytes, use -b option to extract)
[Composite]     ImageSize                       : 1200x1200
[Composite]     Megapixels                      : 1.4

Thanks for the assistance!

Velo145

In lieu of the shell script referred to in this post, I have put together a command to add in the IPTC data that Google (Image) Search reads. I'm hoping someone can look and see if I have ordered the commands correctly.

I want to delete all exif data, except xmp-photoshop:DateCreated, and then add back in the Google Search-read IPTC fields, and overwrite the originals.

exiftool -all= --xmp-photoshop:DateCreated -overwrite_original \
-xmp-dc:rights='Copyright © STATEMENT' \
-xmp-dc:Creator='CREATOR NAME' \
-xmp-photoshop:Credit='CREDITS' \
-xmp-xmprights:WebStatement='HTTPS://LICENSORURL.COM/LICENSE-INFO' \
-xmp-plus:LicensorName='LICENSE HOLDER NAME' \
-xmp-plus:LicensorURL='HTTPS://LICENSORURL.COM' \
-xmp-iptcExt:DigitalSourceType='DIGITAL OR AI ART DESIGNATION' \
-xmp-dc:Description='DESCRIPTION OF IMAGE' \
MYIMAGE.JPG

I then tried to adapt the above to work recursively within a directory of images, while only making changes to jpg and png files in those directories. IMPORTANT: I don't want it to edit metadata in other images (eps, svg, etc.) of that directory/sub-directories:

exiftool -r -all= --xmp-photoshop:DateCreated -ext jpg -ext png -overwrite_original \
-xmp-dc:rights='Copyright © STATEMENT' \
-xmp-dc:Creator='CREATOR NAME' \
-xmp-photoshop:Credit='CREDITS' \
-xmp-xmprights:WebStatement='HTTPS://LICENSORURL.COM/LICENSE-INFO' \
-xmp-plus:LicensorName='License Holder Name' \
-xmp-plus:LicensorURL='HTTPS://LICENSORURL.COM' \
-xmp-iptcExt:DigitalSourceType='DIGITAL OR AI ART DESIGNATION' \
-xmp-dc:Description='DESCRIPTION OF IMAGE' \
/TO/MY/DIRECTORY

Any changes needed?
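Before running a recursive command like this on a large tree, it can help to preview which files are likely to be touched. The sketch below uses find rather than exiftool, so it only approximates what -r -ext jpg -ext png selects; the function name list_targets is made up for the example.

```shell
#!/bin/bash
# Sketch: list the files a recursive jpg/png run would be expected to touch.
# This uses find, not exiftool, so it only approximates the -r -ext selection.
# "list_targets" is a placeholder name.

list_targets() {
  # $1 = top-level folder; prints matching files, one per line.
  find "$1" -type f \( -iname '*.jpg' -o -iname '*.png' \) | sort
}

# Usage (placeholder path):
#   list_targets /TO/MY/DIRECTORY | wc -l
```

If the count or the list looks wrong, for instance if eps or svg files show up, the selection can be corrected before any metadata is changed.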

To note, the shell script in this post leaves off one field that Google Image Search reads and uses to provide a link to the licensor of the image: xmp-plus:LicensorName. The IPTC Photo Metadata Standard webpage appears to have a typo on the "help" exiftool tag, but I believe what I have in the last sentence is correct.

If using xmp-iptcExt:DigitalSourceType for Google Image Search, it should contain an IPTC NewsCode to specify AI art type. See here.

Phil Harvey

Quote from: Velo145 on January 13, 2024, 05:15:38 PM
exiftool -all= --xmp-photoshop:DateCreated [...]

From the --TAG option documentation:

            But note that this will not exclude individual tags
            from a group delete (unless a family 2 group is specified, see
            note 4 below).  Instead, individual tags may be recovered using
            the -tagsFromFile option (eg. "-all= -tagsfromfile @ -artist").


Other than that, I can't see anything wrong with what you've done.

- Phil

Velo145

Thanks Phil. I did just see that the date created tag isn't being recovered from the -all group delete.

I'm about two days in on exiftool... could you give me a little more guidance on how that needs to be written?

Would that first line be written as:
exiftool -all= tagsfromfile @ xmp-photoshop -overwrite_original \
Thanks.

Velo145

Whoops, did that too fast. Would it be:
exiftool -all= -tagsfromfile @ -xmp-photoshop -overwrite_original \

Phil Harvey

You wanted to recover xmp-photoshop:DateCreated

exiftool -all= -tagsfromfile @ -xmp-photoshop:DateCreated [...]
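Put together with the earlier recursive command, the whole thing might look like the sketch below. It is an assumption about how the pieces combine, not a command taken from the thread: the arguments are held in a bash array so they can be printed and reviewed first, the ALLCAPS values and the path are the placeholders used above, and the DigitalSourceType and Description fields are left out for brevity.

```shell
#!/bin/bash
# Sketch: the full command assembled as a bash array, so each argument
# stays intact and the list can be printed for review before running.
# ALLCAPS values and the target path are placeholders from the thread.

target="/TO/MY/DIRECTORY"

args=(
  -r -ext jpg -ext png -overwrite_original
  -all=                                        # delete everything...
  -tagsfromfile @ -xmp-photoshop:DateCreated   # ...then restore this one tag
  "-xmp-dc:rights=Copyright © STATEMENT"
  "-xmp-dc:Creator=CREATOR NAME"
  "-xmp-photoshop:Credit=CREDITS"
  "-xmp-xmprights:WebStatement=HTTPS://LICENSORURL.COM/LICENSE-INFO"
  "-xmp-plus:LicensorName=LICENSE HOLDER NAME"
  "-xmp-plus:LicensorURL=HTTPS://LICENSORURL.COM"
)

# Review the arguments first, one per line.
printf '%s\n' "${args[@]}"

# Run only when exiftool is installed and the target folder exists.
if command -v exiftool >/dev/null 2>&1 && [ -d "$target" ]; then
  exiftool "${args[@]}" "$target"
fi
```

Quoting each assignment as a single array element keeps values with spaces in one piece, which avoids the need for eval.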

- Phil

Velo145

OK, got it. I haven't worked with code before, so many of the references are like reading French for me (I took Spanish).

Thanks Phil. I am so grateful for this tool.