Difficulty in writing UTF8 text into Caption-Abstract tag

Started by Sawfish, November 02, 2020, 05:01:04 PM

Sawfish

Hi, I am new to image files and their metadata.

I have an assignment that requires me to read the existing content of four tags; some of them exist in only some files, others in all files. I then massage this text using sed, replacing certain standard character combinations (e.g., c", D", etc.) with the proper Croatian character that belongs in their place. The text was created by a US citizen, a genealogist, who used Expression to edit the tags, inserting the paired letters until such time as he could replace them with UTF-8 Croatian letters.
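As a rough illustration of the kind of mapping described above, here is a minimal sketch in POSIX sh (not the C shell used in the actual script). The real character_maps.sed is not shown in this thread, so the function name map_croatian and the rule list are assumptions: some pairs are inferred from the sample caption further down (c' for ć, C" for Č, s" for š, D' for Đ), the rest are guessed by analogy.

```shell
#!/bin/sh
# Hypothetical stand-in for character_maps.sed; the real rule list is not
# shown in this thread. Pairs are inferred from the sample caption
# (c' -> ć, C" -> Č, s"/S" -> š/Š, D' -> Đ) or assumed by analogy.
map_croatian() {
    sed -e "s/c'/ć/g" -e "s/C'/Ć/g" \
        -e 's/c"/č/g' -e 's/C"/Č/g' \
        -e 's/s"/š/g' -e 's/S"/Š/g' \
        -e 's/z"/ž/g' -e 's/Z"/Ž/g' \
        -e "s/d'/đ/g" -e "s/D'/Đ/g"
}

# Feed one line of placeholder text through the filter.
printf '%s\n' "Stane D'uras\" Klaic' of C\"ilipi; VLAS\"KO" | map_croatian
# prints: Stane Đuraš Klaić of Čilipi; VLAŠKO
```

A plain substitution list like this will also rewrite a genuine apostrophe or quotation mark that happens to follow one of these letters, so the output is worth spot-checking.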

There are ~6000 .tif and .jpg files that need this treatment, and it works fine for all tags except Caption-Abstract. This is done in a C shell script. I realize that exiftool has the capabilities to do all/most of this, but I needed it done soon, and I know C shell scripting, so...

Here are the pertinent blocks. First, where I read the existing text from each tag, feed it to sed, and store it in a variable:

# READ DESCRIPTION, IMAGEDESCRIPTION, CAPTION-ABSTRACT, HEADLINE

        set c_a_captions = "`exiftool -T -charset iptc=UTF8 -Caption-Abstract '${INFILE}' | sed -f ${TOOLDIR}/character_maps.sed`"  # CREATE MAPPED CAPTION  # v1.1 version - does not remove newlines.

        set descr_captions = "`exiftool -T -charset iptc=UTF8 -Description '${INFILE}' | sed -f ${TOOLDIR}/character_maps.sed`"  # CREATE MAPPED CAPTION  # v1.1 version - does not remove newlines.

        set image_descr_captions = "`exiftool -T -charset exif=UTF8 -ImageDescription '${INFILE}' | sed -f ${TOOLDIR}/character_maps.sed`"  # CREATE MAPPED CAPTION  # v1.1 version - does not remove newlines.

        set headline_captions = "`exiftool -T -charset iptc=UTF8 -Headline '${INFILE}' | sed -f ${TOOLDIR}/character_maps.sed`"  # CREATE MAPPED CAPTION  # v1.1 version - does not remove newlines.

Now where I write the mapped text back into the tags:

# WRITE DESCRIPTION, IMAGEDESCRIPTION, CAPTION-ABSTRACT, HEADLINE

        exiftool -m -q -q -overwrite_original -charset iptc=UTF8 -Caption-Abstract="${c_a_captions}" "${INFILE}" | tee -a  ./${OUTREPORT} # IMPORT MAPPED CAPTION TEXT; SUPPRESSES AND FIXES MINOR WARNINGS, IF POSSIBLE.

        exiftool -m -q -q -overwrite_original -charset iptc=UTF8 -Description="${c_a_captions}" "${INFILE}" | tee -a  ./${OUTREPORT} # IMPORT MAPPED CAPTION TEXT; SUPPRESSES AND FIXES MINOR WARNINGS, IF POSSIBLE.

        exiftool -m -q -q -overwrite_original -charset exif=UTF8 -ImageDescription="${image_descr_captions}" "${INFILE}" | tee -a  ./${OUTREPORT} # IMPORT MAPPED CAPTION TEXT; SUPPRESSES AND FIXES MINOR WARNINGS, IF POSSIBLE.

        exiftool -m -q -q -overwrite_original -charset iptc=UTF8 -Headline="${headline_captions}" "${INFILE}" | tee -a  ./${OUTREPORT} # IMPORT MAPPED CAPTION TEXT; SUPPRESSES AND FIXES MINOR WARNINGS, IF POSSIBLE.

I realize that this could be done much more efficiently, but I'm retired and this is the first time I've coded in 7+ years. I just need to get it done as best I can.

The behavior is that all of the text copies back into the tags correctly EXCEPT for Caption-Abstract, which is written in something other than UTF-8.

For example, here is the text just before I try to write it into Caption-Abstract:

Caption-Abstract                : Lucija Klaic' Sekulic' (*1863 Popovic'i; father: Vlaho VLAS"KO; mother: Kate of Luka Vezilic' of C"ilipi; husband: Tomo Sekulic' of Grab, Hercegovina); unknown (but she married an Austrian soldier); Katica Klaic' (father: Pero; not married); Stane D'uras" Klaic' (standing; *1906 Popovic'i; father: Antun; husband: Vlaho); Rade Radic' Klaic' (sitting; *1886 Popovic'i; father: Niko; husband: Pero). Photo from Vlaho Klaic' of Popovic'i. Scanned April 2005.

However, this is how it turns out after saving the file:

Caption-Abstract                : Lucija Klaić Sekulić (*1863 Popovići; father: Vlaho VLAÅ KO; mother: Kate of Luka Vezilić of ÄŒilipi; husband: Tomo Sekulić of Grab, Hercegovina); unknown (but she married an Austrian soldier); Katica Klaić (father: Pero; not married); Stane Ä<U+0090>uraÅ¡ Klaić (standing; *1906 Popovići; father: Antun; husband: Vlaho); Rade Radić Klaić (sitting; *1886 Popovići; father: Niko; husband: Pero). Photo from Vlaho Klaić of Popovići. Scanned April 2005.

The same text is put into Description, and it looks correct:

Description                     : Lucija Klaić Sekulić (*1863 Popovići; father: Vlaho VLAŠKO; mother: Kate of Luka Vezilić of Čilipi; husband: Tomo Sekulić of Grab, Hercegovina); unknown (but she married an Austrian soldier); Katica Klaić (father: Pero; not married); Stane Đuraš Klaić (standing; *1906 Popovići; father: Antun; husband: Vlaho); Rade Radić Klaić (sitting; *1886 Popovići; father: Niko; husband: Pero). Photo from Vlaho Klaić of Popovići. Scanned April 2005.

I need this to be in Caption-Abstract, as well, and have beaten on this for quite a while.

Any help/suggestions would be greatly appreciated!
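The garbled Caption-Abstract output above has the pattern that appears when UTF-8 bytes are interpreted as Windows-1252 (Windows Latin1) and then converted to UTF-8 again: Č (bytes C4 8C) shows up as ÄŒ, and š (bytes C5 A1) as Å¡. The POSIX sh sketch below reproduces that symptom with iconv. It assumes an iconv that recognises the name CP1252 and a UTF-8 terminal; the helper name misread_as_cp1252 is made up for illustration. It only demonstrates the symptom and makes no claim about what exiftool does internally.

```shell
#!/bin/sh
# Illustration only: treat UTF-8 input bytes as if they were Windows-1252
# and convert them to UTF-8 a second time (classic double encoding).
# misread_as_cp1252 is a hypothetical helper name.
misread_as_cp1252() {
    iconv -f CP1252 -t UTF-8
}

printf '%s\n' 'Čilipi' | misread_as_cp1252   # prints: ÄŒilipi
printf '%s\n' 'uraš'   | misread_as_cp1252   # prints: uraÅ¡
```

Seeing this pattern usually means the stored bytes are valid UTF-8 and only the declared or assumed character set is wrong.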



StarGeek

I'm guessing this isn't Windows?

Take a look at the IPTC subsection of FAQ #10. Maybe set -CodedCharacterSet=UTF8?

Also, if the Description is correct, then you could bulk copy the Description into Caption-Abstract
exiftool '-Caption-Abstract<Description' /path/to/files/

Sawfish

Right. It is CentOS.

I cannot assume that what my client sees on his Expression, Bridge, or Photoshop screen, in the "Description" window, actually comes from the Description tag. So, like shooting in the dark, I must take each of the tags I have seen (using exiftool -s) that hold essentially the same text and map every one of them, to be sure that he will see the mapped text.

For example, he has used Bridge to see a "Description:" field, which he will later edit (or not), and it's clear that Bridge is showing the Caption-Abstract tag in that edit window. I use gthumb and see the mapped text there, so I know that in gthumb I'm not viewing Caption-Abstract but one of the other tags, where the text is mapped and fine.

The provenance of the files is all over the place--some scanned, some done in Europe and sent to him, with who knows what metadata in there. Right now, with my level of knowledge (practically nil), I'm just stabbing blindly.

Sawfish

StarGeek, thanks for the idea about using -CodedCharacterSet=UTF8. I simply added it, so that now I have:

exiftool -m -q -q -overwrite_original -CodedCharacterSet=UTF8 -charset iptc=UTF8 -Caption-Abstract="${c_a_captions}" "${INFILE}" | tee -a  ./${OUTREPORT} # IMPORT MAPPED CAPTION TEXT; SUPPRESSES AND FIXES MINOR WARNINGS, IF POSSIBLE.

The content of  the Caption-Abstract tag is now as it should be:

Caption-Abstract                : Lucija Klaić Sekulić (*1863 Popovići; father: Vlaho VLAŠKO; mother: Kate of Luka Vezilić of Čilipi; husband: Tomo Sekulić of Grab, Hercegovina); unknown (but she married an Austrian soldier); Katica Klaić (father: Pero; not married); Stane Đuraš Klaić (standing; *1906 Popovići; father: Antun; husband: Vlaho); Rade Radić Klaić (sitting; *1886 Popovići; father: Niko; husband: Pero). Photo from Vlaho Klaić of Popovići. Scanned April 2005.

So this raises the question: What exactly does CodedCharacterSet do? The documentation seems sparse.

But sure enough, it seems to have been the magic bullet...

Thanks again!
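As far as the IPTC section of ExifTool FAQ #10 (the one StarGeek pointed to) describes it, IPTC:CodedCharacterSet is the field that declares how the IPTC strings in the file are encoded. When it is set to UTF8, the strings are treated as UTF-8. When it is missing, ExifTool assumes Windows Latin1 unless told otherwise with -charset iptc=..., and other software makes its own assumption. Below is a minimal POSIX sh sketch for checking a file after writing. The helper name show_iptc_charset and the file name are placeholders, and it assumes exiftool is on the PATH.

```shell
#!/bin/sh
# Sketch: report whether a file declares its IPTC text as UTF-8.
# show_iptc_charset is a hypothetical helper; photo.jpg is a placeholder.
show_iptc_charset() {
    if ! command -v exiftool >/dev/null 2>&1; then
        echo "exiftool not found" >&2
        return 127
    fi
    # -s3 prints the bare value; nothing is printed if the tag is absent.
    value=$(exiftool -s3 -IPTC:CodedCharacterSet "$1")
    if [ -n "$value" ]; then
        echo "$1: CodedCharacterSet=$value"
    else
        echo "$1: CodedCharacterSet not set"
    fi
}

# Usage (placeholder file name):
#   show_iptc_charset photo.jpg
```

Running a check like this before and after the write makes it easy to see whether the tag was actually added to a given file.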

Phil Harvey

...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).