Understanding the -charset option

Started by 3ndymion, June 12, 2017, 11:05:06 AM

3ndymion

Hi.  I read about the -charset option in FAQ #10, & in some other places in the documentation & the forums.  I got a little confused when I saw mention of the -codedcharacterset option.  I've done some testing, & it looks like the -charset option is what I need, but I just wanted to verify my understanding of it before I implement it.

Here's my situation:

I want to write Japanese & other non-ASCII characters into the metadata.  It seems that it works automatically for EXIF & XMP, but not for IPTC.  So, I...

  • want to write metadata using UTF8 as the internal character set.
  • am mainly concerned about writing IPTC.
  • am only writing brand new tags.

I've seen that writing tags with Japanese characters without the -charset iptc=utf8 option only writes question marks into the metadata.  I then tried converting that into UTF8 with the example from FAQ #10:
exiftool -tagsfromfile @ -iptc:all -codedcharacterset=utf8 a.jpg

That only had horrible effects & didn't seem to work right.  But, when I DO use the -charset iptc=utf8 option to write the new tags, all the different image viewers now see the Japanese text in IPTC.
I've seen somewhere that the -codedcharacterset option puts some kind of charset identifying byte in front of the tag in the metadata.

My Questions:

  • Does the -charset option put that identifying byte in front of the tags too???
  • Would I need the -codedcharacterset option at all for writing brand new tags???  (I assume no.)
  • It seems that non-ASCII characters are written into EXIF correctly, & automatically.  Is the -charset iptc=utf8 option all that I need, or would I need -charset exif=utf8 as well, just in case???

Thanks.

Phil Harvey

Quote from: 3ndymion on June 12, 2017, 11:05:06 AM
I got a little confused when I saw mention of the -codedcharacterset option.

That's not an option.  It is an IPTC metadata tag, and should be set in the file if you are writing text containing special characters to IPTC.

Quote: I've seen somewhere that the -codedcharacterset option puts some kind of charset identifying byte in front of the tag in the metadata.

Exactly.
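
For anyone curious what that identifier looks like: as far as I know, IPTC stores it once as its own dataset (CodedCharacterSet, record 1, dataset 90) rather than in front of every tag, and its value is an ISO 2022 escape sequence.  The sequence for UTF-8 is ESC % G, which ExifTool displays as "UTF8".  A minimal sketch that prints those three bytes in hex (the variable name is only for illustration):

```shell
# Emit ESC % G (the ISO 2022 sequence that means UTF-8) and dump it as hex.
# printf: \033 is ESC, %% is a literal percent sign.
utf8_marker_hex=$(printf '\033%%G' | od -An -tx1 | tr -d ' \n')
echo "$utf8_marker_hex"
```

This only illustrates the stored value; ExifTool writes it for you when you set -codedcharacterset=utf8.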

Quote: 1. Does the -charset option put that identifying byte in front of the tags too???

No.  The -charset option(s) only affect how ExifTool interprets string data.

Quote: 2. Would I need the -codedcharacterset option at all for writing brand new tags???

The CodedCharacterSet tag should be set in the IPTC metadata when writing strings with special characters to IPTC.  Once it is set, the -charset iptc option no longer needs to be specified.  There are exceptions, however:

1. If the IPTC already exists and has special characters written in some other encoding, then setting the CodedCharacterSet tag to "UTF8" may mess up the decoding of existing information.

2. If you (for some reason) want to use some other (non-UTF8) encoding for IPTC metadata, then you shouldn't set the CodedCharacterSet tag, but in this case other applications are less likely to read the IPTC correctly.
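
One way to keep these exceptions in mind is to look at the file's current CodedCharacterSet before touching it.  The helper below is a hypothetical sketch (plan_for_charset is not part of ExifTool); it only turns the current value into a reminder, and assumes the value is read with the -s3 option, which prints just the tag value and nothing if the tag is absent:

```shell
# Hypothetical helper: $1 is the current IPTC:CodedCharacterSet value
# (empty string if the tag does not exist in the file).
plan_for_charset() {
  case "$1" in
    UTF8) echo "already UTF8: write new IPTC tags directly" ;;
    "")   echo "not set: check existing IPTC text before setting UTF8" ;;
    *)    echo "other encoding: setting UTF8 may break existing IPTC text" ;;
  esac
}

# Example use (a.jpg is a placeholder file name):
#   current=$(exiftool -s3 -IPTC:CodedCharacterSet a.jpg)
#   plan_for_charset "$current"
```

For a file with no existing IPTC at all, the "not set" case is harmless and the tag can simply be set along with the new tags.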

Quote: 3. It seems that non-ASCII characters are written into EXIF correctly, & automatically.  Is the -charset iptc=utf8 option all that I need, or would I need -charset exif=utf8 as well, just in case???

-charset iptc=utf8 is only necessary when reading/writing IPTC if the CodedCharacterSet tag doesn't exist when reading or isn't specified when writing.

The EXIF character set should be specified only in rare situations, and it only has an effect when reading/writing EXIF metadata.
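
The rule for reading can be sketched as a tiny helper.  This is a hypothetical function (iptc_read_opts is not an ExifTool feature), and it assumes the IPTC text in the file really was stored as UTF-8 when no CodedCharacterSet is present:

```shell
# Hypothetical helper: $1 is the current IPTC:CodedCharacterSet value
# (empty if the tag is absent).  Prints the extra option needed to have
# ExifTool treat the stored IPTC bytes as UTF-8; prints nothing when the
# tag exists, because the tag already identifies the encoding.
iptc_read_opts() {
  if [ -z "$1" ]; then
    printf '%s\n' '-charset iptc=utf8'
  fi
}

# Example use (a.jpg is a placeholder; the unquoted $(...) is intentional
# so that the option and its argument are passed as two words):
#   current=$(exiftool -s3 -IPTC:CodedCharacterSet a.jpg)
#   exiftool $(iptc_read_opts "$current") -IPTC:Caption-Abstract a.jpg
```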

I hope this clears things up a bit.

- Phil

3ndymion

Very cool.  Thanks for the quick answer.  Yes, this certainly helps.  My goal is to write the new metadata so that any image viewer can correctly read it.

So from what I understand, once IPTC:CodedCharacterSet is set to UTF8, any IPTC metadata written after that will be encoded in UTF8.  Other image readers will look at the IPTC:CodedCharacterSet tag & know that they should read the IPTC metadata as UTF8.  So, for my goal of allowing text & characters of any language to be written into IPTC, in a way that every possible image viewer can correctly read & display it, -codedcharacterset=utf8 is definitely what I should be using.  Is this correct???  It certainly seems so.

exiftool -IPTC:Caption-Abstract='ども ありがと!!!' -codedcharacterset=utf8 Picture.jpg

I have been testing this & checking the results with Digikam & gThumb, & they read the characters perfectly.  I think I have a better understanding of this now.  Thank you once again for your help.  It's very much appreciated.   :D :D :D

Phil Harvey

Quote from: 3ndymion on June 12, 2017, 12:48:52 PM
every possible image viewer can correctly read & display it, then -codedcharacterset=utf8 is definitely what I should be using.  Is this correct???

Well, the hard fact is that not all image viewers honour the CodedCharacterSet.  But this is certainly what I would recommend, and most modern viewers should work with this.

- Phil

3ndymion

Thank you again for taking the time to help me with this.  Hopefully, this will be helpful to others who see it too.

Setting the CodedCharacterSet tag is doing exactly what I want, especially since I only want it for writing brand new tags.  It seems to be the best & most correct way to do this, & I very much appreciate it.  If I ever come across any viewers in my testing that do not honor the tag & give problems, then I'll deal with them somehow & share my knowledge here.  Otherwise, I will happily use your recommendation.   :)