[Originally posted by frereroy on 2009-02-11 12:12:29-08]
I am trying to work out why accented characters written to IPTC are interpreted correctly or misinterpreted depending on the programme that reads or writes them, and also on whether it runs on Win or Mac.
I cannot work out the exiftool command line to return field 90, CodedCharacterSet, in the IPTC EnvelopeRecord Tags.
Any help would be appreciated.
[Originally posted by exiftool on 2009-02-11 13:53:57-08]
Did you read FAQ number 10? This may be of some help. Also see FAQ number 18.
The command to extract the CodedCharacterSet is:
exiftool -codedCharacterSet FILE
I suggest using UTF-8 if your applications support it. The command
to set this is:
exiftool -codedCharacterSet=utf8 FILE
(note: no "-" in "utf8")
I hope this gets you going.
- Phil
[Originally posted by frereroy on 2009-02-11 14:05:45-08]
Thanks for that. Yes, I had read FAQ 10 and 18, and also a thread from last year. After setting utf8 with exiftool (I got an updated file message), I looked at the file in a hex editor and my character e9 (e accent) was still e9. I understood that it would be converted to the UTF8 equivalent. Am I missing something?
[Originally posted by exiftool on 2009-02-11 16:11:17-08]
If you set the IPTC CodedCharacterSet to UTF8, then the
characters are passed straight through without
translation since exiftool expects input in UTF8. Exiftool
also supports input in Windows Latin1, and for this you
use the -L option. If you use the -L
option and write to IPTC where the CodedCharacterSet
is UTF8, then exiftool will translate from Latin1 to UTF8.
It sounds like this is what you want.
- Phil
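As an illustration of the translation described above (this is plain Python, not exiftool itself), the following sketch shows what a Latin1 to UTF-8 conversion does to the byte e9. The variable names are made up for the example.

```python
# Illustration only: what a Latin1 -> UTF-8 translation does to "e accent".
# "\u00e9" is the character e with acute accent.
latin1_bytes = "\u00e9".encode("latin-1")

# Interpret the stored byte as Latin1, then re-encode the text as UTF-8.
utf8_bytes = latin1_bytes.decode("latin-1").encode("utf-8")

print(latin1_bytes.hex())  # e9
print(utf8_bytes.hex())    # c3a9
```

So a value that really was translated to UTF-8 should show c3 a9 in a hex editor where the Latin1 version shows e9.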
[Originally posted by frereroy on 2009-02-11 16:51:46-08]
Thanks. My ultimate aim is to have IPTC data written to an image file in utf8.
I have tried out most of what is out there, from the good freebies like Irfanview and Xnview to Photoshop CS4 and AcdSee Pro 2. Only Photoshop and Bridge write in UTF8, but strangely the file is not signed: "Exiftool -codedCharacterSet FILE" does not return a value.
I was hoping that I could find a way for Exiftool to rewrite existing Latin1 encoded data to UTF8 and then sign the file as UTF8 - all in batch mode - but I am probably dreaming...
[Originally posted by exiftool on 2009-02-11 17:15:10-08]
"I was hoping that I could find a way for Exiftool to rewrite
existing Latin1 encoded data to UTF8 and then sign the file as
UTF8 - all in batch mode - but I am probably dreaming..."
This command will convert Latin1-encoded IPTC to UTF-8 for
all files in directory DIR:
exiftool -tagsfromfile @ -iptc:all -codedcharacterset=utf8 DIR
This will rewrite all IPTC in each file, performing the appropriate
translations.
But be careful because if the files are already UTF-8 and
CodedCharacterSet was not set properly as you mentioned, then
the values will be incorrectly translated. ExifTool relies on
CodedCharacterSet being set properly when translating
extracted values.
- Phil
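To make the warning above concrete, here is a small Python sketch (not part of exiftool) of what happens when bytes that are already UTF-8 are translated again as if they were Latin1. The helper looks_like_utf8 is a hypothetical heuristic one might run over extracted values before converting an archive; it is an assumption of this example, not something exiftool provides.

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Hypothetical heuristic: True if raw contains non-ASCII bytes
    and they form valid UTF-8 (so it is probably not Latin1)."""
    if raw.isascii():
        return False  # pure ASCII is identical in both encodings
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# "e accent" already stored as UTF-8, but with no CodedCharacterSet.
already_utf8 = "\u00e9".encode("utf-8")

# Treating those bytes as Latin1 and translating again corrupts them.
double_encoded = already_utf8.decode("latin-1").encode("utf-8")

print(already_utf8.hex())    # c3a9
print(double_encoded.hex())  # c383c2a9
print(looks_like_utf8(already_utf8), looks_like_utf8(b"\xe9"))  # True False
```

The heuristic can be fooled by short Latin1 strings that happen to be valid UTF-8, so it is only a sanity check, not a guarantee.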
[Originally posted by frereroy on 2009-02-11 17:39:34-08]
That works like a treat. Many thanks.
Just for the record I have found an application that writes IPTC in UTF8 AND signs the file as UTF8 and that is Photo Mechanic.
Your solution is great for my photo archives.
BTW, how can I set "Latin1" in the CodedCharacterSet before translating to UTF8? It is an escape character, no?
[Originally posted by exiftool on 2009-02-11 18:58:32-08]
IPTC is absurd. If I understand the ISO 2022 specification,
the only encoding that can be invoked directly via the
CodedCharacterSet is UTF-8. Other character
sets can only be designated by CodedCharacterSet, but
they must be invoked by escape sequences in the text
itself. So the common practice is to assume Latin1 (or some
other local character set) if CodedCharacterSet is missing.
This is what exiftool does.
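For readers who want to see the rule spelled out, here is a minimal Python sketch of the decoding logic described above. It assumes the raw CodedCharacterSet value for UTF-8 is the ISO 2022 escape sequence ESC % G (bytes 1b 25 47); the function name and the fallback parameter are hypothetical and only for illustration.

```python
from typing import Optional

# ISO 2022 escape sequence that selects UTF-8: ESC % G
UTF8_ESCAPE = b"\x1b%G"


def decode_iptc_text(raw: bytes,
                     coded_character_set: Optional[bytes],
                     fallback: str = "latin-1") -> str:
    """Hypothetical helper: decode an IPTC string value.

    UTF-8 is used only when CodedCharacterSet says so; otherwise a
    local character set (Latin1 here) is assumed.
    """
    if coded_character_set == UTF8_ESCAPE:
        return raw.decode("utf-8")
    return raw.decode(fallback)


# Properly signed UTF-8 data decodes to a single accented character.
print(decode_iptc_text(b"\xc3\xa9", UTF8_ESCAPE) == "\u00e9")  # True
# Unsigned UTF-8 data is read as Latin1 and shows two characters.
print(len(decode_iptc_text(b"\xc3\xa9", None)))  # 2
```

The second call shows why an unsigned UTF-8 file appears to contain two odd characters in place of each accented letter.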
[Originally posted by frereroy on 2009-02-11 19:11:19-08]
Absurd is the word! I was working on a Mac and successfully converted my IPTC data from Latin1 to UTF8, and was able to read it correctly in many different graphics applications. Back on the PC, Xnview and Irfanview show the double-byte UTF8 sequences as two separate characters, and only Photoshop interprets them correctly. I read somewhere else in the forums here that the character set used by many programmes is that of the machine: Western European Windows in my case (Latin1).
[Originally posted by exiftool on 2009-02-11 20:12:31-08]
Exactly. Many programs just use the local character set.
However, UTF8 is becoming a standard, and is being
recommended by the MWG, so I think eventually other
software will come around to dealing with UTF8 properly
in IPTC.
- Phil
[Originally posted by frereroy on 2009-02-12 07:41:22-08]
That's good news. In the meantime, is it possible to convert a jpg that has the IPTC and XMP information stored in UTF8 back to Latin1?
I know that -L reads the info, but how do I rewrite it?
[Originally posted by exiftool on 2009-02-12 11:23:51-08]
This will do the conversion to Latin1 for a single file, presuming
that the CodedCharacterSet is set to utf8 to begin with:
exiftool -tagsfromfile FILE -iptc:all -codedcharacterset= FILE
But due to the different order of operations in batch mode (tags are
copied from the file after static tag assignments have been performed),
the command is a bit different in batch mode:
exiftool -tagsfromfile @ -iptc:all --codedcharacterset -codedcharacterset= DIR
The bottom line is that exiftool encodes IPTC based on the value
of CodedCharacterSet.
- Phil
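For anyone scripting the two batch conversions from this thread, the sketch below only assembles the argument lists quoted above; it does not call exiftool. The function name build_iptc_recode_args is hypothetical, and the flags are copied from the commands in this thread rather than verified independently.

```python
from typing import List


def build_iptc_recode_args(directory: str, to_utf8: bool) -> List[str]:
    """Hypothetical helper: build the batch command line from this thread.

    to_utf8=True  -> Latin1-encoded IPTC rewritten as UTF-8
    to_utf8=False -> UTF-8 IPTC rewritten as Latin1 (CodedCharacterSet removed)
    """
    args = ["exiftool", "-tagsfromfile", "@", "-iptc:all"]
    if to_utf8:
        args.append("-codedcharacterset=utf8")
    else:
        # Exclude the copied CodedCharacterSet, then delete the tag.
        args += ["--codedcharacterset", "-codedcharacterset="]
    args.append(directory)
    return args


print(" ".join(build_iptc_recode_args("DIR", to_utf8=True)))
print(" ".join(build_iptc_recode_args("DIR", to_utf8=False)))
# To actually run one of these, the list could be passed to
# subprocess.run(args, check=True), assuming exiftool is on the PATH.
```

Trying this on a copy of a few files first is advisable, since the UTF-8 direction mistranslates files that are already UTF-8 but unsigned.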
[Originally posted by frereroy on 2009-02-12 14:08:21-08]
Many thanks Phil - I can always count on a good solution from you. Now I can change the metadata charset as necessary while waiting for all applications to be able to read and write in utf8. I remember all this charset business from back in the days of Windows 95, when trying to write and send emails in both Eastern and Western European charsets. Sometimes it seems as if progress is not that speedy.