UTF-8 -> Latin1 conversion, L option

Started by Archive, May 12, 2010, 08:54:04 AM

Archive

[Originally posted by linuxuser on 2007-05-08 14:39:20-07]

I searched the ExifTool pages but couldn't find out what value I have to set for -iptc:CodedCharacterSet to get Latin1. Sorry, I don't know the ESC sequence.

What happens if the tag is written in UTF-8 and CodedCharacterSet is removed? Does the encoding remain UTF-8?

How would you recommend converting all the tags from UTF-8 to Latin1 with _no_ CodedCharacterSet tag?

What does exiftool do with characters that are not contained in Latin1? Is it possible that complete words containing, e.g., only one non-Latin1 character are removed from keywords? For example njemacka (where the c is a special character that I don't know how to write in the forum), which is Germany in Croatian.

I think the CodedCharacterSet tag causes a problem with Zooomr, and I would like to run tests with different metadata tags.

Archive

[Originally posted by exiftool on 2007-05-08 15:49:02-07]

The encoding of existing information is not changed if you
change CodedCharacterSet.  However, it affects any new information
added with ExifTool, and it also affects the way all IPTC information
is decoded when reading.

If you want to use Latin1, probably the best thing to do is to delete
the CodedCharacterSet tag.  Most software will assume Latin1 if there
is no CodedCharacterSet specified.
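
To illustrate why deleting the tag alone does not convert anything, here is a minimal Python sketch (plain Python, not ExifTool code) of how the same stored bytes read under the two assumptions. The sample word is only an example; the stored bytes never change, only their interpretation does.

```python
# Illustration only: the same stored bytes read under two assumptions.
text = "Njema\u010dka"             # "Njemacka" with U+010D (c with caron)
stored = text.encode("utf-8")      # bytes as a UTF-8 writer would store them

as_utf8 = stored.decode("utf-8")       # reader honours CodedCharacterSet=UTF8
as_latin1 = stored.decode("latin-1")   # reader assumes Latin1 (tag missing)

print(stored)             # the special character occupies two bytes
print(as_utf8 == text)    # the UTF-8 reader recovers the original word
print(repr(as_latin1))    # the Latin1 reader sees two unrelated characters
```

The Latin1 reader does not fail; it simply shows one wrong character per stored byte, which is why the values themselves have to be translated as well.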

FYI: The proper way to use Latin1 in IPTC is actually very complex, and
few software packages would understand it if done properly.  (You
need to use ISO 2022 and designate one of the alternate graphics
character sets as Latin1 with the appropriate escape sequence in
CodedCharacterSet, then invoke the desired character set with another
ISO 2022 escape sequence in the actual text when you want to use it.)

But to answer your question, here is how you would change encoding to
UTF8:

Code:
exiftool a.jpg -tagsfromfile a.jpg -iptc:all -codedcharacterset=UTF8

Unfortunately, due to a quirk in the way this is implemented in versions
up to 6.89, this doesn't work when the CodedCharacterSet is deleted (although
this is exactly what you want to do).  So I have changed this and uploaded a
6.90 pre-release which properly handles the translations when
CodedCharacterSet is deleted.  With this version, you can also translate the
IPTC values back to Latin1 like this:

Code:
exiftool a.jpg -tagsfromfile a.jpg -iptc:all -codedcharacterset=

- Phil

Archive

[Originally posted by exiftool on 2007-05-08 15:54:49-07]

Sorry, I didn't answer your question about conversion of
characters which aren't valid Latin1:  Only valid Latin1 characters
are translated.  All other characters are passed straight through
without translation, and just encoded into UTF8 directly.

- Phil

Archive

[Originally posted by linuxuser on 2007-05-08 18:22:28-07]

Phil, I would like to do it the other way round, not

Code:
exiftool a.jpg -tagsfromfile a.jpg -iptc:all -codedcharacterset=UTF8

I want to create valid Latin1 tags _from_ existing UTF-8 tags.

What should the command be to copy all the tags of an image with UTF-8 metatags to a new image with Latin1 tags? How can I throw away the characters which are not Latin1? I use bash, so maybe I could use a sed command in a pipe in between. I think the only fields which could contain "real" UTF-8 characters like Greek characters would be the keyword field and maybe the description field, so I could exclude these first and then add them back after a modification.

What is the escape-sequence to define Latin1 in codedcharacterset? I mean something like ESC ..

Thanks a lot

Archive

[Originally posted by exiftool on 2007-05-08 18:42:06-07]

I gave you an example of how to convert UTF8 tags to Latin1, but
you need ExifTool 6.90 to do it.  If you want to convert when copying
to another file, just use a different filename as the source in the
-tagsfromfile option.
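
For batch use, the two-file form can be sketched as a small Python helper that only builds the argument list. The function name and the file names are hypothetical; the ExifTool arguments are the ones quoted in this thread, with the source file named in -tagsfromfile.

```python
def build_copy_command(src, dst):
    """Return the argument list that copies IPTC tags from src into dst.

    Mirrors the command quoted in this thread; the empty
    -codedcharacterset= assignment deletes that tag in dst.
    """
    return ["exiftool", dst, "-tagsfromfile", src,
            "-iptc:all", "-codedcharacterset="]

# Hypothetical file names, for illustration only.
cmd = build_copy_command("utf8_source.jpg", "latin1_target.jpg")
print(" ".join(cmd))

# To actually run it (needs ExifTool 6.90 or later on the PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems with file names that contain spaces.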

Thinking about this a bit more carefully:  There is no way to throw
out non-Latin1 characters, because all byte values 0-255 correspond
to a valid Latin1 character.
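
The point about byte values can be checked with a short Python sketch. It also shows one possible way to drop characters outside Latin1 once the text has been decoded from UTF-8, for example in a pre-processing step before writing keywords. This is an assumption about how such a filter could be written, not a description of what ExifTool does internally.

```python
# Every byte value 0-255 decodes to some Latin1 character, so a filter that
# looks at raw bytes can never tell which ones "are not Latin1".
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")
print(len(decoded))

# Working on decoded Unicode text instead: characters above U+00FF cannot be
# encoded as Latin1 and can be dropped or replaced one by one.
word = "njema\u010dka"   # contains U+010D, which is outside Latin1

dropped = word.encode("latin-1", errors="ignore")
replaced = word.encode("latin-1", errors="replace")
print(dropped)
print(replaced)
```

Only the offending character is affected in either mode; the rest of the word is kept.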

As I said, the Latin1 escape sequence is not simple.  "ESC,A", "ESC-A",
"ESC.A" and "ESC/A" in CodedCharacterSet will designate Latin1 for
graphics character set 0 through 3 respectively, but then you have to
invoke the appropriate set through an ISO 2022 escape sequence in
the text itself, otherwise the designated character sets don't get used.
This is a real pain, and no software will decode this properly.
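
For reference, the four designation sequences named above can be spelled out as raw bytes. This Python sketch only builds and prints them as listed in this post (ESC is byte 0x1B); it does not write them to any file, and the dictionary name is made up for the example.

```python
ESC = b"\x1b"  # ASCII escape character, byte value 27

# Designation sequences as listed in the post: ESC, one intermediate byte
# selecting the graphics set (G0 to G3), then "A" for Latin1.
LATIN1_DESIGNATIONS = {
    0: ESC + b",A",
    1: ESC + b"-A",
    2: ESC + b".A",
    3: ESC + b"/A",
}

for g, seq in LATIN1_DESIGNATIONS.items():
    print(f"G{g}: {seq.hex(' ')}")
```

Each sequence is three bytes long, and the text would still need a separate invocation sequence before the designated set is actually used.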

Did I mention that IPTC really sucks when it comes to coded characters?

- Phil