How to use UTF8 characters for EXIF user comment in command line

Started by Archive, May 12, 2010, 08:54:31 AM


Archive

[Originally posted by upscho on 2009-02-13 15:54:59-08]

Hello,

I'd like to set the user comment (exif) with a string that contains non-ASCII characters. Unfortunately it does not work.

The command line (Windows XP) looks like this (letter ä as an example):

exiftool -usercomment="This should go to user comment: ä" test.jpg

and it ends in: "Warning: Malformed UTF-8 character(s) - test.jpg"

Changing the code page according to FAQ 18 only changes the display. The warning remains the same.

I tried replacing the letter ä with its hex codes \xc3\xa4, but the escape sequence is not recognized and is written to the user comment literally.

I'm all mixed up and would greatly appreciate it if anyone could help me. Thanks a lot!

Best regards,

upscho

Archive

[Originally posted by upscho on 2009-02-14 09:54:22-08]

Meanwhile I found two possible solutions:

1) Command line directly

In Windows it is possible to input characters by pressing ALT and typing the decimal code for the character on the numerical key pad. Example: For the character ä the UTF-8 hex codes are C3 A4 and decimal 195 164. So one has to type

exiftool -usercomment="This should go to user comment:

then:

ALT-0195

ALT-164

and the rest of the command which results in a display of

exiftool -usercomment="This should go to user comment: ä" test.jpg

(At least with my code page which is 850.) That will do for the command line.
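As a quick cross-check of the byte values quoted above, the following minimal Python sketch prints the UTF-8 encoding of ä in hex and decimal. The helper name `utf8_codes` is made up for this illustration and has nothing to do with exiftool.

```python
# Illustrative helper (hypothetical name): list the UTF-8 bytes of a
# character in hex and in decimal.
def utf8_codes(ch):
    data = ch.encode("utf-8")
    hex_codes = [format(b, "02X") for b in data]
    dec_codes = list(data)
    return hex_codes, dec_codes

# "\u00e4" is the letter ä, written as an escape so the script does not
# depend on the encoding of the source file.
hex_codes, dec_codes = utf8_codes("\u00e4")
print(hex_codes, dec_codes)
```

This only confirms the two byte values; which ALT codes produce those bytes in a console depends on the active code page.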

2) Command line with option file

The command line looks like this:

exiftool -@ MyOptions test.jpg

The file MyOptions can be generated with the Windows editor (XP or higher) by typing

-usercomment=This should go to user comment: ä

and saving the file with the encoding "UTF-8".

The editor puts three additional bytes at the beginning of the file: EF BB BF, the so-called "UTF-8 Byte Order Mark". Unfortunately exiftool does not like these bytes and generates an error: "File not found."

Workaround: Delete the first three bytes with another editor that does not touch the UTF-8 encoded characters.
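Instead of a second editor, the three bytes can also be removed with a small script. The following is only a sketch in Python; the function name `strip_utf8_bom` is an assumption for this example, and the sample bytes stand in for the contents of the MyOptions file.

```python
import codecs

# Hypothetical helper: drop a leading UTF-8 BOM (EF BB BF) if present
# and leave all other bytes untouched.
def strip_utf8_bom(data):
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

# Sample content as the Windows editor would save it: BOM + UTF-8 text.
line = "-usercomment=This should go to user comment: \u00e4\n"
with_bom = codecs.BOM_UTF8 + line.encode("utf-8")

cleaned = strip_utf8_bom(with_bom)
print(with_bom[:3].hex(), "->", cleaned[:3])
```

To apply it to a real file, one would read the file in binary mode ("rb"), pass the bytes through the helper, and write the result back in binary mode ("wb").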

Phil: Wouldn't it be a good idea to suppress those bytes from within exiftool? There's nothing wrong with them. They just indicate that this file is a UTF-8 file (which exiftool needs in case of non-English characters). (Please see the Wikipedia article on the Byte Order Mark: http://en.wikipedia.org/wiki/Byte-order_mark) Just an idea.

One additional remark if one wants to check the attribute with Windows Explorer (see also thread 7329: http://www.cpanforum.com/threads/7329):

After deleting all tags with -all= and setting usercomment as described above, it won't be displayed correctly in the Windows Explorer File Properties. (At least I see some Asian characters.) This is because Windows Explorer expects the Unicode bytes to be stored in little-endian byte order (and not big-endian, which is the default).
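The effect of reading the bytes in the wrong order can be reproduced outside of exiftool. The sketch below assumes the comment text is stored as 16-bit Unicode (UTF-16); it encodes two plain ASCII letters big-endian and then decodes them as little-endian, the way a reader expecting little-endian would.

```python
# Assumption for this sketch: the text is stored as UTF-16 code units.
text = "Th"

big_endian = text.encode("utf-16-be")      # bytes 00 54 00 68
little_endian = text.encode("utf-16-le")   # bytes 54 00 68 00

# A reader that expects little-endian swaps the two bytes of each unit.
misread = big_endian.decode("utf-16-le")

print(big_endian.hex(), little_endian.hex())
print([hex(ord(c)) for c in misread])
```

The misread code points U+5400 and U+6800 fall in the CJK ideograph range, which would match the kind of characters described above.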

There are two options in exiftool that control the byte order:

-ExifByteOrder

-ExifUnicodeByteOrder

Windows Explorer will display the contents of usercomment correctly if only the first option is set to little-endian:

-ExifByteOrder=little-endian

It will also work with setting both options:

-ExifByteOrder=big-endian

-ExifUnicodeByteOrder=little-endian

which indicates that Windows Explorer only cares about the byte order of the Unicode strings and not of the rest of the EXIF header. But if I have understood Phil correctly, it is not advisable to set those options to different values.

So, my solution is to add the line "-ExifByteOrder=little-endian" as the first line in the file MyOptions. That's it. (Unfortunately I'm not sure if this will lead to problems on other platforms (Mac, Unix). Does anybody know?)
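For anyone who prefers to generate the options file by script, here is a minimal Python sketch. It writes the two option lines from this post as UTF-8 without a Byte Order Mark (Python's "utf-8" codec does not add one). The function name `write_options_file` and the temporary directory are assumptions for the example; only the two option lines come from the post.

```python
import os
import tempfile

# Hypothetical helper: write one exiftool argument per line, UTF-8, no BOM.
def write_options_file(path, lines):
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for line in lines:
            f.write(line + "\n")

options = [
    "-ExifByteOrder=little-endian",
    "-usercomment=This should go to user comment: \u00e4",  # \u00e4 is ä
]

# A temporary location is used here; in practice the file would sit next
# to the images and be passed as: exiftool -@ MyOptions test.jpg
path = os.path.join(tempfile.mkdtemp(), "MyOptions")
write_options_file(path, options)

with open(path, "rb") as f:
    raw = f.read()
print(raw)
```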

Hope that helps others.

Best regards,

upscho

Archive

[Originally posted by exiftool on 2009-02-14 15:25:41-08]

Hi Upscho,

I'm glad you're getting along yourself.  I actually replied to this yesterday
but apparently forgot to submit the post after previewing it.

Did you try entering the characters in Windows Latin1, and using the
-L option?  (See FAQ number 10,
https://exiftool.org/faq.html#Q10).  Of course, this only works
if the characters you want exist in this code page.

It sounds like entering unicode at the Windows command line is painful.
I can believe that.  I understand the BOM that many editors write to a
Unicode file, and can filter that out from the -@ input because
this must be a text file, but I can't filter it out from any other file inputs
(like -tag<=file for example), because they can be arbitrary
binary data.  I will add the -@ filtering to the next release.

Windows does have a problem with the byte ordering of Unicode text.
The natural thing to do is to write it in the same byte order as the
EXIF data.  The MWG will be addressing this problem, and their recommendation
will be to do what exiftool does, and use the EXIF byte ordering.  But
as you have discovered, Microsoft seems to prefer little-endian Unicode
even if the EXIF is big-endian.  For this reason, I added the
-ExifUnicodeByteOrder option, which you have also discovered.
Setting -ExifByteOrder has no effect if the EXIF already exists,
so this won't allow you to write little-endian Unicode to existing
big-endian EXIF.

- Phil

Archive

[Originally posted by upscho on 2009-02-18 04:56:05-08]

Hi Phil,

of course you are right: the -L option would also have helped me, if I had used it correctly... In fact I wasn't aware that exiftool's command-line options are case sensitive. I'm sure that I tried "-l" once and wondered why nothing happened. Sorry for that.

The following is to complete my 2nd posting in this thread:

3. Command line directly with -L option

The first thing to do after starting the Windows command line (cmd.exe) is to change the code page to 1252 (as described in FAQ #10):

chcp 1252

Then the following command helps with my example:

exiftool -L -usercomment="This should go to user comment: ä" test.jpg

That's all!
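Why the code page matters can be shown with a few lines of Python. This is only an illustration of the encodings involved, not exiftool's actual code: it assumes the command line delivers ä as a single Windows Latin1 (cp1252) byte, shows that this byte is not valid UTF-8 by itself, and converts it to the two-byte UTF-8 form. The helper names are made up for the example.

```python
# Hypothetical helper: True if the bytes decode cleanly as UTF-8.
def is_valid_utf8(data):
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Hypothetical helper: reinterpret cp1252 bytes and re-encode as UTF-8.
def latin_to_utf8(raw):
    return raw.decode("cp1252").encode("utf-8")

typed = "\u00e4".encode("cp1252")   # ä as one byte in code page 1252

print(typed.hex(), is_valid_utf8(typed))
print(latin_to_utf8(typed).hex(), is_valid_utf8(latin_to_utf8(typed)))
```

A lone E4 byte followed by ordinary ASCII is the kind of input that a UTF-8 decoder rejects, which would be consistent with the "Malformed UTF-8 character(s)" warning from the first post.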

For Windows Explorer to display the user comment correctly, one has to use either the option -ExifUnicodeByteOrder=little-endian or the option -ExifByteOrder=little-endian, depending on whether the EXIF already exists or not. So for me, after using the -all= option, the command line looks like this:

exiftool -exifbyteorder=little-endian -L -usercomment="This should go to user comment: ä" test.jpg

Sorry again Phil for overlooking this possibility.

One last remark: After discovering the case sensitive thing the -E option "suddenly worked for me" ;-) It is a nice thing to display special characters (ä in my case) independently from code pages. Unfortunately it is only for reading an EXIF file. Phil, what about adding an additional writing option? Just an idea because the above described solutions also work well.

Best regards,

upscho