Problem setting Chinese chars in Exif Metadata

Started by SimonKravis, September 24, 2017, 08:42:36 AM


SimonKravis

I am trying to set a Title field to Chinese characters using the following commands at the Windows console:

(I can't insert an image in the post using the Image button and would appreciate guidance; link below.)


https://www.dropbox.com/s/dwoyo07mas6xfib/ExifTool%20Chinese%20Chars.png?dl=0

1) Change the console font to SimSun. Echoing Chinese characters works OK:

echo 能美報前
能美報前

chcp 65001
exiftool -title="能美報前" -charset cp65001 -Overwrite_original "Pumpkin Flower.jpg"
  1 image files updated
exiftool -title -charset cp65001 "Pumpkin Flower.jpg"
Title      : ????


If I set the command window code page to 1252 and use -charset cp1252, I get the same result.

How can I set and retrieve Chinese characters in the Title field, instead of getting a question mark for each Chinese character?

StarGeek

#1
The Windows command line really sucks when it comes to characters from languages other than English.  I've never been able to get it to work properly.

In this case, the command line isn't passing the proper characters to exiftool.  If you check the file through Windows->Right Click->Properties->Details, you'll see that the data isn't saved correctly.

On the other hand, if you save the string you want to insert to a file and use the -TAG<=DATFILE syntax (see the exiftool documentation) to insert the data, it's saved correctly and you can verify it through Windows properties.

* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

SimonKravis

This looks like a way of setting metadata directly from a file. Can you give any details of how to set the Title field using <=?

StarGeek

Sorry, I should have been clearer with my link to the docs; it's now fixed to be more obvious.  Make sure to re-read the last paragraph of my previous post and follow the link to the docs.

This is a very indirect way of doing so.  It introduces the extra step of writing your data out to a file first, then inserting data from that file.

Note that the echo command adds a CRLF to the text redirected into the temp file; that is what the two dots at the very end of the title in this example show.

Here's the example:
D:\>chcp 65001
Active code page: 65001

D:\>echo >temp.txt 能美報前

D:\>type temp.txt
能美報前

D:\>exiftool -P -overwrite_original "-title<=temp.txt" y:\X_Drive\!temp\Test3.jpg
    1 image files updated

D:\>exiftool -g1 -a -s -title y:\X_Drive\!temp\Test3.jpg
---- XMP-pdf ----
Title                           :  能美報前..
---- XMP-xmp ----
Title                           :  能美報前..
---- XMP-dc ----
Title                           :  能美報前..
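To avoid the trailing CRLF, the temp file can be written without a line ending. Below is a minimal Python sketch that assumes Python 3.9+. The function and file names are hypothetical. Only ASCII goes on the exiftool command line; the Chinese text travels in the UTF-8 file.

```python
from pathlib import Path


def encode_title_for_file(title: str) -> bytes:
    """UTF-8 bytes (no BOM) with any trailing CR/LF removed.

    `echo >temp.txt` appends CRLF, which ends up in the tag value (the ".."
    in the output above); writing the bytes directly avoids that.
    """
    return title.rstrip("\r\n").encode("utf-8")


def write_title_file(path: Path, title: str) -> None:
    # Write raw bytes so no newline translation or BOM is added.
    path.write_bytes(encode_title_for_file(title))


def title_from_file_args(image: str, title_file: str) -> list[str]:
    """exiftool arguments using the -TAG<=DATFILE form (ASCII if the paths are)."""
    return ["-P", "-overwrite_original", f"-title<={title_file}", image]
```

For example, call write_title_file(Path("temp.txt"), "能美報前"). Then run exiftool with the list returned by title_from_file_args("Test3.jpg", "temp.txt").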


A quick check also shows that using a CSV file to insert the data works well:
D:\>exiftool -g1 -a -s -title y:\X_Drive\!temp\Test3.jpg
---- XMP-pdf ----
Title                           : Old title
---- XMP-xmp ----
Title                           : Old title
---- XMP-dc ----
Title                           : Old title

D:\>type temp.txt
Sourcefile,title
y:\X_Drive\!temp\Test3.jpg,能美報前
D:\>exiftool -P -overwrite_original -csv=temp.txt y:\X_Drive\!temp\Test3.jpg
    1 image files updated

D:\>exiftool -g1 -a -s -title y:\X_Drive\!temp\Test3.jpg
---- XMP-pdf ----
Title                           : 能美報前
---- XMP-xmp ----
Title                           : 能美報前
---- XMP-dc ----
Title                           : 能美報前
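The CSV file can also be generated from code. Below is a Python sketch that uses the standard csv module; the function names are hypothetical. The file is written as UTF-8 without a BOM. The csv module quotes any field that contains a comma, which assumes that exiftool's CSV import accepts standard quoting.

```python
import csv
import io


def build_title_csv(rows: dict[str, str]) -> str:
    """Return CSV text with a SourceFile column followed by a Title column."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["SourceFile", "Title"])
    for source_file, title in rows.items():
        writer.writerow([source_file, title])
    return buf.getvalue()


def write_title_csv(path: str, rows: dict[str, str]) -> None:
    # newline="" writes the text exactly as built; utf-8 adds no BOM.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(build_title_csv(rows))
```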


SimonKravis

This is a very neat solution, but I mainly need to retrieve the Chinese characters (which may be mixed with ASCII ones) from the metadata after setting them. So I came up with a different approach. When setting, I check the caption for any non-ASCII character. If at least one is present, I encode the string's UTF-8 bytes with Base64 and prefix some marker ASCII characters. When reading, I decode from Base64 if the caption starts with the marker string. The downside is that the Title field displayed in Windows will be encoded and meaningless. If users complain, I will use your suggested method.

Setting Unicode string:

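// Requires: using System; using System.Text;
foreach (char c in sCaption)
{
    if (c > 127) // any non-ASCII char: encode the whole caption
    {
        byte[] byteArray = Encoding.UTF8.GetBytes(sCaption);
        sCaption = "Z!X!Y!" + Convert.ToBase64String(byteArray); // marker + Base64 of the UTF-8 bytes
        break;
    }
}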


Getting Unicode string:

if (sOut.StartsWith("Z!X!Y!")) // marker present: the caption was Base64-encoded
{
    byte[] byteArray2 = Convert.FromBase64String(sOut.Substring(6)); // skip the 6-char marker
    sOut = System.Text.Encoding.UTF8.GetString(byteArray2).Trim();
}
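For comparison, here is the same marker-plus-Base64 round trip as a Python sketch. The function names are hypothetical; the marker is the one used in the C# snippets.

```python
import base64

MARKER = "Z!X!Y!"  # marker prefix taken from the C# snippets


def encode_caption(caption: str) -> str:
    # Only captions containing at least one non-ASCII character are encoded.
    if any(ord(c) > 127 for c in caption):
        return MARKER + base64.b64encode(caption.encode("utf-8")).decode("ascii")
    return caption


def decode_caption(value: str) -> str:
    # Decode only values that carry the marker; leave plain ASCII untouched.
    if value.startswith(MARKER):
        raw = base64.b64decode(value[len(MARKER):])
        return raw.decode("utf-8").strip()
    return value
```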

StarGeek

Quote from: SimonKravis on September 25, 2017, 07:53:42 AM
I came up with the solution of checking code for a Unicode char, and if at least one is present,

If you're able to do that, try this solution instead.

I'm guessing you're using .NET or something like that?  Then, instead of encoding to Base64, encode to HTML entities (HttpUtility.HtmlEncode, I think, from a quick Google search).  You can then use the -E option to write the data.

I took your Chinese character string and ran it through an online HTML character encoder, which returned this encoded result: &#x80FD;&#x7F8E;&#x5831;&#x524D;.  Here's the result:
C:\Windows\System32>exiftool -P -overwrite_original -E -title="&#x80FD;&#x7F8E;&#x5831;&#x524D;" y:\X_Drive\!temp\Test3.jpg
    1 image files updated

C:\Windows\System32>exiftool -g1 -a -s -title y:\X_Drive\!temp\Test3.jpg
---- XMP-pdf ----
Title                           : 能美報前
---- XMP-xmp ----
Title                           : 能美報前
---- XMP-dc ----
Title                           : 能美報前
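The encoding step can also be done in code instead of with an online encoder. Below is a minimal Python sketch; the function name is hypothetical. Every character above U+007F becomes a hex character reference. The characters &, < and > are escaped as well. This assumes that -E unescapes all HTML entities in the values being written, so a literal & has to arrive as &amp;.

```python
def html_encode_non_ascii(text: str) -> str:
    """Escape &, <, > and every non-ASCII character as an HTML reference."""
    out = []
    for ch in text:
        if ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == ">":
            out.append("&gt;")
        elif ord(ch) > 127:
            # Hex character reference, e.g. U+80FD -> &#x80FD;
            out.append(f"&#x{ord(ch):X};")
        else:
            out.append(ch)
    return "".join(out)
```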



SimonKravis

I am trying to write Chinese characters to various EXIF fields in a Windows 10 environment. I can do this OK using the following C# code, but all non-EXIF metadata is lost:

// Requires: using System.Drawing; using System.Drawing.Imaging; using System.Text;
Encoding _Encoding = Encoding.UTF8;

Image Img = new Bitmap(sFile);
PropertyItem[] propItems = Img.PropertyItems;

foreach (PropertyItem propItem in propItems)
{
    if (propItem.Id == 0x9c9b) // EXIF tag ID for XPTitle
    {
        propItem.Value = _Encoding.GetBytes(sTitle + '\0');
        Img.SetPropertyItem(propItem);
    }
}


However, when I try to set the field by running exiftool from C# with the following code:

// Requires: using System.Diagnostics;
string sEncodedTitle = MyHtmlEncode(sXPTitle); // HTML-encode non-ASCII chars for -E

Process p = new Process();
p.StartInfo = new ProcessStartInfo(sExe);
sArgs = "-E -TagsFromFile \"" + sSourceFile + "\" ";
sArgs += "\"-xptitle=" + sEncodedTitle + "\" ";
sArgs += "-overwrite_original \"" + sDestFile + "\"";

p.StartInfo.Arguments = sArgs;
p.StartInfo.UseShellExecute = false; // required for output redirection
p.StartInfo.RedirectStandardOutput = true;
p.StartInfo.RedirectStandardError = true;
p.StartInfo.CreateNoWindow = true;
p.StartInfo.WindowStyle = ProcessWindowStyle.Hidden;
p.Start();
string sStdout = p.StandardOutput.ReadToEnd(); // e.g. "1 image files updated"
string sStderr = p.StandardError.ReadToEnd();
p.WaitForExit();
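Here is the same invocation sketched in Python, for comparison. Passing a list of arguments, rather than one concatenated string, leaves the quoting of paths with spaces to subprocess. The function names are hypothetical. The title is assumed to be HTML-encoded (pure ASCII) before it is passed in, because -E makes exiftool decode the entities.

```python
import subprocess


def build_xptitle_args(source_file: str, dest_file: str, encoded_title: str) -> list[str]:
    """Arguments mirroring the C# string: copy tags from source_file, set XPTitle."""
    return [
        "-E",
        "-TagsFromFile", source_file,
        f"-xptitle={encoded_title}",
        "-overwrite_original",
        dest_file,
    ]


def run_exiftool(exe: str, args: list[str]) -> str:
    # check=True raises CalledProcessError if exiftool exits with non-zero status.
    result = subprocess.run([exe, *args], capture_output=True, text=True, check=True)
    return result.stdout
```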


The input characters (sXPTitle) 能美報前営米委歩段 appear as ���1XMR�Us|�Yik�k when read back.  In the code, MyHtmlEncode HTML-encodes the 2-byte characters.

I have also tried writing sXPTitle to a file using
System.IO.File.WriteAllBytes(sExeFolder + "\\Title.txt", System.Text.Encoding.UTF8.GetBytes(sXPTitle + "\0"));

and then setting the tag directly from the file using "\"-xptitle<=" + sExeFolder + "\\Title.txt\" ", with the same result. It appears that each byte of the 2-byte characters is being interpreted individually.
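The garbled string is consistent with that byte-by-byte reading. The Windows XP* tags (XPTitle is 0x9c9b) store text as UCS-2 little-endian. If the reading code decodes the raw XPTitle bytes as UTF-8, which is the encoding the GDI+ snippet above uses when writing, each 2-byte character falls apart. The Python sketch below reproduces the reported garbling, apart from an invisible DEL control character, and shows that decoding as UTF-16LE recovers the title. The function names are hypothetical.

```python
def xp_bytes(title: str) -> bytes:
    # XP* tags hold UCS-2 little-endian text, here with a 2-byte null terminator.
    return title.encode("utf-16-le") + b"\x00\x00"


def decode_xp_bytes(raw: bytes) -> str:
    # Correct reading: decode as UTF-16LE and drop the terminator.
    return raw.decode("utf-16-le").rstrip("\x00")


def misread_as_utf8(raw: bytes) -> str:
    # Wrong reading: bytes that are invalid as UTF-8 become U+FFFD,
    # while ASCII-range bytes (e.g. 0x31 '1', 0x58 'X') show up as letters.
    return raw.decode("utf-8", errors="replace").rstrip("\x00")
```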

Phil Harvey

Could you post Title.txt so I can take a look?  It should look like this if it is HTML-encoded (using hex characters):

&#x80fd;&#x7f8e;&#x5831;&#x524d;&#x55b6;&#x7c73;&#x59d4;&#x6b69;&#x6bb5;

From a UTF-8 terminal session:

> exiftool a.jpg -xptitle="能美報前営米委歩段"
    1 image files updated
> exiftool a.jpg -xptitle
XP Title                        : 能美報前営米委歩段
> exiftool a.jpg -xptitle -E
XP Title                        : &#x80fd;&#x7f8e;&#x5831;&#x524d;&#x55b6;&#x7c73;&#x59d4;&#x6b69;&#x6bb5;
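Since the goal is mainly to read the characters back, the -E output above can be decoded in the calling program. Below is a Python sketch that uses the standard html module; the function names are hypothetical. The -s3 option asks exiftool to print the value only. With -E the output should be pure ASCII, so the console or pipe code page should not matter.

```python
import html
import subprocess


def decode_exiftool_value(value: str) -> str:
    # Turn HTML character references (as printed with -E) back into text.
    return html.unescape(value)


def read_xptitle(exe: str, image: str) -> str:
    # -s3 prints the value only; -E escapes everything above U+007F.
    out = subprocess.run(
        [exe, "-s3", "-E", "-xptitle", image],
        capture_output=True, text=True, check=True,
    ).stdout
    return decode_exiftool_value(out.strip())
```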


- Phil