Why do I get different code for exif and xmp blocks if I use cmd.exe?

Started by Jom, March 20, 2020, 10:13:52 PM


Jom

Hi again.

Why do I get different code for exif and xmp blocks if I use cmd.exe?


Windows 10 (english version)


cmd.exe (Active code page: 437)

exiftool -exif:copyright="(exC)© АндRei КорZhyts" -xmp-dc:rights="(dcC)© АндRei КорZhyts" -ext cr2 .

different code





metadata.pl

use strict;
use warnings;
use File::Copy;


system('exiftool -exif:copyright="(exC)© АндRei КорZhyts" -xmp-dc:rights="(dcC)© АндRei КорZhyts" -ext cr2 .');


perl metadata.pl

same code






StarGeek

Basically FAQ #10.  Read the subsections for EXIF and XMP.  EXIF strings are stored as ASCII, XMP is stored as UTF.  ExifTool can read XMP as UTF-8, UTF-16, or UTF-32 and saves it as UTF-8.
C2A9 is UTF-8 for ©
A9 is Latin-1 (cp1252) for © (strictly speaking, 7-bit ASCII has no © at all)

These are written correctly when using Perl because Windows CMD isn't getting in the way.  You have the code page set to 437 and © doesn't exist in that character set (see Wikipedia).
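
To make the byte values above concrete, here is a minimal Python sketch (Python is used only for illustration; it is not part of ExifTool, and the helper name `can_encode` is made up for this example). It shows the bytes produced for © in UTF-8 and Latin-1, and checks whether code page 437 can represent the character at all.

```python
# Illustrative only: how one character maps to bytes in different encodings.
text = "©"

utf8_bytes = text.encode("utf-8")      # two bytes: C2 A9
latin1_bytes = text.encode("latin-1")  # one byte: A9


def can_encode(s, codec):
    """Return True if every character in s exists in the given codec."""
    try:
        s.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


print(utf8_bytes.hex())             # c2a9
print(latin1_bytes.hex())           # a9
print(can_encode(text, "cp437"))    # False: cp437 has no copyright sign
```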

I don't understand all the character code page stuff myself, I just know what works for me.  Adding the -L (latin) option while writing usually fixes any character problems for me. 
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

Jom

Thank you, StarGeek, for saving me time by sharing common points. Now it's easier for me to learn the details.

Jom

I'm still trying to solve the problem with encodings, but I've already discovered one thing:
ExifTool for Cyrillic only offers cp1251 encoding, but Windows 10 uses a different encoding for Cyrillic — cp866.

I found a mention of cp866 here
https://exiftool.org/forum/index.php?topic=10882,
but it seems nobody paid attention to it there, if I translated correctly.
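
As a rough illustration of why cp1251 and cp866 are not interchangeable, the following Python sketch encodes the same three Cyrillic letters with both code pages. It relies only on Python's built-in `cp1251` and `cp866` codecs and says nothing about how ExifTool handles these charsets internally.

```python
# Illustrative only: the same Cyrillic text yields different bytes
# in the Windows "ANSI" Cyrillic code page and the DOS/OEM one.
sample = "Анд"

bytes_cp1251 = sample.encode("cp1251")
bytes_cp866 = sample.encode("cp866")

print(bytes_cp1251.hex())  # c0ede4
print(bytes_cp866.hex())   # 80ada4

# Bytes written with one code page decode to different text under the
# other, which is why the charset given to a tool must match the input.
print(bytes_cp866.decode("cp1251") == sample)  # False
```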

StarGeek

You might try the new Windows Terminal to see if that handles character sets better. It puts PowerShell and CMD in a tabbed window and is supposed to handle characters such as emojis better. 

I never tested it, but Cygwin might work better as well, since it provides a Linux-like environment.  Cygwin comes with its own peculiarities, as the file path structure is completely different.

Jom

Thank you, I recently found out about it and wanted to try it.
Yes, it is convenient, but it seems to me that it is just a shell around PowerShell and cmd.
We should match the Windows locale encoding, but there is no cp866 encoding in ExifTool.

I would like to know Phil's opinion about cp866 in ExifTool. I'm probably missing something, but it seems to me that this encoding is missing.

Phil Harvey

...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Jom

Thanks, Phil.
I hope this solves the issues with Cyrillic on Win 10 (if the characters in the file names do not go beyond cp866).
But I continue to study the question and experiment.
If I enable UTF-8 for worldwide language support, then everything seems to be fine (only OpenServer doesn't start).

Microsoft Windows [Version 10.0.18362.720]
(c) 2019 Microsoft Corporation. All rights reserved.

f:\>chcp
Active code page: 65001

f:\>exiftool -exif:copyright="(exC)© АндRei КорZhyts" -xmp-dc:rights="(dcC)© АндRei КорZhyts" -ext cr2 .
Install Win32::FindFile to support Windows Unicode file names in directories
    1 directories scanned
    2 image files updated

f:\>exiftool -G0:1 -s -a -f -exif:copyright -xmp-dc:rights -ext cr2 .
Install Win32::FindFile to support Windows Unicode file names in directories
======== ./_MG_5004.CR2
[EXIF:IFD0]     Copyright                       : (exC)© АндRei КорZhyts
[XMP:XMP-dc]    Rights                          : (dcC)© АндRei КорZhyts
======== ./Фото.CR2
[EXIF:IFD0]     Copyright                       : (exC)© АндRei КорZhyts
[XMP:XMP-dc]    Rights                          : (dcC)© АндRei КорZhyts
    1 directories scanned
    2 image files read


But I'm still getting a message about Win32::FindFile.
Perhaps this message is no longer necessary in this case?

Jom

It seems that something is working out.
But I don't understand why cp1251 works if I have the cp866 locale.
I will read the documentation again, I probably translated something inaccurately.
I don't know...

Microsoft Windows [Version 10.0.18362.720]
(c) 2019 Microsoft Corporation. All rights reserved.

f:\>chcp
Active code page: 866

f:\>exiftool -charset Cyrillic -charset exif=utf8 -exif:copyright="(exC)© АндRei КорZhyts" -xmp-dc:rights="(dcC)© АндRei КорZhyts" -ext cr2 .
    1 directories scanned
    1 image files updated

f:\>exiftool -G0:1 -s -a -f -exif:copyright -xmp-dc:rights -ext cr2 .
======== ./_MG_5004.CR2
[EXIF:IFD0]     Copyright                       : (exC)┬й ╨Р╨╜╨┤Rei ╨Ъ╨╛╤АZhyts
[XMP:XMP-dc]    Rights                          : (dcC)┬й ╨Р╨╜╨┤Rei ╨Ъ╨╛╤АZhyts
    1 directories scanned
    1 image files read

f:\>chcp 65001
Active code page: 65001

f:\>exiftool -G0:1 -s -a -f -exif:copyright -xmp-dc:rights -ext cr2 .
======== ./_MG_5004.CR2
[EXIF:IFD0]     Copyright                       : (exC)© АндRei КорZhyts
[XMP:XMP-dc]    Rights                          : (dcC)© АндRei КорZhyts
    1 directories scanned
    1 image files read

f:\>
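
The garbled lines in the cp866 session above look like UTF-8 bytes being drawn by a console that interprets them as cp866. A small Python sketch (illustrative only, using Python's built-in codecs, not anything ExifTool does itself) reproduces the same garbling and shows that no data is lost:

```python
# Illustrative only: UTF-8 output viewed through a cp866 console.
original = "© АндRei КорZhyts"

utf8_bytes = original.encode("utf-8")
as_seen_in_cp866 = utf8_bytes.decode("cp866")

# The misread text matches what the console printed under code page 866.
print(as_seen_in_cp866 == "┬й ╨Р╨╜╨┤Rei ╨Ъ╨╛╤АZhyts")  # True

# Reversing the misinterpretation recovers the original string, so the
# stored metadata is intact; only the display was wrong.
recovered = as_seen_in_cp866.encode("cp866").decode("utf-8")
print(recovered == original)  # True
```

This is consistent with the same file reading back correctly after `chcp 65001`.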

Jom

It seems that a lot of things are becoming clear.
https://serverfault.com/questions/80635/how-can-i-manually-determine-the-codepage-and-locale-of-the-current-os/836221


PS C:\Users\andrei> Get-WinSystemLocale | Select-Object Name, DisplayName,
>>                         @{ n='OEMCP'; e={ $_.TextInfo.OemCodePage } },
>>                         @{ n='ACP';   e={ $_.TextInfo.AnsiCodePage } }


Name  DisplayName      OEMCP  ACP
----  -----------      -----  ---
ru-RU Russian (Russia)   866 1251



This is the reason why I see 866 in the console, but everything works with -charset Cyrillic.
I will read the related ExifTool documentation again in full and recheck everything.
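
For anyone who wants to check the same values without PowerShell, here is a hedged Python sketch that asks Windows for the ANSI code page, the OEM code page, and the current console output code page through the Win32 functions `GetACP`, `GetOEMCP`, and `GetConsoleOutputCP`. The wrapper name `windows_code_pages` is made up for this example, and on non-Windows systems it simply returns `None`.

```python
import sys


def windows_code_pages():
    """Return (ANSI, OEM, console output) code pages on Windows, else None."""
    if sys.platform != "win32":
        return None
    import ctypes  # imported here because windll only exists on Windows

    kernel32 = ctypes.windll.kernel32
    return (
        kernel32.GetACP(),              # "ANSI" code page used by legacy GUI programs
        kernel32.GetOEMCP(),            # OEM code page, the console default
        kernel32.GetConsoleOutputCP(),  # what chcp currently reports
    )


print(windows_code_pages())
```

With the Russian system locale shown above, one would expect the first two values to be 1251 and 866; the third follows whatever `chcp` was last set to.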

Jom

Quote from: Phil Harvey on March 21, 2020, 06:01:38 PM
Hi Andrei,

I will look into adding support for cp866

- Phil

I understood what was happening.
I had an English version of Windows that had cp1252 configured for legacy programs, but it showed cp437 in the console.  To use Cyrillic, it is Windows itself that needs to be configured, not the console.
I didn't know this. I didn't know I was using cp1252, which simply doesn't have Cyrillic characters.
There is no need to add cp866 support.
One just needs to check the region's code page settings or enable UTF-8 at the system level.

Phil Harvey

OK, thanks.  I've already added support for cp866, so I'll leave it in.

- Phil

Jom

Well, I thought I'd be in time to warn you.
In any case, thanks.