ExifTool Forum

ExifTool => The "exiftool" Application => Topic started by: hvdwolf on September 22, 2021, 07:27:52 AM

Title: Unicode on Windows command line in any console codepage
Post by: hvdwolf on September 22, 2021, 07:27:52 AM
This is a spin off of "Unicode in windows command line (https://exiftool.org/forum/index.php?topic=11770.msg63374#msg63374)"
That post "forces" users to switch to Unicode, which in itself I like very much: it is really unbelievable that Windows 10 itself still does not default to Unicode, but to a codepage and character set for the country it is used in. So much for world-wide exchangeability!
Note also that Windows uses DOS codepages for the console and Windows codepages for the GUI. (neanderthalic morons  >:()

All those "not so experienced" users who simply use the default codepage, like the Western DOS codepage 850 (Windows cp 1252) or, in the Cyrillic example, DOS codepage 855 or 866 (Windows cp 1251), are therefore out of luck, not to mention the Korean, Chinese, Turkish, etcetera users.
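To illustrate why a country codepage is a problem, here is a small Python sketch (not ExifTool itself, only the encoding side). It assumes UTF-8 text ends up being interpreted as DOS codepage 850; the sample text is taken from my argfile.

```python
# Sample text from the argfile in this post.
text = "España genießen"

# The bytes as they would be stored in a UTF-8 encoded file.
utf8_bytes = text.encode("utf-8")

# Interpreting those UTF-8 bytes as cp850 produces garbled text,
# because every non-ASCII character is two bytes in UTF-8.
misread = utf8_bytes.decode("cp850")
print(misread)
print(len(text), len(misread))

# Cyrillic cannot be represented in cp850 at all.
try:
    "Порядок".encode("cp850")
    cyrillic_fits = True
except UnicodeEncodeError:
    cyrillic_fits = False
print(cyrillic_fits)
```

So in a cp850 console the Western accented characters get mangled, and the Cyrillic ones cannot even be typed in as such.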
Now I read there is a workaround: use a UTF-8 encoded arg file, which works on Windows with any codepage.
So I have a UTF-8 encoded argfile.txt (see also attached), which contains a combination of Cyrillic, German, French and Spanish words.
-exif:Artist=Порядок байтов España genießen, Vergnügen, goûter, façade, ¿Abrir
-exif:Copyright=Порядок байтов España genießen, Vergnügen, goûter, façade, ¿Abrir
-exif:UserComment=Порядок байтов España genießen, Vergnügen, goûter, façade, ¿Abrir
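If you generate such an argfile from a program instead of a text editor, the important part is to write it explicitly as UTF-8 and not in the platform default encoding. A minimal Python sketch, assuming the three tags above; the function name write_argfile is only made up for this example:

```python
# Tag value copied from the argfile in this post.
VALUE = "Порядок байтов España genießen, Vergnügen, goûter, façade, ¿Abrir"
TAGS = ["exif:Artist", "exif:Copyright", "exif:UserComment"]


def write_argfile(path):
    """Write one exiftool argument per line, encoded as UTF-8."""
    # No quotes around the values: the arguments in an argfile are not
    # processed by a shell, so quotes would become part of the value.
    lines = ["-{}={}".format(tag, VALUE) for tag in TAGS]
    # Always pass encoding="utf-8"; the Windows default would be the
    # country codepage, which is exactly what we want to avoid.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write("\n".join(lines) + "\n")
    return path
```

Calling write_argfile("argfile.txt") should give a file with the same three lines as shown above.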

(If you use Edge as your browser you can see that it displays correctly, as Edge is Unicode, just like their IIS web server. Otherwise nobody in the entire world would use Microsoft products.) Of course, all programs originating on Unix/Linux or MacOS are Unicode.

And I use this command on Windows inside a codepage 850 console:
exiftool.exe -charset utf8 -charset iptc=utf8 -charset exif=utf8  -@ argfile.txt -preserve -overwrite_original 3rdimage.jpg

For a console command and for my JTG program, that is the only option to write UTF-8 to an image and to read UTF-8 from an image on a Windows system that uses a country codepage. Not all of the "-charset" options are always necessary, but in this case I simply used them all: better safe than sorry.
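For those who want to call this from a program, here is a rough Python sketch of the same command. It is only an illustration and not the JTG code; the function names and the assumption that exiftool.exe is on the PATH are mine. The point is that the command line itself stays pure ASCII, and all non-ASCII text lives in the UTF-8 argfile.

```python
import subprocess


def build_write_command(image, argfile="argfile.txt", exe="exiftool.exe"):
    """Return the argument list for the write command shown in this post."""
    return [
        exe,
        "-charset", "utf8",
        "-charset", "iptc=utf8",
        "-charset", "exif=utf8",
        "-@", argfile,            # the tag values are read from the UTF-8 argfile
        "-preserve",
        "-overwrite_original",
        image,
    ]


def run_write(image, argfile="argfile.txt", exe="exiftool.exe"):
    """Run exiftool; assumes the executable can be found on the PATH."""
    command = build_write_command(image, argfile, exe)
    # A list of arguments (no shell=True) avoids any quoting by cmd.exe.
    return subprocess.run(command, capture_output=True, check=False)
```

Something like run_write("3rdimage.jpg") would then do the same as the console command above.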
However, reading it back in a Windows console still doesn't work, although it does work from my JTG program.
If you switch the console to Unicode by issuing the commands
chcp 65001
exiftool.exe -use mwg -exif:all 3rdimage.jpg

it does display correctly, apart from the Cyrillic characters (for whatever reason).
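Reading from a program can bypass the console display altogether: capture the raw bytes that exiftool writes and decode them as UTF-8 yourself. A hedged Python sketch of that idea (this is an illustration, not necessarily how JTG does it; the function names are made up and exiftool.exe is assumed to be on the PATH):

```python
import subprocess


def decode_exiftool_output(raw):
    """Decode captured exiftool output as UTF-8, independent of the console codepage."""
    # errors="replace" keeps the rest readable if a byte sequence is invalid.
    return raw.decode("utf-8", errors="replace")


def read_exif_tags(image, exe="exiftool.exe"):
    """Run the read command from this post and return the output as text."""
    command = [exe, "-charset", "utf8", "-use", "mwg", "-exif:all", image]
    # Without text=True the output stays as bytes, so Python does not
    # try to decode it with the country codepage.
    result = subprocess.run(command, capture_output=True, check=True)
    return decode_exiftool_output(result.stdout)
```

The text returned by read_exif_tags("3rdimage.jpg") can then be shown in a GUI or written to a UTF-8 file, where the font and codepage of the console no longer matter.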

(It took me weeks to figure this out and then to make it work in JTG. I also think it is one of the first, if not the first, cross-platform programs that does this. Now working on a new release.)