On Windows 10 64-bit I am running ExifTool with the following batch script:
chcp 1252
set EXIFTOOL=c:\bin\EXIFTool\exiftool.exe
set SRC="Coons 2018.orig.jpg"
set DEST="Coons 2018.exiftool3.jpg"
set COPYRIGHT=Copyright © 2018 Kenneth Evans All Rights Reserved
copy %SRC% %DEST%
%EXIFTOOL% -charset utf8 -artist="Kenneth Evans" -copyright="%COPYRIGHT%" -copyrightnotice="%COPYRIGHT%" -rights="%COPYRIGHT%" -UsageTerms="All Rights Reserved" -Marked="true" %DEST%
chcp 65001
%EXIFTOOL% -filename -artist -copyright -copyrightnotice -rights %DEST%
The BAT file is UTF-8 according to Notepad++. On doing a hex dump, the characters are single-byte except the copyright symbol (C2 A9).
It runs as is and gives the ExifTool output I would like (actual c-in-circle copyright symbols) both in the output with code page 65001 and in ExifToolGui. However, the code page is Latin and the copyright symbol shows up as Â© in the echo statements from the script. (My understanding is that ExifTool will be assuming the input is Latin, not UTF-8.)
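As a side illustration (Python, not part of the batch script), the sketch below shows why a UTF-8 © appears as two characters when the console is in code page 1252: UTF-8 stores the symbol as the bytes C2 A9, and cp1252 maps each byte to its own character. It only demonstrates the encodings; it makes no claim about what cmd.exe or ExifTool do internally.

```python
# UTF-8 stores the copyright sign (U+00A9) as two bytes.
utf8_bytes = "\u00a9".encode("utf-8")
print(utf8_bytes.hex(" "))  # c2 a9

# Under cp1252 each byte is its own character: C2 is "Â", A9 is "©".
as_cp1252 = utf8_bytes.decode("cp1252")
print([f"U+{ord(c):04X}" for c in as_cp1252])  # ['U+00C2', 'U+00A9']
```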
If I change the first line to chcp 65001 (UTF-8), then the echo output is as you would expect (copyright symbol is a ©), but I get:
Warning: Malformed UTF-8 character(s) - Coons 2018.exiftool3.jpg
and the output is:
Artist : Kenneth Evans
Copyright : Copyright 2018 Kenneth Evans All Rights Reserved
Copyright Notice : Copyright
Rights : Copyright ? 2018 Kenneth Evans All Rights Reserved
So I get the wrong results when everything is UTF-8 and the right results when the code page is 1252 (Latin). What am I doing wrong or failing to understand?
Thanks.
Your console is set to cp1252 but you have told ExifTool that you are entering characters in UTF8.
It looks like you should use -charset cp1252 instead of -charset utf8 when writing.
- Phil
Edit: But you say your bat file is UTF8. Then why are you setting cp1252 and not cp65001 at the start? I must admit, I haven't tried doing this in a .bat file.
Thanks for the fast reply. You would think so but that doesn't work. (EXIF Copyright has a bad character, other two ok.) What I wrote is what works.
I've done a lot of reading and trial & error by now. ;)
I would like to use chcp 65001 and not specify -charset utf8. Why doesn't that work?
I have no idea. This is really more of a Windows question.
But specifying -charset utf8 should have no effect since this is the default.
- Phil
Removing -charset utf8 doesn't change anything with the script as is (using chcp 1252). The only real issue is that the command output has Â©, whereas with chcp 65001, it is ©.
It may be a Windows question, but ExifTool is doing something different in the two cases. With 65001, I see © and get a malformed character. With 1252 I see a malformed character Â© and get ©.
I would expect it to be the other way around. It would be nice to understand what is happening, so I don't get bit down the road by doing something that doesn't make sense.
Added later: How does ExifTool determine the input charset? In both of my cases the bytes it is getting are C2 A9 for ©.
The -charset option is how ExifTool determines the input character set.
I'll have to try this in Windows to be able to comment more intelligently on what is happening, but it may be a while before I can do that.
- Phil
Quote from: Phil Harvey on July 16, 2018, 04:21:09 PM
I'll have to try this in Windows to be able to comment more intelligently on what is happening, but it may be a while before I can do that.
Thanks.
I worked on this further. The BAT file can all be chcp 65001 (UTF-8) except for setting the copyright:
This works:
@chcp 1252 > nul
set COPYRIGHT=Copyright © %YEAR% Kenneth Evans All Rights Reserved
@chcp 65001 > nul
The © in the code excerpt is a 2-byte UTF-8 © and the BAT file itself is UTF-8, done in Notepad++.
So you are right, it seems to be a Windows thing. I think the console is able to display things in a particular code page and language, but does its own thing under the covers. I am not an expert and avoid it if I can.
It would be interesting to know what Exiftool gets as input, that is, what causes it to print that it encountered malformed characters (when it has chcp 65001 at the top and not using the chcp 1252 in the excerpt).
I'm glad you figured this out. The exiftool -echo option may be useful to see what ExifTool sees. Try something like this:
exiftool -echo "Copyright ©" > out.txt
You should be able to do the same thing with the built-in "echo" command.
- Phil
I did this:
set YEAR=2018
chcp 1252
set COPYRIGHT=Copyright © %YEAR% Kenneth Evans All Rights Reserved
echo %COPYRIGHT% | hexdump -C
exiftool -echo "Copyright ©" > test1252.txt
hexdump -C test1252.txt
chcp 65001
set COPYRIGHT=Copyright © %YEAR% Kenneth Evans All Rights Reserved
echo %COPYRIGHT% | hexdump -C
exiftool -echo "Copyright ©" > test65001.txt
hexdump -C test65001.txt
These are the results:
C:\bin\EXIFTool>TestChcp.bat
C:\bin\EXIFTool>set YEAR=2018
C:\bin\EXIFTool>chcp 1252
Active code page: 1252
C:\bin\EXIFTool>set COPYRIGHT=Copyright © 2018 Kenneth Evans All Rights Reserved
C:\bin\EXIFTool>echo Copyright © 2018 Kenneth Evans All Rights Reserved | hexdump -C
00000000 43 6f 70 79 72 69 67 68 74 20 c2 a9 20 32 30 31 |Copyright .. 201|
00000010 38 20 4b 65 6e 6e 65 74 68 20 45 76 61 6e 73 20 |8 Kenneth Evans |
00000020 41 6c 6c 20 52 69 67 68 74 73 20 52 65 73 65 72 |All Rights Reser|
00000030 76 65 64 20 0d 0a |ved ..|
00000036
C:\bin\EXIFTool>exiftool -echo "Copyright ©" 1>test1252.txt
C:\bin\EXIFTool>hexdump -C test1252.txt
00000000 43 6f 70 79 72 69 67 68 74 20 c2 a9 0d 0a |Copyright ....|
0000000e
C:\bin\EXIFTool>chcp 65001
Active code page: 65001
C:\bin\EXIFTool>set COPYRIGHT=Copyright © 2018 Kenneth Evans All Rights Reserved
C:\bin\EXIFTool>echo Copyright © 2018 Kenneth Evans All Rights Reserved | hexdump -C
00000000 43 6f 70 79 72 69 67 68 74 20 c2 a9 20 32 30 31 |Copyright .. 201|
00000010 38 20 4b 65 6e 6e 65 74 68 20 45 76 61 6e 73 20 |8 Kenneth Evans |
00000020 41 6c 6c 20 52 69 67 68 74 73 20 52 65 73 65 72 |All Rights Reser|
00000030 76 65 64 20 0d 0a |ved ..|
00000036
C:\bin\EXIFTool>exiftool -echo "Copyright ©" 1>test65001.txt
C:\bin\EXIFTool>hexdump -C test65001.txt
00000000 43 6f 70 79 72 69 67 68 74 20 a9 0d 0a |Copyright ...|
0000000d
C:\bin\EXIFTool>
So it looks like Exiftool is getting the same bytes either way, but the results of your suggested test are different. In chcp 65001 it is losing the c2 byte.
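A small Python check (an illustration only, not something ExifTool runs) of the two byte sequences in the hex dumps above: "Copyright " followed by C2 A9 is valid UTF-8, while "Copyright " followed by a lone A9 is not, which would be consistent with the "Malformed UTF-8 character(s)" warning. The lone A9 is, however, a valid © in cp1252. The helper name `is_valid_utf8` is made up for this sketch.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Payloads from test1252.txt and test65001.txt, without the trailing CR LF.
from_chcp_1252 = b"Copyright \xc2\xa9"
from_chcp_65001 = b"Copyright \xa9"

print(is_valid_utf8(from_chcp_1252))   # True
print(is_valid_utf8(from_chcp_65001))  # False: A9 alone is a stray continuation byte
print(from_chcp_65001.decode("cp1252") == "Copyright \u00a9")  # True
```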
Interesting. Thanks for running this test.
I can't explain the difference. All I can tell you is that the exiftool -echo command echoes back exactly the characters that exiftool gets from the command line without any recoding (by exiftool that is -- I can't speak for the shell). Obviously this is somehow different from what the built-in echo command is doing. I must admit that I really don't understand how the Windows command shell handles character encoding.
- Phil
I also don't understand how the Windows command shell works internally, but I am not seeing anything anomalous from what I would expect in the shell part, just in what Exiftool does.
It looks like Exiftool is getting both bytes of © in either case, based on the shell output lines. It doesn't make sense that it is dropping the first byte when in chcp 65001, the code page you would expect to work right.
I don't get it either.
- Phil
Note that FAQ 18 (https://exiftool.org/faq.html#Q18) mentions this:
Note that Windows will recode arguments on the command line from the current console code page to the system code page
Which may explain why the c2 is dropped when you chcp 65001.
- Phil
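The recoding described in FAQ 18 can be modeled in a few lines of Python. This is a simplified sketch, not Windows' actual implementation: it assumes the argument bytes from the BAT file are decoded using the console code page and then re-encoded using the system code page, and it assumes the system code page on this machine is 1252 (the thread does not confirm that). The function name `recode_argument` is hypothetical.

```python
def recode_argument(raw: bytes, console_cp: str, system_cp: str) -> bytes:
    """Model of FAQ 18: decode with the console code page, re-encode with the system one."""
    return raw.decode(console_cp).encode(system_cp)

BAT_BYTES = b"\xc2\xa9"   # the UTF-8 copyright sign as stored in the BAT file
SYSTEM_CP = "cp1252"      # assumption: system (ANSI) code page of the machine

# chcp 65001: C2 A9 is read as one character (©) and written back as one cp1252 byte.
print(recode_argument(BAT_BYTES, "utf-8", SYSTEM_CP).hex(" "))   # a9

# chcp 1252: C2 A9 is read as two characters (Â and ©) and written back unchanged.
print(recode_argument(BAT_BYTES, "cp1252", SYSTEM_CP).hex(" "))  # c2 a9
```

Under these assumptions the model reproduces both hex dumps: the C2 byte disappears with chcp 65001 and survives with chcp 1252, so the argument only reaches the program as valid UTF-8 in the chcp 1252 case.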
Yes, I saw FAQ 18. It essentially says to use chcp 65001. ;)
I could be wrong, but I have heard Windows uses wide characters internally. In any case it should be doing the same thing both ways. It is my guess that Perl is doing it, but that's just a guess.
In any event I have a work around.
This is the first I have used Exiftool more than superficially. I am impressed. Thanks.
Quote from: Kenneth Evans on July 19, 2018, 06:05:18 PM
Yes, I saw FAQ 18. It essentially says to use chcp 65001. ;)
Yes, and the ExifTool -charset should be set to the system code page for command-line arguments.
So use -charset SYSTEMCODEPAGE with chcp 65001 to get the correct encoding for command-line parameters. Did you try this?
The first part of FAQ 18 deals only with getting the ExifTool output correct. Input is different, unfortunately.
- Phil
1. I tried -charset SYSTEMCODEPAGE and got:
Invalid Charset SYSTEMCODEPAGE
2. I don't completely understand FAQ 18, but I assume you want me to use chcp 437. (That is what I get if I start a new console and type chcp.) I'm not sure from the FAQ what to use for -charset, but I tried -charset cp437 and got:
Warning: Some character(s) could not be encoded in Latin - Coons 2018.exiftool3.jpg
1 image files updated
And the results are garbage.
3. I tried setting chcp 437 instead of chcp 65001 at the top, and using -charset cp437. It didn't complain but the results were garbage:
C:\Users\evans\Pictures\EXIF Test>c:\bin\EXIFTool\exiftool.exe -filename -artist -copyright -copyrightnotice -rights -usageterms "Coons 2018.exiftool3.jpg"
File Name : Coons 2018.exiftool3.jpg
Artist : Kenneth Evans
Copyright : Copyright - 2018 Kenneth Evans All Rights Reserved
Copyright Notice : Copyright -¼ 2018 Kenneth Evans All Rights Reserved
Rights : Copyright -¼ 2018 Kenneth Evans All Rights Reserved
Usage Terms : All Rights Reserved
I think this is similar to one of my early tries using chcp 1252 at the top and -L for the charset.
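For reference, a short Python sketch of what the two bytes C2 A9 mean when read as code page 437. It shows only the cp437 interpretation of the bytes in the BAT file; it does not model whatever substitution Windows or ExifTool performs afterwards, so it does not by itself reproduce the exact output above. The helper `fits` is a made-up name.

```python
def fits(text: str, codec: str) -> bool:
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

raw = b"\xc2\xa9"  # UTF-8 copyright sign as stored in the BAT file

# In cp437, C2 is a box-drawing character and A9 is a reversed not sign.
as_cp437 = raw.decode("cp437")
print([f"U+{ord(c):04X}" for c in as_cp437])  # ['U+252C', 'U+2310']

# Neither character exists in cp1252, so a strict conversion fails.
print(fits(as_cp437, "cp1252"))  # False
```

Python refuses the conversion outright; a tool that substitutes look-alike characters instead would produce different bytes, which is one possible route to garbage values.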
In case what I did is not clear, this is my BAT file at try 3. (The BAT file is UTF-8). Originally it had chcp 65001 at the top instead of chcp 437 and did not have any -charset. With either chcp 437 or chcp 1252, the console output will not show © correctly.
chcp 437
set EXIFTOOL=c:\bin\EXIFTool\exiftool.exe
set SRC="Coons 2018.orig.jpg"
set DEST="Coons 2018.exiftool3.jpg"
set COPYRIGHT=Copyright © 2018 Kenneth Evans All Rights Reserved
copy %SRC% %DEST%
%EXIFTOOL% -charset cp437 -artist="Kenneth Evans" -copyright="%COPYRIGHT%" -copyrightnotice="%COPYRIGHT%" -rights="%COPYRIGHT%" -usageterms="All Rights Reserved" -Marked="true" %DEST%
chcp 65001
%EXIFTOOL% -filename -artist -copyright -copyrightnotice -rights -usageterms %DEST%
This is what works to give the correct shell output (except for the one line in chcp 1252) and the correct results:
chcp 65001
set EXIFTOOL=c:\bin\EXIFTool\exiftool.exe
set SRC="Coons 2018.orig.jpg"
set DEST="Coons 2018.exiftool3.jpg"
chcp 1252
set COPYRIGHT=Copyright © 2018 Kenneth Evans All Rights Reserved
chcp 65001
copy %SRC% %DEST%
%EXIFTOOL% -artist="Kenneth Evans" -copyright="%COPYRIGHT%" -copyrightnotice="%COPYRIGHT%" -rights="%COPYRIGHT%" -usageterms="All Rights Reserved" -Marked="true" %DEST%
chcp 65001
%EXIFTOOL% -filename -artist -copyright -copyrightnotice -rights -usageterms %DEST%
That is the best I can do (so far ;) ) And, no, it doesn't make sense to me. I should be able to do everything in chcp 65001, using a UTF-8 BAT file (which requires the two-byte ©).
The hex dumps indicate I am sending the right thing. But Exiftool is not getting the right thing, and it is unclear what would strip the missing byte from what is in the shell. You would expect an extra Â perhaps, but not removing a byte.
Quote from: Kenneth Evans on July 19, 2018, 11:05:42 PM
1. I tried -charset SYSTEMCODEPAGE and got:
Invalid Charset SYSTEMCODEPAGE
I meant for you to insert whatever system code page you are using in place of "SYSTEMCODEPAGE". (ie. "cp1252")
Quote:
Warning: Some character(s) could not be encoded in Latin - Coons 2018.exiftool3.jpg
1 image files updated
Yes. This problem is mentioned in FAQ 18. And the recommended solution is to use a -@ argfile to avoid the command-line recoding issues.
- Phil
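A minimal sketch of the -@ argfile approach, written in Python rather than batch so the file encoding is explicit. It assumes exiftool is on the PATH; the helper name `write_argfile` and the file name `copyright.args` are made up for illustration. The argfile holds one argument per line with no shell quoting, and because it is written as UTF-8 the © never passes through the command line.

```python
import tempfile
from pathlib import Path

def write_argfile(path: Path, tags: dict) -> list:
    """Write one -TAG=VALUE argument per line as UTF-8 and return the exiftool command."""
    lines = [f"-{name}={value}" for name, value in tags.items()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return ["exiftool", "-@", str(path)]

notice = "Copyright \u00a9 2018 Kenneth Evans All Rights Reserved"
tags = {
    "artist": "Kenneth Evans",
    "copyright": notice,
    "copyrightnotice": notice,
    "rights": notice,
}

# Hypothetical location for the generated argfile.
argfile = Path(tempfile.gettempdir()) / "copyright.args"
command = write_argfile(argfile, tags) + ["Coons 2018.exiftool3.jpg"]
print(command)
# To actually run it: import subprocess; subprocess.run(command, check=True)
```

Generating the file on the fly like this is the extra step the argfile route requires; the year or other values can be substituted into `notice` before writing.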
Quote from: Phil Harvey on July 20, 2018, 07:17:38 AM
Yes. This problem is mentioned in FAQ 18. And the recommended solution is to use a -@ argfile to avoid the command-line recoding
Phil,
That is inconvenient. What I posted is a test case I used to figure out how it worked. The real implementation is more sophisticated and lets you specify the date, among other things. This would mean writing the file on the fly, as well as managing the file in the first place. My solution is much simpler. Neither is elegant.
As to the second part of FAQ 18. It could be made more clear. It could say you can determine the system font by typing chcp in a new console, and it could say add cp to the number nnn you get to specify the charset, i.e. use -charset cpnnn.
Having said that, I think it is bad advice to use the system font. The system font will not display a UTF-8 © correctly. You really want to work in UTF-8 (chcp 65001) entirely. You want the metadata to be UTF-8, as that is the de facto standard. You don't put copyright information in to read yourself. You put it in for others to read with whatever tools they may have. It should be as standard as possible.
The problem here is not with the Windows console. The console handles the UTF-8 © fine (even if you don't use chcp 65001). The problem is that Exiftool doesn't parse it correctly. Other programs, like Hexdump, do parse it correctly.
Nevertheless, I appreciate your interest and help, and I do like the program. This issue is the only real problem I have encountered so far. Thanks.
-Ken
The problem isn't ExifTool. It is something else outside my control. It could be that Perl uses the standard C library routines to read the command-line arguments, and Windows programs likely use Windows library routines.
- Phil
Just jumping in to point out that there is always the -E option (https://exiftool.org/exiftool_pod.html#E--ex--escapeHTML--escapeXML). You can avoid mucking about with the code page stuff by using -E and replacing © with the html entity &copy;.
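A rough Python sketch of preparing an all-ASCII value for use with -E. It uses numeric character references (&#169;) rather than the named &copy;, on the assumption that -E accepts numeric references as well; that is worth checking against the ExifTool documentation. The helper name `to_html_entities` is made up, and it does not escape a literal & in the text.

```python
def to_html_entities(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

value = to_html_entities("Copyright \u00a9 2018 Kenneth Evans All Rights Reserved")
print(value)  # Copyright &#169; 2018 Kenneth Evans All Rights Reserved

# The argument list is pure ASCII, so the console code page no longer matters.
args = ["exiftool", "-E", f"-copyright={value}", "Coons 2018.exiftool3.jpg"]
print(args)
```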