Getting Malformed URL Characters Running with Code Page 65001

Started by Kenneth Evans, July 16, 2018, 01:41:29 PM


Phil Harvey

Quote from: Kenneth Evans on July 19, 2018, 06:05:18 PM
Yes, I saw FAQ 18.  It essentially says to use chcp 65001.  ;)

Yes, and

the ExifTool -charset should be set to the system code page for command-line arguments.

So use -charset SYSTEMCODEPAGE with chcp 65001 to get the correct encoding for command-line parameters.  Did you try this?

The first part of FAQ 18 deals only with getting the ExifTool output correct.  Input is different, unfortunately.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Kenneth Evans

1. I tried -charset SYSTEMCODEPAGE and got:

Invalid Charset SYSTEMCODEPAGE

2. I don't completely understand FAQ 18, but I assume you want me to use chcp 437.  (That is what I get if I start a new console and type chcp.)  I'm not sure from the FAQ what to use for -charset, but I tried -charset cp437 and got:

Warning: Some character(s) could not be encoded in Latin - Coons 2018.exiftool3.jpg
    1 image files updated

And the results are garbage.

3. I tried setting chcp 437 instead of chcp 65001 at the top, and using -charset cp437.  It didn't complain but the results were garbage:

C:\Users\evans\Pictures\EXIF Test>c:\bin\EXIFTool\exiftool.exe -filename -artist -copyright -copyrightnotice -rights -usageterms "Coons 2018.exiftool3.jpg"
File Name                       : Coons 2018.exiftool3.jpg
Artist                          : Kenneth Evans
Copyright                       : Copyright - 2018 Kenneth Evans All Rights Reserved
Copyright Notice                : Copyright -¼ 2018 Kenneth Evans All Rights Reserved
Rights                          : Copyright -¼ 2018 Kenneth Evans All Rights Reserved
Usage Terms                     : All Rights Reserved

I think this is similar to one of my early tries using chcp 1252 at the top and -L for the charset.

In case what I did is not clear, this is my BAT file at try 3.  (The BAT file is UTF-8).  Originally it had chcp 65001 at the top instead of chcp 437 and did not have any -charset.  With either chcp 437 or chcp 1252, the console output will not show © correctly.

chcp 437
set EXIFTOOL=c:\bin\EXIFTool\exiftool.exe
set SRC="Coons 2018.orig.jpg"
set DEST="Coons 2018.exiftool3.jpg"
set COPYRIGHT=Copyright © 2018 Kenneth Evans All Rights Reserved
copy %SRC% %DEST%
%EXIFTOOL% -charset cp437 -artist="Kenneth Evans" -copyright="%COPYRIGHT%" -copyrightnotice="%COPYRIGHT%" -rights="%COPYRIGHT%" -usageterms="All Rights Reserved" -Marked="true" %DEST%
chcp 65001
%EXIFTOOL% -filename -artist -copyright -copyrightnotice -rights -usageterms %DEST%


This is what works to give the correct shell output (except for the one line in chcp 1252) and the correct results:

chcp 65001
set EXIFTOOL=c:\bin\EXIFTool\exiftool.exe
set SRC="Coons 2018.orig.jpg"
set DEST="Coons 2018.exiftool3.jpg"

chcp 1252
set COPYRIGHT=Copyright © 2018 Kenneth Evans All Rights Reserved
chcp 65001

copy %SRC% %DEST%
%EXIFTOOL% -artist="Kenneth Evans" -copyright="%COPYRIGHT%" -copyrightnotice="%COPYRIGHT%" -rights="%COPYRIGHT%" -usageterms="All Rights Reserved" -Marked="true" %DEST%
chcp 65001
%EXIFTOOL% -filename -artist -copyright -copyrightnotice -rights -usageterms %DEST%


That is the best I can do (so far  ;) )  And, no, it doesn't make sense to me.  I should be able to do everything in chcp 65001, using a UTF-8 BAT file (which requires the two-byte ©).

The hex dumps indicate I am sending the right thing.  But ExifTool is not getting the right thing, and it is unclear what would strip the missing byte from what is in the shell.  You would expect an extra byte, perhaps, but not a byte being removed.
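
As an illustration of the byte-level point (a sketch in Python, not something from the thread): the UTF-8 encoding of © is the two bytes C2 A9, and the snippet below shows what those bytes become when they are interpreted under cp1252 or cp437. It does not attempt to reproduce the exact garbage shown above, and it makes no claim about where between cmd.exe, Perl, and ExifTool the recoding actually happens.

```python
# Illustration only: how the two UTF-8 bytes of the copyright sign look
# when they are interpreted under a legacy single-byte code page.
text = "Copyright © 2018"

utf8_bytes = text.encode("utf-8")
print(utf8_bytes.hex(" "))  # the © shows up as the byte pair "c2 a9"

# Misreading the UTF-8 bytes as cp1252 or cp437 yields two characters per ©.
as_cp1252 = utf8_bytes.decode("cp1252")
as_cp437 = utf8_bytes.decode("cp437")
print(as_cp1252)
print(as_cp437)

# cp437 has no copyright sign at all, so encoding © to it cannot succeed.
try:
    "©".encode("cp437")
    cp437_has_copyright = True
except UnicodeEncodeError:
    cp437_has_copyright = False
print(cp437_has_copyright)
```

One consequence is that a value containing © cannot survive a round trip through cp437 at all, whereas cp1252 does have the character (as the single byte A9).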

Phil Harvey

Quote from: Kenneth Evans on July 19, 2018, 11:05:42 PM
1. I tried -charset SYSTEMCODEPAGE and got:

Invalid Charset SYSTEMCODEPAGE

I meant for you to insert whatever system code page you are using in place of "SYSTEMCODEPAGE"  (i.e. "cp1252").

Quote
Warning: Some character(s) could not be encoded in Latin - Coons 2018.exiftool3.jpg
    1 image files updated

Yes.  This problem is mentioned in FAQ 18.  And the recommended solution is to use a -@ argfile to avoid the command-line recoding issues.
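
A minimal sketch of the argfile approach, written in Python rather than as a BAT file. It assumes that an ExifTool argfile takes one argument per line without surrounding quotes and that its contents are read as UTF-8 by default. The helper names write_argfile and build_command and the file name exiftool_args.txt are made up for this example; the tag names and values are the ones from the test case above.

```python
import subprocess
import tempfile
from pathlib import Path


def write_argfile(path, tags, target):
    """Write one ExifTool argument per line, encoded as UTF-8 (no quoting)."""
    lines = [f"-{name}={value}" for name, value in tags.items()]
    lines.append(target)  # the file to modify goes on its own line
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


def build_command(exiftool, argfile):
    """Command line that hands the argfile to ExifTool via -@."""
    return [str(exiftool), "-@", str(argfile)]


def run_with_argfile(exiftool, argfile):
    """Invoke ExifTool; only ASCII reaches the command line itself."""
    return subprocess.run(build_command(exiftool, argfile), check=True)


copyright_text = "Copyright © 2018 Kenneth Evans All Rights Reserved"
tags = {
    "artist": "Kenneth Evans",
    "copyright": copyright_text,
    "copyrightnotice": copyright_text,
    "rights": copyright_text,
    "usageterms": "All Rights Reserved",
}

# Written to a temporary directory here; a real script would pick its own path.
argfile = Path(tempfile.mkdtemp()) / "exiftool_args.txt"
for line in write_argfile(argfile, tags, "Coons 2018.exiftool3.jpg"):
    print(line)
print(build_command(r"c:\bin\EXIFTool\exiftool.exe", argfile))
```

Calling run_with_argfile(r"c:\bin\EXIFTool\exiftool.exe", argfile) would then perform the write. The point of the design is that the © travels through a file instead of through the command line, so the console code page never touches it.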

- Phil

Kenneth Evans

Quote from: Phil Harvey on July 20, 2018, 07:17:38 AM
Yes.  This problem is mentioned in FAQ 18.  And the recommended solution is to use a -@ argfile to avoid the command-line recoding issues.

Phil,

That is inconvenient.  What I posted is a test case I used to figure out how it worked.  The real implementation is more sophisticated and lets you specify the date, among other things.  This would mean writing the file on the fly, as well as managing the file in the first place.  My solution is much simpler.  Neither is elegant.

As to the second part of FAQ 18: it could be made clearer.  It could say that you can determine the system code page by typing chcp in a new console, and that you add cp to the number nnn you get to specify the charset, i.e. use -charset cpnnn.

Having said that, I think it is bad advice to use the system code page.  The system code page will not display a UTF-8 © correctly.  You really want to work in UTF-8 (chcp 65001) entirely.  You want the metadata to be UTF-8, as that is the de facto standard.  You don't put copyright information in to read yourself.  You put it in for others to read with whatever tools they may have.  It should be as standard as possible.

The problem here is not with the Windows console.  The console handles the UTF-8 © fine (even if you don't use chcp 65001).  The problem is that Exiftool doesn't parse it correctly.  Other programs, like Hexdump, do parse it correctly.

Nevertheless, I appreciate your interest and help, and I do like the program.  This issue is the only real problem I have encountered so far.  Thanks.

-Ken

Phil Harvey

The problem isn't ExifTool.  It is something else outside my control.  It could be that Perl uses the standard C library routines to read the command-line arguments, and Windows programs likely use Windows library routines.

- Phil

StarGeek

Just jumping in to point out that there is always the -E option.  You can avoid mucking about with the code page stuff by using -E and replacing © with the HTML entity &copy;.
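
A rough sketch of the -E idea in Python: convert every non-ASCII character (plus &, < and >) to an HTML entity so that the value placed on the command line is pure ASCII. The function name to_html_entities is made up for this example, and it assumes that ExifTool's -E accepts both named entities such as &copy; and numeric ones such as &#169; when writing.

```python
from html.entities import codepoint2name


def to_html_entities(text):
    """Replace &, <, > and all non-ASCII characters with HTML entities."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in "&<>" or code > 0x7F:
            name = codepoint2name.get(code)
            # Fall back to a numeric entity when no named one is known.
            out.append(f"&{name};" if name else f"&#{code};")
        else:
            out.append(ch)
    return "".join(out)


value = to_html_entities("Copyright © 2018 Kenneth Evans All Rights Reserved")
print(value)
print(value.isascii())

# Argument as it would be passed together with -E (ASCII only).
print(f"-copyright={value}")
```

In a BAT file the & is special to cmd.exe outside double quotes, so the escaped value has to stay inside quotes, and a variable holding it needs the set "NAME=value" form.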
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype