Hello,
I noticed that some German names appear twice in many fields, with the duplicate written with the wrong characters.
(https://exiftool.org/forum/index.php?action=dlattach;topic=9664.0;attach=2827)
For example, the city of Cologne, written "Köln" in German, appears twice in the metadata of some files.
Likewise, Müllenbach is displayed as M¸llenbach or even MÃ¼llenbach, and Köln as Kˆln or even KÃ¶ln.
I would like to replace all of the following characters in every field, because they appear in many different fields such as Keywords, geographic names and Copyright:
‰ -> ä
ˆ -> ö
¸ -> ü
ƒ -> Ä
÷ -> Ö
‹ -> Ü
ﬂ -> ß
and also
Ã¤ -> ä
Ã¶ -> ö
Ã¼ -> ü
Ã„ -> Ä
Ã– -> Ö
Ãœ -> Ü
ÃŸ -> ß
I would also like to set the encoding of all fields to UTF-8.
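Both lists seem to follow a pattern (this is a reading of the lists, not something confirmed in the thread). The first list is each German letter's single Latin-1 byte displayed as Mac Roman: ä is byte 0xE4, which Mac Roman shows as ‰, and ß is 0xDF, shown as the ﬂ ligature. The second list is each letter's two UTF-8 bytes displayed as Windows-1252: ä is 0xC3 0xA4, shown as Ã¤. Here is a minimal Python sketch that rebuilds both lists from those two assumptions and applies them as plain text replacements. The names are made up for illustration, and writing the repaired values back into the files (for example with ExifTool) is not shown.

```python
# Sketch: rebuild both replacement lists from their assumed causes and apply them.
GERMAN = "äöüÄÖÜß"

# List 1 (assumed cause): one Latin-1 byte displayed as Mac Roman, e.g. 0xE4 'ä' -> '‰'.
MAC_ROMAN_FIXES = {c.encode("latin-1").decode("mac_roman"): c for c in GERMAN}

# List 2 (assumed cause): two UTF-8 bytes displayed as Windows-1252, e.g. 'ä' -> 'Ã¤'.
CP1252_FIXES = {c.encode("utf-8").decode("cp1252"): c for c in GERMAN}


def repair(text: str) -> str:
    """Replace every known garbled form with the German letter it came from."""
    for bad, good in {**CP1252_FIXES, **MAC_ROMAN_FIXES}.items():
        text = text.replace(bad, good)
    return text


print(repair("M\u00b8llenbach, K\u00c3\u00b6ln"))  # Müllenbach, Köln
```

Only the listed forms are replaced, so correctly spelled values pass through unchanged. The flip side is that a genuine ¸ or ÷ in a value would also be replaced.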
How can I check which character set was used in the files?
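In the second dump below, IPTC CodedCharacterSet is UTF8, which is how a file declares its IPTC character set. The first dump has no such tag. When nothing is declared, a rough test is whether the raw bytes of a value decode as UTF-8. The following is a heuristic Python sketch. It assumes you already have the undecoded bytes of a value (extracting them is not covered here), and `guess_charset` is a hypothetical helper name.

```python
# Heuristic sketch: classify the raw bytes of one text value.
# Assumption: `raw` is the undecoded value as stored in the file.
def guess_charset(raw: bytes) -> str:
    if raw.isascii():
        return "ascii"  # plain ASCII reads the same in all the candidate charsets
    try:
        raw.decode("utf-8")  # strict decoding: any invalid sequence raises
    except UnicodeDecodeError:
        return "single-byte (e.g. Latin-1 or Windows-1252)"
    return "utf-8"


print(guess_charset("Müllenbach".encode("utf-8")))    # utf-8
print(guess_charset("Müllenbach".encode("latin-1")))  # single-byte (e.g. Latin-1 or Windows-1252)
print(guess_charset(b"Rheinland-Pfalz"))              # ascii
```

Non-ASCII Latin-1 text rarely forms valid UTF-8 by accident, so a "utf-8" answer is a strong hint rather than proof.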
When I drag and drop files onto exiftool, this is the output:
---- ExifTool ----
ExifToolVersion : 11.17
---- XMP ----
XMPToolkit : Image::ExifTool 10.96
CountryCode : DEU
Location : N├╝rburg
Subject : Deutschland, geotagged, M├╝llenbach, Rheinland-Pfalz, DEU, Deutschland, N├╝rburg, Rheinland-Pfalz
Country : Deutschland
State : Rheinland-Pfalz
CreatorTool : 1.01
---- IPTC ----
ApplicationRecordVersion : 4
Keywords : Deutschland, geotagged, Müllenbach, Rheinland-Pfalz, DEU, Deuts
City : N├╝rburg
Sub-location : N├╝rburg
Province-State : Rheinland-Pfalz
Country-PrimaryLocationCode : DEU
Country-PrimaryLocationName : Deutschland
or
---- ExifTool ----
ExifToolVersion : 11.17
---- XMP ----
XMPToolkit : Image::ExifTool 10.96
CountryCode : DEU
Location : M├╝llenbach
Rights : 2009 mARTin Bierschenk
Subject : Deutschland, geotagged, M┬©llenbach, M├╝llenbach, Rheinland-Pfalz
Country : Deutschland
State : Rheinland-Pfalz
CreatorTool : 1.01
---- IPTC ----
EnvelopeRecordVersion : 4
CodedCharacterSet : UTF8
ApplicationRecordVersion : 4
Keywords : Deutschland, geotagged, M┬©llenbach, M├╝llenbach, Rheinland-Pfal
City : M┬©llenbach
Sub-location : M┬©llenbach
Province-State : Rheinland-Pfalz
Country-PrimaryLocationCode : DEU
Country-PrimaryLocationName : Deutschland
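One way to read these dumps assumes that the Windows console was using code page 850. Under that assumption, ├╝ is exactly what cp850 shows for the two UTF-8 bytes of ü (0xC3 0xBC), so a value printed as M├╝llenbach is probably stored correctly as Müllenbach. M┬©llenbach, by contrast, is the UTF-8 encoding of M¸llenbach, which is the genuinely broken value. A cp437 console would show ╕ instead of ©, which is why cp850 seems the more likely guess. A small Python sketch of that display effect (`as_console_shows` is a hypothetical helper):

```python
# Sketch: what a console using a given code page displays for UTF-8 text.
# Assumption: the console that printed the dumps above used cp850.
def as_console_shows(text: str, codepage: str = "cp850") -> str:
    """Return the characters a `codepage` console shows for UTF-8-encoded `text`."""
    return text.encode("utf-8").decode(codepage)


print(as_console_shows("Müllenbach"))  # M├╝llenbach  (stored value is fine)
print(as_console_shows("M¸llenbach"))  # M┬©llenbach  (stored value is broken)
```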
The first step is to sort out your IPTC character coding problem. See the IPTC section of FAQ 10 (https://exiftool.org/faq.html#Q10) for help here.
It looks like your XMP has got invalid characters because you have copied them from IPTC without using the proper encoding.
I would suggest these steps to fix the problem:
1. Delete the IPTC entries from XMP (using the same incorrect encoding that they were added with)
2. Solve your IPTC encoding problems
3. Re-insert the IPTC back into XMP
- Phil
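Phil's steps work on the tags themselves. Their goal is one correctly encoded entry per name, which can be illustrated on a plain keyword list such as the Subject in the second dump. The following Python sketch assumes that the bad copies are Latin-1 bytes read as Mac Roman (the first replacement list). It is a string-level illustration, not Phil's procedure, and it does not touch any file. `repair` and `dedupe_keywords` are hypothetical names.

```python
# Sketch: repair a keyword list, then drop the duplicates the repair exposes.
# Assumption: garbled entries are Latin-1 bytes that were read as Mac Roman.
GERMAN = "äöüÄÖÜß"
FIXES = {c.encode("latin-1").decode("mac_roman"): c for c in GERMAN}


def repair(word: str) -> str:
    for bad, good in FIXES.items():
        word = word.replace(bad, good)
    return word


def dedupe_keywords(keywords: list[str]) -> list[str]:
    """Repair every keyword, then keep only the first occurrence of each."""
    return list(dict.fromkeys(repair(k) for k in keywords))


print(dedupe_keywords(["Deutschland", "geotagged", "M\u00b8llenbach", "Müllenbach", "Rheinland-Pfalz"]))
# ['Deutschland', 'geotagged', 'Müllenbach', 'Rheinland-Pfalz']
```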
Hello Phil, thank you very much for your reply.
It sounds like that is more of a manual task than something I can automate.
How can I list all the files that contain any of these "wrong" characters, so that I know which files to work on?
‰
ˆ
¸
ƒ
÷
‹
ﬂ
Cheers
Very good question. The character encoding is system-dependent (see FAQ 10), so your mileage may vary, but this works for me on the Mac:
> exiftool a.jpg b.jpg -filename -subject -if '$subject =~ /‰|ˆ|¸|ƒ|÷|‹|ﬂ/'
======== a.jpg
File Name : a.jpg
Subject : ƒ
1 files failed condition
- Phil
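The -if condition above tests only Subject in two named files. To cover every tag in a whole folder, one option (a sketch, not something suggested in the thread) is to export all tags as JSON with ExifTool's -j option, for example `exiftool -j -r DIR > tags.json`, and then scan the strings in Python. The suspect strings are rebuilt from the two replacement lists earlier in the thread. DIR, tags.json and the function names are placeholders.

```python
import json

# Suspect strings rebuilt from the two replacement lists in this thread.
# Assumption: in this collection they only ever occur as encoding damage.
GERMAN = "äöüÄÖÜß"
SUSPECT = (
    [c.encode("latin-1").decode("mac_roman") for c in GERMAN]  # ‰ ˆ ¸ ƒ ÷ ‹ ﬂ
    + [c.encode("utf-8").decode("cp1252") for c in GERMAN]     # Ã¤ Ã¶ Ã¼ Ã„ Ã– Ãœ ÃŸ
)


def suspect_tags(record: dict) -> list[str]:
    """Names of the tags in one file's record whose text contains a suspect string."""
    hits = []
    for tag, value in record.items():
        if tag == "SourceFile":
            continue
        values = value if isinstance(value, list) else [value]  # list tags arrive as arrays
        if any(isinstance(v, str) and any(s in v for s in SUSPECT) for v in values):
            hits.append(tag)
    return hits


def files_to_fix(exiftool_json: str) -> dict[str, list[str]]:
    """Map each SourceFile to its suspect tags, skipping files that have none."""
    result = {}
    for record in json.loads(exiftool_json):
        tags = suspect_tags(record)
        if tags:
            result[record["SourceFile"]] = tags
    return result


sample = json.dumps([
    {"SourceFile": "a.jpg", "City": "M\u00b8llenbach", "Subject": ["Deutschland", "Müllenbach"]},
    {"SourceFile": "b.jpg", "City": "Köln"},
])
print(files_to_fix(sample))  # {'a.jpg': ['City']}
```

On a real export you would call `files_to_fix(open("tags.json", encoding="utf-8").read())`, assuming the export is UTF-8, which is ExifTool's default output character set.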
I'd just like to add a helpful table here.
Table for Debugging Common UTF-8 Character Encoding Problems.
Unicode | Win-1252 | Expected | Actual | UTF-8 bytes | | | Unicode | Win-1252 | Expected | Actual | UTF-8 bytes
U+20AC | 0x80 | € | â‚¬ | %E2 %82 %AC | | | U+00C0 | 0xC0 | À | Ã€ | %C3 %80
| 0x81 | | | | | | U+00C1 | 0xC1 | Á | Ã(0x81) | %C3 %81
U+201A | 0x82 | ‚ | â€š | %E2 %80 %9A | | | U+00C2 | 0xC2 | Â | Ã‚ | %C3 %82
U+0192 | 0x83 | ƒ | Æ’ | %C6 %92 | | | U+00C3 | 0xC3 | Ã | Ãƒ | %C3 %83
U+201E | 0x84 | „ | â€ž | %E2 %80 %9E | | | U+00C4 | 0xC4 | Ä | Ã„ | %C3 %84
U+2026 | 0x85 | … | â€¦ | %E2 %80 %A6 | | | U+00C5 | 0xC5 | Å | Ã… | %C3 %85
U+2020 | 0x86 | † | â€(NBSP) | %E2 %80 %A0 | | | U+00C6 | 0xC6 | Æ | Ã† | %C3 %86
U+2021 | 0x87 | ‡ | â€¡ | %E2 %80 %A1 | | | U+00C7 | 0xC7 | Ç | Ã‡ | %C3 %87
U+02C6 | 0x88 | ˆ | Ë† | %CB %86 | | | U+00C8 | 0xC8 | È | Ãˆ | %C3 %88
U+2030 | 0x89 | ‰ | â€° | %E2 %80 %B0 | | | U+00C9 | 0xC9 | É | Ã‰ | %C3 %89
U+0160 | 0x8A | Š | Å(NBSP) | %C5 %A0 | | | U+00CA | 0xCA | Ê | ÃŠ | %C3 %8A
U+2039 | 0x8B | ‹ | â€¹ | %E2 %80 %B9 | | | U+00CB | 0xCB | Ë | Ã‹ | %C3 %8B
U+0152 | 0x8C | Œ | Å’ | %C5 %92 | | | U+00CC | 0xCC | Ì | ÃŒ | %C3 %8C
| 0x8D | | | | | | U+00CD | 0xCD | Í | Ã(0x8D) | %C3 %8D
U+017D | 0x8E | Ž | Å½ | %C5 %BD | | | U+00CE | 0xCE | Î | ÃŽ | %C3 %8E
| 0x8F | | | | | | U+00CF | 0xCF | Ï | Ã(0x8F) | %C3 %8F
| 0x90 | | | | | | U+00D0 | 0xD0 | Ð | Ã(0x90) | %C3 %90
U+2018 | 0x91 | ‘ | â€˜ | %E2 %80 %98 | | | U+00D1 | 0xD1 | Ñ | Ã‘ | %C3 %91
U+2019 | 0x92 | ’ | â€™ | %E2 %80 %99 | | | U+00D2 | 0xD2 | Ò | Ã’ | %C3 %92
U+201C | 0x93 | “ | â€œ | %E2 %80 %9C | | | U+00D3 | 0xD3 | Ó | Ã“ | %C3 %93
U+201D | 0x94 | ” | â€(0x9D) | %E2 %80 %9D | | | U+00D4 | 0xD4 | Ô | Ã” | %C3 %94
U+2022 | 0x95 | • | â€¢ | %E2 %80 %A2 | | | U+00D5 | 0xD5 | Õ | Ã• | %C3 %95
U+2013 | 0x96 | – | â€“ | %E2 %80 %93 | | | U+00D6 | 0xD6 | Ö | Ã– | %C3 %96
U+2014 | 0x97 | — | â€” | %E2 %80 %94 | | | U+00D7 | 0xD7 | × | Ã— | %C3 %97
U+02DC | 0x98 | ˜ | Ëœ | %CB %9C | | | U+00D8 | 0xD8 | Ø | Ã˜ | %C3 %98
U+2122 | 0x99 | ™ | â„¢ | %E2 %84 %A2 | | | U+00D9 | 0xD9 | Ù | Ã™ | %C3 %99
U+0161 | 0x9A | š | Å¡ | %C5 %A1 | | | U+00DA | 0xDA | Ú | Ãš | %C3 %9A
U+203A | 0x9B | › | â€º | %E2 %80 %BA | | | U+00DB | 0xDB | Û | Ã› | %C3 %9B
U+0153 | 0x9C | œ | Å“ | %C5 %93 | | | U+00DC | 0xDC | Ü | Ãœ | %C3 %9C
| 0x9D | | | | | | U+00DD | 0xDD | Ý | Ã(0x9D) | %C3 %9D
U+017E | 0x9E | ž | Å¾ | %C5 %BE | | | U+00DE | 0xDE | Þ | Ãž | %C3 %9E
U+0178 | 0x9F | Ÿ | Å¸ | %C5 %B8 | | | U+00DF | 0xDF | ß | ÃŸ | %C3 %9F
U+00A0 | 0xA0 | (NBSP) | Â(NBSP) | %C2 %A0 | | | U+00E0 | 0xE0 | à | Ã(NBSP) | %C3 %A0
U+00A1 | 0xA1 | ¡ | Â¡ | %C2 %A1 | | | U+00E1 | 0xE1 | á | Ã¡ | %C3 %A1
U+00A2 | 0xA2 | ¢ | Â¢ | %C2 %A2 | | | U+00E2 | 0xE2 | â | Ã¢ | %C3 %A2
U+00A3 | 0xA3 | £ | Â£ | %C2 %A3 | | | U+00E3 | 0xE3 | ã | Ã£ | %C3 %A3
U+00A4 | 0xA4 | ¤ | Â¤ | %C2 %A4 | | | U+00E4 | 0xE4 | ä | Ã¤ | %C3 %A4
U+00A5 | 0xA5 | ¥ | Â¥ | %C2 %A5 | | | U+00E5 | 0xE5 | å | Ã¥ | %C3 %A5
U+00A6 | 0xA6 | ¦ | Â¦ | %C2 %A6 | | | U+00E6 | 0xE6 | æ | Ã¦ | %C3 %A6
U+00A7 | 0xA7 | § | Â§ | %C2 %A7 | | | U+00E7 | 0xE7 | ç | Ã§ | %C3 %A7
U+00A8 | 0xA8 | ¨ | Â¨ | %C2 %A8 | | | U+00E8 | 0xE8 | è | Ã¨ | %C3 %A8
U+00A9 | 0xA9 | © | Â© | %C2 %A9 | | | U+00E9 | 0xE9 | é | Ã© | %C3 %A9
U+00AA | 0xAA | ª | Âª | %C2 %AA | | | U+00EA | 0xEA | ê | Ãª | %C3 %AA
U+00AB | 0xAB | « | Â« | %C2 %AB | | | U+00EB | 0xEB | ë | Ã« | %C3 %AB
U+00AC | 0xAC | ¬ | Â¬ | %C2 %AC | | | U+00EC | 0xEC | ì | Ã¬ | %C3 %AC
U+00AD | 0xAD | (SHY) | Â(SHY) | %C2 %AD | | | U+00ED | 0xED | í | Ã(SHY) | %C3 %AD
U+00AE | 0xAE | ® | Â® | %C2 %AE | | | U+00EE | 0xEE | î | Ã® | %C3 %AE
U+00AF | 0xAF | ¯ | Â¯ | %C2 %AF | | | U+00EF | 0xEF | ï | Ã¯ | %C3 %AF
U+00B0 | 0xB0 | ° | Â° | %C2 %B0 | | | U+00F0 | 0xF0 | ð | Ã° | %C3 %B0
U+00B1 | 0xB1 | ± | Â± | %C2 %B1 | | | U+00F1 | 0xF1 | ñ | Ã± | %C3 %B1
U+00B2 | 0xB2 | ² | Â² | %C2 %B2 | | | U+00F2 | 0xF2 | ò | Ã² | %C3 %B2
U+00B3 | 0xB3 | ³ | Â³ | %C2 %B3 | | | U+00F3 | 0xF3 | ó | Ã³ | %C3 %B3
U+00B4 | 0xB4 | ´ | Â´ | %C2 %B4 | | | U+00F4 | 0xF4 | ô | Ã´ | %C3 %B4
U+00B5 | 0xB5 | µ | Âµ | %C2 %B5 | | | U+00F5 | 0xF5 | õ | Ãµ | %C3 %B5
U+00B6 | 0xB6 | ¶ | Â¶ | %C2 %B6 | | | U+00F6 | 0xF6 | ö | Ã¶ | %C3 %B6
U+00B7 | 0xB7 | · | Â· | %C2 %B7 | | | U+00F7 | 0xF7 | ÷ | Ã· | %C3 %B7
U+00B8 | 0xB8 | ¸ | Â¸ | %C2 %B8 | | | U+00F8 | 0xF8 | ø | Ã¸ | %C3 %B8
U+00B9 | 0xB9 | ¹ | Â¹ | %C2 %B9 | | | U+00F9 | 0xF9 | ù | Ã¹ | %C3 %B9
U+00BA | 0xBA | º | Âº | %C2 %BA | | | U+00FA | 0xFA | ú | Ãº | %C3 %BA
U+00BB | 0xBB | » | Â» | %C2 %BB | | | U+00FB | 0xFB | û | Ã» | %C3 %BB
U+00BC | 0xBC | ¼ | Â¼ | %C2 %BC | | | U+00FC | 0xFC | ü | Ã¼ | %C3 %BC
U+00BD | 0xBD | ½ | Â½ | %C2 %BD | | | U+00FD | 0xFD | ý | Ã½ | %C3 %BD
U+00BE | 0xBE | ¾ | Â¾ | %C2 %BE | | | U+00FE | 0xFE | þ | Ã¾ | %C3 %BE
U+00BF | 0xBF | ¿ | Â¿ | %C2 %BF | | | U+00FF | 0xFF | ÿ | Ã¿ | %C3 %BF
Expected = the intended character; Actual = what is displayed when its UTF-8 bytes are read as Windows-1252. (NBSP) = no-break space, (SHY) = soft hyphen, and (0x81), (0x8D), (0x8F), (0x90), (0x9D) mark bytes that have no Windows-1252 character.
Source: https://www.i18nqa.com/debug/utf8-debug.html
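Every row of the table can be recomputed, because Actual is simply the character's UTF-8 bytes decoded as Windows-1252. Here is a small Python sketch for characters that are not in the table (`table_row` is a made-up helper name). Python leaves the five unused Windows-1252 bytes undefined, so with `errors="replace"` they come out as the replacement character U+FFFD instead of being invisible.

```python
def table_row(ch: str) -> str:
    """Build one 'Unicode | Win-1252 | Expected | Actual | UTF-8 bytes' row."""
    utf8 = ch.encode("utf-8")
    win1252 = ch.encode("cp1252")  # raises for characters outside Windows-1252
    actual = utf8.decode("cp1252", errors="replace")  # what a Win-1252 viewer shows
    return " | ".join([
        f"U+{ord(ch):04X}",
        f"0x{win1252[0]:02X}",
        ch,
        actual,
        " ".join(f"%{b:02X}" for b in utf8),
    ])


print(table_row("ä"))  # U+00E4 | 0xE4 | ä | Ã¤ | %C3 %A4
print(table_row("€"))  # U+20AC | 0x80 | € | â‚¬ | %E2 %82 %AC
```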