ExifTool Forum

General => Metadata => Topic started by: Alyssa on September 02, 2014, 02:00:26 PM

Title: Damaged metadata?
Post by: Alyssa on September 02, 2014, 02:00:26 PM
Hello

I've run into a small problem with some images I recovered from an online backup. At first there was no sign of anything wrong, but when I tried to process some of them with exiftool I got an error.
Using exiftool itself, I examined the metadata and found irregular characters in one tag (XPComment).

(https://lh3.googleusercontent.com/-DeaPw1WrC7U/U_91rJzKKpI/AAAAAAAB7-M/rBHrRhNNPI0/w1079-h618-no/Senza%2Btitolo%2B-%2B2.jpg)

Windows 7 appears to read the tag correctly (as shown in the right panel), but exiftool shows irregular characters.

Is there a way to "fix" this, or do I have to rewrite the tag from scratch? It seems strange that Windows reads it correctly while exiftool doesn't.
Title: Re: Damaged metadata?
Post by: Phil Harvey on September 03, 2014, 07:11:51 AM
It is likely that ExifTool is reading this correctly, but that your terminal isn't displaying the UTF-8 characters properly.
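A minimal Python sketch of this effect, assuming a console that decodes output with Windows Latin1 (cp1252); the code page a real console uses depends on its settings. It takes the comment text from the sample image posted later in the thread, encodes it as UTF-8 (ExifTool's default output encoding), and reads the bytes back with the wrong code page:

```python
# XPComment text from the sample image, including two invisible
# U+200E LEFT-TO-RIGHT MARK characters.
comment = "\u200e16/06/\u200e2014 Ore 01:59:16[Luned\u00ec]"

# The UTF-8 bytes a terminal would receive.
utf8_bytes = comment.encode("utf-8")

# Interpreting each byte as a cp1252 character produces mojibake:
# each mark becomes "â€Ž" and "ì" becomes "Ã¬".
garbled = utf8_bytes.decode("cp1252")
assert garbled == (
    "\u00e2\u20ac\u017d16/06/\u00e2\u20ac\u017d2014 Ore 01:59:16[Luned\u00c3\u00ac]"
)

# Decoding the same bytes as UTF-8 recovers the original text exactly.
assert utf8_bytes.decode("utf-8") == comment
```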

Try adding -L to the command, and reading FAQ 10 (https://exiftool.org/faq.html#Q10).

- Phil
Title: Re: Damaged metadata?
Post by: Alyssa on September 03, 2014, 03:19:30 PM
I'm afraid it's not just the terminal.

(https://lh6.googleusercontent.com/xHwbUcOgOBESLNTML3mGk3SDiqzHA7090NOwOIuFmE2N=w656-h279-no)

exiftool still doesn't show the tag correctly. I tried both -L and -E (HTML character escapes), but the result was the same.

The command I'm using here is the one you gave me some time ago to set the file creation date from the XPComment tag:

C:\exiftool.exe "-filecreatedate<${exif:xpcomment;s[(\d{2})/(\d{2})/(\d{4})][$3/$2/$1]}" -L -k .
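The thread never shows the exact error text, but the substitution in this command suggests one plausible cause. A minimal Python sketch, using Python's re module in place of ExifTool's Perl expression (so $3/$2/$1 becomes \3/\2/\1), and assuming the tag contains the invisible left-to-right marks identified later in the thread:

```python
import re

# Same pattern and reordering as the command above: dd/mm/yyyy -> yyyy/mm/dd.
DATE_SWAP = re.compile(r"(\d{2})/(\d{2})/(\d{4})")


def swap_date(text: str) -> str:
    return DATE_SWAP.sub(r"\3/\2/\1", text)


lrm = "\u200e"  # LEFT-TO-RIGHT MARK, invisible in most viewers
raw = f"{lrm}16/06/{lrm}2014 Ore 01:59:16[Luned\u00ec]"
clean = raw.replace(lrm, "")

# The mark between "06/" and "2014" stops (\d{4}) from matching, so the
# text comes back unchanged and is not a usable date.
assert swap_date(raw) == raw

# Without the marks the substitution works as intended.
assert swap_date(clean) == "2014/06/16 Ore 01:59:16[Luned\u00ec]"
```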

In Windows 7's Explorer the tag looks fine:

(https://lh5.googleusercontent.com/tIhxCFl76bv3NvObl5iDlozfP3jFHHEkdn_fApzbNpJg=w381-h515)


Title: Re: Damaged metadata?
Post by: Hayo Baan on September 03, 2014, 04:35:44 PM
Can you post a sample file? The smaller the better, as long as it contains the problem.
Title: Re: Damaged metadata?
Post by: Alyssa on September 03, 2014, 05:05:16 PM
I attached one of the images.
Title: Re: Damaged metadata?
Post by: Phil Harvey on September 03, 2014, 07:25:47 PM
Hi Alyssa,

Thanks for the sample image.

Here is what I get on the Mac for this image (in a UTF-8 terminal window):

> exiftool ~/Desktop/Random\ image.jpg -xpcomment
XP Comment                      : ‎16/06/‎2014 Ore 01:59:16[Lunedì]


This seems correct to me.  So we are left with a terminal character set issue.

It could be that these characters don't exist in Windows Latin1, in which case the -L option won't work.  Try writing the output to a file and opening it in a UTF-8-aware text editor, like this:

exiftool "Random image.jpg" -xpcomment > out.txt
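A minimal Python sketch of why Latin1 output can fail here, assuming -L maps text to Windows Latin1 (cp1252): the "ì" has a cp1252 code, but the invisible mark in this tag does not.

```python
lrm = "\u200e"      # LEFT-TO-RIGHT MARK found in this tag
i_grave = "\u00ec"  # the "ì" in "Lunedì"

# "ì" has a single-byte code in Windows Latin1 (cp1252)...
assert i_grave.encode("cp1252") == b"\xec"

# ...but the left-to-right mark has none, so it cannot be converted.
try:
    lrm.encode("cp1252")
    representable = True
except UnicodeEncodeError:
    representable = False

print("LRM representable in cp1252:", representable)
# prints "LRM representable in cp1252: False"
```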

- Phil

BTW, I get this when I use the -E option, and you should get the same:

> exiftool ~/Desktop/Random\ image.jpg -xpcomment -E
XP Comment                      : &lrm;16/06/&lrm;2014 Ore 01:59:16[Luned&igrave;]


... which also seems correct to me (although I don't know what a &lrm; character is).
Title: Re: Damaged metadata?
Post by: Alyssa on September 03, 2014, 07:38:27 PM
Hi Phil

I exported the tag as you suggested. With Windows WordPad I get this:
‎16/06/‎2014 Ore 01:59:16[Lunedì]

With plain Notepad:
‎16/06/‎2014 Ore 01:59:16[Lunedì]
(As it should look)

When I open the file in Microsoft Word, it suggests using the UTF-8 character set, and it also displays the text properly:
16/06/‎2014 Ore 01:59:16[Lunedì]

FAQ 20 says that tags like XPComment/XPTitle/etc. use little-endian Unicode (UCS-2), so I tried something like this:


C:\exiftool.exe "-filecreatedate<${exif:xpcomment;s[(\d{2})/(\d{2})/(\d{4})][$3/$2/$1]}" -charset exif=ucs-2 -k .


But it says that ucs-2 is an invalid charset.
Title: Re: Damaged metadata?
Post by: Hayo Baan on September 04, 2014, 01:59:49 AM
&lrm; is the Left-to-right mark: wikipedia (http://en.m.wikipedia.org/wiki/Left-to-right_mark)
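For reference, a short Python sketch using only the standard library that links the &lrm; entity from ExifTool's -E output to the Unicode character and to the UTF-8 bytes e2 80 8e that appear in the removal command later in the thread:

```python
import html
import unicodedata

# Resolve the HTML entity printed by exiftool -E.
lrm = html.unescape("&lrm;")
assert lrm == "\u200e"

print(unicodedata.name(lrm))      # LEFT-TO-RIGHT MARK
print(unicodedata.category(lrm))  # Cf (a format character, normally invisible)
print(lrm.encode("utf-8"))        # b'\xe2\x80\x8e'
```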
Why it's there I don't know, but Windows certainly doesn't seem to handle it on the command line >:(
Title: Re: Damaged metadata?
Post by: Phil Harvey on September 04, 2014, 07:10:51 AM
Thanks Hayo.

Quote from: Alyssa on September 03, 2014, 07:38:27 PM
In Faq#20 it says that tags like XPcomment/XPtitle/etc use the little-endian Unicode (UCS‑2), so I tried something like this:

C:\exiftool.exe "-filecreatedate<${exif:xpcomment;s[(\d{2})/(\d{2})/(\d{4})][$3/$2/$1]}" -charset exif=ucs-2 -k .

This won't work for 2 reasons:  1) this isn't stored as an EXIF "string", so -charset exif=XXX will have no effect.  2) ExifTool is already interpreting this correctly. 
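To illustrate Phil's second point, a minimal Python sketch of the round trip, assuming the stored value is plain little-endian UCS-2 (Python's utf-16-le codec, which matches UCS-2 for these characters) and ignoring the null terminator and other EXIF framing. Because the encoding is fixed by the tag format, there is no string charset for -charset exif to choose:

```python
comment = "\u200e16/06/\u200e2014 Ore 01:59:16[Luned\u00ec]"

# Roughly what the XP tag holds: two bytes per character, low byte first.
stored = comment.encode("utf-16-le")

# Reading: decode the UCS-2 bytes, then emit UTF-8 for display.
decoded = stored.decode("utf-16-le")
output = decoded.encode("utf-8")

assert decoded == comment            # nothing is lost in the round trip
assert stored[:2] == b"\x0e\x20"     # U+200E stored low byte first
assert output[:3] == b"\xe2\x80\x8e" # the same mark, now as UTF-8
```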

The problem is not in ExifTool, but in how the ExifTool output is being displayed.  This will display properly if the display software handles UTF-8 properly.

- Phil
Title: Re: Damaged metadata?
Post by: Alyssa on September 04, 2014, 11:34:15 AM
I understand. So it's because the Windows 7 cmd console doesn't read those characters correctly? Is it system-related?

Is there a way to fix this, like changing my system's default language?
Title: Re: Damaged metadata?
Post by: Phil Harvey on September 04, 2014, 11:41:56 AM
This is FAQ 18 (https://exiftool.org/faq.html#Q18)

- Phil
Title: Re: Damaged metadata?
Post by: Alyssa on September 04, 2014, 01:40:31 PM
Hi Phil
I read the FAQ and tried to do as it specifies.
First I selected TT Lucida Console as the default font:
(https://lh4.googleusercontent.com/-H0JOqsPSi3g/VAig2DzAZLI/AAAAAAAB9-o/cjD9yvnNN60/w1024-h492-no/1.jpg)
Then I typed chcp 65001 (the FAQ says to press Return?):
(https://lh5.googleusercontent.com/-x7CgJPG3ef4/VAihTU6n3CI/AAAAAAAB9_E/sSI-0T1kELE/w604-h395-no/2.jpg)
Then I tried to run the script:
(https://lh6.googleusercontent.com/-P9uHw-4r9uo/VAih0LPxlpI/AAAAAAAB9_U/Hva-1kIB3u0/w600-h395-no/3.jpg)
But I get the same error:
(https://lh6.googleusercontent.com/jHYndMBGbkylxr_yq0X1v4jepE6D2YQ9LTFYOQRLTdaY=w603-h393-no)
Then I checked the image's metadata again:
(https://lh4.googleusercontent.com/-hhidfb7QLy0/VAiiZUmt5yI/AAAAAAAB9_w/zETvJNlkGm0/w575-h922-no/5.jpg)
The XP Comment tag still isn't displayed properly despite the above settings.

As suggested in FAQ 18, I tried chcp 1252:
(https://lh5.googleusercontent.com/aWPZn-vEfiml1nxDz4OniukgQf5AgB00OEDlPOA5T6da=w596-h419-no)
and checked the image's metadata again:
(https://lh5.googleusercontent.com/-fWVzCFTyobk/VAijUBv_N6I/AAAAAAAB-AU/6dWDNIX-dds/w576-h761-no/7.jpg)
It again shows invalid characters in the XP Comment tag.
I ran the command anyway with the -L option, as the FAQ suggests:
(https://lh6.googleusercontent.com/-yRRbSz3cQGM/VAijxBMp1DI/AAAAAAAB-Ac/k5A-18O2RHM/w582-h831-no/8.jpg)
But with no success.
Title: Re: Damaged metadata?
Post by: Hayo Baan on September 04, 2014, 03:00:14 PM
Alyssa,

Sigh, I see. Although the output is almost correct with the 65001 code page, there are indeed still some problems. Microsoft's implementation of Unicode is just completely crappy and I'm afraid there's not much that can be done about that. However, in your case I'm quite sure your display problems would vanish if the Left-to-Right marks were removed; they are completely unnecessary here, since you're not mixing "normal" script with, e.g., Arabic or Hebrew. How they got there I don't know (who wrote the XPComment?), but you should be able to get rid of them by overwriting the tag with a clean version of the text:
exiftool -XPComment="16/06/2014 Ore 01:59:16[Lunedì]"  "Random image.jpg"
Or, if you have more files with this marker character:
exiftool -XPComment"<${XPComment;s/\x{e2}\x{80}\x{8e}//g}" "Random image.jpg"
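The second command works on the UTF-8 bytes of the value. A minimal Python equivalent of its substitution, assuming (as the Perl byte pattern implies) that the value is handled as UTF-8 bytes:

```python
# The UTF-8 encoding of U+200E LEFT-TO-RIGHT MARK; these are the same
# three bytes that the Perl substitution in the command above matches.
LRM_UTF8 = b"\xe2\x80\x8e"


def strip_lrm(value: bytes) -> bytes:
    # Like the Perl s///g: delete every occurrence of the mark.
    return value.replace(LRM_UTF8, b"")


raw = "\u200e16/06/\u200e2014 Ore 01:59:16[Luned\u00ec]".encode("utf-8")
cleaned = strip_lrm(raw)

# Only the marks are removed; the "ì" (bytes c3 ac) is untouched.
assert cleaned.decode("utf-8") == "16/06/2014 Ore 01:59:16[Luned\u00ec]"
```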

Hope this helps,
Hayo
Title: Re: Damaged metadata?
Post by: Alyssa on September 04, 2014, 08:42:10 PM
Hi Hayo

I used the command you provided and got this "interesting" error:

(https://lh3.googleusercontent.com/L7PZ3Xn_32WqbMe8_hIvd4XwKKySGBufDz52iKfYWdWI=w602-h153)
Title: Re: Damaged metadata?
Post by: Hayo Baan on September 05, 2014, 01:52:30 AM
Alyssa,

I guess you pasted my code? The Unicode wasn't preserved properly when you did so, though. As you can see, the ì was replaced with an accented y, which probably broke the command as well.
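One plausible mechanism for the accented y, assuming the pasted text was encoded in the Windows ANSI code page (cp1252) and then displayed using the console's OEM code page, taken here to be cp850 (common on Western European systems, but not confirmed in the thread):

```python
# "ì" in the Windows ANSI code page (cp1252) is the single byte 0xEC...
ansi_byte = "\u00ec".encode("cp1252")
assert ansi_byte == b"\xec"

# ...but the assumed OEM code page cp850 maps 0xEC to a different letter.
shown = ansi_byte.decode("cp850")
assert shown == "\u00fd"  # LATIN SMALL LETTER Y WITH ACUTE ("ý")
```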

Try typing in the code manually, or make use of the 65001 code page. If that doesn't work, use my second piece of code to remove the left-to-right marks that are causing the trouble.

HTH,
Hayo
Title: Re: Damaged metadata?
Post by: Alyssa on September 05, 2014, 08:07:22 AM
Hi Hayo

Unfortunately, even after fixing the command, the result is the same:
(https://lh4.googleusercontent.com/ux2BBr63Uj5WPr-5-fxioznQK4tRov5ddf5Er_yYhXkF=w657-h155)

Update:
I tried the same command on the same image on a different machine (a laptop with Windows 8.1) and got exactly the same error.
Title: Re: Damaged metadata?
Post by: Hayo Baan on September 06, 2014, 02:08:14 AM
Hi Alyssa,

Hmm, I'm not sure what Microsoft has done to Unicode support, but it sure doesn't play by the rules...
Have you already tried my second, more generic version of the code to remove the characters? That should definitely work.

If that still doesn't work, get a Mac ;)
Title: Re: Damaged metadata?
Post by: Phil Harvey on September 06, 2014, 07:38:08 AM
Alyssa,

You can bypass the Windows command-line character problems by using a UTF-8-aware text editor to write the desired string to a file (e.g. "string.txt"), then using a command like this:

exiftool "-xpcomment<=string.txt" FILE
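If no suitable editor is at hand, a short Python sketch can create the file instead. It assumes the file name string.txt from the command above and writes UTF-8 without a byte-order mark or trailing newline, since some editors add one or both and the thread does not say how such extra bytes would be treated:

```python
from pathlib import Path

# The clean comment text, without the invisible left-to-right marks.
text = "16/06/2014 Ore 01:59:16[Luned\u00ec]"

# Write raw UTF-8 bytes: no byte-order mark, no trailing newline.
path = Path("string.txt")
path.write_bytes(text.encode("utf-8"))

# The file can then be used with: exiftool "-xpcomment<=string.txt" FILE
```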

- Phil

Edit: I have added this to FAQ 18.
Title: Re: Damaged metadata?
Post by: Alyssa on September 06, 2014, 01:19:47 PM
Hi again, sorry for the late reply.

I tried Hayo's second command again and it finally worked. I don't know why it didn't work the first time; maybe I copied it wrong.

Thanks for helping me solve this issue :D