
Damaged metadata?

Started by Alyssa, September 02, 2014, 02:00:26 PM


Alyssa

Hello

I ran into a small problem with some images I recovered from an online backup. At first there was no sign of anything wrong, but when I tried to process some of them with exiftool I got an error.
Analysing the metadata with exiftool itself, I found irregular characters in one tag (XPComment).



Windows 7 apparently reads the tag correctly (as shown in the right panel), but exiftool reports irregular characters.

Is there a way to "fix" this, or do I have to rewrite the tag from scratch? It seems strange that Windows reads it correctly while exiftool doesn't.

Phil Harvey

It is likely that ExifTool is reading this correctly, but that your terminal isn't displaying the UTF-8 characters properly.

Try adding -L to the command, and reading FAQ 10.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Alyssa

I fear it's not just the terminal.



Even exiftool doesn't read the tag correctly. I tried both -L and -E (for the HTML charset), but the result was the same.

The script I am using here is the one you provided some time ago to set the file creation date from the XPComment tag:

C:\exiftool.exe "-filecreatedate<${exif:xpcomment;s[(\d{2})/(\d{2})/(\d{4})][$3/$2/$1]}" -L -k .

In Windows 7's Explorer the tag looks just fine:





Hayo Baan

Can you post a sample file? The smaller the better, as long as it shows the problem.
Hayo Baan – Photography
Web: www.hayobaan.nl

Alyssa

I attached one of the images.

Phil Harvey

Hi Alyssa,

Thanks for the sample image.

Here is what I get on the Mac for this image (in a UTF-8 terminal window):

> exiftool ~/Desktop/Random\ image.jpg -xpcomment
XP Comment                      : ‎16/06/‎2014 Ore 01:59:16[Lunedì]


This seems correct to me.  So we are left with a terminal character set issue.

It could be that these characters don't exist in Windows Latin1, in which case the -L option won't work.  Try writing the output to a file and opening in a UTF-8 aware text editor.  Like this:

exiftool "Random image.jpg" -xpcomment > out.txt

- Phil

BTW, I get this when I use the -E option, and you should get the same:

> exiftool ~/Desktop/Random\ image.jpg -xpcomment -E
XP Comment                      : &lrm;16/06/&lrm;2014 Ore 01:59:16[Luned&igrave;]


... which also seems correct to me (although I don't know what a &lrm; character is).

Alyssa

Hi Phil

I exported the tag as you suggested. With Windows WordPad I get this:
‎16/06/‎2014 Ore 01:59:16[Lunedì]

With basic Notepad:
‎16/06/‎2014 Ore 01:59:16[Lunedì]
(As it should look)

When I open the file with Microsoft Word, it suggests using the UTF-8 charset and also displays it properly:
16/06/‎2014 Ore 01:59:16[Lunedì]
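The WordPad result above is what one would expect if the UTF-8 bytes in out.txt were decoded as Windows-1252. The following is a minimal Python sketch of that effect, not anything ExifTool itself does; the function name `misread_as_cp1252` is made up for illustration, and the assumption is that WordPad fell back to code page 1252.

```python
def misread_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as Windows-1252.

    This imitates a viewer that assumes the wrong character set.
    """
    return text.encode("utf-8").decode("cp1252")


# The XPComment value from this thread; \u200e is the left-to-right mark
# and \u00ec is the accented i in "Lunedì".
comment = "\u200e16/06/\u200e2014 Ore 01:59:16[Luned\u00ec]"

# Each left-to-right mark turns into three characters and the accented i
# into two, which is the garbling seen in WordPad.
garbled = misread_as_cp1252(comment)

# Reversing the wrong decoding step recovers the original text,
# so nothing in the file itself is damaged.
recovered = garbled.encode("cp1252").decode("utf-8")
```

Because the round trip is lossless here, the file content is intact and only the viewer's choice of character set differs.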

FAQ 20 says that tags like XPComment, XPTitle, etc. use little-endian Unicode (UCS‑2), so I tried something like this:


C:\exiftool.exe "-filecreatedate<${exif:xpcomment;s[(\d{2})/(\d{2})/(\d{4})][$3/$2/$1]}" -charset exif=ucs-2 -k .


But it says that ucs-2 is an invalid charset.

Hayo Baan

&lrm; is the left-to-right mark (see Wikipedia).
Why it's there I don't know, but Windows clearly doesn't know how to handle it on the command line >:(
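To check which invisible characters are in a value like the -E output above, a short Python sketch can decode the HTML entities and list every format character it finds. The helper name `find_format_chars` is hypothetical; `html.unescape` and `unicodedata` are from the Python standard library.

```python
import html
import unicodedata


def find_format_chars(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, Unicode name) for each invisible
    format character (Unicode category Cf) in text."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


# The -E output quoted earlier in the thread.
entity_output = "&lrm;16/06/&lrm;2014 Ore 01:59:16[Luned&igrave;]"

# Turn &lrm; and &igrave; back into the characters they stand for.
decoded = html.unescape(entity_output)

# Lists both left-to-right marks with their positions.
marks = find_format_chars(decoded)
```

For this value the list contains two entries, both U+200E LEFT-TO-RIGHT MARK, one before the day and one before the year.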

Phil Harvey

Thanks Hayo.

Quote from: Alyssa on September 03, 2014, 07:38:27 PM
In Faq#20 it says that tags like XPcomment/XPtitle/etc use the little-endian Unicode (UCS‑2), so I tried something like this:

C:\exiftool.exe "-filecreatedate<${exif:xpcomment;s[(\d{2})/(\d{2})/(\d{4})][$3/$2/$1]}" -charset exif=ucs-2 -k .

This won't work for 2 reasons:  1) this isn't stored as an EXIF "string", so -charset exif=XXX will have no effect.  2) ExifTool is already interpreting this correctly. 

The problem is not in ExifTool, but in how the ExifTool output is being displayed.  It will display properly if the display software handles UTF-8 properly.

- Phil
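As a rough illustration of the point above, the Python sketch below converts little-endian UCS-2 bytes (the storage format FAQ 20 describes for the XP tags) into the UTF-8 bytes that end up in the output. It is only a sketch under that assumption, not ExifTool's actual code, and `ucs2le_to_utf8` is a made-up name. Python's "utf-16-le" codec is used because it matches UCS-2 for the characters in this comment.

```python
def ucs2le_to_utf8(raw: bytes) -> bytes:
    """Decode little-endian 16-bit Unicode and re-encode it as UTF-8."""
    return raw.decode("utf-16-le").encode("utf-8")


# The comment text from this thread, including both left-to-right marks.
comment = "\u200e16/06/\u200e2014 Ore 01:59:16[Luned\u00ec]"

# Assumed on-disk form: two bytes per character, low byte first.
stored = comment.encode("utf-16-le")

# What a UTF-8 consumer (terminal, text editor) receives.
reported = ucs2le_to_utf8(stored)
```

The left-to-right mark is stored as the two bytes 0E 20 and comes out as the three UTF-8 bytes E2 80 8E, so the conversion is faithful and any garbling happens later, at display time.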

Alyssa

I understand. So it's because the Windows 7 cmd console isn't reading those characters correctly? Is it system-related?

Is there a way to fix this, like changing my system's default language?

Phil Harvey


Alyssa

Hi Phil
I read the article and tried to do as specified.
First I selected TT Lucida Console as the default font.

Then I typed chcp 65001 (the FAQ says to press Return?).

Then I tried to run the script:

But I got the same error:

I then checked the image's metadata again:

It shows that the XPComment tag still isn't displayed properly despite the above settings.

As suggested in FAQ 18, I tried chcp 1252.

And checked the image's metadata again:

This again shows invalid characters in the XPComment tag.
I tried running the script anyway with the -L option, as suggested by the FAQ:

But with no success.

Hayo Baan

Alyssa,

Sigh, I see. With the 65001 code page the output is almost correct, but there are indeed still some problems. Microsoft's implementation of Unicode is just completely crappy, and I'm afraid there's not much that can be done about that. In your case, however, I'm quite sure your display problems would vanish if the left-to-right mark were removed. It is completely unnecessary here, since you're not mixing "normal" script with, e.g., Arabic or Hebrew. How it got there I don't know (who wrote the XPComment?), but you should be able to get rid of it by overwriting the tag with a clean version of the text:
exiftool -XPComment="16/06/2014 Ore 01:59:16[Lunedì]"  "Random image.jpg"
Or, if you have more files with this marker character:
exiftool -XPComment"<${XPComment;s/\x{e2}\x{80}\x{8e}//g}" "Random image.jpg"

Hope this helps,
Hayo
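The Python sketch below mirrors the second command and shows one plausible reason the -filecreatedate script fails while the mark is present. This is an assumption, since the thread never confirms the cause: the pattern (\d{2})/(\d{2})/(\d{4}) requires the year digits directly after the second slash, and the mark sits between them. The names `strip_lrm` and `reorder_date` are hypothetical.

```python
import re

LRM = "\u200e"  # left-to-right mark; UTF-8 bytes E2 80 8E
DATE_RE = re.compile(r"(\d{2})/(\d{2})/(\d{4})")


def strip_lrm(text: str) -> str:
    """Remove every left-to-right mark from text."""
    return text.replace(LRM, "")


def reorder_date(text: str) -> str:
    """Rewrite the first DD/MM/YYYY as YYYY/MM/DD, like the
    s[...][$3/$2/$1] expression used earlier in the thread."""
    return DATE_RE.sub(r"\3/\2/\1", text, count=1)


comment = "\u200e16/06/\u200e2014 Ore 01:59:16[Luned\u00ec]"

# With the marks in place the pattern finds no match, so nothing changes.
unchanged = reorder_date(comment)

# After stripping the marks, the date is reordered as intended.
fixed = reorder_date(strip_lrm(comment))
```

Removing the three bytes E2 80 8E from the UTF-8 form, as the exiftool expression does, gives the same result as removing the single character U+200E from the decoded text.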

Alyssa

Hi Hayo

I used the script you provided and I got this "interesting" error:


Hayo Baan

Alyssa,

I guess you pasted my code? Microsoft didn't preserve the Unicode properly when doing so, though. As you can see, the ì got replaced with an accented y, which probably malformed the code as well.

Try typing in the code manually, or make use of the 65001 code page. If that doesn't work, use my second piece of code to remove the left-to-right marks that are causing the trouble.

HTH,
Hayo