Malformed UTF-8 - then it got worse

Started by TOPsie, October 15, 2016, 12:39:51 PM


TOPsie

I am trying to add IPTC data to some scanned images.

All worked great until I wanted to add some text in Austrian (German). The text is

ÖBB 91.107 Mürzzuschlag Jul'70

The original failure I got was "Malformed UTF-8 characters", but I found a forum thread and added
-charset iptc=utf8 -charset exiftool=utf8

So my command line is now:-

C:\windows\exiftool.exe  -charset iptc=utf8 -charset exiftool=utf8  -IPTC:Objectname="ÖBB 91.107 Mürzzuschlag Jul'70 -1"  -IPTC:Keywords=;Mürzzuschlag;  D:\OldDdrive\TAL1990.jpg

This passes through exiftool without error, but then it starts to unravel.
My normal viewing tool (Windows Photo Viewer) does not display IPTC Object name - for any file - so I cannot check if the correct value has been set.
If I try and load it in Photoshop (which should let me see this field) the load fails "invalid end of file condition" !!

My final destination is Zenfolio and the photo displays but the Objectname is not visible. (so it has failed to "see" it)

Curiously the ü character is visible in the Keywords section so that is OK.

From past experience elsewhere I have a vague memory that Ö is an even more special character than the lower case accented characters.

Any clues as to what is happening? (The Photoshop "crash" seems to indicate some sort of corruption.)

Hayo Baan

Strange! If exiftool processed the file OK, it should be OK. Unless the image data was corrupt to begin with, of course. Can you post a (small) sample file that exhibits this problem? A before and after version of the file would be most useful.
Hayo Baan – Photography
Web: www.hayobaan.nl

TOPsie

Two files - original and EXIFTOOL'd file with the Ö character - using the command line as per my original post.
Note that I have processed 100's of very similar files - all with "plain" English characters with complete success

Hayo Baan

I had a look at the files and I notice two problems:

  • Both your original and your exiftool edited file are corrupt in a way that causes Photoshop to not be able to open them (they show fine in some other software I tried though, so not all is lost).
  • You did not succeed in setting the tags to have proper utf8 characters.

The first issue is quite bad as it stops you from being able to edit the jpg in Photoshop. I tried rewriting the file using the trick from FAQ20, but that didn't work because exiftool could not find the JPEG EOI marker (which is what caused Photoshop to not like the file as well). This error is too severe for exiftool to be able to do anything with the file, so that route is closed. The only alternative I see now is to either recreate the jpgs or to resave them (but that degrades the quality as it reapplies jpg compression to the image data).
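
For anyone wanting to check a file for this kind of damage before tagging it, here is a minimal sketch in Python. It is not how exiftool or Photoshop do their checks; it only relies on the fact that a JPEG stream starts with the SOI marker (FF D8) and ends with the EOI marker (FF D9). The function names are made up for illustration, and a file with trailing data after EOI would be reported as not ending with EOI even though many programs still open it.

```python
SOI = b"\xff\xd8"  # JPEG Start Of Image marker
EOI = b"\xff\xd9"  # JPEG End Of Image marker


def jpeg_marker_report(data: bytes) -> dict:
    """Summarise where the SOI/EOI markers sit in a JPEG byte string.

    This is a rough heuristic, not a full JPEG parser: an FF D9 pair
    inside an embedded thumbnail would also be found by rfind().
    """
    return {
        "starts_with_soi": data.startswith(SOI),
        "ends_with_eoi": data.endswith(EOI),
        # Offset of the last EOI marker, or -1 if there is none at all.
        "last_eoi_offset": data.rfind(EOI),
    }


def jpeg_file_report(path: str) -> dict:
    """Read a file from disk and run the marker check on its bytes."""
    with open(path, "rb") as handle:
        return jpeg_marker_report(handle.read())
```

Calling jpeg_file_report(r"D:\OldDdrive\TAL1990.jpg") on a truncated file would be expected to show "ends_with_eoi" as False, which is a hint that the file was damaged before any tags were written.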

The second issue is probably caused by you having exiftool interpret your non-UTF8 characters on the command line as UTF8 (the Windows command line is not UTF8). I can't test this (I don't have a Windows machine available at the moment), but I think you could leave out -charset exiftool=utf8, and/or change your Windows command-line code page to 65001, which is similar to UTF8 (run chcp 65001 on the command line).
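
To illustrate why bytes from a non-UTF8 code page trigger "Malformed UTF-8", here is a small Python sketch. It encodes the text from the first post in two legacy code pages and checks whether the resulting bytes are valid UTF8. The choice of cp850 and cp1252 is an assumption (they are common Western European Windows code pages); the actual code page on the machine in question is not known, and the function names are made up for illustration.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def utf8_validity_by_codec(text: str, codecs=("cp850", "cp1252", "utf-8")) -> dict:
    """Encode text in each codec and report whether the bytes are valid UTF-8.

    The codec list is an assumption for illustration; substitute the code
    page reported by chcp on the actual machine.
    """
    return {codec: is_valid_utf8(text.encode(codec)) for codec in codecs}


if __name__ == "__main__":
    sample = "ÖBB 91.107 Mürzzuschlag Jul'70"
    for codec, valid in utf8_validity_by_codec(sample).items():
        print(codec, "valid UTF-8:", valid)
```

In both legacy code pages Ö and ü are stored as single bytes above 0x7F, which do not form valid UTF8 sequences on their own. A program told to read such bytes as UTF8 will either reject them or store the wrong characters.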

Hope that helps,
Hayo

TOPsie

Thank you. You are correct about the file corruption. The bigger story of what I am doing is that I have an original scan - done at 4800 dpi.
I am trying to firstly reduce the file to 300dpi, then pass it through Exiftool to add tags. As you report, and I now see, the corruption is being introduced by my reduction (to 300dpi) code and is nothing to do with Exiftool.

So I still have my original file (untouched). Firstly I will have to sort out why my reduction code is corrupting the file. Then I can look a bit deeper at the character set issue.


THANK YOU for this advice.

I may be gone some time - while I sort this out  ;)


Hayo Baan

Good to hear you still have the originals! When you have sorted out the problem with the reduction, the issue with the characters showing up incorrectly should be easy to solve :)

TOPsie

OK - Hands up - I think I have got it all wrong all by myself.

Nothing wrong with Exiftool or Zenfolio - just a bit of idiocy from myself.

Once I had added the bit about UTF8 to my command line, it all worked.

I obviously got a bit confused with files created before and after this problem.

Now that I have all the latest files in place, all looks good.

(Just got to sort out why Photoshop reports a corrupt file, but that is my issue and nothing to do with Exiftool.)

Retires stage left with red face  :-[  ;)


Hayo Baan

Excellent! Glad things worked out in the end. One thing though: in the file you uploaded the umlauts did not come across correctly as UTF8, so you may want to have a look at that.