Please help check I'm writing EXIF and IPTC tags properly

Started by joakimk, March 23, 2017, 03:45:02 PM


joakimk

I'm working on an (Android/Java) app to read and write metadata into JPEG (using Apache Sanselan). I want to support both EXIF and IPTC, as per the recommendations from this forum (Phil Harvey):
Quote: They recommend writing all of EXIF:ImageDescription, IPTC:Caption-Abstract and XMP-dc:Description.

I believe I have both the EXIF and the IPTC tags implemented correctly, but would someone mind having a look at the attached image to see if the JPEG looks OK? Especially the IPTC part (since you've been so kind as to review my EXIF tagging previously). I'm having some charset issues on Windows, making it hard for me to render/print the comment in ExifTool or in the log console of my Java IDE (Android Studio):
T��h��i��s�� ��i��s�� ��a�� ��t��e��s��t�� ��c��o��m��m��e��n��t

but the comment sure looks fine both in Picasa:


and in Windows (right-click > Properties):


Please have a look, and thank you very much for your attention!


Joakim


joakimk

I'm having some security problems attaching a JPEG to the post.
How do I fix that?

joakimk

Got some help at #android-dev (IRC):

Since I need support for "international characters" (like æ, ø, å) I use UTF-16 encoding. However, it seems I was neither writing *nor* reading the text entirely properly. Firstly, I was using a hack to add a "Unicode marker" (some bytes) to the beginning of the string, which really doesn't seem to be necessary (or to do anything helpful). Then, when reading, I wasn't decoding the bytes as UTF-16. Since Java decodes bytes with the platform default charset (UTF-8 on Android) unless told otherwise, it gave me all those null bytes.

Here's some updated (simplified) code on how to write (again, thanks to #android-dev) using Apache Sanselan:

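String textToSet = "This is a comment";
// On the standard JDK, "UnicodeLittle" means little-endian UTF-16 with a leading byte-order mark
byte[] comment = textToSet.getBytes("UnicodeLittle");
// UserComment is written as an UNDEFINED-type field holding the raw bytes
TiffOutputField exif_comment = new TiffOutputField(TiffConstants.EXIF_TAG_USER_COMMENT,
        TiffFieldTypeConstants.FIELD_TYPE_UNDEFINED, comment.length, comment);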


and how to read:

JpegImageMetadata jpegMetadata = (JpegImageMetadata) metadata;
TiffField field = jpegMetadata.findEXIFValue(TiffConstants.EXIF_TAG_USER_COMMENT);
if (field != null) {
    // "UTF-16" honors a leading byte-order mark and assumes big-endian when there is none
    String text = new String(field.getByteArrayValue(), "UTF-16");
}
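
To see where the stray null bytes come from, here is a small standalone sketch using only the JDK (no Sanselan involved; the class and method names are made up for illustration). It encodes a string as UTF-16LE and then decodes the same bytes once with a single-byte charset and once as UTF-16LE. ISO-8859-1 stands in for "the wrong charset" here simply because it maps every byte to one character, which makes the effect easy to see.

```java
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {
    // Decodes UTF-16LE bytes with a single-byte charset: every second char is NUL for ASCII text.
    static String decodeWrong(byte[] utf16le) {
        return new String(utf16le, StandardCharsets.ISO_8859_1);
    }

    // Decodes the same bytes with the charset they were written in.
    static String decodeRight(byte[] utf16le) {
        return new String(utf16le, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        byte[] bytes = "Test".getBytes(StandardCharsets.UTF_16LE);
        // NULs are replaced with '.' so they are visible on the console
        System.out.println(decodeWrong(bytes).replace('\0', '.')); // prints "T.e.s.t."
        System.out.println(decodeRight(bytes));                    // prints "Test"
    }
}
```

The interleaved dots correspond to the unprintable characters between the letters in the garbled comment shown in the first post.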



joakimk

Would still like to post a test JPEG, though, in case someone might take a look :)

Hayo Baan

I haven't yet had time to look at your jpg, but instead of UTF16 I strongly suggest using UTF8; it's what most of the world would expect (and is the default for exiftool too).
Hayo Baan – Photography
Web: www.hayobaan.nl

joakimk

Thank you for your input! Now that I had another look at the charsets and encoding, I'll give it another shot with UTF-8. As you say, it's the expected standard and should definitely cover my needs, too.

That said, what I really hope to have some help with is to check that the "directory structure" of the metadata looks right, after I've added my tags. Especially the IPTC tag.

Joakim

joakimk

Thanks for the tip! UTF-8 works, as expected :) Somehow got it messed up on my previous attempt, which made me switch to UTF-16.



Updated test file attached.


Joakim

Phil Harvey

ExifIFD:UserComment and IPTC:Caption-Abstract both look good.

ExifIFD:XPComment is garbled.  (XPComment should always be in little-endian byte order)
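
For illustration only, a minimal JDK-only sketch of producing little-endian bytes for XPComment, regardless of the byte order of the rest of the EXIF block. The class and method names are hypothetical, and the trailing two-byte null terminator is an assumption about what Windows expects, not something confirmed in this thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class XpCommentBytes {
    // Encodes text as UTF-16LE and appends a two-byte null terminator (assumed, see above).
    static byte[] encode(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_16LE);
        return Arrays.copyOf(body, body.length + 2); // copyOf pads the extra bytes with zeros
    }

    public static void main(String[] args) {
        // Print the bytes in hex to check that the low byte of each character comes first
        for (byte b : encode("Hi")) {
            System.out.printf("%02X ", b);
        }
        System.out.println();
    }
}
```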

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

joakimk

C:\Users\joakimk\Downloads\exiftool-10.46>exiftool.exe ..\test_UTF8-again.jpg
ExifTool Version Number         : 10.46
File Name                       : test_UTF8-again.jpg
Directory                       : ..
File Size                       : 913 kB
File Modification Date/Time     : 2017:03:24 18:45:32+01:00
File Access Date/Time           : 2017:03:24 18:45:32+01:00
File Creation Date/Time         : 2017:03:24 18:45:32+01:00
File Permissions                : rw-rw-rw-
File Type                       : JPEG
File Type Extension             : jpg
MIME Type                       : image/jpeg
Exif Byte Order                 : Big-endian (Motorola, MM)
Make                            : LGE
Camera Model Name               : Nexus 5
Orientation                     : Horizontal (normal)
X Resolution                    : 72
Y Resolution                    : 72
Resolution Unit                 : inches
Modify Date                     : 2017:03:24 18:44:38
Y Cb Cr Positioning             : Centered
Image Description               : tittel
Exposure Time                   : 1/17
F Number                        : 2.4
ISO                             : 2316
Exif Version                    : 0220
Date/Time Original              : 2017:03:24 18:44:38
Create Date                     : 2017:03:24 18:44:38
Components Configuration        : Y, Cb, Cr, -
Shutter Speed Value             : 17
Aperture Value                  : 2.4
Flash                           : No Flash
Focal Length                    : 4.0 mm
Warning                         : Invalid EXIF text encoding for UserComment
User Comment                    : This is a test comment

Sub Sec Time                    : 674450
Sub Sec Time Original           : 674450
Sub Sec Time Digitized          : 674450
XP Comment                      : This is a test comment
Flashpix Version                : 0100
Color Space                     : sRGB
Exif Image Width                : 1536
Exif Image Height               : 2048
Interoperability Index          : R98 - DCF basic file (sRGB)
Interoperability Version        : 0100
White Balance                   : Auto
Compression                     : JPEG (old-style)
Thumbnail Offset                : 20
Thumbnail Length                : 38439
JFIF Version                    : 1.01
Profile CMM Type                :
Profile Version                 : 2.0.0
Profile Class                   : Display Device Profile
Color Space Data                : RGB
Profile Connection Space        : XYZ
Profile Date Time               : 2009:03:27 21:36:31
Profile File Signature          : acsp
Primary Platform                : Unknown ()
CMM Flags                       : Not Embedded, Independent
Device Manufacturer             :
Device Model                    :
Device Attributes               : Reflective, Glossy, Positive, Color
Rendering Intent                : Perceptual
Connection Space Illuminant     : 0.9642 1 0.82491
Profile Creator                 :
Profile ID                      : 29f83ddeaff255ae7842fae4ca83390d
Profile Description             : sRGB IEC61966-2-1 black scaled
Blue Matrix Column              : 0.14307 0.06061 0.7141
Blue Tone Reproduction Curve    : (Binary data 2060 bytes, use -b option to extract)
Device Model Desc               : IEC 61966-2-1 Default RGB Colour Space - sRGB
Green Matrix Column             : 0.38515 0.71687 0.09708
Green Tone Reproduction Curve   : (Binary data 2060 bytes, use -b option to extract)
Luminance                       : 0 80 0
Measurement Observer            : CIE 1931
Measurement Backing             : 0 0 0
Measurement Geometry            : Unknown
Measurement Flare               : 0%
Measurement Illuminant          : D65
Media Black Point               : 0.01205 0.0125 0.01031
Red Matrix Column               : 0.43607 0.22249 0.01392
Red Tone Reproduction Curve     : (Binary data 2060 bytes, use -b option to extract)
Technology                      : Cathode Ray Tube Display
Viewing Cond Desc               : Reference Viewing Condition in IEC 61966-2-1
Media White Point               : 0.9642 1 0.82491
Profile Copyright               : Copyright International Color Consortium, 2009
Chromatic Adaptation            : 1.04791 0.02293 -0.0502 0.0296 0.99046 -0.01707 -0.00925 0.01506 0.75179
Current IPTC Digest             : 7b42f0564cf2eb2b7ae8e6fbb345efd0
Application Record Version      : 2
Caption-Abstract                : This is a test comment
Image Width                     : 1536
Image Height                    : 2048
Encoding Process                : Baseline DCT, Huffman coding
Bits Per Sample                 : 8
Color Components                : 3
Y Cb Cr Sub Sampling            : YCbCr4:2:0 (2 2)
Aperture                        : 2.4
Image Size                      : 1536x2048
Megapixels                      : 3.1
Shutter Speed                   : 1/17
Create Date                     : 2017:03:24 18:44:38.674450
Date/Time Original              : 2017:03:24 18:44:38.674450
Modify Date                     : 2017:03:24 18:44:38.674450
Thumbnail Image                 : (Binary data 38439 bytes, use -b option to extract)
Focal Length                    : 4.0 mm
Light Value                     : 2.1

C:\Users\joakimk\Downloads\exiftool-10.46>


Thanks so much for checking! I made a new test file, and -- as far as I can see -- the tags look OK. XPCOMMENT looks fine, too, right?
But why do I get a "warning" on the encoding of the UserComment?

I added a prefix to the various fields I'm writing, to see where they show up in various software, and I learned the following:


  • Windows Properties shows IPTC Caption/Abstract under "Title"
  • Picasa also shows IPTC Caption/Abstract under "Caption" (under the image + in slideshow mode), which is quite nice
  • Windows Properties shows EXIF UserComment under "Comments" (I think, but I need to get the encoding problem mentioned above worked out).

Phil Harvey

You can use exiftool to write UserComment correctly, then use the exiftool -htmlDump option to see in detail what was written.  There are a few technical issues other than just the UserComment encoding:

> exiftool -validate -warning -a ~/Desktop/test_UTF8-again.jpg
Validate                        : 8 Warnings (5 minor)
Warning                         : Wrong IFD for ExifIFD tag 0x010e ImageDescription (found in IFD0)
Warning                         : Non-standard format (undef) for ExifIFD 0x010e ImageDescription
Warning                         : [minor] Odd offset for ExifIFD tag 0x010e
Warning                         : [minor] Odd offset for ExifIFD tag 0x9286
Warning                         : Invalid EXIF text encoding for UserComment
Warning                         : [minor] Odd offset for ExifIFD tag 0x9291
Warning                         : [minor] Non-standard ExifIFD tag 0x9c9c XPComment
Warning                         : [minor] Odd offset for ExifIFD tag 0x9c9c


Regarding metadata in Windows:  StarGeek has put together a very useful table describing all of this here.

- Phil

Edit: Note that the Validate feature is still experimental.  The "Wrong IFD" message should read ("should be in IFD0", not "found in IFD0").  Also XPComment is listed as Non-standard because it is in ExifIFD but should also be in IFD0.  I'll fix these messages for ExifTool 10.48.
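
As a rough illustration of what the "Wrong IFD" check amounts to, here is a small lookup sketch in Java covering only the three tags discussed in this thread. The class name, the table and the message strings are invented for the example; this is not how ExifTool implements its validation.

```java
import java.util.HashMap;
import java.util.Map;

public class ExpectedIfd {
    // Expected directory for each tag ID mentioned in this thread
    static final Map<Integer, String> EXPECTED = new HashMap<>();
    static {
        EXPECTED.put(0x010e, "IFD0");    // ImageDescription
        EXPECTED.put(0x9286, "ExifIFD"); // UserComment
        EXPECTED.put(0x9c9c, "IFD0");    // XPComment
    }

    // Compares the directory a tag was found in against the expected one.
    static String check(int tag, String foundIn) {
        String expected = EXPECTED.get(tag);
        if (expected == null) return "unknown tag";
        return expected.equals(foundIn) ? "ok" : "should be in " + expected;
    }

    public static void main(String[] args) {
        System.out.println(check(0x010e, "ExifIFD")); // prints "should be in IFD0"
        System.out.println(check(0x9286, "ExifIFD")); // prints "ok"
    }
}
```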

joakimk

Thanks for the links!

However, the errors/warnings you're referring to (from validate) refer to the ImageDescription field, and not the UserComment field we were looking at here?

What I'm finding, is that the EXIF UserComment tag requires a specific charset -- as warned by ExifTool. Apache Sanselan has some bugs, which I apparently have to work around.
I found this in this discussion: http://git.net/ml/user-commons-apache/2012-03/msg00046.html

I can only make ExifTool happy if I use UTF-16LE.
Also, apparently, the content (text) has to be prefixed with a "Unicode marker": 7 bytes actually spelling out "UNICODE", with null-termination: 554E49434F444500
Could you please take a look at the attached file? As far as I can see, ExifTool no longer complains about the UserComment field.

I'd really like to get it working with UTF-8, since I don't understand why UserComment specifically requires a different encoding than the other fields, where I can write UTF-8 directly (with no "markers" etc)...



About the warnings, though:
If I validate a "plain" JPEG, taken with the same phone but with no metadata written to it, I still get "Odd offset" minor warnings. So I don't think they're "my fault" ;-)
What does the "Wrong IFD" warning (on ImageDescription) actually tell me? I'm writing to that field in a much simpler way than what I do for UserComment; I'm not using the "big hack" with the Unicode marker. I'm just writing plain UTF-8 to it.

joakimk

Interesting read about the EXIF UserComment tag: https://forums.adobe.com/thread/375932
Seems to me, the proper way to write to this field is to use UTF-16, prefixed with UNICODE\0.
Since Apache Commons Imaging (previously, Sanselan) fails to do this properly, a "big hack" is required in the Java code.
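
A sketch of what that "big hack" boils down to, using only the JDK: the 8-byte character code UNICODE\0 (hex 554E49434F444500, as quoted above) followed by the text as UTF-16 without a byte-order mark. The class and method names are hypothetical, and the byte order is left as a parameter because which order is appropriate depends on the file being written.

```java
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UserCommentBytes {
    // The 8-byte character code "UNICODE" plus a terminating NUL
    static final byte[] UNICODE_CODE = {0x55, 0x4E, 0x49, 0x43, 0x4F, 0x44, 0x45, 0x00};

    // Builds the UserComment payload: character code, then UTF-16 text in the given byte order.
    static byte[] encode(String text, ByteOrder order) {
        Charset cs = (order == ByteOrder.BIG_ENDIAN)
                ? StandardCharsets.UTF_16BE
                : StandardCharsets.UTF_16LE;
        byte[] body = text.getBytes(cs); // these charsets do not emit a byte-order mark
        byte[] out = new byte[UNICODE_CODE.length + body.length];
        System.arraycopy(UNICODE_CODE, 0, out, 0, UNICODE_CODE.length);
        System.arraycopy(body, 0, out, UNICODE_CODE.length, body.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] payload = encode("This is a test comment", ByteOrder.LITTLE_ENDIAN);
        System.out.println(payload.length); // 8 header bytes + 2 bytes per character
    }
}
```

The resulting array could then be handed to the TiffOutputField constructor shown earlier in the thread in place of the plain text bytes.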

Phil Harvey

It sounds like you should read the EXIF specification.  It explains the encoding for UserComment; although it doesn't explicitly state the encoding, it implies UCS-2 in the same byte ordering as the EXIF.  However, the MWG recommendation suggests treating this as UTF-16.

I would say that your UNICODE is in the wrong byte order.  But note that ExifTool uses a heuristic to deal with UNICODE written in the wrong byte order, and currently doesn't warn about this.  (FYI, the MWG has adopted my heuristic into their recommendations.)
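
To give a feel for how a reader might guess the byte order of such text, here is a deliberately simple sketch in Java. It is NOT the heuristic ExifTool uses; it is a made-up example (hypothetical class and method names) that relies on the assumption that mostly-Latin text has a zero high byte in each 16-bit unit, and falls back to a caller-supplied order when the counts are tied.

```java
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class GuessUtf16Order {
    // Guesses the byte order by counting zero bytes in the first and second byte of each pair.
    static ByteOrder guess(byte[] data, ByteOrder fallback) {
        int zerosFirst = 0;
        int zerosSecond = 0;
        for (int i = 0; i + 1 < data.length; i += 2) {
            if (data[i] == 0) zerosFirst++;
            if (data[i + 1] == 0) zerosSecond++;
        }
        if (zerosFirst > zerosSecond) return ByteOrder.BIG_ENDIAN;    // high byte comes first
        if (zerosSecond > zerosFirst) return ByteOrder.LITTLE_ENDIAN; // low byte comes first
        return fallback; // no evidence either way
    }

    public static void main(String[] args) {
        byte[] le = "test comment".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(guess(le, ByteOrder.BIG_ENDIAN)); // prints "LITTLE_ENDIAN"
    }
}
```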

The "odd offset" warnings are a clear indication of metadata written by a programmer who hasn't read/understood the TIFF specification.  ;)

- Phil

Edit: The ExifTool 10.48 Validate feature will warn about incorrect byte ordering in EXIF text

joakimk

While you are correct that I haven't read the EXIF specification, I am not sure that the problems necessarily arise from my part of the code. The Apache library for reading and writing EXIF metadata has not been maintained for many years and is still in beta, with known bugs. Since I'm only interfacing with that library to write the tags (with the exception of the hack for UserComment), I'm not sure that I can change where in the directory structure the tags are written (as far as I understand the ImageDescription warning).

But I will read the standard, and hope to be able to fix the bugs. Perhaps some hack is required to fix what Sanselan is doing for the ImageDescription tag, too.

Thanks for your patience and always helpful replies!

joakimk

I've read the EXIF specification, and I understand more about the details for the tags I'm interested in (ImageDescription and UserComment). I think I'm able to write successfully to UserComment, and to the IPTC field Caption/Abstract, but I can't seem to write properly to the ImageDescription field. It always ends up in the wrong IFD (IFD0, instead of... IFD1?):

Warning                         : Wrong IFD for ExifIFD tag 0x010e ImageDescription (found in IFD0)

But first: I've attached a "plain" JPEG as generated by my cell phone (LG Nexus 5). Even if I don't modify it with my app at all, ExifTool still finds problems. So I'm starting to think this is maybe a losing battle -- that some of the bugs ExifTool is reporting are maybe due to the phone? If the starting point (original, bare JPEG) is flawed, then it's really hard to debug my app... Could you please have a look at the attached file test_bare.jpg and see if it looks messed up?

C:\Users\joakimk\Downloads\exiftool-10.46>exiftool.exe -validate -warning -a ..\test_bare.jpg
Validate                        : 6 Warnings (all minor)
Warning                         : [minor] Odd offset for ExifIFD tag 0x9201
Warning                         : [minor] Odd offset for ExifIFD tag 0x9003
Warning                         : [minor] Odd offset for ExifIFD tag 0x9292
Warning                         : [minor] Odd offset for ExifIFD tag 0x829a
Warning                         : [minor] Odd offset for IFD1 tag 0x011b
Warning                         : [minor] Odd offset for IFD1 tag 0x011a


Just to eliminate my phone/camera from the equation, I've downloaded a stock JPEG onto my app and written UserComment and Caption/Abstract to it. The stock image was first validated OK by ExifTool, so that should be a "safe" starting point. Note I'm skipping writing the ImageDescription field, which always ends up in the wrong place anyway. I'll get back to that later, and maybe I won't even need it (the IPTC field seems to be all I really need). If you'll take a look at the validation of the resulting (edited) file?

C:\Users\joakimk\Downloads\exiftool-10.46>exiftool.exe -validate -warning -a ..\croatia_oasis_wpo.jpg
Validate                        : 8 Warnings (all minor)
Warning                         : [minor] Odd offset for IFD0 tag 0x0131
Warning                         : [minor] Odd offset for IFD0 tag 0x8298
Warning                         : [minor] Odd offset for ExifIFD tag 0x9286
Warning                         : [minor] Odd offset for ExifIFD tag 0xa432
Warning                         : [minor] Odd offset for ExifIFD tag 0xa434
Warning                         : [minor] Odd offset for ExifIFD tag 0xa435
Warning                         : [minor] Odd offset for GPS tag 0x0002
Warning                         : [minor] Odd offset for GPS tag 0x0004


If a tag X is at an odd offset, then that's caused by some previous field, right, which has (somehow) not been padded to even length? It's not so much the field itself (its content) which is to blame. Are you able to see why the offsets end up like this, where I should look to try to fix it? I'm using a "Lossless" metadata update method, from Apache, which maybe stumbles across fields while trying not to disturb them?
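
A small sketch of the padding idea in plain Java (the names are invented, and this is not taken from Sanselan): the TIFF specification expects each value to begin on a word boundary, so a writer that rounds the position up to an even number after every value never produces an odd offset, even when a value has an odd length.

```java
import java.util.Arrays;

public class EvenOffsets {
    // Rounds an offset up to the next even number (word boundary).
    static int align(int offset) {
        return (offset + 1) & ~1;
    }

    // Places value blocks one after another, starting each on an even offset.
    static int[] layout(int start, int[] lengths) {
        int[] offsets = new int[lengths.length];
        int pos = align(start);
        for (int i = 0; i < lengths.length; i++) {
            offsets[i] = pos;
            pos = align(pos + lengths[i]); // pad odd-length values with one byte
        }
        return offsets;
    }

    public static void main(String[] args) {
        // A 7-byte and a 23-byte value would push later values to odd offsets without padding
        System.out.println(Arrays.toString(layout(100, new int[] {7, 23, 4}))); // prints "[100, 108, 132]"
    }
}
```

Without the align call, the second and third values in this example would start at 107 and 130, so a single odd-length value is enough to shift the one that follows it.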

Thanks again!