Please help check I'm writing EXIF and IPTC tags properly

Started by joakimk, March 23, 2017, 03:45:02 PM


joakimk

I'm working on an (Android/Java) app to read and write metadata into JPEG (using Apache Sanselan). I want to support both EXIF and IPTC, as per the recommendations from this forum (Phil Harvey):
Quote: They recommend writing all of EXIF:ImageDescription, IPTC:Caption-Abstract and XMP-dc:Description.

I believe I have both the EXIF and the IPTC tags implemented correctly, but would someone mind having a look at the attached image to see if the JPEG looks OK? Especially the IPTC part (since you've been so kind as to review my EXIF tagging previously). I'm having some charset issues on Windows, making it hard for me to render/print the comment in ExifTool or in the log console of my Java IDE (Android Studio):
T��h��i��s�� ��i��s�� ��a�� ��t��e��s��t�� ��c��o��m��m��e��n��t

but the comment sure looks fine both in Picasa:


and in Windows (right-click > Properties):


Please have a look, and thank you very much for your attention!


Joakim


joakimk

I'm having some security problems attaching a JPEG to the post.
How do I fix that?

joakimk

Got some help at #android-dev (IRC):

Since I need support for "international characters" (like æ, ø, å), I use UTF-16 encoding. However, it seems I was neither writing *nor* reading the text entirely properly. Firstly, I was using a hack to add a "Unicode marker" (some bytes) to the beginning of the string, which really doesn't seem to be necessary (or to do anything helpful). Then, when reading, I wasn't decoding the bytes as UTF-16. Since the bytes were then interpreted as UTF-8 rather than UTF-16, I got all those null bytes.

Here's some updated (simplified) code on how to write (again, thanks to #android-dev) using Apache Sanselan:

String textToSet = "This is a comment";
// Encode the text as little-endian UTF-16
byte[] comment = textToSet.getBytes("UnicodeLittle");
// UserComment has EXIF type UNDEFINED, so the count is the length in bytes
TiffOutputField exif_comment = new TiffOutputField(TiffConstants.EXIF_TAG_USER_COMMENT,
        TiffFieldTypeConstants.FIELD_TYPE_UNDEFINED, comment.length, comment);


and how to read:

JpegImageMetadata jpegMetadata = (JpegImageMetadata) metadata;
TiffField field = jpegMetadata.findEXIFValue(TiffConstants.EXIF_TAG_USER_COMMENT);
if (field != null) {
        String text = new String(field.getByteArrayValue(), "UTF-16");
}
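
The null bytes in the garbled output can be reproduced without any image library. Here is a minimal sketch, assuming the comment bytes are little-endian UTF-16 and get decoded as UTF-8; the class and method names are only illustrative and not part of Sanselan:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MisdecodeDemo {
    // Decode the same bytes with whichever charset the caller claims they use.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] utf16le = "This".getBytes(StandardCharsets.UTF_16LE);

        // Wrong charset: every 0x00 high byte becomes a U+0000 character.
        String wrong = decode(utf16le, StandardCharsets.UTF_8);
        // Matching charset: the original text comes back.
        String right = decode(utf16le, StandardCharsets.UTF_16LE);

        System.out.println(wrong.length()); // 8 (T, NUL, h, NUL, i, NUL, s, NUL)
        System.out.println(right);          // This
    }
}
```

So the fix on the reading side is simply to name the charset that was used when writing.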



joakimk

Would still like to post a test JPEG, though, in case someone might take a look :)

Hayo Baan

I haven't yet had time to look at your jpg, but instead of UTF16 I strongly suggest using UTF8; it's what most of the world would expect (and is the default for exiftool too).
Hayo Baan – Photography
Web: www.hayobaan.nl

joakimk

Thank you for your input! Now that I had another look at the charsets and encoding, I'll give it another shot with UTF-8. As you say, it's the expected standard and should definitely cover my needs, too.

That said, what I really hope to have some help with is to check that the "directory structure" of the metadata looks right, after I've added my tags. Especially the IPTC tag.

Joakim

joakimk

Thanks for the tip! UTF-8 works, as expected :) Somehow got it messed up on my previous attempt, which made me switch to UTF-16.



Updated test file attached.


Joakim

Phil Harvey

ExifIFD:UserComment and IPTC:Caption-Abstract both look good.

ExifIFD:XPComment is garbled.  (XPComment should always be in little-endian byte order)
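
For anyone following along in Java, a rough sketch of producing little-endian bytes for an XP* tag with the standard library only. The helper name is made up, and the trailing two-byte null terminator is an assumption about what Windows expects, so check it against a file written by Windows itself:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class XpCommentBytes {
    // UTF-16LE text followed by a 16-bit null terminator (assumed).
    static byte[] encode(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_16LE); // explicit LE, no BOM
        return Arrays.copyOf(body, body.length + 2);            // pads with 0x00 0x00
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode("Hi"))); // [72, 0, 105, 0, 0, 0]
    }
}
```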

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

joakimk

C:\Users\joakimk\Downloads\exiftool-10.46>exiftool.exe ..\test_UTF8-again.jpg
ExifTool Version Number         : 10.46
File Name                       : test_UTF8-again.jpg
Directory                       : ..
File Size                       : 913 kB
File Modification Date/Time     : 2017:03:24 18:45:32+01:00
File Access Date/Time           : 2017:03:24 18:45:32+01:00
File Creation Date/Time         : 2017:03:24 18:45:32+01:00
File Permissions                : rw-rw-rw-
File Type                       : JPEG
File Type Extension             : jpg
MIME Type                       : image/jpeg
Exif Byte Order                 : Big-endian (Motorola, MM)
Make                            : LGE
Camera Model Name               : Nexus 5
Orientation                     : Horizontal (normal)
X Resolution                    : 72
Y Resolution                    : 72
Resolution Unit                 : inches
Modify Date                     : 2017:03:24 18:44:38
Y Cb Cr Positioning             : Centered
Image Description               : tittel
Exposure Time                   : 1/17
F Number                        : 2.4
ISO                             : 2316
Exif Version                    : 0220
Date/Time Original              : 2017:03:24 18:44:38
Create Date                     : 2017:03:24 18:44:38
Components Configuration        : Y, Cb, Cr, -
Shutter Speed Value             : 17
Aperture Value                  : 2.4
Flash                           : No Flash
Focal Length                    : 4.0 mm
Warning                         : Invalid EXIF text encoding for UserComment
User Comment                    : This is a test comment

Sub Sec Time                    : 674450
Sub Sec Time Original           : 674450
Sub Sec Time Digitized          : 674450
XP Comment                      : This is a test comment
Flashpix Version                : 0100
Color Space                     : sRGB
Exif Image Width                : 1536
Exif Image Height               : 2048
Interoperability Index          : R98 - DCF basic file (sRGB)
Interoperability Version        : 0100
White Balance                   : Auto
Compression                     : JPEG (old-style)
Thumbnail Offset                : 20
Thumbnail Length                : 38439
JFIF Version                    : 1.01
Profile CMM Type                :
Profile Version                 : 2.0.0
Profile Class                   : Display Device Profile
Color Space Data                : RGB
Profile Connection Space        : XYZ
Profile Date Time               : 2009:03:27 21:36:31
Profile File Signature          : acsp
Primary Platform                : Unknown ()
CMM Flags                       : Not Embedded, Independent
Device Manufacturer             :
Device Model                    :
Device Attributes               : Reflective, Glossy, Positive, Color
Rendering Intent                : Perceptual
Connection Space Illuminant     : 0.9642 1 0.82491
Profile Creator                 :
Profile ID                      : 29f83ddeaff255ae7842fae4ca83390d
Profile Description             : sRGB IEC61966-2-1 black scaled
Blue Matrix Column              : 0.14307 0.06061 0.7141
Blue Tone Reproduction Curve    : (Binary data 2060 bytes, use -b option to extract)
Device Model Desc               : IEC 61966-2-1 Default RGB Colour Space - sRGB
Green Matrix Column             : 0.38515 0.71687 0.09708
Green Tone Reproduction Curve   : (Binary data 2060 bytes, use -b option to extract)
Luminance                       : 0 80 0
Measurement Observer            : CIE 1931
Measurement Backing             : 0 0 0
Measurement Geometry            : Unknown
Measurement Flare               : 0%
Measurement Illuminant          : D65
Media Black Point               : 0.01205 0.0125 0.01031
Red Matrix Column               : 0.43607 0.22249 0.01392
Red Tone Reproduction Curve     : (Binary data 2060 bytes, use -b option to extract)
Technology                      : Cathode Ray Tube Display
Viewing Cond Desc               : Reference Viewing Condition in IEC 61966-2-1
Media White Point               : 0.9642 1 0.82491
Profile Copyright               : Copyright International Color Consortium, 2009
Chromatic Adaptation            : 1.04791 0.02293 -0.0502 0.0296 0.99046 -0.01707 -0.00925 0.01506 0.75179
Current IPTC Digest             : 7b42f0564cf2eb2b7ae8e6fbb345efd0
Application Record Version      : 2
Caption-Abstract                : This is a test comment
Image Width                     : 1536
Image Height                    : 2048
Encoding Process                : Baseline DCT, Huffman coding
Bits Per Sample                 : 8
Color Components                : 3
Y Cb Cr Sub Sampling            : YCbCr4:2:0 (2 2)
Aperture                        : 2.4
Image Size                      : 1536x2048
Megapixels                      : 3.1
Shutter Speed                   : 1/17
Create Date                     : 2017:03:24 18:44:38.674450
Date/Time Original              : 2017:03:24 18:44:38.674450
Modify Date                     : 2017:03:24 18:44:38.674450
Thumbnail Image                 : (Binary data 38439 bytes, use -b option to extract)
Focal Length                    : 4.0 mm
Light Value                     : 2.1

C:\Users\joakimk\Downloads\exiftool-10.46>


Thanks so much for checking! I made a new test file, and -- as far as I can see -- the tags look OK. XPComment looks fine, too, right?
But why do I get a "warning" on the encoding of the UserComment?

I added a prefix to the various fields I'm writing, to see where they show up in various software, and I learned the following:


  • Windows Properties shows IPTC Caption/Abstract under "Title"
  • Picasa also shows IPTC Caption/Abstract under "Caption" (under the image + in slideshow mode), which is quite nice
  • Windows Properties shows EXIF UserComment under "Comments" (I think, but I need to get the encoding problem mentioned above worked out).

Phil Harvey

You can use exiftool to write UserComment correctly, then use the exiftool -htmlDump option to see in detail what was written.  There are a few technical issues other than just the UserComment encoding:

> exiftool -validate -warning -a ~/Desktop/test_UTF8-again.jpg
Validate                        : 8 Warnings (5 minor)
Warning                         : Wrong IFD for ExifIFD tag 0x010e ImageDescription (found in IFD0)
Warning                         : Non-standard format (undef) for ExifIFD 0x010e ImageDescription
Warning                         : [minor] Odd offset for ExifIFD tag 0x010e
Warning                         : [minor] Odd offset for ExifIFD tag 0x9286
Warning                         : Invalid EXIF text encoding for UserComment
Warning                         : [minor] Odd offset for ExifIFD tag 0x9291
Warning                         : [minor] Non-standard ExifIFD tag 0x9c9c XPComment
Warning                         : [minor] Odd offset for ExifIFD tag 0x9c9c


Regarding metadata in Windows:  StarGeek has put together a very useful table describing all of this here.

- Phil

Edit: Note that the Validate feature is still experimental.  The "Wrong IFD" message should read "should be in IFD0", not "found in IFD0".  Also, XPComment is listed as Non-standard because it is in ExifIFD but should also be in IFD0.  I'll fix these messages for ExifTool 10.48.

joakimk

Thanks for the links!

However, the errors/warnings you're referring to (from validate) refer to the ImageDescription field, and not the UserComment field we were looking at here?

What I'm finding, is that the EXIF UserComment tag requires a specific charset -- as warned by ExifTool. Apache Sanselan has some bugs, which I apparently have to work around.
I found this in this discussion: http://git.net/ml/user-commons-apache/2012-03/msg00046.html

I can only make ExifTool happy if I use UTF-16LE.
Also, apparently, the content (text) has to be prefixed with a "Unicode marker": 7 bytes actually spelling out "UNICODE", with null-termination: 554E49434F444500
Could you please take a look at the attached file? As far as I can see, ExifTool no longer complains about the UserComment field.

I'd really like to get it working with UTF-8, since I don't understand why UserComment specifically requires a different encoding than the other fields, where I can write UTF-8 directly (with no "markers" etc)...



About the warnings, though:
If I validate a "plain" JPEG, taken with the same phone but with no metadata written to it, I still get "Odd offset" minor warnings. So I don't think they're "my fault" ;-)
What does the "Wrong IFD" warning (on ImageDescription) actually tell me? I'm writing to that field in a much simpler way than what I do for UserComment; I'm not using the "big hack" with the Unicode marker. I'm just writing plain UTF-8 to it.

joakimk

Interesting read about the EXIF UserComment tag: https://forums.adobe.com/thread/375932
Seems to me, the proper way to write to this field is to use UTF-16, prefixed with UNICODE\0.
Since Apache Commons Imaging (previously, Sanselan) fails to do this properly, a "big hack" is required in the Java code.
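
To make the "hack" concrete, here is a minimal sketch of building and parsing such a value with the standard library only: the 8-byte "UNICODE\0" marker followed by UTF-16 text without a BOM. The class and method names are my own, the byte order is left as a parameter because the sketch does not decide which one is correct, and the resulting array would still have to be handed to the imaging library as an UNDEFINED field:

```java
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class UserCommentCodec {
    // 8-byte character code: "UNICODE" followed by a single 0x00.
    private static final byte[] UNICODE_ID = {0x55, 0x4E, 0x49, 0x43, 0x4F, 0x44, 0x45, 0x00};

    private static Charset utf16(ByteOrder order) {
        return order == ByteOrder.BIG_ENDIAN ? StandardCharsets.UTF_16BE : StandardCharsets.UTF_16LE;
    }

    // Marker + UTF-16 body; an explicit byte order means no BOM is written.
    static byte[] encode(String text, ByteOrder order) {
        byte[] body = text.getBytes(utf16(order));
        byte[] out = Arrays.copyOf(UNICODE_ID, UNICODE_ID.length + body.length);
        System.arraycopy(body, 0, out, UNICODE_ID.length, body.length);
        return out;
    }

    // Rejects values that do not start with the marker, then decodes the rest.
    static String decode(byte[] value, ByteOrder order) {
        if (value.length < UNICODE_ID.length
                || !Arrays.equals(Arrays.copyOf(value, UNICODE_ID.length), UNICODE_ID)) {
            throw new IllegalArgumentException("value does not start with the UNICODE marker");
        }
        return new String(value, UNICODE_ID.length, value.length - UNICODE_ID.length, utf16(order));
    }

    public static void main(String[] args) {
        // "bl\u00e5b\u00e6r" is "blåbær", written with escapes to avoid source-encoding issues.
        byte[] bytes = encode("bl\u00e5b\u00e6r", ByteOrder.BIG_ENDIAN);
        System.out.println(bytes.length); // 20 = 8 marker bytes + 6 characters * 2 bytes
        System.out.println(decode(bytes, ByteOrder.BIG_ENDIAN));
    }
}
```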

Phil Harvey

It sounds like you should read the EXIF specification.  It explains the encoding for UserComment.  Although it doesn't explicitly state the Unicode encoding, it implies UCS-2 in the same byte ordering as the EXIF.  However, the MWG recommendation suggests treating this as UTF-16.

I would say that your UNICODE is in the wrong byte order.  But note that ExifTool uses a heuristic to deal with UNICODE written in the wrong byte order, and currently doesn't warn about this.  (FYI, the MWG has adopted my heuristic into their recommendations.)
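
To illustrate what a byte-order heuristic can look like, here is a deliberately simple sketch. It is NOT the heuristic ExifTool uses (that one is not described in this thread); it is a hypothetical example that assumes mostly Latin text, where one byte of every 16-bit unit is zero, and all names in it are invented:

```java
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class Utf16OrderGuess {
    // Returns the more likely byte order, or null when the data gives no hint.
    static ByteOrder guess(byte[] utf16) {
        int zeroFirst = 0;  // zero leading byte: looks big-endian for Latin text
        int zeroSecond = 0; // zero trailing byte: looks little-endian for Latin text
        for (int i = 0; i + 1 < utf16.length; i += 2) {
            if (utf16[i] == 0 && utf16[i + 1] != 0) zeroFirst++;
            if (utf16[i + 1] == 0 && utf16[i] != 0) zeroSecond++;
        }
        if (zeroFirst == zeroSecond) return null;
        return zeroFirst > zeroSecond ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN;
    }

    public static void main(String[] args) {
        byte[] le = "This is a test comment".getBytes(StandardCharsets.UTF_16LE);
        byte[] be = "This is a test comment".getBytes(StandardCharsets.UTF_16BE);
        System.out.println(guess(le));
        System.out.println(guess(be));
    }
}
```

A guess like this only helps a reader cope with badly written files; a writer should still use the byte order of the EXIF block.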

The "odd offset" warnings are a clear indication of metadata written by a programmer who hasn't read/understood the TIFF specification.  ;)

- Phil

Edit: The ExifTool 10.48 Validate feature will warn about incorrect byte ordering in EXIF text.

joakimk

While you are correct that I haven't read the EXIF specification, I am not sure that the problems necessarily arise from my parts of the code. The Apache library for reading and writing EXIF metadata has not been maintained for many years, and is still in beta, with known bugs. Since I'm mainly just interfacing with that library to write the tags (the hack for UserComment being the exception), I'm not sure that I can change where in the directory structure the tags are written (as far as I understand the ImageDescription warning).

But I will read the standard, and hope to be able to fix the bugs. Perhaps some hack is required to fix what Sanselan is doing for the ImageDescription tag, too.

Thanks for your patience and always helpful replies!

joakimk

I've read the EXIF specification, and I understand more about the details for the tags I'm interested in (ImageDescription and UserComment). I think I'm able to write successfully to UserComment, and to the IPTC field Caption/Abstract, but I can't seem to write properly to the ImageDescription field. It always ends up in the wrong IFD (IFD0, instead of... IFD1?):

Warning                         : Wrong IFD for ExifIFD tag 0x010e ImageDescription (found in IFD0)

But first: I've attached a "plain" JPEG as generated by my cell phone (Nexus LG 5). Even if I don't modify it with my app at all, ExifTool still finds problems. So I'm starting to think this is maybe a losing battle -- that some of the bugs ExifTool is reporting are maybe due to the phone? If the starting point (original, bare JPEG) is flawed, then it's really hard to debug my app... Could you please have a look at the attached file test_bare.jpg and see if it looks messed up?

C:\Users\joakimk\Downloads\exiftool-10.46>exiftool.exe -validate -warning -a ..\test_bare.jpg
Validate                        : 6 Warnings (all minor)
Warning                         : [minor] Odd offset for ExifIFD tag 0x9201
Warning                         : [minor] Odd offset for ExifIFD tag 0x9003
Warning                         : [minor] Odd offset for ExifIFD tag 0x9292
Warning                         : [minor] Odd offset for ExifIFD tag 0x829a
Warning                         : [minor] Odd offset for IFD1 tag 0x011b
Warning                         : [minor] Odd offset for IFD1 tag 0x011a


Just to eliminate my phone/camera from the equation, I've downloaded a stock JPEG onto my app and written UserComment and Caption/Abstract to it. The stock image was first validated OK by ExifTool, so that should be a "safe" starting point. Note I'm skipping writing the ImageDescription field, which always ends up in the wrong place anyway. I'll get back to that later, and maybe I won't even need it (the IPTC field seems to be all I really need). Would you take a look at the validation of the resulting (edited) file?

C:\Users\joakimk\Downloads\exiftool-10.46>exiftool.exe -validate -warning -a ..\croatia_oasis_wpo.jpg
Validate                        : 8 Warnings (all minor)
Warning                         : [minor] Odd offset for IFD0 tag 0x0131
Warning                         : [minor] Odd offset for IFD0 tag 0x8298
Warning                         : [minor] Odd offset for ExifIFD tag 0x9286
Warning                         : [minor] Odd offset for ExifIFD tag 0xa432
Warning                         : [minor] Odd offset for ExifIFD tag 0xa434
Warning                         : [minor] Odd offset for ExifIFD tag 0xa435
Warning                         : [minor] Odd offset for GPS tag 0x0002
Warning                         : [minor] Odd offset for GPS tag 0x0004


If a tag X is at an odd offset, then that's caused by some previous field, right, which has (somehow) not been padded to even length? It's not so much the field itself (its content) which is to blame. Are you able to see why the offsets end up like this, and where I should look to try to fix it? I'm using a "Lossless" metadata update method, from Apache, which maybe stumbles across fields while trying not to disturb them?

Thanks again!

Phil Harvey

Quote from: joakimk on March 28, 2017, 04:02:26 PM
It always ends up in the wrong IFD (IFD0, instead of... IFD1?):

Warning                         : Wrong IFD for ExifIFD tag 0x010e ImageDescription (found in IFD0)

As I said, the warning needs to be fixed.  It should read "should be in IFD0".  It is currently in ExifIFD.  I will fix this message in the next release.

The "Odd offset" problems are common, which is why they are indicated as "[minor]".  You are safe to ignore these.

- Phil

joakimk

Yes, the error says the ImageDescription field should be in IFD0, and not ExifIFD. As I learn more about EXIF, I understand more on how to use the Apache library. For the Exif tags, I get and write to the ExifDirectory, and that seems to work. But I need to write ImageDescription to a different directory, right?

I find the following directories in the JPEGs:

loadUI() on file /storage/emulated/0/Download/indonesia_monkey_wpo-5.jpg; inspecting metadata:
Found directory #0: Root
Found directory #1: Exif
Found directory #2: Gps
Found directory #3: Sub


Am I correct to assume Root is IFD0? Trying that, I no longer get the "Wrong IFD" error from ExifTool. Would you mind checking if the attached image looks correct? I have tried to check for myself, but -htmldump seems to reveal that the ExifIFD part now comes "before" (above) IFD0. But maybe that's not a problem/error? My app lists the directories in the same order as above, even after editing.


Phil Harvey

Most EXIF tags are in the ExifIFD, but some (the ones defined by the TIFF specification) are in IFD0.  The EXIF spec tells you which one it should go in.
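
As a small illustration, the tags discussed in this thread can be mapped to their home IFD like this. It is only a sketch covering three tag IDs, with placements taken from the validation messages above; the class is hypothetical and a real application would consult the specification for every tag it writes:

```java
final class TagHomes {
    // Home IFD for the tags discussed in this thread only.
    static String homeOf(int tagId) {
        switch (tagId) {
            case 0x010e: return "IFD0";    // ImageDescription (defined by TIFF)
            case 0x9286: return "ExifIFD"; // UserComment
            case 0x9c9c: return "IFD0";    // XPComment
            default:     return "unknown"; // look it up in the EXIF/TIFF specs
        }
    }

    public static void main(String[] args) {
        System.out.println(homeOf(0x010e)); // IFD0
        System.out.println(homeOf(0x9286)); // ExifIFD
    }
}
```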

- Phil

joakimk

Hello again :)

I've been in touch with the people maintaining and developing the Apache library for editing image data and metadata (Apache Commons Imaging, previously Apache Sanselan), and they have found a patch for the bug which introduced the odd offsets when writing EXIF data.

Using this patched version of Sanselan (while waiting for the first stable release of Imaging), I've tested on a Nexus LG 5. As I said earlier (above), the original image (as produced by the camera on the device) does not validate with ExifTool:


$ ./exiftool\(-k\).exe -validate -warning -error -a ../IMG_20171223_200655.jpg
-- press RETURN --

Validate                        : 32 Warnings (6 minor)
Warning                         : Entries in IFD0 are out of order
Warning                         : Tag ID 0x0110 out of sequence in IFD0
Warning                         : Tag ID 0x0100 out of sequence in IFD0
Warning                         : Tag ID 0x0128 out of sequence in IFD0
Warning                         : Tag ID 0x0101 out of sequence in IFD0
Warning                         : Tag ID 0x0112 out of sequence in IFD0
Warning                         : Entries in ExifIFD are out of order
Warning                         : Tag ID 0x9004 out of sequence in ExifIFD
Warning                         : Tag ID 0x829d out of sequence in ExifIFD
Warning                         : Tag ID 0x9202 out of sequence in ExifIFD
Warning                         : Tag ID 0xa002 out of sequence in ExifIFD
Warning                         : Tag ID 0x9290 out of sequence in ExifIFD
Warning                         : [minor] Odd offset for ExifIFD tag 0x9201
Warning                         : Tag ID 0x9201 out of sequence in ExifIFD
Warning                         : [minor] Odd offset for ExifIFD tag 0x9003
Warning                         : Tag ID 0x9003 out of sequence in ExifIFD
Warning                         : [minor] Odd offset for ExifIFD tag 0x9292
Warning                         : Tag ID 0x9101 out of sequence in ExifIFD
Warning                         : Tag ID 0x9209 out of sequence in ExifIFD
Warning                         : Tag ID 0x9000 out of sequence in ExifIFD
Warning                         : Tag ID 0x8827 out of sequence in ExifIFD
Warning                         : Tag ID 0x9291 out of sequence in ExifIFD
Warning                         : [minor] Odd offset for ExifIFD tag 0x829a
Warning                         : Tag ID 0x829a out of sequence in ExifIFD
Warning                         : Tag ID 0x011a out of sequence in IFD0
Warning                         : Tag ID 0x010f out of sequence in IFD0
Warning                         : [minor] Odd offset for IFD1 tag 0x011b
Warning                         : Entries in IFD1 are out of order
Warning                         : Tag ID 0x0201 out of sequence in IFD1
Warning                         : Tag ID 0x0103 out of sequence in IFD1
Warning                         : [minor] Odd offset for IFD1 tag 0x011a
Warning                         : Tag ID 0x011a out of sequence in IFD1


However, if I edit the file using my app (with the patched library) then the JPEG does validate:


$ ./exiftool\(-k\).exe -validate -warning -error -a ../IMG_20171223_200655_2.jpg
-- press RETURN --

Validate                        : OK


So, it seems the patch/fix (which, while writing tags, checks whether each offset is even or odd and, if odd, adds 1 to round it up to an even value) "repairs" the offsets, such that the file validates. Could you please have a look at the two files in question -- once again -- and see if you agree (that the second file looks better than the original one)?
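
The idea behind the fix can be sketched in a few lines. This is not the actual Apache patch, just my own illustration of rounding offsets up to even values while laying out field values one after another; the class and method names are invented:

```java
import java.util.Arrays;

final class EvenOffsets {
    // Round an offset up to the next even value (even offsets are unchanged).
    static int alignToEven(int offset) {
        return offset + (offset & 1);
    }

    // Place values back to back from `start`, padding so each one begins on an even offset.
    static int[] layout(int start, int[] valueLengths) {
        int[] offsets = new int[valueLengths.length];
        int next = start;
        for (int i = 0; i < valueLengths.length; i++) {
            next = alignToEven(next);
            offsets[i] = next;
            next += valueLengths[i];
        }
        return offsets;
    }

    public static void main(String[] args) {
        // A 7-byte and a 23-byte value would otherwise push the following values to odd offsets.
        System.out.println(Arrays.toString(layout(100, new int[]{7, 23, 4}))); // [100, 108, 132]
    }
}
```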

The tags I've updated (added) are EXIF UserComment, EXIF ImageDescription, and IPTC Caption/Abstract. And I think I've got the details right (according to the EXIF standard):





EXIF | UserComment      | UTF-16 (big endian) | FIELD_TYPE_UNDEFINED | Prepend string to write with ASCII bytes spelling out "UNICODE": byte[] ASCIIMarker = new byte[]{ 0x55, 0x4E, 0x49, 0x43, 0x4F, 0x44, 0x45, 0x00 };
EXIF | ImageDescription | US-ASCII            | FIELD_TYPE_ASCII     | Always null-terminate string (with 0x00) regardless of odd/even length
IPTC | Caption/Abstract | (uses a different part of the Sanselan library -- the Photoshop part -- which I am less familiar with than the EXIF parts)
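
For the ImageDescription row, the byte array can be built along these lines. It is a minimal sketch with an invented helper name; it rejects text that US-ASCII cannot represent instead of silently replacing characters, which is my own design choice and not something the library does for you:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class AsciiField {
    // US-ASCII bytes plus one trailing 0x00; the field count is the returned length.
    static byte[] encode(String text) {
        if (!StandardCharsets.US_ASCII.newEncoder().canEncode(text)) {
            throw new IllegalArgumentException("text contains non-ASCII characters");
        }
        byte[] body = text.getBytes(StandardCharsets.US_ASCII);
        return Arrays.copyOf(body, body.length + 1); // copyOf pads with 0x00
    }

    public static void main(String[] args) {
        System.out.println(encode("tittel").length); // 7 = 6 characters + terminator
    }
}
```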


P.S. I'll email the files; the attach/upload function does not work.

Phil Harvey

Yes, ExifTool will fix the odd offsets and out-of-order tags.  These requirements are part of the TIFF6 specification, upon which EXIF is based.
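
The ordering requirement is easy to check in code. Here is a short sketch with an invented class name, using a few of the IFD0 tag IDs from the validation output above; treating duplicate IDs as a failure is my own assumption:

```java
import java.util.Arrays;

final class TagOrder {
    // True when the tag IDs are in strictly ascending order.
    static boolean isAscending(int[] tagIds) {
        for (int i = 1; i < tagIds.length; i++) {
            if (tagIds[i] <= tagIds[i - 1]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int[] ids = {0x010f, 0x0110, 0x0100, 0x0128};
        System.out.println(isAscending(ids)); // false
        Arrays.sort(ids);                     // a writer would sort the entries before writing the IFD
        System.out.println(isAscending(ids)); // true
    }
}
```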

- Phil

joakimk

But this file (the _2 version) has been edited by Apache Sanselan, not by ExifTool. I know ExifTool will handle standards and repair structure, but I guess I'm asking if you might verify that the patched version of Sanselan also has its bytes in order? At least in this particular example (I emailed you the files I couldn't attach).

Phil Harvey

I see.  Yes, the edited file looks fine.  (No red in the htmlDump output means no odd offsets and no tag ordering problems)

- Phil