Size of metadata

Started by grasdk, December 28, 2024, 10:54:24 AM

grasdk

Is there a better way to compute the metadata size of your files than generating an XMP file and checking its size?

Current approach:

exiftool -r -ee -o %d%f.xmp .

I'm using digikam and it has a problem with metadata over 64KB (KiB), so I would like to find all my files which may be affected.
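
If the sidecar approach is kept, the size check can be left to find. This is a minimal sketch, assuming the .xmp sidecars were written next to the images by the command above. The helper name large_sidecars and the 65536-byte default (64 KiB) are assumptions for illustration, and a sidecar written by exiftool may not have the same size as the XMP embedded in the image.

```shell
# List .xmp sidecar files under DIR that are larger than BYTES bytes.
# BYTES defaults to 65536 (64 KiB); this default is an assumption, adjust as needed.
# Usage: large_sidecars [DIR] [BYTES]
large_sidecars() {
  # -size +Nc matches files strictly larger than N bytes (no block rounding).
  find "${1:-.}" -type f -name '*.xmp' -size +"${2:-65536}"c
}
```

Calling large_sidecars . prints one path per oversized sidecar. The byte suffix (c) is used instead of +64k because find rounds sizes up to whole units when a k suffix is given.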

StarGeek

That would only be the XMP metadata, not any other type of metadata. For example, a very large thumbnail or preview image could be bigger than 64k.

I think the only way would be to use the -v3 (-verbose3) option and parse the output.

For example, this will list the APP1 blocks in a file, which would be the EXIF data (including the thumbnail I think) and the XMP data
C:\>exiftool -G1 -a -s -v3 y:\!temp\Test4.jpg |grep  "JPEG APP1 "
JPEG APP1 (23069 bytes):
JPEG APP1 (2904 bytes):

Going a bit further and adding awk to get a sum
C:\>exiftool -G1 -a -s -v3 y:\!temp\Test4.jpg  |grep  "JPEG APP1 " |awk -F'[()]' '{sum += $2} END {print sum}'
25973

Playing with it further, this will grab the total of all the jpeg APP* blocks
C:\>exiftool -G1 -a -s -v3 y:\!temp\Test4.jpg  |grep -E "JPEG APP[0-9]+ " |awk -F'[()]' '{sum += $2} END {print sum}'
26035
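
The grep and awk steps above can be folded into one awk call that also breaks the total down by segment type. This is only a sketch: sum_app_segments is a hypothetical helper name, and it assumes the -v3 lines look like the "JPEG APP1 (23069 bytes):" lines shown above.

```shell
# Read `exiftool -v3` output on stdin and sum the JPEG APPn segment sizes.
# Prints one "NAME BYTES" line per segment type (order not guaranteed),
# followed by a final "total BYTES" line.
sum_app_segments() {
  awk -F'[()]' '
    /JPEG APP[0-9]+ / {
      split($1, w, " ")      # w[2] is the segment name, e.g. APP1
      split($2, s, " ")      # s[1] is the byte count
      bytes[w[2]] += s[1]
      total += s[1]
    }
    END {
      for (name in bytes) print name, bytes[name]
      print "total", total + 0
    }'
}
```

Usage would be along the lines of: exiftool -G1 -a -s -v3 file.jpg | sum_app_segments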

If you're on Windows, then grep and awk won't be available unless you install them. I use MSys2 ports, though there are probably ports included in Git if you use that.
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

greybeard

Quote from: grasdk on December 28, 2024, 10:54:24 AM
I'm using digikam and it has a problem with metadata over 64KB (KiB), so I would like to find all my files which may be affected.

What sort of metadata does it have a problem with?

I don't see anything in the documentation.

Is this related to the 64KB segment limit in EXIF metadata for jpegs?


grasdk

Thanks @Stargeek. I'm using both Debian, some Synology flavored Linux (on NAS) and WSL2, so I've got access to awk and grep :). It also appears that for my particular problem the XMP segment is enough, but my question was broader, and you answered it (hopefully to many people's benefit). :)

@greybeard: I'm referring to this specific comment in the bug-tracker of digikam: https://bugs.kde.org/show_bug.cgi?id=468830#c21

Quote
the size of the XMP JPEG segment. Exiv2 does not support more than 65535 bytes. Even though we do a lot with ExifTool, digiKam is based internally on Exiv2 for preparing the metadata.

StarGeek

Quote from: grasdk on December 28, 2024, 02:47:14 PM
Thanks @Stargeek. I'm using both Debian, some Synology flavored Linux (on NAS) and WSL2,

Upon further thought, I'd suggest
exiftool -b -XMP file.jpg | wc -c
or
exiftool -b -EXIF -XMP file.jpg | wc -c
C:\>exiftool -b -xmp y:\!temp\Test4.jpg |wc -c
   2875

C:\>exiftool -b -exif -xmp y:\!temp\Test4.jpg  |wc -c
  25938
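
To apply the wc -c check to many files, the command can be wrapped in a small loop. This is a sketch under assumptions: xmp_bytes and report_large_xmp are hypothetical helper names, and the byte limit is left as a parameter because the exact cutoff that matters is not established here.

```shell
# Byte count of the main XMP block, i.e. `exiftool -b -XMP FILE | wc -c`.
xmp_bytes() {
  # tr strips the padding some wc implementations add around the number.
  exiftool -b -XMP "$1" 2>/dev/null | wc -c | tr -d '[:space:]'
}

# Print "SIZE<TAB>FILE" for every file whose XMP block exceeds LIMIT bytes.
# Usage: report_large_xmp LIMIT FILE...
report_large_xmp() {
  limit=$1
  shift
  for f in "$@"; do
    size=$(xmp_bytes "$f")
    if [ "$size" -gt "$limit" ]; then
      printf '%s\t%s\n' "$size" "$f"
    fi
  done
}
```

For example, report_large_xmp 60000 *.jpg would list the files in the current directory whose XMP is larger than 60000 bytes. Files without XMP report a size of 0 and are skipped.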

grasdk

Quote from: StarGeek on December 28, 2024, 04:53:55 PM
Quote from: grasdk on December 28, 2024, 02:47:14 PM
Thanks @Stargeek. I'm using both Debian, some Synology flavored Linux (on NAS) and WSL2,

Upon further thought, I'd suggest
exiftool -b -XMP file.jpg | wc -c
or
exiftool -b -EXIF -XMP file.jpg | wc -c
C:\>exiftool -b -xmp y:\!temp\Test4.jpg |wc -c
   2875

C:\>exiftool -b -exif -xmp y:\!temp\Test4.jpg  |wc -c
  25938


This was immensely helpful. I found two pictures where digikam was able to handle one but not the other. With the previous measurements (grepping for "JPEG APP1 ") there was no conclusion: the one that digikam could handle had more metadata than the one it couldn't handle. But with your new examples here, there was a huge difference:

$ exiftool  -G1 -a -s -v3 digikam_ok.jpg | grep "JPEG APP1 "
JPEG APP1 (1226 bytes):
JPEG APP1 (6584 bytes):
JPEG APP1 (65533 bytes):
JPEG APP1 (340 bytes):

$ exiftool  -G1 -a -s -v3 digikam_not_ok.jpg | grep "JPEG APP1 "
JPEG APP1 (10150 bytes):
JPEG APP1 (64669 bytes):

$ exiftool -b -exif digikam_ok.jpg | wc -c
1220

$ exiftool -b -exif digikam_not_ok.jpg | wc -c
10144

$ exiftool -b -xmp digikam_ok.jpg | wc -c
6555

$ exiftool -b -xmp digikam_not_ok.jpg | wc -c
64640

$ exiftool -ee -b -xmp digikam_ok.jpg | wc -c
72278

$ exiftool -ee -b -xmp digikam_not_ok.jpg | wc -c
64640

Notice how the size of the digikam_ok.jpg data increases when extracting embedded data with -ee, which indicates that the Google Pixel cameras in use here can store metadata differently.
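
The with/without -ee comparison can be done per file in one step. This is a rough sketch; xmp_ee_delta is a hypothetical helper name, and it only repeats the two wc -c measurements shown above and subtracts them.

```shell
# Print the XMP size without -ee, with -ee, and the difference, for one file.
xmp_ee_delta() {
  main=$(exiftool -b -XMP "$1" | wc -c | tr -d '[:space:]')
  all=$(exiftool -ee -b -XMP "$1" | wc -c | tr -d '[:space:]')
  printf '%s main=%s with_ee=%s extra=%s\n' "$1" "$main" "$all" "$((all - main))"
}
```

With the numbers measured above, digikam_ok.jpg has 72278 - 6555 = 65723 extra bytes with -ee, while digikam_not_ok.jpg has none (64640 in both cases).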

So this is a good answer and conclusion to my post question. I'm on the hunt for more info, but I edited this post to conclude on the topic only.

StarGeek

Quote from: grasdk on December 28, 2024, 06:01:14 PM
Notice how the size of the digikam_ok.jpg data increases when extracting embedded data with -ee, which indicates that the Google Pixel cameras in use here can store metadata differently.

That seems really strange to me. A difference that large in XMP seems like it would be a thumbnail or something similar. And it's not part of the main metadata, it's from something embedded in the file. Is this a motion picture or have a depth map or something similar?

If it is, then this "extra" XMP data isn't something accessible to most standard programs.

grasdk

I attached to this post an example of the data when extracted to XMP.

My best guess is that this information is used for "photo spheres", "panoramas" etc. and (by bug or intent) is also saved on ordinary pictures.

I'll see if I can manage to take some pictures that I can share here and that exhibit the behavior I wrote about above, for further study.

grasdk

Managed to take a picture that exhibits the problem. Attached to this post

Difference in metadata with and without -ee:
$ exiftool -b -xmp PXL_20241231_035617643.jpg | wc -c
1033
$ exiftool -ee -b -xmp PXL_20241231_035617643.jpg | wc -c
64831

In digikam, I cannot write tags to this photo. Exiftool is used for writing in my digikam config, but apparently some metadata is handled internally by exiv2, which is the cause of this problem. However, I have an interesting finding:

If I extract the metadata with exiftool, blank the big Google camera tag hdrplusmakernote, then copy the info back into the file, the metadata is larger, BUT I can save tags to the photo in digikam afterwards. So the data must be stored differently after copying the info back in. Can anyone explain what I'm experiencing here? Exiftool must be saving the data differently from the original:

$ exiftool -ee -o %d%f.xmp PXL_20241231_035617643.jpg
    1 image files created
$ exiftool -hdrplusmakernote= PXL_20241231_035617643.jpg
    1 image files updated
$ exiftool -b -xmp PXL_20241231_035617643.jpg | wc -c
1017
$ exiftool -ee -b -xmp PXL_20241231_035617643.jpg | wc -c
1557
$ exiftool -tagsfromfile %d%f.xmp "-xmp:all>all:all" -overwrite_original PXL_20241231_035617643.jpg
    1 image files updated
$ exiftool -b -xmp PXL_20241231_035617643.jpg | wc -c
5985
$ exiftool -ee -b -xmp PXL_20241231_035617643.jpg | wc -c
70857
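
For repeating this experiment on other photos, the three steps and the size checks can be scripted against a copy so the original stays untouched. This is a sketch, not a recommended fix: report_sizes, roundtrip_copy and the _roundtrip suffix are hypothetical names, and adding -q and -overwrite_original to every step is a choice made here for the throwaway copy.

```shell
# Print the XMP size with and without -ee, prefixed by a step label.
report_sizes() {
  main=$(exiftool -b -XMP "$2" | wc -c | tr -d '[:space:]')
  all=$(exiftool -ee -b -XMP "$2" | wc -c | tr -d '[:space:]')
  printf '%-12s main=%s with_ee=%s\n' "$1" "$main" "$all"
}

# Repeat the export / clear / copy-back steps on a COPY of FILE.
# Note: the export step fails if the .xmp sidecar for the copy already exists.
roundtrip_copy() {
  src=$1
  work="${src%.*}_roundtrip.${src##*.}"
  cp -- "$src" "$work" || return 1
  report_sizes "original" "$work"
  exiftool -q -ee -o %d%f.xmp "$work"
  exiftool -q -overwrite_original -hdrplusmakernote= "$work"
  report_sizes "tag cleared" "$work"
  exiftool -q -overwrite_original -tagsfromfile %d%f.xmp "-xmp:all>all:all" "$work"
  report_sizes "copied back" "$work"
}
```

Running roundtrip_copy PXL_20241231_035617643.jpg would print three lines comparable to the wc -c results above, one per step.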

I know it's a bit out of date, but I'm using exiftool 13.02:
$ exiftool -ver
13.02

Phil Harvey

I haven't had time to read through this entire thread, but this may help:

> exiftool ~/Downloads/PXL_20241231_035617643.jpg -validate -warning -a
Validate                        : 17 Warnings (all minor)
Warning                         : [minor] Odd offset for IFD0 tag 0x0110 Model
Warning                         : [minor] Odd offset for IFD0 tag 0x011a XResolution
Warning                         : [minor] Odd offset for IFD0 tag 0x011b YResolution
Warning                         : [minor] Odd offset for IFD0 tag 0x0131 Software
Warning                         : [minor] Odd offset for ExifIFD tag 0x9011 OffsetTimeOriginal
Warning                         : [minor] Odd offset for ExifIFD tag 0x9201 ShutterSpeedValue
Warning                         : [minor] Odd offset for ExifIFD tag 0x9202 ApertureValue
Warning                         : [minor] Odd offset for ExifIFD tag 0x9203 BrightnessValue
Warning                         : [minor] Odd offset for ExifIFD tag 0x9204 ExposureCompensation
Warning                         : [minor] Odd offset for ExifIFD tag 0x9205 MaxApertureValue
Warning                         : [minor] Odd offset for ExifIFD tag 0x9206 SubjectDistance
Warning                         : [minor] Odd offset for ExifIFD tag 0x920a FocalLength
Warning                         : [minor] Odd offset for ExifIFD tag 0xa404 DigitalZoomRatio
Warning                         : [minor] Odd offset for ExifIFD tag 0xa433 LensMake
Warning                         : [minor] Odd offset for IFD1 tag 0x011a XResolution
Warning                         : [minor] Odd offset for IFD1 tag 0x011b YResolution
Warning                         : [minor] XMP is missing xpacket wrapper

The last warning may be significant if you are having problems with XMP.
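
To check other files for that same warning, the -validate output can be searched for its text. This is a minimal sketch; missing_xpacket is a hypothetical helper name, and it assumes the warning is worded exactly as in the output above.

```shell
# Succeed (exit status 0) if exiftool reports the missing-xpacket warning for FILE.
missing_xpacket() {
  exiftool -validate -warning -a "$1" 2>/dev/null |
    grep -q 'XMP is missing xpacket wrapper'
}
```

A loop such as for f in *.jpg; do missing_xpacket "$f" && echo "$f"; done would then list the affected files.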

- Phil