Keywords are not written or read properly

Started by bugmenot, December 22, 2021, 12:22:35 AM


bugmenot

I'm using exiftool 12.38 on Windows 10 and am noticing some inconsistencies with JPG Keywords.  I've tried tagging images using exiftool and the Windows GUI (adding them through Windows Explorer).  The results are inconsistent.  My end goal is to read the Keywords in a PHP application, and at this point I've not ruled out that PHP is the part that's misbehaving either.

Here's a test script that can demonstrate the problem:

<?php
// Create test image
$im = imagecreate(400, 300);
imagejpeg($im, 'test.jpg');
imagedestroy($im);

// Add tag
echo `exiftool -overwrite_original -keywords+="delete" test.jpg`;

// Load tags with PHP exif extension
ini_set('exif.decode_unicode_motorola', 'UCS-2LE');
$data = exif_read_data('test.jpg');
if (array_key_exists('Keywords', $data)) {
    var_dump($data['Keywords']);
} else {
    echo "No keywords found on test.jpg!\n";
}

// Read tags with exiftool
echo `exiftool -keywords test.jpg`;

if (file_exists('windows.jpg')) {
    $data = exif_read_data('windows.jpg');
    if (array_key_exists('Keywords', $data)) {
        var_dump($data['Keywords']);
    } else {
        echo "No keywords found on windows.jpg!\n";
    }

    // Read tags with exiftool
    var_dump(`exiftool -keywords windows.jpg`);
}


Explanation of what's going on:

  • PHP generates an image using the GD extension.  Theoretically anyone running this script would get the exact same output, which is why I put it in there.
  • PHP starts a shell (cmd in this case) and calls exiftool to add a delete tag.  This literal word behaves worse than other words, go figure
  • Then, the exif data is read with PHP's exif function
  • PHP starts another shell to call exiftool to read the data
  • If a file called windows.jpg exists in the current working directory, PHP reads that with the exif function
  • Another shell is opened to have exiftool read tags on windows.jpg

windows.jpg is a file I created in Paint.  Then I added a delete tag with Windows Explorer.

Here is the output on my system:
    1 image files updated
No keywords found on test.jpg!
Keywords                        : delete
string(6) "delete"
Warning: [minor] Fixed incorrect URI for xmlns:MicrosoftPhoto - windows.jpg
NULL


When PHP generates a JPG and uses exiftool to add a keyword, PHP can't find the keyword but exiftool can read it.
When Windows writes the keywords, exiftool can't find the keywords but PHP's exif function can.



FWIW, the Windows GUI always shows all tags, regardless of which application added them.  Windows will show tags that are "invisible" to exiftool and tags that are not visible to PHP's exif functions.
https://discorduserinfo.pro/

StarGeek

Looking over the php.net entry on exif_read_data, I don't see anything that indicates that exif_read_data can read any metadata other than EXIF.  Keywords is an IPTC tag.  I'm guessing what you need to use is iptcparse.

Use the command in FAQ #3 to find where the tags you're writing are actually located.
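
For anyone running this from PHP like the test script does, here is a minimal sketch. It assumes the FAQ #3 command has the form `exiftool -a -G1 -s FILE`; check the FAQ itself for the exact wording. `buildTagLocationCommand` is a hypothetical helper name, not part of PHP or exiftool.

```php
<?php
// Hypothetical helper: build an exiftool command that lists every tag in a
// file together with the group it is stored in.
function buildTagLocationCommand(string $file): string
{
    // -a  : show duplicate tags instead of suppressing them
    // -G1 : prefix each tag with its family 1 group name
    // -s  : print short tag names rather than descriptions
    return 'exiftool -a -G1 -s ' . escapeshellarg($file);
}

$cmd = buildTagLocationCommand('test.jpg');
echo $cmd, "\n";

// To actually run it (assumes exiftool is on the PATH):
// echo shell_exec($cmd);
```

The group name printed in front of each tag should show whether a given keyword was stored as IPTC, EXIF, or XMP.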
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

bugmenot

Forgive my ignorance, but what's the difference between Keywords and XPKeywords?

Here's the actual source of PHP's parser:
https://github.com/php/php-src/blob/master/ext/exif/exif.c#L3416-L3422

StarGeek

Keywords is an IPTC tag which is part of the older IPTC IIM standard.  It has its own location in a file.  In JPEGs it is in the APP13 block.  It is a list type tag where every entry is completely separate from the others which, I'm guessing, the iptcparse function would return as an array (verified).  Even though it is an older standard, it still has widespread support among programs.

XPKeywords is a Microsoft tag introduced with Windows XP.  It is located in the EXIF group.  It is a semicolon-separated string tag, so you would have to parse out the entries yourself.  It has almost no support in programs beyond the Windows OS.
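
As a rough illustration of parsing out the entries, here is a minimal sketch that assumes the value has already been read into a plain PHP string. `splitXpKeywords` is a hypothetical name, and the sample string is made up.

```php
<?php
// Hypothetical helper: turn a semicolon-separated XPKeywords string into a
// list of individual keywords.
function splitXpKeywords(string $raw): array
{
    // Split on semicolons and strip whitespace around each entry.
    $parts = array_map('trim', explode(';', $raw));

    // Drop empty entries, such as the one left by a trailing semicolon,
    // then renumber the keys from zero.
    $parts = array_filter($parts, function ($part) {
        return $part !== '';
    });
    return array_values($parts);
}

// Made-up sample value.
print_r(splitXpKeywords('delete; holiday;beach;'));
```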

bugmenot

In that case, this seems to be a problem of competing standards so I'll call this not a bug.  This code will read both types, and later the two lists can be merged:

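<?php
// EXIF side: XPKeywords, which PHP exposes as 'Keywords'
$data = @exif_read_data($filename);
var_dump($data['Keywords'] ?? ''); // semicolon-delimited string

// IPTC side: Keywords live in the APP13 block, which may be absent
getimagesize($filename, $info);
$iptc = isset($info['APP13']) ? iptcparse($info['APP13']) : false;
print_r($iptc['2#025'] ?? []); // array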
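
A minimal sketch of the merge step, assuming the IPTC keywords arrive as an array and the EXIF value as a semicolon-delimited string. `mergeKeywordLists` is a hypothetical helper name and the sample values are made up.

```php
<?php
// Hypothetical helper (not a PHP built-in): combine two keyword lists into
// one list without empty entries or exact duplicates.
function mergeKeywordLists(array $iptcKeywords, array $xpKeywords): array
{
    $all = array_map('trim', array_merge($iptcKeywords, $xpKeywords));
    $all = array_filter($all, function ($keyword) {
        return $keyword !== '';
    });
    // array_unique keeps the first occurrence; array_values renumbers the keys.
    return array_values(array_unique($all));
}

// Made-up values standing in for the IPTC array and the EXIF string.
$fromIptc = ['delete', 'beach'];
$fromExif = 'delete; sunset';

print_r(mergeKeywordLists($fromIptc, explode(';', $fromExif)));
```

The comparison is case-sensitive, so "Delete" and "delete" would both be kept; lowercase the entries first if that is not wanted.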


Thanks for the help StarGeek