not all IPTC data will be extracted - some is missing

Started by dbuchhorn, September 05, 2018, 08:08:55 AM


dbuchhorn

Hi Phil,

I have some images with IPTC data, but the data is not written as described in the IPTC standard. Tag 2:15 is included more than once. In this case the last one wins (all others are gone). Some non-standard tags are also present, for example tag 2:244. This tag appears three times, and here too the last one wins. I already tried the '-a' option, but I can't see/extract the missing data. Is there another option I can try, or did I find a bug?

Thanks

Dirk

StarGeek

Can you post an example command line and example file where this happens?
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

dbuchhorn

I created a test image. Only the IPTC tags 2:15 and 2:244 are included. Both tags have three different values.
I get the IPTC data with
>exiftool.exe -g -j -H -l -a -u -IPTC:all test.jpg

Result:
[{
  "SourceFile": "test.jpg",
  "IPTC": {
    "Category": {
      "desc": "Category",
      "id": "0x000f",
      "val": "C"
    },
    "IPTC_ApplicationRecord_244": {
      "desc": "IPTC Application Record 244",
      "id": "0x00f4",
      "val": "entry_3"
    }
  }
}]


Expected:
[{
  "SourceFile": "test.jpg",
  "IPTC": {
    "Category": {
      "desc": "Category",
      "id": "0x000f",
      "val": ["A","B","C"]
    },
    "IPTC_ApplicationRecord_244": {
      "desc": "IPTC Application Record 244",
      "id": "0x00f4",
      "val": ["entry_1","entry_2","entry_3"]
    }
  }
}]


dbuchhorn

If the XML output format (-X) or text output is used, then all information is included. So this only affects the JSON output format.

Phil Harvey

The JSON object names must be unique, so you only see one with -g.  You can make sure they all have unique names by using something like -g0:4.  To get the output you expected, the tag must be defined as a list-type tag, which it isn't because it has no definition. You can do this with a user-defined tag if you want.

- Phil
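
As a rough illustration of the uniqueness constraint described above, here is a minimal Python sketch. It is not ExifTool code: the `found_tags` list and the `#n` key suffix are made up for this example (ExifTool's own instance groups are named differently), and the only point is that repeated names collapse to the last value unless each copy gets a distinct name.

```python
import json

# Hypothetical list of extracted (name, value) pairs: three copies of one tag.
found_tags = [("Category", "A"), ("Category", "B"), ("Category", "C")]

# Writing every copy under the same object name: each later value
# overwrites the earlier one, so only the last copy survives.
collapsed = {}
for name, value in found_tags:
    collapsed[name] = value

# Giving each extra copy a distinct name keeps all values.
# The "#n" suffix is an invented naming scheme for this sketch only.
unique = {}
counts = {}
for name, value in found_tags:
    n = counts.get(name, 0)
    counts[name] = n + 1
    key = name if n == 0 else f"{name}#{n}"
    unique[key] = value

print(json.dumps(collapsed))
print(json.dumps(unique))
```

The first object keeps a single `Category` entry, while the second keeps all three values because every key is distinct.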

dbuchhorn

Thank you for your reply. I will use the copy groups; this fits my requirements.

I was very confused that the output differs so much between text/XML and JSON with the same options. So I looked at the ExifTool source code, and I think I now know why it is currently not possible (my last Perl work experience was 20 years ago, so some of this I can only guess).
The output is done in a foreach loop over all found tags. Only defined "list" tags have an array as a value; all other tags have only a single value. If there are duplicates of a non-list tag, then the tag is included n times in the found-tag list. If the output format is text or XML, duplicates can be printed, but for JSON the attribute name must be unique and each tag can only be written once (here the last tag is used).

Now I wonder whether the duplicates option has any effect on JSON output (without the -g0:4 option). If not, would it be possible to do, for instance, some pre-processing (only if JSON and the duplicates option are used) that converts duplicates to a single tag with an array as its value? In this case the value type would change for non-list tags, but the user asked for JSON output and duplicates.
I'm not familiar with the ExifTool internals and I don't know what effects these changes would have. But if this can be changed, then the output would contain the same data no matter which format is used.
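
The pre-processing idea could be sketched roughly as follows. This is a hypothetical Python illustration, not ExifTool code: `merge_duplicates` and the sample `found_tags` list are invented for this example, and it assumes the found tags are available as simple (name, value) pairs.

```python
def merge_duplicates(found_tags):
    """Fold repeated tag names into one entry whose value is a list.

    Hypothetical helper: tags seen once keep their single value,
    tags seen several times become a list of all values in order.
    """
    merged = {}
    repeated = set()  # names that have already been turned into lists
    for name, value in found_tags:
        if name not in merged:
            merged[name] = value
        elif name in repeated:
            merged[name].append(value)
        else:
            merged[name] = [merged[name], value]
            repeated.add(name)
    return merged


# Sample data modelled on the test image from this thread.
found_tags = [
    ("SourceFile", "test.jpg"),
    ("Category", "A"),
    ("Category", "B"),
    ("Category", "C"),
    ("IPTC_ApplicationRecord_244", "entry_1"),
    ("IPTC_ApplicationRecord_244", "entry_2"),
    ("IPTC_ApplicationRecord_244", "entry_3"),
]

result = merge_duplicates(found_tags)
```

Note that the value type depends on the data: `SourceFile` stays a plain string while `Category` becomes a list, which is the type change mentioned above.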

Thanks for your time

Dirk

Phil Harvey

Hi Dirk,

Sorry for the delay in responding.

I don't like the idea of ExifTool converting duplicates to list-type tags for the JSON output.  Among other problems, this will result in some tags being list-type in the JSON output but not in other formats.  The fact that duplicates are suppressed for JSON is simply a limitation of JSON that may be avoided with -g4.

- Phil