Command to extract the most possible metadata

Started by greybeard, October 03, 2024, 02:11:53 AM


greybeard

I'm looking for an Exiftool command that will extract the most possible metadata from an image file - but avoid the contents of embedded images, other binary components or similar.

Using the verbose option with a level 2 gets me close - it seems to include all the metadata tags I'm interested in but doesn't include the hex content of the tag value (which is important to me).

Level 3 gives me everything from level 2 plus the hex values, but it can also include tens of thousands of lines that I'm not interested in and that are difficult to remove in a script.

Here is an example from a Fujifilm RAF file and this is the level of information I'm looking for :

  + [IFD0 directory with 13 entries]
  | 0)  Make = FUJIFILM
  |     - Tag 0x010f (9 bytes, string[9]):
  |         00b6: 46 55 4a 49 46 49 4c 4d 00                      [FUJIFILM.]
  | 1)  Model = X-H2S
  |     - Tag 0x0110 (6 bytes, string[6]):
  |         00c0: 58 2d 48 32 53 00                               [X-H2S.]

Unfortunately it also includes at least 25,000 lines with the following that keep repeating:

JPEG SOS
JPEG RST0
JPEG RST1
JPEG RST2
JPEG RST3
JPEG RST4
JPEG RST5
JPEG RST6
JPEG RST7
JPEG RST0
JPEG RST1
JPEG RST2
JPEG RST3
JPEG RST4
JPEG RST5
JPEG RST6

Is there a way of getting everything from verbose level 3 but without these lines?

Phil Harvey

Currently there is no way to remove these RST lines from the -v3 output.

In ExifTool 12.98 I'll change these to only output for -v4 or higher.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).
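
For versions before 12.98, the marker lines can be filtered out after the fact. This is a minimal sketch, assuming the lines look exactly like the excerpt above (the words "JPEG SOS" or "JPEG RST0" to "JPEG RST7" alone on a line, possibly indented); `drop_jpeg_markers` is a hypothetical helper name, not part of ExifTool.

```python
import re

# Assumption: a marker line holds nothing but "JPEG SOS" or "JPEG RSTn",
# optionally preceded by the indentation/pipe characters that -v output uses.
MARKER_RE = re.compile(r"^[\s|]*JPEG (?:SOS|RST[0-7])\s*$")


def drop_jpeg_markers(lines):
    """Return the lines of -v3 output with JPEG SOS/RSTn marker lines removed."""
    return [line for line in lines if not MARKER_RE.match(line)]


if __name__ == "__main__":
    import sys

    # Usage: exiftool -v3 FILE | python drop_markers.py
    for kept in drop_jpeg_markers(sys.stdin.read().splitlines()):
        print(kept)
```

Matching the whole line keeps the filter from touching tag values that merely contain the text "JPEG RST".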

greybeard

Thanks - 12.98 works well.

I have another similar request - you can see here in this screenshot of my ExifTool-based macOS app. I'm trying to extract all the information available about a tag. Currently we have the option to display the formatted value and the unformatted value (using the -n option) but not the completely unformatted value in hex format (is this correct?). I'm getting it here by parsing the -v3 output but the code is prone to error.



The long(er) term objective is to gather everything available about a tag and display it in one place.
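
The kind of -v3 parsing described here could look roughly like the sketch below. It assumes the three line shapes seen in the RAF excerpt earlier in the thread (an "N)  Name = value" line, a "- Tag 0x.... (N bytes, format)" line, and hex dump lines); other line shapes in real output are simply ignored, and `parse_v3` is a hypothetical name.

```python
import re

# Patterns modelled only on the excerpt quoted in this thread.
NAME_RE = re.compile(r"^[\s|]*\d+\)\s+(\S+)(?: = (.*))?$")
TAG_RE = re.compile(r"^[\s|]*- Tag (0x[0-9a-fA-F]+) \((\d+) bytes, ([^)]*)\)")
HEX_RE = re.compile(r"^[\s|]*[0-9a-fA-F]+:((?: [0-9a-fA-F]{2})+)")

SAMPLE = """\
  + [IFD0 directory with 13 entries]
  | 0)  Make = FUJIFILM
  |     - Tag 0x010f (9 bytes, string[9]):
  |         00b6: 46 55 4a 49 46 49 4c 4d 00                      [FUJIFILM.]
  | 1)  Model = X-H2S
  |     - Tag 0x0110 (6 bytes, string[6]):
  |         00c0: 58 2d 48 32 53 00                               [X-H2S.]
"""


def parse_v3(text):
    """Collect name, value, tag ID, format, size and raw bytes per entry."""
    entries = []
    current = None
    for line in text.splitlines():
        m = NAME_RE.match(line)
        if m:
            current = {"name": m.group(1), "value": m.group(2),
                       "tag_id": None, "size": None, "format": None, "raw": b""}
            entries.append(current)
            continue
        if current is None:
            continue  # nothing to attach tag/hex lines to yet
        m = TAG_RE.match(line)
        if m:
            current["tag_id"] = int(m.group(1), 16)
            current["size"] = int(m.group(2))
            current["format"] = m.group(3)
            continue
        m = HEX_RE.match(line)
        if m:
            # Append the hex pairs of this dump line to the entry's raw bytes.
            current["raw"] += bytes.fromhex(m.group(1).replace(" ", ""))
    return entries


if __name__ == "__main__":
    for entry in parse_v3(SAMPLE):
        complete = len(entry["raw"]) == entry["size"]
        print(entry["name"], hex(entry["tag_id"]), entry["raw"].hex(), complete)
```

Comparing the number of collected bytes with the declared size gives a cheap check for dumps that were cut short or misparsed.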

Phil Harvey

This is correct.  The hex representation of a tag value is only provided by -v3 or higher.

- Phil

StarGeek

It probably wouldn't be a solution, depending upon how you are extracting the data, but a simple helper function could be used to convert the data to hex. But that would have to be applied either on a per-tag basis using the -p (-printFormat) option or globally using the -api Filter option.

I haven't tested this, but something like this in a config file
# Replace every character of $_ with its 4-digit hex character code
sub ToHex {
    s/(.)/sprintf '%04x', ord $1/seg;
}

Then use a wildcard to list all (I think) unknown tags
exiftool -U -api "Filter=ToHex" -*_0x* /path/to/files/
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

greybeard

Thanks StarGeek - there may be some solution here - but I think your regex converts the ASCII characters to their hex equivalents rather than showing the original raw data in hex format.

For example I have a 4 byte tag (int32u) which Exiftool shows as 4290903955 but the raw data in hex format is 0x93ffc1ff - not only would you have to convert decimal to hex in your routine but you would also have to take into account the endianness of the way the values are stored.

It's not difficult to format when you have all the info stored in the tag, but it's tricky to reverse engineer.
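
To make the difference concrete, here is a small sketch using the int32u value quoted above. It assumes the value is stored little-endian, which is what the raw bytes 93 ff c1 ff imply; `raw_hex` is a hypothetical helper name.

```python
import struct


def raw_hex(value, byte_order="<"):
    """Hex of a 32-bit unsigned integer as stored ('<' little, '>' big endian)."""
    return struct.pack(byte_order + "I", value).hex()


value = 4290903955

little = raw_hex(value, "<")  # '93ffc1ff', matches the raw data in the file
big = raw_hex(value, ">")     # 'ffc1ff93', same number with the other byte order

# What a per-character conversion produces instead: the hex codes of the
# ASCII digits "4", "2", "9", ... and not the stored bytes.
ascii_hex = "".join(f"{ord(c):04x}" for c in str(value))

print(little, big, ascii_hex)
```

Going from the decimal value back to the stored bytes only works when both the format (here int32u) and the byte order are known, which is the information that gets lost in the normal output.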

StarGeek

Quote from: greybeard on October 15, 2024, 10:46:04 AM
Thanks StarGeek - there may be some solution here - but I think your regex converts the ASCII characters to their hex equivalents rather than showing the original raw data in hex format.

Quite possibly. This is what popped up when I did a quick Google search for some Perl code. And it would really depend upon how you're extracting the data as to whether it would require a separate pass or not.

Phil Harvey

I should point out that even the hex representations shown in the -v3 output may not be actually what is stored in the file.  There may be a decryption that is done before this step.

- Phil

greybeard

That's OK - the objective is to support those trying to identify unknown tags. The values would need to be decrypted.




greybeard

Is there a command - similar to the v3 command - that shows all three types of output?

For example with ExposureProgram:
- transformed values such as "Aperture-Priority AE"
- non-transformed values such as 3
- hex values such as 0x0300

There are various commands that will give me the first and second and v3 will give me the second and third but nothing (as far as I can see) that will give me all three.

Trying to combine the output of commands such as "exiftool -a -G1" and "exiftool -v3" to provide all three formats appears close to impossible - there doesn't seem to be a common unique identifier (or set of identifiers) for each tag that can be used for matching.

Matching the output from "exiftool -a -G1" and "exiftool -a -G1 -n" is easy enough to provide the first two formats but then matching to the -v3 output is the problem.
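
The easy half of that matching can be sketched as follows. It uses JSON output (-j) so that the "Group:Tag" keys can serve as the join key; the helper names are hypothetical, and the sketch assumes no tag name repeats within a single family 1 group, since such tags would share a key.

```python
import json
import subprocess


def read_tags(path, numeric=False):
    """Run exiftool on one file and return its tags as a dict keyed 'Group:Tag'."""
    args = ["exiftool", "-j", "-a", "-G1"]
    if numeric:
        args.append("-n")  # disable print conversion
    result = subprocess.run(args + [path], capture_output=True, text=True, check=True)
    return json.loads(result.stdout)[0]  # -j returns a list with one object per file


def merge_formats(printed, numeric):
    """Pair the print-converted and numeric value of every tag by its key."""
    merged = {}
    for key, value in printed.items():
        if key == "SourceFile":
            continue  # bookkeeping entry added by exiftool, not a tag
        merged[key] = {"print": value, "numeric": numeric.get(key)}
    return merged


if __name__ == "__main__":
    import sys

    path = sys.argv[1]
    for key, pair in merge_formats(read_tags(path), read_tags(path, numeric=True)).items():
        print(key, pair["print"], pair["numeric"], sep="\t")
```

This gives the first two formats side by side; the raw hex from -v3 still has to be attached by some other key, which is the hard part described above.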

Phil Harvey

I would try something like this:

exiftool -j -H -l -v3 -all FILE

but it will still be tricky to match up the -v3 part with the JSON data.

- Phil

greybeard

Quote from: Phil Harvey on October 24, 2024, 10:54:36 AM
I would try something like this:

exiftool -j -H -l -v3 -all FILE

but it will still be tricky to match up the -v3 part with the JSON data.

- Phil


Thanks - I didn't realise you could combine the -v3 with other parameters and get the combined output. That will be useful.

But as you say, it's tricky to combine -v3 with other output (either JSON or non-JSON). I can get pretty close with the Fujifilm files that I'm used to, but Sony or Canon CR3 files, where there are lots of custom format groups, look imposing. I've tried to do it with a combination of tag name, hex tag ID and group name, but those group names are difficult to reconcile between the different output formats.