Diff between entries with flags: list, bag, baglist

Started by ScannerBoy, October 06, 2024, 03:57:15 PM


ScannerBoy

In trying to understand and properly display some metadata entries, I see in the 'Flags' field for the listx output both list & bag entries.
For example for:
<table name='HTML::dc' g0='HTML' g1='HTML-dc' g2='Document'>

<tag id='contributor' name='Contributor' type='?' writable='false' flags='Bag,List' g2='Author'/>

While I can surmise what lists and bags are (ordered versus unordered sets of data), the fact that one item could be both has me wondering what that 'really' means.
Or how one would display or allow editing for such an item?
Help :-)

Phil Harvey

From the ExifTool application documentation:

    The flags are formatted as a comma-separated list of the following possible
    values: Avoid, Binary, List, Mandatory, Permanent, Protected, Unknown and
    Unsafe (see the Tag Name documentation).  For XMP List tags, the list type
    (Alt, Bag or Seq) is added to the flags, and flattened structure tags are
    indicated by a Flattened flag with 'struct' giving the ID of the parent structure.


If this isn't clear, maybe it could be improved somehow.

ExifTool doesn't distinguish between the various types of XMP lists when writing.  It is up to the user to put them in the order they want if this is significant.
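As a rough illustration of reading these flags, here is a minimal Python sketch. It assumes the -listx output looks like the fragment quoted in the first post; the second tag (Title) and the helper names are hypothetical and only there to show a tag without flags. The idea is that 'List' says the tag holds multiple values, while Alt, Bag or Seq only names the XMP list type.

```python
import xml.etree.ElementTree as ET

# Sample modelled on the listx fragment quoted in this thread.
# The 'title' tag is a hypothetical addition to show a non-list tag.
SAMPLE = """<table name='HTML::dc' g0='HTML' g1='HTML-dc' g2='Document'>
 <tag id='contributor' name='Contributor' type='?' writable='false' flags='Bag,List' g2='Author'/>
 <tag id='title' name='Title' type='?' writable='false'/>
</table>"""

# XMP list types that the documentation says are added to the flags.
XMP_LIST_TYPES = {"Alt", "Bag", "Seq"}


def describe_flags(flags_attr):
    """Return (is_list, xmp_list_type) for a comma-separated flags string."""
    flags = {f.strip() for f in flags_attr.split(",") if f.strip()}
    is_list = "List" in flags
    # At most one of Alt/Bag/Seq is expected; None if the tag has none.
    list_type = next(iter(sorted(flags & XMP_LIST_TYPES)), None)
    return is_list, list_type


def list_tags(xml_text):
    """Map each tag name in the table to its (is_list, xmp_list_type) pair."""
    root = ET.fromstring(xml_text)
    return {
        tag.get("name"): describe_flags(tag.get("flags", ""))
        for tag in root.iter("tag")
    }


info = list_tags(SAMPLE)
print(info)
```

For a property sheet, one possible approach is to use the first value to pick a multi-value editor and the second only as a hint about whether order is meaningful.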

- Phil

...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

ScannerBoy

The only comment on the related documentation would be to show why a given XMP entry would/could be considered both ordered (a list) and unordered (a bag) - that is at the root of my question.

My main interest in this is trying to display the metadata in property sheet in a way which makes display, and potentially editing, easier and more intuitive.

For instance for ordered lists, it ought to be possible to change the order without any possibility of mistyping the replacement and/or via copy & paste, losing or altering/mistyping the separators, etc, etc
Ideally, I suppose I would need to read and interpret all of the various metadata specs, but as I am depending on ExifTool to read and write data, I am also depending on its view/interpretation of the metadata. Because of this, I am reading the ExifTool XML data and then extracting the necessary information from it.
IMO & FWIW, Exiftool is the most up-to-date and best maintained library of its kind for metadata. Hence it is also used as a foundational part of many apps which display metadata. This excellence is also most likely due to the non-Windows development environment it came from.

A related issue, for me, is to find example apps which show, read and write the data, which would help me understand and then handle similar entries in my app, which is mainly intended to handle metadata for genealogy work. As well, finding and exploring suitable candidate pictures for the various (correctly implemented) options/entries is a challenge.

I was very impressed with ExiftoolGUI by Bogdan Hrastnik and tried to re-invent/extend his app when he stopped working on it.
At the time I was mostly working under Windows and C++, but would now prefer to work with a more OS agnostic app using Python along with wxPython. The main initial motivation was to make the app portable between Win & Linux - mainly as a companion for Gramps.
Among the other reasons for moving away from Windows are the many issues related to text encoding. As it seems so much of the development code I have found ignores any encoding issues related to 'foreign' - read NON-ASCII - alphabets. And for genealogy work, the need to handle those is a given.
Unfortunately, an encoding-related issue also seems to be present in the PyExiftool library when it runs under Win (11), and right now that has this part of the project stopped cold.

FrankB

Quote from: ScannerBoy on October 07, 2024, 02:18:02 PM
I was very impressed with ExiftoolGUI by Bogdan Hrastnik and tried to re-invent/extend his app when he stopped working on it.

Have a look at Version 6:
https://github.com/FrankBijnen/ExifToolGui


Quote from: ScannerBoy on October 07, 2024, 02:18:02 PM
Among the other reasons for moving away from Windows are the many issues related to text encoding. As it seems so much of the development code I have found ignores any encoding issues related to 'foreign' - read NON-ASCII - alphabets. And for genealogy work, the need to handle those is a given.

I believe I have fixed all encoding issues in ExifToolGui.

In ExifToolGui, lists (or Bags) are displayed with a 'Separator char' configurable in Preferences. You can add or delete an item from a list by prefixing it with a + or -.
More or less the same as with an ExifTool command.
You can not reorder the items. They usually appear in the order they were added.
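For reference, the ExifTool command-line forms this resembles can be sketched in Python as below. This only builds argument strings; it is not how ExifToolGui does it internally, and the helper names and the XMP-dc:Subject example tag are assumptions for illustration. Since items cannot be reordered in place, one possible workaround is to write the whole list again in the wanted order.

```python
def add_item_args(tag, value):
    """Argument that adds one item to a list tag (-TAG+=VALUE)."""
    return [f"-{tag}+={value}"]


def remove_item_args(tag, value):
    """Argument that removes one matching item from a list tag (-TAG-=VALUE)."""
    return [f"-{tag}-={value}"]


def rewrite_list_args(tag, items):
    """Arguments that replace the whole list with items, in the given order.

    Repeating -TAG=VALUE once per item writes the items in that order.
    An empty list becomes -TAG=, which deletes the tag.
    """
    if not items:
        return [f"-{tag}="]
    return [f"-{tag}={item}" for item in items]


# Example: put 'Smith' before 'Jones' by rewriting the complete list.
args = rewrite_list_args("XMP-dc:Subject", ["Smith", "Jones"])
print(args)
```

An editor could keep the list in memory, let the user drag items around, and only emit the rewrite arguments when the order actually changed.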

Phil Harvey

Quote from: ScannerBoy on October 07, 2024, 02:18:02 PM
The only comment on the related documentation would be to show why a given XMP entry would/could be considered both ordered (a list) and unordered (a bag) - that is at the root of my question.

XMP Bag, Seq and Alt are just different types of lists.  ExifTool doesn't distinguish between these in its interface because the difference only affects the values that the user would write, not how they are written.  For the ExifTool interface, XMP Bag, Seq and Alt lists all follow the rules for an ExifTool List tag.

- Phil

ScannerBoy

@Phil: Thank you for the information. As I find more instances of data in those fields, it will help me understand things a bit more.
@FrankB: Thank you for taking on the job of updating and maintaining(?) Bogdan's very useful front-end. I will certainly use it as my reference and report any questions which might come up.
In fact, as part of my initial testing with your current version 6.3.5, I have run into a curious display of some of the IPTC data: Supp Categories, Keywords, Headline & Caption Abstract.
I'll attach the image, though I have no information how that data was added. I am almost certain that it was added by myself, but the software used or process is lost in the dust of the past. However it was done, did not leave any traces in any field I would have expected.

Phil Harvey

The file a1.jpg looks to be a test of special characters in IPTC.  The characters are written in Windows Latin, but the CodedCharacterSet specifies UTF-8 so they won't display correctly.

- Phil

ScannerBoy

Initially, that was my suspicion, but after I managed to open a DOS prompt and run Exiftool against that file, I get for IPTC:
[IPTC]          Application Record Version      : 4
[IPTC]          Urgency                         : 9 (user-defined priority)
[IPTC]          Keywords                        : 01.07.2013, ä ae Ä Ae ü ue Ü Ue ö oeÖ Oe ß ss
[IPTC]          Time Created                    : 09:28:32+00:00
[IPTC]          Digital Creation Date           : 2013:06:28
[IPTC]          Headline                        : ä -> ae ö -> oe ü -> ue Ä -> Ae Ö -> Oe Ü -> Ue ß -> ss
[IPTC]          Caption-Abstract                : ä -> ae ö -> oe ü -> ue Ä -> Ae Ö -> Oe Ü -> Ue ß -> ss

From that, it would seem that something gets lost for the GUI display, at least for IPTC.
Also, the GUI shows the UTF8 line several times.

For XMP Supplemental Categories, Description and Subject, the problem may well be character set encoding; I have no ideas on that problem.

EtGui-1.png

FrankB

FWIW:
I compared GUI V516 and V635. Looks the same to me.
compare_5_6_iptc.jpg

I don't get your results with my codepage (850, or 65001). It works with 1250; is that your codepage?
Anyway, try to write to IPTC and XMP in UTF8. I believe that is the default for Exiftool. Works best in my opinion.

K:\test\ScannerBoy>chcp
Active code page: 850

K:\test\ScannerBoy>exiftool -iptc:all a1.jpg
Coded Character Set             : UTF8
Application Record Version      : 4
Urgency                         : 9 (user-defined priority)
Keywords                        : 01.07.2013, õ ae ─ Ae ³ ue ▄ Ue ÷ oeÍ Oe ▀ ss
Time Created                    : 09:28:32+00:00
Digital Creation Date           : 2013:06:28
Headline                        : õ -> ae ÷ -> oe ³ -> ue ─ -> Ae Í -> Oe ▄ -> Ue ▀ -> ss
Caption-Abstract                : õ -> ae ÷ -> oe ³ -> ue ─ -> Ae Í -> Oe ▄ -> Ue ▀ -> ss

K:\test\ScannerBoy>chcp 1250
Active code page: 1250

K:\test\ScannerBoy>exiftool -iptc:all a1.jpg
Coded Character Set             : UTF8
Application Record Version      : 4
Urgency                         : 9 (user-defined priority)
Keywords                        : 01.07.2013, ä ae Ä Ae ü ue Ü Ue ö oeÖ Oe ß ss
Time Created                    : 09:28:32+00:00
Digital Creation Date           : 2013:06:28
Headline                        : ä -> ae ö -> oe ü -> ue Ä -> Ae Ö -> Oe Ü -> Ue ß -> ss
Caption-Abstract                : ä -> ae ö -> oe ü -> ue Ä -> Ae Ö -> Oe Ü -> Ue ß -> ss

K:\test\ScannerBoy>chcp 65001
Active code page: 65001

K:\test\ScannerBoy>exiftool -iptc:all a1.jpg
Coded Character Set             : UTF8
Application Record Version      : 4
Urgency                         : 9 (user-defined priority)
Keywords                        : 01.07.2013,  ae  Ae  ue  Ue  oe Oe  ss
Time Created                    : 09:28:32+00:00
Digital Creation Date           : 2013:06:28
Headline                        :  -> ae  -> oe  -> ue  -> Ae  -> Oe  -> Ue  -> ss
Caption-Abstract                :  -> ae  -> oe  -> ue  -> Ae  -> Oe  -> Ue  -> ss

K:\test\ScannerBoy>

ScannerBoy

On the topic of code pages, it seems I am lost.
If I run chcp in the Win 11 DOS box or at the PowerShell prompt, I get 437

running ExifTool against the same test file (apparently using cp 437) I get:
D:\pkg\python\wpET> exiftool -iptc:all D:\TestImages\jpgMetaTestFiles\Umlaut\images\a1.jpg
Coded Character Set             : UTF8
Application Record Version      : 4
Urgency                         : 9 (user-defined priority)
Keywords                        : 01.07.2013, Σ ae ─ Ae ⁿ ue ▄ Ue ÷ oe╓ Oe ▀ ss
Time Created                    : 09:28:32+00:00
Digital Creation Date           : 2013:06:28
Headline                        : Σ -> ae ÷ -> oe ⁿ -> ue ─ -> Ae ╓ -> Oe ▄ -> Ue ▀ -> ss
Caption-Abstract                : Σ -> ae ÷ -> oe ⁿ -> ue ─ -> Ae ╓ -> Oe ▄ -> Ue ▀ -> ss

Which is quite different from the output from the current ExifToolGUI running on the very same PC.
How/why does ETG display the data with, what seems to be, using a different code page?
Where is this specified or set up?
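The different console outputs above are consistent with the same Windows Latin bytes simply being shown through different code pages. A small Python sketch of that effect follows; it assumes the stored bytes are cp1252 (Windows Latin 1), which is an interpretation of Phil's remark, and uses only Python's built-in codecs.

```python
# The umlauts as they would be stored if written in Windows Latin 1 (cp1252).
text = "ä ö ü Ä Ö Ü ß"
raw = text.encode("cp1252")

# Interpret the very same bytes with the code pages tried in this thread.
views = {cp: raw.decode(cp) for cp in ("cp850", "cp437", "cp1250")}

# These bytes are not valid UTF-8, so a strict UTF-8 reader cannot show them;
# errors="replace" substitutes U+FFFD for each undecodable byte.
views["utf-8"] = raw.decode("utf-8", errors="replace")

for name, shown in views.items():
    print(f"{name:7}: {shown}")
```

The cp850 and cp437 lines reproduce the characters in the two garbled console listings, and cp1250 happens to map these particular bytes to the intended letters. None of this changes what is in the file; only the interpretation differs.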

FrankB

@Moderators (Phil, Stargeek)
Maybe better to move this thread to ExifToolGui. It's getting Gui specific.

@ScannerBoy.
Gui doesn't use a specific Codepage. Internally it uses UTF-16, to interface with ExifTool UTF-8.
Maybe do a small test and write this string 'ä -> ae ö -> oe ü -> ue Ä -> Ae Ö -> Oe Ü -> Ue ß -> ss' using the WorkSpace to an IPTC tag. Should work.

ScannerBoy

Done and it worked perfectly. Thank you for updating ETG.
Just one question: I am curious why the new ETG does not default to replacing the EXIF software with the ExiftoolGUI string.

FrankB

Glad it worked!

Quote from: ScannerBoy on October 08, 2024, 08:52:47 PM
Just one question: I am curious why the new ETG does not default to replacing the EXIF software with the ExiftoolGUI string

Honestly, I don't know. Version 516 did not do that, and it never crossed my mind to add that. But the same question could be asked for ExifTool itself. After all, that's the software that's writing it.

A few more (lengthy) remarks.
* The line CodedCharacterSet appears multiple times.

You can control that with Options/Don't show duplicated tags. That in turn adds, or leaves out, the -a option to ExifTool.

* Codepages

Simply put, they only work for non-Windows programs, like the CMD prompt. I don't want to rely on codepages in GUI, because to open a file I would need to know with which codepage it was written.
If a file is written on a German computer, with Umlauts, I want it displayed the same on my Dutch computer.
It could even be that a file contains German and let's say Greek or Chinese characters. No Codepage can solve that.
A well-known codepage that can handle UTF-8 is 65001. That is used when creating a CMD file. (See below)

That's why I decided to use only UTF-16 internally and UTF-8 to interface with ExifTool. And this is not configurable.

* Changes made to GUI to support UTF-8

- The options
-CHARSET
FILENAME=UTF8
-CHARSET
UTF8
-API
WindowsWideFile=1
are always passed to ExifTool. Not configurable.
The options CharSet Filename and WindowsWideFile are added to allow international characters in directory/file names.

- All commands/parameters are passed via an args file 'exiftool -@ <args>'. This (temporary) args file is written as UTF-8.
Passing parameters on the command line can lead to problems, probably because Windows uses a codepage to encode them. An args file is sent as-is to ExifTool.

- Stdout/StdErr pipes, which contain ExifTool's output, are read as UTF-8 and converted to UTF-16 for internal use. Because of the API options we can be sure they are UTF-8.
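
The same approach can be sketched in Python for anyone driving ExifTool from a script. This is not the GUI's code (that is Delphi, see GitHub); the function names are hypothetical, and it assumes an 'exiftool' executable on the PATH. It writes one argument per line to a UTF-8 args file, passes it with -@, and decodes the pipes as UTF-8 instead of relying on the console codepage.

```python
import subprocess
import tempfile
from pathlib import Path


def write_args_file(args, directory=None):
    """Write one argument per line to a UTF-8 args file and return its path."""
    handle = tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", newline="\n",
        suffix=".args", dir=directory, delete=False,
    )
    with handle:
        for arg in args:
            handle.write(arg + "\n")
    return Path(handle.name)


def run_exiftool(args, exe="exiftool"):
    """Run ExifTool via a temporary args file; return (stdout, stderr) as str."""
    args_file = write_args_file(args)
    try:
        proc = subprocess.run(
            [exe, "-@", str(args_file)],
            capture_output=True,  # read the raw bytes from both pipes
            check=False,
        )
    finally:
        args_file.unlink()
    # Decode explicitly as UTF-8 rather than with the locale/console codepage.
    return (
        proc.stdout.decode("utf-8", errors="replace"),
        proc.stderr.decode("utf-8", errors="replace"),
    )


# Demonstration without calling ExifTool: show what the args file contains.
demo_args = ["-charset", "filename=UTF8", "-iptc:all", "Umlaut/ä.jpg"]
demo_file = write_args_file(demo_args)
demo_text = demo_file.read_text(encoding="utf-8")
demo_file.unlink()
print(demo_text)
```

With ExifTool installed, run_exiftool(demo_args) would return the decoded output. The file name in demo_args is made up for the example.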

Have a look at the GitHub's source code. If you have any questions let me know.

* Using ExifTool direct, the Log Window and create CMD/PS files.

- This method may be useful when debugging problems.

- Select the file(s), Open the log window and type the ExifTool Direct command. '-iptc:all'
- The log window shows the commands, the output and the error.
et_direct.jpg

- Click on CMD prompt, or Powershell, to create the script. (Note you have the option 'CmdLine', that will pass the commands via parameters. But should only be used for ANSI)
If you have a look at the generated script, you'll see that it uses an Args file and Codepage 65001
cmd.jpg

- You can now execute the script.
output.jpg

Frank

Phil Harvey

Quote from: FrankB on October 09, 2024, 02:05:58 AM
Quote from: ScannerBoy on October 08, 2024, 08:52:47 PM
Just one question: I am curious why the new ETG does not default to replacing the EXIF software with the ExiftoolGUI string

Honestly, I don't know. Version 516 did not do that, and it never crossed my mind to add that. But the same question could be asked for ExifTool itself.

I think that most ExifTool users would prefer that ExifTool only writes what they tell it to, with as few side-effects as possible.

Quote
* Changes made to GUI to support UTF-8

- The options
-CHARSET
FILENAME=UTF8
-CHARSET
UTF8
-API
WindowsWideFile=1

-CHARSET UTF8 is not necessary because this is the default.

- Phil

ScannerBoy

@Phil: Your explanation makes perfect sense - on second thought :-)
@FrankB: Thank you for the detailed information on using ETG as well as for fixing the issues I had run into with the older version(s).
As handling alternate languages is crucial for my needs, I had more or less given up on ETG as it was. The new information will definitely make it my go-to app for understanding metadata. The possibility of using ETG as a sort of microscope/debugger to see the lower level details will be invaluable going forward.
For a number of reasons, I still may have, or choose, to come up with a similar app for Linux, as the genealogy software I have decided on (Gramps) runs best on Linux and is also Python based. One of the (for me important) features that could be improved in it is the display, recording and usage of image/record metadata in the places where it belongs - the records themselves.
Having both a well maintained, reliable and accurate library and a GUI front end for it is a definite boon, since sharing images and other files on the LAN between differing OSes is easy enough :-)