Exif Tag repository as SQLite database

Started by Wernfried, February 22, 2017, 03:28:55 AM

Previous topic - Next topic

Wernfried

Hi

Did somebody ever liked to search specific tag name or information but was not able fo find it on Phil's page?

Of course, you can use 'exiftool -listx' and start learning X-Path but I was looking for a more convenient solution. So I wrote this little perl script below which writes all information into a SQLite database. If you are familar with basic SQL you can query this database, examples:


c:\Temp>sqlite3 Exif.db ".headers on" "select * from v_tags where tag like '%bitspercomponent%'"
group_id|g0|g1|g2|tag|type|writable
IPTC::NewsPhoto|IPTC|IPTC|Image|BitsPerComponent|int8u|true
Jpeg2000::ImageHeader|Jpeg2000|Jpeg2000|Image|BitsPerComponent|int8u|false
Jpeg2000::Main|Jpeg2000|Jpeg2000|Image|BitsPerComponent|?|false

c:\Temp>sqlite3 Exif.db ".headers on" "select * from v_values_en where tag = 'FocusContinuous';"
group_id|g0|g1|g2|tag|type|writable|value|group_name|tag_name|value_name
Canon::CameraSettings|MakerNotes|Canon|Camera|FocusContinuous|int16s|true|0|Canon CameraSettings|Focus Continuous|Single
Canon::CameraSettings|MakerNotes|Canon|Camera|FocusContinuous|int16s|true|1|Canon CameraSettings|Focus Continuous|Continuous
Canon::CameraSettings|MakerNotes|Canon|Camera|FocusContinuous|int16s|true|8|Canon CameraSettings|Focus Continuous|Manual


If you prefer a GUI with a database browser there is a bunch of free SQLite database managers available.
Feel free to execute/copy/distribute this perl script for your own use.



use strict;
use XML::Simple;
use Image::ExifTool::TagInfoXML;
use DBI;

my $db_file = shift @ARGV;
die "Specify filename for SQLite database, e.g. 'c:/Temp/Exif.db'" unless ( $db_file );

print "\nCreate tag information database in XML format\n";
open(my $oldout, ">&STDOUT") or die "Can't dup STDOUT: $!";
close STDOUT;

my $xml_string;
open(STDOUT, '>', \$xml_string) or die "Can't open STDOUT: $!";
Image::ExifTool::TagInfoXML::Write(undef);
close STDOUT;
open(STDOUT, ">&", $oldout) or die "Can't dup \$oldout: $!";

print "Parse XML information\n";
my $xml = XMLin($xml_string, ForceArray => 1 );

print "Open SQLite database '$db_file' and create schema\n";
my $sqlite = DBI->connect("dbi:SQLite:dbname=$db_file", undef, undef, { RaiseError => 1 }) or die "Cannot open SQLite DB: $!";
CreateSchema();

my ( $group_pk, $tag_pk, $value_pk) = (0,0,0);
my ( %tag_lang, $sql) ;

my $insert_group = $sqlite->prepare("insert into groups (name, g0, g1, g2, ID) values (?,?,?,?,?)");
my $insert_group_desc = $sqlite->prepare("insert into group_desc (lang, text, group_ID) values (?,?,?)");
my $insert_tag = $sqlite->prepare("insert into tags (name, tag_id, type, writable, tag_index, group_ID, ID) values (?,?,?,?,?,?,?)");
my $insert_tag_desc = $sqlite->prepare("insert into tag_desc (lang, text, tag_ID) values (?,?,?)");
my $insert_val = $sqlite->prepare("insert into value (value, value_index, tag_ID, ID) values (?,?,?,?)");
my $insert_val_desc = $sqlite->prepare("insert into value_desc (lang, text, value_ID) values (?,?,?)");

foreach ( sort keys %{$$xml{table}}) {
print "Insert tags for $_\n";
$sqlite->do("BEGIN TRANSACTION");
my %group = %{$$xml{table}{$_}};
$group_pk++;
$insert_group->execute($_, $group{g0}, $group{g1}, $group{g2}, $group_pk);
foreach my $desc ( @{$group{desc}} ) {
$insert_group_desc->execute($$desc{lang}, $$desc{content}, $group_pk);
}
foreach ( keys %{$group{tag}}) {
my %tag = %{$group{tag}{$_}};
$tag_pk++;
$insert_tag->execute($_, $tag{id}, $tag{type}, $tag{writable}, $tag{index}, $group_pk, $tag_pk);
foreach my $desc ( @{$tag{desc}} ) {
$insert_tag_desc->execute($$desc{lang}, $$desc{content}, $tag_pk);
$tag_lang{$$desc{lang}}++;
}
foreach my $values ( @{$tag{values}} ) {
foreach ( keys %{$$values{key}} ) {
my %value = %{$$values{key}{$_}};
$value_pk++;
$insert_val->execute($_, $value{index}, $tag_pk, $value_pk);
foreach my $desc ( @{$value{val}} ) {
$insert_val_desc->execute($$desc{lang}, $$desc{content}, $value_pk);
}
}
}
}
$sqlite->do("COMMIT");
}

foreach my $lang ( keys %tag_lang ) {
print "Create views for '$lang'\n";
$sql =
"create view IF NOT EXISTS v_tags_${lang} as
select g.name as group_id, g0, g1, g2, t.name as tag, t.type, writable,
gl.text as group_name, tl.text as tag_name
from groups g
join tags t on g.ID = t.group_ID
left outer join group_desc gl on g.ID = gl.group_ID and gl.lang = '${lang}'
left outer join tag_desc tl on t.ID = tl.tag_ID and tl.lang = '${lang}'";
$sqlite->do($sql);

$sql =
"create view IF NOT EXISTS v_values_${lang} as
select g.name as group_id, g0, g1, g2, t.name as tag, t.type, writable, v.value,   
gl.text as group_name, tl.text as tag_name, vl.text as value_name
from groups g
join tags t on g.ID = t.group_ID
join value v on t.ID = v.tag_ID
left outer join group_desc gl on g.ID = gl.group_ID and gl.lang = '${lang}'
left outer join tag_desc tl on t.ID = tl.tag_ID and tl.lang = '${lang}'
left outer join value_desc vl on v.ID = vl.value_ID and vl.lang = '${lang}'";
$sqlite->do($sql);

}

$sqlite->disconnect;

print "\nAll done\n";


sub CreateSchema {

$sqlite->do("drop view IF EXISTS v_tags");
$sqlite->do("drop view IF EXISTS v_values");

$sqlite->do("DROP TABLE IF EXISTS value_desc");
$sqlite->do("DROP TABLE IF EXISTS value");
$sqlite->do("DROP TABLE IF EXISTS tag_desc");
$sqlite->do("DROP TABLE IF EXISTS tags");
$sqlite->do("DROP TABLE IF EXISTS group_desc");
$sqlite->do("DROP TABLE IF EXISTS groups");

my $sql =
"CREATE TABLE groups (
name TEXT NOT NULL,
g0 TEXT,
g1 TEXT,
g2 TEXT,
ID INTEGER NOT NULL CONSTRAINT group_pk PRIMARY KEY ON CONFLICT FAIL
)";
$sqlite->do($sql);
$sqlite->do("CREATE UNIQUE INDEX group_pk ON groups (ID)");


$sql =
"CREATE TABLE group_desc (
lang TEXT NOT NULL,
text TEXT,
group_ID INTEGER NOT NULL,
FOREIGN KEY(group_ID) REFERENCES groups(ID),
CONSTRAINT group_lang_uk UNIQUE (group_ID, lang) ON CONFLICT FAIL
)";
$sqlite->do($sql);
$sqlite->do("CREATE INDEX group_fk ON group_desc (group_ID)");

$sql =
"CREATE TABLE tags (
name TEXT,
tag_id TEXT NOT NULL,
type TEXT,
writable TEXT,
tag_index INTEGER,
group_ID INTEGER NOT NULL,
ID INTEGER NOT NULL CONSTRAINT tag_pk PRIMARY KEY ON CONFLICT FAIL,
FOREIGN KEY(group_ID) REFERENCES groups(ID),
CONSTRAINT tag_uk UNIQUE (group_ID, tag_id, tag_index) ON CONFLICT FAIL
)";
$sqlite->do($sql);
$sqlite->do("CREATE UNIQUE INDEX tag_pk ON tags (ID)");

$sql =
"CREATE TABLE tag_desc (
lang TEXT NOT NULL,
text TEXT,
tag_ID INTEGER NOT NULL,
FOREIGN KEY(tag_ID) REFERENCES tags(ID),
CONSTRAINT tag_lang_uk UNIQUE (tag_ID, lang) ON CONFLICT FAIL
)";
$sqlite->do($sql);
$sqlite->do("CREATE INDEX tag_fk ON tag_desc (tag_ID)");

$sql =
"CREATE TABLE value (
value TEXT NOT NULL,
value_index INTEGER,
tag_ID INTEGER NOT NULL,
ID INTEGER NOT NULL CONSTRAINT value_pk PRIMARY KEY ON CONFLICT FAIL,
FOREIGN KEY(tag_ID) REFERENCES tags(ID),
CONSTRAINT value_uk UNIQUE (tag_ID, value, value_index) ON CONFLICT FAIL
)";
$sqlite->do($sql);
$sqlite->do("CREATE UNIQUE INDEX value_pk ON value (ID)");

$sql =
"CREATE TABLE value_desc (
lang TEXT NOT NULL,
text TEXT,
value_ID INTEGER NOT NULL,
FOREIGN KEY(value_ID) REFERENCES value(ID),
CONSTRAINT value_lang_uk UNIQUE (value_ID, lang) ON CONFLICT FAIL
)";
$sqlite->do($sql);
$sqlite->do("CREATE INDEX value_fk ON value_desc (value_ID)");

$sql =
"create view v_tags as
select g.name as group_id, g0, g1, g2, t.name as tag, t.type, writable
from groups g
join tags t on g.ID = t.group_ID";
$sqlite->do($sql);

$sql =
"create view v_values as
select g.name as group_id, g0, g1, g2, t.name as tag, t.type, writable, v.value 
from groups g
join tags t on g.ID = t.group_ID
join value v on t.ID = v.tag_ID";
$sqlite->do($sql);

}



Best Regards
Wernfried

Hayo Baan

Hayo Baan – Photography
Web: www.hayobaan.nl

Phil Harvey

#2
Hi Wernfried,

This looks very useful.

Problem:  When I run your script I get a pile of warnings:

Warning: <tag> element has non-unique value in 'name' key attribute

then two of these errors:

DBD::SQLite::db do failed: no such column: v.value

and when I try to execute your commands, this happens:

> ls -l Exif.db
-rw-r--r--  1 phil  staff  22528 Feb 22 07:14 Exif.db
> sqlite3 Exif.db ".headers on" "select * from v_tags where tag like '%bitspercomponent%'"
> sqlite3 Exif.db ".headers on" "select * from v_values_en where tag = 'FocusContinuous';"
Error: no such table: v_values_en
> sqlite3 --version
3.15.2 2016-11-28 19:13:37 bbd85d235f7037c6a033a9690534391ffeacecc8


- Phil

Edit:  Ah.  I found the problem.  It was due to a bug in the SMF forum (which I reported a long time ago, but hasn't been fixed) which converts spaces to Unicode non-breaking spaces in code blocks.  Perl doesn't like these.  Once I fixed this, it worked fine (but I still get the "non-unique value" warnings).  The database is now 14 MB instead of 22 kB, which is much more reasonable.

Edit2: I find that if I set "KeyAttr => { }" in the options to XMLin, the warnings go away.  But unfortunately then the returned XML isn't parsed properly by the script.

Edit3: I have modified the script to parse the output properly with "KeyAttr => { }".  Now the script exports all tags, even if they have the same name.  (I hope I did this correctly -- I don't have much SQL experience.)  This fixes the problems with the warnings, and speeds up the script immensely (apparently the warnings slow it down for some reason).  Also, I have added a "flags" column to the tag information.  Attached is my current version of the script.
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Wernfried

#3
Hi

Ok, I modified my perl to export also the "flags". In contradiction to Phil's version it always export the whole database not a single group. See attached file.


Image::ExifTool::TagInfoXML::Write(undef, undef, Flags => 1);


And I added column tag_id to the views

Update: Also changed column names to be "less ambiguous".

Best Regards
Wernfried

Update 2: But it also works when I put a group, but I cannot reproduce your problem, i.e. I don't get any "non-unique value" warning

Phil Harvey

You may not get the warnings because of a different XML::Simple version.  Mine is version 2.20.  Regardless, I don't think that your script will export tags with the same name in the same table.  This is the major change in my version (adding the "KeyAttr => { }" option to the XMLin call, and the necessary changes to the parsing of the returned tables).

Putting in the group has nothing to do with this.  I added this as an option just so I could generate a smaller test database more quickly.

- Phil

BTW, you have some serious SQL talents.  Impressive.
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Wernfried

Quote from: Phil Harvey on February 22, 2017, 11:36:57 AM
You may not get the warnings because of a different XML::Simple version.  Mine is version 2.20.  Regardless, I don't think that your script will export tags with the same name in the same table.  This is the major change in my version (adding the "KeyAttr => { }" option to the XMLin call, and the necessary changes to the parsing of the returned tables).

Hi Phil

I have also version 2.20 but I don't get this warning. But I get error when I use use XML::Simple qw(:strict); which seems to be even more secure.
But you are right, non-unique tag names (i.e. tag 'id') are not inserted, your code is correct!

The attached script considers your changes and also my minor changes on database schema. I think now it is complete.

Some readable examples:


c:\Temp>sqlite3 Exif.db
SQLite version 3.12.0 2016-03-29 10:14:15
Enter ".help" for usage hints.
sqlite> .headers on
sqlite> .mode column
sqlite> select * from v_tags where tag like '%bitspercomponent%';
group_id         g0          g1          g2          tag_id      tag_index   tag               type        writable    flags
---------------  ----------  ----------  ----------  ----------  ----------  ----------------  ----------  ----------  ----------
IPTC::NewsPhoto  IPTC        IPTC        Image       135                     BitsPerComponent  int8u       true
Jpeg2000::Image  Jpeg2000    Jpeg2000    Image       10                      BitsPerComponent  int8u       false
Jpeg2000::Main   Jpeg2000    Jpeg2000    Image       bpcc                    BitsPerComponent  ?           false
sqlite> .quit


c:\Temp>sqlite3 Exif.db ".headers on" ".mode column" "select * from v_values_en where tag = 'FocusContinuous'"
group_id               g0          g1          g2          tag_id      tag_index   tag              type        writable    flags       value       group_name            tag_name          value_name
---------------------  ----------  ----------  ----------  ----------  ----------  ---------------  ----------  ----------  ----------  ----------  --------------------  ----------------  ----------
Canon::CameraSettings  MakerNotes  Canon       Camera      32                      FocusContinuous  int16s      true        Permanent   0           Canon CameraSettings  Focus Continuous  Single
Canon::CameraSettings  MakerNotes  Canon       Camera      32                      FocusContinuous  int16s      true        Permanent   1           Canon CameraSettings  Focus Continuous  Continuous
Canon::CameraSettings  MakerNotes  Canon       Camera      32                      FocusContinuous  int16s      true        Permanent   8           Canon CameraSettings  Focus Continuous  Manual


Best Regards
Wernfried

Phil Harvey

Hi Wernfried,

Great, thanks!  Your version is now looking very similar to mine (I had already changed some variable names in the same way you did in this latest version).

The only thing that I think should be improved is to add the ExifTool version number to the database.

As well, a minor change to a print statement to avoid a warning about an undefined variable:

print "\nCreating tag information database in XML format", $grpName ? " for group $grpName" : '' , "\n";

I am thinking about making this available from the Exiftool home page, but I will need to document it first.

- Phil

...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Wernfried

Quote from: Phil Harvey on February 23, 2017, 07:27:39 AM
The only thing that I think should be improved is to add the ExifTool version number to the database.

Hi Phil

No problem, I will add the table. Is there a direct way to retrieve the version number of Exiftool? Otherwise I have to "grep" it from XML, because XMKL::Simple does not provide access to XML comments.

Wernfried

Wernfried

Hi Phil

I use my $version = qx/exiftool -ver/;, I assume your know better way - if available.

Please note, unlike all other tables and views table 'version' is not dropped in case you use an existing SQLite file, I think this makes sense. Otherwise it would  be quite obvious how to drop it.

How does the documentation should look like? Perhaps I can give some support on that.

Best Regards
Wernfried

Phil Harvey

Hi Wernfried,

Thanks for adding the version number.

The way to get the ExifTool version is like this:

use Image::ExifTool;
my $ver = $Image::ExifTool::VERSION;


I was just thinking that the table schema and views need documenting, but I think I can do that myself.

I'm also thinking about renaming some variables to make them more consistent with ExifTool terminology ("groups" renamed to "tables", etc), although I realize that it conflicts somewhat with SQL terminology.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Wernfried

Hi Phil

"GROUP", "TABLE", "VALUES" are reserved words in SQL, you should not use them for any objects like table or column names. Otherwise you would need some dirty tricks to do that.
However "GROUPS" or "TABLES" or "VALUE" should be no problem.

Also note, expressions in SQLite are not case-sensitive (like most other database engines), i.e. it does not matter whether you use "Table" or "table" or "TABLE" - they all will raise a "Syntax error"

Wernfried



Phil Harvey

Hi Wernfried,

Thanks.  Yes, I have been keeping a close eye on the SQL reserved keywords.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Wernfried

Since it is an SQLite Database (not SQL Server), better refer to this list: SQLite Keywords

Wernfried

Phil Harvey

...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Phil Harvey

Just found a minor problem:  The value_index wasn't being set properly.  Fix this by replacing "$value{index}" with "$$values{index}" in the script.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Phil Harvey

I've improved the database by adding groups to the tags (since they may be different from the table-level groups), and added a link from the Programming section of the ExifTool home page to download the SQLite database and script (called tagInfoSql).

Thanks again!

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Wernfried