Import csv with multiple lines per filename (migration from pixelpost to piwigo)

Started by Brettleger, January 11, 2019, 06:30:19 PM

Previous topic - Next topic

Brettleger

Dear (more experienced) exiftool users,

I experience an issue with importing metadata ("EXIF:UserComment" and "IPTC:keywords") from a .csv-file that has multiple lines for one filename. It looks like only the very last line is in the file is recognized/ processed.

An example to demonstrate this:

[At the start, both fields are empy and that is just fine]
MacBook-Pro:images moritz$ exiftool -UserComment -keywords -a  test.jpg
MacBook-Pro:images moritz$


[Setting the desired end state manually with exiftools works:]
exiftool -UserComment="My test title" -keywords="START" -a  test.jpg
    1 image files updated
MacBook-Pro:images moritz$ exiftool -keywords+="END" -a  test.jpg
    1 image files updated
MacBook-Pro:images moritz$ exiftool -UserComment -keywords -a  test.jpg
User Comment                    : My test title
Keywords                        : START, END


Working my way through the exiftool documentation, I thought import the following .csv-file would create the identical result:
MacBook-Pro:images moritz$ cat test.csv
SourceFile,UserComment,Keywords
test.jpg,My test title,START
test.jpg,My test title,END
MacBook-Pro:images moritz$


However, the result is different:
MacBook-Pro:images moritz$ exiftool -UserComment -keywords -a -csv+=test.csv ./
Ignored superfluous tag names or invalid options: -UserComment ...
    1 directories scanned
    1 image files updated
MacBook-Pro:images moritz$ exiftool -UserComment -keywords -a  test.jpg
User Comment                    : My test title
Keywords                        : END


I thought the "-a" switch enables the processing of duplicate entries. What am I doing wrong?

Background-information (for Google and whoever is interested):
In ran a pixelpost photo gallery on a shared server and the hosting company upgraded to a newer PHP version which broke the pixelpost installation. Pixelpost is no longer maintained and I neither have time nor expertise to fix the code so I decided to migrate to piwigo.

I invested quite some time to assign titles and assign categories to all photos. Piwigo neatly reads EXIF and IPTC metadata during photo import (see https://piwigo.org/doc/doku.php?id=user_documentation:metadata), so I want to retrieve that metadata from the pixelpost database and store in the photos' metadata before import into piwigo.

The following SQL-query generates a table with filenames, titles and tags/keywords for all photos:
SELECT pixelpost.image,pixelpost.headline,tags.tag FROM pixelpost LEFT JOIN tags ON tags.img_id=pixelpost.id
That result can easily be exported into a .csv and adding the header should be easy. However, for photos with multiple tags/keywords, multiple lines are created, so here I am.

Any help appreciated,

Brettleger

StarGeek

Quote from: Brettleger on January 11, 2019, 06:30:19 PM
I thought the "-a" switch enables the processing of duplicate entries. What am I doing wrong?

From the docs on the -a (duplicates) option
   Allow (-a) or suppress (--a) duplicate tag names to be extracted. (emphasis mine).

The reason that you get the Ignored superfluous tag names or invalid options: -UserComment is because you are writing tags from a CSV file and the -UserComment format is to extract tags (see the -TAG option).

As for dealing with duplicate file name entries, I'm not sure at the moment on how to handle that.  I'll do some testing unless Phil comes up with the answer first.
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

StarGeek

I can't come up with anything pure exiftool to add all the tags when they are on separate rows like that.  But you might take a look at loading the csv into a spreadsheet and looking for a way to combine them.  For example, if you load the csv into google docs, it looks like you can merge and combine the duplicate rows.
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

Phil Harvey

The way ExifTool reads the CSV, the second entry for a file will override the first.  There is no option to change this.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Brettleger

Quote from: StarGeek on January 11, 2019, 07:38:19 PM
I can't come up with anything pure exiftool to add all the tags when they are on separate rows like that.  But you might take a look at loading the csv into a spreadsheet and looking for a way to combine them.  For example, if you load the csv into google docs, it looks like you can merge and combine the duplicate rows.

I agree that modifying the .cvs-file to merge the rows referring to the same filename is doable, e.g. using awk. For my example case, the may result in something like:
MacBook-Pro:images moritz$ cat test.csv
SourceFile,UserComment,Keywords
test.jpg,My test title,START END
MacBook-Pro:images moritz$ 


Importing this .csv file into exiftool works and both words are read and applied, however they are recognized as one single keyword, not two separate entries. See:
MacBook-Pro:images moritz$ exiftool -a -csv=test.csv ./
    1 directories scanned
    1 image files updated
MacBook-Pro:images moritz$ exiftool -UserComment -keywords  test.jpg
User Comment                    : My test title
Keywords                        : START END


For my purposes, I need the same behaviour as with the TAG+=VALUE option, i.e.
MacBook-Pro:images moritz$ exiftool -keywords+="The very end"  test.jpg
    1 image files updated
MacBook-Pro:images moritz$ exiftool -UserComment -keywords  test.jpg
User Comment                    : My test title
Keywords                        : START END, The very end


No surprise that adding a colon "," to the .csv doesn't help, as it is the field-separator:
MacBook-Pro:images moritz$ cat test.csv
SourceFile,UserComment,Keywords
test.jpg,My test title,START, END
MacBook-Pro:images moritz$ exiftool -a -csv=test.csv ./
    1 directories scanned
    1 image files updated
MacBook-Pro:images moritz$ exiftool -UserComment -keywords  test.jpg
User Comment                    : My test title
Keywords                        : START
MacBook-Pro:images moritz$


Can I teach exiftool to either use a different field separator when reading the .csv-file or define a custom character as separator when interpreting the entries for the keywords in the .csv?

If that's too complicated, the workaround might be to use awk to invoke exiftool again and again for each line in the .csv. Looks feasible, but certainly is a mess in terms of performance.

Phil Harvey

...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

StarGeek

Quote from: Brettleger on January 12, 2019, 08:41:26 AM
For my purposes, I need the same behaviour as with the TAG+=VALUE option, i.e.

The same behavior isn't possible with a csv file and as Phil says, you need to use the -sep option.  This is mentioned in the docs on the -csv option:
      List-type tags are stored as simple strings in a CSV file, but the -sep option may be used to split them back into separate items when importing.

I will point out that depending upon your data, you may need to take it a step further.  If your keywords have a space it them, then you need to use a different separator.  For example, if a key word was "John Smith", then you couldn't use space as a separator.

Since you are familiar with awk, the safest thing to do is pick a separator that you know isn't part of any keywords.  The separator can be more than one character.  I'm fond of double hashtag -sep ## for example.  Then encase the keyword string in double quotes.

Examples:  Given keywords of "John Smith" and "Jane Doe", the final string in the csv file could be "John Smith,Jane Doe", using a comma as a separator -sep ,.  Commas can be included in a cell in a csv file as long as they are embedded between double quotes.   Given a more complicated "Smith, John" and "Doe, Jane" (keywords with commas), you would have to pick a different separator, "Smith, John##Doe, Jane and -sep ##.
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

Brettleger

Thank you StarGeek and Phil. It was a little more complex than expected, but finally I managed to close the gap between mySQL and exiftool using awk.

So here is how I transferred my photo gallery from pixelpost to piwigo while preserving titles and keywords/ tags.

First one has to extract the associated title and tags from the pixelpost mySQL-database. The following statement extracts filename, title and tag for each file in the gallery:

SELECT pixelpost.image,pixelpost.headline,tags.tag FROM pixelpost LEFT JOIN tags ON tags.img_id=pixelpost.id ORDER BY pixelpost.image ASC

Notes:

  • If you configured a database-prefix in pixelpost (variable $pixelpost_db_prefix in pixelpost.php) you need to add that prefix also to all database names in the SELECT statement above.
  • I couldn't come up with a SQL statement that exports multiple tags per file in the same line. In such case, the SELECT statement prints out multiple lines per file, one with each tag assigned. The ORDER BY statement ensures that those lines are printed consecutively (this is needed for the awk script down below to work correctly).
The output of the SELECT statement needs to be exported in .csv format (Pixelpost_Titles+Tags.csv in this example) . There are multiple ways to achieve this; I found phpMyAdmin quite useful.

While exiftool can import .csv-files directly, the output from mySQL does not have the right format. This can be corrected with the following awk-script (I am by far not an awk expert, so my apologies if this looks ugly. But hey, it works. At least for me):

BEGIN {
#exiftool expects tagnames as fieldheaders in first row
print "SourceFile,UserComment,Keywords"
#.csv --> use "," as separator
FS = ","
#doesn't hurt to initiate variables
outputline = "" #holds the line until printout
prevfirstfield = "" #holds the first field of the previous line
}

{
if (NF == 0) {
        #skip empty lines
next
}
if (prevfirstfield == "") {
#No prior line --> Copy this one to output buffer
outputline = $0
prevfirstfield = $1
next
}
if ($1 == prevfirstfield) {
#Field 1 identical to prior line
#delete trailing <"> of current output line
gsub(/"$/,"",outputline)
#delete heading <"> of additional keyword
gsub(/^"/,"",$3)
#--> append "##" + additional keyword to output buffer
outputline = outputline "##" $3
} else {
#New field 1 (filename)
#--> print output buffer
#--> refill output buffer and reset prevfirstfield-variable
print outputline
outputline=$0
prevfirstfield=$1
}
}

The awk-script adds one line to the beginning of the csv-file holding the tag-name info for exiftool. Additionally, it combines consecutive lines with the same filename in the first field into a single line and concatenates the tags in the third field (i.e. the problem I wasn't able to solve using SQL). As a little trick, the script deletes trailing and heading quotation marks and inserts a double hashtag as keyword separator as StarGeek suggested.

Presuming the awk-script is saved as Pixelpost_Titles+Tags.awk, a suitable command-line to execute awk is:

awk -f Pixelpost_Titles+Tags.awk Pixelpost_Titles+Tags.csv > exiftool_Titles+Tags.csv

The result is saved into the file exiftool_Titles+Tags.csv in the right format to be imported into exiftool.

The last two steps of the migration are now to store titles and keywords in the image metadata and to import the updated files into piwigo. I recommend to work with a copy of the image-directory of your pixelpost installation just in case that something goes wrong. Assuming that all photos as well as the awk-output file ( e.g. exiftool_Titles+Tags.csv) are in the current directory, the command line to invoke exiftool is

exiftool -m -delete_original! -sep "##" -csv=exiftool_Titles+Tags.csv ./ > exiftool.log 2>&1

Note:

  • The option "-m" suppresses minor warnings. It is optional.
  • The option "-delete_original!" makes exiftool overwrite the original files. It is recommended to not use this option if you are working with your original files.

The output of exiftool is saved to a log-file (exiftool.log). You may want to look for errors there.

If piwigo is properly set-up to read and map the metadata during import (see https://piwigo.org/doc/doku.php?id=user_documentation:metadata), one only need to upload the .jpgs from the working directory to the piwigo import folder and that's it.

Phil Harvey

...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).