append to csv from csv - delete doublicate entrys - how to?

Started by Bette, September 03, 2021, 10:14:25 PM

Previous topic - Next topic

Bette

Hi to all,

right now I'm using this: https://exiftool.org/forum/index.php?topic=8295.msg42604#msg42604 method from StarGeek:

if exist C:\Users\[...]\exif_out.csv (
    exiftool -ext jpg -a -u -g1 -n -csv %* |findstr /V /R "^SourceFile" >> C:\Users\[...]\exif_out.csv
) else (
    exiftool -ext jpg -a -u -g1 -n -csv %*  >> C:\Users\[...]\exif_out.csv
)

to append new image data to a csv Database (win batch for folders) >> Master_MetaDB.csv.
If existing images metadata is updated and batch is run again >> doublettes appear in the csv-Database.
In every image folder exists a csv file (out_Important.csv) with the metadata for only this folder.

Questions:
1) Is it possible to append csv-files (out_Important.csv) to this Master_MetaDB.csv rather than calling the images directly?
2) How can older doublicate entrys in the Master_MetaDB.csv be overwritten or deleted?

Thankful for every hint.

Greetings Bette

StarGeek

Quote from: Bette on September 03, 2021, 10:14:25 PMQuestions:
1) Is it possible to append csv-files (out_Important.csv) to this Master_MetaDB.csv rather than calling the images directly?
2) How can older doublicate entrys in the Master_MetaDB.csv be overwritten or deleted?

Exiftool can't really help here.  It doesn't process CSV files.

I did some searching for a way to update a CSV with data from a second CSV file but didn't find much that was helpful.
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

Alan Clifford

#2
Quote from: StarGeek on September 03, 2021, 11:43:31 PM
I did some searching for a way to update a CSV with data from a second CSV file but didn't find much that was helpful.

But you could do something like this but you would need to re-insert the header line at the top

cat newfile.csv  masterfile.csv > newmasterfile.csv; sort newmasterfile.csv | uniq > masterfile.csv


later ...
Sorry, that might not be helpful as it is a bash command line although I vaguely remember that windows could run bash these days.

StarGeek

But that only filters out exact duplicates and I believe the major problem here is if Image001.jpg is updated with new metadata, then the new CSV file needs to be created, which ends up with 2 different Image001.jpg entries when you combine.

And additional problem there might be that some fields, such as Description, could include a new line, which would break the file if sorted.  Though that would depend upon Bette knowing whether or not the data as like this.

The best I could think of would be see if a spreadsheet like Excel or LibreOffice could be used this way, but I don't know enough about spreadsheets.
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

Alan Clifford

Quote from: StarGeek on September 04, 2021, 10:15:46 AM
But that only filters out exact duplicates and I believe the major problem here is if Image001.jpg is updated with new metadata, then the new CSV file needs to be created, which ends up with 2 different Image001.jpg entries when you combine.

I misunderstood the meaning of "duplicate"

Quote
The best I could think of would be see if a spreadsheet like Excel or LibreOffice could be used this way, but I don't know enough about spreadsheets.

A spreadsheet would work.  You could sort on the file name column but you'd have to decide which was the unwanted duplicate.

Perhaps I'll have a think about the logic of this one.  Maybe temporarily create a date field or similar in the temporary csv file to identify the latest version of metadata for each photo.

Alan Clifford

Been sitting thinking.  Perl should be good for this.

Assuming the master csv is enormous and the new one is small:

Step through each master file photo line

For each photo name, check against each photo in the new file. 

If not found, save old entry to new master file otherwise save new entry.

StarGeek

Quote from: Alan Clifford on September 04, 2021, 12:39:39 PM
I misunderstood the meaning of "duplicate"

Your point must be considered as well, though.  There would be lines which would be exact duplicates which would need to be removed.
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

Bette

Thanks for your brilliant heads and good ideas Alan and StarGeek!

Yes, one problem while updating/appending to a csv is, that the SourceFile entry would stay the same and part of the rest entries could be modified -> duplicate entries for SourceFile will appear.

Possible solutions:
1) use Libreoffice Calc like StarGeek suggested. I use this method with basic scripts to ease the manual tagging. But for aggregating one central DB for all images I need a more mistake-resistent way (no manual handling).
2) use a win batch file like Alan suggested. I decided for this cause once created it does the job with one click (I'm lazy :-)

This is what I made:

a) In every image folder exists a csv file (out_Important.csv) with the metadata for only this folder (exiftool batch). This is updated/overwritten with all images data whenever changes are made to one or more image metadata in the folder.

b) Created imagesDB.bat that:
- Merges multiple csv files which are under different subfolders into a single csv file (files have the same name)
- Removes duplicates (especially for first lines)
- Creates: AllMetaDB_noDoublettes.csv with all images metadata

The batch scripts I modified to my needs are these:
https://stackoverflow.com/questions/32995274/how-to-merge-multiple-csv-files-which-are-under-different-subfolders-into-a-sing
and
https://stackoverflow.com/questions/53641754/remove-duplicates-on-csv-file-using-batch?noredirect=1&lq=1

Another possibility would be an exiftool batch, that goes through the subfolders with images and writes direct from the image files to AllMetaDB_noDoublettes.csv. But this is not suitable for me (sometimes I have image dublettes in these subfolders) and of course would take much longer.

If there is an easier way - please tell me.

Bette :)