Request: CSV processing | including or excluding specific columns

Started by Martin Z, May 30, 2023, 01:13:30 PM

Previous topic - Next topic

Martin Z

Hi Phil, StarGeek, everyone,

I recently asked if it was possible to get EXIFtool to process only some of the columns when updating metadata from a CSV (or similarly, exclude one or more columns from the update). StarGeek kindly replied and confirmed that this isn't currently possible (and the -csv is essentially an "all or nothing" option). Also, that using -TAG in combination with -CSV does not work.

Request
While I appreciate the above being the current situation, I wondered if it would be possible to consider enhancing the -CSV option so that:
  • It could be used in conjunction with the -TAG and --TAG options, to process only specific columns in the CSV
  • There was an option like -IgnoreNonTagColumns

Rationale
  • While I realise CSVs are far from perfect, part of the appeal for users like myself is their extremely easy interoperability with programs like Excel and their tabular format makes it very easy to visualise (and so spot errors and make corrections) to metadata
  • To assist with the above, I will often add various 'helper columns' (e.g. A column which checks the lists the earliest date given across all date columns... and another which then flags if the CreateDate is different to this 'earliest date'
      └ Currently, if I leave these extra columns in the CSV, EXIFtool will advise that column X is not a valid tag name and (I believe) won't process the file at all... It would be great to be able to essentially tell EXIFtool "if you don't recognise a column, that's fine, just process the columns that are valid tag names and ignore any that aren't"
  • In addition, given the broad adoption of CSVs across many apps, it is likely that a CSV could be multi-functional and be intended for multiple apps/scripts to share... For example, personally I feed the same CSV that EXIFtool uses into PowerShell afterwards to rename selected files. Now, as the filename column is not processed by EXIFtool, I can use this column to 'hide' my extra data for PowerShell in, however I am conscious that there are hardly any other columns that can be "repurposed" like this and so ability to do much more is very limited.
  • Ultimately, while making custom CSVs for EXIFtool to import is possible (and a route I planned to go down initially), in reality this option is somewhat clunky and inefficient, and it would be great if I could create 'one CSV to rule them all' and just tell EXIFtool to process selected columns from it and/or skip columns it does not recognise.

I realise that you likely have a queue of planning changes and so there is no guarantee that this change could be implemented any time soon, I hope however that you agree this would be a beneficial change to EXIFtool, would help users interact with EXIFtool more easily, and would make it easier for users to use EXIFtool to manage/update their metadata.

Thanks in advance!

Phil Harvey

I can see how this could be useful.  I'll think about this.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

wywh

It seems a quick workaround is to just make a copy of the .csv, delete unnecessary columns in the spreadsheet app, save, and use that as an input.

exiftool -overwrite_original -csv=out.csv .

... I have used .csv only to make a backup copy of my image and movie metadata, and to check errors in metadata. But now I have started to experiment how to copy that info back, if needed. Just the other day I manually copied ~200 metadata items from an old spreadsheet memo to movies (Keys) and in the process manually checked that everything was correct. But I'd rather not do that again manually...

- Matti

Martin Z

Quote from: Phil Harvey on May 30, 2023, 07:13:45 PMI can see how this could be useful.  I'll think about this.   - Phil
Thanks Phil, that would be great!... Obviously just whenever you have time to fit this in is fine (Assuming you decide to implement it).



Hi Matti (@wywh) -- Thanks for your comment / suggested workaround...

Quote from: wywh on May 31, 2023, 11:19:58 AMIt seems a quick workaround is to just make a copy of the .csv, delete unnecessary columns in the spreadsheet app, save, and use that as an input.
Generally speaking I'd say that is probably a pretty reasonable approach (and is basically what I am doing at the moment), however in my case part of the challenge is that I am running running EXIFtool with CSV datasets sometimes 10+ times a day and so manually editing the columns in each CSV file is a bit tedious.

NB: I have looked at automating the creation of a second CSV, specifically just for EXIFtool however as it is a work-in-progress (and my EXIF / EXIFtool knowledge grows bit-by-bit), I am continually having to edit this code (plus there are times when I might want to make exceptions, and produce a "non-standard EXIFtool CSV", which has to be done manually still), and hence thinking there might be a better way to address this. (Plus, having EXIFtool process the -TAG and --TAG parameters the same way (or at least very similar) whether you're working from a CSV or another method, seemed like generally a good idea.

StarGeek

Another option might be using programs like CSVKit or xsv, which are command line programs that edit csv files in various ways.  You could pipe the output from them directly into exiftool along these lines

xsv <some commands> | exiftool -CSV=- /path/to/files/
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

Martin Z

Quote from: StarGeek on June 01, 2023, 06:55:24 PMAnother option might be using programs like CSVKit or xsv, which are command line programs that edit csv files in various ways.  You could pipe the output from them directly into exiftool along these lines... xsv <some commands> | exiftool -CSV=- /path/to/files/

Thanks @StarGeek.

-- I had quick look into CSVKit, it seems quite useful but I think it's python-based right(?) - if so, I have only written 1 script in python so don't really know it very well at all (need to try at some point as it does keep popping up admittedly)

-- In terms of CSV manipulation or creating secondary CSV's I accept that this can be done (is a valid workaround), in fact I do already do this in my main PowerShell script that runs the whole workflow. The issue is I am already up to 3 CSV's per file...
  • EXIFmetadata.csv: This is built by 1 of 5 custom JSON-->CSV parsers I wrote and is explicitly for the purpose of passing to EXIFtool for it populate tags with metadata scraped by the likes of gallery-DL
  • FileDates.csv: This is copy of EXIFmetadata.csv but just contains SourceFile, FileCreateDate, FileModifyDate -- This is my idea of an [anti-"touch"] file. i.e. if I make a spot change to a file (crop a fex px off the edge, or add a keyword etc). This file can be used to reset the file's created/modified timestamps if the app used didn't preserve these (but again, a seperate CSV is needed [even though it contains identical data to EXIFmetadata.csv, just less of it] as if I pass EXIFmetadata.csv to EXIFtool again it will reset ALL the tags (i.e. undo the keyword I just added, etc).
  • ImageData.csv: This one caches data about images that other apps/scripts in the workflow require. I have to keep this separate as the columns aren't applicable to EXIFtool / don't match tag names, so EXIFtool would reject the whole file if these extra columns were included



Just to be clear, I'm not trying to be super-forceful on this point (I accept that you, Phil and others are busy and this request may not get implemented for a while (or maybe even never), that's fine - just wanted to shed a bit more light on my wider workflow and why it's not as easy as just say doing CTRL+C / CTRL+V my current CSV and delete a column or two 👍🏼

StarGeek

Quote from: Martin Z on June 17, 2023, 04:12:01 AM-- I had quick look into CSVKit, it seems quite useful but I think it's python-based right(?) - if so, I have only written 1 script in python so don't really know it very well at all

You don't need to know python.  You just have to have Python installed and then you use it like any command.  And the install is usually easy.  For csvkit, you would use
pip install CSVKit[/quote]

And from there, it's just getting to know the details on the various options.

There are a lot of pretty neat python scripts on github that can be used for very specific purposes.

* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

Martin Z

Quote from: StarGeek on June 17, 2023, 03:17:06 PMYou don't need to know python.  You just have to have Python installed and then you use it like any command.
Ah OK cool -- so it's more like a free-standing app that I can run/call from the (windows) cmd line, rather than being (just) a python module* that can that I can import into my existing existing python script (which is my case, I wouldn't have)...  *I realise that it is likely available as a python module too, but my point/initial concern was that this was the only way to use it.

Like I say, I will put it on my list of things to look into (can't promise when, but I do appreciate you sharing it none-the-less.

FYI for anyone interested, here is a quick comparison against PowerShell (i.e. the tool I'm currently using to do this stuff) for their listed use-cases on the homepage. [Key: ⬆️ Better than PowerShell  |  🟢 Same as PowerShell  |  🟡 Dubious/can't comment]...
  • Convert Excel to CSV -- 🟢 Useful (PowerShell also has a plugin to do this)
  • Convert JSON to CSV -- 🟡 Sounds good but am skeptical*
  • Print column names -- 🟢 This is pretty straight forward in PowerShell too
  • Select a subset of columns -- 🟢 This is pretty straight forward in PowerShell too
  • Reorder columns - ⬆️ Nice, this is actually not straight-forward in PowerShell
  • Find rows with matching cells - ⬆️ This is quite robust in PowerShell but their syntax looks simpler/slicker potentially
  • Convert to JSON -- 🟡 Sounds good but am skeptical*
  • Generate summary statistics - ⬆️ Nice, never tried to do this in PowerShell

* Just from having tried to this myself recently, converting structured / hierarchical data (like JSON) to tabular data (like CSV) and getting a nice clean output is not easy.