exiftool processing output and tracking

Started by fxstein, February 25, 2022, 12:58:06 PM


fxstein

Hi,

I have made good progress on my exiftool wrapper, https://github.com/fxstein/GoProX, built specifically for a GoPro-based media workflow. I believe I picked up all the GoPro and mp4 related Q&As from the forums here and added a few findings myself.

My simple question: How can I programmatically track what files exiftool has written (vs read)?

Some background:

In the tool I am establishing a simple workflow for media that gets collected and processed on an ongoing basis. For that, the tool creates a simple file-based media library with a few main parts: imported, processed, and others like archived.
These are directory trees that contain successive versions of the various media files.

As a first step I only copy and rename. When adding files from an SD card I take the incoming file name like GOPR1353.JPG and copy the file into one named 20210615110514_GoPro_Hero9_9650_GOPR1353.jpg. During this import no contents of the file are being touched and the files are being sorted into yearly and daily subfolders: 2021/20210615/

The exiftool progress output lists the incoming file names but not the names of the files written:


[2022-02-25 09:12:02] ======== ./test/originals/GS013292.360 [1/14]
[2022-02-25 09:12:03] ======== ./test/originals/GOPR0182.JPG [2/14]
[2022-02-25 09:12:03] ======== ./test/originals/IMG_4785.HEIC [3/14]
[2022-02-25 09:12:03] ======== ./test/originals/GX012304.MP4 [4/14]
[2022-02-25 09:12:03] ======== ./test/originals/GS__1614.JPG [5/14]
[2022-02-25 09:12:03] ======== ./test/originals/GOPR2320.JPG [6/14]
[2022-02-25 09:12:03] ======== ./test/originals/GH013156.MP4 [7/14]
[2022-02-25 09:12:03] ======== ./test/originals/GOPR3422.JPG [8/14]
[2022-02-25 09:12:03] ======== ./test/originals/GOPR2313.JPG [9/14]
[2022-02-25 09:12:03] ======== ./test/originals/GH010739.MP4 [10/14]
[2022-02-25 09:12:04] ======== ./test/originals/GX010093.MP4 [11/14]
[2022-02-25 09:12:04] ======== ./test/originals/IMG_9055.MOV [12/14]
[2022-02-25 09:12:04] ======== ./test/originals/IMG_9134.MOV [13/14]
[2022-02-25 09:12:04] ======== ./test/originals/20220219130031_SCENIC.jpg [14/14]


For reference the import statement:
 
  exiftool -r -progress -q -q -o "${importdir}"'/NODATE/'\
  '-FileCreateDate<FileCreateDate'\
  '-FileCreateDate<CreateDate'\
  '-filename<${FileName}'\
  '-filename<${FileCreateDate;DateFmt("%Y%m%d%H%M%S")}_NODATA_%f.%e'\
  '-filename<${CreateDate;DateFmt("%Y%m%d%H%M%S")}_NODATA_%f.%e'\
  '-filename<${CreateDate;DateFmt("%Y%m%d%H%M%S")}_'\
'${Model;s/\s/_/g;}_%f.%e'\
  '-filename<${CreateDate;DateFmt("%Y%m%d%H%M%S")}_'\
'${Model;s/\s/_/g;}_'\
'${CameraSerialNumber;$_=substr($_,-4);}_%f.%e'\
  '-directory<'"${importdir}"'/${FileCreateDate;DateFmt("%Y")}/${FileCreateDate;DateFmt("%Y%m%d")}'\
  '-directory<'"${importdir}"'/${CreateDate;DateFmt("%Y")}/${CreateDate;DateFmt("%Y%m%d")}'\
  --ext lrv --ext thm --ext xmp --ext .\
  -api 'Filter=s/HERO10 Black/GoPro_Hero10/g;'\
's/HERO9 Black/GoPro_Hero9/g;'\
's/GoPro Max/GoPro_Max/g;'\
's/HERO8 Black/GoPro_Hero8/g'\
  "${source}"


Once imported I can optionally run a geonames task that takes the first valid geolocation based on embedded GPS data and performs a geonames lookup to determine the true timezone the media was taken in. Together with the file creation date, that gives me the exact time offset on the day the files were taken. I can then use that for a timeshift task to correct wrong date/time values, e.g. when a camera reset to its default startup date. There will be more tasks in the future, like importing GPX files from GPS apps and other metadata I might want to pull in.

Up to this point everything is basically setup and data gathering. I then have a process task that takes all of that data and rewrites the files' contents, using it to create keywords, correct GPS tags, add additional information, and further sort the files into the processed structure. The outgoing processed files are sorted by major file type, year and date, and are marked as processed. The above example becomes processed/JPEG/2021/20210615/P_20210615110514_GoPro_Hero9_9650_GOPR1353.jpg
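The mapping from an imported name to its processed location can be sketched in shell roughly as follows. This is only an illustration of the layout described above, not the code GoProX uses: the helper name processed_path and the extension-to-type mapping (jpg/jpeg to JPEG, anything else upper-cased) are assumptions.

```shell
# Hypothetical helper: derive the processed path for an imported file.
# Assumes the imported name starts with a YYYYMMDDHHMMSS timestamp.
# The extension-to-type mapping is an assumption made for illustration.
processed_path() {
  name=$(basename "$1")                 # drop any leading directories
  ext=$(printf '%s' "${name##*.}" | tr '[:upper:]' '[:lower:]')
  case $ext in
    jpg|jpeg) type=JPEG ;;
    *)        type=$(printf '%s' "$ext" | tr '[:lower:]' '[:upper:]') ;;
  esac
  year=$(printf '%s' "$name" | cut -c1-4)   # YYYY
  day=$(printf '%s' "$name" | cut -c1-8)    # YYYYMMDD
  printf 'processed/%s/%s/%s/P_%s\n' "$type" "$year" "$day" "$name"
}

# Example with the file name used in this post.
processed_path 20210615110514_GoPro_Hero9_9650_GOPR1353.jpg
```

For the example name this prints the processed path quoted above.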

The processing output once again is by source file (I separate into 3 passes because of unique tag/metadata requirements for processing):


[2022-02-25 09:12:04] Info: First pass: 1/3 - All files but mp4 and 360
[2022-02-25 09:12:05] ======== ./test/imported/2022/20220219/20220219130029_iPhone_13_Pro_Max_20220219130031_SCENIC.jpg [1/9]
[2022-02-25 09:12:05] ======== ./test/imported/2022/20220128/20220128180829_iPhone_13_Pro_Max_IMG_4785.HEIC [2/9]
[2022-02-25 09:12:05] ======== ./test/imported/2022/20220206/20220206144720_GoPro_Hero10_8034_GOPR2313.JPG [3/9]
[2022-02-25 09:12:05] ======== ./test/imported/2022/20220206/20220206145556_GoPro_Hero10_8034_GOPR2320.JPG [4/9]
[2022-02-25 09:12:06] ======== ./test/imported/2021/20210825/20210825111649_iPhone_12_Pro_Max_IMG_9134.MOV [5/9]
[2022-02-25 09:12:06] ======== ./test/imported/2021/20210825/20210825091917_iPhone_12_Pro_Max_IMG_9055.MOV [6/9]
[2022-02-25 09:12:06] ======== ./test/imported/2021/20210606/20210606094917_GoPro_Hero9_4139_GOPR0182.JPG [7/9]
[2022-02-25 09:12:06] ======== ./test/imported/2021/20210806/20210806114935_GoPro_Hero9_4139_GOPR3422.JPG [8/9]
[2022-02-25 09:12:06] ======== ./test/imported/2016/20160130/20160130231216_GoPro_Max_6013_GS__1614.JPG [9/9]
[2022-02-25 09:12:07] Info: Second pass: 2/3 - Only mp4 files
[2022-02-25 09:12:07] ======== ./test/imported/2022/20220214/20220214114307_NODATA_GH010739.MP4 [1/4]
[2022-02-25 09:12:07] ======== ./test/imported/2022/20220214/20220214114307_NODATA_GX010093.MP4 [2/4]
[2022-02-25 09:12:07] ======== ./test/imported/2022/20220206/20220206143427_GoPro_Hero10_8034_GX012304.MP4 [3/4]
[2022-02-25 09:12:07] ======== ./test/imported/2021/20210627/20210627102316_GoPro_Hero9_0021_GH013156.MP4 [4/4]
[2022-02-25 09:12:08] Info: Third pass: 3/3 - Only 360 files
[2022-02-25 09:12:08] ======== ./test/imported/2021/20211015/20211015084940_GoPro_Max_6013_GS013292.360 [1/1]
[2022-02-25 09:12:08] Info: Finished media processing


These file names are again the incoming files from imported, not the outgoing ones written to processed.

The process task can perform simple incremental, partial or all processing of the imported originals.

As stated in my original question, is there:

  • a way to display progress by output filename?
  • a way to collect all files written to in a separate file that can be fed into other tasks?

In the second case especially, files that already existed should be skipped. My media library already holds TBs of media files, so I have to be highly selective about how much I touch or process at any given time.

Sorry for the long post, but wanted to provide context. Thanks!
If you want to help fix GoPro and related EXIF metadata please check out: https://github.com/fxstein/GoProX

fxstein

Just checking in to see if there are any thoughts on how to accomplish that?

Phil Harvey

Sorry, I don't have time to read through your long post.  But I'll answer the questions at the end of your post:

Quote from: fxstein on February 25, 2022, 12:58:06 PM

  • a way to display progress by output filename?
  • a way to collect all files written to in a separate file that can be fed into other tasks?

All normal ExifTool output references the source file name. 

If you are renaming the file, the only way to get the new file name is by parsing the -v output and looking for lines like this:

'a.jpg' --> 'b.jpg'

This should accomplish both things you wanted.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

fxstein

Phil,

Thanks a lot for the answer. Unfortunately, the -v option is not a viable solution, as it increases ExifTool's overall output by more than an order of magnitude. The amount and complexity of parsing needed to reduce that back to something resembling the -progress output is prohibitive. There is also a significant performance overhead when running in verbose mode.

Maybe this could be considered as a simple extension of the -progress option: by default it stays as it is today (input file name), but optionally the output name, or both (as in -v), could be specified.

Thank you!

Phil Harvey

I think the solution is to output the new file names with the -v0 option.  I'll do this in version 12.41.

- Phil

fxstein

A quick thank you for making this happen!

Great to have -progress show source and target.

'/tmp/goprox.frj72Q/DCIM/101GOPRO/G0256406.JPG' --> '/Volumes/Office G-RAID/goprox/imported/2022/20221028/20221028081612_GoPro_Hero11_5131_G0256406.JPG'
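
Lines in the 'source' --> 'target' form shown above can be reduced to a plain list of written files with a small filter. The following is a minimal sketch, not part of ExifTool or GoProX: the function name written_files is hypothetical, and it assumes the exiftool output has been captured and that no path contains the sequence ' --> ' itself.

```shell
# Print the target path from each "'source' --> 'target'" line on stdin.
# Lines without that pattern (progress lines, info messages) are dropped.
# Assumes no path contains the literal sequence "' --> '".
written_files() {
  sed -n "s/^.*' --> '\(.*\)'\$/\1/p"
}

# Demo on sample lines from this thread (no exiftool needed).
printf '%s\n' \
  "======== ./test/originals/GOPR0182.JPG [2/14]" \
  "'a.jpg' --> 'b.jpg'" |
  written_files
```

With the output saved to a file (say import.log, a hypothetical name), `written_files < import.log > written.txt` would give a list that later tasks can read one path per line, for example with `while IFS= read -r f; do ...; done < written.txt`.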