Building a Flat File with Exiftool Data and Other Data

Started by dbqandersons, November 05, 2024, 08:43:26 PM


dbqandersons

So, I've got another script (bash running on Ubuntu) that dumps image data into a flat file for comparison against image data stored in a database: File Path, File Name, Unique Image ID, and File md5sum.

As you can see from my loop structure, I'm calling exiftool once per image. Icky.

for FILE in `cat ${LIST_FILE}`
do
  echo -n $FILE | rev | cut -d "/" -f2- | rev | tr '\n' ' ' | sed -e 's/ $//' >> $OUTPUT_FILE
  echo -n "|" >> $OUTPUT_FILE
  echo -n $FILE | grep -o '[^\/]*$' | tr -d '\n'>> $OUTPUT_FILE
  echo -n "|" >> $OUTPUT_FILE
  echo -n "`exiftool -S -s -imageuniqueid $FILE`" >> $OUTPUT_FILE
  echo -n "|" >> $OUTPUT_FILE
  echo "`md5sum $FILE | awk '{print $1}'`" >> $OUTPUT_FILE
done


Looking to squeeze some more performance out of this thing. Any suggestions to keep exiftool open through the run of the script so it doesn't have to load once per image?  I know I can get exiftool to spit out the file name as well as the Unique Image ID, but I'm not sure how to grab the file path and md5sum and munge everything into a single line per image while not doing it image by image.
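Separately from the exiftool start-up cost, the rev/cut/grep pipelines in the loop above can be replaced with shell parameter expansion. The following is only a sketch, assuming every path in the list contains at least one slash; the helper names dir_of, name_of, and record_for are made up for illustration. It still runs exiftool once per image, so it does not by itself answer the question about keeping exiftool loaded.

```shell
#!/usr/bin/env bash
# Sketch: build one "dir|name|id|md5" record per image, using parameter
# expansion for the path split so no external programs are needed for it.

# Directory part of a path (everything before the last slash).
dir_of() { printf '%s' "${1%/*}"; }

# File-name part of a path (everything after the last slash).
name_of() { printf '%s' "${1##*/}"; }

# Print "dir|name|ImageUniqueID|md5" for one file (hypothetical helper).
record_for() {
  local file="$1" id sum
  id=$(exiftool -S -s -imageuniqueid "$file")   # value only, as in the original
  sum=$(md5sum "$file")                         # "hash  path"
  printf '%s|%s|%s|%s\n' "$(dir_of "$file")" "$(name_of "$file")" "$id" "${sum%% *}"
}

# Usage, reading the list line by line so paths with spaces survive:
#   while IFS= read -r FILE; do
#     record_for "$FILE"
#   done < "$LIST_FILE" >> "$OUTPUT_FILE"
```

Redirecting once after the loop also avoids reopening the output file for every field.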

Cheers,

Bill

Phil Harvey

Hi Bill,

I'm thinking this may all be done in a single command with the -p option of ExifTool :)

The command will likely be something like this:

exiftool -@ $LIST_FILE -p fmt.txt >> $OUTPUT_FILE

I can't say exactly what your fmt.txt file will be because I don't have time to figure out what all of your awk/sed/grep shenanigans are doing. As an example, though, the md5sum part could be based on something like this (and you could add appropriate regular expression substitutions to match your awk command):

${filepath;$_=`md5sum "$_"`}

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

dbqandersons

Thanks Phil; I'll read up on the -p and fmt.txt usage and give that a try.

As for my awk/sed/grep shenanigans, no worries; I can't figure them out half of the time myself! 

Thanks!

Bill

StarGeek

I don't know if this might interest you, but exiftool can compute an MD5 hash of just the image data with the ImageDataHash tag. This differs from running md5sum on the file: any edit to the metadata changes the file's md5 hash, but ImageDataHash stays the same.

dbqandersons

I did see that as I was sifting through the documentation. Not sure if it'll help me in my use case, but I'll keep it in the back of my mind.

Thanks,

Bill

dbqandersons

So, here's what I came up with that seems to be working pretty well.

exiftool -q -q -p ./exiftool-daily-dump-format.txt `cat /tmp/list-of-images.txt`

Here's the content of the format file.

${filepath;$_=`echo -n "$_" | rev | cut -d / -f2- | rev | tr '\n' ' ' | sed -e 's/ \$//'`}|${filename}|${imageuniqueid}|${filepath;$_=`md5sum "$_" | head -c 32`}

One more question. I noticed that if a file is reached through a symlink, then -filepath is written as the resolved ("true") path rather than the path I gave.

In this example: /var/www/html/dev/photos/ is a symbolic link to /sftpjail/dev/photos/

$ exiftool -filepath /var/www/html/dev/photos/bill1.jpg
File Path                       : /sftpjail/dev/photos/bill1.jpg
$


Is it possible to change this behavior with a flag or something? If so, I haven't found it yet.

Thanks,

Bill

dbqandersons

NVM, I added a little more insanity to my formatting file.

${filepath;$_=`echo -n "$_" | rev | cut -d / -f2- | rev | tr '\n' ' ' | sed -e 's/ \$//' | sed -e 's/sftpjail/var\\/www\\/html/'`}|${filename}|${imageuniqueid}|${filepath;$_=`md5sum "$_" | head -c 32`}

Thanks for the help, gents. I appreciate it.

Cheers,

Bill

Phil Harvey

You can use Directory and FileName instead of FilePath if you don't want the full path name.

Also, I would suggest using Perl regular expressions instead of running so many external commands. But hey, I know you're more familiar with the shell tools. Generally, I would avoid running external commands unless absolutely necessary.
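To illustrate the Perl-substitution idea, here is a rough sketch of a format line that does the same work as the rev/cut/sed and head pipelines from the earlier format file, leaving md5sum as the only external command. The function name write_perl_format is hypothetical, and the regular expressions are one guess at matching the shell pipelines, not a tested replacement.

```shell
#!/usr/bin/env bash
# Sketch: write a -p format file that uses Perl substitutions inside the
# ${tag;expression} syntax instead of shell pipelines.

write_perl_format() {
  # The quoted 'EOF' stops the shell from expanding $, backticks, or backslashes.
  cat > "$1" <<'EOF'
${filepath;s/\/[^\/]*$//;s/^\/sftpjail/\/var\/www\/html/}|${filename}|${imageuniqueid}|${filepath;$_=`md5sum "$_"`;s/\s.*//s}
EOF
}

# Usage (file name is an example only):
#   write_perl_format ./exiftool-daily-dump-format.txt
```

The first field strips the last path component and then rewrites the /sftpjail prefix; the last field keeps only the text before the first whitespace in the md5sum output. Using ${directory} for the first field would make both of its substitutions unnecessary.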

- Phil

dbqandersons

So yeah.

Duh on my part for not knowing about and/or finding on my own the -Directory tag/field. Working in IT (primarily UNIX) for 25+ years, I'm very much an RTFM kind of guy and in this case I didn't RTFM enough (or not the right parts of the M, anyway)!  :-[

New formatting file.

${directory}|${filename}|${imageuniqueid}|${filepath;$_=`md5sum "$_" | head -c 32`}

I did consider using the -imagedatahash, but decided to take the overhead hit and get the full file md5sum (at least for now).  Either way, my performance has definitely improved from where I was before.
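For reference, the pieces from this thread can be combined roughly as follows. This is a sketch, not the actual script: the function name dump_images is hypothetical, and it uses the -@ argfile form from Phil's first reply rather than the backtick cat, on the assumption that the list holds one path per line. With -@, each line is read as a single argument, so paths containing spaces are passed intact.

```shell
#!/usr/bin/env bash
# Sketch: one exiftool run for the whole list, appending one line per image.

dump_images() {
  local list_file="$1" fmt_file="$2" output_file="$3"

  # Fail early with a clear message instead of letting exiftool complain.
  [ -r "$list_file" ] || { echo "cannot read list file: $list_file" >&2; return 1; }
  [ -r "$fmt_file" ]  || { echo "cannot read format file: $fmt_file" >&2; return 1; }

  # -q -q  suppress informational messages and warnings
  # -@     read one argument (here, one image path) per line
  # -p     print each file using the format file
  exiftool -q -q -@ "$list_file" -p "$fmt_file" >> "$output_file"
}

# Usage (paths are examples only):
#   dump_images /tmp/list-of-images.txt ./exiftool-daily-dump-format.txt "$OUTPUT_FILE"
```

Note that -q -q also hides the warning exiftool issues when a tag named in the format line is missing from a file, in which case that file's line is not printed. Adding -f prints "-" for the missing value instead.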

Thanks as always,

Bill