Currently I have a fairly generic command that outputs all tags for each file. As part of an academic project I intend to run this script on multiple distinct and unrelated computer systems. I want to uniquely identify each file found and detect duplicates, both on the same system and across systems. I don't believe the system names are included in the output either, but I have ways around that.
Using the filename, date, and other parameters has so far failed to identify files uniquely and consistently. I realize SHA1 and MD5 have their issues too, but I'm after something relatively straightforward.
Is there a way to add an SHA1 or MD5 file hash for each image file and have it be part of the CSV output, so I can detect duplicates (and serve a couple of other purposes)? I figure it might be doable via a batch script, but it's critical that it appear in ExifTool's CSV output.
exiftool -G -a -csv -r */*.* > csv.txt
Thanks.
Christopher
Hi Christopher,
In general, you shouldn't be running -csv on a large number of files -- it becomes very memory intensive. I suggest using JSON (-j) instead.
Adding a column to the CSV output that is not a tag extracted by ExifTool requires a custom script to post-process the CSV file. This is not something you could do with ExifTool alone.
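(As a rough sketch of the kind of post-processing script Phil mentions: the code below, which is not from the thread, streams each file through Python's hashlib and appends an MD5 column to the ExifTool CSV. It assumes the first column is SourceFile, as `exiftool -csv` produces.)

```python
import csv
import hashlib

def md5_of_file(path, chunk_size=65536):
    """Stream the file in chunks so large images don't load fully into memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def add_md5_column(in_csv, out_csv):
    """Copy an ExifTool CSV, appending an MD5 column computed from each
    row's SourceFile path (assumed to be the first column)."""
    with open(in_csv, newline="") as src, open(out_csv, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader)
        writer.writerow(header + ["MD5"])
        for row in reader:
            writer.writerow(row + [md5_of_file(row[0])])
```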
- Phil
Hi Phil,
Could you not make use of a custom tag that uses the back tick perl construct to capture the md5 value?
Quote from: Hayo Baan on October 15, 2014, 01:26:04 PM
Could you not make use of a custom tag that uses the back tick perl construct to capture the md5 value?
Wow. Bonus points for being smarter than me! :)
Cool idea. The following config file should do it (provided you have a "md5" command on your system):
%Image::ExifTool::UserDefined = (
    'Image::ExifTool::Composite' => {
        MD5 => {
            Require => {
                0 => 'FileName',
                1 => 'Directory',
            },
            ValueConv => q{
                my $md5 = `md5 "$val[1]/$val[0]"`;
                chomp $md5;
                $md5 =~ s/.* //;    # isolate the MD5 digest
                return $md5;
            },
        },
    },
);

1;  # end
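(Once an MD5 value is present in the CSV, flagging the duplicates themselves is a simple grouping step. A minimal sketch, not from the thread, assuming a CSV with SourceFile and MD5 columns; per-system CSVs can be concatenated first to find cross-system duplicates:)

```python
import csv
from collections import defaultdict

def find_duplicates(csv_path, hash_column="MD5"):
    """Return {digest: [files]} for every digest shared by two or more rows."""
    groups = defaultdict(list)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            groups[row[hash_column]].append(row["SourceFile"])
    return {h: files for h, files in groups.items() if len(files) > 1}
```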
- Phil