Adding SHA1 or MD5 hash to output for use in detecting uniqueness of files.

Started by crsouser, October 14, 2014, 08:08:40 PM


crsouser

Currently I have a fairly generic command that outputs all tags for each file. As part of an academic project, I intend to run this script on multiple distinct and unrelated computer systems. I want to uniquely identify each file found and detect duplicates both on the same system and across multiple systems. I don't believe the system names are included in the output either, but I have ways around that.

So far, using the filename, date and other parameters has failed to identify files uniquely and consistently. I realize SHA1 and MD5 have their own issues, but I'm looking for something relatively straightforward.

Is there a way to compute a SHA1 or MD5 hash of each image file and include it in the CSV output, so I can detect duplicates (and potentially use it for a couple of other purposes)? I figure it could be done with a batch script, but it's critical that the hash be part of ExifTool's CSV output.

exiftool -G -a -csv -r */*.* > csv.txt

Thanks.

Christopher


Phil Harvey

Hi Christopher,

In general, you shouldn't run -csv on a large number of files, because ExifTool has to buffer the information from all files in memory before it can write the CSV, which becomes very memory intensive. I suggest using JSON (-j) instead.

Adding a column to the CSV output that is not a tag extracted by ExifTool requires a custom script to post-process the CSV file. This is not something you can do with ExifTool itself.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).
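For the post-processing route described above, a script only needs the SourceFile column, which ExifTool writes as the first column of its CSV output. The following is a minimal sketch, not an ExifTool feature: the script name `add_md5.pl` and its single-argument interface are hypothetical. It parses the CSV with a small quote-aware reader (so values containing commas, quotes or line breaks survive), hashes each SourceFile with Perl's core Digest::MD5 module, and writes the table back out with an extra MD5 column.

```perl
#!/usr/bin/perl
# add_md5.pl (hypothetical name): append an MD5 column to ExifTool -csv output.
# Assumes SourceFile is the first column and the paths in it are relative to
# the current directory (i.e. run this where exiftool was run).
use strict;
use warnings;
use Digest::MD5;

# Split CSV text into rows of fields; handles "quoted" fields that contain
# commas, doubled quotes ("") or embedded line breaks.
sub parse_csv {
    my ($text) = @_;
    my (@rows, @row);
    pos($text) = 0;
    while (pos($text) < length $text) {
        if ($text =~ /\G"([^"]*(?:""[^"]*)*)"/gc) {
            (my $f = $1) =~ s/""/"/g;    # un-double embedded quotes
            push @row, $f;
        } else {
            $text =~ /\G([^,\r\n]*)/gc;  # unquoted field (may be empty)
            push @row, $1;
        }
        next if $text =~ /\G,/gc;        # more fields in this record
        $text =~ /\G(?:\r\n|\n|\r)/gc;   # end of record
        push @rows, [@row];
        @row = ();
    }
    return \@rows;
}

# Quote a field only if it needs it.
sub csv_field {
    my ($s) = @_;
    return $s unless $s =~ /[",\r\n]/;
    $s =~ s/"/""/g;
    return qq{"$s"};
}

# Hex MD5 of a file's contents, or undef if it can't be opened.
sub file_md5 {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or return undef;
    my $hex = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $hex;
}

# Only do the conversion when a CSV file name is given on the command line.
if (@ARGV) {
    my ($in) = @ARGV;
    open(my $fh, '<:raw', $in) or die "Can't read $in: $!\n";
    my $text = do { local $/; <$fh> };
    close $fh;
    my $rows = parse_csv($text);
    my $header = shift @$rows or die "Empty CSV file\n";
    print join(',', map { csv_field($_) } @$header, 'MD5'), "\n";
    for my $row (@$rows) {
        my $md5 = file_md5($row->[0]) // '';   # empty if file is unreadable
        print join(',', map { csv_field($_) } @$row, $md5), "\n";
    }
}
```

It would be run from the directory where exiftool was run, e.g. `perl add_md5.pl csv.txt > csv_md5.txt`. Files that can't be opened get an empty MD5 field instead of stopping the script.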

Hayo Baan

Hi Phil,

Could you not make use of a custom tag that uses the back tick perl construct to capture the md5 value?
Hayo Baan – Photography
Web: www.hayobaan.nl

Phil Harvey

Quote from: Hayo Baan on October 15, 2014, 01:26:04 PM
Could you not make use of a custom tag that uses the back tick perl construct to capture the md5 value?

Wow.  Bonus points for being smarter than me! :)

Cool idea. The following config file should do it (provided you have an "md5" command on your system):

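%Image::ExifTool::UserDefined = (
    'Image::ExifTool::Composite' => {
        MD5 => {
            # the full path is built from these two ExifTool tags
            Require => {
                0 => 'FileName',
                1 => 'Directory',
            },
            ValueConv => q{
                # run the external "md5" command on the file; the path goes
                # through the shell, so names containing " $ or ` may break this
                my $md5 = `md5 "$val[1]/$val[0]"`;
                chomp $md5;
                # macOS "md5" prints "MD5 (file) = <hash>", so keep the last
                # field; GNU "md5sum" prints the hash first (use s/ .*// there)
                $md5 =~ s/.* //; # isolate MD5
                return $md5;
            },
        },
    },
);
1;  #end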


- Phil
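A variant of the same config that doesn't depend on an external command: instead of the backtick call, it computes the hash inside ExifTool's own Perl process with the core Digest::MD5 module, which should also avoid the shell-quoting problem and the md5/md5sum output differences. This is a minimal sketch, assuming ExifTool evaluates the ValueConv string as Perl with @val filled from the Require list, exactly as in the config above; the tag name MD5 is reused from that config.

```perl
# Hypothetical variant of the MD5 config: hash the file in Perl itself
# using the core Digest::MD5 module instead of an external "md5" command.
%Image::ExifTool::UserDefined = (
    'Image::ExifTool::Composite' => {
        MD5 => {
            Require => {
                0 => 'FileName',
                1 => 'Directory',
            },
            ValueConv => q{
                require Digest::MD5;
                my $file = "$val[1]/$val[0]";
                # no value if the file can't be opened
                open(my $fh, '<', $file) or return undef;
                binmode $fh;    # hash the raw bytes
                my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
                close $fh;
                return $md5;
            },
        },
    },
);
1;  #end
```

Saved as, say, md5.config, it would be loaded with `exiftool -config md5.config -G -a -csv -r DIR > csv.txt` (-config has to be the first option), and the output should then include a Composite:MD5 column alongside the other tags.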