Hi,
I am new to this forum, though I have been reading a lot of the posts over the last few months. Apologies if I repeat issues / questions previously raised - if I have please just point me in the direction of where I should go to read up on this.
I am a Perl programmer ( and DB programmer ), so I am using the Perl libraries to perform the tasks below, but I think / hope my questions are more theoretical / knowledge based than implementation specific.
I want to perform two tasks;
(1) Find duplicate pictures ( I have some 42k photos taken over 10 years with 10-15 cameras / providers - also some are edited copies which I want to filter out to their own area )
(2) Create a process to write an original ( ID, FileName, Camera and Date ) to new tags so that in future I will be able to identify which files are duplicates by counting those with all four fields set the same
So, I have ( after many late nights ) written the code to read all the meta information in the photo tags, insert it into a MySQL database and identify "potential" duplicates by grouping on as many tag entries as possible. I have also written a script to apply the new tag details to any file that doesn't already have them.
And this leads me to a whole set of questions that I really hope someone will take a little time to answer ( if only short answers or links to useful discussions )
(1) I found the following link of Tags
http://www.exiftool.org/TagNames/index.html
The table lists JPEG, EXIF, GPS,... are these what in theory are referred to as "tags" or just convenient ways of grouping the actual tags? I am assuming the latter.
(2) Has this already been done such that I am re-inventing the wheel?
Note that I'll still continue on my quest, but it would be good to see other people's ideas / processes for doing it
(3) Is there ANY tag written ( in general ) that should uniquely identify a photo – I am looking at created times, file names, maker, camera etc. as a key to uniqueness
(4) Will my "new" tags that I add get moved with copies of the pictures? My experiments suggest that they will
(5) Which "TagIDs" should I be using for the above – I used 45000 to 45003?
(6) Will modifying the exif data potentially harm the image of the photo
I guess that is all I have to start, but any information or help would be much appreciated.
And what a brilliant tool this is !
Quote from: BuzzBunny on December 26, 2010, 05:10:51 PM
(1) Find duplicate pictures ( I have some 42k photos taken over 10 years with 10-15 cameras / providers - also some are edited copies which I want to filter out to their own area )
If the files are identical, I find the MD5 utility very useful for this purpose. Of course, this won't work if the metadata or the image has changed. If only the metadata may have changed, you can use exiftool to strip the metadata and then take an MD5 of the result:
exiftool -all= FILE -o - | md5
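For the identical-file case, grouping files by MD5 could be sketched roughly as follows. The function name find_identical is made up for illustration, and the sketch assumes an md5sum command (as on most Linux systems; the md5 utility on a Mac prints a different format) and a POSIX awk. It compares whole files, so any metadata difference makes two files look different.

```shell
#!/bin/sh
# Hypothetical helper: print groups of byte-identical files under a directory.
# Assumes md5sum output of the form "<hash>  <path>", one file per line.
find_identical() {
    find "$1" -type f -exec md5sum {} + |
    awk '
        # Collect every output line under its hash (first field).
        { count[$1]++; lines[$1] = lines[$1] $0 "\n" }
        # Print only hashes seen more than once, one blank line between groups.
        END { for (h in count) if (count[h] > 1) printf "%s\n", lines[h] }
    '
}
```

Calling find_identical on the top-level photo directory prints each group of identical files together. File names containing newlines are not handled by this sketch.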
Quote
The table lists JPEG, EXIF, GPS,... are these what in theory are referred to as "tags" or just convenient ways of grouping the actual tags?
In ExifTool terminology, these are tag tables. The tags themselves are the names which refer to the elemental units of information that ExifTool reads/writes.
Quote
(2) Has this already been done such that I am re-inventing the wheel?
Note that I'll still continue on my quest, but would be good to see other peoples ideas / process on how to do it
Not that I know of.
Quote
(3) Is there ANY tag written ( in general ) that should uniquely identify a photo – I am looking at created times, file names, maker, camera etc. as a key to uniqueness
There is an ImageUniqueID tag in EXIF which may be what you want.
Quote
(4) Will my "new" tags that I add get moved with copies of the pictures? My experiments suggest that they will
I don't know what you mean. Are you asking if they are preserved when editing the image with other applications? This depends on the application.
Quote
(5) Which "TagIDs" should I be using for the above – I used 45000-> 45003?
Now I really don't know what you are talking about. What are the tag names?
Quote
(6) Will modifying the exif data potentially harm the image of the photo
No. ExifTool will not change the image data itself. However, a few EXIF tags do affect how an image is rendered, basically just the orientation and color space.
- Phil
Hi Phil, thanks for the prompt response, and sorry if clarity was lacking - it made sense in my head ;)
(4) I was thinking of two cases. The first is a simple copy of a file using Windows, Linux etc. The second is when I edit the picture in an app like Photoshop and then save it as a copy. In each case I was wondering if the copy would retain my new tags - I think your response probably answers that.
(5) I mean, if I create 4 new tags, WPSOrigFileDate, WPSOrigFileName, MyUniqueId, MyDescTag, then my ExifTool config looks like this:
%Image::ExifTool::UserDefined = (
    # All EXIF tags are added to the Main table, and WriteGroup is used to
    # specify where the tag is written (default is ExifIFD if not specified):
    'Image::ExifTool::Exif::Main' => {
        45000 => {
            Name => 'WPSOrigFileDate',
            Writable => 'string',
            WriteGroup => 'IFD0',
        },
        45001 => {
            Name => 'WPSOrigFileName',
            Writable => 'string',
            WriteGroup => 'IFD0',
        },
        45002 => {
            Name => 'MyUniqueId',
            Writable => 'string',
            WriteGroup => 'IFD0',
        },
        45003 => {
            Name => 'MyDescTag',
            Writable => 'string',
            WriteGroup => 'IFD0',
        },
    },
);
Is it ok to use the IDs 45000 to 45003?
(7) Added to this, am I ok using IFD0 and Exif::Main, or is there a better location to put my custom tags?
OK, I didn't understand that you wanted to create your own tags. I imagine that it is less likely other software will preserve unknown tags, but you will have to test this with the software you use.
As to the choice of tag IDs, I can't advise on this because I can't see into the future to tell what conflicts you may have. To avoid conflicts and provide better future compatibility, maybe XMP is a better choice than EXIF.
- Phil
Phil,
If you use exiftool to delete the metadata, do an md5, find out the files are identical, and delete one of the pair, how do you then get the metadata back? Will exiftool back up the metadata too?
Quote
If the files are identical, I find the MD5 utility very useful for this purpose. If the metadata or the image changed, this won't work of course. If the metadata may have changed, you can use exiftool to strip the metadata then do an MD5 of the result: exiftool -all= FILE -o - | md5
Do you know of any md5 tool which can do an md5 comparison whilst ignoring tag data? This would be brilliant, because I could use it on my digital photos and my mp3's etc.
cheers
Simon
Hi Simon,
You don't need to modify the file. This will work in any Mac or Linux shell, and should also work in Windows if you have an md5 utility installed:
exiftool FILE -all= -o - | md5
- Phil
Phil,
Sorry for the delay in responding, I lost my RAID array (= not funny). So I'm interested in finding duplicate pictures and mp3's which have different metadata such as artist or date taken etc but which have the same content otherwise. I was a touch worried about running the command as I thought I saw an overwrite command in there, so I copied a file over and executed it on a duplicate control set. I got the following result:
exiftool Aerosmith\ -\ Living\ on\ the\ edge.mp3 -all= -o - | md5sum
Error: Writing of MP3 files is not yet supported - Aerosmith - Living on the edge.mp3
d41d8cd98f00b204e9800998ecf8427e -
Please can you confirm this command line? ( I seriously don't want to blank the metadata for any of my pics as I can't get it back )
Thanks by the way ;-)
Simon
Hi Simon,
Sorry, I missed your comment about the MP3's. You aren't doing anything wrong. The problem is that ExifTool doesn't support writing MP3's, so this won't work with these files.
- Phil
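A note on the hash in Simon's output: d41d8cd98f00b204e9800998ecf8427e is the MD5 of zero bytes of input, which fits Phil's explanation. After the error nothing was written to stdout, so md5sum hashed an empty stream. A quick check, assuming md5sum as used in Simon's command (the variable name is just for illustration):

```shell
# MD5 of an empty input stream. Any pipeline that yields this value hashed nothing.
EMPTY_MD5=$(printf '' | md5sum | cut -d' ' -f1)
echo "$EMPTY_MD5"   # prints d41d8cd98f00b204e9800998ecf8427e
```

So if every file in a batch reports this same value, the files are not duplicates of each other; the stripping step simply failed for all of them.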
Quote from: Phil Harvey on March 01, 2011, 07:35:19 AM
Hi Simon,
Sorry, I missed your comment about the MP3's. You aren't doing anything wrong. The problem is that ExifTool doesn't support writing MP3's, so this won't work with these files.
- Phil
Hi Phil,
You didn't miss anything. I moved the goal posts ;-). What I'm curious about is that I don't want to write to the mp3's or jpg's, I'm just after the md5sum from them so I can see which have the same sum, and which are different. In your previous post you said "You don't need to modify the file" but isn't that what this command does?
cheers
Simon
Hi Simon,
What I meant was that you don't need to modify the original file, but using this technique you do need to generate a modified copy of the file for the MD5. The pipe is nice because the system handles the management of any intermediate files (probably in memory).
- Phil
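Putting the above together, here is a rough sketch of a wrapper that only reports a hash when exiftool actually produced a stripped copy, so unsupported files (like the MP3) are skipped instead of all getting the same hash. The name stripped_md5 is hypothetical, and md5sum and mktemp are assumed to be available. It uses a temporary file instead of a pipe purely so that exiftool's exit status can be checked; the original file is never modified.

```shell
#!/bin/sh
# Hypothetical helper: print the MD5 of FILE with all metadata stripped.
# Returns non-zero (and prints nothing) if exiftool fails or writes no data.
stripped_md5() {
    tmp=$(mktemp) || return 1
    # "-all= -o -" writes a metadata-free copy to stdout, leaving FILE untouched.
    if exiftool "$1" -all= -o - >"$tmp" 2>/dev/null && [ -s "$tmp" ]; then
        md5sum <"$tmp" | cut -d' ' -f1
        status=0
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Usage: hash every file given on the command line, reporting the ones skipped.
for f in "$@"; do
    if h=$(stripped_md5 "$f"); then
        printf '%s  %s\n' "$h" "$f"
    else
        printf 'skipped (could not strip metadata): %s\n' "$f" >&2
    fi
done
```

The output lines have the same "hash, two spaces, path" shape as md5sum, so they can be sorted to bring matching hashes together.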
I am reopening this very old but very interesting thread to add a hint about how to achieve the requested task (MD5-hashing MP3 files, which ExifTool cannot write, so the stripping technique above does not apply; my post is therefore not directly ExifTool related, and hopefully I am not offending anyone here or infringing any rule or etiquette).
I personally use the Audio::Scan module for this, with something like:
use strict;
use warnings;
use Audio::Scan;

my $size_to_md5 = 64 * 1024; # perldoc Audio::Scan recommends 64k
foreach my $f ( @ARGV ) {
    my $data = eval {
        # Enforce md5_offset to explicitly start at 0 (perldoc is ambiguous)
        Audio::Scan->scan( $f, { md5_size => $size_to_md5, md5_offset => 0 } );
    };
    if ( $@ || !defined $data ) {
        print STDERR "ERR> $f: cannot Audio::Scan", $@ ? " ($@)" : "", "\n";
        next;
    }
    # audio_md5 is computed over audio bytes only (tags are skipped per the Audio::Scan docs)
    my $md5 = $data->{info}->{audio_md5};
    print $md5, ": ", $f, "\n";
}
Hope this helps...