Image Unique Identifier

Started by Skippy, August 13, 2015, 08:00:39 PM

Skippy

Exif data specifies a tag for an ImageUniqueID, but only some of my cameras use it.  The lack of a unique image ID in most JPG files makes a lot of photo management operations difficult.  I am wondering if ExifTool generates a hash that can be used to uniquely identify JPEG images.  The hash would exclude tags that can be edited in a photo manager, especially GPS tags, title, keywords, description, etc.  It may also use the existing ImageUniqueID if it exists.  The hash should also be friendly to database indexing, as tens of thousands of hashes may be recorded.  Picasa uses hashes to manage photos.  The hashes should also be fast to extract from JPGs, so it should read tags that are easier to get at and process.  Does ExifTool possess such a capability?

StarGeek

Quote from: Skippy on August 13, 2015, 08:00:39 PM
Does ExifTool possess such a capability?

As you describe it, no, I don't believe it does.

As for some other options, ExifTool does have a tag called NewGUID which "generates a new, random GUID with format YYYYmmdd-HHMM-SSNN-PPPP-RRRRRRRRRRRR, where Y=year, m=month, d=day, H=hour, M=minute, S=second, N=file sequence number in hex, P=process ID in hex, and R=random hex number; without dashes with the -n option. Not generated unless specifically requested".

There was a Stackoverflow question last month which was similar.  It suggested using ImageMagick's Identify command.
identify -format %# FILE

The OP in that thread went with exiv2 rm | md5.
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

Phil Harvey

I have suggested something like this in the past:

exiftool FILE -imageuniqueid=`exiftool FILE -all= -o - | md5`

This will work on Mac/Linux to add an MD5 checksum that depends only on the image.  I'm not sure how to accomplish this in Windows.
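For readers without an md5 command (for example on Windows), the same idea can be approximated without ExifTool at all. What follows is a minimal Python sketch and not Phil's method: it hashes a JPEG while skipping the APPn and COM segments, which is where metadata normally lives. The function name image_data_digest is hypothetical, and the digest will generally not equal the one produced by the exiftool command above, because ExifTool makes its own decisions about which segments -all= removes.

```python
import hashlib
import struct


def image_data_digest(jpeg_bytes: bytes) -> str:
    """Return an MD5 hex digest of a JPEG, ignoring APPn and COM segments.

    Assumption: every segment before the start-of-scan marker carries a
    two-byte length field, which holds for ordinary camera JPEGs.
    """
    if jpeg_bytes[:2] != b"\xff\xd8":          # SOI marker
        raise ValueError("not a JPEG stream")
    md5 = hashlib.md5()
    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            raise ValueError("expected a segment marker")
        marker = jpeg_bytes[pos + 1]
        if marker == 0xDA:                     # SOS: the rest is image data
            md5.update(jpeg_bytes[pos:])
            return md5.hexdigest()
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        is_metadata = 0xE0 <= marker <= 0xEF or marker == 0xFE
        if not is_metadata:                    # keep DQT, SOF, DHT, ...
            md5.update(jpeg_bytes[pos:pos + 2 + length])
        pos += 2 + length                      # length counts its own 2 bytes
    raise ValueError("no start-of-scan marker found")
```

Usage would be something like image_data_digest(open("photo.jpg", "rb").read()). Because editable tags (GPS, title, keywords) sit in the skipped segments, the digest should survive metadata edits but change if the image itself is re-encoded.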

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

StarGeek

Nifty.  On Windows it'll probably require some PowerShell magic.  I'm going to have to look into it.

But for a TIFF-based file, you would have to add -CommonIFD0= at the very least, correct?  The fact that -all= wasn't removing all data was the root of the Stack Overflow question I linked to above.

Skippy

I am actually thinking that it is not necessary to write the hash into exif data.  What if there was a simple formula that could take some common tags and make a hash?  Shutter speed, aperture, photo sequence number, date-created and maybe a couple of others could be harvested and rolled up into a hash.  Would reading a few tags and making a hash be that much slower than just reading a hash?  Disk access is usually the limiting factor, even for SD cards, so making the hash should require very limited overhead.  Picasa makes hashes and does not write them back into photos.

My thinking is that I would like hash values for photos that are still on the SD card they were originally written on.  Backing up on an SD card is risky (the card might be full) and slow, and I don't think that exiftool can push the backups it makes to another directory.  I don't want to do that anyway, as I would have to write code to later purge the backups.  I also do not want to write to the originals on the SD card, as this slows down processing and is potentially risky without a backup.  Lots of my cards are semi-corrupt and who knows what could happen.  So pretty quickly, I come back to the idea of calculating a hash for JPEGs on SD cards.  The hash can be written into a database to make it possible to detect new photos or trace which photos went where.  If a database is used, there is only a need to read hashes when new data comes along.

Background info. 
I take hundreds of photos a week and after a few years discovered I could not find a lot of the photos I was sure I took.  So I audited a single card and found that I was not downloading about 30% of my photos.  To get my photos back, I wrote a database application that scanned my hard drive for photos and made a giant table listing all the photos it found.  I could then stick in an SD card, run the application, and see how many photos had already been copied.  The application let me rescue over 20 000 photos in just a few weeks.  Unfortunately, the heart of the application includes nested SQL queries about six levels deep, so the algorithm nearly breaks my head.  If photos had a GUID in them, most of this complexity could disappear.  A screenshot of my application is attached.


Phil Harvey

Quote from: Skippy on August 14, 2015, 04:08:16 AM
I don't think that exiftool can push the backups it makes to another directory.

It can write the modified files to a different directory, but the originals are never moved.

Quote
So pretty quickly, I come back to the idea of calculating a hash for JPEGs on SD cards.

You can easily create a hash from whatever you want.  In my example command, I showed how to do this with the image data.  You don't need to write it to ImageUniqueID -- that was just an example of one way to use this hash.

- Phil

Skippy

Hi Phil,
There only seem to be a few tags that are universal between camera brands.  This makes it really hard to write a hashing algorithm.  So far I have found only ISO, shutter speed, aperture, [camera] model and CreateDate.  To be of any use in a hash, the tag value should change between photos.  One tag that I would like to have would be the camera serial number, but I am not sure that it exists.  It would also be great to have the photo number, serial number or original file name.  Canons present most of these values and also an ImageUniqueID.  It is photos from my Nikon P7800, which don't present any useful values, that are my current roadblock.

The newGUID function calculates a new GUID every time it runs, even with the same jpg, so unless the newGUID is written into the jpg, then I am not sure how it could be used.  I am not sure which exif field would be good to write the newGUID into either.  What would be the safest exif field to write to?

Would it be possible to hash a complex block of data within the jpg file such as the maker note or preview thumbnail and come up with a unique ID? 

Skippy

Phil Harvey

Hi Skippy,

My suggestion was to hash the image data, and if you write it to a tag I suggested ImageUniqueID.

But you can of course hash anything you want and put it anywhere you want.  I suggest you look at the Extra Tags documentation if you want to hash blocks of metadata.

- Phil

Skippy

Hi Phil,

I have not been able to write the ImageUniqueID tag back to Nikon photos which do not have that tag to start with, so I have switched to writing to IPTC:ObjectName, which is also a tag that is considered write-once.  I tried to place NewGUID in this field following your suggestion exiftool FILE -imageuniqueid=`exiftool FILE -all= -o - | md5`, but on careful reading it seems that one instance of exiftool is calling another instance.  I am not fond of applying that solution to a whole folder full of JPGs.  Can exiftool read and write in one session, for example generating a NewGUID and writing it to a JPEG in one invocation?  If I have to make two passes through the data, one to read tags and another to write tags, then I have to find a way to detect the end of processing of the first operation.  If not, the second would proceed before the first one finishes.  There are a few threads on this issue but I have not found a good solution yet.  It is noted that detecting a process running in a command window is really hard.

I have found getting to grips with exiftool heavy going, so I am being verbose in my responses to be more self-documenting.  Most VBA coders are not in the same league as C#/C++ developers.  I am finding that the ExifTool GUI is great for confirming that writes by exiftool are working, so I am generating the code in VBA, pasting it into a cmd window to run it, then confirming the results with ExifTool GUI.  I am using ConEmu as a shell for the command line as this makes cut and paste easier.


Phil Harvey

Quote from: Skippy on August 18, 2015, 12:13:19 AM
I have not been able to write the imageUniqueID tag back to nikon photos which do not have that tag to start with

What was your command?  Were there any messages?  You should be able to do this.

Quote
I tried to place newGUID in this field following your suggestion exiftool FILE -imageuniqueid=`exiftool FILE -all= -o - | md5` but on careful reading it seems that one instance of exiftool is calling another instance.

What system are you using?  What were the messages?  This should work on Mac/Linux, but as I said, not on Windows.

Quote
Can exiftool read and write in one session?  For example, generating a newGUID and writing it to a JPEG in one invocation?

Yes.

- Phil

StarGeek

Since OP mentions VBA and ExifToolGui, I would assume Windows.

I created a batch file which will replicate Phil's Linux command.  Windows doesn't come with a built-in md5 command, at least as far as I can find.  There may be something PowerShell-related; I found some references, but I'm not familiar enough with the workings of PS to get that to work.

I previously had a command called md5sum from this sourceforge page and I also tried this md5 build.  Either works fine, though the first would require correcting the bat file.

Here's the batch file (I added -CommonIFD0= to deal with tiff based files):
for /f usebackq %%F in (`exiftool %* -all^= -CommonIFD0^= -o -^|md5`) do set args=%%F
exiftool %* -imageuniqueid=%args%


Example output:
PS C:\WINDOWS\system32> md5ToTag.bat X:\!temp\Test3.jpg

C:\WINDOWS\system32>for /F usebackq %F in (`exiftool X:\!temp\Test3.jpg -all= -CommonIFD0= -o -|md5`) do set args=%F

C:\WINDOWS\system32>set args=8FA709452C424CE9B576FEBB35EEF2F3

C:\WINDOWS\system32>exiftool X:\!temp\Test3.jpg -imageuniqueid=8FA709452C424CE9B576FEBB35EEF2F3
    1 image files updated


It can only work on one file at a time.  If you use it on a directory, they will all receive the same id.  Someone with better powershell skills would have to alter it to fix that.
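One way around the one-file-at-a-time limitation is to drive exiftool from a small script that loops over the folder and hashes each file separately. The following is a minimal Python sketch under stated assumptions: exiftool is on the PATH, only .jpg/.jpeg files are wanted, and the function names stripped_md5 and tag_folder (and the run parameter, included so the logic can be exercised without ExifTool installed) are hypothetical. The exiftool arguments are the same ones used in the batch file above; Python's hashlib stands in for the external md5 program.

```python
import hashlib
import subprocess
from pathlib import Path


def stripped_md5(path, exiftool="exiftool", run=subprocess.run):
    """MD5 (upper-case hex) of the file as exiftool writes it with metadata removed."""
    result = run(
        [exiftool, str(path), "-all=", "-CommonIFD0=", "-o", "-"],
        capture_output=True,   # the stripped image arrives on stdout
        check=True,
    )
    return hashlib.md5(result.stdout).hexdigest().upper()


def tag_folder(folder, exiftool="exiftool", run=subprocess.run):
    """Write a per-file ImageUniqueID to every JPEG directly inside folder."""
    ids = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in {".jpg", ".jpeg"}:
            continue
        digest = stripped_md5(path, exiftool, run)
        # Second exiftool call per file: write the digest into the tag.
        run([exiftool, f"-imageuniqueid={digest}", str(path)], check=True)
        ids[path.name] = digest
    return ids
```

Calling tag_folder(r"X:\!temp") would then give each image its own ID. It still launches exiftool twice per file, so it is no faster than the batch file; it only removes the "same id for every file" problem.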

Skippy

@Phil

I am using Win 8.1 and ms-access 2010.   By writing VBA code in ms-access, I can be sure to generate exactly the same cmd string every time.  My test environment involves shelling out to exiftool for each separate photo with a command like below.
"C:\Temp\exif\exiftool.exe" -IPTC:ObjectName=4567 -overwrite_original C:\Temp\exif\141_0604\DSCN6401.JPG
That works.  I have tried to get exiftool to do something like:
"C:\Temp\exif\exiftool.exe" -IPTC:ObjectName=-newGUID -overwrite_original C:\Temp\exif\141_0604\DSCN6401.JPG
That does not work, as the text string "-newGUID" is inserted into the ObjectName field rather than a NewGUID value.  I am guessing that this failure is not just due to getting the argument list syntax correct but is due to my approach.

What I am hoping for is to be able to get a single instance of exiftool to process a directory full of JPEGs, adding a unique ID to each image AND generating a JSON file which contains the unique ID (and other selected tags).  The JSON file can then be imported into a database.  It is likely that other tags will also need to be written during the processing, particularly organisation name and contact details.  I am working with MS Access now, but the approach I develop now is likely to be ported to C#/SQL Server or to an open source stack with Postgres or Firebird as a backend.

StarGeek

Quote from: Skippy on August 18, 2015, 06:12:04 PM
"C:\Temp\exif\exiftool.exe" "-IPTC:ObjectName=-newGUID" -overwrite_original C:\Temp\exif\141_0604\DSCN6401.JPG
Change the equal sign (assignment) to a less than sign (tag copy) (that's how I remember them, not sure if that's technically correct), remove the second minus sign, and since you're on a windows machine, put double quotes around it to prevent Windows from interpreting it as redirection.
"-IPTC:ObjectName<newGUID"
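For the workflow Skippy describes (tag a whole directory, then export the IDs for a database), the two steps can be run back to back from a script. The following is a minimal Python sketch under assumptions: exiftool is on the PATH, and the function names build_commands, ids_from_json and tag_and_export are hypothetical. It uses the tag-copy argument shown above plus exiftool's -json output. Passing arguments as a list means no shell is involved, so the < cannot be taken as redirection, and subprocess.run only returns once exiftool has exited, so the read pass cannot start before the write pass has finished.

```python
import json
import subprocess


def build_commands(folder, exiftool="exiftool"):
    """Return (write_cmd, read_cmd) argument lists for one folder."""
    write_cmd = [exiftool, "-IPTC:ObjectName<NewGUID",
                 "-overwrite_original", str(folder)]
    read_cmd = [exiftool, "-json", "-IPTC:ObjectName",
                "-CreateDate", str(folder)]
    return write_cmd, read_cmd


def ids_from_json(json_text):
    """Map SourceFile -> ObjectName from exiftool -json output."""
    return {rec["SourceFile"]: rec.get("ObjectName")
            for rec in json.loads(json_text)}


def tag_and_export(folder, exiftool="exiftool"):
    """Write a NewGUID to every file, then read the values back as a dict."""
    write_cmd, read_cmd = build_commands(folder, exiftool)
    subprocess.run(write_cmd, check=True)      # blocks until writing is done
    out = subprocess.run(read_cmd, capture_output=True,
                         text=True, check=True).stdout
    return ids_from_json(out)
```

Note that NewGUID is regenerated on every run, so running the write step again on the same folder would replace IDs that were already assigned.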


Phil Harvey

StarGeek,

Quote from: StarGeek on August 18, 2015, 06:45:28 PM
(that's how I remember them, not sure if that's technically correct)

Thanks.  This is technically correct.

- Phil

Skippy

I have solved the Image Unique Identifier issue (I think).  The solution is complex but seems to work.  I am trying to trace photos from their origins on an SD card through to wherever they go.  The simplest solution, of course, is to buy a Canon camera or another brand that already puts ImageUniqueIDs in every photo.  I have had Canon, Nikon, Sony, Pentax, Lumix and Fuji cameras, and only the Canon photos are good to start with.  This solution works for the others.

Writing ImageUniqueID codes into the photos before they come off the SD card is possible but it is very slow and of course the originals would need to be overwritten.  My solution is to write the ImageUniqueID tags once the photos have been copied to my hard drive.

To make a link between a photo on the drive and a photo on the SD card, I am using a hash value instead of the ImageUniqueID.  It is a hundred times faster to read the exif tags from every photo on an SD card, calculate a unique hash for each, and do the same for photos on the hard drive than it is to write ImageUniqueID tags to photos and then read them back into the application.  I am using a database backend, so I only have to calculate hash values once as they are stored permanently.  The hash has to be the same for a photo on the SD card and on a freshly copied version of the same photo.  I have gone for the common tags plus just the date part of CreateDate.  Time is often adjusted to match my GPS time stamps, so I can't use time, but the date very rarely changes.

IMGP9998|2011:12:13 |K-x|18mm|0.0012|f8|400

I have trimmed off the file extensions and excess digits to make the hash shorter.  So far it seems to work very well, however if the photos are renamed, then I have lost the ability to regenerate the same hash so it is a fragile means of identification.  The other attributes are often identical across several photos and have much less fingerprinting power.  However, the hashes can be used to match photos that have not been renamed with photos that are still on the SD card.  I can then write the SD card's GUID into the record for the photo on the hard drive.  That pinpoints where the photo originally came from. 
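The key described above can be sketched in a few lines. This is a minimal Python illustration, not Skippy's actual VBA: the function name fingerprint is hypothetical, the tag values are assumed to have been read already (for example with exiftool) into a dictionary and trimmed the way the example key shows, and the optional SHA-1 step is an addition for anyone who wants a fixed-length value to index in a database.

```python
import hashlib


def fingerprint(tags):
    """Build a pipe-separated key from common tags and return (key, sha1_hex).

    Only the date part of CreateDate is used, since the time may later be
    adjusted to match GPS time stamps; the file extension is dropped.
    """
    stem = tags["FileName"].rsplit(".", 1)[0]      # IMGP9998.JPG -> IMGP9998
    date = tags["CreateDate"].split(" ")[0]        # keep YYYY:MM:DD only
    parts = [stem, date, tags["Model"], tags["FocalLength"],
             tags["ExposureTime"], tags["FNumber"], tags["ISO"]]
    key = "|".join(str(p).strip() for p in parts)
    return key, hashlib.sha1(key.encode("utf-8")).hexdigest()


# Hypothetical tag values, already shortened as in the example key above.
card_copy = {"FileName": "IMGP9998.JPG", "CreateDate": "2011:12:13 10:15:00",
             "Model": "K-x", "FocalLength": "18mm", "ExposureTime": "0.0012",
             "FNumber": "f8", "ISO": 400}
key, digest = fingerprint(card_copy)
```

As noted above, the file name carries most of the fingerprinting power, so the key stops matching as soon as a photo is renamed.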

The next step is to find photos that do not have ImageUniqueIDs and to use ExifTool to generate them as discussed above.  ExifTool reads around 100 photos a second but only seems to write about one photo per second, so inserting data into photos is something to minimise when you are dealing with collections of tens of thousands of images.  On my system, exiftool read tags from 14 000 images and my application calculated hashes in just over four minutes.

The code to detect new photos, create and store hashes, then retrofit ImageUniqueIDs runs into several hundred or even a few thousand lines, but it means that from the point of import all my photos can be traced to their origin and later traced wherever they go.

Phil Harvey

1 per second seems really slow.  On my system I get about 20x this speed.

I get the same speed when I do a straight copy of the files.

So the speed of ExifTool when writing is limited only by the I/O bandwidth of my system.

- Phil

Skippy

Confirming the above point, exiftool writes about 10 images per second on my laptop.  The bottleneck was in ms-access and was caused by using findfirst on a large recordset.  I switched to using SQL instead of findfirst and got a massive jump in performance.
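To illustrate the kind of set-based lookup that tends to beat a record-by-record search, here is a minimal Python sketch using the standard sqlite3 module. It is not Skippy's MS Access code: the table names photo and card, the column names, and the function new_hashes are all hypothetical. It loads the hashes found on an SD card into a temporary table and lets a single indexed join report which ones are not yet in the photo table.

```python
import sqlite3


def new_hashes(conn, card_hashes):
    """Return the hashes from the card that are not yet recorded in photo."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS photo (hash TEXT PRIMARY KEY, path TEXT)")
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS card (hash TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM card")               # start from an empty card
    conn.executemany("INSERT OR IGNORE INTO card (hash) VALUES (?)",
                     [(h,) for h in card_hashes])
    rows = conn.execute(
        "SELECT card.hash FROM card "
        "LEFT JOIN photo ON photo.hash = card.hash "
        "WHERE photo.hash IS NULL "                # no match on the hard drive
        "ORDER BY card.hash")
    return [row[0] for row in rows]
```

The primary key on hash gives the database an index to join on, which is the "friendly to database indexing" property asked for at the start of the thread.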

wayn0i

Quote from: Phil Harvey on August 13, 2015, 09:09:19 PM
I have suggested something like this in the past:

exiftool FILE -imageuniqueid=`exiftool FILE -all= -o - | md5`

This will work on Mac/Linux to add an MD5 checksum that depends only on the image.  I'm not sure how to accomplish this in Windows.

- Phil


Hi Phil,

I have replaced FILE with FOLDER and tried to add individual md5's to imageuniqueid tag for each contained image. It seems to hash the entire folder and add the same hash to each image.

Can you assist?

Wayne


Phil Harvey

Hi Wayne,

Sorry for the delay in responding, I've been away on vacation.

The command I gave will work only for one file at a time.  You would need to write a script to automate this for a whole folder.  Either that, or create a CSV file containing the MD5 for all files in the folder then use the exiftool -csv option to read the values from this file.  See the -csv option documentation for details.
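The CSV route could look roughly like the following. This is a minimal Python sketch, with the function name write_id_csv being hypothetical: it takes a mapping of file path to MD5 (computed however you like, one file at a time) and writes a file with a SourceFile column followed by an ImageUniqueID column, which is the general layout the -csv option reads.

```python
import csv
import io


def write_id_csv(ids, out):
    """Write SourceFile,ImageUniqueID rows for use with exiftool -csv=FILE."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["SourceFile", "ImageUniqueID"])   # header row
    for source_file, digest in sorted(ids.items()):
        writer.writerow([source_file, digest])


# Hypothetical paths and digests, written to an in-memory buffer.
buf = io.StringIO()
write_id_csv({"DIR/b.jpg": "BB22", "DIR/a.jpg": "AA11"}, buf)
```

After saving the result to a file such as ids.csv, the import would be along the lines of exiftool -csv=ids.csv DIR. The SourceFile entries need to match the file paths as ExifTool sees them, so check the -csv documentation for the exact matching rules before relying on this.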

- Phil

Hayo Baan

Quote from: wayn0i on September 09, 2018, 12:54:28 PM
I have replaced FILE with FOLDER and tried to add individual md5's to imageuniqueid tag for each contained image. It seems to hash the entire folder and add the same hash to each image.

Can you assist?

Wayne

Assuming you're on Linux/Mac, to perform this on every file in a folder is simply a matter of a for loop and/or a smart find command:

If FOLDER contains all files you want to run the command on:
for f in FOLDER/*; do exiftool "$f" -imageuniqueid=`exiftool "$f" -all= -o - | md5`; done

If the folder (also) contains subfolders you can use find:
find FOLDER -type f -exec perl -e 'system(qq(exiftool "$ARGV[0]" -imageuniqueid=`exiftool "$ARGV[0]" -all= -o - | md5`));'  {} \;
Hayo Baan – Photography
Web: www.hayobaan.nl

BC

Quote from: Phil Harvey on August 13, 2015, 09:09:19 PM
exiftool FILE -imageuniqueid=`exiftool FILE -all= -o - | md5`

I'd like to accomplish this using the Image::ExifTool module.  I am guessing that I could set all of the metadata to an empty hash and write to a temp file, and then read in and hash the temp file to get the desired value.  But it would be far better to just hash the image data while I have the file loaded in memory.  All of the documentation is about dealing with tags (of course) but I can't figure out how to access just the binary image associated with the object.

Phil Harvey

You can write to memory using Image::ExifTool, then use Digest::MD5 to get the MD5 of the image in memory:

use Digest::MD5;
$exifTool->WriteInfo($file, \$buff);
my $md5 = Digest::MD5::md5($buff);


If you want to write the MD5 to the file, then read the file into memory first, then write it twice -- once to clear the metadata, and a second time back to disk with the MD5 embedded.

- Phil