Recursive Searching & DB

Started by scrivocmdivo, September 03, 2011, 11:59:29 AM


scrivocmdivo

Hey folks.

Just wondering if anybody has already done any work on adapting ExifTool to recursively search through subdirectories and populate the results into a DB? I'm looking to implement this for serial numbers (which I believe are contained in the MakerNotes) and GPS coordinates, and I'm wondering if somebody has already looked at this and can offer some advice.

Thanks

Phil Harvey

#1
This is FAQ number 12.

- Phil

Edit: Fixed bad link
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

scrivocmdivo

Phil,

Many thanks for the reply. The command I'm currently running is:

exiftool -csv -a -f -m -r C:\Location > output.csv

This appears to work fine when testing a few images.

What I'd really like to do is process (read-only) hundreds of millions of images and populate a CSV file with the EXIF / MakerNote info, which I can then transfer into a DB (thanks for the pointer to the FAQ, by the way). These images will vary in format, maker, model, etc. Ideally I'm looking to capture absolutely everything stored in every single image: makes, models, serial numbers, locations, taken dates, focus, etc. Are there any improvements required for the above command, then? I am conscious of the vast number of images being processed, so I'd like to make sure my command is correct to avoid rerunning it :). Time is not really an issue; ensuring everything is captured is the priority. Similarly, is there a way to ensure my command runs as read-only?

Thanks again

Phil Harvey

The -csv option may be a problem for such a large number of images since (due to the organization of the .csv file) exiftool must buffer all information in memory before it starts to write the output file. A few thousand images should be fine, but hundreds of millions would likely run you out of memory.

Unless you can run this in batches, I suggest using another format for the database export. JSON (-j) is probably the best choice here, but it would only work if your database software can import JSON format.
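For readers who have not used JSON before, here is a minimal sketch of one way the -j output could be loaded into a database. It assumes the output is a JSON array with one object per file, each carrying a SourceFile entry. The SQLite table name `metadata`, the helper `load_records` and the sample values are all hypothetical, chosen only for illustration; real database software may need a different import route.

```python
import json
import sqlite3


def load_records(conn, json_text):
    """Insert one (source_file, tag, value) row per tag found in the JSON text."""
    # Hypothetical "tall" table: no fixed set of columns is needed up front.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata "
        "(source_file TEXT, tag TEXT, value TEXT)"
    )
    rows = []
    for record in json.loads(json_text):
        source = record.get("SourceFile")
        for tag, value in record.items():
            if tag == "SourceFile":
                continue
            # Values may be numbers, strings or lists; store them as JSON text.
            rows.append((source, tag, json.dumps(value)))
    conn.executemany("INSERT INTO metadata VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# Made-up sample in the shape described above (not real exiftool output).
sample = json.dumps([
    {"SourceFile": "a.jpg", "Make": "Canon", "SerialNumber": "0123456789"},
    {"SourceFile": "b.jpg", "Make": "Nikon", "GPSLatitude": 45.5, "GPSLongitude": -75.7},
])

conn = sqlite3.connect(":memory:")
inserted = load_records(conn, sample)
print(inserted)
```

Storing one row per tag means a file with unusual tags simply contributes extra rows. For a very large collection, each batch (for example one directory per run) could be loaded with a separate call.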

Unless you assign a tag value, exiftool will not modify any file.  You can assign a tag value with =, > or < in an argument.  The only other options that assign tag values are -tagsfromfile and -geotag.  But even if you do write a file accidentally, the originals are always preserved unless you also specify -overwrite_original (which would be hard to do by accident).
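As a rough safeguard before launching a very long run, the rule above could be turned into a pre-flight check on the argument list. The following is only a heuristic sketch built from the description in this post; the function name `looks_like_write` is hypothetical, and the check is deliberately conservative, so it may flag harmless arguments that happen to contain one of these characters.

```python
# Options named above as assigning tag values (compared case-insensitively).
WRITE_OPTIONS = {"-tagsfromfile", "-geotag"}


def looks_like_write(args):
    """Return True if any argument looks like it assigns a tag value."""
    for arg in args:
        if not arg.startswith("-"):
            # File names and directories are not treated as options here.
            continue
        if arg.lower() in WRITE_OPTIONS:
            return True
        # Heuristic: an option containing =, < or > is treated as an assignment.
        if any(ch in arg for ch in "=<>"):
            return True
    return False


# The read-only command from earlier in the thread passes the check.
print(looks_like_write(["-csv", "-a", "-f", "-m", "-r", "C:\\Location"]))
# A hypothetical tag assignment is flagged.
print(looks_like_write(["-Artist=Me", "a.jpg"]))
```

Note that a shell redirection such as `> output.csv` is handled by the shell and never reaches exiftool as an argument, so it would not appear in this list.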

- Phil

scrivocmdivo

Phil,

Many thanks for the quick reply yet again. I'm aware with the CSV format that, as it comes across new tags / headers as it processes the images, it appears to append the new column header / tag to the header row. Is this the same with JSON export? I've never used JSON and so I'm just wondering how this works behind the scenes so that I can address this during the importing process into my DB.

Similarly Phil, what are your thoughts on me using the "stay_open" command and the "-fast" commands for my requirements? I just hope processing millions of images won't kill my CPUs :)

Thanks again

Phil Harvey

Quote from: scrivocmdivo on September 10, 2011, 10:00:20 AM
I'm aware with the CSV format that, as it comes across new tags / headers as it processes the images, it appears to append the new column header / tag to the header row. Is this the same with JSON export?

No.  The JSON format isn't row/column, so it doesn't need to do this.
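To illustrate the difference for the import step: in JSON each file's object lists only the tags found in that file, so there is no shared header row. If the target database does need fixed columns, the importer can work them out itself. A small sketch, with hypothetical helper names and made-up sample records:

```python
def collect_columns(records):
    """Return every tag name seen in any record, in first-seen order."""
    columns = []
    seen = set()
    for record in records:
        for tag in record:
            if tag not in seen:
                seen.add(tag)
                columns.append(tag)
    return columns


def to_rows(records, columns):
    """Build one row per record; tags missing from a record become None."""
    return [[record.get(col) for col in columns] for record in records]


# Made-up records: the second file has tags the first one lacks.
records = [
    {"SourceFile": "a.jpg", "Make": "Canon", "SerialNumber": "0123456789"},
    {"SourceFile": "b.jpg", "GPSLatitude": 45.5},
]

columns = collect_columns(records)
rows = to_rows(records, columns)
print(columns)
print(rows)
```

The column list is only complete once every record has been seen, which is the same reason a CSV header cannot be finalized until all files have been processed.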

Quote
what are your thoughts on me using the "stay_open" command and the "-fast" commands for my requirements?

Sounds like a good idea unless your images contain AFCP information or something else you might want to export from a JPEG trailer.
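For anyone driving exiftool from a script, here is a minimal sketch of the -stay_open pattern combined with -fast. It assumes exiftool is on the PATH and that file names contain no newlines. The helper names are hypothetical, and the option set (-j, -fast, -a) is just one plausible choice for this use case.

```python
import subprocess


def build_request(paths, extra_args=("-j", "-fast", "-a")):
    """Build one argument block: one argument per line, ended by -execute."""
    lines = list(extra_args) + list(paths) + ["-execute"]
    return "\n".join(lines) + "\n"


def read_until_ready(stream):
    """Collect output lines until the {ready} marker that ends each request."""
    chunks = []
    for line in stream:
        if line.strip() == "{ready}":
            break
        chunks.append(line)
    return "".join(chunks)


def start_exiftool(executable="exiftool"):
    """Start a long-lived exiftool process that reads arguments from stdin."""
    return subprocess.Popen(
        [executable, "-stay_open", "True", "-@", "-"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )


# Usage sketch (not executed here, since it needs exiftool installed):
#   proc = start_exiftool()
#   proc.stdin.write(build_request(["a.jpg", "b.jpg"]))
#   proc.stdin.flush()
#   json_text = read_until_ready(proc.stdout)
#   proc.stdin.write("-stay_open\nFalse\n")
#   proc.stdin.flush()
#   proc.wait()
```

Keeping one process alive avoids paying the startup cost for every batch of files; as noted above, -fast should be dropped if information stored in a JPEG trailer is needed.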

- Phil