Connecting directly to cloud services from ExifTool

Started by torbenwg, April 15, 2019, 07:45:10 AM


torbenwg

Hi,

I'm looking to offload & archive my GoPro footage in the cloud, and I'm currently trying out Backblaze B2.
I would like to be able to scan the files in the cloud with ExifTool, but obviously without downloading the whole archive.

It seems that ExifTool doesn't read the whole video file but uses seek() to explore and read() specific parts.
I'm exploring whether it would be possible to download only the parts of a file that ExifTool requires.
Would it be possible to extend ExifTool to do this?
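A minimal sketch of the idea, written in Python rather than ExifTool's Perl. It assumes the storage service honours standard HTTP `Range` requests and that the object's size is already known, for example from the service's file listing. `seek()` only moves a position, and `read()` fetches just the requested byte range. `RangedFile` and `fetch_range` are hypothetical names. Here `fetch_range` slices an in-memory blob so the sketch runs offline; a real backend would issue an authenticated ranged GET instead.

```python
import io
from typing import Callable

# Hypothetical transport: returns bytes [start, end) of the remote object.
# A real backend would send an authenticated GET with
# "Range: bytes={start}-{end - 1}" (HTTP byte ranges are inclusive).
FetchRange = Callable[[int, int], bytes]


class RangedFile:
    """Read-only, seekable file-like object backed by ranged fetches."""

    def __init__(self, fetch_range: FetchRange, size: int) -> None:
        self._fetch = fetch_range
        self._size = size
        self._pos = 0
        self.bytes_fetched = 0  # total bytes actually transferred

    def tell(self) -> int:
        return self._pos

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        # Seeking only updates the position; nothing is downloaded.
        if whence == io.SEEK_SET:
            new_pos = offset
        elif whence == io.SEEK_CUR:
            new_pos = self._pos + offset
        elif whence == io.SEEK_END:
            new_pos = self._size + offset
        else:
            raise ValueError(f"invalid whence: {whence}")
        if new_pos < 0:
            raise ValueError("negative seek position")
        self._pos = new_pos
        return self._pos

    def read(self, n: int = -1) -> bytes:
        # Fetch only the requested slice (or the rest of the file if n < 0).
        end = self._size if n < 0 else min(self._pos + n, self._size)
        if self._pos >= end:
            return b""
        data = self._fetch(self._pos, end)
        self._pos += len(data)
        self.bytes_fetched += len(data)
        return data


# Offline demo: an in-memory blob stands in for the remote object.
blob = bytes(range(256)) * 4096  # 1 MiB
remote = RangedFile(lambda start, end: blob[start:end], len(blob))
header = remote.read(8)
remote.seek(-8, io.SEEK_END)
trailer = remote.read(8)
print(remote.bytes_fetched)  # 16
```

In this sketch every `read()` becomes one request. An access pattern of many 8- and 12-byte reads would therefore mean many small round trips, so some form of read-ahead or block caching would probably be needed in practice.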

I found the RandomAccess.pm module, put in some logging statements, and ran it on a local GoPro file.
This gave me the attached output, starting like this:

SeekTest:passed
Reading 1024 bytes
File   seeking 0 bytes whence:0
Reading 8 bytes
Reading 12 bytes
File   seeking -12 bytes whence:1
Reading 12 bytes
....

If this is all that ExifTool does when reading files, then I would feel comfortable trying to extend it to perform the seeks in the cloud instead and download only the relevant parts.
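One way to estimate how much data a cloud backend would actually transfer is to replay file access through a recording wrapper and merge the byte ranges it reads. The following Python sketch shows that approach. `RecordingFile` and `merge_ranges` are hypothetical helpers. The replay below mirrors the log above and assumes the initial 1024-byte read starts at offset 0, which the log does not show. Only calls that reach the wrapped file are recorded, so seeks and reads that ExifTool serves from its own in-memory buffer would not appear.

```python
import io
from typing import List, Tuple


class RecordingFile:
    """Wraps a seekable binary file and records every (offset, length) read."""

    def __init__(self, f: io.BufferedIOBase) -> None:
        self._f = f
        self.reads: List[Tuple[int, int]] = []

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        return self._f.seek(offset, whence)

    def tell(self) -> int:
        return self._f.tell()

    def read(self, n: int = -1) -> bytes:
        start = self._f.tell()
        data = self._f.read(n)
        if data:
            self.reads.append((start, len(data)))
        return data


def merge_ranges(reads: List[Tuple[int, int]], gap: int = 0) -> List[Tuple[int, int]]:
    """Merge (offset, length) reads into sorted, non-overlapping [start, end) ranges.

    Ranges separated by at most `gap` bytes are merged, trading a little
    extra download for fewer requests.
    """
    merged: List[Tuple[int, int]] = []
    for start, length in sorted(reads):
        end = start + length
        if merged and start <= merged[-1][1] + gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Replay the logged access pattern against an in-memory stand-in file.
f = RecordingFile(io.BytesIO(bytes(4096)))
f.read(1024)                  # Reading 1024 bytes
f.seek(0)                     # File seeking 0 bytes whence:0
f.read(8)                     # Reading 8 bytes
f.read(12)                    # Reading 12 bytes
f.seek(-12, io.SEEK_CUR)      # File seeking -12 bytes whence:1
f.read(12)                    # Reading 12 bytes
print(merge_ranges(f.reads))  # [(0, 1024)]
```

Running the same wrapper over a complete local GoPro file would give the set of ranges to download. Raising `gap` reduces the number of requests at the cost of some extra bytes.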

1. Is this the only file access that ExifTool does?
2. Does this seem like a feasible strategy or are there many more circumstances I should be aware of?
3. There are some calls in between that use the RandomAccess buffer instead of directly seek()-ing the file. Any hints as to what that could be?
   (E.g. Buffer seeking 22 bytes whence:1, newPos:30)

Thanks,

     Torben

Phil Harvey

It would be possible to create a new class based on the File::RandomAccess that reads from the cloud.  This should work.  ExifTool sometimes reads from the file into a local buffer, then accesses this buffer with multiple reads, but this should work well from the cloud too.  The real problem is that cloud-based access will be dependent on the service you are using, and will likely require authentication, so this may not be something that we would want to add to the production version of exiftool.

- Phil

Edit: There are a couple of places in the code where the internals of the RandomAccess object are revealed.  I'll see what I can do to remove these to make an enhancement like this simpler.
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).
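The "Buffer seeking" lines in question 3 are likely explained by the buffering Phil describes. Once a chunk has been read into memory, later seeks and reads inside that chunk never reach the file. The following Python sketch shows that general read-ahead pattern for a cloud backend, where each buffer miss would be one ranged request. It is an illustration under assumptions, not how File::RandomAccess is implemented. `ReadAheadFile`, `fetch_range`, and the 64 KiB default block size are hypothetical choices.

```python
import io
from typing import Callable

# Hypothetical transport: returns bytes [start, end) of the remote object.
FetchRange = Callable[[int, int], bytes]


class ReadAheadFile:
    """Serves reads from an in-memory block; fetches remotely only on a miss."""

    def __init__(self, fetch_range: FetchRange, size: int, block: int = 64 * 1024) -> None:
        self._fetch = fetch_range
        self._size = size
        self._block = block
        self._pos = 0
        self._buf = b""
        self._buf_start = 0
        self.fetches = 0  # number of remote requests issued

    def tell(self) -> int:
        return self._pos

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        # Moving the position never triggers a fetch, even outside the buffer.
        bases = {io.SEEK_SET: 0, io.SEEK_CUR: self._pos, io.SEEK_END: self._size}
        if whence not in bases or bases[whence] + offset < 0:
            raise ValueError("invalid seek")
        self._pos = bases[whence] + offset
        return self._pos

    def read(self, n: int) -> bytes:
        end = min(self._pos + n, self._size)
        if self._pos >= end:
            return b""
        buf_end = self._buf_start + len(self._buf)
        if not (self._buf_start <= self._pos and end <= buf_end):
            # Miss: fetch at least one whole block starting at the position.
            fetch_end = min(max(end, self._pos + self._block), self._size)
            self._buf = self._fetch(self._pos, fetch_end)
            self._buf_start = self._pos
            self.fetches += 1
        rel = self._pos - self._buf_start
        data = self._buf[rel:rel + (end - self._pos)]
        self._pos += len(data)
        return data


blob = bytes(range(256)) * 1024  # 256 KiB stand-in for a video file
f = ReadAheadFile(lambda start, end: blob[start:end], len(blob), block=4096)
f.read(8)                 # miss: one 4096-byte fetch
f.read(12)                # hit: served from the buffer
f.seek(-12, io.SEEK_CUR)  # stays inside the buffer, no request
f.read(12)                # hit
f.seek(100_000)
chunk = f.read(16)        # miss: second fetch
print(f.fetches)          # 2
```

The block size is a trade-off. Larger blocks mean fewer round trips but more bytes downloaded that ExifTool may never look at.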

torbenwg

This is good news, thanks for the response.
I agree that the configuration and authentication details of each cloud service are probably best kept separate from ExifTool.
Perhaps some sort of module or command-line utility could be created.

I'll take a closer look and see if I can make something. If you have any pointers regarding item 3 (the buffer seeking I observe), I'd be grateful; I don't quite understand why I get logging from that part of the code.

- Torben

Phil Harvey

Quote from: Phil Harvey on April 15, 2019, 08:00:02 AM
ExifTool sometimes reads from the file into a local buffer, then accesses this buffer with multiple reads