Expedite large file processing (.psb)

Started by sridhar, March 28, 2012, 03:18:27 PM


sridhar

Hi,
I have been trying to write some IPTC/XMP metadata values to a large image file (a .psb file larger than 3 GB).
I used the ExifToolWrapper C# class from this site in a console application.
It took approximately 52 minutes to finish writing.
Is there any way to speed up this processing? I ask because I need to use this tool where a user is waiting in real time, as in a web app.
Or does it have to be an asynchronous process?
I tried the "-fast" option, but did not notice any marked improvement in time.
I'd appreciate any input,
Thanks,
Sridhar

Phil Harvey

That's insanely slow.  I haven't tested PSB files for speed, but this is very surprising to me.

I'll run some tests myself when I get a chance to see if there is a bottleneck I don't know about.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Phil Harvey

Just as I suspected.  It isn't a problem for the Mac version. :(

I created a 4.1 GB PSB file (which took PS CS4 10 minutes and 25 seconds to save, btw), and edited it with exiftool on my iMac.

ExifTool took 1 minute and 27 seconds to rewrite the file.

So now we're just down to either a difference in the specific file you used, or, more likely, another ActivePerl inefficiency.  My guess is the latter.  We have seen similar problems in the past due to the inefficient memory handling of ActivePerl, which is used for the Windows version.

I'll try this on a Windows system when I get a chance, but it may be next week before I can do this.

- Phil

sridhar

Thanks Phil.
I was running the tool on Windows 7.

Phil Harvey

I just looked at the PSD/PSB writing code.  It is particularly simple, and memory shouldn't be a problem unless the Photoshop IRB section is huge, which would be surprising (I would assume that the bulk of the file is image data).  But it would help if you could attach the output of exiftool -v2 for your file so I can check this.

So if memory isn't a problem, perhaps it is disk performance.  ExifTool copies the PSD/PSB image data in 64 kB chunks, which for a poorly cached disk may be inefficient.  Just for fun I tried increasing the chunk size to 16 MB on my Mac system, but it didn't change the speed.  However, the disk caching may be different on a Windows system.
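The chunk-size experiment described above can be approximated outside ExifTool. The following is a minimal Python sketch, not ExifTool's actual copy code: it copies a file in fixed-size chunks and times the result for 64 kB and 16 MB chunks. The function names and the small generated sample file are assumptions made for illustration; a real test would point at a multi-gigabyte file on the disk in question.

```python
import os
import tempfile
import time


def copy_in_chunks(src_path, dst_path, chunk_size):
    """Copy src_path to dst_path in fixed-size chunks; return bytes copied."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # empty read means end of file
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied


def time_copy(src_path, dst_path, chunk_size):
    """Return (bytes_copied, elapsed_seconds) for one chunked copy."""
    start = time.perf_counter()
    copied = copy_in_chunks(src_path, dst_path, chunk_size)
    return copied, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical 8 MB sample file; far too small to show disk effects,
    # so substitute a real large file to get meaningful numbers.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "sample.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(8 * 1024 * 1024))
        for size in (64 * 1024, 16 * 1024 * 1024):
            n, secs = time_copy(src, os.path.join(tmp, "copy.bin"), size)
            print(f"chunk {size:>10} bytes: {n} bytes in {secs:.3f} s")
```

Operating system caching can dominate results for small files, so timings only become comparable when the file is much larger than available RAM.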

- Phil

sridhar

Hi Phil, I have attached the output of 'exiftool -v2'.
As you already stated, I noticed that during writing a temporary file (with the suffix '_exiftool_temp', I think) was created, and it grew slowly until it reached the original file size.

Phil Harvey

Thanks.  In your image the Photoshop IRB information is only 3.6 MB, so this shouldn't be the problem.  That just leaves the copying of the image data in 64 kB chunks.  I will test this on a PC and try different chunk sizes to see if it makes a difference, but I'm not really hopeful that I'll be able to find the cause or a solution because I use the same 64 kB chunk size for copying the image data for all file types, and other people don't seem to be having similar problems with large images.  (Although images this large aren't very common.)

- Phil

Phil Harvey

I played around with this a bit on my Windows XP system, but I only have enough free disk space for one file this large, so I had to work from the original on a flash drive, which would definitely affect the times below, but I'm not sure how.  On the one hand, the flash drive is USB2 so it will be a lot slower than a hard disk, but on the other hand it may help reduce the seek time compared to when you read and write from the same disk.

Here are the times to rewrite my 4.1 GB test file (times in minutes:seconds):

1) 09:42 - exiftool running on Cygwin Perl 5.8.2

2) 09:39 - exiftool(-k).exe packaged version

3) 09:29 - exiftool running on ActivePerl 5.8.7

4) 09:22 - copying the file in Cygwin using the shell "cp" command

5) 07:54 - copying the file in Windows using the cmd.exe "copy" command

So in this configuration, ExifTool rewrites the file nearly as fast as it can be copied by Cygwin.  Windows can copy the file a bit faster, but there still isn't a huge difference from the exiftool speeds.
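To compare these timings more directly, they can be converted to throughput. This is a small arithmetic sketch in Python using the five times listed above; it assumes the 4.1 GB test file is 4100 MB (1 GB = 1000 MB), which is a rounding assumption and not a figure from the tests.

```python
def to_seconds(mmss):
    """Convert a 'minutes:seconds' string such as '09:42' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def throughput_mb_per_s(size_mb, mmss):
    """Average throughput in MB/s for size_mb processed in mmss."""
    return size_mb / to_seconds(mmss)


FILE_SIZE_MB = 4100  # assumption: 4.1 GB taken as 4100 MB

TIMINGS = [
    ("exiftool on Cygwin Perl", "09:42"),
    ("exiftool(-k).exe", "09:39"),
    ("exiftool on ActivePerl", "09:29"),
    ("Cygwin cp", "09:22"),
    ("cmd.exe copy", "07:54"),
]

for label, mmss in TIMINGS:
    rate = throughput_mb_per_s(FILE_SIZE_MB, mmss)
    print(f"{label:<26} {mmss}  {rate:.2f} MB/s")
```

Under that assumption every method lands between roughly 7 and 9 MB/s, which is consistent with the USB2 flash drive being the limiting factor rather than exiftool.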

I won't be able to test with Windows 7 until Monday, but it's not looking good right now.  At this point the only hope is that Windows gets very inefficient when reading and writing to the same disk.

You could help by running some tests yourself:

A) How long does it take your system to copy the file to the same disk?

B) How long does it take exiftool to rewrite the file to another disk?  (Use -o "D:\another directory" to write to a directory on another disk.)
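Tests A and B could be timed with a short script. This is a sketch under assumptions: it expects `exiftool` to be on the PATH, takes the source file and the other-disk directory as command-line arguments, and writes a placeholder tag (`-IPTC:Keywords=test`) purely to force a rewrite. The script name, the `.copytest` suffix, and the helper names are all hypothetical.

```python
import shutil
import subprocess
import sys
import time


def timed(func, *args):
    """Run func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def exiftool_rewrite_command(src_path, out_dir, tag_assignment):
    """Build an exiftool command that writes the edited copy into out_dir."""
    return ["exiftool", "-o", out_dir, tag_assignment, src_path]


def run_command(command):
    """Run a command and return its exit code."""
    return subprocess.run(command).returncode


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: python time_tests.py SOURCE_FILE OTHER_DISK_DIR")
    else:
        src, other_dir = sys.argv[1], sys.argv[2]

        # Test A: plain copy next to the source, so it stays on the same disk
        _, secs = timed(shutil.copyfile, src, src + ".copytest")
        print(f"A) same-disk copy: {secs:.1f} s")

        # Test B: exiftool rewrite with the output directed to another disk
        cmd = exiftool_rewrite_command(src, other_dir, "-IPTC:Keywords=test")
        code, secs = timed(run_command, cmd)
        print(f"B) exiftool rewrite to {other_dir}: {secs:.1f} s (exit code {code})")
```

If A is already slow, the bottleneck is the disk or the path to it, not exiftool; if A is fast and B is slow, the rewrite itself deserves a closer look.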

- Phil

Phil Harvey

My wife brought her Windows 7 laptop home from school, so I used it to run some tests, rewriting the 4.1 GB image to the same hard disk:

02:17 - rewrite using exiftool.exe version 8.84

02:26 - copy using Windows 7 cmd.exe "copy" command

I repeated this a couple of times, and got the same numbers.  Surprising, but ExifTool is a bit faster than the Windows copy on this system.

This was a 2.1 GHz Intel Core Duo with 4 GB of RAM.  How fast is your CPU and how much RAM do you have?

So the problem you are seeing is either due to your system (which I can't help), or due to the specific file you are using (but I don't think this is likely).

You were using ExifTool 8.75.  There was a speed improvement in ExifTool 8.72, and I don't think anything since then would affect these tests, but you might try the most recent version just in case.

At this point, I think I have to pass the ball back to you because I have done all I can and haven't been able to reproduce your performance problem.

- Phil

sridhar

Hi Phil, thank you for all the replies. You are amazing.

My system configuration is almost the same as the one you tested on (4 GB RAM, Intel Core 2 CPU at 2.53 GHz).
It may be this specific file, or the number of processes or the memory in use at the time of my test. Also, I forgot to mention initially that the file is on a network share.
As you pointed out, a plain file copy in Windows itself takes a long time. I did not note the exact time, but it was 10+ minutes.

In any case, to handle these one-off cases too, we are looking at making this process asynchronous.
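One way such an asynchronous setup might look is a background job queue: the web request submits the write and returns a job ID immediately, and the client polls for status. The sketch below is in Python rather than C#, and everything in it (the `MetadataJobQueue` class, the `write_with_exiftool` helper, the status strings) is a hypothetical illustration, not part of ExifTool or ExifToolWrapper.

```python
import subprocess
import uuid
from concurrent.futures import ThreadPoolExecutor


def write_with_exiftool(path, tags):
    """Hypothetical writer: run exiftool with -TAG=VALUE arguments on path."""
    args = [f"-{name}={value}" for name, value in tags.items()]
    subprocess.run(["exiftool", *args, path], check=True)
    return path


class MetadataJobQueue:
    """Run slow metadata writes in background threads and track them by ID."""

    def __init__(self, write_func, max_workers=2):
        self._write_func = write_func
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs = {}

    def submit(self, path, tags):
        """Queue a write and return a job ID without waiting for it."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._executor.submit(self._write_func, path, tags)
        return job_id

    def status(self, job_id):
        """Return 'pending' (queued or running), 'done', or 'failed'."""
        future = self._jobs[job_id]
        if not future.done():
            return "pending"
        return "failed" if future.exception() is not None else "done"

    def wait(self, job_id, timeout=None):
        """Block until the job finishes; re-raises the job's exception."""
        return self._jobs[job_id].result(timeout=timeout)

    def shutdown(self):
        """Wait for queued jobs to finish and release the worker threads."""
        self._executor.shutdown(wait=True)
```

A request handler would call `queue.submit(path, tags)` on a queue built with `MetadataJobQueue(write_with_exiftool)` and return the ID; a separate status endpoint would call `queue.status(job_id)`. An in-memory queue like this loses its jobs when the process restarts, so a persistent queue may be the better choice for production use.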

Thanks,
Sridhar