Very slow reading of files with crs raw metadata

Started by TSM, May 30, 2013, 06:28:37 AM

TSM

We are finding that some of the files we receive are very slow to read; these files have CRS RAW metadata in the XMP section. Below is an example file, but we had a folder with 30 of these and it was taking about 0.5-1s per file to read vs 0.1-0.2s normally.
We use batching to speed up read times on bulk reads, but this gives no real added benefit for these pictures.
In the end I just stripped the XMP section, and then the files were read super fast.

Version: 9.02 (production) but also tested on latest 9.30
OS: CentOS 6.3
CMD: exiftool -use MWG -fast2 -q -g -j

Source: https://docs.google.com/file/d/0B7Gftc42CL6WUTFiUzFicU0wZlk/edit?usp=sharing
Output: attached


Phil Harvey

It certainly looks like the bulk of the processing time is spent parsing XMP.  I get this:

> time exiftool allpix_0019148_0001.jpg -use MWG -fast2 -q -g -j > t1
0.417u 0.011s 0:00.43 97.6% 0+0k 0+3io 0pf+0w
> exiftool allpix_0019148_0001.jpg -xmp -b -a > out.xmp
> exiftool allpix_0019148_0001.jpg -xmp:all=
    1 image files updated
> time exiftool allpix_0019148_0001.jpg -use MWG -fast2 -q -g -j > t1
0.136u 0.009s 0:00.14 92.8% 0+0k 0+4io 0pf+0w
> time exiftool out.xmp -use MWG -fast2 -q -g -j > t1
0.370u 0.008s 0:00.38 97.3% 0+0k 0+0io 0pf+0w


So that's 0.43 seconds parsing the original file, 0.14 seconds without XMP, and 0.38 seconds parsing the (110 kB of) XMP alone.  (And it looks like about 0.09 seconds overhead just to load ExifTool and its XMP library.)
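
As a rough cross-check of those figures, the XMP parsing cost can be estimated in two independent ways from the timings quoted above. This is only a minimal sketch: the function names are made up for illustration, and the numbers are single runs copied from the `time` output, so treat the result as approximate.

```python
# Elapsed times (seconds) copied from the `time` output quoted in this post.
FULL_FILE = 0.43   # original JPEG, XMP included
NO_XMP = 0.14      # same JPEG after -xmp:all=
XMP_ONLY = 0.38    # extracted out.xmp parsed on its own
STARTUP = 0.09     # approximate load overhead quoted above


def xmp_cost_from_jpeg(full, stripped):
    """Estimate 1: time saved by removing the XMP from the JPEG."""
    return round(full - stripped, 2)


def xmp_cost_from_xmp_file(xmp_only, startup):
    """Estimate 2: time to parse the XMP file minus the load overhead."""
    return round(xmp_only - startup, 2)


if __name__ == "__main__":
    a = xmp_cost_from_jpeg(FULL_FILE, NO_XMP)
    b = xmp_cost_from_xmp_file(XMP_ONLY, STARTUP)
    share = round(100 * a / FULL_FILE)
    print(f"XMP cost (JPEG difference): {a:.2f} s")
    print(f"XMP cost (XMP file minus startup): {b:.2f} s")
    print(f"Share of total read time: about {share}%")
```

Both estimates come out at roughly 0.29 seconds, or about two thirds of the total time for this file.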

Parsing string-based data is time consuming because the entire string must be scanned for matching patterns, and Perl isn't the fastest of languages.  I have always strongly disagreed with Adobe's strategy of mixing image editing data with the metadata in XMP, and this is one reason why.  I don't know if there is much I can do about this.  I have complained to Adobe, but that didn't help.

You can save a bit of time (0.04 seconds) by not outputting the XMP-crs, but the effect isn't large since this doesn't stop ExifTool from parsing it:

> time exiftool ../testpics/xmp/allpix_0019148_0001.jpg -use MWG -fast2 -q -g -j --xmp-crs:all > t1
0.378u 0.009s 0:00.39 94.8% 0+0k 0+0io 0pf+0w


- Phil

TSM

Hmmmm, annoying.

Thanks anyway for a brilliant program. We use it to process about 10k images a day, mostly through a custom PHP class (ZF1) that reads your JSON output in batches to get around loading times.
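
For anyone reading along, the batching approach mentioned here might look something like the sketch below. It is written in Python rather than PHP and is not the actual class used in production; `build_command`, `index_by_source`, and `read_batch` are hypothetical names. The only assumptions about ExifTool itself are the options already shown in this thread and that `-j` prints a JSON array with one object per file, each carrying a `SourceFile` key.

```python
import json
import subprocess

# Same options as the command line quoted earlier in this thread.
EXIFTOOL_ARGS = ["-use", "MWG", "-fast2", "-q", "-g", "-j"]


def build_command(paths):
    """Build a single exiftool invocation covering the whole batch."""
    return ["exiftool", *EXIFTOOL_ARGS, *paths]


def index_by_source(json_text):
    """Map each file path to its metadata object via the SourceFile key."""
    return {entry["SourceFile"]: entry for entry in json.loads(json_text)}


def read_batch(paths):
    """Run exiftool once for all paths (assumes exiftool is on PATH)."""
    result = subprocess.run(
        build_command(paths), capture_output=True, text=True, check=False
    )
    # Treat empty output (no readable files) as an empty batch.
    return index_by_source(result.stdout or "[]")
```

Batching like this pays the program load time once per batch instead of once per file, but each file's XMP is still parsed in full, which fits the observation that batching gave no real benefit for these pictures.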