Execution is 5x slower on DNG vs original raw file

Started by horshack, April 16, 2021, 01:58:07 PM

Previous topic - Next topic

horshack

If I run exiftool against a directory of 256 Panasonic GX85 .RW2 raw files (piping stdout to a file), it takes approximately 4 seconds to complete. However, if I convert those .RW2 files into uncompressed DNGs using Adobe's DNG converter and run exiftool against the directory of 256 DNGs it takes 5x longer to complete, around 20 seconds. The only command-line option I'm using is -s

If I compare the output of a single .RW2 and .DNG I see the DNG has more tags (330 vs 256), but the execution time differential seems outsized vs that tag count difference.

If it's helpful to evaluate, here's the output from a single .RW2 and .DNG

.RW2 output
.DNG output


Phil Harvey

All I can say is: Interesting.

Without a lot of digging, I can't say why the DNG is so much slower, but it is unlikely that it is anything that can be improved easily.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

horshack

Thanks for quick response Phil. I'm using this exiftool invocation to potentially stack thousands of files (My GitHub project), so I'll do what I can to get to the bottom of it. You've forgotten more Perl than I've ever known in the first place so it'll probably be a clumsy effort on my part :)

I did a search on how to profile a perl script, which yielded the following initial profile comparison. I switched to my Linux VM to do this. The DNGs run a little faster - 16.78 seconds vs ~20 seconds on Windows.

Running against 256 .RW2s:

user@ubuntu:~/exiftool$ perl -d:DProf exiftool -s '/mnt/hgfs/vmxferhd/downloads/dngrawload/stack/test/256/' > t.txt
user@ubuntu:~/exiftool$ dprofpp
Total Elapsed Time =     4.12 Seconds
  User+System Time =     3.59 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
19.2   0.690  0.740  92197   0.0000 0.0000  Image::ExifTool::GetValue
13.6   0.490  0.490  81664   0.0000 0.0000  Image::ExifTool::FoundTag
11.7   0.420  2.550   2048   0.0002 0.0012  Image::ExifTool::Exif::ProcessExif
11.4   0.410  0.480 132096   0.0000 0.0000  Image::ExifTool::GetTagInfo
7.24   0.260  0.390 310272   0.0000 0.0000  Image::ExifTool::Get16u
6.13   0.220  0.220 519168   0.0000 0.0000  Image::ExifTool::DoUnpackStd
4.74   0.170  0.320  73728   0.0000 0.0000  Image::ExifTool::ReadValue
3.90   0.140  0.140  14592   0.0000 0.0000  File::RandomAccess::Read
3.90   0.140  0.220 189184   0.0000 0.0000  Image::ExifTool::Get32u
3.06   0.110  0.110  65536   0.0000 0.0000  Image::ExifTool::GetTagName
2.79   0.100  4.070    256   0.0004 0.0159  main::GetImageInfo
2.23   0.080  0.270    256   0.0003 0.0011  Image::ExifTool::BuildCompositeTags
1.95   0.070  0.070 132947   0.0000 0.0000  Image::ExifTool::GetTagInfoList
1.95   0.070  0.150    256   0.0003 0.0006  Image::ExifTool::MakerNotes::GetValueBlocks
1.67   0.060  0.060    256   0.0002 0.0002  Image::ExifTool::Init


Running against 256 DNGs:
user@ubuntu:~/exiftool$ perl -d:DProf exiftool -s '/mnt/hgfs/vmxferhd/downloads/dngrawload/stack/test/256/dng/' > t.txt
user@ubuntu:~/exiftool$ dprofpp
Total Elapsed Time =    16.78 Seconds
  User+System Time =    16.94 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
36.1   6.130 16.170 112640   0.0001 0.0001  Image::ExifTool::ReadValue
30.1   5.100  9.690 768614   0.0000 0.0000  Image::ExifTool::GetFloat
27.2   4.620  4.620 771225   0.0000 0.0000  Image::ExifTool::DoUnpack
5.19   0.880  0.900 125440   0.0000 0.0000  Image::ExifTool::FoundTag
4.96   0.840 20.090   3072   0.0003 0.0065  Image::ExifTool::Exif::ProcessExif
3.84   0.650  0.770 111904   0.0000 0.0000  Image::ExifTool::GetValue
3.25   0.550  0.610 161792   0.0000 0.0000  Image::ExifTool::GetTagInfo
2.30   0.390  0.390 641792   0.0000 0.0000  Image::ExifTool::DoUnpackStd
1.36   0.230  0.470 352000   0.0000 0.0000  Image::ExifTool::Get16u
1.24   0.210  0.210  30976   0.0000 0.0000  File::RandomAccess::Read
1.06   0.180  0.310 263936   0.0000 0.0000  Image::ExifTool::Get32u
0.83   0.140 21.910    256   0.0005 0.0856  main::GetImageInfo
0.83   0.140  1.970    256   0.0005 0.0077  Image::ExifTool::PanasonicRaw::ProcessJpgFromRaw
0.83   0.140  0.510    256   0.0005 0.0020  Image::ExifTool::BuildCompositeTags
0.65   0.110  0.110  67840   0.0000 0.0000  Image::ExifTool::SetGroup

StarGeek

The fact that the DNG is nearly twice the file size of the RW2 would play a factor. 
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

horshack

Quote from: StarGeek on April 16, 2021, 04:17:23 PM
The fact that the DNG is nearly twice the file size of the RW2 would play a factor.

I profiled the I/O calls from exiftool under Windows using Sysinternal's Process Monitor and didn't see a material difference in I/O between the RW2s and DNGs. In both invocations the entire fileset was cached in filesystem memory.

horshack

I believe I'm getting closer to the issue. I'm not familiar with exiftool's internals (or Perl for that matter) so bear with me. I switched to using NYTProf to profile and used it to compare the DNG vs .RW2 invocations for a single file. Immediately I see that for the DNG there are 7,686,144 calls to GetFloat() vs zero calls for the .RW2. Here's the NYTProf info:



The total time spent on these GetFloat calls covers the majority of the long execution time. I put a Devel::StackTrace call in GetFloat() and see a looping call for up to hundreds of thousands of values then the index resets back to zero and see the loop process repeat again. Sample stack trace:


Trace begun at lib/Image/ExifTool.pm line 5117
Image::ExifTool::GetFloat('SCALAR(0x55b89b4d64c8)', 0) called at lib/Image/ExifTool.pm line 5358
Image::ExifTool::ReadValue('SCALAR(0x55b89b4d64c8)', 0, 'float', 8100, 32400, 'SCALAR(0x55b89b0fdc00)') called at lib/Image/ExifTool/Exif.pm line 6181
Image::ExifTool::Exif::ProcessExif('Image::ExifTool=HASH(0x55b89a651ac0)', 'HASH(0x55b89ab46788)', 'HASH(0x55b89a642430)') called at lib/Image/ExifTool.pm line 7745
Image::ExifTool::ProcessDirectory('Image::ExifTool=HASH(0x55b89a651ac0)', 'HASH(0x55b89ab46788)', 'HASH(0x55b89a642430)') called at lib/Image/ExifTool.pm line 7405
Image::ExifTool::DoProcessTIFF('Image::ExifTool=HASH(0x55b89a651ac0)', 'HASH(0x55b89a72f380)', undef) called at lib/Image/ExifTool.pm line 7207
Image::ExifTool::ProcessTIFF('Image::ExifTool=HASH(0x55b89a651ac0)', 'HASH(0x55b89a72f380)') called at lib/Image/ExifTool.pm line 2640
Image::ExifTool::ExtractInfo('Image::ExifTool=HASH(0x55b89a651ac0)', undef) called at lib/Image/ExifTool.pm line 2038
Image::ExifTool::ImageInfo('Image::ExifTool=HASH(0x55b89a651ac0)', '/mnt/hgfs/vmxferhd/downloads/dngrawload/stack/test/256/png/P1040064.dng', 'ARRAY(0x55b89a6480b8)') called at exiftool line 2111
main::GetImageInfo('Image::ExifTool=HASH(0x55b89a651ac0)', '/mnt/hgfs/vmxferhd/downloads/dngrawload/stack/test/256/png/P1040064.dng') called at exiftool line 3628
main::ProcessFiles('Image::ExifTool=HASH(0x55b89a651ac0)') called at exiftool line 1811
Trace begun at lib/Image/ExifTool.pm line 5117
Image::ExifTool::GetFloat('SCALAR(0x55b89b4d64c8)', 4) called at lib/Image/ExifTool.pm line 5358
Image::ExifTool::ReadValue('SCALAR(0x55b89b4d64c8)', 0, 'float', 8100, 32400, 'SCALAR(0x55b89b0fdc00)') called at lib/Image/ExifTool/Exif.pm line 6181
Image::ExifTool::Exif::ProcessExif('Image::ExifTool=HASH(0x55b89a651ac0)', 'HASH(0x55b89ab46788)', 'HASH(0x55b89a642430)') called at lib/Image/ExifTool.pm line 7745
Image::ExifTool::ProcessDirectory('Image::ExifTool=HASH(0x55b89a651ac0)', 'HASH(0x55b89ab46788)', 'HASH(0x55b89a642430)') called at lib/Image/ExifTool.pm line 7405
Image::ExifTool::DoProcessTIFF('Image::ExifTool=HASH(0x55b89a651ac0)', 'HASH(0x55b89a72f380)', undef) called at lib/Image/ExifTool.pm line 7207
Image::ExifTool::ProcessTIFF('Image::ExifTool=HASH(0x55b89a651ac0)', 'HASH(0x55b89a72f380)') called at lib/Image/ExifTool.pm line 2640
Image::ExifTool::ExtractInfo('Image::ExifTool=HASH(0x55b89a651ac0)', undef) called at lib/Image/ExifTool.pm line 2038
Image::ExifTool::ImageInfo('Image::ExifTool=HASH(0x55b89a651ac0)', '/mnt/hgfs/vmxferhd/downloads/dngrawload/stack/test/256/png/P1040064.dng', 'ARRAY(0x55b89a6480b8)') called at exiftool line 2111
main::GetImageInfo('Image::ExifTool=HASH(0x55b89a651ac0)', '/mnt/hgfs/vmxferhd/downloads/dngrawload/stack/test/256/png/P1040064.dng') called at exiftool line 3628
main::ProcessFiles('Image::ExifTool=HASH(0x55b89a651ac0)') called at exiftool line 1811
Trace begun at lib/Image/ExifTool.pm line 5117
Image::ExifTool::GetFloat('SCALAR(0x55b89b4d64c8)', 8) called at lib/Image/ExifTool.pm line 5358
Image::ExifTool::ReadValue('SCALAR(0x55b89b4d64c8)', 0, 'float', 8100, 32400, 'SCALAR(0x55b89b0fdc00)') called at lib/Image/ExifTool/Exif.pm line 6181
Image::ExifTool::Exif::ProcessExif('Image::ExifTool=HASH(0x55b89a651ac0)', 'HASH(0x55b89ab46788)', 'HASH(0x55b89a642430)') called at lib/Image/ExifTool.pm line 7745
Image::ExifTool::ProcessDirectory('Image::ExifTool=HASH(0x55b89a651ac0)', 'HASH(0x55b89ab46788)', 'HASH(0x55b89a642430)') called at lib/Image/ExifTool.pm line 7405
Image::ExifTool::DoProcessTIFF('Image::ExifTool=HASH(0x55b89a651ac0)', 'HASH(0x55b89a72f380)', undef) called at lib/Image/ExifTool.pm line 7207
Image::ExifTool::ProcessTIFF('Image::ExifTool=HASH(0x55b89a651ac0)', 'HASH(0x55b89a72f380)') called at lib/Image/ExifTool.pm line 2640
Image::ExifTool::ExtractInfo('Image::ExifTool=HASH(0x55b89a651ac0)', undef) called at lib/Image/ExifTool.pm line 2038
Image::ExifTool::ImageInfo('Image::ExifTool=HASH(0x55b89a651ac0)', '/mnt/hgfs/vmxferhd/downloads/dngrawload/stack/test/256/png/P1040064.dng', 'ARRAY(0x55b89a6480b8)') called at exiftool line 2111
main::GetImageInfo('Image::ExifTool=HASH(0x55b89a651ac0)', '/mnt/hgfs/vmxferhd/downloads/dngrawload/stack/test/256/png/P1040064.dng') called at exiftool line 3628
main::ProcessFiles('Image::ExifTool=HASH(0x55b89a651ac0)') called at exiftool line 1811


This looping call is from the following section in ReadValue():
    } else {
        for (;;) {
            push @vals, &$proc($dataPt, $offset);
            last if --$count <= 0;
            $offset += $len;
        }
    }   


This logic is to process the TIFF directories (DNGs are structured as TIFFs). I assume the fact it's performing 7,686,144 calls means it's misinterpreting something as a directory. Will dig deeper...

Phil Harvey

This could be a simple fix then.  Probably some tag with a large value is in float format, and ExifTool is converting it.  The -v3 option would show that.  If you find it, show me the excerpt from the -v3 output.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

horshack

Quote from: Phil Harvey on April 16, 2021, 06:27:44 PM
This could be a simple fix then.  Probably some tag with a large value is in float format, and ExifTool is converting it.  The -v3 option would show that.  If you find it, show me the excerpt from the -v3 output.

- Phil

Ok, I think you nailed it. When I run with -v3 I get the following float tags:


  | 40) ProfileHueSatMapData1 = 2 1 1 2 1.94578909873962 1.01970446109772 2 1.908898591[snip]
  |     - Tag 0xc6fa (32400 bytes, float[8100]):
  |        4798e: 00 00 00 40 00 00 80 3f 00 00 80 3f 00 00 00 40 [...@...?...?...@]
  |        4799e: 9e 0f f9 3f ad 85 82 3f 00 00 00 40 ca 56 f4 3f [...?...?...@.V.?]
  |        479ae: 5a 0b 85 3f 00 00 00 40 10 cb ef 3f 07 91 87 3f [Z..?...@...?...?]
  |        479be: 00 00 00 40 d8 69 eb 3f b3 16 8a 3f 00 00 00 40 [...@.i.?...?...@]
  |        479ce: d9 30 e7 3f 60 9c 8c 3f 00 00 00 40 f5 1d e3 3f [.0.?`..?...@...?]
  |         [snip 32320 bytes]
  | 41) ProfileHueSatMapData2 = 2 1 1 2 1.84525382518768 1.01866734027863 2 1.812045216[snip]
  |     - Tag 0xc6fb (32400 bytes, float[8100]):
  |        4f81e: 00 00 00 40 00 00 80 3f 00 00 80 3f 00 00 00 40 [...@...?...?...@]
  |        4f82e: 47 31 ec 3f b1 63 82 3f 00 00 00 40 19 f1 e7 3f [G1.?.c.?...@...?]
  |        4f83e: 62 c7 84 3f 00 00 00 40 7e d7 e3 3f 14 2b 87 3f [b..?...@~..?.+.?]
  |        4f84e: 00 00 00 40 56 e2 df 3f c5 8e 89 3f 00 00 00 40 [...@V..?...?...@]
  |        4f85e: c9 0f dc 3f 76 f2 8b 3f 00 00 00 40 12 5e d8 3f [...?v..?...@.^.?]
  |         [snip 32320 bytes]
  | 52) ProfileLookTableData = 0 1 1 0 0.878615736961365 1 0 0.885046541690826 1 0 0.89[snip]
  |     - Tag 0xc726 (55296 bytes, float[13824]):
  |        577b2: 00 00 00 00 00 00 80 3f 00 00 80 3f 00 00 00 00 [.......?...?....]
  |        577c2: f6 ec 60 3f 00 00 80 3f 00 00 00 00 69 92 62 3f [..`?...?....i.b?]
  |        577d2: 00 00 80 3f 00 00 00 00 26 81 64 3f 00 00 80 3f [...?....&.d?...?]
  |        577e2: 00 00 00 00 55 d4 66 3f 00 00 80 3f 00 00 00 00 [....U.f?...?....]
  |        577f2: 5d 9d 69 3f 00 00 80 3f 00 00 00 00 93 b2 6e 3f [].i?...?......n?]
  |         [snip 55216 bytes]


That's 8100 + 8100 + 13824 floats per file (30,024 total), times 256 files = 7,686,144, matching the profile output. If I remove these three tags then the execution time drops to the level of the .RW2 files.

Phil, is there any existing command-line option to have exiftool bypass decoding of certain tags? I see the -fast options but those seems specific to some predefined tags.

Phil Harvey

I wonder if decoding this would ever be useful to anyone.  I could maybe just make these undecoded binary blobs to avoid this delay.  I'll think about this and run some tests when I get a chance.

Thanks for doing the ground work for me.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

horshack

Quote from: Phil Harvey on April 16, 2021, 09:18:19 PM
I wonder if decoding this would ever be useful to anyone.  I could maybe just make these undecoded binary blobs to avoid this delay.  I'll think about this and run some tests when I get a chance.

Thanks for doing the ground work for me.

- Phil

Thank you Phil for exiftool!

Phil Harvey

#10
ExifTool 12.25 is available, and should speed things up for you.  This new version only decodes these large arrays unless if the -m option is used.

- Phil

Edit: Fixed typo
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

horshack

Quote from: Phil Harvey on April 22, 2021, 03:02:09 PM
ExifTool 12.25 is available, and should speed things up for you.  This new version only decodes these large arrays unless the -m option is used.

- Phil

Thanks Phil! I just tried 12.25 and the processing time for 256 DNGs dropped from 19.2 seconds to 5.4 seconds, with all files preloaded into the filesystem cache before running both.  :)

Phil Harvey

OK, so 3.5x faster.  Not the 5x we hoped for, but not bad.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).