If I run exiftool against a directory of 256 Panasonic GX85 .RW2 raw files (piping stdout to a file), it takes approximately 4 seconds to complete. However, if I convert those .RW2 files into uncompressed DNGs using Adobe's DNG converter and run exiftool against the directory of 256 DNGs it takes 5x longer to complete, around 20 seconds. The only command-line option I'm using is -s
If I compare the output of a single .RW2 and .DNG I see the DNG has more tags (330 vs 256), but the execution time differential seems outsized vs that tag count difference.
If it's helpful to evaluate, here's the output from a single .RW2 and .DNG
.RW2 output (https://pastebin.com/raw/6GCKXNs2)
.DNG output (https://pastebin.com/raw/9p1Nxtbc)
All I can say is: Interesting.
Without a lot of digging, I can't say why the DNG is so much slower, but it is unlikely that it is anything that can be improved easily.
- Phil
Thanks for quick response Phil. I'm using this exiftool invocation to potentially stack thousands of files (My GitHub project (https://github.com/horshack-dpreview/OctaveRawTools)), so I'll do what I can to get to the bottom of it. You've forgotten more Perl than I've ever known in the first place so it'll probably be a clumsy effort on my part :)
I did a search on how to profile a perl script, which yielded the following initial profile comparison. I switched to my Linux VM to do this. The DNGs run a little faster - 16.78 seconds vs ~20 seconds on Windows.
Running against 256 .RW2s:
user@ubuntu:~/exiftool$ perl -d:DProf exiftool -s '/mnt/hgfs/vmxferhd/downloads/dngrawload/stack/test/256/' > t.txt
user@ubuntu:~/exiftool$ dprofpp
Total Elapsed Time = 4.12 Seconds
User+System Time = 3.59 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
19.2 0.690 0.740 92197 0.0000 0.0000 Image::ExifTool::GetValue
13.6 0.490 0.490 81664 0.0000 0.0000 Image::ExifTool::FoundTag
11.7 0.420 2.550 2048 0.0002 0.0012 Image::ExifTool::Exif::ProcessExif
11.4 0.410 0.480 132096 0.0000 0.0000 Image::ExifTool::GetTagInfo
7.24 0.260 0.390 310272 0.0000 0.0000 Image::ExifTool::Get16u
6.13 0.220 0.220 519168 0.0000 0.0000 Image::ExifTool::DoUnpackStd
4.74 0.170 0.320 73728 0.0000 0.0000 Image::ExifTool::ReadValue
3.90 0.140 0.140 14592 0.0000 0.0000 File::RandomAccess::Read
3.90 0.140 0.220 189184 0.0000 0.0000 Image::ExifTool::Get32u
3.06 0.110 0.110 65536 0.0000 0.0000 Image::ExifTool::GetTagName
2.79 0.100 4.070 256 0.0004 0.0159 main::GetImageInfo
2.23 0.080 0.270 256 0.0003 0.0011 Image::ExifTool::BuildCompositeTags
1.95 0.070 0.070 132947 0.0000 0.0000 Image::ExifTool::GetTagInfoList
1.95 0.070 0.150 256 0.0003 0.0006 Image::ExifTool::MakerNotes::GetValueBlocks
1.67 0.060 0.060 256 0.0002 0.0002 Image::ExifTool::Init
Running against 256 DNGs:
user@ubuntu:~/exiftool$ perl -d:DProf exiftool -s '/mnt/hgfs/vmxferhd/downloads/dngrawload/stack/test/256/dng/' > t.txt
user@ubuntu:~/exiftool$ dprofpp
Total Elapsed Time = 16.78 Seconds
User+System Time = 16.94 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
36.1 6.130 16.170 112640 0.0001 0.0001 Image::ExifTool::ReadValue
30.1 5.100 9.690 768614 0.0000 0.0000 Image::ExifTool::GetFloat
27.2 4.620 4.620 771225 0.0000 0.0000 Image::ExifTool::DoUnpack
5.19 0.880 0.900 125440 0.0000 0.0000 Image::ExifTool::FoundTag
4.96 0.840 20.090 3072 0.0003 0.0065 Image::ExifTool::Exif::ProcessExif
3.84 0.650 0.770 111904 0.0000 0.0000 Image::ExifTool::GetValue
3.25 0.550 0.610 161792 0.0000 0.0000 Image::ExifTool::GetTagInfo
2.30 0.390 0.390 641792 0.0000 0.0000 Image::ExifTool::DoUnpackStd
1.36 0.230 0.470 352000 0.0000 0.0000 Image::ExifTool::Get16u
1.24 0.210 0.210 30976 0.0000 0.0000 File::RandomAccess::Read
1.06 0.180 0.310 263936 0.0000 0.0000 Image::ExifTool::Get32u
0.83 0.140 21.910 256 0.0005 0.0856 main::GetImageInfo
0.83 0.140 1.970 256 0.0005 0.0077 Image::ExifTool::PanasonicRaw::ProcessJpgFromRaw
0.83 0.140 0.510 256 0.0005 0.0020 Image::ExifTool::BuildCompositeTags
0.65 0.110 0.110 67840 0.0000 0.0000 Image::ExifTool::SetGroup
The fact that the DNG is nearly twice the file size of the RW2 would play a factor.
Quote from: StarGeek on April 16, 2021, 04:17:23 PM
The fact that the DNG is nearly twice the file size of the RW2 would play a factor.
I profiled the I/O calls from exiftool under Windows using Sysinternal's Process Monitor and didn't see a material difference in I/O between the RW2s and DNGs. In both invocations the entire fileset was cached in filesystem memory.
I believe I'm getting closer to the issue. I'm not familiar with exiftool's internals (or Perl for that matter) so bear with me. I switched to using NYTProf to profile and used it to compare the DNG vs .RW2 invocations for a single file. Immediately I see that for the DNG there are 7,686,144 calls to GetFloat() vs zero calls for the .RW2. Here's the NYTProf info:
(https://photos.smugmug.com/photos/i-gVg29m2/0/5493835a/O/i-gVg29m2.png)
The total time spent on these GetFloat calls covers the majority of the long execution time. I put a Devel::StackTrace call in GetFloat() and see a looping call for up to hundreds of thousands of values then the index resets back to zero and see the loop process repeat again. Sample stack trace:
Trace begun at lib/Image/ExifTool.pm line 5117
Image::ExifTool::GetFloat('SCALAR(0x55b89b4d64c8)', 0) called at lib/Image/ExifTool.pm line 5358
Image::ExifTool::ReadValue('SCALAR(0x55b89b4d64c8)', 0, 'float', 8100, 32400, 'SCALAR(0x55b89b0fdc00)') called at lib/Image/ExifTool/Exif.pm line 6181
Image::ExifTool::Exif::ProcessExif('Image::ExifTool=HASH(0x55b89a651ac0)', 'HASH(0x55b89ab46788)', 'HASH(0x55b89a642430)') called at lib/Image/ExifTool.pm line 7745
Image::ExifTool::ProcessDirectory('Image::ExifTool=HASH(0x55b89a651ac0)', 'HASH(0x55b89ab46788)', 'HASH(0x55b89a642430)') called at lib/Image/ExifTool.pm line 7405
Image::ExifTool::DoProcessTIFF('Image::ExifTool=HASH(0x55b89a651ac0)', 'HASH(0x55b89a72f380)', undef) called at lib/Image/ExifTool.pm line 7207
Image::ExifTool::ProcessTIFF('Image::ExifTool=HASH(0x55b89a651ac0)', 'HASH(0x55b89a72f380)') called at lib/Image/ExifTool.pm line 2640
Image::ExifTool::ExtractInfo('Image::ExifTool=HASH(0x55b89a651ac0)', undef) called at lib/Image/ExifTool.pm line 2038
Image::ExifTool::ImageInfo('Image::ExifTool=HASH(0x55b89a651ac0)', '/mnt/hgfs/vmxferhd/downloads/dngrawload/stack/test/256/png/P1040064.dng', 'ARRAY(0x55b89a6480b8)') called at exiftool line 2111
main::GetImageInfo('Image::ExifTool=HASH(0x55b89a651ac0)', '/mnt/hgfs/vmxferhd/downloads/dngrawload/stack/test/256/png/P1040064.dng') called at exiftool line 3628
main::ProcessFiles('Image::ExifTool=HASH(0x55b89a651ac0)') called at exiftool line 1811
Trace begun at lib/Image/ExifTool.pm line 5117
Image::ExifTool::GetFloat('SCALAR(0x55b89b4d64c8)', 4) called at lib/Image/ExifTool.pm line 5358
Image::ExifTool::ReadValue('SCALAR(0x55b89b4d64c8)', 0, 'float', 8100, 32400, 'SCALAR(0x55b89b0fdc00)') called at lib/Image/ExifTool/Exif.pm line 6181
Image::ExifTool::Exif::ProcessExif('Image::ExifTool=HASH(0x55b89a651ac0)', 'HASH(0x55b89ab46788)', 'HASH(0x55b89a642430)') called at lib/Image/ExifTool.pm line 7745
Image::ExifTool::ProcessDirectory('Image::ExifTool=HASH(0x55b89a651ac0)', 'HASH(0x55b89ab46788)', 'HASH(0x55b89a642430)') called at lib/Image/ExifTool.pm line 7405
Image::ExifTool::DoProcessTIFF('Image::ExifTool=HASH(0x55b89a651ac0)', 'HASH(0x55b89a72f380)', undef) called at lib/Image/ExifTool.pm line 7207
Image::ExifTool::ProcessTIFF('Image::ExifTool=HASH(0x55b89a651ac0)', 'HASH(0x55b89a72f380)') called at lib/Image/ExifTool.pm line 2640
Image::ExifTool::ExtractInfo('Image::ExifTool=HASH(0x55b89a651ac0)', undef) called at lib/Image/ExifTool.pm line 2038
Image::ExifTool::ImageInfo('Image::ExifTool=HASH(0x55b89a651ac0)', '/mnt/hgfs/vmxferhd/downloads/dngrawload/stack/test/256/png/P1040064.dng', 'ARRAY(0x55b89a6480b8)') called at exiftool line 2111
main::GetImageInfo('Image::ExifTool=HASH(0x55b89a651ac0)', '/mnt/hgfs/vmxferhd/downloads/dngrawload/stack/test/256/png/P1040064.dng') called at exiftool line 3628
main::ProcessFiles('Image::ExifTool=HASH(0x55b89a651ac0)') called at exiftool line 1811
Trace begun at lib/Image/ExifTool.pm line 5117
Image::ExifTool::GetFloat('SCALAR(0x55b89b4d64c8)', 8) called at lib/Image/ExifTool.pm line 5358
Image::ExifTool::ReadValue('SCALAR(0x55b89b4d64c8)', 0, 'float', 8100, 32400, 'SCALAR(0x55b89b0fdc00)') called at lib/Image/ExifTool/Exif.pm line 6181
Image::ExifTool::Exif::ProcessExif('Image::ExifTool=HASH(0x55b89a651ac0)', 'HASH(0x55b89ab46788)', 'HASH(0x55b89a642430)') called at lib/Image/ExifTool.pm line 7745
Image::ExifTool::ProcessDirectory('Image::ExifTool=HASH(0x55b89a651ac0)', 'HASH(0x55b89ab46788)', 'HASH(0x55b89a642430)') called at lib/Image/ExifTool.pm line 7405
Image::ExifTool::DoProcessTIFF('Image::ExifTool=HASH(0x55b89a651ac0)', 'HASH(0x55b89a72f380)', undef) called at lib/Image/ExifTool.pm line 7207
Image::ExifTool::ProcessTIFF('Image::ExifTool=HASH(0x55b89a651ac0)', 'HASH(0x55b89a72f380)') called at lib/Image/ExifTool.pm line 2640
Image::ExifTool::ExtractInfo('Image::ExifTool=HASH(0x55b89a651ac0)', undef) called at lib/Image/ExifTool.pm line 2038
Image::ExifTool::ImageInfo('Image::ExifTool=HASH(0x55b89a651ac0)', '/mnt/hgfs/vmxferhd/downloads/dngrawload/stack/test/256/png/P1040064.dng', 'ARRAY(0x55b89a6480b8)') called at exiftool line 2111
main::GetImageInfo('Image::ExifTool=HASH(0x55b89a651ac0)', '/mnt/hgfs/vmxferhd/downloads/dngrawload/stack/test/256/png/P1040064.dng') called at exiftool line 3628
main::ProcessFiles('Image::ExifTool=HASH(0x55b89a651ac0)') called at exiftool line 1811
This looping call is from the following section in ReadValue():
} else {
for (;;) {
push @vals, &$proc($dataPt, $offset);
last if --$count <= 0;
$offset += $len;
}
}
This logic is to process the TIFF directories (DNGs are structured as TIFFs). I assume the fact it's performing 7,686,144 calls means it's misinterpreting something as a directory. Will dig deeper...
This could be a simple fix then. Probably some tag with a large value is in float format, and ExifTool is converting it. The -v3 option would show that. If you find it, show me the excerpt from the -v3 output.
- Phil
Quote from: Phil Harvey on April 16, 2021, 06:27:44 PM
This could be a simple fix then. Probably some tag with a large value is in float format, and ExifTool is converting it. The -v3 option would show that. If you find it, show me the excerpt from the -v3 output.
- Phil
Ok, I think you nailed it. When I run with -v3 I get the following float tags:
| 40) ProfileHueSatMapData1 = 2 1 1 2 1.94578909873962 1.01970446109772 2 1.908898591[snip]
| - Tag 0xc6fa (32400 bytes, float[8100]):
| 4798e: 00 00 00 40 00 00 80 3f 00 00 80 3f 00 00 00 40 [...@...?...?...@]
| 4799e: 9e 0f f9 3f ad 85 82 3f 00 00 00 40 ca 56 f4 3f [...?...?...@.V.?]
| 479ae: 5a 0b 85 3f 00 00 00 40 10 cb ef 3f 07 91 87 3f [Z..?...@...?...?]
| 479be: 00 00 00 40 d8 69 eb 3f b3 16 8a 3f 00 00 00 40 [...@.i.?...?...@]
| 479ce: d9 30 e7 3f 60 9c 8c 3f 00 00 00 40 f5 1d e3 3f [.0.?`..?...@...?]
| [snip 32320 bytes]
| 41) ProfileHueSatMapData2 = 2 1 1 2 1.84525382518768 1.01866734027863 2 1.812045216[snip]
| - Tag 0xc6fb (32400 bytes, float[8100]):
| 4f81e: 00 00 00 40 00 00 80 3f 00 00 80 3f 00 00 00 40 [...@...?...?...@]
| 4f82e: 47 31 ec 3f b1 63 82 3f 00 00 00 40 19 f1 e7 3f [G1.?.c.?...@...?]
| 4f83e: 62 c7 84 3f 00 00 00 40 7e d7 e3 3f 14 2b 87 3f [b..?...@~..?.+.?]
| 4f84e: 00 00 00 40 56 e2 df 3f c5 8e 89 3f 00 00 00 40 [...@V..?...?...@]
| 4f85e: c9 0f dc 3f 76 f2 8b 3f 00 00 00 40 12 5e d8 3f [...?v..?...@.^.?]
| [snip 32320 bytes]
| 52) ProfileLookTableData = 0 1 1 0 0.878615736961365 1 0 0.885046541690826 1 0 0.89[snip]
| - Tag 0xc726 (55296 bytes, float[13824]):
| 577b2: 00 00 00 00 00 00 80 3f 00 00 80 3f 00 00 00 00 [.......?...?....]
| 577c2: f6 ec 60 3f 00 00 80 3f 00 00 00 00 69 92 62 3f [..`?...?....i.b?]
| 577d2: 00 00 80 3f 00 00 00 00 26 81 64 3f 00 00 80 3f [...?....&.d?...?]
| 577e2: 00 00 00 00 55 d4 66 3f 00 00 80 3f 00 00 00 00 [....U.f?...?....]
| 577f2: 5d 9d 69 3f 00 00 80 3f 00 00 00 00 93 b2 6e 3f [].i?...?......n?]
| [snip 55216 bytes]
That's 8100 + 8100 + 13824 floats per file (30,024 total), times 256 files = 7,686,144, matching the profile output. If I remove these three tags then the execution time drops to the level of the .RW2 files.
Phil, is there any existing command-line option to have exiftool bypass decoding of certain tags? I see the -fast options but those seems specific to some predefined tags.
I wonder if decoding this would ever be useful to anyone. I could maybe just make these undecoded binary blobs to avoid this delay. I'll think about this and run some tests when I get a chance.
Thanks for doing the ground work for me.
- Phil
Quote from: Phil Harvey on April 16, 2021, 09:18:19 PM
I wonder if decoding this would ever be useful to anyone. I could maybe just make these undecoded binary blobs to avoid this delay. I'll think about this and run some tests when I get a chance.
Thanks for doing the ground work for me.
- Phil
Thank you Phil for exiftool!
ExifTool 12.25 is available, and should speed things up for you. This new version only decodes these large arrays unless if the -m option is used.
- Phil
Edit: Fixed typo
Quote from: Phil Harvey on April 22, 2021, 03:02:09 PM
ExifTool 12.25 is available, and should speed things up for you. This new version only decodes these large arrays unless the -m option is used.
- Phil
Thanks Phil! I just tried 12.25 and the processing time for 256 DNGs dropped from 19.2 seconds to 5.4 seconds, with all files preloaded into the filesystem cache before running both. :)
OK, so 3.5x faster. Not the 5x we hoped for, but not bad.
- Phil