Question about proper usage and large filecount - Out of Memory

Started by bignick0, May 15, 2019, 09:51:37 AM


bignick0

Hello - thank you for this fantastic tool. The work that must go into producing and supporting this by a single person is incredible.

As an admitted "Newbie", I have a quick question about HOW I should be using the tool, as it seems that I may be asking for more than it can deliver (within the context of my current project).

I have a very large number of DICOM images that were extracted from a PACS server. I am attempting to dump the headers into a CSV file that I can manipulate in Excel.

The issue I am having is that exiftool quickly runs out of memory as I try to step through the directories recursively and write out the CSV. Based on the 0-byte output file, it seems that exiftool is trying to read all of the headers into memory before writing anything out. I think this will never work, as I have on the order of 300,000 files. I am using a basic command like this:

exiftool.exe -csv -r -ext ".dcm" . > out.csv

Is there any hope for a command-line switch that would force exiftool to write each CSV row as soon as that image's header is read? Or should I look at writing a script to step through the directories and call exiftool 300,000 times? If I do that, I am also worried about possible "out of memory" situations, as my script may not release memory correctly after each invocation of exiftool (based on some other threads I have read).

Any suggestions?

Thanks again!
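A middle ground between one giant -csv run and 300,000 separate calls is to give exiftool a few thousand files per call, so each -csv process only has to buffer one batch. Below is a minimal Python sketch, assuming `exiftool` is on the PATH (on Windows, pass `exiftool="exiftool.exe"` or a full path). The helper names, the batch size of 5000 and the `out_0000.csv` naming are hypothetical choices. File names are handed over through exiftool's `-@ ARGFILE` option (one argument per line) rather than on the command line, to stay clear of command-line length limits; non-ASCII file names on Windows may need extra charset configuration (see the exiftool documentation).

```python
import os
import subprocess
import tempfile
from typing import Iterable, Iterator, List


def find_files(root: str, ext: str = ".dcm") -> Iterator[str]:
    """Walk `root` recursively, yielding files whose extension matches `ext`
    (case-insensitive)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.lower().endswith(ext.lower()):
                yield os.path.join(dirpath, name)


def batches(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Group items into lists of at most `size`, without materialising them all."""
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def run_batch(paths: List[str], csv_path: str, exiftool: str = "exiftool") -> int:
    """Run one `exiftool -csv` over a batch and return its exit status.

    File names go through a temporary -@ argument file instead of the
    command line, which avoids OS limits on command-line length.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".args", delete=False,
                                     encoding="utf-8") as argfile:
        argfile.write("\n".join(paths) + "\n")
    try:
        # exiftool's stdout (the CSV for this batch only) goes straight to disk
        with open(csv_path, "wb") as out:
            result = subprocess.run([exiftool, "-csv", "-@", argfile.name],
                                    stdout=out)
        return result.returncode
    finally:
        os.remove(argfile.name)
```

Usage, from the top of the image tree: `for i, batch in enumerate(batches(find_files("."), 5000)): run_batch(batch, f"out_{i:04d}.csv")`. Each batch CSV only has columns for the tags found in its own files, so the batch files may not line up column for column and would need merging afterwards.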

Phil Harvey

From the documentation for -csv:

            Note that this option is fundamentally different than all other
            output format options because it requires information from all
            input files to be buffered in memory before the output is written.
            This may result in excessive memory usage when processing a very
            large number of files with a single command.  Also, it makes this
            option incompatible with the -w option.


I suggest using the XML (-X) or JSON (-j) output instead, if possible.

- Phil
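The buffering described in the documentation follows from the CSV format itself. The header row must list every column before the first data row, and since the set of tags varies from file to file, the header is the union of tags over all input files, which is unknown until the last file has been read. A per-record format has no shared header, so each record can be written as soon as it is read. The Python toy model below shows the contrast. It is not exiftool's implementation, the tag values are made-up placeholders, and JSON Lines stands in for a per-record format (exiftool's own -j writes a JSON array instead).

```python
import csv
import io
import json
from typing import Dict, Iterable, Iterator, List


def to_csv(records: List[Dict[str, str]]) -> str:
    """Toy CSV writer: the header is the union of keys over ALL records,
    so every record must be held in memory before line 1 can be written."""
    # dict.fromkeys keeps first-seen order while dropping duplicates
    columns = list(dict.fromkeys(key for rec in records for key in rec))
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_json_lines(records: Iterable[Dict[str, str]]) -> Iterator[str]:
    """Toy per-record writer: each record is emitted as soon as it arrives
    and nothing is retained, so memory use does not grow with file count."""
    for rec in records:
        yield json.dumps(rec) + "\n"


# Placeholder records: the second file has a tag the first one lacks, which
# adds a column that the header (needed before row 1) must already include.
records = [
    {"SourceFile": "a.dcm", "Modality": "CT", "SliceThickness": "5"},
    {"SourceFile": "b.dcm", "Modality": "MR", "EchoTime": "30"},
]
print(to_csv(records), end="")
for line in to_json_lines(records):
    print(line, end="")
```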

bignick0

Thanks for your feedback. Yes, this was exactly the issue. I switched to XML output, and after many hours the output succeeded. Now I have to figure out how to deal with an 8 GB XML file.

  ;D

Thanks again!
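One way to turn a large -X output into a CSV for Excel without loading it all at once is a two-pass streaming conversion: the first pass collects only the set of tag names (the CSV header), and the second pass re-reads the XML and writes each file's row as soon as it is parsed. Below is a minimal Python sketch using `xml.etree.ElementTree.iterparse`. It assumes the usual -X layout: a single `rdf:RDF` root with one `rdf:Description` per file, whose `rdf:about` attribute holds the path and whose child elements are the tags, named with a group prefix such as `System:FileName`. List-valued tags are flattened by joining their text with "; ", which is a simplification, and the function names are hypothetical.

```python
import csv
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, Optional, Tuple

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"


def split_tag(tag: str) -> Tuple[str, str]:
    """Split ElementTree's '{uri}Local' form into (uri, local)."""
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        return uri, local
    return "", tag


def descriptions(xml_path: str) -> Iterator[Dict[str, str]]:
    """Yield one {column: value} dict per top-level rdf:Description.

    The root is cleared after each record, so memory use stays roughly
    constant no matter how large the XML file is.
    """
    prefixes: Dict[str, str] = {}  # namespace URI -> prefix, e.g. 'System'
    root: Optional[ET.Element] = None
    depth = 0
    for event, item in ET.iterparse(xml_path, events=("start-ns", "start", "end")):
        if event == "start-ns":
            prefix, uri = item
            prefixes[uri] = prefix
        elif event == "start":
            depth += 1
            if root is None:
                root = item
        else:  # "end"
            # depth 2 = direct child of rdf:RDF, i.e. one input file
            if depth == 2 and item.tag == RDF + "Description":
                row = {"SourceFile": item.get(RDF + "about", "")}
                for child in item:
                    uri, local = split_tag(child.tag)
                    name = f"{prefixes[uri]}:{local}" if uri in prefixes else local
                    # flatten list tags (e.g. rdf:Bag items) into one cell
                    row[name] = "; ".join(
                        t.strip() for t in child.itertext() if t.strip())
                root.clear()  # detach finished records so they can be freed
                yield row
            depth -= 1


def xml_to_csv(xml_path: str, csv_path: str) -> int:
    """Two passes over the XML; returns the number of rows written."""
    # Pass 1: keep only the column names (insertion-ordered dict as a set).
    columns: Dict[str, None] = {"SourceFile": None}
    for row in descriptions(xml_path):
        columns.update(dict.fromkeys(row))
    # Pass 2: stream rows to disk under the now-complete header.
    # utf-8-sig adds a BOM, which helps Excel detect the encoding.
    count = 0
    with open(csv_path, "w", encoding="utf-8-sig", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=list(columns), restval="")
        writer.writeheader()
        for row in descriptions(xml_path):
            writer.writerow(row)
            count += 1
    return count
```

Usage: `xml_to_csv("out.xml", "out.csv")`. Reading the XML twice trades run time for memory: on an 8 GB file each pass will take a while, but only the column names and the current record are held at any one time.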