Intermittent "Argument List Too Long" error

Started by inanealex, September 13, 2012, 05:24:57 PM


inanealex

Hi all. I've got a FileMaker Pro database of file paths for 17,000+ AI files, and I want to get 10 pieces of metadata from each AI file into the database. I've written an AppleScript to generate a list of 10 arguments (-xmp:metadata1 -xmp:metadata2.. etc.) held in the MD_Lookup variable, which gets passed to exiftool in the following line of AppleScript:

set rawMetadata to (do shell script "exiftool -T" & MD_Lookup & " " & quoted form of graphicPath) as string

This is embedded in a repeat loop that iterates through each of the 17,000 graphicPaths, building the metadata argument list, calling the shell script, returning the tabbed list, and parsing it into the correct fields for each graphicPath in the loop. Works great... but after 400 files (+/-) it throws the error "Argument List Too Long."

Restarting the script somehow resets things, but I don't understand why. The shell script is run for each record passed to it by the repeat loop.. so it's a new argument list for each iteration, not one impossibly long one.

Any ideas how to handle this, like adding a line to the shell script that resets the argument list on each iteration?
Thanks

Phil Harvey

You can't just give ExifTool a directory and let it iterate through the files?  Also, this would be faster since you avoid the (significant) overhead of launching exiftool for each file.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

inanealex

Unfortunately the graphics aren't in a single directory. We've got basically four locations, each with between 200 and 2000 job folders, each holding any number of .AI files. The four locations don't have uniform structures, and I need to exclude subdirectories from each job folder too, so a broad point and shoot approach isn't ideal.

Once the primary database ingest is done, additions of new jobs won't be too substantial, so I'm not concerned about the time overhead. Updates would include a periodic check to ensure the most current version of the .AI file is represented in the DB, that its location is current (they do move sometimes), and the addition of 400-500 new graphics (worst case) monthly. My current approach can ingest the metadata from 400 files in about 10 minutes across a very slow network. Not ideal, but not onerous enough to be a concern.

I'm already using a shell script to extract from 20K+ directories a list of paths to my 7000+ job directories, and another that iterates through the 7000+ looking for .ai files that meet certain criteria (specific parent directory, certain filename conventions) to generate my 17,000+ graphic file paths. Both of these scripts call their shell script once for each path passed in the iteration, and do so without choking.

I'm looking for the content of the same 10 xmp tags for every file. Is it possible to pass a list of paths to exiftool, rather than one path at a time? Something like:

exiftool -T -xmp:MD1 -xmp:MD2 $filepathlist

Since I'm a newb, I'd have guessed that this would have flooded the arguments list with too much data.
It would also pose an interesting parsing problem for me, separating out each set of ten xmp data items from the massive list that would be returned before writing to the DB.
I still don't understand how launching exiftool on each of 17000 iterations has a different effect on the arguments list than running the script 45 times for 400 iterations each.

Thanks,
Alex
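
On the parsing worry above: with -T, exiftool prints one tab-separated line per file, so a batched run can be split one row at a time instead of as one massive list. The following is only a rough sketch under assumptions not stated in the thread: the function name parse_rows is made up, MD1 and MD2 stand in for the real XMP tags, and the Directory and FileName tags are assumed to have been requested first so each row identifies its file.

```shell
#!/bin/sh
# Sketch only: parse_rows is a hypothetical helper, and the column order
# (Directory, FileName, MD1, MD2) is an assumption about how exiftool was called,
# e.g.  exiftool -T -Directory -FileName -XMP:MD1 -XMP:MD2 FILE1 FILE2 ...

# Read exiftool -T output on stdin: one line per file, fields separated by tabs.
# -T prints "-" for a tag that is missing, so every row has the same field count.
parse_rows() {
    awk -F '\t' '{ printf "file=%s/%s md1=%s md2=%s\n", $1, $2, $3, $4 }'
}

# Example use (placeholder tag names):
#   exiftool -T -Directory -FileName -XMP:MD1 -XMP:MD2 a.ai b.ai | parse_rows
```

awk is used rather than the shell's read because a tab in IFS collapses consecutive tabs, which would shift columns if a value were ever empty.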

inanealex

Further research suggests that I should read more about batching files... wow, a 60x increase in performance if I can avoid starting exiftool for each file. If it works, it will fix my "Argument List Too Long" problem, and do it faster too. Yep, I'm a newb.

ryerman

Have a look at the -@ option, here.  An ARGFILE can be a list of file paths.
Windows 10 Home 64 bit, Exiftool v12.61
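
For anyone finding this later, here is a minimal sketch of the -@ approach. It assumes the paths are already known to a shell script; the file name paths.txt, the helper names write_argfile and read_metadata, and the tags MD1 and MD2 are placeholders, not anything from the posts above. Each line of the argfile is taken as one argument, so paths with spaces need no quoting inside it, and the list never appears on the command line.

```shell
#!/bin/sh
# Sketch only: helper names, paths.txt, MD1 and MD2 are hypothetical placeholders.

# Write one path per line to an argfile.
write_argfile() {
    argfile=$1
    shift
    printf '%s\n' "$@" > "$argfile"
}

# Launch exiftool once for every path listed in the argfile.
# -T prints one tab-separated line per file; Directory and FileName
# are included so each row can be matched back to its path.
read_metadata() {
    exiftool -T -Directory -FileName -XMP:MD1 -XMP:MD2 -@ "$1"
}

# Example use (hypothetical paths):
#   write_argfile paths.txt "/Volumes/Jobs/1001/logo.ai" "/Volumes/Jobs/1002/label.ai"
#   read_metadata paths.txt
```

Because the paths live in a file, the length of the list is not bounded by the shell's argument limit.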

Phil Harvey

Quote from: ryerman on September 14, 2012, 01:51:35 AM
Have a look at the -@ option, here.  An ARGFILE can be a list of file paths.

Citebite. Cool.  I learned something. :)

- Phil

Arnoldcav


Long Path Tool helped me in this situation.  : http://PathTooDeep.com   :)