-execute and 2 different format files and 2 output files?

Started by Iwonder, June 15, 2024, 06:43:23 AM


Iwonder

Hello,
Is it possible to run exiftool only once, with two format files and one output file for each?
Instead of this:
exiftool -T -L -p print01_fmt.txt *.jpg > output01.txt
exiftool -T -L -p print02_fmt.txt *.jpg > output02.txt
French newbie
Other work as volunteer : https://btaliercio.wixsite.com/easy-greek-buttons/en

StarGeek

Using the -execute option on only two commands like this won't save much time. Exiftool will still have to process all the files twice, so the time saved over running two separate commands will be less than a second.

The main use of -execute is to keep exiftool running when you have a lot of individual commands to run on a lot of different files.
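For example, the individual commands can be put in an argument file, one argument per line, with -execute separating each command. A minimal sketch (the file name args.txt and the photo names are hypothetical):

-p
print01_fmt.txt
photo1.jpg
-execute
-p
print02_fmt.txt
photo2.jpg

and run as

exiftool -@ args.txt -common_args -T -L > output.txt

so exiftool starts only once no matter how many commands the file contains. (Note that both commands' output would land in the same file here; the -w option below is what allows separate outputs.)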

Otherwise, the command to combine them might be something like this. It would require the -w (-TextOut) option instead of the > file redirection, because redirection is handled by the shell, not exiftool, and will not work with two separate files.
exiftool -p print01_fmt.txt -w+ output01.txt -execute -p print02_fmt.txt -w+ output02.txt -common_args -T -L *.jpg

Note I have not tested this, so there might be an error.
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

Iwonder

Thank you StarGeek
With each post I know a little more how Exiftool works.
I noticed earlier, on the forum, command lines without the > redirection, but I didn't understand them :)
Talking about time:
My friend runs a command to extract tags to a text output, acting on a folder containing about 20,000 photos, and he says it takes about 3 hours to finish the job. Is that OK? Not too long? (I suggested that he use -fast1 and -fast2, but it didn't make it faster.)

StarGeek

Is it a single command, or are they running it once on each file? If the latter, then that is Common Mistake #3.

I have a bat file that I picked up somewhere that lists the time it takes for a command to run. I just ran a basic exiftool command (the FAQ #3 command) to list all the data in 28K+ files, and the result was this, with the last line being the output from the bat:
   36 directories scanned
28383 image files read
command took 0:18:8.37 (1088.37s total)
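
That bat file isn't shown here, but a minimal timing wrapper along the same lines might look like this (a sketch; the photo path is hypothetical, and it simply echoes the start and end times rather than computing the difference):

@echo off
rem record the start time, run the exiftool command, then report both times
set start=%time%
exiftool -a -G1 -s C:\path\to\photos > nul
echo started: %start%
echo ended:   %time%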

So to me, it sounds like they're running it once per file. In your case, you're only running two commands, which is why I say it's not worth building a complex command that uses -execute. But when running exiftool 20K+ times, the startup time adds up.

There's a post from Phil on this subject from the early days of this forum, and the blog post mentioned there is available via Archive.org.

Iwonder

@StarGeek
Thanks for your answer!

I wasn't talking about running those 2 commands on 20,000 files. We only have to run them on a few hundred files, and that's fast enough.

It was about another exiftool command (executed only once) that extracts fewer than 10 tags, using a format file, from 20,000 photos into a text file.
I attached the batch and format files, if you want to have a look.
Does three hours seem too long to you?


(Sorry, Phil's post is too complicated for me to understand, but it seems to concern -execute.)

StarGeek

Is this on a local drive or over a network? Because that will affect the speed.

When editing, exiftool should take only a little more time than it would take to copy all the data. Just listing data, like your bat file does, should take even less time, as shown by my 18-minute result. So a 3-hour result suggests to me that this is over a network, since my drives are all slow (5400 rpm WD Blues/Seagate BarraCudas) and took significantly less time in my test.

I'm setting up a test directory with the 28K files, copying them with TeraCopy with verification on (so it rereads each file after the copy); that will take less than an hour. Then I'll fill the files with random data in the tags used in your format file and time that. Finally, I'll run your bat file and time that. I'll let you know the results when done.

StarGeek

Ok, I stand corrected. Using your FMT file, it took most of the day to run the single command on my test setup. I have slow drives, so that didn't help.

I'm not sure why it took so long. Maybe the large FMT file. I didn't time individual points, but it seems the first 1,000 didn't take too long; it seemed to slow down as time went on.

Phil Harvey

If it got slower as more files were processed, then memory is likely an issue, because Perl's memory garbage cleanup is fairly time consuming. Optimizing memory usage should help. You can do this by adding -api ignoretags=all to the command, so only the tags in the .fmt file are extracted. From the last paragraph of the -p option documentation:

            Note that the API RequestTags option is automatically set for all
            tags used in the FMTFILE or STR.  This allows all other tags to be
            ignored using -API IgnoreTags=all, resulting in reduced memory
            usage and increased speed.
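
For example, applied to the first command in this thread (a sketch; substitute your own format file and output name):

exiftool -T -L -p print01_fmt.txt -api ignoretags=all *.jpg > output01.txt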


Also, the -fast1 option is doing nothing since you are also specifying -fast2.

- Phil

StarGeek

I was going to check with the -api IgnoreTags option when I got a chance, but I didn't want to possibly raise expectations too much beforehand.

Iwonder

Hello @StarGeek, @Phil,

Quote from StarGeek: "it seems the first 1,000 didn't take too long; it seemed to slow down as time went on"

Yes! My friend noticed that too!
It's running on a local PC.

Thank you, Phil, for the API option, and StarGeek for your prompt answer and your test!
I'll give you feedback about this!

StarGeek

I'm assuming that you have some ported Linux programs installed, given your use of sed/sort/uniq. You might check whether you have the split program. You could then dump all the filenames to process into a text file, use split to separate that into, say, 1,000-line batches, and then use the -@ (Argfile) option to process each batch:
exiftool -T -L -m -progress -ext jpg -r -sep ## -p print03_fmt.txt -@ Splitfile1.txt >> tout01.txt
Here I used >> to append the redirected text instead of overwriting it.
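(For reference, each Splitfile would just be a plain text list with one filename per line, which is the format the -@ option expects; e.g., with hypothetical names:

C:/photos/img0001.jpg
C:/photos/img0002.jpg
C:/photos/img0003.jpg
)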

Iwonder

1/ @StarGeek:
Sorry, but I don't understand what Splitfile1 should contain. I don't know yet how the Unix split command works.
And I don't see where you are using it...

2/ @StarGeek @Phil
Testing -api ignoretags=all:

3 h 30 min to complete the 20,000 files:

After running 1 hour: 12,000 files had been processed
After 13,000 files processed, processing gets severely slower
After 14,000 files processed, about one file per second is processed
After 16,000 files processed, the slowdown is even worse

2.5 hours to process 18,000 files
1 more hour to process the last 3,000 files

So it's not better :(




StarGeek

Quote from: Iwonder on June 28, 2024, 10:09:50 AM:
"1/ @StarGeek: Sorry, but I don't understand what Splitfile1 should contain. I don't know yet how the Unix split command works. And I don't see where you are using it..."

I didn't give an actual example; it was just an idea. I didn't know the actual commands well enough to give an exact process (actually, I figured this out using ChatGPT). But my basic thought would be:

1. Use find (the Unix version) to get a text file with all the filenames. ChatGPT came up with this:
find /path/to/files/ -type f \( -iname "*.jpg" -o -iname "*.jpeg" \) >temp.txt
This took less than a second. Exiftool was significantly slower for just listing all the files.

I'm not sure what the proper equivalent would be on Windows, and Windows will use its own find by default unless the ported one comes first in your PATH.
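
(For what it's worth, the closest native Windows equivalent uses dir rather than find, since the built-in Windows find.exe searches text inside files. A sketch, with a hypothetical path:

dir /s /b C:\path\to\files\*.jpg C:\path\to\files\*.jpeg > temp.txt
)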

2. Use split to break the output into 1,000-line batches. Again, from ChatGPT and some other searches:
split --additional-suffix=.txt -d input.txt OutputList
This requires split 8.16+. I'm using the MSYS2 ports, which have version 8.32.

I was able to combine them with a pipe:
find /path/to/files/ -type f \( -iname "*.jpg" -o -iname "*.jpeg" \) | split --additional-suffix=.txt -d - OutputList
This resulted in 29 text files, OutputList00.txt through OutputList28.txt.

3. Next, run exiftool on each of these output files. I really dislike trying to figure out Windows BAT file looping, and since I'm already using the Linux find command, this can be done with find's -exec option:

find . -maxdepth 1 -type f -name "OutputList*.txt" -exec exiftool -T -L -m -progress -ext jpg -r -sep ## -p print03_fmt.txt -@ {} ; >> tout01.txt

I used -maxdepth 1 to prevent find from looking for more OutputList files in the subdirectories. Probably not needed, but might as well be safe about it.

I just ran this sequence, and it took 42 minutes, 42 seconds for 28.3K files.

With some more piping, combining the sed commands, and using sort's -u instead of uniq, here is the BAT file I ended up with:

find ./speedtest/ -type f \( -iname "*.jpg" -o -iname "*.jpeg" \) | split --additional-suffix=.txt -d - OutputList
find . -maxdepth 1 -type f -name "OutputList*.txt" -exec exiftool -T -L -m -progress -ext jpg -r -sep ## -p print03_fmt.txt -@ {} ; >> tout01.txt
sed -e "/^-/d" -e "s/\#\#/, /g"  tout01.txt | sort -u >tout05.txt

Iwonder

Wow!!! This sounds great!
I'll give it a try!

(I didn't think of ChatGPT, because I was very, very, very disappointed in the past with a specific question about a .dotm file: it never succeeded...)

StarGeek

Yeah, ChatGPT is hit or miss, and you often have to test and double check. But for basic Linux commands, it does pretty well.  And I've had good results with ffmpeg commands as well.

Actually, it also did well creating a simple GreaseMonkey script for me, as well as a simple AutoHotkey script.