Hello,
is it possible to run exiftool only once, with two format files and one output file for each?
instead of this:
exiftool -T -L -p print01_fmt.txt *.jpg > output01.txt
exiftool -T -L -p print02_fmt.txt *.jpg > output02.txt
Using the -execute option (https://exiftool.org/exiftool_pod.html#execute-NUM) on only two commands like this won't save much time. ExifTool will still have to process all the files twice, so the time saved over running two separate commands will be less than a second.
The main use of -execute is to keep exiftool running when you have a lot of individual commands to run on a lot of different files.
Otherwise, the command to combine them might be something like this. It would require the use of the -w (-TextOut) option (https://exiftool.org/exiftool_pod.html#w-EXT-or-FMT--textOut) instead of the file redirection >, because redirection is handled by the command line, not exiftool, and will not work for two separate files.
exiftool -p print01_fmt.txt -w+ output01.txt -execute -p print02_fmt.txt -w+ output02.txt -common_args -T -L *.jpg
Note I have not tested this, so there might be an error.
Thank you, StarGeek!
With each post I learn a little more about how ExifTool works.
I had noticed command lines on the forum without the > redirection, but I didn't understand them :)
Talking about time:
My friend runs a command to extract tags to a text output, acting on a folder containing about 20,000 photos, and he says it takes about 3 hours to finish the job. Is that OK? Not too long? (I suggested he use -fast1 and -fast2, but it didn't make it faster.)
Is it a single command, or are they running it once on each file? If the latter, then that is Common Mistake #3 (https://exiftool.org/mistakes.html#M3).
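By "once on each file" I mean something like this hypothetical bat loop (print01_fmt.txt borrowed from your earlier command):
REM slow: starts a new exiftool process for every single file
for %%f in (*.jpg) do exiftool -T -L -p print01_fmt.txt "%%f" >> output01.txt
REM fast: one exiftool process reads all the files
exiftool -T -L -p print01_fmt.txt *.jpg > output01.txt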
I have a bat file that I picked up somewhere that lists the time it takes for a command to run. I just ran a basic exiftool command (the FAQ #3 command) to list all the data in 28K+ files, and the result was this, with the last line being the output from the bat:
36 directories scanned
28383 image files read
command took 0:18:8.37 (1088.37s total)
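The bat boils down to recording %time% before and after the command and doing a little arithmetic. A minimal sketch (not my actual file, and the exiftool command is just an example):
@echo off
setlocal
REM replace a possible leading space in %time% with 0 so all fields are two digits
set "t0=%time: =0%"
REM the command being timed (just an example)
exiftool -a -G1 -s C:\photos > nul
set "t1=%time: =0%"
REM convert hh:mm:ss.cc to centiseconds (the 1%x%-100 trick avoids octal trouble with 08/09)
for /f "tokens=1-4 delims=:.," %%a in ("%t0%") do set /a s=((1%%a-100)*3600+(1%%b-100)*60+(1%%c-100))*100+(1%%d-100)
for /f "tokens=1-4 delims=:.," %%a in ("%t1%") do set /a e=((1%%a-100)*3600+(1%%b-100)*60+(1%%c-100))*100+(1%%d-100)
set /a d=e-s
REM handle a run that crosses midnight
if %d% lss 0 set /a d+=24*3600*100
set /a sec=d/100, cc=d%%100
echo command took %sec%.%cc%s total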
So to me, it sounds like they're running it once per file. In your case, you're only running two commands, which is why I said it's not worth it to make a complex command that uses -execute. But when running exiftool 20K+ times, the startup time adds up.
Here is a post from Phil (https://exiftool.org/forum/index.php?msg=6121) on the subject from the early days of this forum, and the blog post it mentions (https://web.archive.org/web/20120223091305/http://www.christian-etter.de/?p=458) via Archive.org.
@StarGeek
Thanks for your answer!
I was not talking about running those 2 commands on 20,000 files. We only have to run them on a few hundred files, and that's fast enough.
It was about another exiftool command (executed only once), extracting fewer than 10 tags, with a format file, from 20,000 photos, to a text file.
I attached batch and format files, if you want to have a look.
Does three hours seem too long to you?
(Sorry, Phil's post is too complicated for me to understand, but it seems to concern -execute.)
Is this on a local drive or over a network? Because that will affect the speed.
When editing, ExifTool should take only a little more time than it would take to copy all the data. Just listing data, as your bat file does, should take even less time, as shown by my 18-minute result. So a 3-hour result suggests to me that this is over a network, since my drives are all slow (5400 rpm WD Blues/Seagate BarraCudas) and took significantly less time in my test.
I'm setting up a test directory with the 28K files, copying them with Teracopy with verification on (so it rereads each file after the copy); that should take less than an hour. Then I'll set up the files with random data in the tags used in your format file and time that. Finally, I'll run your bat file and time that. I'll let you know the results when done.
OK, I stand corrected. Using your FMT file, it took most of the day to run the single command on my test setup. I have slow drives, so that didn't help.
I'm not sure why it took so long; maybe the large FMT file. I didn't time individual points, but it seems the first 1,000 didn't take too long, but it seemed to slow down as time went on.
If it got slower as more files were processed, then memory is likely an issue, because the Perl memory garbage cleanup is fairly time consuming. Optimizing memory usage should help. You can do this by adding -api ignoretags=all to the command so that only the tags in the .fmt file are extracted. From the last paragraph of the -p option documentation:
Note that the API RequestTags option is automatically set for all
tags used in the FMTFILE or STR. This allows all other tags to be
ignored using -API IgnoreTags=all, resulting in reduced memory
usage and increased speed.
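Untested, but assuming the command looks something like the one StarGeek suggested earlier, the option would be added like this (/path/to/photos is a placeholder for the actual folder):
exiftool -T -L -m -progress -ext jpg -r -sep ## -api ignoretags=all -p print03_fmt.txt /path/to/photos > tout01.txt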
Also, the -fast1 option is doing nothing since you are also specifying -fast2.
- Phil
I was going to check with the -api IgnoreTags option (https://exiftool.org/ExifTool.html#IgnoreTags) when I got a chance, but I didn't want to possibly raise expectations too much beforehand.
Hello
@StarGeek, @Phil
Quote: it seems the first 1,000 didn't take too long, but it seemed to slow down as time went on
Yes! My friend noticed that too!
It's running on a local PC.
Thank you, Phil, for the API option, and thank you, StarGeek, for your prompt answer and your test!
I'll give you feedback about this!
I'm assuming that you have some ported Linux programs installed, due to the use of sed/sort/uniq. You might check to see if you have the split program. You could then dump all the filenames to process into a text file, use split to separate that into, say, 1,000-line batches, and then use the -@ (ARGFILE) option (https://exiftool.org/exiftool_pod.html#ARGFILE) to process each batch:
exiftool -T -L -m -progress -ext jpg -r -sep ## -p print03_fmt.txt -@ Splitfile1.txt >> tout01.txt
Here I used >> to append the redirected text instead of overwriting it.
1/ @StarGeek:
Sorry, but I don't understand what Splitfile1 should contain. I don't know yet how the Unix split command works.
And I don't see where you are using it...
2/ @StarGeek @Phil
Testing the -api ignoretags=all:
3h30min to complete the 20,000 files:
After running 1 hour: 12,000 files were treated
After 13,000 files processed, processing was severely slower
After 14,000 files processed, about one file per second was being processed
After 16,000 files processed, the slowdown was even worse
2.5 hours to process 18,000 files
1 more hour to process the last 3,000 files
So it's not better :(
Quote from: Iwonder on June 28, 2024, 10:09:50 AM
1/ @StarGeek:
Sorry, but I don't understand what Splitfile1 should contain. I don't know yet how the Unix split command works.
And I don't see where you are using it...
I didn't give an actual example; it was just an idea. I don't know the actual commands well enough to give an exact process off the top of my head (actually, I figured this out using ChatGPT). But my basic thought would be:
1. Use find (the Unix version) to get a text file with all the filenames. ChatGPT came up with this:
find /path/to/files/ -type f \( -iname "*.jpg" -o -iname "*.jpeg" \) > temp.txt
This took less than a second. ExifTool was significantly slower for just listing all the files.
I'm not sure what the proper command for the Windows version of find would be. And Windows will use its own find by default.
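(Untested guess: the closest native equivalent is probably the dir command, e.g.
dir /s /b C:\path\to\files\*.jpg > temp.txt
Note that the find that ships with Windows is a text-search tool, not a file finder.)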
2. Use split to split the output into 1,000-line batches. Again, from ChatGPT and some other searches:
split --additional-suffix=.txt -d input.txt OutputList
This requires split 8.16+. I'm using the MSYS2 ports (https://www.msys2.org/), which have version 8.32.
I was able to combine them with a pipe:
find /path/to/files/ -type f \( -iname "*.jpg" -o -iname "*.jpeg" \) | split --additional-suffix=.txt -d - OutputList
This resulted in 29 text files, OutputList00.txt to OutputList28.txt.
3. Next, run exiftool on each of these output files. I really dislike trying to figure out Windows BAT file looping, and since I'm already using the Linux find command, this can be done with the -exec option:
find . -maxdepth 1 -type f -name "OutputList*.txt" -exec exiftool -T -L -m -progress -ext jpg -r -sep ## -p print03_fmt.txt -@ {} ; >> tout01.txt
I used -maxdepth 1 to prevent find from looking for more OutputList files in the subdirectories. Probably not needed, but might as well be safe about it.
I just ran this sequence and it took 42 minutes, 42 seconds for 28.3K files.
Doing some more pipes, combining the sed commands, and using sort's -u instead of uniq, here is the BAT file I ended up with (REM comments describe each step):
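REM list all jpg/jpeg files under ./speedtest/ and split the list into 1,000-line OutputListNN.txt batches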
find ./speedtest/ -type f \( -iname "*.jpg" -o -iname "*.jpeg" \) | split --additional-suffix=.txt -d - OutputList
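REM run exiftool once per OutputList file, appending the tab-separated results to tout01.txt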
find . -maxdepth 1 -type f -name "OutputList*.txt" -exec exiftool -T -L -m -progress -ext jpg -r -sep ## -p print03_fmt.txt -@ {} ; >> tout01.txt
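REM delete lines starting with "-", replace the ## separators with ", ", then sort and de-duplicate into tout05.txt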
sed -e "/^-/d" -e "s/\#\#/, /g" tout01.txt | sort -u >tout05.txt
Wow!!! This sounds great!
I'll give it a try!
(I didn't think about ChatGPT, because I was very, very disappointed in the past with a specific question about a .dotm file: it never succeeded...)
Yeah, ChatGPT is hit or miss, and you often have to test and double check. But for basic Linux commands, it does pretty well. And I've had good results with ffmpeg commands as well.
Actually, it also did well creating a simple GreaseMonkey script for me, as well as a simple AutoHotkey script.
@StarGeek
Hello
Coming back with the results!
Although I didn't manage to run all of this in only 3 lines in my environment (but it's not important), the whole process lasted only... 15 minutes instead of 3.5 hours!!!!!
This is amazing!!!
Thank you so much!
One more question about this command line: could you explain this -@ {}?
In the help file I can only see that -@ is used for introducing an argument file, but there is no ARG file here...
Hmmm... looking at it now, I think I made a mistake in adding the -@. For some reason, I was thinking that find would be piping the data. The -@ option can read data from a pipe, redirection, or STDIN, but that would have to be -@ -, and I think find is directly providing the file list, which is inserted by find at the {}.
As a result, I think your processed list is one file short, as the -@ option would try to read the first file as an ARGS file. Try running it again and dropping the -@.
Never mind, see below
In fact, I tried omitting the -@ {}, but it doesn't work; it gives me the ExifTool help page instead.
But I don't know how to explain this.
Yeah, it's needed.
find is looking for the 1,000-line file-list text files, and exiftool is using -@ to read each of those for the list of files to process. I was taking a nap, and remembering this woke me up to come make this post :D
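In other words, each OutputList##.txt is just an ARGS file with one file path per line, something like this (made-up paths):
./speedtest/2020/IMG_0001.jpg
./speedtest/2020/IMG_0002.jpg
./speedtest/2020/IMG_0003.jpg
and -@ OutputList00.txt makes exiftool treat each of those lines as a command-line argument.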
Thank you!
Hope you went back to bed with a rested mind ^^
Yes, -@ shouldn't be there.
Also, your "find" is looking for .txt files, but your exiftool command processes only .jpg files.
And I needed quotes around the semicolon because "find" needs it to terminate the arguments, and without quotes it was eaten by the shell.
Other than that, the command worked for me.
- Phil
No, -@ does need to be there. Find isn't feeding file names directly to exiftool. It's giving exiftool text files with 1,000 filepaths per text file.
Back in this post (https://exiftool.org/forum/index.php?msg=86897), I used find to gather all the filenames to be processed. It takes exiftool several minutes to generate a list for 28,000+ files, even with -fast5, while find was able to generate the list in a couple seconds.
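For comparison, the exiftool listing was something along these lines (illustrative only; the exact options I ran may have differed):
exiftool -fast5 -T -filepath -r -ext jpg ./speedtest/ > temp.txt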
Then split is used to split the results of that find into separate files with 1,000 filepaths per file named "OutputList##.txt"
find is used again to gather the names of each of these "OutputList##.txt" files and that is what is passed to exiftool with the -@, running exiftool once per "OutputList##.txt" file.
I don't know where the slowdown is, maybe memory management like you said, but my 28K test directory had been running for over 6 hours before I shut it down, while running it in 1,000-file batches took only 42 minutes. And @Iwonder says splitting like this takes only 15 minutes instead of 3.5 hours.
Quote from: StarGeek on July 03, 2024, 04:51:19 PM
No, -@ does need to be there. Find isn't feeding file names directly to exiftool. It's giving exiftool text files with 1,000 filepaths per text file.
Ah, sorry. I missed that. I didn't read the whole thread.
Quote: @Iwonder says splitting like this takes only 15 minutes instead of 3.5 hours.
When I get a chance I should look into this.
- Phil
Quote from: Phil Harvey on July 03, 2024, 09:25:19 PM
When I get a chance I should look into this.
My first thought is that it's a problem with the Windows version, but I just realized I'm not sure what OS @Iwonder is using. For some reason I was thinking Windows, but Linux commands are listed.
@all
Yes, I'm using Windows 10, with some .exe files coming from UnixUtils for Windows :)
Regarding UnixUtils, you might want to take a look at MSYS2 (https://www.msys2.org/). The MSYS2 versions are more up to date than the UnixUtils ones.
@StarGeek
Thank you for this suggestion.
If I had known about this at the beginning of this project, I surely would have used it.
Have a nice day! 8)