Scan only recently modified images in directory structure

Started by tosz, November 11, 2018, 04:59:59 AM


tosz

Hello,
I scan all my (60000) media files in a subdirectory structure and output the metadata to a text file. This takes about an hour on my notebook (with several instances started). Is it possible to reduce this time by instructing exiftool to scan only files modified within the last 14 days? Or created since a certain date?

My current command is:
D:\util\exiftool\exiftool.exe
-config D:\test\xmp.config
-a -ext .jpg -ext .xmp -m -n -r -t -q
-p D:\test\tagsfromfile.txt
D:\media > D:\test\xmp.txt


Thank you in advance for any feedback on this,
Tosz.

Phil Harvey

Hi Tosz,

Unfortunately the -if condition currently extracts all metadata from the file so this can't be used to speed up the process.  I will look into adding the ability to quickly check pseudo tags like FileModifyDate, which could help speed things up a lot for you.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Phil Harvey

OK.  ExifTool 11.18 will have this feature:

            Adding NUM to the -if option causes a separate processing pass to
            be executed for evaluating EXPR at a -fast level given by NUM (see
            the -fast option documentation for details).  Without NUM, only
            one processing pass is done at the level specified by the -fast
            option. For example, using -if3 is possible if EXPR uses only
            pseudo system tags, and may significantly speed processing if
            enough files fail the condition.


And will allow you to do this to process only files that were modified in the last 14 days:

exiftool -if3 "not ShiftTime($now,'-14 0') and $filemodifydate gt $now" ...

(This new feature may be a bit difficult to understand, but I couldn't think of a simpler way to implement/document this, which is why it has taken me so long to add it.)

- Phil

Edit: Tweaked the wording of the documentation.

tosz

Oh, this sounds great, Phil!
Your support is just outstanding.
I think it will take me more time to understand this feature than it took you to implement it ;-)

My question regarding the functionality:

If I want to fetch a certain date I would write (all images of this year for instance) ->
      exiftool -if3 "not ShiftTime('2018-01-01',' 0') and $filemodifydate gt $now"
or simply
      exiftool -if3 "$filemodifydate gt '2018-01-01'"

I'm a bit confused about the '$filemodifydate gt $now', which would point to files in the future as I understand it.

Again thanks for the swift reply,
Tosz.

Phil Harvey

Quote from: tosz on November 12, 2018, 04:08:56 AM

or simply
      exiftool -if3 "$filemodifydate gt '2018-01-01'"

Exactly.  Comparing to a static date is simpler.  The use of ShiftTime() and $now was necessary only to satisfy your criterion of processing the files from the last 2 weeks.

Quote:

I'm a bit confused about the '$filemodifydate gt $now', which would point to files in the future as I understand it.

Correct, except that the ShiftTime() function shifted the value of $now back by 14 days in my example.

- Phil
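
The date arithmetic behind the 14-day condition can be sketched outside ExifTool. This is only an illustration in Python, not ExifTool code: the function name `modified_within` and the sample dates are made up for the example, and it compares datetime objects where ExifTool compares formatted date strings.

```python
from datetime import datetime, timedelta

def modified_within(mtime: datetime, now: datetime, days: int = 14) -> bool:
    """Return True if mtime is later than `now` shifted back by `days` days."""
    # Shifting "now" back is the role ShiftTime($now,'-14 0') plays in the
    # condition above; the comparison then reads "modified after the cutoff".
    cutoff = now - timedelta(days=days)
    return mtime > cutoff

# Hypothetical sample values, chosen only for illustration.
now = datetime(2018, 11, 12, 9, 0, 0)
print(modified_within(datetime(2018, 11, 1, 8, 0, 0), now))  # True  (11 days old)
print(modified_within(datetime(2018, 10, 1, 8, 0, 0), now))  # False (42 days old)
```

Without the shift, the comparison would indeed only match files dated in the future, which is the confusion raised in the question.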


tosz

Hello Phil,

Thanks for releasing the latest version so fast.
Of course I was tempted to test the new -ifNUM-argument.
I guess I must be missing something in my command line.
The arguments below process ALL files.

D:\util\exiftool\exiftool.exe -if3 "$filemodifydate gt '2018-06-20'" -p D:\argfile.txt -a -m -n -q -r -t D:\testimages > D:\test.txt

The testimages-folder contains images around the 2018-06-20 date.
The argfile.txt contains only the variable: $filemodifydate (my actual argfile has more vars)
But in the test.txt ALL files are listed:

...
2018:06:19 14:23:42+02:00
2018:06:19 14:36:16+02:00
2018:06:19 18:04:12+02:00
2018:06:19 18:20:00+02:00
2018:06:20 10:53:36+02:00
2018:06:20 12:51:44+02:00
2018:06:20 14:13:00+02:00
2018:06:20 14:20:34+02:00
2018:06:20 14:51:48+02:00
2018:06:20 17:50:31+02:00
2018:06:20 19:00:59+02:00
2018:06:20 19:18:38+02:00
2018:06:20 19:42:45+02:00
2018:06:20 19:44:16+02:00
2018:06:20 19:45:31+02:00
2018:06:20 19:46:01+02:00
2018:06:20 20:14:30+02:00
2018:06:20 20:18:59+02:00
2018:06:20 23:48:29+02:00
2018:06:21 08:41:42+02:00
2018:06:21 09:06:19+02:00
2018:06:21 09:38:27+02:00
2018:06:21 09:50:30+02:00
2018:06:21 10:49:14+02:00
2018:06:21 13:15:15+02:00
2018:06:21 14:06:11+02:00
...


Any ideas what I might not have checked? I'm on Windows 10 Home.
Thanks for any reply,
Tosz.

Phil Harvey

I failed to notice that you used dashes instead of colons as date separators.

It should work if you switch to colons.

- Phil
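
Why the dashes let every file through can be shown with a plain string comparison. The sketch below is Python rather than Perl and assumes that `gt` in the condition compares the two values as ordinary strings, character by character; `string_gt` is a hypothetical helper name.

```python
def string_gt(value: str, threshold: str) -> bool:
    """Plain lexicographic comparison, standing in for a string `gt` test."""
    return value > threshold

# First timestamp from the listing in the post above.
stamp = "2018:06:19 14:23:42+02:00"

# With dashes, the strings first differ at position 4, where ':' meets '-'.
print(string_gt(stamp, "2018-06-20"))  # True: ':' (58) sorts after '-' (45)

# With colons, the first difference is in the day field, as intended.
print(string_gt(stamp, "2018:06:20"))  # False: '1' sorts before '2'

print(ord(":"), ord("-"))              # 58 45
```

So with a dash-separated threshold, any file dated in the same year compares as "greater", regardless of its month and day.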

tosz

Hello Phil,

The colons fixed the output. Now it works as expected.

But what remains critical for me is speed. I tried it right away on 38000 files, and one instance of Exiftool took about an hour. This might be cut to 40 minutes if I changed energy plans or the priority of the Exiftool thread, but I don't think that is the real issue.

For comparison I have a Javascript-version that scans all 60000 files for modified date and dumps the 400 found file names to a text-file in 30 seconds. But it uses OS-information only, no binary reading, so it's fast.
I was thinking along this way with Exiftool: check for modified date (OS-information) and read the file only if not older than (eg) 2 weeks. I could even combine both scripts: get the filenames of modified files with Javascript and feed them to Exiftool. But how? Can Exiftool read a text file with paths and process them? Something like TAGSFROMFILE (eg PATHSFROMFILE)?

D:\testimages\image0003.jpg
D:\testimages\image0056.jpg
D:\testimages\image5603.jpg


400 files read by Exiftool took less than a minute. Filtering 38000 files to find those 400 modified ones took an hour. Have you managed to get better results?
Thanks for any thoughts,
Tosz.

Phil Harvey

With my small test I was only getting a 4x speed increase using the -if3 option.  At that rate I was getting about 120 files/second, which would be about 5 minutes for 38000 files.

I don't know why you are much slower than this, but if you have another utility that will output only the names of the files you want to process, you can do this:

some_other_app OTHER_APP_ARGS | exiftool -@ - EXIFTOOL_ARGS

- Phil

Edit:  I just tried the -if3 option on a directory of 9228 files (with a condition that failed on all files but one), and it took 1 minute and 21 seconds of clock time, with only 20 sec of CPU time.  I ran the same command again, and it took less than 18 seconds of clock and CPU time (presumably faster since the disk cache was then primed with the headers of all the files -- ExifTool reads only the first 1 kB of a file in -fast3 mode).  Then I ran the command without -if3 and it ran almost exactly 4x slower.

Edit2:  I have been experimenting with a -if4 feature that would only extract the system information and avoid reading the header of the file.  This cuts the time down to 7 seconds for my 9228 files.  I'll implement this in the next release, but it sounds like you may have a solution with the Javascript that would probably still be faster for you.

tosz

Hello Phil,
I made an interesting observation:
If I scan smaller folders with 3000 to 15000 files I get results even faster than yours (180 files per second). But starting with the top folder the entire structure of 40000 files takes... ages (almost no increase in speed).
My guess is that Exiftool chokes on certain files. I have a mixture of media files (jpg, swf, mp3, mp4, mp5, xmp etc). I'll keep testing.

If you implement the -if4-feature that would be the most elegant way.
Right now I do the following:

  • Scan recently modified files with Javascript (45 seconds for 55000 files)
  • Feed those files to Exiftool with -@ (5 seconds for about 100 files)
  • Import new meta-data-file into database
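
The first two steps above could look roughly like the following. This is a minimal sketch in Python rather than the Javascript actually used, which is not shown in this thread. The function names, the extension filter and the argfile path are all assumptions made for illustration. It reads only the modification time reported by the OS and never opens the files.

```python
import os
import time

def recent_files(root, days=14, exts=(".jpg", ".xmp"), now=None):
    """Yield paths under root whose OS modification time is within `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # seconds per day
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(exts):
                continue  # skip other media types
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_mtime > cutoff:
                    yield path
            except OSError:
                continue  # file vanished or is unreadable; skip it

def write_argfile(paths, argfile):
    """Write one path per line, the layout an ExifTool -@ argfile expects."""
    count = 0
    with open(argfile, "w", encoding="utf-8") as fh:
        for path in paths:
            fh.write(path + "\n")
            count += 1
    return count

# Example call (paths are hypothetical):
# write_argfile(recent_files(r"D:\media"), r"D:\test\recent.txt")
```

The resulting list can then be passed to ExifTool with `-@ D:\test\recent.txt` followed by the usual options, or the paths can be printed to stdout and piped into `exiftool -@ -` as suggested earlier in the thread.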

Since Exiftool is based on Perl, I assume it is about as fast as Javascript (on Windows 10).
Thanks for your replies,
Tosz.

Phil Harvey

Interesting theory, but I was also running my command on my full set of a wide variety of files, so I don't see how that could be it.  Another theory could be that the files in the entire structure are scattered widely on your disk, and that the disk seek time dominates the execution time.  If so, the -if4 option would really help.

Yes, Perl is probably comparable speed to Javascript, but ExifTool is a very general utility, and as such there will be a lot more code executed than in a dedicated script.  However, we'll see how the -if4 option goes.

- Phil
