ExifTool PHP Fast Processing Script using StayOpen and Gearman

TSM:
I've created a script that can be used to run ExifTool in StayOpen mode from within PHP.
The class can be initialised as a singleton or as a regular instance.
I've also supplied two additional scripts that wrap the class, ready for use with Gearman for scale-out.
The class does not apply any logic to the parameters supplied to ExifTool: whatever you push through gets passed through, and the result is then returned.
The class detects if ExifTool has died and restarts it on the next call.
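
For anyone unfamiliar with the mechanism: when ExifTool is started with `-stay_open True -@ -`, it reads arguments from stdin, one per line, and processes them when it sees `-execute`. Below is a minimal sketch of building one such command block. `buildCommandBlock` is a hypothetical helper for illustration and is not taken from the class itself.

```php
<?php
// Hypothetical helper, not part of the ExifToolBatch class.
// Builds the text written to the stdin of a process started as:
//   exiftool -stay_open True -@ -
function buildCommandBlock(array $args)
{
    // One argument per line; an option and its value go on separate lines.
    $lines = $args;
    // -execute tells ExifTool to process everything queued so far.
    $lines[] = '-execute';
    return implode("\n", $lines) . "\n";
}

$block = buildCommandBlock(array('-g', '-j', 'test1.jpg'));
echo $block;
```

Because the arguments are passed through untouched, an option that takes a value (such as `-use` followed by `MWG`) would need to be supplied as two separate array elements in a sketch like this.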

It still needs some work and cleaning up, but it seems to work well in my environment and scales out quite nicely.
Originally I was using PHP streams, but this proved to be a problem, so I am now using fgets to parse the return instead.
Note that the script is hard-coded to work with ExifTool 9.03+ only, because that is all I have been testing against.
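
As a rough illustration of the fgets approach: in stay-open mode ExifTool prints `{ready}` on its own line once each `-execute` has finished, so the reader can loop on fgets until it sees that marker. The sketch below is an assumption about how such a loop might look, not the code from the repository. `readUntilReady` is a hypothetical name, and an in-memory stream stands in for the real stdout pipe.

```php
<?php
// Hypothetical helper: read one result from an ExifTool -stay_open stream.
// ExifTool prints "{ready}" on its own line after each -execute completes.
function readUntilReady($stream)
{
    $output = '';
    while (($line = fgets($stream)) !== false) {
        if (rtrim($line, "\r\n") === '{ready}') {
            return $output; // complete result for this command
        }
        $output .= $line;
    }
    // Stream ended before the marker: the process has probably died.
    return false;
}

// Demo: an in-memory stream stands in for the ExifTool stdout pipe.
$fake = fopen('php://memory', 'w+');
fwrite($fake, "[{\"SourceFile\": \"test1.jpg\"}]\n{ready}\n");
rewind($fake);
$result = readUntilReady($fake);
fclose($fake);
```

Returning false when the stream ends early gives the caller a simple signal that the background process may need restarting.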

Let me know what you think, or if there are any changes that would make it better.

https://github.com/tsmgeek/ExifTool_PHP_Stayopen


Performance tests.
These figures should be taken as a guide to the performance increase possible with the supplied scripts; they will vary depending on your hardware setup and the arguments supplied to ExifTool.

100 iterations fetching metadata from a JPEG (-use MWG -g -j -*:*)

1 GM Instance - 52s
2 GM Instances - 25s
3 GM Instances - 17.5s
4 GM Instances - 12.5s

Usage

Below is a basic example of how to use this class.
Put all your commands in an array and push it onto the stack using the $exif->add() function. You can add multiple jobs to process before calling fetch/fetchAll.

getInstance() - set up the class as a singleton
setExifToolPath($path) - set/change the path of ExifTool if not supplied at start
close() - terminate the ExifTool background process
start() - start the ExifTool background process
test() - check if ExifTool is running
clear() - clear the stack
fetch() - return one processed item off the stack at a time
fetchAll() - return a single array with all items in the stack processed

There are also fetchDecoded/fetchAllDecoded calls, which essentially decode the output in one step if your default arguments contain '-j' (JSON output). The script's default arguments are ('-g','-j') to assist with this.

As you fetch items they are taken off the stack.





--- Code: ---$data=array('-use MWG','-g','-j','-*:*','test1.jpg');
$exif = ExifToolBatch::getInstance('/path/to/exiftool');
$exif->add($data);
$result=$exif->fetchAll();

--- End code ---
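
With '-j', ExifTool returns a JSON array containing one object per file, and with '-g' the tags are nested under their group names. The decoded fetch calls presumably amount to something like json_decode over that output. Here is a minimal sketch using an illustrative sample string (not real output from these tests):

```php
<?php
// Illustrative sample only: the general shape of "-g -j" output for one file.
$json = '[{"SourceFile":"test1.jpg","EXIF":{"Make":"Canon"},"File":{"FileType":"JPEG"}}]';

// Decode to nested associative arrays: one element per file.
$files = json_decode($json, true);
if (!is_array($files)) {
    throw new RuntimeException('Output was not valid JSON');
}

foreach ($files as $file) {
    // Tags are grouped, so a tag is addressed as [group][tag].
    echo $file['SourceFile'] . ': ' . $file['EXIF']['Make'] . "\n";
}

$make = $files[0]['EXIF']['Make'];
```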

Phil Harvey:
This looks great, but I'm a bit surprised by the slow speed.  On my iMac here, running exiftool on 100 random JPEG images:


--- Code: ---> time exiftool -use MWG -g -j -all:all tmp3 > out.txt
    1 directories scanned
  100 image files read
1.041u 0.017s 0:01.15 91.3% 0+0k 1+2io 0pf+0w
--- End code ---

That is 1.15 clock seconds for all 100 files (or 11.5 ms per file).  But you are getting 52 seconds for 100 files?  Are you sure you aren't somehow launching a separate ExifTool application for each file?  That would explain the difference (45x slower).

- Phil

TSM:
Hmmm, I think it all got slow once I moved to using fgets on a buffered socket. Before that I was using streams, but found them incompatible with different versions of PHP.
I've checked, and the PID of the underlying perl process does not change once it has been started, so it's related to the fgets.
I'll look into it and hopefully get it sorted.

TSM:
OK, I was going in circles, then tested the file I had done the original performance tests on and found it was the JPG that had the problem.
It was the same JPG that caused this issue: https://exiftool.org/forum/index.php/topic,5074.msg24427.html#msg24427

So I retested with another file.

Note that this is putting 100 individual tasks into the GM queue and running them as individual '-execute' calls to exiftool. All GM instances were running on the same VMware machine.

1 GM Instance - 1.3s
2 GM Instances - 0.9s
3 GM Instances - 0.7s
4 GM Instances - 0.6s
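
To put the scaling in perspective, the speedup relative to a single instance can be worked out directly from the timings above. A small sketch of the arithmetic:

```php
<?php
// Timings from the retest above: GM instances => seconds for 100 tasks.
$seconds = array(1 => 1.3, 2 => 0.9, 3 => 0.7, 4 => 0.6);

$speedup = array();
foreach ($seconds as $instances => $t) {
    // Speedup is measured against the single-instance time.
    $speedup[$instances] = round($seconds[1] / $t, 2);
    printf("%d instance(s): %.1fs, speedup x%.2f\n", $instances, $t, $speedup[$instances]);
}
```

So four instances give roughly a 2.2x speedup rather than 4x, which may simply reflect all instances sharing one virtual machine and the fixed overhead of queuing each task.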


More stats
300 iterations with 4 GM Instances - 1.6s
100 iterations 3 files with 4 GM Instances - 1.4s
50 iterations 6 files with 4 GM Instances - 1.1s
100 iterations 10 files with 4 GM Instances - 3.1s

Note that all testing was looping over the same file stored locally.

I think I can get a little more speed out of the script by changing whether it uses buffered or unbuffered streams.

Phil Harvey:
Now those numbers are more like it!

- Phil
