PHP version of ExifTool libraries

Started by wimvan, October 28, 2011, 11:55:57 AM


wimvan

Hi,
ExifTool works nicely, very nicely, but it would be more user-friendly if it could run under PHP.
PHP is more widely deployed than Perl.
The reason I ask is simple: when running a website, you sometimes allow image uploads, and PHP's GD does not return all the metadata correctly, or does not return it at all.
For example, the lens data is very poor. The reason is that the MakerNotes are not fully retrieved in PHP, whereas ExifTool reads them well.

Thanks in advance ...

etmmger

I wanted exactly the same and tried to recode the Perl libraries in PHP, or in C as a PHP module, but gave up.
PHP does support reading EXIF data (via exif_read_data(), see the manual), but its functionality is limited and it does not support multipage TIFF files or images with multiple IFD sections.
As an intermediate solution, I modified exiftool so that it generates a PHP file as output, which you can include in your PHP code.
An array with one subarray per IFD section is generated if the -php option is specified on the exiftool command line.
A flat array with keys "group:tag" is generated if the -php:flat option is specified.
The patch for Image-ExifTool 8.74 is attached.

You can use it with:
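$tmp = tempnam(sys_get_temp_dir(), 'exif');  // unique temp file; "$$" is not expanded by PHP
system('exiftool -php ' . escapeshellarg('YOURIMAGEFILE') . ' > ' . escapeshellarg($tmp));
$data = require $tmp;
unlink($tmp);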
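
The nested and the flat layout described above are related by a simple transformation. The following is a minimal sketch of that relationship, not the patch's own code: the function name flatten_groups and the sample data are made up for illustration, and it assumes exactly one level of grouping (group => tag => value).

```php
<?php
// Flatten one file's grouped tag array into "Group:Tag" keys.
// Assumption: every array value at the top level is a group of tags;
// scalar entries such as "SourceFile" keep their own name.
function flatten_groups(array $info): array
{
    $flat = [];
    foreach ($info as $group => $tags) {
        if (is_array($tags)) {
            foreach ($tags as $tag => $value) {
                $flat["$group:$tag"] = $value;
            }
        } else {
            $flat[$group] = $tags;
        }
    }
    return $flat;
}

// Hypothetical sample data in the grouped shape.
$grouped = [
    "SourceFile" => "a.jpg",
    "System"     => ["FileName" => "a.jpg"],
];
print_r(flatten_groups($grouped));
```

Doing the flattening on the PHP side would keep a single output layout on the exiftool side, at the cost of one extra loop per file.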


If you want it as directly evaluable code in PHP, you should probably remove the <?php and ?> tags from the exiftool output, so you can run it from PHP with eval(`exiftool -php YOURIMAGEFILE`).

Regards, Marcel

Phil Harvey

Hi Marcel,

This is cool.  Do you think that it would be worthwhile including this patch in the official version?

- Phil

Edit: The patch includes some unimplemented code for the -ep option.  Did you ever make use of this?
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

etmmger

Hi Phil,

I would love to see it in the official version!
But... I would suggest carefully checking the modifications I made, as I'm not an experienced Perl programmer.
I also found an error in my patch (didn't work properly when scanning multiple files).
I attached an update of my patch.

You can use the latest patched exiftool from php with the following code:
$x = eval(preg_replace("/^<\?php|\?>$/", "", `exiftool -php /tmp/Post/*`));
printf("Number of items: %d\n", count($x));


I briefly tested it in normal and flat array format with one JPEG and with 1 up to 28 TIFF files.
The TIFF files contain 1 to 29 scanned pages each (made with MS Office Document Imaging; the OCR content goes into the EXIF data :-))
I think it needs some more testing, especially when using combinations of options (I didn't test that myself).

Kind regards, Marcel

Phil Harvey

Hi Marcel,

Yes, I'm ahead of you already.  Also, I don't think your version handled structured-output properly.

I've been playing with the code and have changed things to conform exactly with the -json feature.  Here is an example of how it works now:

> exiftool a.jpg b.jpg -php -g1 -filename
return Array(Array(
  "SourceFile" => "a.jpg",
  "System" => Array(
    "FileName" => "a.jpg"
  )
),
Array(
  "SourceFile" => "b.jpg",
  "System" => Array(
    "FileName" => "b.jpg"
  )
));
    2 image files read


And I'm using this PHP script to test it out:

<?php
$array = eval(`exiftool -php -g1 -q -q -struct a.jpg b.jpg`);
print_r($array);
?>
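
To show how the grouped result can be walked once it has been evaluated, here is a minimal sketch. It evaluates the example output quoted above as a literal string instead of calling exiftool, so it runs without the patched version installed; with a live call, the string would come from the backtick command instead.

```php
<?php
// Example output copied from the post above (stands in for a live exiftool call).
$output = <<<'EOT'
return Array(Array(
  "SourceFile" => "a.jpg",
  "System" => Array(
    "FileName" => "a.jpg"
  )
),
Array(
  "SourceFile" => "b.jpg",
  "System" => Array(
    "FileName" => "b.jpg"
  )
));
EOT;

$array = eval($output);

// One entry per file; with -g1 the group names are the second-level keys.
foreach ($array as $file) {
    printf("%s: %s\n", $file["SourceFile"], $file["System"]["FileName"]);
}
```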


This is a potentially useful modification, so I'm thinking that I'll include it in the next release.

- Phil

Phil Harvey

Just so we are working on the same page, here is the patch for the version I'm currently testing.

- Phil

Phil Harvey

I've cleaned up things a bit and added the ability to extract binary information.  Attached is an updated patch file.

Note that the patch (like my last one) includes some other unrelated changes that will appear in exiftool 8.75.

Unless you have some suggestions or find a problem, I'll consider this the final version.

- Phil

etmmger

I just did a quick review on the patch file, and I must admit that you cleaned it up very nicely!
I see that you removed the starting and ending tags for PHP (<?php and ?>).
I think it would be good to have an option to generate these tags, so one will be able to generate a standalone PHP file.

- Marcel

Phil Harvey

Hi Marcel,

What would be the advantage of using a stand-alone PHP file?

I went with the eval compatibility because I can see how this could be used (and more convenient because there is no temporary file).  I even toyed with the idea of leaving out the "return" because this is easy to add with a string concatenation before the eval.  What do you think -- would there ever be a time when you didn't want the "return"?  Adding extra stuff is much easier than removing it after the fact.

You could do this with the <?php ?> too, and save adding an extra option (which I really try to avoid):

system('echo "<?php" > a.php;exiftool -php a.jpg >> a.php;echo "?>" >> a.php');
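
The same wrapping can be done from PHP itself rather than through the shell. This is only a sketch under assumptions: the function name write_standalone_php is made up, the literal below stands in for real exiftool output, and the helper accepts the array literal with or without a leading "return" because that detail was still under discussion at this point in the thread.

```php
<?php
// Wrap an array literal (as printed by -php) into a standalone PHP file
// that yields the array when it is require'd.
function write_standalone_php(string $literal, string $path): void
{
    // Drop an optional leading "return" and any trailing semicolon/whitespace,
    // then add both back in a known form.
    $body = preg_replace('/^\s*return\b\s*/', '', trim($literal));
    $body = rtrim($body, "; \t\r\n");
    $code = "<?php\nreturn $body;\n";
    if (file_put_contents($path, $code) === false) {
        throw new RuntimeException("cannot write $path");
    }
}

// Hypothetical literal standing in for the output of `exiftool -php -q a.jpg`.
$literal = 'Array(Array("SourceFile" => "a.jpg"));';
$path = tempnam(sys_get_temp_dir(), 'exif');
write_standalone_php($literal, $path);
$data = require $path;
unlink($path);
print_r($data);
```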

- Phil

Edit:  Hmmm.  The forum color syntax highlighting really doesn't know what I was trying to do there. :P

etmmger

Hi Phil.

I agree with you (even about leaving out the return statement).
The fewer options the better.

The reason I started with the PHP patch is that I have a huge collection of scanned documents and photos which I want to make searchable from a web page, based on a given search string (regex).
Now, before I start digging into the exiftool code again, one last question:
I see that there is an -if option.
Is it possible to look through several files and check whether a regex matches ANY tag?
It would be great if the names of the matching tags could be returned in a new tag, e.g. exiftool:matchingTags. This could be a CSV-formatted string or similar.
E.g.:
exiftool -php -if /My favorite post/i *

This would give me an array with all matching files and their details, with one extra tag per file:
Array( ... Array("SourceFile"=>"filename", ... , "exiftool::matchingTags"=>"tag1,tag2"), ...)

Kind regards, Marcel

Phil Harvey

Hi Marcel,

OK, I think I'll drop the "return" then.  So the example PHP script will be:

<?php
$array = eval('return ' . `exiftool -php -q image.jpg`);
print_r($array);
?>


or

<?php
eval('$array=' . `exiftool -php -q image.jpg`);
print_r($array);
?>


About searching through all tags for a match: The exiftool application won't do this for you, but this is something that wouldn't be too difficult for you to code on the PHP side now that you have all of the information. :)
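
A minimal sketch of such a PHP-side search follows. It is an illustration, not part of exiftool: the function name find_matching_tags, the "MatchingTags" key and the sample data are all made up, and it assumes the ungrouped layout (one tag => value array per file, with list-type tags as arrays of strings).

```php
<?php
// For each file with at least one matching value, return the file entry plus
// a "MatchingTags" key: a comma-separated list of the tags that matched.
function find_matching_tags(array $files, string $regex): array
{
    $hits = [];
    foreach ($files as $info) {
        $matched = [];
        foreach ($info as $tag => $value) {
            // List-type tags are assumed to be flat arrays; join them for matching.
            $text = is_array($value) ? implode(', ', $value) : (string) $value;
            if (preg_match($regex, $text) === 1) {
                $matched[] = $tag;
            }
        }
        if ($matched) {
            $info['MatchingTags'] = implode(',', $matched);
            $hits[] = $info;
        }
    }
    return $hits;
}

// Hypothetical sample data standing in for evaluated exiftool output.
$files = [
    ["SourceFile" => "scan1.tif", "Title" => "My favorite post", "Comment" => "none"],
    ["SourceFile" => "scan2.tif", "Title" => "Invoice"],
];
$hits = find_matching_tags($files, '/My favorite post/i');
print_r($hits);
```

Note that "SourceFile" is searched like any other key in this sketch; skip it inside the loop if file names should not count as matches.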

- Phil

duodraco

Hi.
Thanks for Exiftool and also php format support!

I want to suggest that -php return a PHP serialized format. "eval" is an evil function to many PHP developers (like me), and unserialize is a natural way to get an object/structure from a string.
For information:

a simple array as currently returned by exiftool:
array(array("attribute1"=>"value1","attribute2"=>"value2"));

and we need code like this to get the value:
$array = eval('return '.`/path/to/exiftool -php /path/to/file/to/be/parsed`);

a serialized string from above array is like this:
a:1:{i:0;a:2:{s:10:"attribute1";s:6:"value1";s:10:"attribute2";s:6:"value2"}}
where:

       
  • a:1 -> array with 1 element
  • i:0 -> index 0 from array
  • a:2 -> array with 2 elements
  • s:10 -> string with 10 chars
and PHP code to get the exiftool information:
$array = unserialize(`/path/to/exiftool -php /path/to/file/to/be/parsed`);

If it is possible to change the -php behavior, how can I help?
I will build a PHP Wrapper soon.

Phil Harvey

Quote from: duodraco on January 13, 2012, 10:14:01 PM
I want suggest to -php return a php serialized format.
[...]
if possible to change php behavior, how can I help?

A good suggestion, but unfortunately this serialized format would be very inefficient as an output format, so I wouldn't recommend this.

The problem is that exiftool would need to buffer the entire output in memory for all files to determine the total number of files processed before it could write the size of the first array.  Since exiftool may be run on many thousands of files, this would lead to allocating a huge chunk of memory, likely crashing exiftool on Windows (since the Windows version of Perl crashes if you try to allocate more than about 1 GB of memory).

- Phil

Edit: Also, this format is extremely picky.  If you change the tiniest thing, then unserialize() doesn't work but I get no indication of why.  Your serialized string is an example -- it won't unserialize because it needs a semicolon after the last value. (...s:6:"value2";}})  You can't even insert extra whitespace between entries to make the format readable. :(
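
Both points can be checked with a few lines of PHP. This is a small sketch using the example array from the thread: serialize() shows that the element count comes before the elements, and unserialize() shows that the string as originally posted is rejected while the corrected one round-trips.

```php
<?php
$data = [["attribute1" => "value1", "attribute2" => "value2"]];

// serialize() writes the element count ("a:1:") before the elements,
// so a writer has to know the count before emitting anything else.
$good = serialize($data);
echo $good, "\n";

// The string as posted earlier in the thread, without the final semicolon.
$bad = 'a:1:{i:0;a:2:{s:10:"attribute1";s:6:"value1";s:10:"attribute2";s:6:"value2"}}';

// unserialize() returns false on malformed input; @ hides the notice/warning.
var_dump(@unserialize($bad) === false);
var_dump(unserialize($good) === $data);
```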

etmmger

Hi, if you don't want to use eval(), you can use exiftool in JSON mode.
However, I found that PHP's json_decode() does not always decode the generated JSON correctly.
Maybe it's an issue with how exiftool escapes the JSON.
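
For reference, a minimal sketch of the JSON route. The helper name decode_exiftool_json and the sample string are made up; the sample only imitates the one-object-per-file shape of -json output. Using JSON_THROW_ON_ERROR (PHP 7.3 or later) at least makes a decoding failure visible instead of silently returning null.

```php
<?php
// Decode `exiftool -json` output into nested PHP arrays.
// JSON_THROW_ON_ERROR raises a JsonException on malformed input;
// JSON_BIGINT_AS_STRING keeps very large integers as strings.
function decode_exiftool_json(string $json): array
{
    return json_decode($json, true, 512, JSON_THROW_ON_ERROR | JSON_BIGINT_AS_STRING);
}

// Hypothetical sample in the shape -json produces: one object per file.
$json = '[{"SourceFile": "a.jpg", "FileName": "a.jpg"}]';
$info = decode_exiftool_json($json);
echo $info[0]["FileName"], "\n";

// A live call might look like this (the file name is a placeholder):
// $info = decode_exiftool_json(shell_exec('exiftool -json -q ' . escapeshellarg('a.jpg')));
```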

Marcel

duodraco

Hi

I understand Phil's point about the format and I agree with him. I had some issues with the JSON format in the past, caused by certain returned values in earlier versions of this feature, but they were fixed. JSON is a good format, but decoding a huge string can cause trouble with memory and CPU usage. In fact, eval with direct injection of the exiftool output is a faster way to get a ready array on the PHP side.
I'll run some benchmarks of the JSON and PHP formats, including a security layer around the eval approach for the PHP format, and publish the results here.
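
One possible shape for such a benchmark is sketched below. It is not a result from this thread: the data is synthetic, the PHP literal is produced with var_export() rather than by exiftool (so its formatting differs from real -php output), and the timings will vary by machine and PHP version. hrtime() requires PHP 7.3 or later.

```php
<?php
// Synthetic stand-in for exiftool output: $n files with a few string tags each.
function make_sample(int $n): array
{
    $files = [];
    for ($i = 0; $i < $n; $i++) {
        $files[] = ["SourceFile" => "img$i.jpg", "Make" => "Example", "Model" => "Cam $i"];
    }
    return $files;
}

$sample = make_sample(1000);
$asJson = json_encode($sample);
$asPhp  = 'return ' . var_export($sample, true) . ';';

$t0 = hrtime(true);
$fromJson = json_decode($asJson, true);
$t1 = hrtime(true);
$fromPhp = eval($asPhp);   // acceptable here only because the string is generated locally
$t2 = hrtime(true);

printf("json_decode: %.3f ms\n", ($t1 - $t0) / 1e6);
printf("eval:        %.3f ms\n", ($t2 - $t1) / 1e6);
```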