ExifTool used for TIFF format validation

Started by Luiz, January 17, 2017, 05:18:39 AM


Luiz

Hi,

FYI there is a blog post about using ExifTool as a validation tool.
http://openpreservation.org/blog/2017/01/17/tiff-format-validation-easy-peasy/

Here is what they say:

Validation: ExifTool is not really meant for validation, either. It's for metadata extraction. The information about image errors is just a by-product if the tool runs into any problems while trying to extract metadata. So it's not really fair to treat ExifTool like a validation tool, as it would never complain about an absolute unreadable TIFF which cannot be opened by any viewer, as long as all the metadata can get extracted. That might be the reason why ExifTool has the highest percentage of presumably valid TIFF files within this test. So, "valid" for ExifTool means, that there were no warnings or errors in the metadata output.

Handling: It's a command-line-tool with quite good possibilities to batch whole folders and output human-readable csv (though the csv can have many, many columns, as images can have a myriad of metadata).
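To make the blog's definition of "valid" concrete, here is a minimal Python sketch that reads an ExifTool CSV export and lists the files that have anything in a Warning or Error column. It assumes the first column is SourceFile and that the warning/error columns are named Warning or Error, optionally with a group prefix such as ExifTool:Warning. The function name flagged_files, the sample rows, and the sample warning text are all made up for illustration.

```python
import csv
import io


def flagged_files(csv_text):
    """Map each source file to its non-empty Warning/Error cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    flagged = {}
    for row in reader:
        messages = [
            f"{name}: {value}"
            for name, value in row.items()
            # Accept "Warning"/"Error" with or without a "Group:" prefix.
            if name and name.split(":")[-1] in ("Warning", "Error") and value
        ]
        if messages:
            flagged[row["SourceFile"]] = messages
    return flagged


# Hypothetical CSV content, not real ExifTool output.
sample = (
    "SourceFile,ExifTool:Warning,IFD0:ImageWidth\n"
    "good.tif,,640\n"
    "bad.tif,Example warning text,\n"
)
print(flagged_files(sample))
```

Under this definition, any file missing from the result counts as "valid", which only says the metadata could be read without complaint, not that the image data is readable.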

Phil Harvey

...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

StarGeek

#2
Why do I get the feeling that Common Mistake 3 was used during the testing :)

And this is the first I've heard of the Google ImageTestSuite, and it's gone :( Never mind.

Still a cool article.
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

Phil Harvey

#3
Yes, it would be interesting to see the script they used.

I just downloaded the TIFF Google image test suite from the link in the article, so it is still there. :)

I am also inspired to add a new -validate feature to the next ExifTool release.  Currently only 7 of the 166 google test images pass the validation (or 49 if minor warnings are ignored).

- Phil

StarGeek

Quote from: Phil Harvey on January 17, 2017, 01:41:39 PM
I just downloaded the TIFF Google image test suite from the link in the article, so it is still there. :)

Ah, I didn't have GoogleApis.com whitelisted in my Noscript.  I see it now.

Luiz

Quote from: Phil Harvey on January 17, 2017, 01:41:39 PM
Yes, it would be interesting to see the script they used.

I did not do the testing but as far as I know it was
exiftool -a -u -U -H -g1 -r -csv inputfolder > out.csv

Quote from: Phil Harvey on January 17, 2017, 01:41:39 PM
I am also inspired to add a new -validate feature to the next ExifTool release.  Currently only 7 of the 166 google test images pass the validation (or 49 if minor warnings are ignored).

I think that would make a lot of people in the digital preservation community very happy. But as you can see, that is no easy task (or you would have to limit the feature to selected file formats).

Phil Harvey

Quote from: Luiz on January 19, 2017, 02:39:06 PM
I did not do the testing but as far as I know it was
exiftool -a -u -U -H -g1 -r -csv inputfolder > out.csv

OK.  That's not really a very good way to validate images.

Quote
I think that would make a lot of people in the digital preservation community very happy. But as you see that is no easy task (or you have to limit that feature to selected file formats)

I'm not thinking about writing a full validator for all file formats.  I don't know if it is even feasible to implement a full validation of TIFF images.  I'm just thinking about an option to add more validation at the expense of processing speed, for users who are looking for this.

- Phil

Luiz

Quote from: Phil Harvey on January 17, 2017, 01:41:39 PM
I am also inspired to add a new -validate feature to the next ExifTool release.

If you want to go ahead with this topic, there is another blog post called "repairing TIFF images - a preliminary report" containing a collection of real-world examples of common errors in TIFF files. Knowing about these errors may help you.
https://kulturreste.blogspot.de/2017/01/repairing-tiff-images-preliminary-report.html

Quote from: Phil Harvey on January 20, 2017, 07:29:20 AM
OK.  That's not really a very good way to validate images.

What would be your advice for validating and simultaneously getting as much metadata as possible out of an image?

Phil Harvey

Quote from: Luiz on February 07, 2017, 05:47:11 AM
If you want to go ahead with this topic, there is another blog post called "repairing TIFF images - a preliminary report" containing a collection of real-world examples of common errors in TIFF files. Knowing about these errors may help you.
https://kulturreste.blogspot.de/2017/01/repairing-tiff-images-preliminary-report.html

Thanks.

Quote
What would be your advice for validating and simultaneously getting as much metadata as possible out of an image?

With ExifTool 10.41 or later, I would do this:

exiftool -api validate -a -u -G1 FILE

This will add extra warnings when problems are detected.
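For batch use, the command above could be wrapped in a small script. The following is only a sketch: it assumes exiftool is on the PATH, and it assumes that with -G1 each warning or error shows up on its own line in the form "[Group] Warning : text". The names parse_problems and validate_file and the sample text are hypothetical.

```python
import re
import subprocess

# Assumed line shape: "[ExifTool]      Warning      : some message"
PROBLEM_LINE = re.compile(
    r"^\[(?P<group>[^\]]+)\]\s+(?P<kind>Warning|Error)\s*:\s*(?P<text>.*)$"
)


def parse_problems(output):
    """Return (kind, text) pairs for Warning/Error lines in exiftool output."""
    problems = []
    for line in output.splitlines():
        match = PROBLEM_LINE.match(line)
        if match:
            problems.append((match["kind"], match["text"].strip()))
    return problems


def validate_file(path):
    """Run the command from the post on one file and collect the problems."""
    result = subprocess.run(
        ["exiftool", "-api", "validate", "-a", "-u", "-G1", path],
        capture_output=True,
        text=True,
    )
    return parse_problems(result.stdout)


# Made-up output used only to exercise the parser.
sample_output = (
    "[ExifTool]      ExifTool Version Number         : 10.41\n"
    "[ExifTool]      Warning                         : Example problem text\n"
    "[IFD0]          Image Width                     : 640\n"
)
print(parse_problems(sample_output))
```

An empty list from validate_file would then mean no warnings or errors were reported for that file, nothing more.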

But before the validate feature was added, I would have recommended trying to write something to the file using ExifTool since the writing code was much more strict than the reader.
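A rough Python sketch of that older write-test idea is below. Everything specific in it is an assumption: the post does not say which tag to write, so ImageDescription is just an example choice, the write is done on a scratch copy so the original is never touched, and a non-zero exit status is treated as "the write failed". The function name write_check and the run parameter (there only so the command runner can be swapped out) are hypothetical.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def write_check(path, run=subprocess.run):
    """Try a harmless write on a scratch copy; return (ok, messages)."""
    with tempfile.TemporaryDirectory() as tmp:
        scratch = Path(tmp) / Path(path).name
        shutil.copy2(path, scratch)  # never modify the original file
        result = run(
            [
                "exiftool",
                "-overwrite_original",            # no *_original backup of the copy
                "-ImageDescription=write test",   # example tag, an assumption
                str(scratch),
            ],
            capture_output=True,
            text=True,
        )
    output = result.stdout + result.stderr
    messages = [line for line in output.splitlines() if line.strip()]
    return result.returncode == 0, messages
```

The messages are worth reading even when ok is True, since a write may succeed while still reporting warnings.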

- Phil

Luiz

Phil, this new validation feature looks very helpful. Thanks for the advice and explanation.