ExifTool Forum

ExifTool => Archives => Topic started by: Archive on May 12, 2010, 08:54:34 AM

Title: Any hope of getting recursive extraction of metadata for nested documents added?
Post by: Archive on May 12, 2010, 08:54:34 AM
[Originally posted by johnmccash on 2009-04-07 15:21:02-07]

Hi,

I'm new to the forums, but I just ran into an issue with exiftool and wondered whether anybody else might be interested in the same enhancement. PDF files and MS Office documents can contain other files, such as JPEGs, embedded within them. Currently, there's no easy way to get at the metadata for these embedded files. What I'm doing now is using hachoir-subfile (http://hachoir.org/wiki/hachoir-subfile) to extract the embedded files, then exiftool to list their metadata. It would be really cool if this could be done within exiftool itself. The hachoir suite also includes hachoir-metadata, but it doesn't extract nearly as many metadata tags as exiftool, and it doesn't recurse into embedded files anyway.
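For anyone wanting to script the second half of this two-step workaround, here is a minimal sketch. It assumes the embedded files have already been carved out into a directory (the name `carved` is hypothetical) and that `exiftool` is on the PATH. The helper names are made up for illustration; only the `-json` and `-r` options are actual exiftool options.

```python
import json
import subprocess
from pathlib import Path


def build_exiftool_command(extracted_dir):
    """Build an exiftool command line for a directory of carved-out files.

    -json : emit one JSON object per file
    -r    : recurse into subdirectories
    """
    return ["exiftool", "-json", "-r", str(extracted_dir)]


def read_metadata(extracted_dir):
    """Run exiftool and return a list of per-file metadata dicts."""
    result = subprocess.run(
        build_exiftool_command(extracted_dir),
        capture_output=True,
        text=True,
        check=False,  # exiftool may exit non-zero if some files are unreadable
    )
    # No readable files means no JSON on stdout.
    return json.loads(result.stdout) if result.stdout.strip() else []


def tag_counts(records):
    """Map each source file to the number of tags exiftool reported for it."""
    return {rec["SourceFile"]: len(rec) - 1 for rec in records}


if __name__ == "__main__":
    # "carved" is a hypothetical directory holding the extracted files.
    for name, count in tag_counts(read_metadata(Path("carved"))).items():
        print(f"{name}: {count} tags")
```

This only automates the exiftool step; the extraction step still has to be run separately beforehand.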
Thoughts?
Thanks

John

Title: Re: Any hope of recursive extraction?
Post by: Archive on May 12, 2010, 08:54:34 AM
[Originally posted by exiftool on 2009-04-07 15:46:44-07]

Hi John,

Thanks for the suggestion.

This is an interesting idea.  The -ee option already
extracts information from some types of documents embedded
within PDF files.  Presumably this could be expanded to include
embedded images, but the amount of data extracted like
this could easily get overwhelming, and would certainly
be hard to manage.
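To keep the output of -ee manageable, one option is to group the extracted tags by the document they came from. The sketch below assumes exiftool is run with -ee, -G3 and -json, and that keys for embedded documents then arrive prefixed with a family 3 group name such as `Doc1:`. The function names and the sample record are made up for illustration, and the exact key layout should be checked against real output.

```python
def build_embedded_command(path):
    """Command line asking exiftool for embedded-document tags as JSON.

    -ee   : extract information from embedded documents
    -G3   : prefix each tag with its family 3 (document number) group
    -json : machine-readable output
    """
    return ["exiftool", "-ee", "-G3", "-json", str(path)]


def group_by_document(record):
    """Split one JSON record into {document_group: {tag: value}}.

    Assumes keys look like "Doc1:ImageWidth"; keys without a group
    prefix (such as SourceFile) are filed under "Main".
    """
    grouped = {}
    for key, value in record.items():
        group, sep, tag = key.partition(":")
        if not sep:
            group, tag = "Main", key
        grouped.setdefault(group, {})[tag] = value
    return grouped


if __name__ == "__main__":
    # Hand-written sample record, not real exiftool output.
    sample = {"SourceFile": "a.pdf", "Main:Title": "Report", "Doc1:ImageWidth": 640}
    for doc, tags in group_by_document(sample).items():
        print(doc, tags)
```

Grouping this way makes it possible to report or filter per embedded document instead of scanning one long flat list.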

As far as MS Office documents go, it is unlikely I will do more
than extract the metadata tags currently being extracted.  These
formats are the ugliest I have seen, and are a real pain to
work with. (They are designed from the ground up based on the
fixed 512-byte floppy disk sector size.)
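To illustrate the sector-based layout, here is a rough sketch based on the published Compound File Binary layout used by legacy .doc/.xls/.ppt files: the header stores a "sector shift" at byte offset 30, and sector N begins at (N + 1) times the sector size because the header occupies the first sector. The function names are hypothetical, and this is not how ExifTool itself reads these files.

```python
import struct

# 8-byte signature at the start of a Compound File Binary header.
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def cfb_sector_size(header):
    """Return the sector size declared in a CFB header.

    The sector shift is a little-endian uint16 at offset 30;
    a shift of 9 gives the classic 512-byte sector.
    """
    if len(header) < 32 or header[:8] != CFB_MAGIC:
        raise ValueError("not a Compound File Binary header")
    (sector_shift,) = struct.unpack_from("<H", header, 30)
    return 1 << sector_shift


def sector_offset(sector_number, sector_size):
    """File offset of a sector; the header takes up the first sector."""
    return (sector_number + 1) * sector_size


if __name__ == "__main__":
    # Synthetic header: signature, 22 filler bytes, then sector shift 9.
    header = CFB_MAGIC + bytes(22) + struct.pack("<H", 9)
    size = cfb_sector_size(header)
    print(size, sector_offset(0, size))
```

Every stream in the file is stored as a chain of such sectors, which is part of why locating embedded content takes more work than in a flat format.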

- Phil