ebooks and exiftool

Started by Lplates, April 19, 2014, 12:23:28 PM


Lplates

Hi All,
I notice PDFs are in exiftool's list of supported formats.
Just wondering if exiftool will ever be able to handle other eBook types (there must be a dozen or so different types),
'cause my eBook collection is in much the same mess as my picture collection was before exiftool came along :)
Dave.


Phil Harvey

Hi Dave,

Other ebook formats?  I haven't any experience with these, and until now haven't had any requests to support them.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Lplates

Ah!
Thought that might be the case after searching this forum :)
There must be about a dozen "common" formats and loads (20+) of others, some of them related to each other!
And I'm not even sure that all, or even most, of them have a concept of metadata!
If I get the chance I'll look into some of them...
worth asking :)
Dave.

MC1953

Dear Phil,
I have also been searching for ebook support in exiftool, and thus saw this post. The following link gives an idea of the ebook formats.
http://en.wikipedia.org/wiki/Comparison_of_e-book_formats
In my experience, the .epub and .mobi formats are the most popular ebook formats. It would be a great help if you could include these file formats in exiftool.
MC

Phil Harvey

Wow, that's a lot of formats.  And many of them are proprietary. :(

But .epub and .mobi are open formats based on XML and XHTML.  ExifTool already has limited XML and HTML support, so might extract some information from them already.

Another problem is that I don't have any sample files of these formats.

- Phil

MC1953

Any number of epub or mobi books (mobi is sometimes also called "Kindle" because Amazon's Kindle e-reader supports this format) can be downloaded from http://www.gutenberg.org (which is a legitimate download site for ebooks).

I am attaching one book in both epub and mobi format (downloaded from http://www.gutenberg.org/ebooks/76). These are without images (to keep the attachment size small).

The same books with images (approx 12 MB) can be downloaded from http://www.gutenberg.org/ebooks/76

MC

Phil Harvey

Thanks.  Just taking a quick look at these files, it may be possible to extract metadata from the .epub file.  It is a zip file with metadata stored in XML format in 76/content.opf.  The difficulty is that I must first parse META-INF/container.xml to determine the location of this content file.
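
For anyone who wants to poke at this themselves, the two-step lookup described above can be sketched in Python roughly as follows. This is only a minimal sketch, not how ExifTool does it: the function names read_epub_metadata and build_sample_epub are made up for illustration, the sample book is a tiny hypothetical file built in memory, and it assumes the metadata uses the usual Dublin Core (dc:) elements.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by container.xml and by Dublin Core metadata elements.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
DC_NS = "http://purl.org/dc/elements/1.1/"


def read_epub_metadata(source):
    """Return {field: [values]} for the dc:* elements of an EPUB.

    `source` is a path or a file-like object. Hypothetical helper.
    """
    with zipfile.ZipFile(source) as zf:
        # Step 1: container.xml tells us where the content (.opf) file lives.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.attrib["full-path"]
        # Step 2: parse the content file itself.
        package = ET.fromstring(zf.read(opf_path))

    metadata = {}
    prefix = "{" + DC_NS + "}"
    for elem in package.iter():
        if elem.tag.startswith(prefix):
            name = elem.tag[len(prefix):]
            metadata.setdefault(name, []).append((elem.text or "").strip())
    return metadata


def build_sample_epub():
    """Build a tiny, made-up EPUB in memory, only for trying out the reader."""
    container_xml = (
        '<container version="1.0" '
        'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
        '<rootfiles><rootfile full-path="76/content.opf" '
        'media-type="application/oebps-package+xml"/></rootfiles>'
        "</container>"
    )
    content_opf = (
        '<package xmlns="http://www.idpf.org/2007/opf" version="2.0">'
        '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">'
        "<dc:title>Sample Title</dc:title>"
        "<dc:creator>Sample Author</dc:creator>"
        "</metadata></package>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
        zf.writestr("META-INF/container.xml", container_xml)
        zf.writestr("76/content.opf", content_opf)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(read_epub_metadata(build_sample_epub()))
```

Passing a real .epub path to read_epub_metadata instead of the in-memory sample should work the same way, as long as the book follows the standard container layout.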

The .mobi file will be more difficult.  It doesn't even seem to have a unique magic number at the start of the file for identification.
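
One possible way to identify the file, sketched here in Python: a .mobi file is wrapped in a Palm Database container, whose 4-byte type and creator fields sit at byte offset 60 and together read "BOOKMOBI" for Mobipocket books. The sketch below is an assumption-laden illustration, not a description of what ExifTool does; the function names looks_like_mobi and pdb_name are made up, and DRM-protected or unusual files may need more checks.

```python
# Offsets from the Palm Database header layout: a 32-byte name, then
# 28 bytes of attributes/dates/IDs, then the type and creator fields.
PDB_NAME_LENGTH = 32
PDB_TYPE_OFFSET = 60
MOBI_SIGNATURE = b"BOOKMOBI"  # type "BOOK" followed by creator "MOBI"


def looks_like_mobi(header):
    """Return True if the first bytes of a file carry the BOOKMOBI marker.

    `header` should hold at least the first 68 bytes of the file.
    """
    end = PDB_TYPE_OFFSET + len(MOBI_SIGNATURE)
    return header[PDB_TYPE_OFFSET:end] == MOBI_SIGNATURE


def pdb_name(header):
    """Return the null-terminated database name stored in the first 32 bytes."""
    raw = header[:PDB_NAME_LENGTH].split(b"\0", 1)[0]
    return raw.decode("latin-1")


if __name__ == "__main__":
    # A made-up header for demonstration; with a real book you would use
    # open(path, "rb").read(68) instead.
    fake = b"Sample_Book".ljust(32, b"\0") + b"\0" * 28 + b"BOOKMOBI"
    print(looks_like_mobi(fake), pdb_name(fake))
```

So the signature is there, just not at the very start of the file, which is why it is easy to miss on a quick look.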

I will do more work on these when I get some time.

- Phil

MC1953

Please see whether the following link is helpful. The content of the link is Greek or Latin to me  ;D
http://en.wikipedia.org/wiki/EPUB

I found a few links which explain the .mobi format in different ways. But since I could not understand the contents, I am not sure whether those links will be of any help to you. However, I am including them below as well.

http://wiki.mobileread.com/wiki/MOBI

http://www.helenhanson.com/ebook-formatting/amazon-mobi-and-azw-file-format-v-s-epub-file-format-%E2%80%93-what%E2%80%99s-the-difference-dog-or-cat/

http://filetonic.com/file-extension/results/mobi

MC

MC1953

The following link gives a Python program for reading mobi book contents. Informative.

https://github.com/kroo/mobi-python

MC

Phil Harvey

ExifTool 9.63 (released last weekend) now supports EPUB, MOBI and AZW e-books.

- Phil

Nichtraucher

I'd be very interested to see write/create support in ExifTool! :-)