ExifTool Forum

ExifTool => Bug Reports / Feature Requests => Topic started by: kapouer on January 30, 2016, 10:56:19 AM

Title: idea: getting tags from html files
Post by: kapouer on January 30, 2016, 10:56:19 AM
Every HTML file has a <title> tag, and many online files carry tags that ease link inspection.
Meta tags used to be useless (keywords and description have clearly been abused), but
schema.org, Open Graph, Twitter Cards, and oEmbed all help represent a web page with
a title and a thumbnail.
See https://github.com/kapouer/url-inspector/blob/master/lib/inspector.js#L383
for a simple example of what could be interesting to return as tags.
Title: Re: idea: getting tags from html files
Post by: Phil Harvey on February 05, 2016, 08:30:11 AM
Have you tried running ExifTool on an html file?  It should return tags from the header section, including Title.

- Phil
Title: Re: idea: getting tags from html files
Post by: pjux on June 23, 2016, 01:56:46 PM
Hello, I'm using exiftool to analyze metadata as the OP described. While exiftool does extract metadata from HTML files, I noticed that it is selective about which tags it extracts.

For example, on a page with the following tags in the head, exiftool does not extract the metadata given in the <meta property="..."> form:


<meta property="og:type" content="article" />
<meta property="og:site_name" content="HHS.gov" />
<meta property="og:url" content="http://www.hhs.gov/about/budget/fy2017/fy2015-summary-of-performance/goal-1/index.html" />
<meta property="og:title" content="FY 2015 Summary of Performance – Goal 1" />
<meta property="og:description" content="FY 2015 Summary of Performance – Goal One: Strengthen Health Care" />
<meta property="og:updated_time" content="2016-02-22 00:00:00" />
<meta name="dcterms.creator" content="Office of Budget (OB), Assistant Secretary for Financial Resources (ASFR)" />
<meta name="dcterms.title" content="FY 2015 Summary of Performance – Goal 1" />
<meta name="dcterms.description" content="FY 2015 Summary of Performance – Goal One: Strengthen Health Care" />
<meta name="dcterms.date" content="2016-02-19 11:45:00" />
<meta name="dcterms.modified" content="2016-02-22 00:00:00" />
<meta name="dcterms.type" content="Text" />
<meta name="dcterms.format" content="text/html" />


It would be awesome if it could.
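
To illustrate what I mean, here is a rough Python sketch that collects the <title> text plus every <meta> tag, keyed by either name= or property=. It uses only the standard-library html.parser module. The names MetaTagCollector and extract_meta are made up for this example, and this is not how ExifTool handles HTML internally, just an approximation of the output I'd hope for.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Hypothetical helper: gathers <title> text and <meta> name/property pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.tags = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Accept both <meta name="..."> and <meta property="...">.
            key = attrs.get("name") or attrs.get("property")
            content = attrs.get("content")
            if key and content is not None:
                self.tags[key] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in more than one chunk, so concatenate.
        if self._in_title:
            self.title += data


def extract_meta(html_text):
    """Return (title, tags) for an HTML string (illustrative only)."""
    parser = MetaTagCollector()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip(), parser.tags


if __name__ == "__main__":
    sample = """<html><head><title>Example page</title>
    <meta property="og:type" content="article" />
    <meta property="og:site_name" content="HHS.gov" />
    <meta name="dcterms.type" content="Text" />
    <meta name="dcterms.format" content="text/html" />
    </head><body></body></html>"""
    title, tags = extract_meta(sample)
    print("Title:", title)
    for key, value in tags.items():
        print(f"{key}: {value}")
```

With something like this, the og:* entries show up alongside the dcterms.* ones, which is the behaviour I'm asking about.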

Exiftool is an amazing program and I have found it really useful.