Naked Metadata

Jonathan O'Donnell

"But in winter the tree stands cold and naked, nothing can be hidden from view. The true souls of both the tree and its artist are exposed to the world's scrutiny."

Colin Lewis, Bonsai: The Naked Truth

The problem

Metadata in Web pages often doesn't get updated when the pages get updated.

The solution

Tag data, and point to it from the appropriate metadata field. Ian Davis has developed RDF in HTML to provide a way of doing this.

Example

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head profile="http://purl.org/NET/erdf/profile" >
<link rel="schema.dc" href="http://purl.org/dc/elements/1.1/" />
</head>
<body>
<h1 class="dc-title">Naked Metadata</h1>
<h2 class="dc-creator">Jonathan O'Donnell</h2>
<p class="dc-rights">http://purl.nla.gov.au/net/jod/tutorial/naked-metadata.html © Jonathan O'Donnell <span class="dc-date">23 October 2005</span></p>
</body>
</html>

Background

When I first learned to put Dublin Core into Web pages, I often found myself replicating data. I would place a DC.creator tag in the head, even though the name of the author was already on the Web page. This annoyed me, because I knew that it was bad practice to replicate data like that. When I mentioned this to a workmate at the time, he said that I could probably make a link from the metadata field to the data in XML. At that stage, I didn't know enough XML to grasp the concept, much less make it work.

Fast forward eight years to DC-ANZ 2005, where Eve Young and Baden Hughes made the point that people updating Web pages often don't update the metadata. One problem they raised was that metadata in the header is essentially invisible to people editing the page (when, for example, using some WYSIWYG editors).

In general, data (including metadata) should be stored in one place only. This prevents drift: if it is only stored in one place, it can only be updated in that place.

Often, the information that we want to store as metadata already appears in the Web page. Examples include the title, description (especially as opening paragraph) and the author's name. In footers, we often find rights information, the Web address, and date information.

If this information already exists in the data, and we replicate it in the metadata, there is the danger of drift. Perhaps pointing to the data from the metadata fields is a way of preventing drift, and ensuring that the metadata is as up-to-date as the data.

Method

Ian Davis, of Talis (UK), has developed RDF in HTML, which allows us to point to the data from the metadata fields. The system uses 'class' attributes to delineate metadata information. Many Web developers already use 'class' attributes to style particular aspects of a Web site.

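To use RDF in HTML, as the example above shows, you should add:

  1. A profile attribute to the head element, pointing to http://purl.org/NET/erdf/profile.
  2. A link element in the head that declares the Dublin Core schema (rel="schema.dc").
  3. A class attribute, such as dc-title or dc-creator, to each element in the body that holds the matching data.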

In his description of RDF in HTML, Ian Davis shows that it can be used for much more than this. Here, I have just shown how to embed basic Dublin Core metadata in the body of your Web page.
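The class convention used in the example can be sketched in a few lines of Python. This is only an illustration, under the assumption (read off the example, not taken from Ian Davis's specification) that a link with rel="schema.dc" declares the prefix "dc" and that a class value such as dc-title names the property "title" within that schema. The names SCHEMAS, resolve_class_token and properties_for are hypothetical.

```python
# Assumption: the prefix table mirrors <link rel="schema.dc" href="..."/>
# from the example page. It is written by hand here, not parsed from a page.
SCHEMAS = {"dc": "http://purl.org/dc/elements/1.1/"}


def resolve_class_token(token, schemas):
    """Return the property URI for a token like 'dc-title', or None."""
    prefix, sep, name = token.partition("-")
    if not sep or not name or prefix not in schemas:
        return None  # an ordinary styling class, not metadata
    return schemas[prefix] + name


def properties_for(class_attr, schemas=SCHEMAS):
    """Map a whole class attribute value to the property URIs it names."""
    uris = []
    for token in class_attr.split():
        uri = resolve_class_token(token, schemas)
        if uri is not None:
            uris.append(uri)
    return uris


if __name__ == "__main__":
    # A styling class and a metadata class can share one attribute.
    print(properties_for("banner dc-title"))
```

Because unknown prefixes are ignored, classes that exist purely for styling can sit alongside the metadata classes without being mistaken for them.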

Harvesting

It is all well and good to put metadata into a document, but you have to be able to get it out again for it to be of any use.

RDF in HTML is designed to be harvested by Gleaning Resource Descriptions from Dialects of Languages (GRDDL). GRDDL is a mechanism for "getting RDF data out of XML and XHTML documents using explicitly associated transformation algorithms, typically represented in XSLT".

"Although the example in that document illustrates extraction of DC metadata from <meta> html elements, there would be no reason why the mechanism should not extract the metadata from arbitrary elements identified by id; it is just a different XSLT transformation."

Alan Cox, Post to DC-General mailing list, 2 November 2005

One example of an extractor that will parse RDF in HTML is the Embedded RDF Extractor. You can use this extractor to check that you have built your page correctly.
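To make the idea of harvesting concrete, here is a minimal sketch in Python. It is neither GRDDL nor the Embedded RDF Extractor: it is a stand-in that uses only the standard library's html.parser and collects the text of every element whose class has a "dc-" token. It assumes well-formed XHTML (every opened element is closed), and the names DublinCoreClassHarvester, harvest and SAMPLE are hypothetical.

```python
from html.parser import HTMLParser


class DublinCoreClassHarvester(HTMLParser):
    """Collect the text of elements whose class attribute has a 'dc-' token."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.open_elements = []  # one list of entries per currently open element
        self.found = []          # (field name, list of text parts), in document order

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        entries = []
        for token in classes:
            if token.startswith("dc-"):
                entry = (token[3:], [])
                self.found.append(entry)
                entries.append(entry)
        self.open_elements.append(entries)

    def handle_endtag(self, tag):
        # Assumes well-formed markup: each end tag closes the last open element.
        if self.open_elements:
            self.open_elements.pop()

    def handle_data(self, data):
        # Text belongs to every enclosing metadata element, not just the innermost.
        for entries in self.open_elements:
            for _field, parts in entries:
                parts.append(data)


def harvest(markup):
    """Return (field, text) pairs with whitespace normalised."""
    parser = DublinCoreClassHarvester()
    parser.feed(markup)
    parser.close()
    return [(field, " ".join("".join(parts).split())) for field, parts in parser.found]


# The body of the example page, with the date span closed.
SAMPLE = """
<body>
<h1 class="dc-title">Naked Metadata</h1>
<h2 class="dc-creator">Jonathan O'Donnell</h2>
<p class="dc-rights">http://purl.nla.gov.au/net/jod/tutorial/naked-metadata.html
© Jonathan O'Donnell <span class="dc-date">23 October 2005</span></p>
</body>
"""

if __name__ == "__main__":
    for field, text in harvest(SAMPLE):
        print(field + ": " + text)
```

Note that the date is reported twice in this sketch: once on its own, and once as part of the rights statement that encloses it. A real extractor may treat nested elements differently.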

Future developments

Misha Wolf pointed out that XHTML2 tackles this problem well.

World Wide Web Consortium, "Introduction to XHTML2.0: Major differences with XHTML 1", http://www.w3.org/TR/xhtml2/introduction.html#s_intro_differences, accessed 2 November 2005

As far as I can see, this means that:

  1. Meta elements can appear in the body of the document, not just the head.
  2. Any element can link to them.

References