Weblogs: Semantic Web

Machine-readable and digestible knowledge is the future; the question is always when we are going to get there. Here we examine tools and websites that form part of this idea.

Friday, October 20, 2006

WSG Microformats Talk, London 2006

Stuart Colville's second WSG meeting was last night, on the topic of microformats. Speakers were Norm Francis, Jeremy Keith and Drew McLellan. I've known about microformats since the XFN and XOXO days, XOXO being particularly attractive as a semantic container for outline-type information trees. I see microformats being implemented, but the perennial question of "Now what?" remains unanswered for me.

Capitalisation battle

Norm compared the history of the W3C's uppercase Semantic Web initiatives with the lowercase microformats effort, and for him, microformats have made visible progress while the W3C's efforts currently fail to offer practical solutions to real world issues.

Invisible metadata

The semantic web (either upper or lower case) has to do with metadata - or adding extra semantic value to content. Metadata is one of the most overlooked pieces of information out on the web, particularly because of one problem - it is invisible. And so out of sight (or site) is out of mind. Microformats work their way around that problem by being right there in the middle of the content, so that the web developer cannot easily ignore them.

People first

The definition of microformats contains the element of serving people first, and machines second. Norm points out this is useful, because the number of machines consuming microformats is completely dwarfed by the number of web developers that author microformatted data.

This highlights one of the factors in encouraging valuable and useful metadata - keeping the barrier to authors as low as possible. Microformats take on board the arguments raised by Cory Doctorow's piece on metacrap, and walk away with a superior system for encouraging these extra semantics.

Microformats are also designed to be machine readable. I have doubts about this, particularly when it comes to aggregating microformats across a range of websites.

W3C failure

On the other hand, the vision of an uppercase Semantic Web has gained very little ground in the Web world. It's seen as being an area that can only be understood by very smart people. To be honest, it is a fair reflection. When looking at the Semantic Web through the XML serialisation of RDF, it certainly isn't as straightforward as HTML. And that obscures the real elegant simplicity of triples.
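To show how little there is to a triple once the XML is out of the way, here is a minimal Python sketch. The `ex:` identifiers and the data are made up for illustration and do not come from any real vocabulary; each statement is simply a (subject, predicate, object) tuple.

```python
# Each statement is a (subject, predicate, object) triple.
# The "ex:" identifiers are hypothetical, for illustration only.
triples = [
    ("ex:post-42", "ex:author", "ex:alice"),
    ("ex:post-42", "ex:title", "Notes on a microformats talk"),
    ("ex:alice", "ex:name", "Alice Example"),
]


def objects(subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Follow two triples to get from a post to its author's name.
author = objects("ex:post-42", "ex:author")[0]
author_name = objects(author, "ex:name")[0]
print(author_name)
```

Nothing here is specific to RDF tooling; it is only meant to show that the data model is a flat list of three-part statements that can be chained together.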

I agree: the uppercase Semantic Web isn't as easily adopted by web developers as microformats. Each microformat is designed to solve one problem, and designed in such a way as to allow microformats to be embedded within other microformats (that's a point I hadn't grasped until this talk).

And yet, developers shouldn't have to deal with the complex plumbing of the uppercase Semantic Web. That should be done with tools that generate this metadata - much as digital cameras embed Exif, a metadata tagging format, into JPEG images.

But that's a cop out: "Should be tools" is a stark proof that the uppercase Semantic Web isn't ready for real world adoption yet. The tools aren't there so developers have no choice but to handle the complications of XML-serialised RDF and XML namespaces, knowing that the typical browser user will find these efforts completely wasted unless there's a web-friendly representation, like an HTML page.

Write content once

Norm's other point was about the benefits microformats offer over the Semantic Web. Microformats mark up existing content right where it is stored. There's no need to create a second copy of your content in a different format to actually use microformats. The content exists once and once only. And that's important.

The Semantic Web approach is harder. I don't know of a way human approachable and authored content can exist in the same document as machine-friendly RDF serialisations without a fair degree of duplication, separate documents, or markup approaches that make it fairly likely browsers will just fail to render sites.

Microformats solve the big problem the Semantic Web is ignoring: making semantic web concepts usable today in the technology that's already out there. This is in opposition to a highly intellectual, abstract approach that, however logically beautiful, fails the practical test of working in the real world in real browsers.

And so, with Norm having fulfilled his role as the ghost of Semantic Web past, we turn our attention to the semantic web present, with Jeremy Keith.

Jeremy Keith

Jeremy led us through practical examples of using microformats on websites, starting from the simple one-attribute rel-license and XFN, through the more complicated hCard, and right on to hCalendar and hReview.

It's a little startling that so much logical semantic structure can be expressed in very little space and with very little impact on the overall structure of the document. Microformats work with the markup structure you already have and don't add any elements, although in some cases, where an appropriate element hook is not available, our good friend the span element is used to provide that container.
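To make that concrete, here is a small sketch of my own rather than one of Jeremy's examples. The contact details are invented; `vcard`, `fn`, `org` and `url` are hCard class names. The Python parser is deliberately naive: it assumes well-formed markup with no unclosed elements such as a bare `<br>`, and only handles the two text properties listed in `PROPERTIES`.

```python
from html.parser import HTMLParser

# Invented sample markup. The sentence reads the same with or without the
# class attributes; the span exists only to give "org" an element to hook onto.
HCARD = """
<p class="vcard">
  <a class="url fn" href="http://example.com/">Jane Example</a>
  works at <span class="org">Example Ltd</span>.
</p>
"""

PROPERTIES = {"fn", "org"}  # text properties this sketch knows about


class HCardParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.card = {}
        self._open = []  # one entry per open element: the properties it carries

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        if "url" in classes and attrs.get("href"):
            self.card["url"] = attrs["href"]
        self._open.append(classes & PROPERTIES)

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        # Text belongs to every property element that is currently open.
        for props in self._open:
            for name in props:
                self.card[name] = self.card.get(name, "") + data


parser = HCardParser()
parser.feed(HCARD)
print(parser.card)
```

The markup is still an ordinary paragraph with a link in it; the only additions are a few class names and one span.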

Looking ahead

Drew's talk was impressive. He tackled a relatively new area: extracting microformats from pages and putting them to use. His position was that microformats can be used instead of having to create a machine-friendly API.

Drew's approach was engaging, and he breathed life into what could easily be a dry subject. By contrasting Web services APIs with microformats already available, Drew points out that microformats can effectively replace a number of read-only type services.

Drew's pragmatic approach made the strong point that sometimes what is delivered by APIs is already available as microformats. Working his way through Flickr, Del.icio.us and Cork'd, he shows microformats already delivering what you'd normally get from an API, and in some cases, getting more and better information.
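As a rough illustration of the idea (my own sketch, not Drew's code), the Python below reads XFN relationships straight out of a page and emits the kind of JSON a read-only API might return. The page content and URLs are invented; `friend`, `met` and `colleague` are XFN values.

```python
import json
from html.parser import HTMLParser

# Invented page fragment: two XFN links and one ordinary link.
PAGE = """
<ul>
  <li><a href="http://example.org/bob" rel="friend met">Bob</a></li>
  <li><a href="http://example.net/carol" rel="colleague">Carol</a></li>
  <li><a href="http://example.com/about">About this site</a></li>
</ul>
"""


class XFNParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.contacts = []
        self._current = None  # the rel-bearing link being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("rel") and attrs.get("href"):
            self._current = {
                "url": attrs["href"],
                "rel": attrs["rel"].split(),
                "name": "",
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["name"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.contacts.append(self._current)
            self._current = None


parser = XFNParser()
parser.feed(PAGE)
contacts_json = json.dumps(parser.contacts, indent=2)
print(contacts_json)
```

The page a person reads and the data a program consumes are the same document, so there is no separate endpoint to build or keep in sync.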

State of unrest

I was disappointed that the REST services being talked about were perhaps the furthest from REST. When you have a query string containing a method name, you're not doing REST, you are just doing a plain old remote procedure call. Drew's suggested microformat replacements are far more in the spirit of REST - using HTTP verbs as the basis of "method calls".
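The difference is easy to see side by side. In this sketch the endpoints, the `getPhotoInfo` method name and the `photo_id` parameter are all hypothetical, and no request is actually sent; the Python only builds the two requests and inspects them.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Hypothetical endpoints; the Request objects are built but never sent.

# RPC style: one endpoint, with the operation named in the query string.
rpc_url = "http://api.example.com/services?" + urlencode(
    {"method": "getPhotoInfo", "photo_id": "123"}
)
rpc_request = Request(rpc_url, method="GET")

# REST style: the URL names the resource, the HTTP verb names the operation.
rest_request = Request("http://example.com/photos/123", method="GET")


def names_a_method(request):
    """True when the query string carries a 'method' parameter (RPC style)."""
    query = parse_qs(urlsplit(request.full_url).query)
    return "method" in query


print(names_a_method(rpc_request), names_a_method(rest_request))
```

In the second form the URL could just as well be the address of a microformatted page about the photo, which is what makes the pairing with microformats feel natural.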

But that didn't detract from Drew's core argument; if anything, it bolstered it. Microformats and REST, now that's a natural web-like combination (experts, please forgive the obvious redundancy!). Drew's argument is compelling in its elegance and simplicity. It just works. People claim that XML-RPC just works. Microformats just work, more easily and better.

I guess the main drawback of microformats is that they assume, indeed require, a web standards based structure - well-formed markup is absolutely essential to make microformats more than write-only. This has a corresponding positive spin: when microformats start to be aggregated on a regular basis, and applications emerge to mash them up, websites will be encouraged to adopt microformats, which gets them closer to actually adopting web standards as part of their development process.

In fact, microformats are an excellent practical benefit of web standards, particularly for cost-conscious organisations wanting to share data efficiently on the Web.

Questions and Answers: Accessibility

I nipped in with my stock standard accessibility question about the potential implications of microformats for web accessibility. Jeremy had a clearly well-thought-out answer to hand, which demonstrates that accessibility has been considered, and not as an add-on. Essentially, microformats create no new barriers for people with disabilities - in the very rare case that this may one day happen, the microformats are ignorable. But there's a great wealth of richness on offer to assistive technologies - in the microformat itself there are identifiers and unambiguous data that can aid understanding and comprehension. It is up to assistive technology vendors to take advantage of these microformats in their devices.

Assistive technologies that support web standards should take microformats on board, and use them to make content more accessible to their users. The very idea of extracting contact details directly from the page so as to contact the author directly is a compelling use-case. Being able to extract the summary of a review (along with what's being reviewed) is another compelling accessibility technique - breaking down content into simpler constructs.

Concerns

I harbour some concerns about microformats. Cork'd was a great example of an aggregate of reviews - on one page there were all the reviews of a particular wine. That neatly solves the problem of the ambiguity of what is being reviewed: it's taken for granted that all the reviews on the page are reviews of the one wine on that page. It feels like a very siloed way of aggregating.

Now take two hReviews of something called Lord of the Rings: Fellowship of the Ring from two separate and independent websites, and perhaps an additional hReview from a third website of something called Fellowship of the Ring. There are unresolved ambiguities, such as:

Am I sure that they are talking about the same thing? (Are all three talking about the same movie - the cinematic edition?)
There are two different names for the same thing - how do I reconcile them?

Essentially what seems to be missing, and what will present some problems in machine parsing of microformats, is the idea of a unique identifier for the object that we describe in a microformat. This is a solvable problem; it just detracts slightly from the elegant simplicity of microformats. But it's not a show stopper; content marked up in microformats won't suddenly stop working - it can always be transformed into the next thing (if the Semantic Web appears in a practical form in the next 30 years).
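A small sketch of why the identifier matters. The review data is invented, and the `item_url` field is a hypothetical stand-in for whatever shared identifier independent sites might agree on; it is not something I am claiming they publish today.

```python
from collections import defaultdict

# Invented reviews from three hypothetical sites. "item_url" stands in for
# a shared identifier of the thing being reviewed (an assumption).
reviews = [
    {"site": "a.example", "item": "Lord of the Rings: Fellowship of the Ring",
     "item_url": "http://films.example/fellowship", "rating": 5},
    {"site": "b.example", "item": "Lord of the Rings: Fellowship of the Ring",
     "item_url": "http://films.example/fellowship", "rating": 4},
    {"site": "c.example", "item": "Fellowship of the Ring",
     "item_url": "http://films.example/fellowship", "rating": 3},
]


def group_by(key):
    """Collect ratings under whatever value each review has for `key`."""
    groups = defaultdict(list)
    for review in reviews:
        groups[review[key]].append(review["rating"])
    return dict(groups)


by_name = group_by("item")    # names alone split the reviews into two groups
by_id = group_by("item_url")  # a shared identifier puts all three together
print(len(by_name), len(by_id))
```

Grouping by name leaves the third review stranded, while grouping by identifier does not. It still says nothing about the first ambiguity (cinematic or extended edition) unless the identifier itself is that specific.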

Conclusion

I walked into this talk with a skeptical but open mind. Microformats seemed to be just too basic for practical use in applications on the web. Norm was very convincing in both his passion and well-thought out arguments of why the Semantic Web hasn't been useful, and why microformats are. Jeremy was his usual excellent self in talking through the practical aspects of marking up content with microformats. Drew was just spectacular and engaging in giving us a good look at the potential (and already some practical real-world examples) of microformats in a machine readable context.

I was a little disappointed to miss Tantek Celik's microformats presentation at @media 2006 (there's absolutely no way I would miss an Andy Clarke presentation), but this event was far more engaging than I'd hoped. I walk away impressed with the practical usefulness of microformats, eager to look at my markup again through microformats-tinted glasses, and with a deeper interest in the machine readable potential of this very useful semantic tool.

Congrats to Stuart for bringing together a very high quality talk, and still keeping it free of charge. Now this truly is a grassroots event.

Related resources

Christian Heilmann: The WSG Meeting on Microformats in London
