The Structure of Things To Come

There’s something new and important you bloggers should all know about: Structured Blogging. Here’s what it is, and why it’s good.

The World Wide Web, which has been around for about fifteen years now, was conceived as a way to make it easier and more intuitive for people to get around the Internet. Prior to its development, if you wanted to access a file on the Internet, you had to know the file’s exact name and network location. Tim Berners-Lee’s brilliant inspiration was to make use of a markup language – already in use for formatting the display of text documents – to embed the addresses of other files into human-readable text pages, and to devise simple software applications – browsers – that would navigate to those embedded links when the user clicked them.

As you probably know, the idea caught on rather well, and the World Wide Web has become quite popular – so much so, in fact, that a new industry, Web search, was born.

In considering Internet search, keep in mind that Web pages are just ordinary text documents, into which additional symbols are added to tell the browser how to display the page. The language used is HTML, an abbreviation for HyperText Markup Language. Aside from the embedded links, the markup is used almost entirely for formatting the page’s appearance. Search engines crawl around the Web, read and index the text of the pages they find, and follow the embedded links to other pages. When you run a search on Google, you are simply asking it to tell you about any pages it has found in the past that contain the keywords you provide. But there is no way to conduct a search based on the semantic content of the page – what sort of page it is, what the page really is about – because, for the computers that do the crawling and indexing, making those sorts of discriminations is very difficult. In other words, the Web, in its current form, is designed to be read and understood by people, not computers.
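To make the limits of keyword search concrete, here is a toy sketch in Python of the kind of index described above. It is only an illustration under my own assumptions (the page addresses, the text, and the function names are all made up), and real search engines are vastly more sophisticated. The point is simply that the index knows which words appear on which pages, and nothing about what kind of page each one is.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of page URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            # Strip surrounding punctuation so "Dick," matches "dick".
            index[word.strip(".,!?;:\"'()")].add(url)
    return index

def keyword_search(index, query):
    """Return the URLs that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical pages: one is a book review, the other merely mentions the book.
pages = {
    "example.org/review": "My review of Moby Dick, a novel by Herman Melville.",
    "example.org/whales": "I saw a whale today. Moby Dick it was not.",
}
index = build_index(pages)
print(sorted(keyword_search(index, "moby dick")))
```

Both pages come back for the query "moby dick", even though only one of them is a review. The index has no way to tell the difference.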

But as ubiquitous personal computers and broadband access have made the Web more and more of an essential tool, it has become apparent that it would be very helpful for Web pages to have a standardized way of announcing information about themselves, so that they can be discovered and categorized not just by human beings, but by simple computer programs. The value of this has been apparent for quite a while, but the difficulty has been that most Web pages are created by individual designers, and there has been no standard and simple way for these designers to embed such machine-readable content. They would make the effort, of course, if there were some value to them in doing so – that is, if there were services and search engines that would take profitable advantage of such embedded “metadata” – but of course such services wouldn’t exist until there was content for them to read. So the whole thing has been a bit of a “Catch-22” up till now.

Finally, though, a group of interested parties, led by PubSub Concepts, where I work, realized that the only way to break the logjam would be to make the creation of machine-readable semantic metadata so simple that developers would not mind doing it. So the group, aware that the blogging community was a natural group to be “early adopters” of this new practice, commissioned the development of a set of simple tools for creating this structured content with the most popular blogging software. What this means is that if you are using WordPress, say, and wish to write a post that falls into a particular category – for example a review of a book you’ve read – upon opening your post-writing interface you simply select the “Review” template, and go ahead and write your post. You’ll also have some special fields in which to enter the book’s title, author’s name, etc. When you publish the post, there will be dedicated fields in the page – specifically intended to be read by search tools – containing all of this information. So if a Web user later wants to find reviews of the book, it will be possible for search services to identify such posts, and return them, without all of the chaff and junk that would be fetched by a simple keyword search.
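Here is a rough Python sketch of what that buys a search tool. The field names ("type", "title", "author") and the Post class are purely my own illustration, not the actual format the Structured Blogging plugins write into a page. The idea is just that a program can ask for posts that are reviews of a given book, rather than posts that happen to contain the book’s name.

```python
from dataclasses import dataclass, field

@dataclass
class Post:
    url: str
    body: str
    # Hypothetical machine-readable fields filled in from the post template.
    metadata: dict = field(default_factory=dict)

def find_reviews(posts, title):
    """Return URLs of posts marked as reviews of the given title."""
    wanted = title.lower()
    return [
        p.url
        for p in posts
        if p.metadata.get("type") == "review"
        and p.metadata.get("title", "").lower() == wanted
    ]

posts = [
    Post(
        url="example.org/review",
        body="My review of Moby Dick, a novel by Herman Melville.",
        metadata={"type": "review", "title": "Moby Dick", "author": "Herman Melville"},
    ),
    # An ordinary post with no structured fields; it only mentions the book.
    Post(url="example.org/whales", body="I saw a whale today. Moby Dick it was not."),
]
print(find_reviews(posts, "moby dick"))
```

Only the post that was actually written as a review is returned; the one that merely mentions the book is left out.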

But creating “structured content” goes far beyond book reviews. Once the idea takes hold, it is almost infinitely extensible: items for sale or wanted, announcements of upcoming events, job postings, personal ads, special text, graphical, or audio content, pretty much anything you can think of – all will be able to selectively pop themselves up once the right search tools are in place. At PubSub we are pioneering a new kind of search called “prospective” search – I’ll write a post about that next time – that is perfectly suited to this new paradigm.

The real point is that it is not about the tools, it’s about the activity. We’ve gone out of our way to make the tools easy to use, and they cost nothing. There are simple plugins for MovableType and WordPress, and more are on the way. Please take a look.

Finally, if you’d like to see what Structured Blogging posts look like (I’ll be doing some of my own, of course), take a look at this. Talk about “early adopters”!
