Semantic Web

RDF Resources

Cleaning out a few tabs that have been open for too long…

Josh Tauberer (of govtrack.us fame) has launched aboutrdf.net with the hope of turning it into a first destination for those looking for primers on RDF. It’s a resource that’s been needed for a long time and hopefully will help demystify RDF.

And on a related note, Danny Ayers has started a Microformats FAQ for RDF developers. It’s a useful resource for those looking to understand how microformats can fit into the Semantic Web agenda and will hopefully raise the level of discussion between the two communities.

Solvent: Semantic data from almost any page

Spending a weekend in Chicago last month and looking for a non-Starbucks coffee shop in the Loop, I was frustrated to find that the otherwise very handy delocator.net didn’t have an option to limit a search to a radius of less than 5 miles or to plot a group of results on a map. We eventually gave up and went to one of the many Starbucks highly visible in our immediate vicinity.

Of course, I could have written a scraper to pull the data off delocator’s results page and produce a map from it, but it would likely have taken more than the 5 minutes I had available. What I needed was Solvent. According to its creators at the Simile project, Solvent is “a Firefox extension that helps you write Javascript screen scrapers for Piggy Bank” and their screencast displays someone solving exactly the problem I found myself faced with.

As the screencast shows, extracting data from any page that has some structure to it is as simple as firing up the plugin, highlighting a few lines and selecting an appropriate description for them. The interface will feel familiar to anyone who’s worked with JavaScript debuggers, and it only takes a couple of minutes to get the data off the page, into Piggy Bank and, thanks to Piggy Bank’s Google Maps integration, onto a map.

For those who are comfortable with the DOM and JavaScript, this is a fantastic tool. Along with the growing suite of microformats and the Greasemonkey scripts Mark Pilgrim is writing to parse them, this project shows that we’re rapidly moving towards a world where a decentralized store of semantically-rich information is possible.

Simile even have a companion project, Semantic Bank, that provides long-term storage of the captured data. It would be nice if users were prompted to set up an account with that (or other semantic banks) when they first install Piggy Bank. Couple that with some UI developments to make both Solvent and Piggy Bank more accessible to the non-technical user, and we could quickly see publishing data to the Semantic Web become as simple as blogging.

Learning By Copying, Conversing, and Interacting

Ryan’s been writing some thought-provoking posts on microformats and related topics of late. In The Self Organized Web he pulled up this (two-year-old) quote from Tim Bray:

RDF has ignored what I consider to be the central lesson of the World Wide Web, the “View Source” lesson. The way the Web grew was, somebody pointed their browser at a URI, were impressed by what they saw, wondered “How’d they do that?”, hit View Source, and figured it out by trial and error.

Early adopters and other techies learn by looking at others’ source code, and I suspect it’s right to say that the web took off because of those of us who looked at someone else’s HTML and said “I can do that.” But it’s innovation, not general content production, that’s being sustained by those people. Increasingly, web content isn’t being produced by hand, or even using HTML editing tools, but through content management systems (from blogging tools on up). In his entry Tim responds to that, preceding comments on the RDF/XML syntax with:

At this point, the RDF evangelists pipe up and say “Well, Ordinary People ™ don’t have to look at the source, there will be tools to sort all that out.” Sorry, I just don’t believe that. If, in 1994, you’d needed DreamWeaver or equivalent to write for the Web, there wouldn’t be a Web today.

While that’s undoubtedly true (and I’d agree that RDF/XML is not a nice syntax), I’m not convinced that the fact that one medium grew up without a particular toolset is reason enough to argue against another that requires a toolset. For myself, I learned most of my early HTML and coding skills from examining other people’s work, but I don’t have the wherewithal to have stopped there. Before I came into conversation with other people producing web pages and writing code, my HTML was sloppy and my code made little attempt at abstraction.

The abstraction may have come with time, but regardless of that, as the increasing prominence of Design Patterns illustrates, the formalism that allows tools to play well with others is something that arises once we’ve moved on from that initial learning by imitation to a stage of learning through conversing, and learning through working together.

If the Semantic Web is going to take off, something may well need to be done to ease the transition from copying what works to understanding why it works. Semantic Web toolsets are less forgiving than web browsers have been. When using them it is more important that people have good examples to work from (not an equivalent of the slapdash HTML so many of us first encountered) and understand why things are done within the parameters that they are.

The growth in knowledge of web standards, and increasing concern at all levels for semantically rich XHTML, may be an important part of introducing people into the ways of thinking that RDF requires (or at least the problems it is designed to solve). Microformats, for example, are both a useful tool for structuring XHTML documents and also an entry-point into a growing realisation of the many facets of producing semantically-rich information representations.

In that regard, I’m very glad to see comments such as Danny Ayers’ latest piece on microformats that bridge the sometimes-portrayed-as-polarised camps of upper and lower case semantic web thinking. Perhaps there’s also space for some appropriately publicised articles on the limitations of the flat namespaces that HTML provides. This introduction to RDF by Tim Bray and edited by Dan Brickley is a good start, but there needs to be more.

At the same time, those of us building tools bear the responsibility for making it easy to make users’ data semantically rich. It’s not hard (and it’s getting easier) to use XSL or some equivalent to produce multiple representations of a document, given enough metadata. When the representation is for a web browser, microformats provide a good framework for differing types of content, but scraping is not going to be enough to extract the more nuanced metadata that I believe we’ll increasingly rely on and for that, some variant of RDF is probably our best choice at present.

UPDATE: This piece by Eric Meyer (published a couple of hours after this entry went up) is well worth a look on the topic of microformats and semantics.

Microformats and extensibility

I’ve been following the chatter over microformats (XFN, xFolk, hCalendar, and their kin) for some time, but have been having a hard time formulating a response to all the discussion. In particular, the discussion over at Ryan’s blog and some postings such as this one by Danny Ayers have triggered further thinking.

The idea of ‘emergent semantics’ is an appealing one, and as many have argued, lower-case semantics are far more likely to be adopted by a broad sweep of the web development community in the short-term than are carefully constructed XML vocabularies, or RDF representations of resources. But at the same time I fear that this sort of format will delay adoption of ‘true’ Semantic Web technologies, and balk at the apparent lack of extensibility that microformats offer.

As I work on plans for some future web app development, RDF has become more and more appealing because it is decentralised and allows for the representation of complex relationships between items. If I need to attribute a property that my current vocabulary doesn’t support, there is a standard system of drawing in another namespace which my tools can automatically understand. By contrast, (X)HTML only allows for rudimentary relationships, and there is no standardised way of indicating within the document which vocabularies tools should expect.
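The extensibility point can be illustrated with a small sketch. It is not tied to any particular RDF toolkit: triples are plain tuples, the output is N-Triples, and the `example.org` resource is hypothetical. The namespaces are the published Dublin Core, FOAF and W3C Basic Geo ones. The point is only that adding properties from a new vocabulary means adding statements, not changing a schema, and that a tool can tell from the predicate URIs alone which vocabularies are in play.

```python
# Real, widely used vocabulary namespaces.
DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"


def uri(value):
    """Wrap a URI in N-Triples angle brackets."""
    return f"<{value}>"


def literal(value):
    """Quote a plain literal, escaping backslashes and double quotes."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def namespace_of(predicate):
    """Return the namespace part of a predicate URI (up to the last '#' or '/')."""
    bare = predicate.strip("<>")
    cut = max(bare.rfind("#"), bare.rfind("/")) + 1
    return bare[:cut]


def vocabularies(triples):
    """Collect the set of namespaces used by the predicates in a graph."""
    return {namespace_of(predicate) for _, predicate, _ in triples}


def to_ntriples(triples):
    """Serialise (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Hypothetical resource, described at first with two vocabularies.
shop = uri("http://example.org/places/coffee-shop")
triples = [
    (shop, uri(DC + "title"), literal("Example Coffee Shop")),
    (shop, uri(FOAF + "homepage"), uri("http://example.org/")),
]
base_vocabularies = vocabularies(triples)

# Later: the current vocabulary has no notion of position, so draw in
# another namespace. Nothing already written needs to change.
triples.append((shop, uri(GEO + "lat"), literal("42.9634")))
triples.append((shop, uri(GEO + "long"), literal("-85.6681")))

print(to_ntriples(triples))
print(sorted(vocabularies(triples)))
```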

That’s not to say that microformats aren’t useful. Now that we have a large community of developers building standards-compliant sites it makes sense to work towards standardised class names for certain page elements and types of content. Having developed a number of screen-scrapers (and suffered the pain when a non-standards redesign then obliterates all that work), I’d love to be rid of the need to re-code whenever a manager decides a layout needs a slight change. But it will be to the benefit of all of us if we ensure that that standardisation doesn’t distract us from improving tools for generating and interpreting RDF, simplifying content-negotiation options, and otherwise making the web more interoperable.
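To make the scraping argument concrete, here is a rough sketch using only Python's standard `html.parser`. It assumes the page marks up its content with hCard-style class names (`fn`, `locality`); the two HTML snippets and the `extract` helper are invented for illustration. Because the extractor keys on class names rather than on element types or nesting, a redesign that keeps the class names leaves it untouched.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they are skipped so the
# open-element stack stays balanced.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextExtractor(HTMLParser):
    """Collect the text inside elements carrying one of the wanted class names."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.found = {}
        self._stack = []  # one entry per open element: matched class name or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        match = next((c for c in classes if c in self.wanted), None)
        self._stack.append(match)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Text belongs to every enclosing element that matched a wanted class.
        for name in self._stack:
            if name is not None:
                self.found[name] = self.found.get(name, "") + data


def extract(html, wanted):
    """Hypothetical helper: return {class name: text} for the wanted classes."""
    parser = ClassTextExtractor(wanted)
    parser.feed(html)
    parser.close()
    return {name: text.strip() for name, text in parser.found.items()}


# Two invented layouts of the same listing, before and after a "redesign".
layout_a = (
    '<div class="vcard"><span class="fn">Example Cafe</span>, '
    '<span class="locality">Grand Rapids</span></div>'
)
layout_b = (
    '<table class="vcard"><tr><td class="locality">Grand Rapids</td>'
    '<td><b class="fn">Example Cafe</b></td></tr></table>'
)

print(extract(layout_a, ["fn", "locality"]))
print(extract(layout_b, ["fn", "locality"]))
```

This sketch assumes reasonably well-formed markup; a real scraper would need to cope with unclosed tags and repeated listings on one page.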

RDF and GRWiFi

Looking to the future of the Grand Rapids WiFi site, I hope to see it become part of an integrated set of local websites supporting and promoting community development and local business. Geolocation seems to be the topic du jour, and while the site has for several months featured geodata about all of its locations, the time seemed right to develop it further.

Today I’ve been adding RDF representations of almost all the data on the site. I’ve extended the RDF descriptions of each location to list all the comments on that location. The vocabulary for that is one I found over at FilmTrust and it means that almost all the useful content of the site can now be represented using RDF.

To tie it all together I’ve added an RDF/XML index of the site so that agents can easily get a complete list of WiFi locations listed on the site. Obviously a spider could have worked its way through the site and found all the data, but this removes a hurdle.
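As a rough idea of what generating such an index involves, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It is not the site's actual template: the `LOCATIONS` rows, the `example.org` URLs and the choice of Dublin Core and W3C Basic Geo properties are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"

# Ask ElementTree to use readable prefixes when serialising.
for prefix, namespace in (("rdf", RDF), ("dc", DC), ("geo", GEO)):
    ET.register_namespace(prefix, namespace)

# Hypothetical rows, standing in for the result of a database query.
LOCATIONS = [
    {"url": "http://example.org/locations/1", "name": "Example Cafe",
     "lat": 42.9634, "long": -85.6681},
    {"url": "http://example.org/locations/2", "name": "Example Library",
     "lat": 42.9660, "long": -85.6700},
]


def build_index(locations):
    """Return an RDF/XML document with one rdf:Description per location."""
    root = ET.Element(f"{{{RDF}}}RDF")
    for loc in locations:
        desc = ET.SubElement(
            root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": loc["url"]}
        )
        ET.SubElement(desc, f"{{{DC}}}title").text = loc["name"]
        ET.SubElement(desc, f"{{{GEO}}}lat").text = str(loc["lat"])
        ET.SubElement(desc, f"{{{GEO}}}long").text = str(loc["long"])
    return ET.tostring(root, encoding="unicode")


print(build_index(LOCATIONS))
```

An agent fetching this one document gets the URI of every location without crawling the site, and can then follow each URI for the fuller description.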

It’s remarkably easy to add RDF representations of data on a database-driven site such as this one. There was some tweaking to do, but mostly the work was constructing and validating a new set of templates. Now I need to start harnessing the new power.