Posts tagged semantics

Book Review: Refactoring HTML

Despite years of progress by web standards advocates, and a significant improvement in the quality of the HTML on the web, many of us still end up grappling with outmoded, broken HTML on a regular basis. When confronted with a large site filled with broken pages, it can be hard to know where to start. Elliotte Rusty Harold’s Refactoring HTML offers a step-by-step recipe book for migrating such sites to clean, semantic code.

Harold’s is a well-known name in the XML world, and that background shows through in how he approaches the book. While a general audience will probably find useful content, the reader needs to be prepared for a series of command-line and Java-based examples. Tools like tidy are featured prominently, as is the use of regular expressions to seek out broken code to fix and, in the music-to-my-ears category, automated testing.
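To give a flavour of the regular-expression approach, here is a minimal sketch of my own in Python; it is not taken from the book, whose examples are command-line and Java-based. The function name `find_deprecated_tags` and the list of flagged elements are assumptions made for illustration.

```python
import re

# Hypothetical list of presentational elements worth flagging; adjust to taste.
DEPRECATED_TAG = re.compile(r"<\s*(font|center|blink|marquee)\b", re.IGNORECASE)


def find_deprecated_tags(html):
    """Return (line_number, tag_name) pairs for each flagged opening tag."""
    hits = []
    for line_number, line in enumerate(html.splitlines(), start=1):
        for match in DEPRECATED_TAG.finditer(line):
            hits.append((line_number, match.group(1).lower()))
    return hits


sample = "<p>Fine</p>\n<FONT size=2>Old</FONT>\n<center><font>x</font></center>"
hits = find_deprecated_tags(sample)
```

A check like `assert find_deprecated_tags(page) == []` could then serve as a crude automated test that a cleaned-up page stays clean. A regular expression is only a rough filter here and will also match inside comments or scripts, so treat the hits as candidates to review rather than a definitive list.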

If you’re equipped to do so, following these steps will lead to much cleaner, more manageable sites, but I found myself wondering how many of those comfortable with command line tools and regular expressions are in the market for a book like this.

In general I suspect the key audience for this will be IT departments inside large organisations tasked with refreshing or extending an intranet. For those developers, who maybe don’t spend much of their time working with HTML and like the idea of using scripting tools similar to those in their regular workflow, this book’s worth a look. If you’re already familiar with current trends in web development, then there are probably other ways of picking up on the scattering of techniques that might be new to you.

Disclaimer: I was sent a copy of this book for review by the publisher. You can find it at Amazon US, Amazon UK and all sorts of other places.

Microformats and extensibility

I’ve been following the chatter over microformats (XFN, xFolk, hCalendar, and their kin) for some time, but have been having a hard time formulating a response to all the discussion. In particular, the discussion over at Ryan’s blog and some postings such as this one by Danny Ayers have triggered further thinking.

The idea of ‘emergent semantics’ is an appealing one and, as many have argued, lower-case semantics are far more likely to be adopted by a broad sweep of the web development community in the short term than are carefully constructed XML vocabularies or RDF representations of resources. But at the same time I fear that this sort of format will delay adoption of ‘true’ Semantic Web technologies, and I balk at the apparent lack of extensibility the microformats offer.

As I work on plans for some future web app development, RDF has become more and more appealing because it is decentralised and allows for the representation of complex relationships between items. If I need to attribute a property that my current vocabulary doesn’t support, there is a standard system of drawing in another namespace which my tools can automatically understand. By contrast, (X)HTML only allows for rudimentary relationships, and there is no standardised way of indicating within the document which vocabularies tools should expect.
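As a rough illustration of that extensibility, the sketch below models RDF statements as plain Python tuples rather than using a real RDF library. The FOAF and Dublin Core namespace URIs are the published ones; the `example.org` resources, the name "Alice" and the `vocabularies` helper are hypothetical.

```python
# Published namespace URIs for two widely used vocabularies.
FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical resources, named purely for illustration.
post = "http://example.org/posts/1"
author = "http://example.org/people/alice"

# Each statement is a (subject, predicate, object) triple.
triples = {
    (post, DC + "title", "Microformats and extensibility"),
    (post, FOAF + "maker", author),
    (author, FOAF + "name", "Alice"),
}


def vocabularies(statements):
    """Return the namespace URIs of every predicate in use."""
    namespaces = set()
    for _subject, predicate, _obj in statements:
        # A namespace conventionally ends at the last '/' or '#'.
        cut = max(predicate.rfind("/"), predicate.rfind("#"))
        namespaces.add(predicate[: cut + 1])
    return namespaces


used = vocabularies(triples)
```

Because every predicate is a full URI, a property from a third vocabulary can be added as one more triple without renegotiating the format, and a tool can work out which vocabularies are in play from the data itself.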

That’s not to say that microformats aren’t useful. Now that we have a large community of developers building standards-compliant sites it makes sense to work towards standardised class names for certain page elements and types of content. Having developed a number of screen-scrapers (and suffered the pain when a non-standards redesign then obliterates all that work), I’d love to be rid of the need to re-code whenever a manager decides a layout needs a slight change. But it will be to the benefit of all of us if we ensure that that standardisation doesn’t distract us from improving tools for generating and interpreting RDF, simplifying content-negotiation options, and otherwise making the web more interoperable.
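To make the screen-scraping point concrete, here is a small sketch using Python's standard `html.parser` that reads hCalendar class names (`vevent`, `summary`, `dtstart`, `location`) rather than page layout. The parser class, the `parse_events` helper and both sample snippets are my own assumptions, and the code ignores nested markup inside a field, so it is nowhere near a complete hCalendar parser.

```python
from html.parser import HTMLParser


class HCalendarParser(HTMLParser):
    """Collect a few hCalendar fields by class name, ignoring layout."""

    FIELDS = {"summary", "dtstart", "location"}

    def __init__(self):
        super().__init__()
        self.events = []
        self._field = None  # field whose text is currently being collected

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if "vevent" in classes:
            self.events.append({})
        if not self.events:
            return
        for name in classes:
            if name not in self.FIELDS:
                continue
            title = attributes.get("title")
            if name == "dtstart" and title:
                # Machine-readable date carried in the title attribute.
                self.events[-1][name] = title
            else:
                self._field = name

    def handle_data(self, data):
        if self._field and self.events:
            event = self.events[-1]
            event[self._field] = event.get(self._field, "") + data

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current field.
        self._field = None


def parse_events(html):
    parser = HCalendarParser()
    parser.feed(html)
    parser.close()
    return parser.events


layout_a = (
    '<div class="vevent"><h3 class="summary">Web Standards Meetup</h3>'
    '<abbr class="dtstart" title="2005-06-01">1 June</abbr></div>'
)
layout_b = (
    '<table><tr class="vevent">'
    '<td><abbr class="dtstart" title="2005-06-01">June 1st</abbr></td>'
    '<td class="summary">Web Standards Meetup</td></tr></table>'
)
```

Both snippets yield the same event data even though one is a `div` and the other a table row, which is exactly the resilience to redesigns that standardised class names promise.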