Solvent: Semantic data from almost any page

Spending a weekend in Chicago last month and looking for a non-starbucks coffee shop in the loop, I was frustrated to find that the otherwise very handy delocator.net didn’t have an option to limit a search to a radius of less than 5 miles or to plot a group of results on a map. We eventually gave up and went to one of the many Starbucks highly visible in our immediate vicinity.

Of course, I could have written a scraper to pull the data off delocator’s results page and produce a map from it, but it would likely have taken more than the 5 minutes I had available. What I needed was Solvent. According to its creators at the Simile project, Solvent is “a Firefox extension that helps you write Javascript screen scrapers for Piggy Bank” and their screencast displays someone solving exactly the problem I found myself faced with.

As the screencast shows, extracting data from any page that has some structure to it is as simple as firing up the plugin, highlighting a few lines and selecting an appropriate description for them. The interface will feel familiar to anyone who’s worked with javascript debuggers, and it only takes a couple of minutes to get the data off the page, into PiggyBank and—thanks to PiggyBank’s google maps integration—onto a map.

For those who are comfortable with the DOM and Javascript, this is a fantastic tool. Along with the growing suite of microformats and the Greasemonkey scripts Mark Pilgrim is writing to parse them, this project shows that we’re rapidly moving towards a world where a decentralized store of semantically-rich information is possible.

Simile even have a companion project, Semantic Bank, that provides long-term storage of the captured data. It would be nice if users were prompted to set up an account with that (or other semantic banks) when they first install Piggy Bank. Coupled with some UI developments to make both Solvent and Piggy Bank more accessible to the non-technical user, and we could quickly see publishing data to the Semantic Web become as simple as blogging.