Posts tagged wsgi

Rack: Layering Ruby Web Apps

I’ve not used it myself, but conceptually I’ve always been very interested in WSGI (the Python Web Server Gateway Interface). WSGI defines a standard interface between web servers and frameworks, giving Python web applications the same portability that Java servlets enjoy. It also makes it much easier to layer code: with a standardised interface you can easily add extra components that process your input and output before or after your main framework has handled it.

Say you have an application with a great web interface but no API that accepts your preferred input format. With WSGI you could intercept the API input and convert it before it ever reaches the main application, making the whole process transparent to both client and application. These two articles at xml.com are a good overview.
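The post describes this in WSGI terms; here is a minimal sketch of the same idea in Ruby, using the call(env) convention that Rack adopts (an object responding to call that returns status, headers and body). The JSON input format, the FormApp class and the JsonToForm middleware are all hypothetical, chosen only for illustration, and nothing here needs a server or the rack gem.

```ruby
require 'json'
require 'stringio'
require 'uri'

# Hypothetical existing application: it only understands form-encoded bodies.
class FormApp
  def call(env)
    params = URI.decode_www_form(env['rack.input'].read).to_h
    [200, { 'Content-Type' => 'text/plain' }, ["Hello, #{params['name']}!"]]
  end
end

# Hypothetical middleware: accepts JSON and rewrites it as a form-encoded
# body before the wrapped application ever sees the request.
class JsonToForm
  def initialize(app)
    @app = app
  end

  def call(env)
    if env['CONTENT_TYPE'] == 'application/json'
      data = JSON.parse(env['rack.input'].read)
      body = URI.encode_www_form(data)
      # Build a new environment rather than mutating the caller's hash.
      env = env.merge(
        'CONTENT_TYPE'   => 'application/x-www-form-urlencoded',
        'CONTENT_LENGTH' => body.bytesize.to_s,
        'rack.input'     => StringIO.new(body)
      )
    end
    @app.call(env)
  end
end

app = JsonToForm.new(FormApp.new)
env = { 'REQUEST_METHOD' => 'POST', 'CONTENT_TYPE' => 'application/json',
        'rack.input' => StringIO.new('{"name":"Greenbelt"}') }
status, headers, body = app.call(env)
puts status      # prints 200
puts body.join   # prints Hello, Greenbelt!
```

Because the middleware only rewrites the environment and then delegates, the wrapped application never knows the request arrived as JSON; requests with any other content type pass through untouched.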

Ruby doesn’t have the same profusion of frameworks as Python or Java, but that transparent layering is still attractive. A standardised interface also makes it much easier for developers to build pluggable middleware, or to put together experimental new frameworks without waiting for Mongrel or another server to support them. That’s where Rack comes in.

According to Christian Neukirchen’s introductory blog post, “dealing with HTTP is rather easy”, and the core API of Rack is simply a method, call, that takes the request environment and returns an array of response code, response headers and response body. From that same blog entry:

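class HelloWorld
  # Rack calls #call with the request environment (a hash of CGI-style
  # variables) and expects [status, headers, body] in return.
  def call(env)
    [200, {"Content-Type" => "text/plain"}, ["Hello world!"]]
  end
end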
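Since the interface is just a call method, an application can be exercised without a server at all. A sketch, assuming a hand-built environment hash far smaller than the one a real server supplies; the render helper and the lambda app are hypothetical, added only to show that any object responding to call fits the interface.

```ruby
# Same shape as the HelloWorld class above.
class HelloWorld
  def call(env)
    [200, { 'Content-Type' => 'text/plain' }, ['Hello world!']]
  end
end

# A lambda responds to #call too, so it satisfies the same interface.
GREETER = ->(env) { [200, { 'Content-Type' => 'text/plain' }, ["Hello from #{env['PATH_INFO']}"]] }

# Hypothetical helper standing in for a server: calls the app and joins
# the body, which only has to respond to #each.
def render(app, env)
  status, headers, body = app.call(env)
  text = +''
  body.each { |chunk| text << chunk }
  [status, headers['Content-Type'], text]
end

env = { 'REQUEST_METHOD' => 'GET', 'PATH_INFO' => '/greenbelt' }
p render(HelloWorld.new, env)  # [200, "text/plain", "Hello world!"]
p render(GREETER, env)         # [200, "text/plain", "Hello from /greenbelt"]
```

With the rack gem installed, the same class can be served by putting run HelloWorld.new in a config.ru file and starting it with rackup.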

There aren’t many examples out there yet, but Johan Sørensen has a very simple example framework and a bit more discussion on his blog. There are also several sample adapters for existing frameworks. What would be really nice to see next is an implementation of the Atom Publishing Protocol using Rack, along the lines of this WSGI implementation. This could well be a project to watch.

Collage Mk. 2: Now With Separation

Last year I posted a few times about the aggregation code I wrote to allow Greenbelt to collect festival-related content scattered around the web and republish it. What I may not have gone into was how frustrating that code tended to be to work with, written in a rush before the festival and heavily patched while on site.

This year, with longer to prepare, I decided to throw that code away and start again. I chose Python again, partly because I wanted to use some Python libraries and partly because it seemed time to get some more Python practice in. I also decided that rather than have the parser for each service (currently Technorati, del.icio.us, Flickr, PubSub and Magnolia) update the database directly, it was time for some abstraction and layering.

This time around I’ve written independent extraction classes for each of the services I want to use, each returning its data as Atom entries. That Atom is then fed into a ‘reasoner’ that checks whether we’ve already seen each entry, and creates or updates our store accordingly. Using Atom as the intermediary made sense: much of the data is already sourced as Atom (or in forms that map closely to it), and Atom’s requirement for a unique ID and an updated time on every entry makes updates simple to manage.
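The extractor-and-reasoner split can be sketched as follows. The actual project is Python; this is written in Ruby for consistency with the other code on this page, and every name in it (Entry, BookmarkExtractor, Reasoner, the raw item fields and the urn:example IDs) is hypothetical. A Hash stands in for the database, and only the Atom id and updated fields are modelled, since those are what the create-or-update decision depends on.

```ruby
require 'time'

# Minimal stand-in for an Atom entry: just the fields the reasoner relies on.
Entry = Struct.new(:id, :updated, :title, keyword_init: true)

# Hypothetical extractor for one service. Each service gets its own class,
# but all of them return the same thing: a list of Entry objects.
class BookmarkExtractor
  def initialize(raw_items)
    @raw_items = raw_items  # items assumed to be already fetched from the service
  end

  def entries
    @raw_items.map do |item|
      Entry.new(id: "urn:example:bookmark:#{item[:hash]}",
                updated: Time.iso8601(item[:saved_at]),
                title: item[:description])
    end
  end
end

# Hypothetical reasoner: compares each entry's id and updated time with
# the store and decides whether to create, update or ignore it.
class Reasoner
  attr_reader :store

  def initialize
    @store = {}  # Atom id => Entry; stands in for the real database
  end

  def process(entry)
    existing = @store[entry.id]
    if existing.nil?
      @store[entry.id] = entry
      :created
    elsif entry.updated > existing.updated
      @store[entry.id] = entry
      :updated
    else
      :unchanged  # already seen, and not newer than what we hold
    end
  end
end

raw = [
  { hash: 'abc123', saved_at: '2007-08-24T10:00:00Z', description: 'Festival map' },
  { hash: 'abc123', saved_at: '2007-08-24T10:00:00Z', description: 'Festival map' },
  { hash: 'abc123', saved_at: '2007-08-24T12:30:00Z', description: 'Festival map (updated)' }
]

reasoner = Reasoner.new
results = BookmarkExtractor.new(raw).entries.map { |e| reasoner.process(e) }
p results              # [:created, :unchanged, :updated]
p reasoner.store.size  # 1
```

Keying on the Atom id and comparing updated times is what makes re-running an extractor safe: entries that haven't changed are simply skipped.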

It’s also ready-serialized, and so nice and portable. To check that each component is working, I just have to inspect the Atom it produces, which is easy to do visually or programmatically. If I wanted to spread the code across servers, it’d be trivial to do so using WSGI and the Atom Publishing Protocol.
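Checking an extractor’s output programmatically can be as simple as serialising an entry and confirming that the id, title and updated elements every Atom entry must carry are present. A minimal sketch in Ruby using REXML (distributed with Ruby); entry_xml and missing_elements are hypothetical helpers, and this is a spot check, not full Atom validation.

```ruby
require 'rexml/document'
require 'time'

ATOM_NS = 'http://www.w3.org/2005/Atom'
REQUIRED = %w[id title updated].freeze

# Hypothetical helper: serialises one entry as Atom XML.
# REXML escapes text content, so "&" in a title becomes "&amp;".
def entry_xml(id:, title:, updated:, link: nil)
  doc = REXML::Document.new
  entry = doc.add_element('entry', 'xmlns' => ATOM_NS)
  entry.add_element('id').add_text(id)
  entry.add_element('title').add_text(title)
  entry.add_element('updated').add_text(updated.getutc.iso8601)
  entry.add_element('link', 'href' => link) if link
  doc.to_s
end

# Hypothetical check: returns the required element names that are absent.
def missing_elements(xml)
  present = []
  REXML::Document.new(xml).root.each_element { |e| present << e.name }
  REQUIRED - present
end

xml = entry_xml(id: 'urn:example:photo:1', title: 'Tents & tea',
                updated: Time.utc(2007, 8, 24, 10))
puts xml
p missing_elements(xml)  # []
p missing_elements('<entry xmlns="http://www.w3.org/2005/Atom"><title>x</title></entry>')  # ["id", "updated"]
```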

With the Universal Feed Parser for parsing, and SQLObject for database abstraction, there’s a lot I don’t have to worry about.

The festival is not yet upon us, so the code has yet to be battle-tested. With over 1,200 photos posted on Flickr last year and a much bigger push this year, we’re expecting a lot of content. It’s good to know we have a cleanly separated, maintainable code base this time around. If it works as well as it should, I’ll try to publish the code somewhere.