Posts tagged Python

Rack: Layering Ruby Web Apps

I’ve not used it myself, but conceptually I’ve always been very interested in WSGI (the Python Web Server Gateway Interface). WSGI defines a standard interface between web servers and frameworks, giving Python web applications the same portability that Java servlets enjoy. It also makes it much easier to layer code: with a standardised interface you can easily add in extra components to process your input and output before or after your main framework has handled it.

So say you had an application with a great web interface but no API to handle the input format of your choice. With WSGI you could intercept the API input and convert it before it ever hits the main application, making the whole process transparent to client and application. These two articles at xml.com are a good overview.
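As a rough sketch of that kind of interception, here is a small piece of WSGI middleware that rewrites a JSON request body as form data before the wrapped application sees it. The choice of JSON as the "API input format" and the names form_app, JsonToFormMiddleware and call_app are assumptions made for illustration; they don't come from the articles mentioned above.

```python
import io
import json
from urllib.parse import urlencode


def form_app(environ, start_response):
    """Stand-in for the main application: it only understands form-encoded input."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length)
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"received: " + body]


class JsonToFormMiddleware:
    """Hypothetical middleware: converts a JSON body to form data for the wrapped app."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if environ.get("CONTENT_TYPE", "").startswith("application/json"):
            length = int(environ.get("CONTENT_LENGTH") or 0)
            data = json.loads(environ["wsgi.input"].read(length) or b"{}")
            encoded = urlencode(data).encode("utf-8")
            environ = dict(environ)  # copy, so the caller's environ is untouched
            environ["wsgi.input"] = io.BytesIO(encoded)
            environ["CONTENT_LENGTH"] = str(len(encoded))
            environ["CONTENT_TYPE"] = "application/x-www-form-urlencoded"
        return self.app(environ, start_response)


def call_app(app, body, content_type):
    """Tiny harness: invoke a WSGI app directly, without a real server."""
    seen = {}

    def start_response(status, headers):
        seen["status"] = status

    environ = {
        "REQUEST_METHOD": "POST",
        "CONTENT_TYPE": content_type,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    return seen, b"".join(app(environ, start_response))
```

Neither side needs to know about the other: the client posts JSON, the application reads form data, and the middleware sits in between as JsonToFormMiddleware(form_app).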

Ruby doesn’t have the same profusion of frameworks as Python or Java, but that transparent layering is still attractive. A standardised interface also makes it much easier for developers to put together experimental new frameworks without waiting for Mongrel or another server to support them, or to build pluggable middleware. And that’s where Rack comes in.

According to Christian Neukirchen in an introductory blog post, “dealing with HTTP is rather easy”, and the core API of Rack is simply a method call that returns an array of status code, response headers and response body. From that same blog entry:

class HelloWorld
  def call(env)
    [200, {"Content-Type" => "text/plain"}, ["Hello world!"]]
  end
end
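
For comparison, the WSGI counterpart of that Rack application might look something like this in Python. This is a minimal sketch rather than anything from Christian’s post, and the function name hello_world is just an illustration.

```python
def hello_world(environ, start_response):
    """A WSGI app: status and headers go to start_response, the body is returned."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello world!"]
```

The main difference is that Rack returns status, headers and body together, while WSGI hands the status and headers to a callback and returns only the body. The standard library’s wsgiref.simple_server.make_server can serve a function like this for quick experiments.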

There aren’t many examples out there yet, but Johan Sørensen has a very simple example framework and a bit more discussion on his blog. There are also several sample adapters for existing frameworks available. What would be really nice to see next is an implementation of the Atom Publishing Protocol using Rack, along the lines of this WSGI implementation. This could well be a project to watch.

Better than BASIC?

David Brin complains about the difficulty of obtaining BASIC for modern computers, in a piece published yesterday on Salon. He’s been trying to teach his son to code, starting with simple algorithms and developing a good sense of what the computer is doing as it processes each step. Java and C++ are considered too complex for this purpose, and he seems to consider most scripting languages to be too high-level:

The “scripting” languages that serve as entry-level tools for today’s aspiring programmers — like Perl and Python — don’t make this experience accessible to students in the same way. BASIC was close enough to the algorithm that you could actually follow the reasoning of the machine as it made choices and followed logical pathways. Repeating this point for emphasis: You could even do it all yourself, following along on paper, for a few iterations, verifying that the dot on the screen was moving by the sheer power of mathematics, alone. Wow!

I’m not convinced. Sure, some versions of BASIC let you get pretty low-level with PEEK and POKE, but there’s a Perl library that will give you that level of access. And while Perl, Python and Ruby provide a lot of high-level features, most of the time you don’t have to use them. You can go back to the core of most algorithms and, yes, follow the reasoning.
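
To make that concrete, here is a sketch of Brin’s moving dot written in Python using nothing beyond a loop, comparisons and addition. The function name, the grid size and the bouncing rule are my own choices for illustration; the point is only that each step can be checked on paper.

```python
def move_dot(x, y, dx, dy, width, height, steps):
    """Bounce a dot around a width x height grid, returning every position visited."""
    path = [(x, y)]
    for _ in range(steps):
        # Reverse direction when the next step would leave the grid.
        if not 0 <= x + dx < width:
            dx = -dx
        if not 0 <= y + dy < height:
            dy = -dy
        x += dx
        y += dy
        path.append((x, y))
    return path


# On a 3 x 2 grid, starting in the corner and moving diagonally:
# move_dot(0, 0, 1, 1, 3, 2, 4) gives [(0, 0), (1, 1), (2, 0), (1, 1), (0, 0)]
```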

Meanwhile, you get an easy transition to more refined programming techniques. As a BASIC programmer who wanted to switch to OO techniques, you had to learn a whole new syntax. Learning new languages is a good thing, but it’s nice to be able to encounter new paradigms in a familiar environment.

Of course, for those who insist on BASIC, you can always follow Brin’s lead and buy a Commodore 64 on eBay, or there’s BBC Basic for Windows.

Collage Mk. 2: Now With Separation

Last year I posted a few times about the aggregation code I wrote to allow Greenbelt to collect festival-related content scattered around the web and republish it. What I may not have gone into was how frustrating that code tended to be to work with, written in a rush before the festival and heavily patched while on site.

This year, with longer to prepare, I decided to throw that one away and start again. I chose Python as the language again, partly because I wanted to use some Python libraries and partly because it seemed time to get some more Python practice in. I also decided that rather than have the parsers for each service (currently technorati, del.icio.us, flickr, pubsub, and magnolia) each update the database directly, it was time for some abstraction and layering.

This time around I’ve written independent extraction classes for each of the services I want to use, with each returning its data as Atom entries. That Atom is then fed into a ‘reasoner’ that checks whether we’ve already seen the entry, and creates or updates our store accordingly. Using Atom as the intermediary made sense, as much of the data is already sourced in Atom (or forms that map closely to it), and the requirements for a unique ID and updated time make updates simple to manage.
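
The create-or-update decision can be sketched roughly as follows. This is not the real collage code: the Entry and Reasoner names are invented for the example, a plain dict stands in for the database, and it assumes every updated value is an RFC 3339 timestamp in UTC so that string comparison orders them correctly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """The subset of an Atom entry the reasoner cares about (hypothetical model)."""
    id: str        # atom:id, unique per entry
    updated: str   # atom:updated, assumed to be RFC 3339 in UTC, e.g. 2006-08-01T10:00:00Z
    title: str


class Reasoner:
    """Decides whether each incoming entry is new, changed, or already known."""

    def __init__(self):
        self.store = {}  # stand-in for the real database, keyed by entry id

    def process(self, entries):
        counts = {"created": 0, "updated": 0, "unchanged": 0}
        for entry in entries:
            known = self.store.get(entry.id)
            if known is None:
                action = "created"
            elif entry.updated > known.updated:
                action = "updated"
            else:
                action = "unchanged"
            if action != "unchanged":
                self.store[entry.id] = entry
            counts[action] += 1
        return counts
```

Because the extractors only hand over entries, each one can be tested by looking at what it emits, and the reasoner can be tested with hand-written entries and no network access at all.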

It’s also ready-serialized, and so nice and portable. To test that each component is working, I just have to inspect the Atom it produces, which is easy to do visually or programmatically. If I wanted to spread the code across servers it’d be trivial to do so using tools such as WSGI and the Atom Publishing Protocol.

With the Universal Feed Parser for parsing, and SQLObject for database abstraction, there’s a lot I don’t have to worry about.

The festival is not yet upon us, so the code has yet to be battle-tested. With over 1200 photos posted on flickr last year and a much bigger push this year, we’re expecting a lot of content. It’s good to know we have a cleanly separated, maintainable code base this time around. If it works as well as it should, I’ll try to publish the code somewhere.

Feed Parser: Universal Feed Parser Tests

Inspired by Sam Ruby’s work on applying the Universal Feed Parser tests to the Ruby FeedTools, I’ve spent a little time this afternoon working on testing XML_Feed_Parser with that same test suite. There’s a lot of work to do!

UFP’s tests consist of a series of feed files, some well-formed and some ill-formed, with a description and test condition defined at the top of the file, e.g.

<!--
Description: channel description
Expect:      not bozo and feed['description'] == u'Example description'
-->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns="http://purl.org/rss/1.0/">
<channel rdf:about="http://example.com/index.rdf">
<description>Example description</description>
</channel>
</rdf:RDF>

So far all I’ve done is run a script through all the tests for well-formed feeds, testing whether XML_Feed_Parser throws an exception when I try to interpret them. When run against the current CVS, 1181 of the 1273 feeds parsed successfully and 92 failed. 68 of those failures were due to encoding problems (which I’ll try to work around, but won’t be able to cleanly fix until PHP has full unicode support), and another 17 were a result of not supporting CDF, leaving another seven I need to get fixed as soon as possible.

The next stage will be to translate the ‘Expect:’ values into something I can use in a PHP test case. I’ve done a little searching for a Python lexer for PHP, but aside from this embedded interpreter that hasn’t had a release in nearly three years, I haven’t found one. Lacking the time to write such a beast myself, I suspect I’ll simply put together a series of regexps to do the necessary translation.
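
A first step towards that translation is pulling the two fields out of the header comment. The sketch below does that with a regular expression and then applies one simple rewrite, dropping Python’s u'' string prefix. The function names are made up for this example, and the rest of the translation into PHP is deliberately left out, since it depends on API decisions that haven’t been made yet.

```python
import re

# Matches the comment block at the top of a test file, as in the example above.
HEADER = re.compile(
    r"<!--\s*Description:\s*(?P<description>.*?)\s*Expect:\s*(?P<expect>.*?)\s*-->",
    re.DOTALL,
)


def read_test_header(xml_text):
    """Return the (description, expect) pair from a test file, or None if absent."""
    match = HEADER.search(xml_text)
    if match is None:
        return None
    return match.group("description"), match.group("expect")


def strip_unicode_prefix(expect):
    """Turn Python unicode literals (u'...') into plain quoted strings."""
    return re.sub(r"\bu'", "'", expect)
```

Run against the sample file shown earlier, read_test_header yields the description "channel description" and the expression after ‘Expect:’, ready for whatever further rewriting the PHP side needs.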

Of course, XML_Feed_Parser’s API differs in quite a number of ways from that of the Universal Feed Parser, and so quite a few of those tests would fail if left unadjusted. As Sam points out, there would be numerous advantages to (roughly) sharing an API with the Universal Feed Parser, particularly in allowing programmers to easily switch between languages and in the fact that the documentation already written would also apply to XML_Feed_Parser, which is (as yet) undocumented. I’m going to spend some time thinking through the implications of making some API adjustments to fit more closely, but I’d love input on how far I should go (is it worth breaking backwards compatibility?).

Greenbelt Collage Updates

After far too long a day of travelling I’m back in the US. The Festival was great, though as tiring as ever (jetlag compensated for the more relaxed on-site schedule). I’ve spent the day catching up on feeds and email, and tinkering with the collage code.

For the most part it’s been working well and new content has been picked up pretty quickly, particularly since I added in a technorati watchlist for links to www.greenbelt.org.uk. The one exception was flickr, where there’d often be more new photos between checks than were included in the feed, meaning that we only had ~100 of the 500 posted. As a quick fix I added the individual ‘greenbelt2005’ feeds for several of the more active flickr users. Now I’ve rewritten the code to use the flickr API to check for all new photos within the last 45 minutes and pull them in that way (we check every half an hour, so 45 minutes should make sure nothing falls through the cracks).
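
The overlap between windows means the same photo can turn up in two consecutive checks, so the window has to be paired with a duplicate check. Here is a minimal sketch of that logic, with the flickr API call itself left out: the function name, the (photo_id, posted_at) pairs and the in-memory set of seen ids are all assumptions for the example, not the real collage code.

```python
from datetime import datetime, timedelta, timezone

CHECK_INTERVAL = timedelta(minutes=30)  # how often the collage polls
LOOKBACK = timedelta(minutes=45)        # window requested on each poll


def new_photos(photos, now, seen_ids):
    """Return ids of photos posted within the lookback window and not yet stored.

    `photos` is an iterable of (photo_id, posted_at) pairs; `seen_ids` is a set
    that is updated in place as photos are accepted.
    """
    cutoff = now - LOOKBACK
    fresh = []
    for photo_id, posted_at in photos:
        if posted_at >= cutoff and photo_id not in seen_ids:
            seen_ids.add(photo_id)
            fresh.append(photo_id)
    return fresh
```

With LOOKBACK longer than CHECK_INTERVAL, a photo posted ten minutes before one check still falls inside the window of the next, and the seen_ids test is what stops it being stored twice.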

The use of the API turned out to be very simple, particularly since micampe.it’s FlickrClient library saved me from having to parse the XML. Making sure we didn’t get duplicates in the database, syncing tags, and getting the content together was a little trickier but didn’t take too long.

Next up will be some code to go and fetch the content of pages added via delicious/technorati and gather more of the content than their feeds provide, which should mean I can check whether pages picked up through technorati have tags other than the core ‘‘ tag. And of course very soon I should build some nicer navigation options on the frontend so people can start making better use of the content.