Feb 08

Using the Django ORM as a standalone component

I’ve found myself working in Python lately, both for a new project and while preparing a review of the Django book. Working in Ruby I’ve become used to relying on ActiveRecord whenever I need to talk to a database (whether inside Rails or not), and after a little time refamiliarising myself with MySQLdb I realised I was going to need a decent ORM for this project. As well as being road-tested and well documented, it needed either to generate its models on the fly or to come with a tool to generate them from an existing schema.

I checked out SQLAlchemy and SQLObject, both of which looked like worthy contenders, but couldn’t find the generator I was looking for (if you know of one, please do let me know in the comments!) so I switched over to Django. I couldn’t find much information on using their ORM standalone, so thought I should share what I discovered. Please bear in mind that my python is rather rusty, so if there are better ways to do any of this I’d be very pleased to hear about it.

The tool to generate your model classes from an existing database is django-admin.py which should be in your path if you’ve got Django installed. To get that up and running you’re going to need to set a few options.

First up, you’ll need a settings.py file specifying your database details. Mine looks like:

DATABASE_ENGINE = 'mysql'      # the engine is needed as well; mysql in my case, hence MySQLdb
DATABASE_NAME = 'mydatabase'
DATABASE_USER = 'myusername'
DATABASE_PASSWORD = 'mypassword'

Once that’s done and saved, from the command line you should be able to call:

django-admin.py inspectdb --settings=settings

and the models will be echoed to stdout. To redirect that to a file called models.py you’ll need:

django-admin.py inspectdb --settings=settings > models.py

I found a few areas where the generated objects threw errors. We have a column called ‘try’ which is a python keyword, so that required a small change to the code. And a number of models have foreign key relationships with other models that are declared after them, so a little reorganising was called for. It’d be really nice to have the script handle that, but it wasn’t a big deal to make the changes by hand.
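For the keyword clash, one approach is to rename the field and point it back at the original column with db_column. A small sketch, with a made-up model and field type:

from django.db import models

class Job(models.Model):
    # 'try' is a reserved word in python, so the attribute gets a new name
    # while db_column keeps it mapped to the existing column
    try_count = models.IntegerField(db_column='try')

    class Meta:
        db_table = 'jobs'

For the ordering problem, ForeignKey will also accept the related model’s name as a string, which sidesteps declaration order, though shuffling the classes by hand works just as well.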

With that done, I tried a very simple new script:

from models import *
a = MyModel.objects.all()
print a

But I ran into a few issues:

The ORM is relying on the environment variable DJANGO_SETTINGS_MODULE to tell it where to find your database credentials. You’ll need to set that to point to the settings.py file you created earlier.
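It wants a module name rather than a file path, so either export it in your shell or set it at the top of the script before importing anything from the ORM. A minimal version, assuming settings.py is on the python path as plain 'settings':

import os
os.environ['DJANGO_SETTINGS_MODULE'] = 'settings'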

You can’t have your models.py file in the same folder as your test script. Doing so throws an “IndexError: list index out of range” error. Instead make a new folder called, say, orm and put models.py in it, along with an empty file called __init__.py, which will tell python to treat the folder as a module. You can then update your test script to:

from orm.models import *
a = MyModel.objects.all()
print a

With that done, you should be able to use the ORM just as you would in your Django view code.
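From there it’s the usual queryset API. A couple of illustrative calls, with made-up field names:

from orm.models import MyModel

# lookups work exactly as they do inside Django
published = MyModel.objects.filter(status='published')
first = MyModel.objects.get(pk=1)

# and so does writing changes back
first.status = 'archived'
first.save()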

Aug 06

Collage Mk. 2: Now With Separation

Last year I posted a few times about the aggregation code I wrote to allow Greenbelt to collect festival-related content scattered around the web and republish it. What I may not have gone into was how frustrating that code tended to be to work with, written in a rush before the festival and heavily patched while on site.

This year, with longer to prepare, I decided to throw that one away and start again. I chose python as the language again, partly because I wanted to use some python libraries and partly because it seemed time to get some more python practice in. I also decided that rather than have the parser for each service (currently technorati, del.icio.us, flickr, pubsub, and magnolia) update the database directly, it was time for some abstraction and layering.

This time around I’ve written independent extraction classes for each of the services I want to use, with each returning its data as atom entries. That atom is then fed into a ‘reasoner’ that checks whether we’ve already seen the entry, and creates or updates our store accordingly. Using atom as the intermediary made sense as much of the data is already sourced in atom (or forms that map closely to it) and the requirements for a unique ID and updated time make updates simple to manage.
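The reasoner doesn’t need to be anything clever. Roughly this shape, with a made-up store API and timestamps assumed to have been parsed into something comparable:

def reason(entry, store):
    existing = store.find_by_atom_id(entry['id'])
    if existing is None:
        store.create(entry)                      # never seen this id before
    elif entry['updated'] > existing['updated']:
        store.update(entry)                      # newer version, refresh our copy
    # otherwise we already hold the latest version, so there's nothing to do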

The atom is also ready-serialized, and so nice and portable. To test that each component is working, I just have to inspect the atom it produces, which is easy to do either visually or programmatically. If I wanted to spread the code across servers it’d be trivial to do so using a toolkit such as WSGI and the Atom Publishing Protocol.

With the Universal Feed Parser for parsing, and SQLObject for database abstraction, there’s a lot I don’t have to worry about.
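For what it’s worth, the glue between them is tiny. A rough illustration rather than the real code, with a simplified Entry schema and a placeholder feed URL:

import feedparser
from sqlobject import SQLObject, StringCol, connectionForURI, sqlhub

sqlhub.processConnection = connectionForURI('sqlite:/:memory:')  # stand-in connection

class Entry(SQLObject):
    atomId = StringCol()
    title = StringCol()
    updated = StringCol()

Entry.createTable()

feed = feedparser.parse('http://example.com/atom.xml')
for e in feed.entries:
    # only store entries we haven't seen before
    if Entry.selectBy(atomId=e.id).count() == 0:
        Entry(atomId=e.id, title=e.title, updated=e.updated)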

The festival is not yet upon us, so the code has yet to be battle-tested. With over 1200 photos posted on flickr last year and a much bigger push this year, we’re expecting a lot of content. It’s good to know we have a cleanly separated, maintainable code base this time around. If it works as well as it should, I’ll try to publish the code somewhere.