Jan 12

Simple behind-the-scenes API authentication with OAuth2

Like many others I’ve been spending a lot of time with OAuth2 lately. The single-sign-on system we’ve built at GDS acts as a very simple oauth provider for our other apps (effectively just joining up the oauth2-provider and devise gems), and we’re probably going to be extending our API adapter code so that we can use it for those apps whose APIs need authentication.

What I’d not explored for a while was the simplest way to implement app-to-app oauth where there’s no UI for user interaction, so over the New Year break I pulled something together for another project. It’s all pretty straightforward but not very well documented, so I thought I’d better share.

The easiest thing to do if you want to allow an oauth client to work with your app is just to generate the ID, secret and access token for whoever’s responsible for the app and to provide them (securely) for direct use.

In order to do that in the rails app I was focussed on, I knocked up a class for the purpose, built around the aforementioned oauth2-provider:
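Stripped of the oauth2-provider plumbing, the heart of such a class is just generating the three credentials. A minimal sketch in plain Ruby (the class and attribute names are my own invention; the real thing hangs these off the gem’s models):

```ruby
require 'securerandom'

# Hypothetical helper: pre-generates the credentials an API client needs,
# so they can be handed over securely for direct use.
class ApiClient
  attr_reader :name, :client_id, :client_secret, :access_token

  def initialize(name)
    @name          = name
    @client_id     = SecureRandom.hex(16)  # public identifier for the client
    @client_secret = SecureRandom.hex(32)  # shared secret, handed over securely
    @access_token  = SecureRandom.hex(32)  # pre-issued token, skipping the browser dance
  end
end
```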

and then a few rake tasks for interacting with it:
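The tasks amount to thin wrappers that create a client and print its credentials for secure hand-over. A standalone sketch (task names and output are illustrative; in a real Rakefile the `extend Rake::DSL` line is unnecessary):

```ruby
require 'rake'
require 'securerandom'

extend Rake::DSL  # only needed so this sketch runs outside a Rakefile

namespace :api_clients do
  desc 'Create a new API client and print its credentials for secure hand-over'
  task :create, [:name] do |_t, args|
    puts "Client: #{args[:name]}"
    puts "  id:     #{SecureRandom.hex(16)}"
    puts "  secret: #{SecureRandom.hex(32)}"
    puts "  token:  #{SecureRandom.hex(32)}"
  end

  desc 'List the clients we have issued credentials to'
  task :list do
    # the real task reads oauth2-provider's client records from the database
    puts 'no clients yet'
  end
end
```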

In the oauth2-provider world, any “authorization” can be owned by a resource, which is any other model in your app. In a standard app like our SSO solution that’ll probably be a user, but in the app I’m working on here it’s an organisation that may have many users. You get access to that resource in your controllers with, eg:
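In other words, the authorization carries a reference to its owning resource. A minimal stand-in, with oauth2-provider’s plumbing replaced by a plain struct and an accessor name of my own choosing:

```ruby
# Stand-in for the gem's authorization record: it knows which resource owns it.
Authorization = Struct.new(:resource)

class Organisation
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

class ApiController
  def initialize(authorization)
    @authorization = authorization
  end

  # In the real app this comes from oauth2-provider's helpers; here the
  # authorization's resource is the organisation that owns the token.
  def current_organisation
    @authorization.resource
  end
end
```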

And with that I had my API protected using everyone’s favourite standard authentication protocol.

Jun 10

JangoMail, lackadaisical security, and a workaround

A client recently asked me to integrate their site with the JangoMail mass mailing system. I wanted to keep them happy so agreed to investigate, but was horrified by what I saw in the JangoMail API documentation.

JangoMail appears to be optimised for those with existing databases of email addresses they want to maintain and contact. For those wanting to keep those databases in sync they offer a script you can download and install on your server that they can call with details of various actions (user unsubscribed, user clicked link, job completed, etc) as well as to extract the list of email addresses they should send a given campaign to. So far, so good.

The problem is in the implementation. Once you have downloaded and installed the script on your server, they ask you for your database credentials and then send these, along with an SQL query, over HTTP (or HTTPS) for the script to execute. In the FAQ attached to their blog entry on the topic the obvious question “Is this method of connecting to my data secure?” is asked, with the response:

Yes. It is inherently secure if you opt to have JangoMail connect over https instead of http. It can be additionally secured by restricting the range of IP addresses allowed to connect to the custom script file. JangoMail’s range of IP addresses are: –

That answer is far from satisfactory. I refuse to give a third party my database credentials, still less to execute arbitrary SQL received over an HTTP request, even if that request comes over SSL and includes a password.

Surely if they want to keep the steps for the user to a minimum they could still provide an interface that takes credentials (via SSL)–along with appropriate other details like “enforce SSL?”–and generates a PHP script that contains those credentials embedded within it, along with code to generate the SQL from a set of parameters? It’s not a hard thing to do (witness the way wordpress/drupal/etc will generate a config file for you — it’s the same thing). That way JangoMail can take responsibility for making sure that the credentials are only ever sent a small number of times (and via SSL), and the endpoint can contain appropriate validation.

In this case I decided to take matters into my own hands and write a sane script to receive their input. There’s nothing forcing you to put genuine database credentials into their form, so instead I used those fields to provide a username and password I’d use to authenticate their request. In each of the boxes to enter the SQL for your queries I entered some code to generate JSON containing the relevant details.

With that done it’s a relatively trivial matter to parse the JSON, do any validation you may want to do and update your database accordingly. A rough-and-ready (and barely tested) script that does just that can be found in this gist. A perfectly satisfactory (if slightly laborious) solution for any competent web developer.
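The shape of such a script, sketched here in Ruby rather than the PHP of the gist, with invented credential and field names:

```ruby
require 'json'

# Invented credentials: these are what we typed into JangoMail's database
# username/password fields instead of real database credentials.
USERNAME = 'jangomail-caller'
PASSWORD = 'a-long-random-string'

# Authenticate the request, then act on the JSON payload that our fake
# "SQL" snippets generate (the action and field names are illustrative).
def handle_request(username, password, payload)
  raise 'authentication failed' unless username == USERNAME && password == PASSWORD

  details = JSON.parse(payload)
  case details['action']
  when 'unsubscribe'
    # here we would mark details['email'] as unsubscribed in our own database
    [:unsubscribe, details['email']]
  else
    [:ignored, details['action']]
  end
end
```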

But of course JangoMail’s target market isn’t competent web developers. They’re clearly trying to target a general audience, and for that audience their lackadaisical approach to security is indefensible.

Mar 09

Selected (belated, extended) Saturday Links

The past two weeks haven’t really left time to compile my selected links, though there have been many. A few days at SxSWi (on which more, later) followed by travelling with the family and the inevitable work backlog moved blogging way down the priority list. So here’s a mammoth selection to get me caught up. Particularly interesting has been the discussion around the future of newspapers (represented here by Clay Shirky, Steven Johnson and Russell Davies), which seems to have finally pushed beyond “how to find a good business model for papers” to looking at where the real value for society lies and how we can preserve and extend that in a changing landscape.

Apr 08

Ecampaigning Forum: Notes on Open Space sessions

While my live blogging efforts focussed on the more formal sessions at ecampaigning forum, most of the event’s time and content was spent in groups following the Open Space methodology. The gatherings for people to suggest sessions were instructive in themselves as they gave considerable hints as to the key concerns of ecampaigning practitioners.

How to engage with the big social networking sites, whether to create your own, organising around big events (such as G8 summits and climate conferences) and ways of managing decentralised/coalition campaigns were some of the big themes, but the sessions covered a wide range beyond that such as engaging with young supporters, or older supporters, choosing content management systems, operating on a tight budget, pooling resources/tools and one hastily agreed discussion of twitter. What follows are a few notes on things that struck me.

The twitter session drew a mixture of existing users, aware onlookers, and newcomers. A lot of time was spent exploring existing uses of the site, with examples such as teamtibet‘s usage to co-ordinate protests around the olympic flame and Downing Street’s account. Most people seemed taken with its potential for short term co-ordination, but many questions arose about its potential for long term campaigning beyond informing core supporters of news updates. As seemingly the longest-serving twitter user there, I found it interesting to hear responses to a tool I’ve quickly come to take for granted.

A recurring theme was the adoption of drupal by a number of the big agencies. Most seem keen to contribute code back to the community, along the lines of AI and CivicActions’ assets module. I’ve mentioned my mixed feelings about drupal before but am hopeful that through events like this we might be able to resolve some of the issues that frustrate me.

I brought up Russell Davies’ 2008 – the year of peak advertising in conversation over breakfast on the first day and that phrase recurred a few times. There’s a general awareness that the last few years have brought lots of opportunities to attract attention by simply being quick to adopt some new “web 2.0” tool, but that won’t last. It didn’t seem like there was a sustained discussion or much sense of where to go next, but working hard to attain attention has been the life of campaigners for a long time and so perhaps this is just another step in that journey?

There’s clearly a growing sense of how hard it is to influence big summits where the final communique is often planned months in advance. Gatherings of world leaders are a great opportunity for media coverage and to present the “actionable moments” that Ben Brandzel spoke of, but they’re not when the real chances for change occur. It’s vital to find ways to turn the energy around these summits into sustained, directed action after the final communique is published, planning the next steps before the events themselves take place.

In the session on pooling resources and tools a number of questions came up about the ethics of collaborating with big players like google (who have just been on a big outreach programme for their new Google Earth offering for NGOs). The data provided and the tools offered by the likes of Google can be a great boon to charities operating on tight budgets, but at the expense of ceding a lot of control and a lot of attention data (and with providers like facebook there are concerns about things like this). It was obvious that there is some desire to develop open source tools that provide similar capabilities, but it’s not clear whether the resources are there. Mention was made of open street map and I brought up the theyworkforyou api, and it would definitely have been interesting to have had people who could present on the usage of those; some concerns remain as to how ready such tools are for non-geeky end-users, though those would be easy to resolve if someone were to direct the right resources at them.

I’m looking forward to seeing what other people bring up in their notes on the event, and what themes come out in the ongoing discussion. You can see my photos on flickr, find some content on technorati and check out the conference wiki for more. All my posts on the topic are gathered under the ecf08 tag.

Oct 07

The MySpace platform: now official

The rumours of MySpace launching a platform or API have been floating for quite some time, but now as reported on the O’Reilly Radar they have been confirmed.

Over the next two months they are going to increase third-party access to their site. First, they are going to highlight the thousands of widgets that have been on their site for years now. This should be released in the next couple of weeks. I am assuming that it will go beyond the FIM’s Spring Widget Gallery. Second, they are going to offer an API for applications to all developers. However, these applications are going to be sandboxed initially and 1-2 million users will have access to them. If the users deem the applications safe and useful they’ll be available to all users. Developers will be able to advertise in their applications.

It’ll be interesting to see whether the MySpace platform and API are truly a step towards openness or whether it’ll be another walled garden a la facebook. Facebook’s platform is phenomenally successful, but doesn’t really open up their core data (status, events, etc.) for developers to interact with. Given their track record it’s unlikely that MySpace are really going to launch something more open than that.

For developers, and for the musicians whose presence is MySpace’s key calling card, this is a tiny step but not one that enables the services we really need. Musicians still need to update their information across dozens of walled gardens rather than having easy tools to use. Developers still need to scrape and hack if they want to provide a way to access core parts of users’ profiles, and unless MySpace address the many, many technical problems on their site (unreliability, apparently random use of captchas, awful HTML) that’s going to remain a huge hassle.

Of course, the key question will be whether this announcement will help MySpace retain their pre-eminent position. The crown has slipped over the last few months, with facebook’s popularity rocketing and people deleting MySpace contacts and accounts in order to focus on just one social network. I suspect MySpace will never get their crown back. If they do, it’ll have to be because they’ve radically changed the social networking game.

Apr 07

Avoiding MySpace (or, cross-posting with WWW::Mechanize)

It seems that anyone involved in helping musicians with their web presence has to learn to tolerate MySpace. I don’t think I know anyone who actually enjoys the process of using MySpace, but a strong presence there is a must have for almost every musician these days.

I’ve long wished for a decent API that would help me integrate MySpace with websites I run for musicians—after all, it isn’t very DRY to post the same content in several places when it could be automated—but as time has gone on it’s become clear that an API would be entirely anathema to MySpace’s approach to the web.

So while working on some updates to a friend’s website I decided to try out the Ruby port of WWW::Mechanize to automate the process of posting blog entries over at MySpace.

Firstly, we need to be able to log in. To do that, you can almost copy some of the library’s examples as it’s as simple as:

require 'rubygems'
require 'mechanize'  # provides WWW::Mechanize

agent = WWW::Mechanize.new
agent.user_agent_alias = 'Mac Safari'  # pose as a regular browser

page = agent.get('http://www.myspace.com')
login_form = page.forms.with.name('theForm').first
login_form.email = username
login_form.password = password
logged_in = agent.submit(login_form)

Posting a blog entry is a little trickier, as MySpace uses javascript to change forms’ ‘action’ attributes based on which button you click, and occasionally inserts tokens in the URLs, but after a little exploration I came up with:

blog_page = agent.get('http://blog.myspace.com/index.cfm?fuseaction=blog.create&editor=false')
blog_form = blog_page.forms.with.name('theForm').first

# Here we have to grab the action as it includes a token which can change
new_action = blog_page.body.match(/document\.theForm\.action = '(.+?)'/)
blog_form.action = new_action[1]

blog_form.subject = subject
blog_form.BlogCategoryID = category
blog_form.body = body

now = DateTime.now
blog_form.postMonth = now.month
blog_form.postDay = now.mday
blog_form.postYear = now.year
blog_form.postHour = now.strftime('%I')
blog_form.postMinute = now.min
blog_form.postTimeMarker = now.strftime('%p')

submitted = agent.submit(blog_form)

confirm_form = submitted.forms.with.name('theForm').first
confirm_form.action = 'http://blog.myspace.com/index.cfm?fuseaction=blog.processCreate'
posted = agent.submit(confirm_form)

And that’s all there is to it. I’m impressed with how easy WWW::Mechanize makes interacting with forms, and generally how pleasant it is to work with. Performance is pretty good too, especially given how problem-prone MySpace is. It’s nice to be able to imagine a scenario in which clients can cross-post their content to MySpace. If we’re lucky, we need never actually visit that website again!

I’m working on packaging up the code, probably with support for posting event dates and ‘bulletins’, and adding in error handling to deal with the 75% of the time (based on my usage this afternoon) when MySpace returns an error page. It may be a few days, but I’ll post a note here when it’s ready.

Feb 07

Services_Technorati version 2

In an effort to tidy up various older projects that were never quite completed, I’ve turned my attention to my first PEAR module Services_Technorati. It’s a very simple wrapper around the Technorati API, but the PHP4 version never reached a stable release as it depended on some other packages which were also never stabilised.

So it seemed time to make the simple step of converting the code to be PHP5-only and use simplexml for its XML parsing. That removes the dependencies which were slowing me down, and should result in improved speed along the way as the XML parsing is now handled in C rather than PHP. I just released 2.0.0alpha1, but the code should be pretty stable and I’m hoping to run through the steps and get a stable release out very soon.

Update (27th Feb): I’ve just pulled this release and re-released it as 0.7.0. Apparently, because the package never released a 1.0 in its original version, I should just continue with the previous version numbers despite the change to PHP5.

Feb 07

Intercepting microformats in rails input

In Input formats and content types in Rails 1.2 I mentioned a project I’ve been working on that will provide a RESTful service interface which accepts its input in a number of formats, including microformatted HTML.

For certain types of data microformats provide a great way to receive input as they don’t require your clients to learn a new schema to send you data. They can take the same semantically rich HTML they’re displaying on their website and POST it to your endpoint. Or they can use a tool like Ryan King’s hcalendar creator to generate some sample input.
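For example, a client might POST an hCalendar fragment along these lines (event details invented):

```html
<div class="vevent">
  <span class="summary">Album launch gig</span>
  <abbr class="dtstart" title="2007-03-01T19:00">1st March, 7pm</abbr>
  <span class="location">London</span>
</div>
```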

But intercepting and interpreting that data isn’t quite as simple as it is for JSON or XML. Those formats have well defined content types that we can use to identify them, and use the parameter parsers I described in my earlier blog entry. Microformats are HTML and so will come with an HTML content type, just like other forms of input.

It would be nice if there were a simple way (short of running them through a full parser) to identify POSTs whose bodies contain microformats, but so far I haven’t come across one. What we can do is override rails’ default handlers to parse the raw post data and see if it looks like regular form input. If it doesn’t, we can presume it’s meant to be microformat data and do some parsing using the excellent mofo (on which more, later). My code at present is:

microformat_interceptor = Proc.new do |data|
  parsed = CGI.parse(data)

  # If the first key=value pair reassembles to the whole body, it didn't really
  # parse as form input, so treat it as microformatted HTML; otherwise pass the
  # parsed form data through
  if parsed.collect { |key, value| key + '=' + value[0] }.first == data
    { :event => HCalendar.find(:text => data) }
  else
    parsed
  end
end

Mime::FORM = Mime::Type.lookup("application/x-www-form-urlencoded")
ActionController::Base.param_parsers[Mime::FORM] = microformat_interceptor
ActionController::Base.param_parsers[Mime::HTML] = microformat_interceptor

With that code in environment.rb, a request with hCalendar data in its post body will look to our actions as if it had an HCalendar object in params[:event]. If we extend a few of our event model’s methods, our controller can treat this input just as it would XML or a standard query string.
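One way to extend the model, sketched with invented names; the real HCalendar object comes from mofo, stood in for here by anything responding to the same accessors:

```ruby
# Illustrative extension: the model accepts either a plain hash (from XML or a
# query string) or an hCalendar-style object; accessor names are assumptions.
class Event
  attr_reader :summary, :dtstart

  def self.from_params(params)
    source = params[:event]
    if source.respond_to?(:summary)  # quacks like mofo's HCalendar
      new(source.summary, source.dtstart)
    else                             # a plain params hash
      new(source[:summary], source[:dtstart])
    end
  end

  def initialize(summary, dtstart)
    @summary = summary
    @dtstart = dtstart
  end
end
```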

Obviously this is a work in progress, its output is very simple, and it’s not all that versatile, but so far my tests stand up well, and it makes the code considerably more elegant.

For a little more on microformats and APIs, check out Drew’s “Can Your Website be Your API?” and the REST page on the microformats wiki.

Feb 07

Input formats and content types in Rails 1.2

One feature of recent releases of Rails I hadn’t spotted before is the ability to define your own parameter parsing based on content type. I’m working on an application that will employ a RESTful API and that I hope will take its input in either standard http parameters, microformatted HTML, XML or JSON.

I don’t really want to have to write custom code within the controllers to interpret the input based on content type, so I started looking for how rails parses XML input and came across the following in the actionpack changelog:

    # Assign a new param parser to a new content type
    ActionController::Base.param_parsers['application/atom+xml'] = Proc.new do |data|
      node = REXML::Document.new(data)
      { node.root.name => node.root }
    end

    # Assign the default XmlSimple to a new content type
    ActionController::Base.param_parsers['application/backpack+xml'] = :xml_simple

Looking at the actual source code it appears it’s implemented slightly differently, with the Mime::Type object being used as the key, rather than the text content type. Since the json content type is already defined (with a reference to the object in Mime::JSON), JSON can (usually) be parsed as YAML, and the :yaml symbol is a shortcut to a YAML parser, handling it transparently is almost as simple as adding:

ActionController::Base.param_parsers[Mime::JSON] = :yaml

or if we wanted to be a bit more explicit:

ActionController::Base.param_parsers[Mime::JSON] = Proc.new do |data|
  YAML.load(data)
end

to environment.rb. Building these APIs is even easier than I’d thought!

Sep 06

Civic Footprint

For some time now I’ve been interested in the possibility of bringing together political information from all different layers of government and finding ways of layering it. Too few of us understand where the key decisions on the issues that concern or affect us are taken. Action at a local level can be a very powerful political tool, but it’s hard to find out which level is most appropriate, or to trace how issues move between layers. Unfortunately it can seem even harder to find well-structured data at more local levels than it is on a national level.

That’s why I was very interested to discover Civic Footprint, a project of the Center for Neighborhood Technology that provides a simple web interface (and since May 2006 an API) for residents of Cook County, Illinois to find out the ‘political geography’ of their address.

For users of the website those districts are matched up with representatives, so you can quickly find out who represents you on each level, and from there jump off to that representative’s website or wikipedia entry, or a Google News or Technorati search for them. It’d be nice if the congressional pages (such as this for Danny Davis (D)) were integrated with a site like govtrack for more targeted information than Google or Technorati can provide, but it’s still a great source of information.

It doesn’t look like the API will yet tell you who the representatives are for each of your districts, simply providing the IDs of those districts. Hopefully it will soon. It’ll be very interesting to see how the site develops, as it shows potential to become something of an example of how civic and political data can be made accessible and how services can be built on top of that.