While there haven’t been any visible changes to my Heathrow Tower project in the past couple of weeks beyond my throwing in a few greetings in other languages to break things up a bit. Having put some of the statistical plans on hold as the snow last week prevented any data gathered from being anywhere close to representative, I’ve gradually been building up the database behind the scenes so I can start to do some of the more intricate things I’d like to do.

The key data I wanted was the airport codes for the various flights, and geographic data for those airports.

Firstly I found FlightAware.com who will provide all sorts of data from a flight code, but unfortunately the first time I tried making a request to their site using HTTP Client I spotted a comment in the HTML referring visitors to http://flightaware.com/about/termsofuse.rvt which states:

You will only access the FlightAware web site with an interactive web browser and not with any program, collection agent, or “robot” for the purpose of automated retrieval of content.

So I started looking at airline sites. United have a relatively straightforward URL scheme that responds to a GET and returns data that can be scraped. eg:

http://www.ua2go.com/flifo/FlightSummary.do?date=20090201&fltNbr=959

BA and Virgin, on the other hand, both require cookies to be enabled in order to get results from their flight trackers and don’t advertise any other URLs for flight data. Once I’d realised that only one of those three carriers was going to be helpful, I decided not to keep checking airlines.

So, a little frustrated, I tried just typing flight codes into google. And lo and behold… most of them give useful results. It doesn’t get them all, of course, but it’s enough that out of the 838 flight codes in the database, 695 are fully identified. Of those not identified, a number seem to be flights that were diverted to Heathrow but don’t normally go there.

So with some sense of the airports served, I also want to know about the airports themselves. Wikipedia’s pretty good there, with geo data for them all in an easy to capture form. Some, like Heathrow itself, are very easy to find:

http://en.wikipedia.org/wiki/LHR

while others are a bit trickier. But with some manual intervention I was able to get all of that data. The manually produced mappings and the code for pulling in the data can both be found on github.

More updates as time allows…