The opening of newspaper archives

The Economist is the latest organisation to announce it will be putting its archive online. Stretching back to 1843, that’s a huge mass of data coming online, even if (for now) it sits behind a very expensive pay-wall. The news comes three days after The Guardian and Observer announced that:

The first phase of the Guardian News & Media archive, containing the Guardian from 1821 to 1975 and The Observer from 1900 to 1975, will launch on November 3.
It will contain exact replicas of the original newspapers, both as full pages and individual articles, and will be fully searchable and viewable at guardian.co.uk/archive.
Readers will be offered free 24-hour access during November, but after this trial period charging will be introduced.

It’s not surprising that both services will carry charges. Putting all that material online is a costly business and there’s not yet much information on what level of traffic it’ll attract. Chances are the majority of people using these services will be based in academic institutions that will purchase licences. For anyone with an academic account, this will be hugely useful.

It’s also another reason the web development community needs to take very seriously the question of historical context on the web, an issue that Gavin has been talking about for a while now. Whilst these are closed archives, most access will be through their own specific tools, but when they (inevitably, in my opinion) open up, that’s an awful lot of data with very specific provenance that our search tools will have to mediate. Hopefully the developers working on these projects will be able to share their experiences and begin to establish best practice for mediating large, web-based historical archives.

What would be doubly fascinating would be to see what sort of social layers could emerge on top of these archives. Obviously if the archives were open that would provide a wealth of historical references that are URL addressable, making hypertext documents still richer. But could I search and annotate the data such that any mention of my ancestors is easily identifiable and tie that into my family tree? Could specific groups organise around topics within the archives, with, say, urban spelunkers gravitating to news of closed or buried buildings, or local communities able to map out their history in much more detail?

One thing is for sure: as more and more historical archives open up, we’re all going to need to be much more imaginative about how we interact with online material. The well-rehearsed “comments on news articles” model only works for recent occurrences or recently unearthed information, and there’s so much more we could do.