Rails is great for many things, but for very small apps, it can definitely be overkill. That’s where why the lucky stiff’s Camping micro-framework comes in. Where Rails gets you started with a clearly defined structure and generally presumes you’re going to want to use a database, Camping makes no such assumptions and just provides a few nice hooks for micro apps.
I got started using Camping a couple of months ago. With a lot of travel coming up, I’m eager to keep up to date with special deals on flights and frequent flyer miles, and I stumbled across milemaven.com, which seemed a great source of that information. But it doesn’t provide feeds and I have no desire to visit the site every day, so I decided to dust off Hpricot and combine it with Camping to scrape the site and deliver the contents to my news reader.
If I wanted to be strict about MVC, I’d probably do the actual scraping in the model, since for this app that’s the data store/source. But in the interests of simplicity I did the parsing in the controller, and even so my entire controller comes out at only 29 lines. A version with a few extra comments looks like:
module Milemaven::Controllers
  class Index < R '/(\d*)'
    def get code
      # Default to United Airlines
      code = 109 if code.blank?
      @url = "http://www.milemaven.com/offers/program/fly/#{code}/"
      content = ''
      # I could actually make this more compact by just having
      # hpricot get the URL, but I want to capture the last_modified time and
      # the charset to use in my feed
      open(@url, 'User-Agent' => 'Camping Milemaven Atom Feed Scraper') do |f|
        f.each_line { |line| content << line }
        @charset = f.charset
        @updated = f.last_modified || Time.now
      end
      doc = Hpricot(content)
      rows = doc.search("table.listData tr")
      @title = doc.at('td.content h3').children[0].to_s
      # The first couple of rows are headers, so skip them
      @deals = rows[2..rows.size-1].collect do |row|
        title = row.search("td")[0]['title']
        url = row.search('td')[0].children[1]['href']
        { :title => title, :url => url } unless title.nil? or url.nil?
      end.compact
      render :index
    end
  end
end
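The last step of that controller leans on a small Ruby idiom that is easy to miss: a block whose final expression carries an `unless` modifier returns `nil` when the condition holds, and `compact` then drops those `nil`s. Here is a minimal sketch of just that step, using made-up stand-in rows (plain `[title, url]` arrays rather than Hpricot elements, so the data is purely illustrative):

```ruby
# Stand-in rows: each is [title, url]; nil marks a missing attribute.
# These values are invented for illustration, not scraped data.
rows = [
  ['Header', nil],
  ['Subheader', nil],
  ['Double miles', '/offers/1'],
  [nil, '/offers/2'],
  ['Bonus miles', '/offers/3']
]

# Skip the two header rows, then build a hash per remaining row.
deals = rows[2..-1].collect do |title, url|
  # With the `unless` modifier the block returns nil for incomplete rows...
  { :title => title, :url => url } unless title.nil? || url.nil?
end.compact # ...and compact removes those nils from the result

deals.each { |deal| puts "#{deal[:title]} -> #{deal[:url]}" }
```

Only the rows that have both a title and a URL survive, which is why the view never has to check for blanks.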
And then the view is a quick wrapper around Builder to generate an atom feed. The skeleton looks like:
module Milemaven::Views
  def index
    @headers['Content-Type'] = "application/atom+xml; charset=#{@charset}"
    xml = Builder::XmlMarkup.new(:target => self)
    xml.instruct!(:xml, :encoding => @charset)
    # Generate feed
  end
end
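To give a rough idea of what sits behind the `# Generate feed` placeholder, here is a sketch of the elements a basic Atom feed carries. It is not the view from the app: it swaps Builder for REXML, which ships with standard Ruby installs, so that it runs outside Camping. The `atom_feed` helper, its parameters and the default author name are all hypothetical.

```ruby
require 'rexml/document'
require 'time'

# Hypothetical helper: build a basic Atom document from scraped deals.
# deals is an array of { :title => ..., :url => ... } hashes, the same
# shape the controller stores in @deals.
def atom_feed(title, url, updated, deals, author = 'Feed scraper')
  stamp = updated.utc.xmlschema # Atom timestamps use RFC 3339 format

  doc = REXML::Document.new
  doc << REXML::XMLDecl.new('1.0', 'UTF-8')
  feed = doc.add_element('feed', 'xmlns' => 'http://www.w3.org/2005/Atom')
  feed.add_element('title').text = title
  feed.add_element('id').text = url
  feed.add_element('link', 'href' => url)
  feed.add_element('updated').text = stamp
  feed.add_element('author').add_element('name').text = author

  deals.each do |deal|
    entry = feed.add_element('entry')
    entry.add_element('title').text = deal[:title]
    entry.add_element('id').text = deal[:url] # the deal URL doubles as the id
    entry.add_element('link', 'href' => deal[:url])
    # No per-entry date is available, so reuse the feed-level timestamp
    entry.add_element('updated').text = stamp
  end

  doc.to_s
end

xml = atom_feed(
  'Example deals',
  'http://example.com/offers/',
  Time.utc(2007, 2, 28, 19, 21, 0),
  [{ :title => '2-for-1 & bonus miles', :url => 'http://example.com/offers/1' }]
)
puts xml
```

Every entry reuses the feed-level timestamp here, simply because the scraped page offers nothing better.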
The main limitation of feeds generated this way is that it’s very hard to get real published/updated dates for the entries, particularly as the server doesn’t always return the timestamp for the pages correctly.
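One possible workaround, which is not something the app above does, is to remember when each deal URL was first seen and use that as the entry date. A minimal sketch, assuming the caller keeps the `seen` hash somewhere between runs (a file, for instance); the `stamp_deals` name and the `:updated` key are hypothetical:

```ruby
# Hypothetical helper: attach a stable :updated time to each deal.
# seen maps deal URL => Time the URL was first scraped; persisting it
# between runs is left to the caller.
def stamp_deals(deals, seen, now = Time.now)
  deals.map do |deal|
    seen[deal[:url]] ||= now # only record the first sighting
    deal.merge(:updated => seen[deal[:url]])
  end
end

seen = {}
day_one = Time.utc(2007, 3, 1)
day_two = Time.utc(2007, 3, 2)

stamp_deals([{ :title => 'Double miles', :url => '/offers/1' }], seen, day_one)
later = stamp_deals(
  [{ :title => 'Double miles', :url => '/offers/1' },
   { :title => 'Bonus miles', :url => '/offers/2' }],
  seen, day_two
)
later.each { |deal| puts "#{deal[:title]}: #{deal[:updated]}" }
```

The dates are only as accurate as the polling interval, but they at least stay fixed once a deal has appeared, so a news reader would not keep marking old entries as new.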
I’ve actually been playing with making this all a bit more re-usable by setting up a DSL to lay out the scraping rules, meaning that both controller and view become usable for most pages. But it needs a bit more work, so I’ll save it for another (potential) post.
UPDATE (Mar 28th): Boaz Shmueli from Milemaven contacted me to let me know there are some feeds available from that site, such as this one for the route from IAD to TPE.
