Rails is great for many things, but for very small apps, it can definitely be overkill. That’s where why the lucky stiff’s Camping micro-framework comes in. Where Rails gets you started with a clearly defined structure and generally presumes you’re going to want to use a database, Camping makes no such assumptions and just provides a few nice hooks for micro apps.

I got started using Camping a couple of months ago. With a lot of travel coming up, I was eager to keep up to date with special deals on flights and frequent flyer miles, and stumbled across milemaven.com, which seemed a great source of that information. But it doesn’t provide feeds and I have no desire to visit the site every day, so I decided to dust off hpricot and combine it with Camping to scrape the site and deliver the contents to my news reader.

If I wanted to be strict about MVC, I’d probably do the actual scraping in the model, since for this app that’s the data store/source. But in the interests of simplicity I did the parsing in the controller, and even so my entire controller comes out at only 29 lines. A version with a few extra comments looks like:

module Milemaven::Controllers
  class Index < R '/(\d*)'
    def get code
      # Default to United Airlines
      code = 109 if code.blank?
      @url = "http://www.milemaven.com/offers/program/fly/#{code}/"
      content = ''
 
      # I could make this more compact by just having hpricot fetch the URL
      # itself, but I want to capture the last_modified time and the charset
      # to use in my feed
      open(@url, 'User-Agent' => 'Camping Milemaven Atom Feed Scraper') do |f|
        f.each_line { |line| content << line }
        @charset = f.charset
        @updated = f.last_modified || Time.now
      end
 
      doc = Hpricot(content)
 
      rows = doc.search("table.listData tr")
      @title = doc.at('td.content h3').children[0].to_s
 
      # The first couple of rows are headers, so skip them
      @deals = rows[2..rows.size-1].collect do |row|
        title = row.search("td")[0]['title']
        url = row.search('td')[0].children[1]['href']
 
        { :title => title, :url => url  } unless title.nil? or url.nil?
      end.compact
 
      render :index
    end
  end
end

And then the view is a quick wrapper around Builder to generate an Atom feed. The skeleton looks like:

module Milemaven::Views
 
  def index
    @headers['Content-Type'] = "application/atom+xml; charset=#{@charset}"
    xml = Builder::XmlMarkup.new(:target => self)
    xml.instruct!(:xml, :encoding => @charset)
 
    # Generate feed
  end
end

The main limitation of the feeds generated this way is that it’s very hard to get real published/updated dates for the entries, particularly as the server doesn’t always return the timestamp for the pages correctly.
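
One possible workaround, sketched here as an assumption rather than something the app above does, is to stamp every entry with the best timestamp available: a per-entry time if one could ever be scraped, otherwise the page's Last-Modified value, otherwise the time of the scrape. The helper name `entry_timestamp` and the sample deal are hypothetical; the only library call is `Time#xmlschema` from Ruby's standard `time` library, which produces the RFC 3339 format that Atom expects.

```ruby
require 'time'

# Hypothetical helper, not part of the app above: pick the best available
# timestamp for an entry and format it the way Atom requires (RFC 3339).
def entry_timestamp(entry_time, feed_time, now = Time.now)
  (entry_time || feed_time || now).utc.xmlschema
end

# Sample data in the same shape as the hashes the controller builds.
deals = [{ :title => 'Example offer', :url => 'http://example.com/offer' }]

# Stand-in for the @updated value captured from the HTTP response.
last_modified = Time.utc(2007, 3, 1, 12, 0, 0)

# No per-entry date is available, so each entry inherits the page timestamp.
entries = deals.map do |deal|
  deal.merge(:updated => entry_timestamp(nil, last_modified))
end
```

The trade-off is that every entry appears to change whenever the page does, so some news readers may show old deals as updated.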

I’ve actually been playing with making this all a bit more re-usable by setting up a DSL to lay out the scraping rules, meaning that both controller and view become usable for most pages. But it needs a bit more work, so I’ll save it for another (potential) post.

UPDATE (Mar 28th): Boaz Shmueli from Milemaven contacted me to let me know there are some feeds available from that site, such as this one for the route from IAD to TPE.