a work on process


Quick and Easy Feeds with Camping

26 February 2007 (7:34 pm)

By James Stewart
Filed under: Notes

Rails is great for many things, but for very small apps it can definitely be overkill. That's where why the lucky stiff's Camping micro-framework comes in. Where Rails gets you started with a clearly defined structure and generally presumes you're going to want a database, Camping makes no such assumptions and just provides a few nice hooks for micro apps.

I started using Camping a couple of months ago. With a lot of travel coming up, I'm eager to keep up to date with special deals on flights and frequent flyer miles, and I stumbled across milemaven.com, which seemed a great source of that information. But it doesn't provide feeds and I have no desire to visit the site every day, so I decided to dust off Hpricot and combine it with Camping to scrape the site and deliver the contents to my news reader.

If I wanted to be strict about MVC, I’d probably do the actual scraping in the model, since for this app that’s the data store/source. But in the interests of simplicity I did the parsing in the controller, and even so my entire controller comes out at only 29 lines. A version with a few extra comments looks like:

require 'camping'
require 'open-uri'
require 'hpricot'

Camping.goes :Milemaven

module Milemaven::Controllers
  class Index < R '/(\d*)'
    def get code
      # Default to United Airlines
      code = 109 if code.blank?
      @url = "http://www.milemaven.com/offers/program/fly/#{code}/"
      content = ''

      # I could make this more compact by just having Hpricot fetch the URL,
      # but I want to capture the last_modified time and the charset to use
      # in my feed. (On Ruby 3.0+ Kernel#open no longer fetches URLs; use
      # URI.open there instead.)
      open(@url, 'User-Agent' => 'Camping Milemaven Atom Feed Scraper') do |f|
        f.each_line { |line| content << line }
        @charset = f.charset
        @updated = f.last_modified || Time.now
      end

      doc = Hpricot(content)

      rows = doc.search("table.listData tr")
      @title = doc.at('td.content h3').children[0].to_s

      # The first couple of rows are headers, so skip them
      @deals = rows[2..-1].collect do |row|
        title = row.search("td")[0]['title']
        url = row.search('td')[0].children[1]['href']

        { :title => title, :url => url } unless title.nil? or url.nil?
      end.compact

      render :index
    end
  end
end
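To try the row-handling logic without Camping or Hpricot installed, here is a standalone sketch of the same steps (skip the two header rows, read the first cell's `title` attribute and its link, drop incomplete rows) using REXML from Ruby's standard distribution instead of Hpricot. It assumes well-formed XHTML, which real pages often aren't, and it looks up the first `a` element directly rather than relying on `children[1]`; the `extract_deals` name and the sample markup are invented for illustration.

```ruby
require 'rexml/document'

# Invented sample markup standing in for a milemaven.com listing. REXML needs
# well-formed XML, so this assumes tidy XHTML.
SAMPLE = <<~XHTML
  <html><body>
    <table class="listData">
      <tr><th>Offer</th></tr>
      <tr><th>Details</th></tr>
      <tr><td title="Double miles to Asia"><a href="/offers/1">Double miles</a></td></tr>
      <tr><td title="No link here">text only</td></tr>
      <tr><td title="Bonus on IAD-TPE"><a href="/offers/2">Bonus</a></td></tr>
    </table>
  </body></html>
XHTML

# Skip the two header rows, then build { :title, :url } hashes from the first
# cell of each remaining row, dropping rows without a title or a link.
def extract_deals(xhtml)
  doc  = REXML::Document.new(xhtml)
  rows = REXML::XPath.match(doc, "//table[@class='listData']/tr")
  (rows[2..-1] || []).map do |row|
    cell = row.elements['td[1]']
    next if cell.nil?
    link  = cell.elements['a']
    title = cell.attributes['title']
    url   = link && link.attributes['href']
    { :title => title, :url => url } unless title.nil? || url.nil?
  end.compact
end

extract_deals(SAMPLE).each { |deal| puts "#{deal[:title]} -> #{deal[:url]}" }
```

The row without a link is silently dropped by `compact`, which mirrors the `unless title.nil? or url.nil?` guard in the controller.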

And then the view is a quick wrapper around Builder to generate an Atom feed. The skeleton looks like:

module Milemaven::Views

  def index
    # Serve Atom, using the charset the controller captured from the page
    @headers['Content-Type'] = "application/atom+xml; charset=#{@charset}"
    xml = Builder::XmlMarkup.new(:target => self)
    xml.instruct!(:xml, :encoding => @charset)

    # Generate feed
  end
end
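The skeleton leaves the `# Generate feed` step open. One way it might be filled in is sketched below; it builds the Atom document with REXML rather than Builder so it runs on its own, and the `atom_feed` method, its parameters, the placeholder author name and the choice to reuse deal URLs as Atom ids are all assumptions rather than what the original app did.

```ruby
require 'rexml/document'
require 'time'

# Build an Atom 1.0 feed as a string. The parameters mirror what the
# controller collects (@title, @url, @updated, @deals); the author name
# is a placeholder, since Atom requires one at feed or entry level.
def atom_feed(title, page_url, updated, deals, author_name = 'milemaven.com')
  doc = REXML::Document.new
  doc << REXML::XMLDecl.new('1.0', 'UTF-8')
  feed = doc.add_element('feed', 'xmlns' => 'http://www.w3.org/2005/Atom')
  feed.add_element('title').text = title
  feed.add_element('id').text = page_url
  feed.add_element('updated').text = updated.xmlschema
  feed.add_element('link', 'href' => page_url)
  feed.add_element('author').add_element('name').text = author_name

  deals.each do |deal|
    entry = feed.add_element('entry')
    entry.add_element('title').text = deal[:title]
    # Assumes absolute deal URLs, which then double as entry ids.
    entry.add_element('id').text = deal[:url]
    # No per-entry dates are available, so each entry reuses the page time.
    entry.add_element('updated').text = updated.xmlschema
    entry.add_element('link', 'href' => deal[:url])
  end

  out = +''
  doc.write(out) # REXML takes care of escaping text content such as '&'
  out
end

puts atom_feed('United deals', 'http://example.com/offers/', Time.utc(2007, 2, 26),
               [{ :title => 'Double miles & more', :url => 'http://example.com/offers/1' }])
```

With Builder the structure would be the same, just expressed as nested `xml.feed(...) { xml.entry { ... } }` calls writing into the view.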

The main limitation of feeds generated this way is that it's very hard to get real published/updated dates for the entries, particularly as the server doesn't always return the correct timestamp for its pages.
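One partial mitigation, sketched here under my own assumptions, is to parse the raw `Last-Modified` header defensively and fall back to the fetch time. As far as I can tell, open-uri's `last_modified` runs the header through `Time.httpdate`, so a malformed value raises rather than returning nil; the hypothetical `entry_timestamp` helper below takes the raw header string instead.

```ruby
require 'time'

# Parse a raw Last-Modified header (an HTTP-date) into a Time, falling back
# to the supplied time when the header is missing, blank or malformed.
def entry_timestamp(last_modified_header, fallback = Time.now)
  return fallback if last_modified_header.nil? || last_modified_header.strip.empty?
  Time.httpdate(last_modified_header)
rescue ArgumentError
  fallback
end

fetched_at = Time.utc(2007, 2, 26, 19, 34)
puts entry_timestamp('Mon, 26 Feb 2007 18:00:00 GMT', fetched_at)
puts entry_timestamp('sometime last week', fetched_at)
puts entry_timestamp(nil, fetched_at)
```

In the controller this would mean something like `@updated = entry_timestamp(f.meta['last-modified'])` inside the `open` block. It only makes the page-level date more robust; individual entries still have no dates of their own.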

I've actually been playing with making this all a bit more reusable by setting up a DSL to lay out the scraping rules, so that the same controller and view would work for most pages. But it needs a bit more work, so I'll save it for another (potential) post.
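That DSL isn't shown here, so the following is a purely hypothetical sketch of what declarative scraping rules might look like in plain Ruby. The `ScrapeRules` class, its `rows`/`skip`/`field` methods and the XPath-based rules (REXML again, standing in for Hpricot's CSS selectors) are all invented for illustration.

```ruby
require 'rexml/document'

# Hypothetical rules DSL: say where the rows live, how many header rows to
# skip, and which attribute to read for each field, then apply the rules.
class ScrapeRules
  def self.define(&block)
    rules = new
    rules.instance_eval(&block) # lets the block call rows/skip/field directly
    rules
  end

  def initialize
    @rows_xpath = '//tr'
    @skip = 0
    @fields = {}
  end

  def rows(xpath)
    @rows_xpath = xpath
  end

  def skip(count)
    @skip = count
  end

  # A field is an XPath relative to the row plus the attribute to read.
  def field(name, xpath, attribute)
    @fields[name] = [xpath, attribute]
  end

  # Returns one hash per row; rows missing any field are dropped.
  def apply(xml)
    doc = REXML::Document.new(xml)
    REXML::XPath.match(doc, @rows_xpath).drop(@skip).map do |row|
      values = @fields.map do |name, (xpath, attribute)|
        node = REXML::XPath.first(row, xpath)
        [name, node && node.attributes[attribute]]
      end.to_h
      values unless values.values.any?(&:nil?)
    end.compact
  end
end

MILEMAVEN_RULES = ScrapeRules.define do
  rows  "//table[@class='listData']/tr"
  skip  2
  field :title, 'td[1]', 'title'
  field :url,   'td[1]/a', 'href'
end

PAGE = <<~XHTML
  <html><body><table class="listData">
    <tr><th>Offer</th></tr><tr><th>Details</th></tr>
    <tr><td title="Double miles to Asia"><a href="/offers/1">Double miles</a></td></tr>
    <tr><td title="No link here">text only</td></tr>
  </table></body></html>
XHTML

p MILEMAVEN_RULES.apply(PAGE)
```

`instance_eval` is what lets the rule block call `rows` and `field` without a receiver; a generic controller could then pick whichever rule set matches the requested page and hand the resulting hashes to a generic feed view.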

UPDATE (Mar 28th): Boaz Shmueli from Milemaven contacted me to let me know there are some feeds available from that site, such as this one for the route from IAD to TPE.
