a work on process

Viewing posts tagged: ruby

Exploring Ruby CSS parsers: TamTam and CSSPool

5 December 2007 (1:14 pm)

By James Stewart
Filed under: Notes, Snippets

In response to yesterday’s post about inlining CSS for HTML emails, I got a couple of comments suggesting alternatives to my CSS parser class. Not wanting to have to maintain code unless I have to, I decided to give them both a try and see how they worked out.

TamTam

First up is TamTam, suggested by batnight. I’d actually spotted TamTam and link blogged it a few weeks ago, which shows how transient attention can be. TamTam is a complete solution for inlining CSS, so I should be able to replace all my code with:

require 'rubygems'
require 'hpricot'
require 'tamtam'
 
inlined = TamTam.inline(
  :css => File.read('/path/to/my.css'),
  :body => File.read('/path/to/my.html')
)
 
puts inlined

That looks ideal, but unfortunately as soon as I tried passing my code into it an error emerged.

/Library/Ruby/Gems/gems/tamtam-0.0.2/lib/tamtam.rb:77:in `apply_to': Trouble on style td#email-header-message on element <td id="email-header-message"> (Exception): can't convert nil into String
	from /Library/Ruby/Gems/gems/tamtam-0.0.2/lib/tamtam.rb:24:in `inline'

Digging into the library, it seems that its parsing fails if there’s a CSS rule that doesn’t match any elements in the document. That’s not a problem if your CSS file is targeted at a specific block of HTML, but if the idea is to build a set of rules that can be applied across a selection of pages/emails, it may well be.
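
To make the failure mode concrete, here is a minimal sketch of the kind of guard that lets a shared stylesheet be applied to a document that only uses some of its rules. It is not TamTam's code or its fix: `apply_rules` and `find_elements` are hypothetical names, and the "document" is a plain hash standing in for whatever selector lookup the HTML library provides.

```ruby
# Apply each rule only where its selector matches something, and report
# the selectors that matched nothing instead of raising an error.
# `find_elements` is a stand-in for the HTML library's selector lookup.
def apply_rules(rules, find_elements)
  skipped = []
  rules.each do |selector, declarations|
    elements = Array(find_elements.call(selector))
    if elements.empty?
      skipped << selector
      next
    end
    elements.each { |element| element['style'] = declarations }
  end
  skipped
end

# A fake document: selector => matching elements, each a plain hash.
document = { 'td#email-header-message' => [{}] }
lookup = ->(selector) { document.fetch(selector, []) }

rules = {
  'td#email-header-message' => 'color: red;',
  'p.footer' => 'font-size: 10px;'
}
skipped = apply_rules(rules, lookup)
puts "No match for: #{skipped.join(', ')}"
```

Returning the skipped selectors rather than silently dropping them keeps it easy to spot rules that are genuinely stale.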

CSSPool

CSSPool, suggested by Dan Kubb, is a more generic CSS parser which is designed to work with hpricot to map between CSS and HTML. Initially I just used something very much like the examples they offer to get a sense of the format of the objects they give access to:

require 'rubygems'
require 'hpricot'
require 'csspool'
 
sac = CSS::SAC::Parser.new 
css_doc = sac.parse(File.read('/path/to/my.css')) 
html_doc = Hpricot.parse(File.read('/path/to/my.html'))
css_doc.find_all_rules_matching(html_doc).each do |rule| 
  puts rule
end

Unfortunately this too failed on me with:

NoMethodError: undefined method `accept' for nil:NilClass

From a quick look at the source code it seemed that this error was occurring when a CSS selector didn’t match the supplied document, just as in TamTam. It turned out to be pretty easy to patch, though, and I’ve submitted a report in their tracker.

That done I was able to iterate over it and access the selectors easily enough, but what I’ve not yet been able to find is a way to get the parser to give me the actual CSS declarations appropriately formatted for including in the HTML. When I have a rule I can get the selector with:

rule.selector.to_css

But to find out what styles the selector applies isn’t so straightforward. If anyone has an easy way to do that, I’d love to hear about it in the comments!

Update (later that day): Version 0.2.3 of CSSPool is now out and includes my fix.


A little scripting to help with HTML email - bringing styles inline

4 December 2007 (8:22 pm)

By James Stewart
Filed under: Snippets

As anyone keeping an eye on my del.icio.us feed may have noticed, quite a few links have appeared to information about the preparation of HTML email. It’s a nasty business, as a quick glance at the website of the email standards project will tell you. But sadly, nasty as it may be, sometimes it has to be done.

Even if the email I send out is going to have CSS scattered inline, for building the templates I’d much rather be able to focus on writing the structure of the document and leave worrying about my CSS for another time, and another file. That wouldn’t get me around the nastiness of having to use tables for anything but the simplest of layouts, but it still feels right to keep the separation for as long as possible.

I had a quick look for a tool that would take a stylesheet and an HTML document, and embed the rules inline, but didn’t find one. So I turned to Ruby. In theory it should be very easy to build something like this, because of hpricot’s support for CSS selectors. If we had the CSS stored in a hash, all it would take would be something like:

require 'hpricot'
doc = Hpricot(open('my_page.html'))
 
css_as_hash.each do |selector, rule|
  (doc/selector).set('style', rule)
end
 
puts doc

Obviously that wouldn’t play nicely if there were already any styles inline, but for the purposes of this project I assumed there wouldn’t be.
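
If existing inline styles did need to survive, one option would be to merge the stylesheet declarations into whatever is already in the style attribute rather than overwriting it. The following is a rough sketch of that idea in plain Ruby; `parse_declarations` and `merge_styles` are hypothetical helpers, not part of hpricot, and the naive split on semicolons will trip over values that themselves contain semicolons.

```ruby
# Parse a declaration string such as "color: red; margin: 0" into a hash
# of property => value.
def parse_declarations(style)
  style.to_s.split(';').each_with_object({}) do |declaration, result|
    property, value = declaration.split(':', 2)
    next if property.nil? || value.nil?
    result[property.strip.downcase] = value.strip
  end
end

# Combine stylesheet declarations with an element's existing style
# attribute. Declarations already inline take precedence.
def merge_styles(existing_style, stylesheet_rule)
  merged = parse_declarations(stylesheet_rule)
           .merge(parse_declarations(existing_style))
  merged.map { |property, value| "#{property}: #{value};" }.join(' ')
end

puts merge_styles('color: blue', 'color: red; margin: 0;')
# prints "color: blue; margin: 0;"
```

In the loop above, the value passed to `set` would then be the merged string for each element rather than the raw rule.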

I had a quick look at the cssparser rubygem but found that the sample code threw ‘method not found’ errors so I decided to quickly roll my own class that would take a path to a CSS file, and convert it to a hash. All it took was a few minutes’ work and the result was:

# This class takes a CSS file and provides a method to
# parse it into a hash. Usage is:
# 
# parser = SimpleCSSParser.new('/path/to/myfile.css')
# hash_of_rules = parser.to_hash
#
# For more advanced CSS handling check out the cssparser gem
# http://code.dunae.ca/css_parser/
class SimpleCSSParser
 
  # Receive and open the CSS file, storing its contents
  def initialize(path_to_file)
    @css = open(path_to_file).read
  end
 
  # Convert the CSS into a hash, where the keys are the selectors
  # and the values are the rules
  def to_hash
    @to_hash ||= separate_rules.inject({}) do |collection, rule|
      identifiers, rule = prepare_selectors_and_rule(rule)
      identifiers.each do |identifier|
        collection[identifier] ||= ''
        collection[identifier] += rule
      end
      collection
    end
  end
 
  private
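    # Split on closing braces, dropping whitespace-only chunks such as
    # the newline that usually follows the final rule
    def separate_rules
      @css.split('}').reject { |chunk| chunk.strip.empty? }
    end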
 
    # Strip comments (including empty and multi-line ones) and
    # extraneous white space from our CSS rules
    def clean_up_rule(css_rule)
      css_rule = css_rule.gsub(/\/\*.*?\*\//m, '')
      css_rule.gsub(/\n|\s{2,}/, '')
    end
 
    # Break apart our selector(s) and rule. We return an array
    # of selectors to allow for situations where multiple selectors
    # are specified (comma separated) for a single rule
    def prepare_selectors_and_rule(rule)
      parts = rule.split('{')
      selectors = parts[0].split(',').map(&:strip)
      return selectors, clean_up_rule(parts[1])
    end
end

With that in place, I can now call:

require 'hpricot'
 
doc = Hpricot(open('my_file.html'))
parser = SimpleCSSParser.new('my_file.css')
 
parser.to_hash.each do |selector, rule|
  (doc/selector).set('style', rule)
end
 
puts doc

and have the result I wanted all along. It’s rather brittle because of the way it splits the rules up, and it won’t pull in @import’ed files, handle multiple CSS files, or do anything to honour the proper inheritance rules, but for my purposes that’s okay. I bundled it all up in a file that can be called from the command line. You can find that in this pastie.
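
One small step towards respecting the cascade would be to apply rules in order of selector specificity, so that a more specific rule gets the last word. This is only a sketch under simplifying assumptions: `specificity` and `in_cascade_order` are hypothetical helpers, and the count covers ids, classes and element names but ignores attribute selectors, pseudo-classes and `!important`.

```ruby
# Rough specificity as [ids, classes, element names].
def specificity(selector)
  ids       = selector.scan(/#[\w-]+/).length
  classes   = selector.scan(/\.[\w-]+/).length
  # Remove ids and classes so only element names are left to count
  remainder = selector.gsub(/[#.][\w-]+/, ' ')
  elements  = remainder.scan(/[a-zA-Z][\w-]*/).length
  [ids, classes, elements]
end

# Return [selector, rule] pairs ordered from least to most specific,
# keeping source order for selectors of equal specificity.
def in_cascade_order(rules)
  rules.sort_by.with_index do |(selector, _rule), index|
    [specificity(selector), index]
  end
end

rules = {
  '#header td' => 'color: red;',
  'td'         => 'color: black;',
  'td.note'    => 'color: grey;'
}
in_cascade_order(rules).each { |selector, rule| puts "#{selector} => #{rule}" }
```

Iterating over `in_cascade_order(parser.to_hash)` instead of the raw hash would slot into the loop above without other changes.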

A nice (and really quite simple) addition would be to take Campaign Monitor’s Guide to CSS Support in Email, parse it and spit out warnings about which email clients will have issues with which CSS rules. If I get round to implementing that I’ll blog about it here. If you get there before me, do post a comment and let me know.
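
As a starting point, the warning step might look something like the sketch below, which works on the same selector-to-rule hash. The support table here is placeholder data only: the client names and property lists are invented for illustration and would need to be replaced with whatever is actually parsed from the guide. `css_warnings` is a hypothetical helper.

```ruby
# PLACEHOLDER DATA: invented client names and property lists, standing in
# for a table built from a real CSS support guide.
UNSUPPORTED = {
  'Example Client A' => %w[float position],
  'Example Client B' => %w[background-image]
}.freeze

# Return a warning string for every property a client is listed as
# not supporting.
def css_warnings(rules, unsupported = UNSUPPORTED)
  warnings = []
  rules.each do |selector, declarations|
    properties = declarations.split(';').map do |declaration|
      declaration.split(':', 2).first.to_s.strip.downcase
    end.reject(&:empty?)

    unsupported.each do |client, bad_properties|
      (properties & bad_properties).each do |property|
        warnings << "#{client}: '#{property}' in '#{selector}' may not be supported"
      end
    end
  end
  warnings
end

puts css_warnings('td.sidebar' => 'float: left; color: #333;')
```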

UPDATE (5th Dec ‘07): I’ve posted a follow-up looking at some other Ruby CSS parsers.


Converting HTML to Textile with Ruby

23 November 2007 (7:51 pm)

By James Stewart
Filed under: Snippets

One of the many tricky decisions to be made when building content management tools is how to allow users to control the basic formatting of their input without breaking your carefully crafted layouts or injecting nasty hacks into your pages. One approach has long been to provide your own markup language. Instead of allowing users to write HTML, let them use bbcode, or markdown, or textile, which have more controlled vocabularies and rules that mean it’s much less likely that problems will occur.

Textile in particular has a nice simple syntax and is increasingly popular thanks to its adoption in products like those of 37signals. In Ruby, there’s the RedCloth library which makes it fast and easy to convert textile to HTML. The one problem is if you already have a body of user generated HTML in your legacy system that needs converting. That’s the situation I found myself in this week and I quickly needed a tool to translate the content so that I could get on with the more interesting parts of the system.

Searching for options, I found the ClothRed library, which offers some translation but doesn’t handle important elements like links. I considered patching it to handle the elements I need, but in the end I decided to take a different approach and used the SGML parsing library found here to port a python html2textile parser.

Porting code from python to ruby is a pretty straightforward process as the languages are so similar on a number of levels, but there were several issues to work through, particularly relating to scoping, and quite a few methods to change to make them feel a little more ruby-ish. I’ve not converted all of the entity handling as I didn’t really need it, but there might be a bit of work to do in making sure character set issues are properly taken care of.

The end result is a piece of code that’s now served its purpose and that I’m unlikely to need again for quite a while. It’s not something that I’m particularly proud of, and it could almost certainly be implemented more neatly, but I thought I’d throw it out there in case it could be useful to someone else. Should you be inspired to take it and twist it and turn it into a well-heeled, more robust and properly distributable solution, feel free, but please let me know so that at the very least I can update this entry.

Grab the code here or view it here.


As I’ve indicated here a few times, when announcing site launches and offering a few hints and tips, I fairly frequently find myself working with Drupal but have long had reservations about doing so. What I’ve so far avoided doing is going into much detail about why that would be, what those reservations are, and so on. But now I’m working on a review of a Drupal book and so it seems appropriate to lay those cards on the table and look at the details on them. It seems easiest to do that by comparing with the framework I do most of my development in: Ruby on Rails.

As ever with technology comparisons, just like any other sort of comparison, a lot of my preference for Rails boils down to a combination of taste and experience. I’m also well aware that I’m comparing unlike things. Rails and Drupal set out to solve different problems, and so obviously take quite different approaches. That doesn’t mean, however, that developers aren’t often left choosing between them, as the boundaries are blurred and there is significant overlap between their respective domains.

I’ve seen plenty of bad ruby code, but proportionately I’ve seen a lot more bad PHP, and as a PHP-based solution Drupal inherits some of that baggage. Drupal has pretty good support for a wide range of cutting-edge ideas and tools, but by and large the Rails community has seemed much more clued-in to what’s happening at the forefront of web development. That may of course change as the Rails community continues to transition from being chiefly early adopters to a much more mainstream crowd, but it marked my early impressions of the framework.

A case in point is the integration of automated testing in Rails. There is some community support for automated testing in Drupal (primarily through the simpletest module) and there are plenty of Rails projects that lack tests, but there is a strong focus on Test-Driven and Behaviour-Driven development in the Rails community, and the combination of Rails’ object orientation, simple rake tasks and fixtures/mocks makes it very simple to write a solid test kit.

Testing is complicated by the hooks system in Drupal. Hooks are a powerful feature, allowing any module to broadcast and listen for a variety of events and respond appropriately, but it can be difficult to know what order modules will receive those events and so you end up performing a variety of contortions to make sure your responses are given the appropriate weight. Arguably Rails’ layers of filters and plugins can introduce similar complications and I’ve certainly spent quite a bit of time tracking down performance hits coming from rogue filters in plugins and engines, but so long as inheritance is properly handled it’s not all that difficult to do a little introspection to make sure everything’s in its right place.

The fact that Drupal relies heavily on web-based configuration makes it easy to put a lot of power in the hands of admin users, reducing the reliance on a developer, but it also makes upgrades and deployments a more complex business. When I add a new module to an existing Drupal project I usually find myself noting down the steps I take to configure it in my development and/or staging environments so that I can repeat that process in production. In a Rails project I’d write a migration and set up capistrano to run that migration when I push my new version live, significantly reducing duplication. The Drupal approach may be simpler for non-developers, but as a developer I frequently get frustrated at the repetition of tasks that could be so much easier.

That issue is one of the key things pushing me away from the nice-in-concept CCK. Adding new content types to a site through a web interface is an appealing idea, but it’s a pain to have to work out how that changed things in my database to produce a script, or to have to repeat a manual process. Since I know how, I’m much more likely to produce a module for my new content type, simply because I know that deploying that is a much quicker process.

One reason it’s been fascinating reading the Drupal book I’m currently reviewing is that books on Drupal are so thin on the ground. That’s a real shame as while there’s a lot of content on Drupal online, it’s not really been well organised for experienced developers wanting to build serious solutions who aren’t already intimately involved with Drupal. By contrast, while the Rails books are mixed in quality the best ones not only get you up and running fairly quickly, but provide a good entrance to exploring the internals and writing effective extensions. That’s not necessarily a reflection on the core projects themselves, but both online documentation (where, admittedly, Rails is still lacking) and books make a significant difference to the developer eco-system and quality of the solutions that are likely to be built.

I don’t mean to sound too negative, and I’d love to receive comments showing me how to address the shortcomings I perceive. I’ve built sites I’m proud of on top of Drupal, and where their focus is a good fit for Drupal’s core features or mainstream modules it’s been a very good solution, providing a suite of content management tools at very little cost. But as soon as there’s serious custom development to do, those advantages tend to dissipate pretty quickly when set alongside the advantages of Rails for agile, regularly upgraded and well tested solutions.


Book Review: Pro ActiveRecord

17 October 2007 (9:53 am)

By James Stewart
Filed under: Book Reviews

Right at the start of Pro Active Record the authors address a possible problem some may have with it: that there’s not enough in Active Record to warrant a full book. They point out that the basics are well covered as sections elsewhere but that this is the first book to really dig into working with legacy schema and other ‘advanced’ uses. That’s fair enough, but after reading the book I am still left with the question of why, then, they dedicate the first half to covering ActiveRecord’s most basic concepts.

Judging from postings on the rails email list, there’s certainly a lot of confusion about ActiveRecord, associations, observers, how to work with legacy table names and primary keys, and so on. But in a book with a title prefix of “Pro” I was expecting to jump straight into the nitty gritty of topics like compound/composite primary keys and performance tuning, probably with some real world examples, and maybe with a serious exploration of AR’s internals. As it is, such topics only get a quick treatment in the final chapter (the compound/composite primary keys section is a paragraph referring users to http://compositekeys.rubyforge.org).

It’s almost always instructive reading other developers’ code and it would be unfair to claim that I didn’t spot a couple of tips that may prove useful, but they were passing things. And sometimes I found myself wondering what happened to the tech review process, particularly in the coverage of the has_one association, where not only is the variable naming confusing, but they seem to be calling the each method on a single ActiveRecord instance.

I’m left wondering what the audience is for this book. The title and blurbs suggest it’s pitched at people who want to go deeper into ActiveRecord than they have before, but the content is better suited for someone with some database experience who wants to pick up ActiveRecord to write some scripts. As it is, if you’ve worked with ActiveRecord before your time will be better spent writing plugins and exploring the internals for yourself, and if you’ve not you’ll get most of the same material from a decent Rails book and some time exploring.

Disclaimer: I was sent a copy of this book for review by the publisher. You can find it at apress, amazon US, amazon UK and all sorts of other places.
