Posts tagged content management

Converting HTML to Textile with Ruby

One of the many tricky decisions to be made when building content management tools is how to allow users to control the basic formatting of their input without breaking your carefully crafted layouts or injecting nasty hacks into your pages. One approach has long been to provide your own markup language. Instead of allowing users to write HTML, let them use bbcode, or markdown, or textile, which have more controlled vocabularies and rules that mean it’s much less likely that problems will occur.

Textile in particular has a nice simple syntax and is increasingly popular thanks to its adoption in products like those of 37signals. In Ruby, there’s the RedCloth library which makes it fast and easy to convert textile to HTML. The one problem is if you already have a body of user generated HTML in your legacy system that needs converting. That’s the situation I found myself in this week and I quickly needed a tool to translate the content so that I could get on with the more interesting parts of the system.

Searching for options, I found the ClothRed library, which offers some translation but doesn’t handle important elements like links. I considered patching it to handle the elements I need, but in the end I decided to take a different approach and used the SGML parsing library found here to port a python html2textile parser.

Porting code from python to ruby is a pretty straightforward process as the languages are so similar on a number of levels, but there were several issues to work through, particularly relating to scoping, and quite a few methods to change to make them feel a little more ruby-ish. I’ve not converted all of the entity handling as I didn’t really need it, but there might be a bit of work to do in making sure character set issues are properly taken care of.
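To give a flavour of the parser-with-tag-handlers approach, here is a minimal sketch. It is not the ported code itself: the class name TinyHtmlToTextile and its methods are made up for illustration, a small StringScanner loop stands in for the SGML parsing library, and only bold, italic, links and paragraph breaks are handled. Entities, nesting rules and character sets are ignored entirely.

```ruby
require 'strscan'

# Hypothetical, much-reduced converter. A hand-rolled tag scanner stands in
# for the SGML parsing library, and only a handful of elements are handled.
class TinyHtmlToTextile
  # HTML inline elements mapped to the Textile marker that wraps them.
  INLINE = { 'strong' => '*', 'b' => '*', 'em' => '_', 'i' => '_' }.freeze

  def convert(html)
    @out = +''   # output buffer, reset on every call
    @href = nil  # href of the link currently being written, if any
    scanner = StringScanner.new(html)
    until scanner.eos?
      if scanner.scan(%r{<(/?)(\w+)([^>]*)>})
        closing = scanner[1]
        name = scanner[2].downcase
        attrs = scanner[3]
        closing.empty? ? start_tag(name, attrs) : end_tag(name)
      elsif (text = scanner.scan(/[^<]+/))
        @out << text
      else
        @out << scanner.getch # a stray "<" that does not open a tag
      end
    end
    @out.strip
  end

  private

  # Called for every opening tag; unknown tags are silently dropped.
  def start_tag(name, attrs)
    if INLINE.key?(name)
      @out << INLINE[name]
    elsif name == 'a'
      @href = attrs[/href\s*=\s*["']([^"']*)["']/i, 1]
      @out << '"'
    end
  end

  # Called for every closing tag.
  def end_tag(name)
    if INLINE.key?(name)
      @out << INLINE[name]
    elsif name == 'a'
      @out << "\":#{@href}" # Textile link syntax: "text":url
      @href = nil
    elsif name == 'p'
      @out << "\n\n"        # blank line between paragraphs
    end
  end
end

html = '<p>Read <strong>this</strong> <a href="http://example.com/">page</a></p><p>Then <em>stop</em></p>'
puts TinyHtmlToTextile.new.convert(html)
# Read *this* "page":http://example.com/
#
# Then _stop_
```

Keeping one start handler and one end handler means supporting another element is mostly a matter of adding a branch, which is roughly the appeal of an event-driven parser over a pile of regular expression substitutions.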

The end result is a piece of code that’s now served its purpose and that I’m unlikely to need again for quite a while. It’s not something that I’m particularly proud of, and it could almost certainly be implemented more neatly, but I thought I’d throw it out there in case it could be useful to someone else. Should you be inspired to take it and twist it and turn it into a well-heeled, more robust and properly distributable solution, feel free, but please let me know so that at the very least I can update this entry.

Grab the code here or view it here.

UPDATE (March ’09): I’ve moved the code to gist.github.com as past.ie seems rather unreliable these days.

GOOD Magazine

Yesterday we pushed the button and launched the new version of GOOD Magazine, a site I’ve been working on for the past couple of months along with the folks at Area17. It’s a relatively large Ruby on Rails system built on top of the ORGware system (refitted as an engine) and supported with a caching system built for Madame Figaro.

Most of my work has been under the hood, but where I’ve touched the frontend I’ve tried to make use of microformats and other good practices. We’re providing a range of atom feeds (which will become easier to find over time as we make some refinements) but eventually I hope that we’ll have all listing pages using hAtom so that anything can be a feed.

The project’s been yet another reminder of the importance of a good test suite. Because we’re using an engine, most of the test code has been in the form of integration tests (functional tests don’t automatically pick up the code from the engine), but I soon lost count of the number of bugs that the tests helped me squish quickly, let alone those they prevented in the first place.

It’s always good when the placeholder content is interesting, and I’ve been enjoying reading the content from the magazine as I’ve worked on the site. Check it out at www.goodmagazine.com. I have to go and push out another update.

Content management with subversion

A recent comment reminded me of an old entry proposing yet another project I never had time to follow through with: Using Trac and Subversion with Social Documents. The idea there was to make use of subversion’s utility for version control and trac’s existing frontend for browsing that to present versioned documents.

In hindsight, I don’t think trac would actually be a good frontend for this unless the intended audience was entirely techies. Trac works for those of us who use it every day to follow a variety of projects, and its ability to combine a wiki with version control of the ‘official’ versions of documents provides some interesting ideas, but the interface just wouldn’t work.

But even if I’m unlikely to get time to play with it, I’m still interested in the idea of using subversion as the core for content management. It seems a sensible application of “small pieces loosely joined” to use a proven version control system as one layer in a system. So I’ll be interested to follow Bob DuCharme’s work to use subversion for Digital Asset Management in a CMS.

Bob’s looking into using svn’s ability to store arbitrary metadata to hold RDF relating to each revision, and exploring how its hook mechanism can be employed to make it all work. As ever the proof will be in the interface, but the underlying principles definitely deserve exploration.