Posts tagged Parser
Exploring Ruby CSS parsers: TamTam and CSSPool
Dec 5th
In response to yesterday’s post about inlining CSS for HTML emails, I got a couple of comments suggesting alternatives to my CSS parser class. Not wanting to maintain code unless I have to, I decided to try them both and see how they worked out.
TamTam
First up is TamTam, suggested by batnight. I’d actually spotted TamTam and link-blogged it a few weeks ago, which shows how transient attention can be. TamTam is a complete solution for inlining CSS, so I should be able to replace all my code with:
```ruby
require 'rubygems'
require 'hpricot'
require 'tamtam'

# Merge the stylesheet into the markup as inline style attributes
inlined = TamTam.inline(
  :css  => File.read('/path/to/my.css'),
  :body => File.read('/path/to/my.html')
)
puts inlined
```
That looks ideal, but unfortunately as soon as I passed my own files to it, it raised an error:
```
/Library/Ruby/Gems/gems/tamtam-0.0.2/lib/tamtam.rb:77:in `apply_to': Trouble on style td#email-header-message on element <td id="email-header-message"> (Exception): can't convert nil into String
	from /Library/Ruby/Gems/gems/tamtam-0.0.2/lib/tamtam.rb:24:in `inline'
```
Digging into the library, it seems that its parsing fails if there’s a CSS rule that doesn’t match any elements in the document. That’s not a problem if your CSS file is targeted at a specific block of HTML, but if the idea is to build a set of rules that can be applied across a selection of pages or emails, it may well be.
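One way around this is to treat a rule that matches nothing as a no-op rather than an error. The sketch below is not TamTam’s code: it uses Ruby’s bundled REXML instead of Hpricot, takes rules as hypothetical `[selector, declarations]` pairs rather than parsing a stylesheet, and only understands simple `tag`, `#id`, `.class` and `tag#id` selectors.

```ruby
require 'rexml/document'

# Hypothetical selector matcher: understands only tag, #id, .class and
# combinations such as tag#id or tag.class (no descendants, no pseudo-classes).
def simple_selector_match?(el, selector)
  m = selector.strip.match(/\A([a-z][a-z0-9]*)?(?:#([\w-]+))?(?:\.([\w-]+))?\z/i)
  return false if m.nil? || m.captures.all?(&:nil?)
  tag, id, klass = m.captures
  return false if tag && el.name.casecmp(tag) != 0
  return false if id && el.attributes['id'] != id
  return false if klass && !el.attributes['class'].to_s.split.include?(klass)
  true
end

# Inline [selector, declarations] pairs into style attributes. A rule whose
# selector matches nothing is skipped instead of raising an error.
# Simplification: declarations are appended in rule order; specificity and
# the precedence of pre-existing inline styles are ignored.
def inline_css(rules, xhtml)
  doc = REXML::Document.new(xhtml)
  all_elements = REXML::XPath.match(doc, '//*')
  rules.each do |selector, declarations|
    targets = all_elements.select { |el| simple_selector_match?(el, selector) }
    next if targets.empty? # unmatched rule: a no-op, not an exception
    targets.each do |el|
      el.attributes['style'] = [el.attributes['style'], declarations].compact.join('; ')
    end
  end
  doc.to_s
end

html = '<html><body><table><tr><td id="email-header-message">Hi</td></tr></table></body></html>'
rules = [
  ['td#email-header-message', 'color: #333'],
  ['div.sidebar', 'display: none'] # matches nothing in this document
]
puts inline_css(rules, html)
```

The design choice is simply to check for an empty match set before applying a rule, so one stylesheet can be shared across emails that each use only part of it.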
CSSPool
CSSPool, suggested by Dan Kubb, is a more generic CSS parser, designed to work with Hpricot to map between CSS and HTML. To start with, I used code very much like the examples the project offers, to get a sense of the objects it gives access to:
```ruby
require 'rubygems'
require 'hpricot'
require 'csspool'

# Parse the stylesheet with CSSPool's SAC parser
sac = CSS::SAC::Parser.new
css_doc = sac.parse(File.read('/path/to/my.css'))

# Parse the HTML document that the rules will be matched against
html_doc = Hpricot.parse(File.read('/path/to/my.html'))

css_doc.find_all_rules_matching(html_doc).each do |rule|
  puts rule
end
```
Unfortunately, this failed too, with:
```
NoMethodError: undefined method `accept' for nil:NilClass
```
From a quick look at the source code, it seemed that this error occurred when a CSS selector didn’t match anything in the supplied document, just as in TamTam. It turned out to be pretty easy to patch, though, and I’ve submitted a report in their tracker.
With that done, I was able to iterate over the matching rules and access their selectors easily enough, but I haven’t yet found a way to get the parser to give me the actual CSS declarations, formatted for inclusion in the HTML. Given a rule, I can get its selector with:
```ruby
rule.selector.to_css
```
But finding out which styles the rule applies isn’t so straightforward. If anyone has an easy way to do that, I’d love to hear about it in the comments!
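The formatting half of the problem doesn’t depend on CSSPool. A minimal sketch, assuming the declarations are already available as a plain string such as `"color: red; font-weight: bold"` (getting that string out of CSSPool is exactly the open question), shows one way to turn them into a `style` attribute value. `parse_declarations` and `to_style_attribute` are hypothetical helpers, not part of CSSPool.

```ruby
# Parse "color: red; margin: 0" into an ordered hash of property => value.
# Naive: splits on ';' and on the first ':', so values that themselves
# contain ';' (for example inside url("a;b")) are not handled.
def parse_declarations(text)
  text.to_s.split(';').each_with_object({}) do |decl, acc|
    prop, value = decl.split(':', 2).map(&:strip)
    next if prop.nil? || prop.empty? || value.nil? || value.empty?
    acc[prop.downcase] = value
  end
end

# Merge stylesheet declarations with an element's existing inline style.
# The inline style wins on conflicts, as it would in a browser
# (ignoring !important).
def to_style_attribute(rule_declarations, existing_style = nil)
  merged = parse_declarations(rule_declarations).merge(parse_declarations(existing_style))
  merged.map { |prop, value| "#{prop}: #{value}" }.join('; ')
end

puts to_style_attribute('color: red; font-weight: bold', 'color: blue')
# prints: color: blue; font-weight: bold
```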
Update (later that day): Version 0.2.3 of CSSPool is now out and includes my fix.
Feed Parser: First Beta
Jan 11th
I promised more work on XML_Feed_Parser this month and am pleased to say I found the time. The first beta, version 0.3, is on its way into PEAR.
Two key developments in this version are support for the ‘content’ module in RSS2 and a fix for a serious bug in the main __call method. I was accessing the compatMap variable (which handles the mapping of element names between different syndication formats) directly and calling array_pop on it. Because array_pop removes the element it returns, that threw up problems if you wanted to access the same element multiple times, so I’ve added a temporary variable to fix that.
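The bug is easy to reproduce in miniature. The sketch is in Ruby, like the other examples on this page, although XML_Feed_Parser itself is PHP; the map’s contents and both method names are hypothetical. One difference matters: PHP copies an array when it is assigned to a new variable, so a temporary variable is enough there, whereas Ruby arrays are shared references and need an explicit `dup`.

```ruby
# Buggy: pops straight from the shared map, so every call consumes an entry.
def lookup_buggy(map, name)
  map[name].pop
end

# Fixed: pop from a copy, leaving the shared map untouched.
def lookup_fixed(map, name)
  candidates = map[name].dup
  candidates.pop
end

# Hypothetical compatibility map: element name => candidate names
map = { 'content' => ['description', 'content:encoded'] }
p lookup_fixed(map, 'content') # => "content:encoded"
p lookup_fixed(map, 'content') # => "content:encoded"
p lookup_buggy(map, 'content') # => "content:encoded"
p lookup_buggy(map, 'content') # => "description" (the first call removed an entry)
```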
The main development, however, is experimental support for the tidy library. If you think (or know) the feed you’re working with might not be valid XML, you can now choose to have tidy clean up the code before it gets handed to the DOM. Handling ill-formed XML is a much-requested feature, and initial testing of this approach has been positive.
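Ruby’s standard library has no tidy binding, so this sketch only illustrates the opt-in flow: try a strict parse, and only if the caller asked for it, run a repair pass and parse again. `naive_repair` is a hypothetical stand-in that escapes bare ampersands (one common well-formedness error in feeds); real tidy does far more.

```ruby
require 'rexml/document'

# Hypothetical stand-in for tidy: escape any '&' that does not start a
# named or numeric character reference.
def naive_repair(xml)
  xml.gsub(/&(?!(?:[a-zA-Z][\w.-]*|#\d+|#x[0-9a-fA-F]+);)/, '&amp;')
end

# Strict parse first; repair and retry only when the caller opts in.
# REXML::ParseException is a RuntimeError subclass, so this rescue covers it.
def parse_feed(xml, repair: false)
  REXML::Document.new(xml)
rescue RuntimeError
  raise unless repair
  REXML::Document.new(naive_repair(xml))
end

feed = '<rss><channel><title>AT&T news</title></channel></rss>'
doc = parse_feed(feed, repair: true)
puts REXML::XPath.first(doc, '//title').text
```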
XML_Feed_Parser 0.2.8
Dec 26th
Paying projects have kept me away from XML_Feed_Parser for the past couple of months, but today finally brought time to get a new release into shape.
The main focus has been on working through the tests that come with the Universal Feed Parser. I have a (rudimentary) script that converts those tests into PHP and makes some syntax conversions to bring them into line with my package. There’s still some hand-tweaking required to get the tests running properly, but for the time being that seems preferable to writing a full lexer.
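A minimal sketch of that kind of converter, assuming each test file carries a header comment with `Description:` and `Expect:` lines, as the Universal Feed Parser tests do. Only the simplest expectation form, `feed['key'] == u'value'`, is translated. The PHPUnit-style output and the `$feed->key` access syntax are hypothetical, and anything else returns `nil`, which is where the hand-tweaking comes in.

```ruby
# Hypothetical converter: pull the Expect: line out of a test header and
# translate feed['key'] == u'value' into a PHP assertion string.
def convert_expectation(test_xml)
  expect = test_xml[/^\s*Expect:\s*(.+?)\s*$/, 1]
  return nil unless expect
  m = expect.match(/\A(?:not bozo and )?feed\['(\w+)'\] == u'([^']*)'\z/)
  return nil unless m # too complex: leave for hand-tweaking
  key, value = m.captures
  "$this->assertEquals('#{value}', $feed->#{key});"
end

sample = <<~XML
  <!--
  Description: atom:title
  Expect:      not bozo and feed['title'] == u'Example Atom'
  -->
  <feed xmlns="http://www.w3.org/2005/Atom"><title>Example Atom</title></feed>
XML
puts convert_expectation(sample)
# prints: $this->assertEquals('Example Atom', $feed->title);
```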
With those tweaks in place, the package now passes the majority of the Atom 1.0 tests. Along the way I’ve added a few new compatibility features (guid now maps to id in Atom, and url/href now map to uri where that’s appropriate), and cleaned up the workings of the __call magic method, which does a lot of the dispatching within and between the classes.
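PHP’s `__call` has a close Ruby analogue in `method_missing`. A minimal sketch of compatibility mapping through dynamic dispatch, assuming a hypothetical entry class that stores element values in a hash; only the guid-to-id and url/href-to-uri mappings come from the post.

```ruby
# Hypothetical Atom entry wrapper that resolves element names dynamically.
class AtomEntrySketch
  # Name asked for => Atom element actually consulted
  COMPAT_MAP = { 'guid' => 'id', 'url' => 'uri', 'href' => 'uri' }.freeze

  def initialize(elements)
    @elements = elements
  end

  # Dispatch unknown zero-argument calls to stored elements, translating
  # names from other formats first; anything else raises NoMethodError.
  def method_missing(name, *args, &block)
    key = canonical(name)
    return @elements[key] if args.empty? && @elements.key?(key)
    super
  end

  def respond_to_missing?(name, include_private = false)
    @elements.key?(canonical(name)) || super
  end

  private

  def canonical(name)
    COMPAT_MAP.fetch(name.to_s, name.to_s)
  end
end

entry = AtomEntrySketch.new('id' => 'tag:example.org,2005:1', 'title' => 'Hello')
puts entry.guid  # prints tag:example.org,2005:1
puts entry.title # prints Hello
```

Pairing `method_missing` with `respond_to_missing?` keeps `respond_to?` honest about the dynamically dispatched names.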
Another major addition is the decoding of base64-encoded atom:text constructs.
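The post doesn’t say how the package detects an encoded construct. As an assumption, this sketch follows the atom:content processing model in RFC 4287 (section 4.1.3.3): a missing `type` is treated as `text`; `text`, `html`, `xhtml`, any `text/*` type and XML media types (ending in `+xml` or `/xml`) are returned as-is; any other MIME type is base64. `decode_atom_content` is a hypothetical helper and ignores the `src` attribute.

```ruby
# Return the usable body of an atom:content element given its type
# attribute, base64-decoding the opaque MIME-type case.
def decode_atom_content(type, body)
  t = (type || 'text').downcase
  case t
  when 'text', 'html', 'xhtml'
    body
  when /\+xml\z/, /\/xml\z/, /\Atext\//
    body
  else
    # Non-strict 'm' unpacking skips line breaks in the encoded text
    body.unpack1('m')
  end
end

puts decode_atom_content('application/octet-stream', "SGVsbG8s\nIHdvcmxk")
# prints: Hello, world
```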
Hopefully development will pick up speed over the next few weeks. My aim is to have a beta out some time in January.
New Feed Parser version
Oct 15th
There’s a new version (0.2.5alpha) of XML_Feed_Parser in the wild.
I’ve cleaned up the handling of xml:base considerably, finally switching over to PHP DOM’s baseURI property and checking the base for every link returned, whether it comes from a link construct or from a text construct of type ‘xhtml’. Alongside that, I’ve reworked the getText() and getContent() methods for Atom so they now properly recognise the different types the Atom spec allows. There’s a little more work to do to properly support MIME types other than text and HTML in atom:content, since I don’t want to return less common MIME types without giving the user a way to check the type beforehand. That should come in the next version.
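PHP DOM’s baseURI does this resolution for you. To show what it computes, here is a minimal sketch with Ruby’s standard `URI` library, assuming the xml:base values on the path from the root to the link’s element have already been collected, outermost first. `resolve_with_xml_base` is a hypothetical helper.

```ruby
require 'uri'

# Resolve href against the document URI and a chain of xml:base values
# (outermost first). Each base is itself resolved against the one above it;
# nil entries stand for elements without an xml:base attribute.
def resolve_with_xml_base(document_uri, xml_bases, href)
  base = xml_bases.compact.reduce(document_uri) { |acc, b| URI.join(acc, b).to_s }
  URI.join(base, href).to_s
end

puts resolve_with_xml_base('http://example.org/feed.xml', ['/blog/', 'entries/'], '2005/hello.html')
# prints: http://example.org/blog/entries/2005/hello.html
```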
I’ve had one enquiry as to whether the package will work for those not using PEAR. Its dependency on PEAR is limited to the PEAR_Exception class, so if you genuinely need to use it without PEAR, you can declare an empty class called PEAR_Exception that extends Exception and everything else should work. For those wondering whether there’s any way to get the code working with PHP4, the answer is ‘no’, since I make so much use of the PHP5 DOM implementation, plus SimpleXML and exceptions. If you need PHP4 support, check out Magpie.
XML_Feed_Parser CFV
Oct 3rd
Last night I initiated the Call For Votes on XML_Feed_Parser’s inclusion in PEAR. The package isn’t quite where I want it to be, but after several months’ work the core functionality is all in place and the first round of unit tests all pass.
So if you’re a PEAR developer… please go and vote! (Feedback is still welcome from those who aren’t.)