Inspired by Sam Ruby’s work on applying the Universal Feed Parser tests to the Ruby FeedTools, I’ve spent a little time this afternoon working on testing XML_Feed_Parser with that same test suite. There’s a lot of work to do!

UFP’s tests consist of a series of feed files, some well-formed and some ill-formed, each with a description and a test condition defined at the top of the file, e.g.:


Example description
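For illustration, here is a minimal Python sketch of pulling that header out of a test file. It assumes the description and condition sit in an XML comment as `Description:` and `Expect:` lines; the `read_header` helper and the `SAMPLE` feed are made up for this example and are not part of either parser.

```python
import re

# Assumed layout: an XML comment holding "Description:" and "Expect:" lines.
HEADER = re.compile(
    r"<!--\s*Description:\s*(?P<description>.*?)\s*"
    r"Expect:\s*(?P<expect>.*?)\s*-->",
    re.DOTALL,
)


def read_header(text):
    """Return (description, expect) from a test file's header, or None."""
    match = HEADER.search(text)
    if match is None:
        return None
    return match.group("description"), match.group("expect")


# A made-up test file for illustration, not copied from the suite.
SAMPLE = """<!--
Description: Example description
Expect:      not bozo and feed['title'] == u'Example'
-->
<feed xmlns="http://www.w3.org/2005/Atom"><title>Example</title></feed>
"""

if __name__ == "__main__":
    print(read_header(SAMPLE))
```

Files without such a comment simply yield `None`, so they can be skipped or reported separately.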

So far all I’ve done is run a script through all the tests for well-formed feeds, checking whether XML_Feed_Parser throws an exception when I try to interpret them. When run against the current CVS, 1181 of the 1273 feeds parsed successfully and 92 failed. 68 of those failures were due to encoding problems (which I’ll try to work around, but won’t be able to fix cleanly until PHP has full Unicode support), and another 17 were a result of not supporting CDF, leaving seven that I need to get fixed ASAP.
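The shape of that kind of smoke test is simple enough to sketch. The following is a rough Python outline, not the script actually used: `tally` and `parse_file` are hypothetical names, and `xml.dom.minidom` stands in for the feed parser under test, so it only catches XML that is not well-formed.

```python
import sys
import xml.dom.minidom
from pathlib import Path


def tally(items, parse):
    """Call parse(item) for each item; return (passed, failed) lists.

    Any exception counts as a failure and is recorded by its class name.
    """
    passed, failed = [], []
    for item in items:
        try:
            parse(item)
        except Exception as exc:
            failed.append((item, type(exc).__name__))
        else:
            passed.append(item)
    return passed, failed


def parse_file(path):
    # Stand-in for the real feed parser: only checks well-formedness.
    xml.dom.minidom.parse(str(path))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python smoke.py path/to/test/directory
    files = sorted(Path(sys.argv[1]).rglob("*.xml"))
    passed, failed = tally(files, parse_file)
    print(f"{len(passed)} of {len(files)} parsed, {len(failed)} failed")
    for path, reason in failed:
        print(f"  {path}: {reason}")
```

Recording the exception class alongside each failing file makes it easier to sort the failures into groups, such as encoding problems versus unsupported formats.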

The next stage will be to translate the ‘Expect:’ values into something I can use in a PHP test case. I’ve done a little searching for a Python lexer for PHP, but aside from this embedded interpreter that hasn’t had a release in nearly three years, I haven’t found one. Lacking the time to write such a beast myself, I suspect I’ll simply put together a series of regexps to do the necessary translation.
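To give an idea of what a regexp-based translation might look like, here is a rough sketch written in Python. Everything in it is an assumption made for illustration: the `translate` function, the list of top-level names, and the PHP variables they map to (`$bozo`, `$feed` and so on) are invented and do not reflect XML_Feed_Parser’s API. String literals are split out first so that words like "and" inside a title are left alone.

```python
import re

# Matches a Python string literal, with an optional u prefix.
STRING = re.compile(r"""(u?'(?:\\.|[^'\\])*'|u?"(?:\\.|[^"\\])*")""")

# Rewrites applied only outside string literals.
RULES = [
    (re.compile(r"\bnot\s+"), "!"),
    (re.compile(r"\band\b"), "&&"),
    (re.compile(r"\bor\b"), "||"),
    # Hypothetical mapping of top-level names to PHP variables.
    (re.compile(r"\b(bozo|feed|entries|version|encoding)\b"), r"$\1"),
]


def translate(expect):
    """Turn a simple Python 'Expect:' expression into PHP-like syntax."""
    pieces = STRING.split(expect)
    out = []
    for index, piece in enumerate(pieces):
        if index % 2 == 1:
            # Odd pieces are string literals: just drop the u prefix.
            out.append(piece[1:] if piece.startswith("u") else piece)
        else:
            for pattern, replacement in RULES:
                piece = pattern.sub(replacement, piece)
            out.append(piece)
    return "".join(out)


if __name__ == "__main__":
    print(translate("not bozo and feed['title'] == u'Example Atom'"))
    # prints: !$bozo && $feed['title'] == 'Example Atom'
```

This only copes with the simplest conditions. Escape sequences inside strings, function calls, tuples and anything else that differs between the two languages would each need further rules, which is where a real lexer would earn its keep.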

Of course, XML_Feed_Parser’s API differs in quite a number of ways from that of the Universal Feed Parser, and so quite a few of those tests, unadjusted, would fail. As Sam points out, there would be numerous advantages to (roughly) sharing an API with the Universal Feed Parser, particularly in allowing programmers to switch easily between languages, and in the fact that the documentation already written would also apply to XML_Feed_Parser, which is (as yet) undocumented. I’m going to spend some time thinking through the implications of adjusting the API to fit more closely, but I’d love input on how far I should go. Is it worth breaking backwards compatibility?