Posts tagged XML_Feed_Parser

XML_Feed_Parser: Handing over the reins

For the past few years I’ve been maintaining a PHP package called XML_Feed_Parser. It’s part of PEAR and attempts to offer a unified API for handling RSS and Atom feeds in your PHP code, inspired in part by projects like the Universal Feed Parser. Its parsing and API are pretty comprehensive, but lately I’ve been falling behind in managing it and there are aspects that could definitely do with some attention.

So I’m looking to hand it all over to someone with more time and energy for it than I have. Preferably someone who uses it in an active project (being primarily a Ruby developer these days, I spend a lot more time with FeedTools than with my own package). I’m going to mark the package as ‘unmaintained’, and if you want to take it on, take a look at the appropriate page in the manual.

And if you want the full story of why I’ve chosen now to make this move, it’s made fairly clear on Flickr and my other blog.

A couple of releases

In the process of catching up with some neglected tasks, I’ve pushed out new releases of both of my PEAR packages.

Services_Technorati receives a version number bump, and little else. The alpha release was never meant to last quite this long given that it’s merely a port of a very stable package, and it’s finally marked beta. My hope is that the beta release will pick up a few more users to put it through its paces.

I had wondered about adding some extra classes to encapsulate responses, but at the end of the day SimpleXML does a decent job, is well documented, and doesn’t add any overhead, so I’m happy just returning its objects and letting people work with them.
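For anyone who hasn’t worked with SimpleXML objects directly, here is a minimal sketch of what that looks like. The XML document and its element names are made up for illustration; they are not the actual Technorati response format, and no Services_Technorati method is shown.

```php
<?php
// Made-up response document, for illustration only.
$xml = '<response><item><name>Example blog</name>'
     . '<url>http://example.com/</url></item></response>';

$response = simplexml_load_string($xml);

foreach ($response->item as $item) {
    // Child elements are SimpleXMLElement objects; cast them to get the text.
    echo (string) $item->name, ' ', (string) $item->url, "\n";
}
```

Property access, iteration and string casts cover most of what a caller needs, which is why a wrapper layer adds little here.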

There are also a couple of bug fixes for the stable release of XML_Feed_Parser, kindly contributed by users. There are still a couple of outstanding tickets, but they’re issues which require more thought, so I’m postponing them to 1.0.3 or 1.1.0.

XML_Feed_Parser release delay

I’ve been rethinking a few aspects of XML_Feed_Parser following some discussion around the web, summarised in this post from Sam Ruby. Numerous aggregators appear vulnerable to attacks based on malicious HTML in the body of comments, and that includes any based on XML_Feed_Parser that do not do their own HTML filtering/output escaping.

There was a brief discussion of the issue on the PEAR email list and I’ve decided to change the package’s default behaviour. In the spirit of PEAR, I’m going to make use of HTML_Safe to process any HTML or text content in the feed before returning it. There will be extra methods to access the raw content, but that will be a deliberate extra step so that people know they’re potentially getting dangerous content.
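The “safe by default, raw on request” idea can be sketched roughly as follows. The class and method names are hypothetical, not the package’s API, and htmlspecialchars() is only a stand-in: it escapes all markup, whereas a filter such as HTML_Safe removes the dangerous parts and keeps the rest.

```php
<?php
// Hypothetical sketch; names are not taken from XML_Feed_Parser.
class EntryContent
{
    private $raw;

    public function __construct($raw)
    {
        $this->raw = $raw;
    }

    // Default accessor: escaped, so it is inert when echoed into a page.
    // (Stand-in for a real HTML filter.)
    public function content()
    {
        return htmlspecialchars($this->raw, ENT_QUOTES, 'UTF-8');
    }

    // Explicit opt-in for the unfiltered markup.
    public function rawContent()
    {
        return $this->raw;
    }
}

$entry = new EntryContent('<script>alert(1)</script>');
echo $entry->content(), "\n"; // &lt;script&gt;alert(1)&lt;/script&gt;
```

The point of the design is that the short, obvious call is the safe one, and the caller has to type something different to get content that needs their own filtering.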

HTML_Safe is currently in beta, but the developers tell me there will be a stable release within the next few weeks. That means XML_Feed_Parser won’t be stable until HTML_Safe is stable, but I think in the long run that’s worthwhile as it’ll lead to more secure applications.

XML_Feed_Parser Release Candidate

The first release of XML_Feed_Parser in six months is out the door, and it’s the first (and hopefully only) release candidate. I’ve had several people email me with questions about the package in the past few weeks, most of whom are using it successfully and wanted to see a stable release soon, so it seemed time to get moving on that.

This release fixes a few small bugs, mostly related to the packaging rather than to its operation, and I’m now bundling the various Relax NG schemas used for validation to save on HTTP requests. There is one open ticket that I’ll need to attend to, and then all should be ready for 1.0.

I’ve had a few requests for improvements to the package, mainly relating to better handling of extensions, such as those beginning to appear for Atom, but I decided to save those for a later (1.1?) release. If anyone would like to join me as a developer to work on those, I’d definitely welcome more involvement.

Feed Parser: First Beta

I promised more work on XML_Feed_Parser this month and am pleased to say I found the time. The first beta, version 0.3, is on its way into PEAR.

Two key developments in this version are support for the ‘content’ module in RSS2 and a fix for a serious bug in the main __call method. I was accessing the compatMap variable (which handles the mapping of element names between different syndication formats) directly and calling array_pop on it. Because array_pop removes the element from the array it is given, that caused problems if you wanted to access the same element multiple times, so I now copy the map entry into a temporary variable first.
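A minimal sketch of that kind of bug and its fix follows. Only __call, compatMap and array_pop come from the description above; the class names and the map contents are hypothetical and much simpler than the real package.

```php
<?php
// Hypothetical stand-ins; not the actual XML_Feed_Parser source.
class BuggyFeed
{
    // Maps a unified method name to candidate element names.
    protected $compatMap = array('title' => array('title'));

    public function __call($name, $args)
    {
        // array_pop() takes its argument by reference, so this
        // shortens the stored map entry on every call.
        return array_pop($this->compatMap[$name]);
    }
}

class FixedFeed extends BuggyFeed
{
    public function __call($name, $args)
    {
        // PHP arrays are copied on assignment; popping from the
        // temporary leaves the stored map untouched.
        $candidates = $this->compatMap[$name];
        return array_pop($candidates);
    }
}

$buggy = new BuggyFeed();
var_dump($buggy->title()); // string(5) "title"
var_dump($buggy->title()); // NULL: the map entry is now empty

$fixed = new FixedFeed();
var_dump($fixed->title()); // string(5) "title"
var_dump($fixed->title()); // string(5) "title"
```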

The main development, however, is experimental support for the tidy library. If you think (or know) the feed you’re working with might not be well-formed XML, you can now choose to have tidy clean up the markup before it gets handed to the DOM. Handling ill-formed XML is a much-requested feature, and initial testing of this solution yields positive results.
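The general approach can be sketched like this, assuming PHP’s tidy and DOM extensions are available. The loadFeedXml() helper and its $useTidy flag are hypothetical and do not reflect how the package exposes the option; how much tidy can actually repair depends on the input.

```php
<?php
// Hypothetical helper; not part of XML_Feed_Parser's API.
function loadFeedXml($xml, $useTidy = false)
{
    if ($useTidy && function_exists('tidy_repair_string')) {
        // 'input-xml' and 'output-xml' tell tidy to treat the
        // markup as XML rather than HTML.
        $config = array('input-xml' => true, 'output-xml' => true);
        $repaired = tidy_repair_string($xml, $config, 'utf8');
        if (is_string($repaired)) {
            $xml = $repaired;
        }
    }

    if ($xml === '') {
        return null; // nothing to parse
    }

    $dom = new DOMDocument();
    // Collect libxml errors quietly; loadXML() returns false on failure.
    $previous = libxml_use_internal_errors(true);
    $ok = $dom->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok ? $dom : null;
}
```

Keeping the tidy pass optional means well-formed feeds go to the DOM exactly as they arrived, and only callers who ask for the clean-up pay for it.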