Tinkering with Atom

For a while now I’ve been talking about writing some classes to ease use of Atom in PHP, primarily as a base for an implementation of the Atom Publishing/Editing Protocol. I’ve been putting it off, partly due to time restrictions, and partly because the Atom Syndication Format isn’t quite an approved standard yet and I didn’t want to have to spend too much time keeping up with drafts.

Atom is rapidly approaching stability, and a little time over the weekend has led me to start work on some code. Rather than just write a parser for Atom, I decided to follow the lead set by Mark Pilgrim’s Universal Feed Parser, which makes working with feeds in Python a breeze, and I have begun to shape some classes that I hope will provide a similar level of flexibility and abstraction for PHP coders.

At the moment, the implementation consists of two (PHP5-only) classes. The main FeedParser class operates on either an XML document passed in as a string or a URL/filename, and then gives access to some of the feed properties (through overloading). Parsing is mainly done with PHP5’s built-in DOM support, with a little use of SimpleXML.
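To make the "string or URL/filename, then overloading" idea concrete, here is a minimal sketch. It is not the XML_FeedParser code: the class name Sketch_Feed, the "starts with <" test for telling a string from a filename, and returning null for a missing element are all assumptions made for illustration.

```php
<?php
// Hypothetical stand-in for the real parser class, for illustration only.
class Sketch_Feed
{
    private $doc;

    public function __construct($source)
    {
        $this->doc = new DOMDocument();
        // Assumption: anything that starts with "<" is raw XML,
        // anything else is treated as a URL or filename.
        if (strpos(ltrim($source), '<') === 0) {
            $ok = $this->doc->loadXML($source);
        } else {
            $ok = $this->doc->load($source);
        }
        if (!$ok) {
            throw new Exception('Could not parse feed source');
        }
    }

    // Overloading: $feed->title looks for a direct child of the
    // root element whose local name is "title" and returns its text.
    public function __get($name)
    {
        foreach ($this->doc->documentElement->childNodes as $child) {
            if ($child instanceof DOMElement && $child->localName === $name) {
                return $child->textContent;
            }
        }
        return null; // assumption: a missing element yields null
    }
}

$feed = new Sketch_Feed(
    '<feed xmlns="http://www.w3.org/2005/Atom"><title>Example</title></feed>'
);
echo $feed->title, "\n";
```

Matching on localName keeps the lookup working whether or not the feed declares a default namespace.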

The object can also be iterated over, returning each time an object representing one item from a given feed, from which elements and attributes can be retrieved. Soon I hope to add a mechanism for accessing an element by ID, as well as an improved mechanism for retrieving feeds over HTTP that will honour returned status codes, and support for feed validation. Right now, access works along the lines of:

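require_once "XML_FeedParser.php";

// $atom_feed_url holds the URL (or filename) of the feed to parse.
$parse = new XML_FeedParser($atom_feed_url);
echo "<ul>";
// Iterating over the parser yields one item object per entry.
foreach ($parse as $item) {
	// 'dc:subject' is not a valid PHP identifier, so use a variable property.
	$property = 'dc:subject';
	$subjects = $item->$property;
	// Repeated elements come back as an array, a single one as a string.
	if (is_array($subjects)) {
		$subjects = join(", ", $subjects);
	}
	$title = $item->title;
	echo "<li>$title: $subjects</li>";
}
echo "</ul>";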

Where an element is requested that occurs only once and whose only child is a text node, the text will be returned. If the element occurs multiple times, or has further elements as children, an array is returned.
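The rule above can be sketched with plain DOM calls. This is an illustration only, not the package's code: the function names feed_value and element_value are hypothetical, and the exact shape of the returned array (child name mapped to text for nested elements, a list of values for repeated elements) is my assumption about what would be useful.

```php
<?php
// Hypothetical helper: text for a text-only element, otherwise an
// array of child element name => text (assumed shape).
function element_value(DOMElement $el)
{
    $children = array();
    foreach ($el->childNodes as $child) {
        if ($child instanceof DOMElement) {
            $children[$child->nodeName] = $child->textContent;
        }
    }
    return $children ? $children : $el->textContent;
}

// Hypothetical helper: look up direct children of $parent by name.
function feed_value(DOMElement $parent, $name)
{
    $values = array();
    foreach ($parent->childNodes as $child) {
        if ($child instanceof DOMElement && $child->nodeName === $name) {
            $values[] = element_value($child);
        }
    }
    if (count($values) === 0) {
        return null; // assumption: a missing element yields null
    }
    // One occurrence: return its value directly. Several: return them all.
    return count($values) === 1 ? $values[0] : $values;
}

$doc = new DOMDocument();
$doc->loadXML(
    '<entry xmlns:dc="http://purl.org/dc/elements/1.1/">'
    . '<title>Hello</title>'
    . '<dc:subject>php</dc:subject><dc:subject>atom</dc:subject>'
    . '<author><name>Jo</name></author>'
    . '</entry>'
);
$entry = $doc->documentElement;

var_dump(feed_value($entry, 'title'));      // a string
var_dump(feed_value($entry, 'dc:subject')); // an array of two strings
var_dump(feed_value($entry, 'author'));     // an array keyed by child name
```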

Further work is currently needed to provide useful abstractions. Mapping between element names used in different formats is high on the list, as are a unified mechanism for working with dates in feeds, and a cleaner way to access certain attributes. Where multiple links are provided, I hope to provide a way of accessing them using ‘rel’ and ‘type’ attributes. I’m still deliberating whether the best way of achieving all this is to have a separate class (implementing a consistent interface) for each syndication format or careful use of conditional code within a single class.
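As a rough idea of what link access by ‘rel’ and ‘type’ might look like, here is a sketch. None of this exists in the package yet: find_link is a hypothetical name, and treating a link with no ‘rel’ attribute as "alternate" follows the Atom convention rather than anything the parser currently does.

```php
<?php
// Hypothetical helper: return the href of the first <link> that
// matches the requested rel (and type, when one is given).
function find_link(DOMElement $entry, $rel, $type = null)
{
    foreach ($entry->getElementsByTagName('link') as $link) {
        // Assumption: a missing rel attribute means "alternate".
        $link_rel = $link->hasAttribute('rel')
            ? $link->getAttribute('rel')
            : 'alternate';
        if ($link_rel !== $rel) {
            continue;
        }
        if ($type !== null && $link->getAttribute('type') !== $type) {
            continue;
        }
        return $link->getAttribute('href');
    }
    return null; // no matching link
}

$doc = new DOMDocument();
$doc->loadXML(
    '<entry>'
    . '<link type="text/html" href="http://example.org/post"/>'
    . '<link rel="edit" type="application/atom+xml" href="http://example.org/edit/1"/>'
    . '</entry>'
);
$entry = $doc->documentElement;

echo find_link($entry, 'alternate', 'text/html'), "\n";
echo find_link($entry, 'edit'), "\n";
```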

Some rudimentary support for namespaces is provided. I have used the namespace to identifier mapping from the Universal Feed Parser to allow this, meaning that if the request is for, say, dc:subject, the parser will recognise that as a Dublin Core property and search the feed appropriately. The implementation is far from perfect, but is probably the best tradeoff between usability and flexibility that can be hoped for at this stage.
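The lookup can be sketched roughly as follows. The two-entry table is only a small illustrative stand-in for the much longer mapping in the Universal Feed Parser, and find_by_prefixed_name is a hypothetical name, not a method of the package.

```php
<?php
// Illustrative subset of a namespace URI => well-known prefix table.
$known_namespaces = array(
    'http://purl.org/dc/elements/1.1/'         => 'dc',
    'http://purl.org/rss/1.0/modules/content/' => 'content',
);

// Hypothetical helper: resolve "prefix:local" through the table and
// search by namespace URI, so the prefix used in the feed is irrelevant.
function find_by_prefixed_name(DOMElement $context, $name, array $namespaces)
{
    $results = array();
    if (strpos($name, ':') === false) {
        return $results; // this sketch only handles prefixed names
    }
    list($prefix, $local) = explode(':', $name, 2);
    $uri = array_search($prefix, $namespaces);
    if ($uri === false) {
        return $results; // unknown prefix
    }
    foreach ($context->getElementsByTagNameNS($uri, $local) as $node) {
        $results[] = $node->textContent;
    }
    return $results;
}

// This feed binds Dublin Core to "dublin" rather than "dc".
$doc = new DOMDocument();
$doc->loadXML(
    '<entry xmlns:dublin="http://purl.org/dc/elements/1.1/">'
    . '<dublin:subject>php</dublin:subject>'
    . '</entry>'
);
$entry = $doc->documentElement;

print_r(find_by_prefixed_name($entry, 'dc:subject', $known_namespaces));
```

Searching by namespace URI rather than by the literal prefix is what lets a request for dc:subject succeed even when the feed author chose a different prefix.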

The other key aspect that I’m considering adding is a way to add/edit/remove items from a feed. If I do that, I’ll probably change the name of the package.

For now, I’ve packaged the work to date as a PEAR-compatible package that will be installed as XML_FeedParser. You can get the package file here or see the source code for the main class and the prototype item class (the latter URL may change as the implementation develops).