a work on process

Viewing posts in category: XML

XML_Feed_Parser: Now in PEAR

11 October 2005 (10:01 am)

By James Stewart
Filed under: Announcements, XML

On Monday morning I received notice that XML_Feed_Parser had been accepted into PEAR. The voting process brought a few useful comments, and the latest version addresses several of them, including by adding a custom Exception class.

You can now find the package’s homepage on the PEAR site and it is also stored in the PEAR CVS repository for those who want to get the latest development version or check the source before downloading. PEAR users can get the latest version with:

% pear install XML_Feed_Parser-alpha

I’m hoping to start adding test cases that use character encodings other than UTF-8 and ISO-8859-1 very soon.


Feed Parser: Unit Testing Results

22 September 2005 (8:37 am)

By James Stewart
Filed under: XML

We’ve been away for the past week, and time on planes and trains allowed me to focus on some simple unit tests for XML_Feed_Parser. I based the tests on the sample feeds provided in each format’s specification and was able to fix numerous bugs that the tests exposed. The only known problem remaining is in the handling of XML entities.

The tests also allowed me to do some refactoring, identifying a few methods that could be eliminated or merged into the parent class since they were the same across all feed types.

The new version is available at the usual location.


Tinkering with Atom

5 June 2005 (12:30 pm)

By James Stewart
Filed under: Announcements, XML

For a while now I’ve been talking about writing some classes to ease use of Atom in PHP, primarily as a base for an implementation of the Atom Publishing/Editing Protocol. I’ve been putting it off, partly due to time restrictions, and partly because the Atom Syndication Format isn’t quite an approved standard yet and I didn’t want to have to spend too much time keeping up with drafts.

Atom is rapidly approaching stability, and a little time over the weekend has led me to start work on some code. Rather than just write a parser for Atom, I decided to follow the lead set by Mark Pilgrim’s Universal Feed Parser, which makes working with feeds in Python a breeze. I have begun to shape some classes that I hope will provide a similar level of flexibility and abstraction for PHP coders.

At the moment, the implementation consists of two (PHP5-only) classes. The main FeedParser class operates on either an XML document passed in as a string or a URL/filename, and then gives access to some of the feed’s properties (through overloading). Parsing is mainly done with PHP5’s built-in DOM support, with a little use of SimpleXML.
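As a rough illustration of what property access through overloading on top of DOM can look like, here is a minimal sketch. The class name FeedSketch and its behaviour are assumptions made for this example, not the actual FeedParser source, and it only accepts the XML as a string.

```php
<?php
// Hypothetical class for illustration only; not the real FeedParser.
class FeedSketch
{
    private $doc;

    public function __construct($xml)
    {
        // Parse the XML string with PHP's DOM extension.
        $this->doc = new DOMDocument();
        $this->doc->loadXML($xml);
    }

    // Overloading: reading an undeclared property such as $feed->title
    // is routed here, so element names never need declaring up front.
    public function __get($name)
    {
        foreach ($this->doc->documentElement->childNodes as $child) {
            if ($child instanceof DOMElement && $child->nodeName === $name) {
                return $child->textContent;
            }
        }
        return null; // no direct child element with that name
    }
}
```

With this in place, `$feed->title` reads the text of the first top-level title element, and a name that does not occur in the feed yields null.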

The object can also be iterated over, with each step returning an object that represents one item from the feed and from which elements and attributes can be retrieved. Soon I hope to add a mechanism for accessing an element by ID, an improved mechanism for retrieving feeds over HTTP that will honour returned status codes, and support for feed validation. Right now, access works along the lines of:

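require_once "XML_FeedParser.php";

// Accepts a URL, a filename, or the feed XML as a string.
$parse = new XML_FeedParser($atom_feed_url);
echo "<ul>";
// Iterating over the parser yields one object per feed item.
foreach ($parse as $item) {
	// Namespaced names contain a colon, so go through a variable property.
	$property = 'dc:subject';
	$subjects = $item->$property;
	// Repeated elements come back as an array; flatten them for display.
	if (is_array($subjects)) {
		$subjects = join(", ", $subjects);
	}
	$title = $item->title;
	echo "<li>$title: $subjects</li>";
}
echo "</ul>";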

Where an element is requested that occurs only once and whose only child is a text node, the text will be returned. If the element occurs multiple times, or has further elements as children, an array is returned.
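The rule can be sketched with plain DOM calls. The function name feedValue is hypothetical, and the shape of the returned arrays (a list of text values for repeated elements, a name-to-text map for an element with child elements) is an assumption made for this example, since the rule itself only says that an array comes back.

```php
<?php
// Hypothetical helper illustrating the return rule; not the package's code.
function feedValue(DOMElement $parent, $name)
{
    // Collect the direct child elements with the requested name.
    $matches = [];
    foreach ($parent->childNodes as $child) {
        if ($child instanceof DOMElement && $child->nodeName === $name) {
            $matches[] = $child;
        }
    }
    if (count($matches) === 0) {
        return null;
    }
    if (count($matches) === 1) {
        $only = $matches[0];
        // Single occurrence whose only child is a text node: return the text.
        if ($only->childNodes->length === 1 && $only->firstChild instanceof DOMText) {
            return $only->textContent;
        }
        // Single occurrence with child elements: assumed name => text map.
        $children = [];
        foreach ($only->childNodes as $child) {
            if ($child instanceof DOMElement) {
                $children[$child->nodeName] = $child->textContent;
            }
        }
        return $children;
    }
    // Multiple occurrences: assumed list of each element's text.
    $values = [];
    foreach ($matches as $match) {
        $values[] = $match->textContent;
    }
    return $values;
}
```

Callers that want to treat both cases uniformly can test the result with is_array(), as the loop in the earlier example does.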

Further work is currently needed to provide useful abstractions. Mapping between element names used in different formats is high on the list, as are a unified mechanism for working with dates in feeds, and a cleaner way to access certain attributes. Where multiple links are provided, I hope to provide a way of accessing them using ‘rel’ and ‘type’ attributes. I’m still deliberating whether the best way of achieving all this is to have a separate class (implementing a consistent interface) for each syndication format or careful use of conditional code within a single class.
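One possible shape for link selection by ‘rel’ and ‘type’ is sketched below. The function findLink and its signature are assumptions for illustration and nothing here reflects a decided design. It relies on the Atom convention that a link element with no rel attribute is treated as rel="alternate".

```php
<?php
// Hypothetical helper: returns the href of the first matching <link>, or null.
function findLink(DOMElement $entry, $rel = 'alternate', $type = null)
{
    foreach ($entry->childNodes as $child) {
        if (!($child instanceof DOMElement) || $child->localName !== 'link') {
            continue;
        }
        // Atom treats a missing rel attribute as "alternate".
        $childRel = $child->hasAttribute('rel') ? $child->getAttribute('rel') : 'alternate';
        if ($childRel !== $rel) {
            continue;
        }
        // Only filter on type when the caller asked for one.
        if ($type !== null && $child->getAttribute('type') !== $type) {
            continue;
        }
        return $child->getAttribute('href');
    }
    return null;
}
```

Calling `findLink($entry)` would then pick the main alternate link, while `findLink($entry, 'enclosure', 'audio/mpeg')` would pick an attached audio file if one is present.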

There is some rudimentary support for namespaces. I have used the namespace-to-identifier mapping from the Universal Feed Parser to allow this, meaning that if the request is for, say, dc:subject, the parser will recognise that as a Dublin Core property and search the feed appropriately. The implementation is far from perfect, but is probably the best tradeoff between usability and flexibility that can be hoped for at this stage.
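A minimal sketch of that kind of lookup follows, assuming a small hand-written prefix table rather than the full Universal Feed Parser mapping. The function lookupQualified and the variable $knownNamespaces are hypothetical names. Resolving the prefix to a namespace URI first means the lookup still works when a feed binds the same namespace to a different prefix.

```php
<?php
// Hypothetical prefix table; only two well-known namespaces are listed.
$knownNamespaces = [
    'dc'      => 'http://purl.org/dc/elements/1.1/',
    'content' => 'http://purl.org/rss/1.0/modules/content/',
];

// Hypothetical helper: resolve "prefix:local" through the table and
// return the text of every matching descendant element.
function lookupQualified(DOMElement $context, $request, array $namespaces)
{
    if (strpos($request, ':') === false) {
        return []; // not a prefixed name
    }
    list($prefix, $local) = explode(':', $request, 2);
    if (!isset($namespaces[$prefix])) {
        return []; // unknown prefix
    }
    $values = [];
    // Match on namespace URI, not on the prefix the feed happens to use.
    foreach ($context->getElementsByTagNameNS($namespaces[$prefix], $local) as $node) {
        $values[] = $node->textContent;
    }
    return $values;
}
```

A request for dc:subject would therefore also match elements written as, for example, dublin:subject, provided the feed binds that prefix to the Dublin Core namespace URI.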

The other key aspect that I’m considering adding is a way to add/edit/remove items from a feed. If I do that, I’ll probably change the name of the package.

For now, I’ve packaged the work to date as a PEAR-compatible package that will be installed as XML_FeedParser. You can get the package file here or see the source code for the main class and the prototype item class (the latter URL may change as the implementation develops).
