a work on process

Viewing posts tagged: RSS

Feed Parser: First Beta

11 January 2006 (3:10 pm)

By James Stewart
Filed under: Announcements

I promised more work on XML_Feed_Parser this month and am pleased to say I found the time. The first beta, version 0.3, is on its way into PEAR.

Two key developments in this version are support for the ‘content’ module in RSS2 and a fix for a serious bug in the main __call method. I was accessing the compatMap variable (which handles the mapping of element names between different syndication formats) directly and calling array_pop on it. Because array_pop removes the last element from the array it is given, that caused problems if you wanted to access the same element multiple times, so I’ve added a temporary variable and pop from that instead.
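A minimal sketch of this class of bug, written in Python for illustration (the package itself is PHP). The class, method names and map contents are hypothetical and are not the package's real API. Python lists are shared by reference, so the temporary copy has to be made explicitly.

```python
class Entry:
    """Hypothetical stand-in for a parser class; not the package's real API."""

    def __init__(self, data):
        self.data = data
        # Requested name -> candidate element names, preferred one last (assumed layout).
        self.compat_map = {"guid": ["guid", "id"]}

    def get_buggy(self, name):
        names = self.compat_map[name]        # the very list stored in the map
        return self.data.get(names.pop())    # pop() shrinks the stored list

    def get_fixed(self, name):
        names = list(self.compat_map[name])  # temporary copy
        return self.data.get(names.pop())    # the stored list is untouched


entry = Entry({"id": "urn:example:1"})
first = entry.get_buggy("guid")
second = entry.get_buggy("guid")
print(first, second)  # urn:example:1 None
```

With get_fixed, repeated lookups of the same element keep returning the same value because the stored mapping is never modified.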

The main development, however, is experimental support for the tidy library. If you think (or know) that the feed you’re working with might not be well-formed XML, you can now choose to have tidy clean up the code before it gets handed to the DOM. Handling ill-formed XML is a much-requested feature, and initial testing of this solution has yielded positive results.
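The opt-in "clean up, then hand to the DOM" flow can be sketched as follows, in Python for illustration. This does not use tidy: escape_bare_ampersands is a toy repair step standing in for it, and parse_feed and its repair parameter are hypothetical names.

```python
import re
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def escape_bare_ampersands(text):
    """Toy repair step (a stand-in for tidy): escape '&' that does not start an entity."""
    return re.sub(r"&(?!#?\w+;)", "&amp;", text)


def parse_feed(text, repair=None):
    """Optionally run a repair step over the raw text, then hand it to the DOM parser."""
    if repair is not None:
        text = repair(text)
    return minidom.parseString(text)


broken = "<rss><channel><title>Tom & Jerry</title></channel></rss>"

try:
    parse_feed(broken)                      # strict parsing rejects the bare '&'
except ExpatError as error:
    print("strict parse failed:", error)

doc = parse_feed(broken, repair=escape_bare_ampersands)
print(doc.getElementsByTagName("title")[0].firstChild.data)
```

Keeping the repair step optional means well-formed feeds are still parsed exactly as delivered.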

XML_Feed_Parser 0.2.8

26 December 2005 (12:26 pm)

By James Stewart
Filed under: Announcements

Paying projects have kept me away from XML_Feed_Parser for the past couple of months, but today finally brought time to get a new release into shape.

The main focus has been on working through the tests that come with The Universal Feed Parser. I have a (rudimentary) script that converts those tests into PHP and makes some syntax conversions to bring them into line with my package. There’s still some hand-tweaking required to get the tests running properly, but for the time being that seems preferable to writing a full lexer.

With those tweaks in place, the package now passes the majority of the Atom 1.0 tests. Along the way I’ve added a few new compatibility features (guid now maps to id in Atom, and in some cases uses of url/href will automatically map to uri where that’s appropriate) and cleaned up the workings of the __call magic function that does a lot of the dispatching within and between the classes.

Another major addition is the decoding of base64-encoded atom:text constructs.
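As a rough illustration of that decoding step, here is a small Python sketch. The function name and the is_base64 flag are hypothetical, and the sketch assumes the decoded payload is UTF-8 text; how the package detects the encoding is not shown here.

```python
import base64


def decode_text_construct(raw, is_base64):
    """Return the text of a construct, decoding it first if it is flagged as base64."""
    if not is_base64:
        return raw
    # b64decode ignores line breaks inside the payload by default.
    return base64.b64decode(raw).decode("utf-8")  # assumes a UTF-8 payload


print(decode_text_construct("SGVsbG8=", is_base64=True))   # Hello
print(decode_text_construct("Hello", is_base64=False))     # Hello
```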

Hopefully development will pick up speed over the next few weeks. My aim is to have a beta out some time in January.

Inspired by Sam Ruby’s work on applying the Universal Feed Parser tests to the Ruby FeedTools, I’ve spent a little time this afternoon working on testing XML_Feed_Parser with that same test suite. There’s a lot of work to do!

UFP’s tests consist of a series of feed files, some well-formed and some ill-formed, each with a description and a test condition defined at the top of the file, e.g.

<!--
Description: channel description
Expect:      not bozo and feed['description'] == u'Example description'
-->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns="http://purl.org/rss/1.0/">
<channel rdf:about="http://example.com/index.rdf">
<description>Example description</description>
</channel>
</rdf:RDF>
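Pulling the two header fields out of a test file like the one above can be sketched in Python with a single regular expression. The function name is hypothetical, and the pattern assumes the header always lists Description before Expect, as in this example.

```python
import re

# Matches the leading comment and captures the two fields (assumed order).
HEADER_RE = re.compile(
    r"<!--\s*Description:\s*(?P<description>.*?)\s*"
    r"Expect:\s*(?P<expect>.*?)\s*-->",
    re.DOTALL,
)


def read_test_header(feed_text):
    """Return (description, expect) from the comment at the top of a test feed."""
    match = HEADER_RE.search(feed_text)
    if match is None:
        raise ValueError("no Description/Expect header found")
    return match.group("description"), match.group("expect")


sample = """<!--
Description: channel description
Expect:      not bozo and feed['description'] == u'Example description'
-->
<rdf:RDF xmlns="http://purl.org/rss/1.0/"/>
"""
description, expect = read_test_header(sample)
print(description)
print(expect)
```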

So far all I’ve done is run a script over all the tests for well-formed feeds, checking whether XML_Feed_Parser throws an exception when I try to interpret them. When run against the current CVS, 1181 of the 1273 feeds parsed successfully and 92 failed. Of those failures, 68 were due to encoding problems (which I’ll try to work around, but won’t be able to fix cleanly until PHP has full Unicode support) and another 17 were a result of not supporting CDF, leaving seven that I need to fix as soon as possible.
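A smoke test of this kind (does parsing throw or not?) might look like the following Python sketch. The smoke_test function is hypothetical, the feeds are passed in as a name-to-text mapping to keep the example self-contained, and Python's minidom stands in for the real parser.

```python
from xml.dom import minidom


def smoke_test(feeds, parse):
    """Try to parse every feed; return (names that parsed, {name: error type})."""
    passed, failed = [], {}
    for name, text in feeds.items():
        try:
            parse(text)
        except Exception as error:            # any exception counts as a failure
            failed[name] = type(error).__name__
        else:
            passed.append(name)
    return passed, failed


feeds = {"ok.xml": "<rss/>", "bad.xml": "<rss>"}
passed, failed = smoke_test(feeds, minidom.parseString)
print(len(passed), "parsed,", len(failed), "failed")  # 1 parsed, 1 failed
```

Recording the exception type for each failure makes it easier to sort failures into groups such as encoding problems or unsupported formats.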

The next stage will be to translate the ‘Expect:’ values into something I can use in a PHP test case. I’ve done a little searching for a Python lexer for PHP, but aside from an embedded interpreter that hasn’t had a release in nearly three years, I haven’t found one. Lacking the time to write such a beast myself, I suspect I’ll simply put together a series of regexps to do the necessary translation.
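A regexp-based translation of that sort could start out like the Python sketch below. The target PHP form ($feed->name) is an assumption made purely for illustration and is not necessarily what the package's tests use. The three rules only cover expressions shaped like the example above.

```python
import re


def expect_to_php(expect):
    """Translate a simple 'Expect:' expression into a PHP-style comparison string."""
    php = re.sub(r"^not bozo and\s+", "", expect)          # drop the well-formedness check
    php = re.sub(r"\bu(?=['\"])", "", php)                 # u'...' -> '...'
    php = re.sub(r"\bfeed\['(\w+)'\]", r"$feed->\1", php)  # feed['x'] -> $feed->x (assumed form)
    return php


print(expect_to_php("not bozo and feed['description'] == u'Example description'"))
# $feed->description == 'Example description'
```

Anything more complex, such as nested lookups or Python-only functions, would still need hand-tweaking or a proper lexer.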

Of course, XML_Feed_Parser’s API differs in quite a number of ways from that of the Universal Feed Parser, so quite a few of those tests would fail if left unadjusted. As Sam points out, there would be numerous advantages to (roughly) sharing an API with the Universal Feed Parser: programmers could switch between languages easily, and the documentation already written would also apply to XML_Feed_Parser, which is (as yet) undocumented. I’m going to spend some time thinking through the implications of adjusting the API to fit more closely, but I’d love input on how far I should go. Is it worth breaking backwards compatibility?

New Feed Parser version

15 October 2005 (10:10 am)

By James Stewart
Filed under: Announcements

There’s a new version (0.2.5alpha) of XML_Feed_Parser in the wild.

I’ve cleaned up the handling of xml:base considerably, finally switching over to PHP DOM’s baseURI attribute and checking the base for every link returned, whether from link constructs or from text constructs with the type ‘xhtml’. That coincides with reworking the getText() and getContent() methods for Atom so they now properly recognise the different types that the Atom spec allows. There’s a little more work to do to properly allow MIME types other than text/html in atom:content, since I don’t want to return less common MIME types without giving the user a way to check the type beforehand. That should come in the next version.
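The effect of xml:base on a returned link can be illustrated with a short Python sketch. The function name and example URLs are hypothetical. It assumes the caller has already collected the xml:base values that apply to the link, outermost element first, which is the bookkeeping that a DOM baseURI property does for you.

```python
from urllib.parse import urljoin


def resolve_link(href, bases):
    """Resolve href against a chain of xml:base values, outermost first."""
    base = ""
    for value in bases:
        base = urljoin(base, value)   # an inner base may itself be relative
    return urljoin(base, href)


print(resolve_link("post.html", ["http://example.com/blog/", "2005/"]))
# http://example.com/blog/2005/post.html
print(resolve_link("http://example.org/a", ["http://example.com/blog/"]))
# http://example.org/a
```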

I’ve had one enquiry as to whether the package will work for those not using PEAR. Its dependency on PEAR is limited to the PEAR_Exception class, so if you genuinely need to use it without PEAR, you could declare an empty class called PEAR_Exception that extends Exception and everything else should work. For those wondering whether there’s any way to get the code working with PHP4, the answer is ‘no’, since I make so much use of the PHP5 DOM implementation, plus SimpleXML and exceptions. If you need PHP4 support, check out Magpie.

XML_Feed_Parser CFV

3 October 2005 (8:26 am)

By James Stewart
Filed under: Announcements

Last night I initiated the Call For Votes on XML_Feed_Parser’s inclusion in PEAR. The package isn’t quite where I want it to be, but after several months’ work the core functionality is all in place and the first round of unit tests all pass.

So if you’re a PEAR developer, please go and vote! Feedback is still welcome from those who aren’t.
