a work on process

Viewing posts tagged: XML_Feed_Parser

New Feed Parser version

15 October 2005 (10:10 am)

By James Stewart
Filed under: Announcements

There’s a new version (0.2.5alpha) of XML_Feed_Parser in the wild.

I’ve cleaned up the handling of xml:base considerably, finally switching over to using PHP DOM’s baseURI attribute and checking the bases for all links returned, whether from link constructs or in text constructs with the type ‘xhtml’. That coincides with reworking the getText() and getContent() methods for atom so they now properly recognise the different types that the atom spec allows. There’s a little more work to do to properly allow non text/html mime types in atom:content, since I don’t want to return less common mime types without a way for the user to check the type beforehand. That should come in the next version.
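As a rough illustration of what the DOM extension’s baseURI property reports, here is a minimal sketch. The sample feed and its URLs are made up for the example, and this is not the package’s own code. Note that baseURI only gives the base in scope for a node; a relative href still has to be combined with it.

```php
<?php
// Sample Atom fragment; the URLs are invented for this sketch.
$xml = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom" xml:base="http://example.org/blog/">
  <entry xml:base="2005/">
    <link href="october.html"/>
  </entry>
</feed>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

$atomNS = 'http://www.w3.org/2005/Atom';
$feed = $doc->documentElement;
$link = $doc->getElementsByTagNameNS($atomNS, 'link')->item(0);

// baseURI reflects the xml:base values in scope for each node,
// with relative values resolved against the enclosing base.
$feedBase = $feed->baseURI;
$linkBase = $link->baseURI;

echo $feedBase, "\n", $linkBase, "\n";
```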

I’ve had one enquiry as to whether the package will work for those not using PEAR. Its dependency on PEAR is limited to the PEAR_Exception class, so if you genuinely need to use it without PEAR you could declare an empty class called PEAR_Exception that extends Exception, and everything else should work. For those wondering whether there’s any way to get the code working with PHP4, the answer is ‘no’, since I make so much use of the PHP5 DOM implementation, plus simplexml and exceptions. If you need PHP4 support, check out Magpie.
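The stand-in class described above might look something like the following sketch. The class_exists() guard is an addition for safety, so the stub is skipped when the real PEAR class is already loaded. Anything that relies on the extra methods of the real PEAR_Exception would not work with a stub this bare.

```php
<?php
// Only define the stand-in when the real PEAR class has not been loaded.
// The second argument stops class_exists() from triggering an autoloader.
if (!class_exists('PEAR_Exception', false)) {
    class PEAR_Exception extends Exception
    {
    }
}

// The stub can be thrown and caught like any other exception.
try {
    throw new PEAR_Exception('Invalid feed');
} catch (Exception $e) {
    $message = get_class($e) . ': ' . $e->getMessage();
    echo $message, "\n";
}
```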


XML_Feed_Parser CFV

3 October 2005 (8:26 am)

By James Stewart
Filed under: Announcements

Last night I initiated the Call For Votes on XML_Feed_Parser’s inclusion in PEAR. The package isn’t quite where I want it to be, but after several months’ work, the core functionality is all in place and the first round of unit tests all pass.

So if you’re a PEAR developer… please go and vote! (feedback is still welcome from those who aren’t)


Feed Parser: Unit Testing Results

22 September 2005 (8:37 am)

By James Stewart
Filed under: XML

We’ve been away for the past week, and time on planes and trains allowed me to focus on some simple unit tests for XML_Feed_Parser. I based the tests on the sample feeds provided in each format’s specification and was able to fix numerous bugs that surfaced during testing. The only known issue remaining is some trouble handling XML entities.

The tests also allowed me to do some refactoring, identifying a few methods that could be eliminated or merged into the parent class since they were the same across all feed types.

The new version is available at the usual location.


Feed Parser: Extensibility and xml:base

11 September 2005 (12:44 pm)

By James Stewart
Filed under: Commentary

I’ve bumped the version number for the latest version of XML_Feed_Parser to 0.2.0 to mark a couple of major updates.

Firstly, I’ve changed the internal DOM models to public variables. While thinking through the possibilities for improving the extensibility of the package it occurred to me that almost any extension someone wanted to provide would revolve around some use of the DOM, so opening up the internal DOM model would make that nice and easy. Not only can you extend the classes in your own code, you can also do something like:

$feed = new XML_Feed_Parser($myFeed);
foreach ($feed as $entry) {
    $myDOMModel = $entry->model;
    $results = $myDOMModel->getElementsByTagNameNS($myNS, $myTagName);
    // do something with $results
}

I’ve also begun to add support for xml:base. xml:base is very similar to the ‘base href’ concept in HTML, which lets you specify a base URL against which all relative URLs in child elements are resolved. xml:base can be overridden at any level of the XML tree, so I had to write code that walks up the document from the current element, collecting any xml:base attributes it finds and then processing them to work out what the URLs in that element should be relative to.

The iteration is managed with:

while ($thisNode instanceof DOMElement) {
    if ($thisNode->hasAttributes()) {
        $test = $thisNode->attributes->getNamedItemNS($nameSpace, "base");
        if ($test) {
            array_push($bases, $test->nodeValue);
        }
    }
    $thisNode = $thisNode->parentNode;
}

and the parsing with:

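// Walk the bases from the outermost element inwards.
$bases = array_reverse($bases);
$combinedBase = '';
$firstLayer = '';
foreach ($bases as $base) {
    if (preg_match("/^[A-Za-z]+:\/\//", $base)) {
        // Absolute URL: replaces whatever has been built up so far.
        $combinedBase = $base;
        // Remember the scheme and host for later root-relative bases.
        if (preg_match("/^([A-Za-z]+:\/\/[^\/]*)/", $base, $results)) {
            $firstLayer = $results[1];
        }
    } else if (preg_match("/^\//", $base)) {
        // Root-relative: keep the scheme and host, replace the path.
        $combinedBase = $firstLayer . $base;
    } else {
        // Relative: append to the current base.
        $combinedBase .= $base;
    }
}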

The code seems to work quite nicely for link elements, and next up will be the task of applying it to non-link elements, such as atom:content. I’m also hoping to find some time to benchmark it. At present whenever an item object is instantiated it is handed the xml:base value it should inherit, so that we don’t have to go right to the root element for every query, but it may be that there is further caching that could be implemented to speed things up.
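To see the two fragments working together, here is a self-contained sketch that wraps them in a single function and runs it against a small sample document. The function name collectBase(), the sample feed and its URLs are all assumptions made for the example, not part of XML_Feed_Parser. Like the fragments above, it simply concatenates relative bases rather than doing full URL resolution.

```php
<?php
// Hypothetical helper combining the iteration and parsing steps.
function collectBase(DOMNode $node)
{
    $xmlNS = 'http://www.w3.org/XML/1998/namespace';

    // Walk up the tree, collecting xml:base values (innermost first).
    $bases = array();
    while ($node instanceof DOMElement) {
        if ($node->hasAttributeNS($xmlNS, 'base')) {
            $bases[] = $node->getAttributeNS($xmlNS, 'base');
        }
        $node = $node->parentNode;
    }

    // Combine them from the outermost element inwards.
    $combined = '';
    $root = '';
    foreach (array_reverse($bases) as $base) {
        if (preg_match('/^[A-Za-z]+:\/\//', $base)) {
            // Absolute URL: start again from here.
            $combined = $base;
            if (preg_match('/^([A-Za-z]+:\/\/[^\/]*)/', $base, $m)) {
                $root = $m[1];
            }
        } elseif (strpos($base, '/') === 0) {
            // Root-relative: keep scheme and host only.
            $combined = $root . $base;
        } else {
            // Relative: naive concatenation.
            $combined .= $base;
        }
    }
    return $combined;
}

$xml = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom" xml:base="http://example.org/blog/">
  <entry xml:base="2005/"><link href="october.html"/></entry>
  <entry xml:base="/archive/"><link href="old.html"/></entry>
</feed>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$links = $doc->getElementsByTagNameNS('http://www.w3.org/2005/Atom', 'link');

$first = collectBase($links->item(0));
$second = collectBase($links->item(1));
echo $first, "\n", $second, "\n";
```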


Feed Parser: Atom:source, RSS1 modules and RSS2 improvements

10 September 2005 (7:04 pm)

By James Stewart
Filed under: Announcements

There’s another development version of XML_Feed_Parser up on the dev server. I’ve added support for atom:source and revised atom:author to work with that as per the spec, included the last few parts of core RSS2 support, added full support to the RSS1 module for the syndication and content modules, and partial support for the Dublin Core module.

I have yet to see any examples of RSS1 feeds using dc:type, dc:format, dc:identifier, dc:source, dc:language, dc:relation, or dc:coverage. If there’s sufficient demand I’ll add native support for them, but they’re not a critical concern at present.

It seems that the best way to support “official” extensions to the various feed formats is to roll them into the main classes, but I’m hoping to also add in a fallback option that will allow users to access the underlying models and retrieve simplexml objects representing elements that the package may not support. That seems to be the best way to allow for further extensibility at this point.
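A fallback of that kind could be built on simplexml_import_dom(), which turns a DOM node into a simplexml object. The sketch below is only an illustration of the idea using plain DOM and simplexml calls. The sample RSS1 item is made up, and nothing here reflects an actual XML_Feed_Parser method.

```php
<?php
// Sample RSS1 document using a Dublin Core element; contents are invented.
$xml = <<<XML
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item rdf:about="http://example.org/post/1">
    <title>Example post</title>
    <link>http://example.org/post/1</link>
    <dc:coverage>Scotland</dc:coverage>
  </item>
</rdf:RDF>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

// Start from the DOM element for the item.
$item = $doc->getElementsByTagNameNS('http://purl.org/rss/1.0/', 'item')->item(0);

// Convert it to a simplexml object and read a namespaced child element.
$sx = simplexml_import_dom($item);
$dc = $sx->children('http://purl.org/dc/elements/1.1/');
$coverage = (string) $dc->coverage;

echo $coverage, "\n";
```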
