Feed Parser: Extensibility and xml:base

I’ve bumped the version number of XML_Feed_Parser to 0.2.0 to mark a couple of major updates.

Firstly, I’ve made the internal DOM models public properties. While thinking through ways to make the package more extensible, it occurred to me that almost any extension someone might want to write would revolve around the DOM, so opening up the internal DOM model makes that nice and easy. Not only can you extend the classes in your own code, you can also do something like:

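require_once 'XML/Feed/Parser.php';

$feed = new XML_Feed_Parser($myFeed);   // $myFeed holds the raw feed XML
foreach ($feed as $entry) {
    // $entry->model is the entry's DOM node, now publicly accessible
    $myDOMModel = $entry->model;
    $nodes = $myDOMModel->getElementsByTagNameNS($myNS, $myTagName);
    foreach ($nodes as $node) {
        // do something with each matching element
    }
}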
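Because the model is an ordinary DOM node, anything PHP’s DOM extension offers works on it. The self-contained sketch below uses a DOMDocument built from a small Atom entry as a stand-in for `$entry->model`, so it runs without the package installed. The extension namespace `http://example.com/ns/rating` and its `rating` element are hypothetical.

```php
<?php
// A stand-in for $entry->model: a parsed Atom entry carrying a
// hypothetical extension element in its own namespace.
$xml = <<<XML
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:r="http://example.com/ns/rating">
  <title>Example entry</title>
  <r:rating>4</r:rating>
</entry>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$model = $doc->documentElement;   // a DOMElement, like an entry's model

$myNS = 'http://example.com/ns/rating';   // hypothetical namespace
$myTagName = 'rating';

// Collect the text of every matching extension element in the entry
$ratings = array();
foreach ($model->getElementsByTagNameNS($myNS, $myTagName) as $node) {
    $ratings[] = $node->textContent;
}
print_r($ratings);
```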

I’ve also begun to add support for xml:base. xml:base is very similar to HTML’s <base href> element, which lets you specify a base URL against which relative URLs are resolved. Unlike <base href>, though, xml:base can be redefined at any level of the XML tree, so I had to write code that walks up the document from the current element, collecting any xml:base attributes it finds and then combining them to work out what the URLs in that particular element should be relative to.

The iteration is managed with:

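// The namespace the XML specification binds to the xml: prefix
$nameSpace = 'http://www.w3.org/XML/1998/namespace';
$bases = array();
// $thisNode starts as the element whose URLs are being resolved; the loop
// ends when parentNode reaches the DOMDocument above the root element
while ($thisNode instanceof DOMElement) {
    if ($thisNode->hasAttributes()) {
        $test = $thisNode->attributes->getNamedItemNS($nameSpace, "base");
        if ($test) {
            array_push($bases, $test->nodeValue);
        }
    }
    $thisNode = $thisNode->parentNode;
}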
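To try the walk in isolation, the loop can be wrapped in a function. In this sketch the function name `collectXmlBases` and the sample document are hypothetical, and it uses `hasAttributeNS()`/`getAttributeNS()` in place of the attribute-map lookup. Bases come back innermost first, which is why the combining step reverses them.

```php
<?php
// Collect xml:base values from $node up to the document root,
// innermost (closest to $node) first.
function collectXmlBases(DOMNode $node): array
{
    // The namespace the XML specification binds to the xml: prefix
    $xmlNS = 'http://www.w3.org/XML/1998/namespace';
    $bases = array();
    // The root element's parentNode is the DOMDocument, which ends the loop
    while ($node instanceof DOMElement) {
        if ($node->hasAttributeNS($xmlNS, 'base')) {
            $bases[] = $node->getAttributeNS($xmlNS, 'base');
        }
        $node = $node->parentNode;
    }
    return $bases;
}

$doc = new DOMDocument();
$doc->loadXML(
    '<feed xml:base="http://example.org/blog/">'
    . '<entry xml:base="2006/"><link href="post.html"/></entry>'
    . '</feed>'
);
$link = $doc->getElementsByTagName('link')->item(0);
print_r(collectXmlBases($link));
```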

and the parsing with:

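$bases = array_reverse($bases);   // process from the root element downwards
$combinedBase = '';
$firstLayer = '';                 // scheme plus authority, e.g. "http://example.com"
foreach ($bases as $base) {
    if (preg_match('#^[A-Za-z]+://#', $base)) {
        // Absolute URL: replaces everything inherited so far
        $combinedBase = $base;
        preg_match('#^([A-Za-z]+://[^/]*)#', $base, $results);
        $firstLayer = $results[1];
    } elseif (preg_match('#^/#', $base)) {
        // Root-relative path: keep only the scheme and authority
        $combinedBase = $firstLayer . $base;
    } else {
        // Relative path: appended directly, which assumes the
        // inherited base ends in '/'
        $combinedBase .= $base;
    }
}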
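Appending a relative base directly only works when the inherited base ends in '/'. A possible refinement, sketched here as a variation rather than the package’s code, swaps the plain concatenation for the path "merge" rule of RFC 3986 section 5.2.3: a relative reference replaces everything after the last '/' of the inherited base. The function name `combineXmlBases` is hypothetical, and dot segments ("../") and query or fragment parts are not handled.

```php
<?php
// Combine xml:base values (outermost first) into one base URL.
// Relative paths follow the RFC 3986 "merge" rule; dot segments,
// queries and fragments are NOT handled in this sketch.
function combineXmlBases(array $bases): string
{
    $combinedBase = '';
    $firstLayer = '';   // scheme plus authority, e.g. "http://example.org"
    foreach ($bases as $base) {
        if (preg_match('#^[A-Za-z]+://#', $base)) {
            $combinedBase = $base;                  // absolute: start over
            preg_match('#^([A-Za-z]+://[^/]*)#', $base, $results);
            $firstLayer = $results[1];
        } elseif (substr($base, 0, 1) === '/') {
            $combinedBase = $firstLayer . $base;    // root-relative
        } elseif ($combinedBase === '') {
            $combinedBase = $base;                  // nothing inherited yet
        } else {
            $slash = strrpos($combinedBase, '/');
            if ($slash === false || $slash < strlen($firstLayer)) {
                // Inherited base has no path, e.g. "http://example.org"
                $combinedBase = $firstLayer . '/' . $base;
            } else {
                // Drop the last path segment, then append the reference
                $combinedBase = substr($combinedBase, 0, $slash + 1) . $base;
            }
        }
    }
    return $combinedBase;
}

echo combineXmlBases(array('http://example.org/blog/index.html', 'img/')), "\n";
// prints http://example.org/blog/img/
```

With plain concatenation the same input would produce "http://example.org/blog/index.htmlimg/", which is the case the merge rule exists to avoid.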

The code seems to work well for link elements; next up is applying it to non-link elements such as atom:content. I’m also hoping to find some time to benchmark it. At present, whenever an item object is instantiated it is handed the xml:base value it should inherit, so we don’t have to walk all the way up to the root element for every query, but there may be further caching that could speed things up.
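One possible shape for that further caching, sketched under the assumption that lookups happen on plain DOM elements: memoise each element’s effective base, keyed by its XPath location from `DOMNode::getNodePath()`, and compute it from the parent’s cached value so each element is resolved at most once. The class `XmlBaseCache` and its `computed` counter are hypothetical, and for brevity relative bases are simply appended and root-relative ("/...") bases are not handled.

```php
<?php
// Hypothetical memo of each element's effective xml:base, keyed by the
// element's XPath location so repeated queries reuse earlier work.
class XmlBaseCache
{
    const XML_NS = 'http://www.w3.org/XML/1998/namespace';
    private $cache = array();
    public $computed = 0;   // counts cache misses, for illustration

    public function baseFor(?DOMNode $node): string
    {
        if (!($node instanceof DOMElement)) {
            return '';   // reached the document node: nothing inherited
        }
        $key = $node->getNodePath();
        if (!array_key_exists($key, $this->cache)) {
            $this->computed++;
            // Resolve the parent first; it comes from the cache if already seen
            $inherited = $this->baseFor($node->parentNode);
            $own = $node->getAttributeNS(self::XML_NS, 'base');
            if ($own === '') {
                $base = $inherited;                 // no xml:base here
            } elseif (preg_match('#^[A-Za-z]+://#', $own)) {
                $base = $own;                       // absolute URL overrides
            } else {
                $base = $inherited . $own;          // naive append
            }
            $this->cache[$key] = $base;
        }
        return $this->cache[$key];
    }
}

$doc = new DOMDocument();
$doc->loadXML(
    '<feed xml:base="http://example.org/blog/">'
    . '<entry xml:base="2006/"><link href="a.html"/><link href="b.html"/></entry>'
    . '</feed>'
);
$cache = new XmlBaseCache();
foreach ($doc->getElementsByTagName('link') as $link) {
    echo $cache->baseFor($link), "\n";   // prints http://example.org/blog/2006/
}
echo $cache->computed, "\n";   // prints 4: feed, entry and each link resolved once
```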