New Feed Parser version

There’s a new version (0.2.5alpha) of XML_Feed_Parser in the wild.

I’ve cleaned up the handling of xml:base considerably. I’ve finally switched over to PHP DOM’s baseURI attribute, and the parser now checks the bases for all links returned, whether they come from link constructs or from text constructs with the type ‘xhtml’. This coincided with reworking the getText() and getContent() methods for Atom, so they now properly recognise the different types that the Atom spec allows. There’s a little more work to do to properly allow MIME types other than text and HTML in atom:content. I don’t want to return content in less common MIME types without giving the user a way to check the type beforehand. That should come in the next version.
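The xml:base approach described above can be sketched in a few lines of PHP. This is a minimal illustration, not XML_Feed_Parser’s actual code. It reads a node’s baseURI from PHP DOM, which libxml computes by combining nested xml:base attributes, and then resolves a relative href against it. The sample feed, the example.org URLs and the resolveAgainstBase() helper are all hypothetical. PHP has no built-in RFC 3986 resolver, so the helper is deliberately simplified and assumes an absolute http(s) base.

```php
<?php
// Sketch only: not XML_Feed_Parser's implementation.

// Hypothetical, simplified resolver. Handles absolute URLs, network-path
// ("//host/...") and root-relative references, and plain relative paths.
// Assumes $base is an absolute URL with a host. It does NOT normalise
// "." / ".." segments or handle query-only / fragment-only references
// the way a full RFC 3986 resolver would.
function resolveAgainstBase(string $href, string $base): string
{
    // Already absolute: starts with a scheme such as "http:"
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
        return $href;
    }

    $parts = parse_url($base);
    $scheme = $parts['scheme'];

    // Network-path reference: borrow only the scheme from the base
    if (substr($href, 0, 2) === '//') {
        return $scheme . ':' . $href;
    }

    $origin = $scheme . '://' . $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Root-relative path: keep scheme, host and port from the base
    if (substr($href, 0, 1) === '/') {
        return $origin . $href;
    }

    // Relative path: drop everything after the last "/" in the base path
    $path = $parts['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $href;
}

// Hypothetical Atom feed with nested xml:base attributes
$xml = '<feed xmlns="http://www.w3.org/2005/Atom" xml:base="http://example.org/blog/">'
    . '<entry xml:base="2005/"><link href="post.html"/></entry>'
    . '</feed>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$link = $doc->getElementsByTagNameNS('http://www.w3.org/2005/Atom', 'link')->item(0);

// libxml combines the nested xml:base values: "http://example.org/blog/2005/"
$base = $link->baseURI;
$url = resolveAgainstBase($link->getAttribute('href'), $base);

echo $url, PHP_EOL; // http://example.org/blog/2005/post.html
```

baseURI is available on any DOM node, so links inside an ‘xhtml’ text construct can be handled the same way.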

I’ve had one enquiry about whether the package will work for those not using PEAR. Its dependency on PEAR is limited to the PEAR_Exception class. Anyone who genuinely needs to use it without PEAR could declare an empty class called PEAR_Exception that extends Exception, and everything else should work. As for getting the code working with PHP4, the answer is ‘no’. I make heavy use of the PHP5 DOM implementation, plus SimpleXML and exceptions. If you need PHP4 support, check out Magpie.
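Following the suggestion above, a minimal shim might look like the sketch below. Several details are assumptions. The shim has to be declared before XML_Feed_Parser is loaded, because a class can only extend a class that already exists. The class_exists() guard is an addition, so the stub never clashes with a real PEAR installation. The empty class also lacks the extras of the real PEAR_Exception, such as wrapping a cause exception, so it only suits code that treats it as a plain Exception.

```php
<?php
// Hypothetical stand-in for PEAR's PEAR_Exception, for setups without PEAR.
// Declare it before XML_Feed_Parser (which, per the post, depends on
// PEAR_Exception) is loaded. The guard avoids redeclaring a real PEAR class.
if (!class_exists('PEAR_Exception', false)) {
    class PEAR_Exception extends Exception
    {
        // Intentionally empty: inherits everything from Exception
    }
}

// Because the stub extends Exception, a generic catch block still works
try {
    throw new PEAR_Exception('Feed type not recognised');
} catch (Exception $e) {
    // prints "PEAR_Exception: Feed type not recognised"
    echo get_class($e), ': ', $e->getMessage(), PHP_EOL;
}
```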


12 comments

  1. Great work!

    I wonder whether there are any docs or examples that act as a proof of concept.

    Thanks

    Dinh
    http://www.goldenkey.edu.vn/en

  2. Thanks. So far I’ve been focussed on getting the code as stable and effective as possible. I think that it’s now approaching stability and the next focus will be on documentation and testing.

    A very quick example is available at http://jystewart.net/code/feedparser/

    By the way, if anyone knows of feeds in encodings other than UTF-8 or ISO-8859-1 that I could use for testing, that’d be really helpful.

  3. Your example is great. It works with my blog feed: http://www.phpvietnam.net/blog/index.php/feed/ (Unicode, UTF-8)

    Thank you very much. I will keep your solution in mind whenever I need an XML parsing library for PHP 5.

    Dinh
    http://www.goldenkey.edu.vn/en

  4. Thanks for the examples. Both of the Hungarian examples (marked ‘unsuccessful’) are actually RSS 0.92, which the parser does not support at present. I had been hoping not to have to support versions of RSS

  5. I have tried your example several times and keep getting an error when parsing http://www.bytefx.com/blog/SyndicationService.asmx/GetRssCategory?categoryName=MySQL

    The error is displayed as follows:

    Fatal error: Maximum execution time of 30 seconds exceeded in C:\server\webroot\phplearning\XML_Feed_Parser.php on line 23

    Is there any way to change that default timeout setting?

    Thanks a lot.

    Dinh
    http://www.goldenkey.edu.vn/en

  6. That’s defined in your php.ini file. You’ll need to change the ‘max_execution_time’ directive.

    Are you sure that the problem was with the parser and not with code fetching the feed? If so, please file a bug report through the PEAR website. There’s a link from http://pear.php.net/package/XML_Feed_Parser to the bugs system.

  7. Hello James,

    Do you think XML_Feed_Parser should throw an exception when a user tries to parse an unsupported XML format such as RSS 0.92? The current version of XML_Feed_Parser returns a blank page. Users can’t tell whether the feed has nothing to display or the parser can’t get the information.

    I don’t think this is a bug but an RFE (request for enhancement). Please correct me if I am wrong.

    Thanks and best regards

    Dinh
    http://www.goldenkey.edu.vn/en

  8. Um. It’s supposed to throw an exception if it doesn’t recognise the feed type, hence the try {} catch {} block in the example. Also, the package never returns ‘pages’. It’s the job of your code to turn its results into a page (or whatever else you want to produce).

    If it’s not throwing an exception, that’s a bug. If it’s timing out then that’s probably a bug too.

  9. Ah. There is indeed a bug whereby the main parser class doesn’t throw an exception if the feed type is an unsupported version of RSS. That is fixed in CVS (I’ve also added 0.91 and 0.92 support to CVS). The fix will be in the next release, which I hope to get out this weekend.

  10. I just upgraded XML_Feed_Parser to version 0.26 and am happy to find that the exception handling has been improved. I have tested it with Atom 0.3: http://b2evolution.net/xmlsrv/atom.php?blog=7

    I changed my script a bit, putting $feed_xml = file_get_contents($feedFile); inside the try... catch block. I hoped an exception would be thrown when it is asked to parse a non-existent feed or an XML/HTML file that isn’t a feed. It seems these cases aren’t covered yet.

    Also, a fatal error (not an exception) occurs if the connection fails. For example, this happens when a website is up and running but sits behind a firewall.

    Thanks and best regards,

    Dinh

  11. Where the file_get_contents() call sits won’t change whether the feed parser throws an exception. You’re right, though, that it should check for empty or non-XML input and throw an exception in that case. That will be in the next version.

    As to the fatal error, this package deliberately does not provide HTTP functionality. How you retrieve the feed is your business and not something that will be addressed here. The example I offer is a very simple one as I am demonstrating the parser, not feed fetching. Obviously in production environments I use much more robust approaches.