In the process of building my bus route app, I realised that half the data for bus stops is missing. While the site’s developers have done a good job of providing clear data on half the stops, if you want to see stops going in the other direction, you have to use a drop-down box that triggers an AJAX request and repopulates the table.
A little digging shows that the call is to:
The other gotcha is that the internal IDs for some routes don’t seem to match their route numbers. If you try to retrieve the westbound stops for Route #14, the call is actually to:
and when you make requests for route 13, the routeID passed is 14. The same disparity continues, suggesting that they’ve (sensibly) added primary keys to their database other than the route number. It turns out that ID is embedded in the markup within a comment showing the direction and the ID. For Route #50 that is:
<div id="stopListWrapper"> <!-- E -> 19 --> <div id="stopList"> ... </div> </div>
Since the document is already being parsed using hpricot, we can get that with:
internal_route_id = doc.at("div#stopListWrapper").children.to_s.match(/-> (\d+) -->/)[1]
(get the div, note that the comment is its second child, and pull the ID out of the regular expression’s first capture group)
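If you want the direction as well as the ID, the same idea can be sketched without hpricot, working directly on the markup string. This is only a rough illustration: the helper name parse_route_comment and the ROUTE_COMMENT pattern are my own inventions for the example, and it assumes the comment always has the "<!-- E -> 19 -->" shape shown above.

```ruby
# Hypothetical pattern: assumes the comment always looks like "<!-- E -> 19 -->",
# i.e. a single direction letter, an arrow, and a numeric internal ID.
ROUTE_COMMENT = /<!--\s*([A-Z])\s*->\s*(\d+)\s*-->/

# Hypothetical helper: returns the direction and internal route ID found in
# the markup, or nil when no such comment is present.
def parse_route_comment(html)
  match = ROUTE_COMMENT.match(html)
  return nil unless match

  { direction: match[1], internal_route_id: match[2].to_i }
end

# The Route #50 markup from above.
html = '<div id="stopListWrapper"> <!-- E -> 19 --> <div id="stopList"> ... </div> </div>'

info = parse_route_comment(html)
puts info[:direction]
puts info[:internal_route_id]
```

Returning nil when the comment is missing means the scraper can skip or flag a route rather than blow up on a nil match.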
I’ve updated my scraper and the service to grab data based on the correct IDs. The HTML views will follow suit shortly.