Screen-scraping Melbourne’s TramTracker information.

Melbourne’s tram operator, Yarra Trams, provides a web and SMS system called TramTracker that can tell you when the next tram will arrive at any given stop, using a combination of real-time information and scheduled timetables. It uses the same system that drives the passenger information displays seen at inner-city tram stops.

The web service is pretty nasty, however. It doesn’t render very well for me in Galeon and, worse, it doesn’t keep any state information, so you have to retype the tram stop code every time you want to look up your tram. Having to launch a web browser just to look up the time of the next tram is also annoying; it would be nicer to have either a command-line interface or perhaps even a small application running in a docked window.

It also assumes that you only wish to catch a tram from one stop; if, like me, you’re within walking distance of two or more different tram lines that can take you to a particular destination, then you have to do multiple lookups, which is a waste of time.

So, with this in mind, I pulled out Wireshark and had a look at the HTTP traffic that was being passed when making a request to the service. The following was the most interesting part:

%252C%2520Culture%253Dneutral%252C%2520PublicKeyToken%25 [blah blah blah…]
&__VIEWSTATE= [blah blah blah…]

The number 1919 was the tram stop code that I’d entered. So I quickly threw together a small web form with the hidden variables txtTrackerID, ddlRouteNo and btnPrediction, which sent a request to the TramTracker interface. Unfortunately this wasn’t enough, and it kept returning to the start page.

After a bit of trial and error, I found that it also needed to be passed these variables: tkScriptManager, __EVENTTARGET, __EVENTARGUMENT, __LASTFOCUS and __VIEWSTATE. Fortunately it didn’t need any of the long-winded variables with public key tokens in them.
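As a rough illustration of the request described above, here is a minimal Python sketch that assembles the same set of form fields into a POST body. It only builds the request and never sends it. The endpoint URL is a placeholder, the empty field values are guesses, and the __VIEWSTATE value would have to be copied from the start page first; none of these details come from the actual capture.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint: the real TramTracker URL is not given in this post.
TRACKER_URL = "http://tramtracker.example/Default.aspx"


def build_request(stop_code, viewstate, route=""):
    """Build (but do not send) a POST carrying the fields the form expects."""
    fields = {
        # ASP.NET bookkeeping fields; the empty values are assumptions.
        "tkScriptManager": "",
        "__EVENTTARGET": "",
        "__EVENTARGUMENT": "",
        "__LASTFOCUS": "",
        "__VIEWSTATE": viewstate,  # would be copied from the start page
        # The visible form controls.
        "txtTrackerID": str(stop_code),
        "ddlRouteNo": route,
        "btnPrediction": "",
    }
    body = urlencode(fields).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(TRACKER_URL, data=body, headers=headers)


req = build_request(1919, viewstate="PLACEHOLDER")
print(req.get_method())   # a Request with a body is sent as POST
print(req.data.decode())  # the url-encoded form fields
```

Passing the result to urllib.request.urlopen would actually submit it, which is left out here because the real URL and field values are unknown.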

I was rather happy to find that the output from the service was XHTML. However, this feeling soon dissipated when I discovered that whoever wrote it clearly didn’t have a clue that XML only works if it is well-formed: they hadn’t closed off any of their br or img tags. Sigh, so many useless “web programmers” out there, so few jail sentences. This ruled out using XML::Simple to parse it, and I had to settle for kludging it with HTML::TableExtract.
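To show the difference between the two parsing approaches, here is a small Python sketch. It is not the Perl code from the script: the standard library's xml.etree.ElementTree stands in for a strict XML parser and html.parser for a lenient, table-oriented one. The sample markup is made up to resemble the problem (unclosed br and img tags) and is not copied from the real page.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Made-up fragment: <br> and <img> are never closed, as on the real page.
SAMPLE = """
<table>
<tr><td>1<br></td><td>Sth Melb Beach<img src="t.gif"></td><td>0</td></tr>
<tr><td>19</td><td>Flinders St City</td><td>6</td></tr>
</table>
"""


def is_well_formed(text):
    """Return True only if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class TableRows(HTMLParser):
    """Collect the text of each <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # text pieces of the cell currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._cell = []
        # Any other tag (br, img, ...) is simply ignored.

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


parser = TableRows()
parser.feed(SAMPLE)
parser.close()

print(is_well_formed(SAMPLE))  # the strict parser rejects the fragment
print(parser.rows)             # the lenient parser still yields the cells
```

The lenient parser treats the page as a stream of tags instead of a tree, so a missing close tag never becomes a fatal error.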

The upshot of all this is the NextTram Perl script, which will return the times of the next trams arriving at multiple tram stops, sorted by time:

$ ./nexttram 1419 1259 1216
1:Sth Melb Beach:0
19:Flinders St City:6
55:Domain Interchange:10
1:Sth Melb Beach:13
19:Flinders St City:18
55:Domain Interchange:26
19:Flinders St City:31
55:Domain Interchange:39
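The merge-and-sort step behind that output can be sketched in Python as follows. This is an illustration, not the script's actual code: the function names are invented, and the pairing of routes with stop codes in the sample data is a guess made only so that the result matches the listing above.

```python
def merge_predictions(per_stop):
    """Flatten the per-stop prediction lists and sort by minutes until arrival."""
    merged = [pred for preds in per_stop.values() for pred in preds]
    return sorted(merged, key=lambda pred: pred[2])


def format_line(pred):
    """Render one prediction as route:destination:minutes."""
    route, destination, minutes = pred
    return f"{route}:{destination}:{minutes}"


# Sample data; which route serves which stop code is an assumption.
per_stop = {
    "1419": [("1", "Sth Melb Beach", 0), ("1", "Sth Melb Beach", 13)],
    "1259": [("19", "Flinders St City", 6), ("19", "Flinders St City", 18),
             ("19", "Flinders St City", 31)],
    "1216": [("55", "Domain Interchange", 10), ("55", "Domain Interchange", 26),
             ("55", "Domain Interchange", 39)],
}

lines = [format_line(pred) for pred in merge_predictions(per_stop)]
print("\n".join(lines))
```

Sorting on the minutes field alone is enough here because the caller only cares about which tram arrives soonest, whichever stop it leaves from.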

While I realise that it has a limited potential audience (Linux/Unix users in inner Melbourne suburbs who actually care about what times trams run, i.e., probably just me), I’ve released it under the GPL in the hope that it might go on to bigger and better things. Of course, it will probably just break next time Yarra Trams upgrades its website…

One response to “Screen-scraping Melbourne’s TramTracker information.”

  1. Good post. I tried building a J2ME MIDlet for a slicker Tram Tracker UI on Symbian/MS phones; however, the sticking point has been not the failure to open/close tags in the “XHTML”, but the use of unquoted old-style HTML attributes. Both mobile-oriented parsers I tried (kxml2 and the built-in JSR178 SAX) operate in non-validating modes but nevertheless fatally stumbled on this sort of thing:

    The HTML is rubbish – I can imagine they had someone who didn’t really know what they were doing restyle/reformat the HTML, then still called it all XHTML.
