Sunday, July 13. 2008
Screen-scraping Melbourne's TramTracker information.
Comments
hey cool, that's great.
FYI, you might want to look into HTML::TokeParser as well. I've used it for screen-scraping recalcitrant web sites in the past (like the old Trading Post site before Sensis took it over... the new Sensis-developed site is even worse and I haven't bothered). Another useful module is XML::Mini::Document. I've used it along with HTML::TokeParser: H:T to scrape the HTML to find which XML documents to fetch, X:M:D to parse them.
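For readers who don't use Perl, the two-stage approach described above (scan the HTML for links to XML documents, then parse those documents) can be roughly sketched with Python's standard library. This is an analogue, not the HTML::TokeParser or XML::Mini::Document API. The sample markup, the `.xml` link convention, the element names and the function names are all hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags that point at .xml documents."""

    def __init__(self):
        super().__init__()
        self.xml_links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.endswith(".xml"):
            self.xml_links.append(href)


def find_xml_links(html_text):
    """Stage 1: scan (possibly sloppy) HTML for links to XML documents."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.xml_links


def parse_listing(xml_text):
    """Stage 2: parse a well-formed XML document into (id, title) pairs."""
    root = ET.fromstring(xml_text)
    return [(item.get("id"), item.findtext("title")) for item in root.iter("item")]


# Hypothetical sample data; a real scraper would fetch these over HTTP.
SAMPLE_HTML = (
    '<p><a href=listing1.xml>One</a> '
    '<A HREF="about.html">About</A> '
    '<a href="listing2.xml">Two</a>'
)
SAMPLE_XML = (
    '<listings><item id="1"><title>Bike</title></item>'
    '<item id="2"><title>Desk</title></item></listings>'
)

if __name__ == "__main__":
    print(find_xml_links(SAMPLE_HTML))
    print(parse_listing(SAMPLE_XML))
```

The split keeps the tolerant parser for the messy HTML and the strict parser for the XML, which is presumably well-formed.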
There seem to be a few of these types of things popping up.
Another Tram Tracker mashup: http://trams.ctyzn.com/
Timetable browser for iPhone: http://iphone.itransit.com.au/
Good post. I tried building a J2ME MIDlet for a slicker Tram Tracker UI on Symbian/MS phones; however, the sticking point has been not the failure to open/close tags in the "XHTML", but the use of unquoted old-style HTML attributes. Both mobile-oriented parsers I tried (kxml2 and the built-in JSR178 SAX) operate in non-validating modes but nevertheless fatally stumbled on this sort of thing:
The HTML is rubbish - I can imagine they had someone who didn't really know what they were doing restyle/reformat the HTML, then still called it all XHTML.
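To illustrate the general failure mode described in this comment, here is a minimal Python sketch. It is not the kxml2 or J2ME SAX code, and the fragment is made up rather than taken from the TramTracker pages. It shows a strict XML parser rejecting unquoted attribute values even without validation, while a tolerant HTML parser reads the same markup.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical fragment with unquoted attributes (NOT actual TramTracker markup).
FRAGMENT = "<table><tr><td width=100 class=time>5 min</td></tr></table>"
# The same fragment with quoted attributes, i.e. well-formed XML.
QUOTED = '<table><tr><td width="100" class="time">5 min</td></tr></table>'


def strict_parse_ok(text):
    """Return True if a strict (non-validating) XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Unquoted attribute values are a well-formedness error,
        # so this fails regardless of validation settings.
        return False


class CellText(HTMLParser):
    """Tolerant parser: collect (attributes, text) for each <td> cell."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True
            self.cells.append((dict(attrs), ""))

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            attrs, text = self.cells[-1]
            self.cells[-1] = (attrs, text + data)


def lenient_cells(text):
    """Parse sloppy HTML and return the collected table cells."""
    parser = CellText()
    parser.feed(text)
    parser.close()
    return parser.cells


if __name__ == "__main__":
    print(strict_parse_ok(FRAGMENT), strict_parse_ok(QUOTED))
    print(lenient_cells(FRAGMENT))
```

On a constrained phone platform the equivalent workaround would likely be a tolerant tokenizer or a pre-processing pass that quotes the attributes, but that is a guess rather than something tested here.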
Licence
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.