National Rail timetable parser

by Edward Betts

It would be nice to have a machine-readable copy of the UK National Rail timetable. Network Rail supplies the timetable in PDF format, and it is possible to extract machine-readable timetable data from this PDF.

Requirements

pdftops
    Portable Document Format (PDF) to PostScript converter.
    Ships as part of Poppler (GPL 2).
    To install on Debian or Ubuntu, run:
        apt-get install poppler-utils
CompleteTimetable.pdf
    Download the timetable PDF from Network Rail (60 MB).
Perl
    The modules Data::Dump and List::MoreUtils are also needed.
    To install them on Debian or Ubuntu, run:
        apt-get install libdata-dump-perl liblist-moreutils-perl
Rail timetable parser
    Download parser
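Before running parse, it can help to confirm that everything in the requirements list is in place. The following is a minimal Python sketch, not part of the original Perl project. The helper name `missing_requirements` is hypothetical, it assumes the PDF sits in the working directory as CompleteTimetable.pdf, and it relies on the standard trick of running `perl -MModule -e 1`, which exits with a non-zero status if the module cannot be loaded.

```python
import os
import shutil
import subprocess

# Names taken from the requirements list; the checking approach is an assumption.
PDF_NAME = "CompleteTimetable.pdf"
PERL_MODULES = ["Data::Dump", "List::MoreUtils"]


def missing_requirements(directory="."):
    """Return a list of requirements that appear to be missing (hypothetical helper)."""
    missing = []
    # pdftops is provided by the poppler-utils package on Debian/Ubuntu.
    if shutil.which("pdftops") is None:
        missing.append("pdftops")
    # The timetable PDF must be in the same directory as parse.
    if not os.path.isfile(os.path.join(directory, PDF_NAME)):
        missing.append(PDF_NAME)
    perl = shutil.which("perl")
    if perl is None:
        missing.append("perl")
    else:
        for module in PERL_MODULES:
            # `perl -MModule -e 1` fails if the module cannot be loaded.
            result = subprocess.run([perl, f"-M{module}", "-e", "1"],
                                    capture_output=True)
            if result.returncode != 0:
                missing.append(module)
    return missing


if __name__ == "__main__":
    problems = missing_requirements()
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("All requirements found.")
```

Run it from the directory that will hold parse; anything it reports can be installed with the apt-get commands listed above.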

Usage

Put CompleteTimetable.pdf in the same directory as parse, then run parse:

    perl parse

The first time it is run, it calls pdftops to convert the PDF, a binary format, into PostScript, a text format that is easy to work with. It then prints lots of debugging output about pages, timetables and trains.
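The convert-once behaviour of parse can be sketched as follows. This is a hypothetical Python illustration, not the code in parse (which is Perl): it assumes the PostScript file is kept next to the PDF with a .ps extension, and that an existing .ps file means the conversion can be skipped. The function names are invented for the example; only the pdftops invocation `pdftops PDF-file PS-file` is the real command-line form.

```python
import os
import subprocess


def postscript_path(pdf_path):
    """Return the .ps path that sits next to the given PDF (assumed naming)."""
    # pdftops itself defaults to file.ps for file.pdf when no output is given;
    # here the name is passed explicitly so the path is predictable.
    root, _ = os.path.splitext(pdf_path)
    return root + ".ps"


def ensure_postscript(pdf_path="CompleteTimetable.pdf"):
    """Convert the PDF to PostScript once; reuse the existing .ps on later runs."""
    ps_path = postscript_path(pdf_path)
    if not os.path.exists(ps_path):
        # check=True raises CalledProcessError if pdftops fails.
        subprocess.run(["pdftops", pdf_path, ps_path], check=True)
    return ps_path


if __name__ == "__main__":
    ps_file = ensure_postscript()
    # PostScript is plain text, so it can be read line by line.
    with open(ps_file, encoding="latin-1") as f:
        print(sum(1 for _ in f), "lines of PostScript in", ps_file)
```

Keeping the PostScript file avoids repeating the conversion on every run; deleting the .ps file forces a fresh conversion.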

Output

Here is a sample of the output in a range of formats:

Further work

This code is unfinished; many cases are not handled. Specifically:

Contact

Contact me if you need any help understanding the code: edwardbetts@gmail.com