National rail timetable parser
by Edward Betts
It would be nice to have a machine-readable copy of the UK National Rail timetable. Network Rail supplies the timetable in PDF format, and it is possible to extract machine-readable timetable data from this PDF.
Requirements
- pdftops
- Portable Document Format (PDF) to PostScript converter
Ships as part of Poppler (GPL 2)
To install on Debian or Ubuntu run:
apt-get install poppler-utils
- CompleteTimetable.pdf
- Download the timetable PDF from Network Rail (60 MB)
- Perl
- Also needs the modules Data::Dump and List::MoreUtils.
On Debian or Ubuntu run:
apt-get install libdata-dump-perl liblist-moreutils-perl
- Rail timetable parser
- Download parser
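As a rough pre-flight check for the requirements above, something like the following shell sketch could be used. The function names check and check_requirements are made up for this illustration and are not part of the parser; the sketch only tests that pdftops is on the PATH, that Perl can load the two modules, and that the PDF is in the current directory.

```shell
#!/bin/sh
# Hypothetical helper: print "ok" or "missing" for one requirement.
check() {
    # $1 = label, remaining arguments = a command that succeeds
    # when the requirement is met
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "ok: $label"
    else
        echo "missing: $label"
    fi
}

# Hypothetical helper: run the check for each requirement in the list above.
check_requirements() {
    check "pdftops" command -v pdftops
    check "perl" command -v perl
    check "Data::Dump" perl -MData::Dump -e 1
    check "List::MoreUtils" perl -MList::MoreUtils -e 1
    check "CompleteTimetable.pdf" test -f CompleteTimetable.pdf
}
```

Call check_requirements from the directory that holds the parser; any line starting with "missing:" points at the item to install or download.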
Usage
Put CompleteTimetable.pdf in the same directory as the parse script, then run it:
perl parse
The first time it is run, it calls pdftops to convert the PDF, a binary format, into PostScript, a text format that is much easier to work with.
Then it will print lots of debugging output about pages, timetables and trains.
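The convert-once behaviour described above can be sketched by hand in shell. This is an illustration, not the parser's own code: the function name convert_once, the PDFTOPS variable and the output file name CompleteTimetable.ps are assumptions, since the name parse gives its PostScript file is not stated here. The only pdftops usage relied on is its basic form, pdftops PDF-file PS-file.

```shell
#!/bin/sh
# Converter command; overridable so the sketch can be tried without Poppler.
PDFTOPS=${PDFTOPS:-pdftops}

# Hypothetical helper: convert a PDF to PostScript unless the
# PostScript file already exists from an earlier run.
convert_once() {
    # $1 = input PDF, $2 = output PostScript file
    if [ -f "$2" ]; then
        echo "reusing $2"
    else
        echo "converting $1"
        "$PDFTOPS" "$1" "$2"
    fi
}
```

For example, convert_once CompleteTimetable.pdf CompleteTimetable.ps runs the conversion on the first call and skips it on later calls, because the .ps file is then already present.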
Output
Here is a sample of the output in a range of formats:
Further work
This code is unfinished; many cases are not handled. Specifically:
- Base notes
- Head notes
- Date ranges
- Train flags
- Trains that join or split
- Repeat trains: "and at the same minutes past each hour until"
Contact
Contact me if you need any help understanding the code: edwardbetts@gmail.com