Explanation

These explanation pages describe the concepts behind the assess-gtfs package.

assess-gtfs allows users to validate, clean, inspect and filter transit timetable data in the General Transit Feed Specification (GTFS) format.

What is GTFS?

GTFS feeds are zip archives of text files. Each text file contains information about one aspect of the service, such as routes, trips, calendars or stop locations. Transport modelling software can treat these files as a relational database in order to undertake routing operations.

Below are the contents of a small sample UK GTFS feed.

.../tests/data/chester-20230816-small_gtfs/
├── agency.txt
├── calendar.txt
├── calendar_dates.txt
├── feed_info.txt
├── routes.txt
├── shapes.txt
├── stop_times.txt
├── stops.txt
└── trips.txt

1 directory, 9 files
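
To make the archive structure concrete, the sketch below lists the tables in a GTFS zip and reads one of them using only the Python standard library. It is an illustration, not part of assess-gtfs: the helper names `list_gtfs_tables` and `read_gtfs_table` are hypothetical, and the tiny in-memory feed stands in for a real file.

```python
import csv
import io
import zipfile


def list_gtfs_tables(gtfs_zip):
    """Return the sorted names of the .txt tables in a GTFS zip archive."""
    with zipfile.ZipFile(gtfs_zip) as archive:
        return sorted(n for n in archive.namelist() if n.endswith(".txt"))


def read_gtfs_table(gtfs_zip, table_name):
    """Read one table (for example "stops.txt") into a list of row dicts."""
    with zipfile.ZipFile(gtfs_zip) as archive:
        with archive.open(table_name) as raw:
            # utf-8-sig drops a byte order mark if the publisher included one.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))


# Build a tiny in-memory feed so the example runs without a real file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("stops.txt", "stop_id,stop_lat,stop_lon\nS1,53.19,-2.89\n")
    archive.writestr("routes.txt", "route_id,route_type\nR1,3\n")

tables = list_gtfs_tables(buffer)
stops = read_gtfs_table(buffer, "stops.txt")
print(tables)
print(stops)
```

Every value is read as a string, so numeric columns such as stop_lat need converting before use. Passing a path to a real feed instead of `buffer` should work the same way.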

Working with GTFS

If you would prefer a demonstration of assess-gtfs, please follow the tutorial.

Filtering GTFS

When undertaking routing operations with GTFS, you typically need to filter large feeds to an area of interest. A smaller feed keeps building a transport network with a package such as r5py efficient. Feeds can be restricted by location using a bounding box. They can also be restricted to a date or list of dates within the feed calendar. For more on filtering GTFS, please see the assess-gtfs API docs.
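
The idea behind bounding box filtering can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the assess-gtfs implementation: the function name `filter_feed_to_bbox`, the (min_lon, min_lat, max_lon, max_lat) ordering and the sample rows are all hypothetical. It keeps the stops inside the box, then the stop_times that reference those stops, then the trips that still have at least one stop_time.

```python
def filter_feed_to_bbox(stops, stop_times, trips, bbox):
    """Restrict three GTFS tables (lists of row dicts) to a bounding box.

    bbox is assumed to be (min_lon, min_lat, max_lon, max_lat).
    """
    min_lon, min_lat, max_lon, max_lat = bbox

    # 1. Keep stops whose coordinates fall inside the box.
    kept_stops = [
        s for s in stops
        if min_lon <= float(s["stop_lon"]) <= max_lon
        and min_lat <= float(s["stop_lat"]) <= max_lat
    ]
    kept_stop_ids = {s["stop_id"] for s in kept_stops}

    # 2. Keep stop_times that reference a surviving stop.
    kept_stop_times = [st for st in stop_times if st["stop_id"] in kept_stop_ids]
    kept_trip_ids = {st["trip_id"] for st in kept_stop_times}

    # 3. Keep trips that still call at one or more surviving stops.
    kept_trips = [t for t in trips if t["trip_id"] in kept_trip_ids]
    return kept_stops, kept_stop_times, kept_trips


# Hypothetical sample rows: one stop near Chester, one in London.
stops = [
    {"stop_id": "A", "stop_lat": "53.19", "stop_lon": "-2.89"},
    {"stop_id": "B", "stop_lat": "51.50", "stop_lon": "-0.12"},
]
stop_times = [
    {"trip_id": "T1", "stop_id": "A"},
    {"trip_id": "T2", "stop_id": "B"},
]
trips = [{"trip_id": "T1"}, {"trip_id": "T2"}]

chester_bbox = (-3.0, 53.1, -2.8, 53.3)
kept_stops, kept_stop_times, kept_trips = filter_feed_to_bbox(
    stops, stop_times, trips, chester_bbox
)
print([s["stop_id"] for s in kept_stops], [t["trip_id"] for t in kept_trips])
```

A full filter would also need to trim routes, shapes and calendar entries that are no longer referenced; the sketch stops at trips to keep the cascade easy to follow.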

Inspecting GTFS

Routing analysis tends to be undertaken for a specific location and a specific time or time window. It is therefore important to assess how services are distributed over the dates available within the feed. GTFS feeds tend to come with a range of calendar dates, but the service volume across those dates can vary and depends upon how frequently the specific feed is published.

The objective is to ensure a selected time of analysis is representative of average service volume within the feed. For a guide to doing this with assess-gtfs, please see the tutorial section on summarising GTFS.
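
One way to picture this check is to count the trips that run on each date and compare a candidate date against the median. The sketch below is a simplified illustration, not the assess-gtfs approach: the function name `trips_per_date` and the sample rows are hypothetical, and it reads only calendar.txt style rows, ignoring the exceptions that calendar_dates.txt can add or remove.

```python
from collections import Counter
from datetime import date, datetime, timedelta
from statistics import median

WEEKDAYS = [
    "monday", "tuesday", "wednesday", "thursday",
    "friday", "saturday", "sunday",
]


def trips_per_date(calendar, trips):
    """Count trips per date from calendar.txt and trips.txt style rows.

    Assumption: calendar_dates.txt exceptions are ignored in this sketch.
    """
    trips_per_service = Counter(t["service_id"] for t in trips)
    counts = Counter()
    for service in calendar:
        day = datetime.strptime(service["start_date"], "%Y%m%d").date()
        end = datetime.strptime(service["end_date"], "%Y%m%d").date()
        while day <= end:
            # A "1" in the weekday column means the service runs that day.
            if service[WEEKDAYS[day.weekday()]] == "1":
                counts[day] += trips_per_service[service["service_id"]]
            day += timedelta(days=1)
    return counts


def make_service(service_id, running_days):
    """Build a hypothetical calendar row for the week of 14 August 2023."""
    row = {"service_id": service_id,
           "start_date": "20230814", "end_date": "20230820"}
    row.update({d: "1" if d in running_days else "0" for d in WEEKDAYS})
    return row


calendar = [
    make_service("WEEKDAY", WEEKDAYS[:5]),
    make_service("WEEKEND", WEEKDAYS[5:]),
]
trips = [{"service_id": "WEEKDAY"}] * 3 + [{"service_id": "WEEKEND"}]

counts = trips_per_date(calendar, trips)
typical = median(counts.values())
candidate = date(2023, 8, 16)  # a Wednesday
print(counts[candidate], typical)
```

In this sample the Wednesday carries the median number of trips, whereas the Sunday carries a third of it, so the Wednesday would be the more representative choice for analysis.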

Validating GTFS

When working with GTFS from a range of sources, it is important to understand whether the feed you intend to use complies with the specification. Online tools, such as the one available on the French government’s Transport Data Portal, are excellent choices for manually validating a small number of feeds.

assess-gtfs produces tabular outputs for specification warnings and errors using gtfs_kit under the hood. Note that not all of these errors are as severe as they initially appear. For example, the validation table below is commonly seen when validating British GTFS:

   type     message                                            table       rows          GTFS
0  error    Invalid route_type; maybe has extra space char...  routes      [1, 2, 3, 4]  /opt/hostedtoolcache/Python/3.11.9/x64/lib/pyt...
1  warning  Unrecognized column agency_noc                     agency      []            /opt/hostedtoolcache/Python/3.11.9/x64/lib/pyt...
2  warning  Feed expired                                       calendar    []            /opt/hostedtoolcache/Python/3.11.9/x64/lib/pyt...
3  warning  Repeated pair (route_short_name, route_long_name)  routes      [13]          /opt/hostedtoolcache/Python/3.11.9/x64/lib/pyt...
4  warning  Unrecognized column stop_direction_name            stop_times  []            /opt/hostedtoolcache/Python/3.11.9/x64/lib/pyt...
5  warning  Unrecognized column platform_code                  stops       []            /opt/hostedtoolcache/Python/3.11.9/x64/lib/pyt...
6  warning  Unrecognized column trip_direction_name            trips       []            /opt/hostedtoolcache/Python/3.11.9/x64/lib/pyt...
7  warning  Unrecognized column vehicle_journey_code           trips       []            /opt/hostedtoolcache/Python/3.11.9/x64/lib/pyt...

The first row in the validation table shows an apparent error, reporting “Invalid route_type; maybe has extra space characters” for rows 1 to 4 of the routes table. Examining the first ten route_type values in that table:

0      3
1    200
2    200
3    200
4    200
5      3
6      3
7      3
8      3
9      3
Name: route_type, dtype: int64

We see that rows 1 through 4 use route_type 200. This value is not part of the basic GTFS specification, which is why it is flagged. However, Google have proposed an extension to GTFS route_type that many publishers of GTFS have adopted. Under that extension, route_type 200 means a coach service, and it would not cause a problem for most routing software. For more on validating GTFS feeds, consult the API reference for implementation details.
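
The distinction between basic and extended route types can be sketched as a simple lookup. This is an illustration only and does not reflect how assess-gtfs or gtfs_kit perform validation: the function name `classify_route_types` is hypothetical, and `EXTENDED_ROUTE_TYPES` is a deliberately small subset of the extended codes rather than the full list.

```python
# Route types defined in the basic GTFS specification.
BASIC_ROUTE_TYPES = {0, 1, 2, 3, 4, 5, 6, 7, 11, 12}

# Illustrative subset of the extended route types; not the full list.
EXTENDED_ROUTE_TYPES = {
    100: "Railway Service",
    200: "Coach Service",
    700: "Bus Service",
    900: "Tram Service",
}


def classify_route_types(route_types):
    """Group row indices by whether route_type is basic, extended or unknown."""
    result = {"basic": [], "extended": [], "unknown": []}
    for row, value in enumerate(route_types):
        if value in BASIC_ROUTE_TYPES:
            result["basic"].append(row)
        elif value in EXTENDED_ROUTE_TYPES:
            result["extended"].append(row)
        else:
            result["unknown"].append(row)
    return result


# The route_type values shown in the output above.
route_types = [3, 200, 200, 200, 200, 3, 3, 3, 3, 3]
result = classify_route_types(route_types)
print(result["extended"])  # rows using an extended code
print(result["unknown"])   # rows that would need a closer look
```

Applied to the values above, the rows reported by the validator land in the "extended" group and nothing lands in "unknown", which matches the reading that the reported error is benign for this feed.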

Cleaning GTFS

assess-gtfs can attempt to resolve some of the problems identified in a GTFS feed. To see how to do this, please follow the tutorial’s clean_feed section. Alternatively, visit the API documentation for more detail.

Note that cleaning has not been implemented for every specification alert. To request a feature from the package maintainers, please raise an issue on GitHub.