validation

Validating GTFS data.

Classes

Name Description
GtfsInstance Create a feed instance for validation, cleaning & visualisation.

GtfsInstance

validation.GtfsInstance(self, gtfs_pth, units='km', route_lookup_pth=None)

Create a feed instance for validation, cleaning & visualisation.

Parameters

Name Type Description Default
gtfs_pth Union[str, bytes, os.PathLike] File path to GTFS archive. required
units (str, optional) Spatial units of the GTFS file, defaults to “km”. 'km'
route_lookup_pth Union[str, pathlib.Path] The path to the route type lookup. When None, the lookup table is read from the file saved within this package, defaults to None. None

Attributes

Name Type Description
feed gtfs_kit.Feed A gtfs_kit feed produced using the files at gtfs_pth on init.
gtfs_path Union[str, pathlib.Path] The path to the GTFS archive.
file_list list Files in the GTFS archive.
validity_df pd.DataFrame Table of GTFS errors, warnings & their descriptions.
dated_trip_counts pd.DataFrame Dated trip counts by modality.
daily_trip_summary pd.DataFrame Summarised trip results by day of the week and modality.
daily_route_summary pd.DataFrame Dated route counts by modality.
route_mode_summary_df pd.DataFrame Summarised route counts by day of the week and modality.
pre_processed_trips pd.DataFrame A table of pre-processed trip data.

Raises

Type Description
TypeError 1. gtfs_pth is neither a string nor a pathlib.Path. 2. units is not of type str.
FileExistsError gtfs_pth does not exist on disk.
ValueError 1. gtfs_pth does not have the expected file extension(s). 2. units is not one of: “m”, “km”, “metres”, “meters”, “kilometres”, “kilometers”.
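
The checks listed under Raises can be pictured with a minimal sketch. This is not the package's implementation: `check_gtfs_inputs` is a hypothetical helper, and the `.zip` extension is an assumption, since the table above only says "expected file extension(s)".

```python
import pathlib
import tempfile

# Accepted unit spellings, taken from the ValueError description above.
VALID_UNITS = {"m", "km", "metres", "meters", "kilometres", "kilometers"}


def check_gtfs_inputs(gtfs_pth, units="km", expected_ext=".zip"):
    """Hypothetical helper mirroring the documented checks (illustrative only)."""
    if not isinstance(gtfs_pth, (str, pathlib.Path)):
        raise TypeError(f"`gtfs_pth` expected str or Path, got {type(gtfs_pth)}")
    if not isinstance(units, str):
        raise TypeError(f"`units` expected str, got {type(units)}")
    pth = pathlib.Path(gtfs_pth)
    if not pth.exists():
        raise FileExistsError(f"{pth} not found on disk.")
    # Assumption: the archive is expected to be a .zip file.
    if pth.suffix.lower() != expected_ext:
        raise ValueError(f"`gtfs_pth` expected {expected_ext}, got {pth.suffix!r}")
    if units not in VALID_UNITS:
        raise ValueError(f"`units` must be one of {sorted(VALID_UNITS)}")
    return pth


# Demonstrate with a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    archive = pathlib.Path(tmp) / "feed.zip"
    archive.write_bytes(b"")  # empty placeholder, not a real GTFS archive
    checked = check_gtfs_inputs(archive, units="km")
    print(checked.name)
```

Checking types before touching the file system keeps the error messages specific to the argument that caused them.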

Methods

Name Description
clean_feed Attempt to clean feed using gtfs_kit.
ensure_populated_calendar If calendar is absent, creates one from calendar_dates.
filter_to_bbox Very shallow wrapper around filter_gtfs().
filter_to_date Very shallow wrapper around filter_gtfs().
get_gtfs_files Return a list of files present in the GTFS feed.
get_route_modes Summarise the available routes by their associated route_type.
html_report Generate an HTML report describing the GTFS data.
is_valid Check a feed is valid with gtfs_kit.
print_alerts Print validity errors & warning messages in full.
save Save the cleaned gtfs feed.
summarise_routes Produce a summarised table of route statistics by day of week.
summarise_trips Produce a summarised table of trip statistics by day of week.
viz_stops Visualise the stops on a map as points or convex hull. Writes file.
clean_feed

validation.GtfsInstance.clean_feed(validate=False, fast_travel=False)

Attempt to clean feed using gtfs_kit.

Parameters
Name Type Description Default
validate bool Whether or not to validate the dataframe before cleaning, by default False. False
fast_travel bool Whether or not to clean warnings related to fast travel, by default False. False
ensure_populated_calendar

validation.GtfsInstance.ensure_populated_calendar()

If calendar is absent, creates one from calendar_dates.

Saves calendar table to feed.calendar. Shallow wrapper around gtfs.calendar.create_calendar_from_dates.

Raises
Type Description
FileNotFoundError Calendar and calendar_dates are missing, GTFS is invalid.
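
To illustrate the idea of deriving a calendar table from calendar_dates, here is a simplified sketch. It is not the package's `create_calendar_from_dates`; `calendar_from_dates` is a hypothetical function, and the rule used (flag each weekday on which a service was added, with start and end dates taken from the earliest and latest such date) is an assumption made for illustration.

```python
from collections import defaultdict
from datetime import datetime

DAYS = ["monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"]


def calendar_from_dates(calendar_dates):
    """Build simplified calendar rows from calendar_dates rows (illustrative)."""
    if not calendar_dates:
        # Mirrors the documented failure when neither table is available.
        raise FileNotFoundError("Both calendar and calendar_dates are missing.")
    by_service = defaultdict(list)
    for row in calendar_dates:
        # In GTFS, exception_type 1 means service was added on that date.
        if row["exception_type"] == 1:
            by_service[row["service_id"]].append(row["date"])
    calendar = []
    for service_id, dates in sorted(by_service.items()):
        weekdays = {datetime.strptime(d, "%Y%m%d").weekday() for d in dates}
        entry = {"service_id": service_id}
        for i, day in enumerate(DAYS):
            entry[day] = int(i in weekdays)
        # YYYYMMDD strings sort chronologically, so min/max work directly.
        entry["start_date"] = min(dates)
        entry["end_date"] = max(dates)
        calendar.append(entry)
    return calendar


# Made-up rows: service "A" runs on a Monday and a Saturday.
rows = [
    {"service_id": "A", "date": "20230605", "exception_type": 1},
    {"service_id": "A", "date": "20230610", "exception_type": 1},
    {"service_id": "A", "date": "20230606", "exception_type": 2},
]
print(calendar_from_dates(rows))
```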
filter_to_bbox

validation.GtfsInstance.filter_to_bbox(bbox, crs='epsg:4326')

Very shallow wrapper around filter_gtfs().

Filters GTFS to a bbox.

Parameters
Name Type Description Default
bbox Union[list, GeoDataFrame] The bbox to filter the GTFS to. Leave as none if the GTFS does not need to be cropped. Format - [xmin, ymin, xmax, ymax] required
crs Union[str, int] The CRS of the given bbox, by default “epsg:4326” 'epsg:4326'
Returns
Type Description
None
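
The bbox format `[xmin, ymin, xmax, ymax]` can be illustrated with a small sketch. This only shows how the four values bound a set of stops; it says nothing about how `filter_gtfs()` works internally. `stops_in_bbox` is a hypothetical function and the coordinates are made up. With the default “epsg:4326”, x is longitude and y is latitude.

```python
def stops_in_bbox(stops, bbox):
    """Keep stops whose coordinates fall inside [xmin, ymin, xmax, ymax]."""
    xmin, ymin, xmax, ymax = bbox
    return [
        stop for stop in stops
        if xmin <= stop["stop_lon"] <= xmax and ymin <= stop["stop_lat"] <= ymax
    ]


# Made-up stops; only the first lies inside the box below.
stops = [
    {"stop_id": "s1", "stop_lon": -3.00, "stop_lat": 51.50},
    {"stop_id": "s2", "stop_lon": -1.00, "stop_lat": 53.00},
]
bbox = [-3.5, 51.0, -2.5, 52.0]
inside = stops_in_bbox(stops, bbox)
print([stop["stop_id"] for stop in inside])
```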
filter_to_date

validation.GtfsInstance.filter_to_date(dates)

Very shallow wrapper around filter_gtfs().

Filters GTFS to date(s)

Parameters
Name Type Description Default
dates Union[str, list] The date(s) to filter the GTFS to required
Returns
Type Description
None
get_gtfs_files

validation.GtfsInstance.get_gtfs_files()

Return a list of files present in the GTFS feed.

Returns
Type Description
list A list of the files present within the GTFS archive.
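
Since a GTFS feed is distributed as an archive of text files, listing its contents can be sketched with the standard library alone. This is an illustration of the idea, not the method's implementation; `list_archive_files` is a hypothetical function and the archive below holds placeholder files rather than valid GTFS data.

```python
import io
import zipfile

# Build a tiny in-memory archive with placeholder files.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("stops.txt", "stop_id,stop_lat,stop_lon\n")
    zf.writestr("routes.txt", "route_id,route_type\n")


def list_archive_files(archive):
    """Return the names of the files stored in a zip archive."""
    with zipfile.ZipFile(archive) as zf:
        return zf.namelist()


files = list_archive_files(buffer)
print(files)
```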
get_route_modes

validation.GtfsInstance.get_route_modes()

Summarise the available routes by their associated route_type.

Returns
Type Description
pd.core.frame.DataFrame Summary table of route counts by transport mode.
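
Counting routes by transport mode can be sketched as follows. This is an illustration only: `count_route_modes` is a hypothetical function, and the lookup below covers just the first four basic `route_type` codes from the GTFS specification, whereas the package reads its own lookup table (see route_lookup_pth).

```python
from collections import Counter

# Assumption: a minimal lookup using the basic GTFS route_type codes.
ROUTE_TYPE_LOOKUP = {0: "Tram", 1: "Subway", 2: "Rail", 3: "Bus"}


def count_route_modes(routes, lookup=ROUTE_TYPE_LOOKUP):
    """Count routes per mode; unmapped route_type codes become 'Unknown'."""
    counts = Counter(lookup.get(r["route_type"], "Unknown") for r in routes)
    return dict(counts)


# Made-up routes table.
routes = [
    {"route_id": "r1", "route_type": 3},
    {"route_id": "r2", "route_type": 3},
    {"route_id": "r3", "route_type": 2},
    {"route_id": "r4", "route_type": 99},
]
mode_counts = count_route_modes(routes)
print(mode_counts)
```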
html_report

validation.GtfsInstance.html_report(report_dir='outputs', overwrite=False, summary_type='mean', extended_validation=False, clean_feed=True)

Generate an HTML report describing the GTFS data.

Parameters
Name Type Description Default
report_dir Union[str, pathlib.Path] The directory to save the report to, by default “outputs” 'outputs'
overwrite bool Whether or not to overwrite the existing report if it already exists in the report_dir, by default False False
summary_type str The summary statistic to show in the summary tables of the GTFS report, by default “mean” 'mean'
extended_validation bool Whether or not to create extended reports for gtfs validation errors/warnings, by default False False
clean_feed bool Whether or not to clean the feed before validating, by default True True
Returns
Type Description
None
Raises
Type Description
ValueError An error raised if the type of summary passed is invalid
is_valid

validation.GtfsInstance.is_valid(far_stops=False)

Check a feed is valid with gtfs_kit.

Parameters
Name Type Description Default
far_stops bool Whether or not to perform validation for far stops (both between consecutive stops and over multiple stops), by default False. False
Returns
Type Description
pd.core.frame.DataFrame Table of errors, warnings & their descriptions.
print_alerts

validation.GtfsInstance.print_alerts(alert_type='error')

Print validity errors & warning messages in full.

Parameters
Name Type Description Default
alert_type str The alert type to print. Also accepts “warning”. Defaults to “error”. 'error'
Returns
Type Description
None
Raises
Type Description
AttributeError No validity_df attribute was found.
UserWarning No alerts of the specified alert_type were found.
save

validation.GtfsInstance.save(path, overwrite=False)

Save the cleaned gtfs feed.

Parameters
Name Type Description Default
path Union[str, pathlib.Path] The path to save the GTFS file to. E.g., outputs/cleaned_gtfs.zip required
overwrite bool Whether or not to overwrite any pre-existing files at the given path False
Returns
Type Description
None
summarise_routes

validation.GtfsInstance.summarise_routes(summ_ops=[np.min, np.max, np.mean, np.median], return_summary=True)

Produce a summarised table of route statistics by day of week.

For route count summaries, function counts route_id only, irrespective of which service_id the routes map to. If the services run on different calendar days, they will be counted separately. In cases where more than one service runs the same route on the same day, these will not be counted as distinct routes.

Parameters
Name Type Description Default
summ_ops list A list of operators used to get a summary of a given day, by default [np.min, np.max, np.mean, np.median]. [np.min, np.max, np.mean, np.median]
return_summary bool When True, a summary is returned. When False, route data for each date is returned, by default True. True
Returns
Type Description
pd.DataFrame A dataframe containing either summarised results or dated route data.
Raises
Type Description
TypeError 1. return_summary is not of type bool. 2. summ_ops must be a numpy function or a list. 3. Each item in a summ_ops list must be a function. 4. Each item in a summ_ops list must be a numpy namespace export.
NotImplementedError summ_ops is a function not exported from numpy.
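
The counting rule described for summarise_routes can be sketched with the standard library. This is a simplified illustration, not the method's implementation: `summarise_route_counts` is a hypothetical function, it uses `statistics` in place of the numpy operators, and it omits the split by modality. Each record is a made-up `(date, route_id, service_id)` tuple.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, median

DAY_NAMES = ["monday", "tuesday", "wednesday", "thursday",
             "friday", "saturday", "sunday"]


def summarise_route_counts(records, return_summary=True):
    """Count distinct route_id per date, then summarise by day of week."""
    routes_by_date = defaultdict(set)
    for day, route_id, _service_id in records:
        # A set ignores service_id, so one route served by two services
        # on the same date is counted once.
        routes_by_date[day].add(route_id)
    dated_counts = {day: len(ids) for day, ids in routes_by_date.items()}
    if not return_summary:
        return dated_counts
    counts_by_weekday = defaultdict(list)
    for day, count in dated_counts.items():
        counts_by_weekday[DAY_NAMES[day.weekday()]].append(count)
    return {
        name: {"min": min(c), "max": max(c), "mean": mean(c), "median": median(c)}
        for name, c in counts_by_weekday.items()
    }


records = [
    (date(2023, 6, 5), "r1", "s1"),
    (date(2023, 6, 5), "r1", "s2"),  # same route and date, different service
    (date(2023, 6, 5), "r2", "s1"),
    (date(2023, 6, 12), "r1", "s1"),  # same route on a later Monday
]
print(summarise_route_counts(records))
print(summarise_route_counts(records, return_summary=False))
```

Both dates are Mondays, so the summary has a single "monday" entry built from the per-date counts 2 and 1.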
summarise_trips

validation.GtfsInstance.summarise_trips(summ_ops=[np.min, np.max, np.mean, np.median], return_summary=True)

Produce a summarised table of trip statistics by day of week.

For trip count summaries, function counts distinct trip_id only. These are then summarised into average/median/min/max (default) number of trips per day. Raw data for each date can also be obtained by setting the ‘return_summary’ parameter to False (bool).

Parameters
Name Type Description Default
summ_ops list A list of operators used to get a summary of a given day, by default [np.min, np.max, np.mean, np.median]. [np.min, np.max, np.mean, np.median]
return_summary bool When True, a summary is returned. When False, trip data for each date is returned, by default True. True
Returns
Type Description
pd.DataFrame A dataframe containing either summarised results or dated trip data.
Raises
Type Description
TypeError 1. return_summary is not of type bool. 2. summ_ops must be a numpy function or a list. 3. Each item in a summ_ops list must be a function. 4. Each item in a summ_ops list must be a numpy namespace export.
NotImplementedError summ_ops is a function not exported from numpy.
viz_stops

validation.GtfsInstance.viz_stops(out_pth, geoms='point', geom_crs=27700, create_out_parent=False, filtered_only=True)

Visualise the stops on a map as points or convex hull. Writes file.

Parameters
Name Type Description Default
out_pth Union[str, pathlib.Path] Path to write the map file html document to, including the file name. Must end with ‘.html’ file extension. required
geoms str Type of map to plot. If geoms=point (the default) uses gtfs_kit to map point locations of available stops. If geoms=hull, calculates the convex hull & its area, defaults to “point”. 'point'
geom_crs Union[str, int] Geometric CRS to use for the calculation of the convex hull area only, defaults to “27700” (OSGB36, British National Grid). 27700
create_out_parent bool Should the parent directory of out_pth be created if not found, defaults to False. False
filtered_only bool When True, only stops referenced within stop_times.txt will be plotted. When False, stops referenced in stops.txt will be plotted. Note that gtfs_kit filtering behaviour removes stops from stop_times.txt but not stops.txt, defaults to True. True
Returns
Type Description
None
Raises
Type Description
TypeError 1. out_pth is neither a string nor a pathlib.PosixPath. 2. geoms is not of type str. 3. geom_crs is not of type str or int. 4. create_out_parent or filtered_only is not of type bool.
FileNotFoundError Raised if the parent directory of out_pth could not be found on disk and create_out_parent is False.
KeyError The stops table has no ‘stop_code’ column.
UserWarning If the file extension of out_pth is not .html, the extension will be changed to .html.
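
The documented handling of out_pth (type check, extension change with a warning, and the create_out_parent switch) can be sketched as below. This is an illustration under assumptions, not the method's code: `prepare_out_path` is a hypothetical helper, and it accepts any `pathlib.Path` rather than only `pathlib.PosixPath`.

```python
import pathlib
import tempfile
import warnings


def prepare_out_path(out_pth, create_out_parent=False):
    """Hypothetical helper mirroring the documented out_pth handling."""
    if not isinstance(out_pth, (str, pathlib.Path)):
        raise TypeError(f"`out_pth` expected str or Path, got {type(out_pth)}")
    pth = pathlib.Path(out_pth)
    if pth.suffix != ".html":
        # Warn, then swap the extension rather than failing.
        warnings.warn(f"{pth.suffix!r} changed to '.html'.", UserWarning)
        pth = pth.with_suffix(".html")
    if not pth.parent.exists():
        if not create_out_parent:
            raise FileNotFoundError(f"Parent directory {pth.parent} not found.")
        pth.parent.mkdir(parents=True)
    return pth


# Demonstrate inside a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    target = pathlib.Path(tmp) / "maps" / "stops_map.html"
    prepared = prepare_out_path(target, create_out_parent=True)
    print(prepared.name, prepared.parent.exists())
```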