validation
Validating GTFS data.
Classes
Name | Description |
---|---|
GtfsInstance | Create a feed instance for validation, cleaning & visualisation. |
GtfsInstance
validation.GtfsInstance(self, gtfs_pth, units='km', route_lookup_pth=None)
Create a feed instance for validation, cleaning & visualisation.
Parameters
Name | Type | Description | Default |
---|---|---|---|
gtfs_pth | Union[str, bytes, os.PathLike] | File path to GTFS archive. | required |
units | str, optional | Spatial units of the GTFS file, defaults to “km”. | 'km' |
route_lookup_pth | Union[str, pathlib.Path] | The path to the route type lookup. When None, the route lookup table is read from the file saved within this package, defaults to None. | None |
Attributes
Name | Type | Description |
---|---|---|
feed | gtfs_kit.Feed | A gtfs_kit feed produced using the files at gtfs_pth on init. |
gtfs_path | Union[str, pathlib.Path] | The path to the GTFS archive. |
file_list | list | Files in the GTFS archive. |
validity_df | pd.DataFrame | Table of GTFS errors, warnings & their descriptions. |
dated_trip_counts | pd.DataFrame | Dated trip counts by modality. |
daily_trip_summary | pd.DataFrame | Summarised trip results by day of the week and modality. |
daily_route_summary | pd.DataFrame | Dated route counts by modality. |
route_mode_summary_df | pd.DataFrame | Summarised route counts by day of the week and modality. |
pre_processed_trips | pd.DataFrame | A table of pre-processed trip data. |
Raises
Type | Description |
---|---|
TypeError | 1. gtfs_pth is neither a string nor a pathlib.Path. 2. units is not of type str. |
FileExistsError | gtfs_pth does not exist on disk. |
ValueError | 1. gtfs_pth does not have the expected file extension(s). 2. units is not one of: “m”, “km”, “metres”, “meters”, “kilometres”, “kilometers”. |
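The checks on units listed above can be sketched as follows. This is a minimal illustration, not the package's own code: the helper name `normalise_units` and the mapping of every accepted spelling to "m" or "km" are assumptions made for the example.

```python
# Hypothetical helper: mirrors the documented TypeError/ValueError rules for `units`.
# Mapping each accepted spelling to "m" or "km" is an assumption for illustration.
ACCEPTED_UNITS = {
    "m": "m",
    "metres": "m",
    "meters": "m",
    "km": "km",
    "kilometres": "km",
    "kilometers": "km",
}


def normalise_units(units="km"):
    """Return "m" or "km", raising if `units` is the wrong type or value."""
    if not isinstance(units, str):
        raise TypeError(f"`units` expected a str, got {type(units).__name__}")
    if units not in ACCEPTED_UNITS:
        raise ValueError(f"`units` must be one of {sorted(ACCEPTED_UNITS)}")
    return ACCEPTED_UNITS[units]
```

Checking the type before the value keeps the two failure modes distinct, which matches the separate TypeError and ValueError rows above.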
Methods
Name | Description |
---|---|
clean_feed | Attempt to clean feed using gtfs_kit. |
ensure_populated_calendar | If calendar is absent, creates one from calendar_dates. |
filter_to_bbox | Very shallow wrapper around filter_gtfs(). |
filter_to_date | Very shallow wrapper around filter_gtfs(). |
get_gtfs_files | Return a list of files present in the GTFS feed. |
get_route_modes | Summarise the available routes by their associated route_type. |
html_report | Generate a HTML report describing the GTFS data. |
is_valid | Check a feed is valid with gtfs_kit. |
print_alerts | Print validity errors & warning messages in full. |
save | Save the cleaned gtfs feed. |
summarise_routes | Produce a summarised table of route statistics by day of week. |
summarise_trips | Produce a summarised table of trip statistics by day of week. |
viz_stops | Visualise the stops on a map as points or convex hull. Writes file. |
clean_feed
validation.GtfsInstance.clean_feed(validate=False, fast_travel=False)
Attempt to clean feed using gtfs_kit.
Parameters
Name | Type | Description | Default |
---|---|---|---|
validate | bool | Whether or not to validate the dataframe before cleaning, by default False. | False |
fast_travel | bool | Whether or not to clean warnings related to fast travel, by default False. | False |
ensure_populated_calendar
validation.GtfsInstance.ensure_populated_calendar()
If calendar is absent, creates one from calendar_dates.
Saves calendar table to feed.calendar. Shallow wrapper around gtfs.calendar.create_calendar_from_dates.
Raises
Type | Description |
---|---|
FileNotFoundError | Calendar and calendar_dates are missing, GTFS is invalid. |
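The idea of deriving a calendar from calendar_dates can be sketched with the standard library. This is not the implementation behind gtfs.calendar.create_calendar_from_dates; the function name `calendar_from_dates` and the tuple-based input are assumptions for illustration. The field names and the meaning of exception_type (1 = service added, 2 = service removed) follow the GTFS specification.

```python
from datetime import datetime

DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def calendar_from_dates(calendar_dates):
    """Build calendar-style rows from (service_id, date, exception_type) tuples.

    Dates are GTFS "YYYYMMDD" strings, so string comparison orders them correctly.
    """
    calendar = {}
    for service_id, date, exception_type in calendar_dates:
        if exception_type != 1:  # skip removals; only added service dates count
            continue
        row = calendar.setdefault(
            service_id,
            {
                "service_id": service_id,
                **dict.fromkeys(DAYS, 0),
                "start_date": date,
                "end_date": date,
            },
        )
        weekday = datetime.strptime(date, "%Y%m%d").weekday()  # Monday == 0
        row[DAYS[weekday]] = 1
        row["start_date"] = min(row["start_date"], date)
        row["end_date"] = max(row["end_date"], date)
    return list(calendar.values())


rows = calendar_from_dates(
    [("S1", "20240101", 1), ("S1", "20240106", 1), ("S1", "20240102", 2)]
)
```

Here service S1 runs on a Monday and a Saturday, so only those two weekday flags are set and the date range spans the two added dates.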
filter_to_bbox
validation.GtfsInstance.filter_to_bbox(bbox, crs='epsg:4326')
Very shallow wrapper around filter_gtfs().
Filters GTFS to a bbox.
Parameters
Name | Type | Description | Default |
---|---|---|---|
bbox | Union[list, GeoDataFrame] | The bbox to filter the GTFS to. Leave as None if the GTFS does not need to be cropped. Format: [xmin, ymin, xmax, ymax]. | required |
crs | Union[str, int] | The CRS of the given bbox, by default “epsg:4326”. | 'epsg:4326' |
Returns
Type | Description |
---|---|
None |
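To make the [xmin, ymin, xmax, ymax] format concrete, the sketch below filters a few stop records to a bounding box. It only illustrates the bbox convention and is not how filter_gtfs() works internally. The function name `stops_in_bbox` and the sample stops are hypothetical. In epsg:4326, x is longitude and y is latitude.

```python
def stops_in_bbox(stops, bbox):
    """Keep stops whose coordinates fall inside [xmin, ymin, xmax, ymax]."""
    xmin, ymin, xmax, ymax = bbox
    return [
        stop
        for stop in stops
        if xmin <= stop["stop_lon"] <= xmax and ymin <= stop["stop_lat"] <= ymax
    ]


# Hypothetical stops using the GTFS stops.txt field names.
stops = [
    {"stop_id": "A", "stop_lon": -3.00, "stop_lat": 51.59},
    {"stop_id": "B", "stop_lon": -0.12, "stop_lat": 51.50},
]
kept = stops_in_bbox(stops, [-3.1, 51.5, -2.9, 51.7])
```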
filter_to_date
validation.GtfsInstance.filter_to_date(dates)
Very shallow wrapper around filter_gtfs().
Filters GTFS to date(s).
Parameters
Name | Type | Description | Default |
---|---|---|---|
dates | Union[str, list] | The date(s) to filter the GTFS to. | required |
Returns
Type | Description |
---|---|
None |
get_gtfs_files
validation.GtfsInstance.get_gtfs_files()
Return a list of files present in the GTFS feed.
Returns
Type | Description |
---|---|
list | A list of files that are present within the GTFS file. |
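A GTFS feed is a zip archive of text files, so listing its contents can be sketched with the standard library alone. This is an illustration rather than the method's actual implementation. The helper name `list_archive_files` is hypothetical, and the archive is built in memory so the example runs without a file on disk.

```python
import io
import zipfile


def list_archive_files(archive):
    """Return the member names of a zip archive (a path or file-like object)."""
    with zipfile.ZipFile(archive) as zf:
        return zf.namelist()


# Build a tiny in-memory archive with two GTFS tables.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("stops.txt", "stop_id,stop_lat,stop_lon\n")
    zf.writestr("routes.txt", "route_id,route_type\n")
buffer.seek(0)

files = list_archive_files(buffer)
```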
get_route_modes
validation.GtfsInstance.get_route_modes()
Summarise the available routes by their associated route_type.
Returns
Type | Description |
---|---|
pd.core.frame.DataFrame | Summary table of route counts by transport mode. |
html_report
validation.GtfsInstance.html_report(report_dir='outputs', overwrite=False, summary_type='mean', extended_validation=False, clean_feed=True)
Generate a HTML report describing the GTFS data.
Parameters
Name | Type | Description | Default |
---|---|---|---|
report_dir | Union[str, pathlib.Path] | The directory to save the report to, by default “outputs”. | 'outputs' |
overwrite | bool | Whether or not to overwrite the existing report if it already exists in the report_dir, by default False. | False |
summary_type | str | The type of summary to show in the summaries on the GTFS report, by default “mean”. | 'mean' |
extended_validation | bool | Whether or not to create extended reports for GTFS validation errors/warnings, by default False. | False |
clean_feed | bool | Whether or not to clean the feed before validating, by default True. | True |
Returns
Type | Description |
---|---|
None |
Raises
Type | Description |
---|---|
ValueError | The type of summary passed is invalid. |
is_valid
validation.GtfsInstance.is_valid(far_stops=False)
Check a feed is valid with gtfs_kit.
Parameters
Name | Type | Description | Default |
---|---|---|---|
far_stops | bool | Whether or not to perform validation for far stops (both between consecutive stops and over multiple stops), by default False. | False |
Returns
Type | Description |
---|---|
pd.core.frame.DataFrame | Table of errors, warnings & their descriptions. |
print_alerts
validation.GtfsInstance.print_alerts(alert_type='error')
Print validity errors & warning messages in full.
Parameters
Name | Type | Description | Default |
---|---|---|---|
alert_type | str | The alert type to print. Also accepts “warning”. Defaults to “error”. | 'error' |
Returns
Type | Description |
---|---|
None |
Raises
Type | Description |
---|---|
AttributeError | No validity_df attribute was found. |
UserWarning | No alerts of the specified alert_type were found. |
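The select-then-warn behaviour described above can be sketched as follows. The record keys "type" and "message" and the helper name `select_alerts` are assumptions for illustration, since the column layout of validity_df is not given here. Unlike print_alerts, this sketch returns the matching messages so the result can be inspected.

```python
import warnings


def select_alerts(validity_rows, alert_type="error"):
    """Print and return messages of one alert type, warning if none match."""
    matches = [row["message"] for row in validity_rows if row["type"] == alert_type]
    if not matches:
        warnings.warn(f"No alerts of type {alert_type} were found.", UserWarning)
    for message in matches:
        print(message)
    return matches


# Hypothetical validity records.
validity_rows = [
    {"type": "warning", "message": "Stop too far from its neighbour"},
    {"type": "error", "message": "Invalid route_type"},
]
errors = select_alerts(validity_rows)
```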
save
validation.GtfsInstance.save(path, overwrite=False)
Save the cleaned gtfs feed.
Parameters
Name | Type | Description | Default |
---|---|---|---|
path | Union[str, pathlib.Path] | The path to save the GTFS file to, e.g. outputs/cleaned_gtfs.zip. | required |
overwrite | bool | Whether or not to overwrite any pre-existing files at the given path. | False |
Returns
Type | Description |
---|---|
None |
summarise_routes
validation.GtfsInstance.summarise_routes(summ_ops=[np.min, np.max, np.mean, np.median], return_summary=True)
Produce a summarised table of route statistics by day of week.
For route count summaries, function counts route_id only, irrespective of which service_id the routes map to. If the services run on different calendar days, they will be counted separately. In cases where more than one service runs the same route on the same day, these will not be counted as distinct routes.
Parameters
Name | Type | Description | Default |
---|---|---|---|
summ_ops | list | A list of operators used to get a summary of a given day, by default [np.min, np.max, np.mean, np.median]. | [np.min, np.max, np.mean, np.median] |
return_summary | bool | When True, a summary is returned. When False, route data for each date is returned, by default True. | True |
Returns
Type | Description |
---|---|
pd.DataFrame | A dataframe containing either summarised results or dated route data. |
Raises
Type | Description |
---|---|
TypeError | 1. return_summary is not of type bool. 2. summ_ops is not a numpy function or a list. 3. An item in a summ_ops list is not a function. 4. An item in a summ_ops list is not a numpy namespace export. |
NotImplementedError | summ_ops is a function not exported from numpy. |
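The counting rule for routes (distinct route_id per date, regardless of service_id, then summarised by day of week) can be sketched with the standard library. This is an illustration under assumptions, not the method itself: the function name `summarise_route_counts` and the tuple input are hypothetical, the built-in min and max and the statistics functions stand in for the numpy defaults, and the split by modality is omitted.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, median

DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def summarise_route_counts(records, summ_ops=(min, max, mean, median)):
    """Summarise distinct route counts per day of week.

    `records` holds (date, route_id, service_id) tuples.
    """
    routes_by_date = defaultdict(set)
    for day, route_id, _service_id in records:
        # A set counts a route once per date, however many services run it.
        routes_by_date[day].add(route_id)

    counts_by_weekday = defaultdict(list)
    for day, routes in routes_by_date.items():
        counts_by_weekday[DAYS[day.weekday()]].append(len(routes))

    return {
        weekday: {op.__name__: op(counts) for op in summ_ops}
        for weekday, counts in counts_by_weekday.items()
    }


records = [
    (date(2024, 1, 1), "R1", "S1"),
    (date(2024, 1, 1), "R1", "S2"),  # same route, second service: not recounted
    (date(2024, 1, 1), "R2", "S1"),
    (date(2024, 1, 8), "R1", "S1"),
]
summary = summarise_route_counts(records)
```

The two Mondays have 2 and 1 distinct routes respectively, so the Monday summary has a minimum of 1, a maximum of 2, and a mean and median of 1.5.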
summarise_trips
validation.GtfsInstance.summarise_trips(summ_ops=[np.min, np.max, np.mean, np.median], return_summary=True)
Produce a summarised table of trip statistics by day of week.
For trip count summaries, the function counts distinct trip_id only. These are then summarised into the mean/median/min/max (default) number of trips per day. Raw data for each date can also be obtained by setting the return_summary parameter to False.
Parameters
Name | Type | Description | Default |
---|---|---|---|
summ_ops | list | A list of operators used to get a summary of a given day, by default [np.min, np.max, np.mean, np.median]. | [np.min, np.max, np.mean, np.median] |
return_summary | bool | When True, a summary is returned. When False, trip data for each date is returned, by default True. | True |
Returns
Type | Description |
---|---|
pd.DataFrame | A dataframe containing either summarised results or dated trip data. |
Raises
Type | Description |
---|---|
TypeError | 1. return_summary is not of type bool. 2. summ_ops is not a numpy function or a list. 3. An item in a summ_ops list is not a function. 4. An item in a summ_ops list is not a numpy namespace export. |
NotImplementedError | summ_ops is a function not exported from numpy. |
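The type checks on summ_ops can be sketched as below. This is a partial, hypothetical illustration: the helper name `check_summ_ops` is an assumption, and the documented requirement that each function is exported from the numpy namespace is deliberately not reproduced so that the example needs only the standard library.

```python
def check_summ_ops(summ_ops):
    """Return `summ_ops` as a list of callables, raising TypeError otherwise."""
    if callable(summ_ops):
        return [summ_ops]  # accept a single function as a one-item list
    if not isinstance(summ_ops, list):
        raise TypeError("`summ_ops` must be a function or a list of functions.")
    for op in summ_ops:
        if not callable(op):
            raise TypeError(
                f"Each item in `summ_ops` must be a function, got {type(op).__name__}."
            )
    return summ_ops
```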
viz_stops
validation.GtfsInstance.viz_stops(out_pth, geoms='point', geom_crs=27700, create_out_parent=False, filtered_only=True)
Visualise the stops on a map as points or convex hull. Writes file.
Parameters
Name | Type | Description | Default |
---|---|---|---|
out_pth | Union[str, pathlib.Path] | Path to write the map HTML document to, including the file name. Must end with the ‘.html’ file extension. | required |
geoms | str | Type of map to plot. If geoms=point (the default), uses gtfs_kit to map point locations of available stops. If geoms=hull, calculates the convex hull and its area. Defaults to “point”. | 'point' |
geom_crs | Union[str, int] | Geometric CRS to use for the calculation of the convex hull area only, defaults to “27700” (OSGB36, British National Grid). | 27700 |
create_out_parent | bool | Should the parent directory of out_pth be created if not found, defaults to False. | False |
filtered_only | bool | When True, only stops referenced within stop_times.txt will be plotted. When False, stops referenced in stops.txt will be plotted. Note that gtfs_kit filtering behaviour removes stops from stop_times.txt but not stops.txt, defaults to True. | True |
Returns
Type | Description |
---|---|
None |
Raises
Type | Description |
---|---|
TypeError | 1. out_pth is neither a string nor a pathlib.PosixPath. 2. geoms is not of type str. 3. geom_crs is not of type str or int. 4. create_out_parent or filtered_only is not of type bool. |
FileNotFoundError | Raised if the parent directory of out_pth could not be found on disk and create_out_parent is False. |
KeyError | The stops table has no ‘stop_code’ column. |
UserWarning | If the file extension of out_pth is not .html, the extension will be changed to .html. |
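The extension handling for out_pth can be sketched with pathlib and warnings. The helper name `normalise_html_path` is hypothetical and the warning text is illustrative, so this should be read as one possible way to get the documented behaviour rather than the package's code.

```python
import warnings
from pathlib import Path


def normalise_html_path(out_pth):
    """Return `out_pth` as a Path ending in .html, warning if it was changed."""
    if not isinstance(out_pth, (str, Path)):
        raise TypeError(f"`out_pth` expected str or Path, got {type(out_pth).__name__}")
    path = Path(out_pth)
    if path.suffix != ".html":
        warnings.warn(
            f"Extension '{path.suffix}' of {path.name} changed to '.html'.",
            UserWarning,
        )
        path = path.with_suffix(".html")
    return path
```

Paths that already end in .html pass through unchanged and without a warning.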