multi_validation
Validating multiple GTFS at once.
Classes
Name | Description |
---|---|
MultiGtfsInstance | Create a feed instance for multiple GTFS files. |
MultiGtfsInstance
multi_validation.MultiGtfsInstance(self, path)
Create a feed instance for multiple GTFS files.
This allows for multiple GTFS files to be cleaned, validated, summarised, filtered and saved at the same time.
Parameters
Name | Type | Description | Default |
---|---|---|---|
path | Union[str, list, pathlib.Path] | A list of paths, a singular path object, or a glob string. See more information on glob strings here: https://docs.python.org/3/library/glob.html | required |
Attributes
Name | Type | Description |
---|---|---|
paths | list | A list of the GTFS paths used to create the MultiGtfsInstance object. |
instances | list | A list of GtfsInstance objects created from self.paths. |
daily_trip_summary | pd.DataFrame | A combined summary of statistics for trips from all GTFS files in the MultiGtfsInstance. |
daily_route_summary | pd.DataFrame | A combined summary of statistics for routes from all GTFS files in the MultiGtfsInstance. |
Raises
Type | Description |
---|---|
TypeError | ‘path’ is not of type string or list. |
FileNotFoundError | One (or more) of the paths passed to ‘path’ does not exist. |
FileNotFoundError | There are no GTFS files found in the passed list of paths, or from the glob string. |
ValueError | Path has no file extension. |
ValueError | One (or more) of the paths passed are not of the filetype ‘.zip’. |
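The accepted `path` forms and the errors listed above can be sketched with the standard library alone. This is an illustration only, not the library's implementation: `resolve_gtfs_paths` is a hypothetical helper, and the order in which the checks run is an assumption.

```python
import glob
import pathlib
from typing import List, Union


def resolve_gtfs_paths(path: Union[str, list, pathlib.Path]) -> List[pathlib.Path]:
    """Hypothetical helper: normalise `path` into a list of '.zip' paths."""
    if isinstance(path, pathlib.Path):
        candidates = [path]
    elif isinstance(path, str):
        # A string is treated as a glob pattern; a plain path matches itself.
        candidates = [pathlib.Path(p) for p in sorted(glob.glob(path))]
    elif isinstance(path, list):
        candidates = [pathlib.Path(p) for p in path]
    else:
        raise TypeError(f"'path' must be a str, list or pathlib.Path, got {type(path)}")

    if not candidates:
        raise FileNotFoundError("No GTFS files found for the given path(s).")
    for candidate in candidates:
        if not candidate.exists():
            raise FileNotFoundError(f"{candidate} does not exist.")
        if candidate.suffix == "":
            raise ValueError(f"{candidate} has no file extension.")
        if candidate.suffix != ".zip":
            raise ValueError(f"{candidate} is not a '.zip' file.")
    return candidates
```

A glob string such as `"data/*.zip"`, a list of paths, and a single `pathlib.Path` all reduce to the same kind of list, which is what the `paths` attribute is documented to hold.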
Returns
Type | Description |
---|---|
None |
Methods
Name | Description |
---|---|
clean_feeds | Clean each of the feeds in the MultiGtfsInstance. |
ensure_populated_calendars | Check whether the calendar is absent and, if so, create one from calendar_dates. |
filter_to_bbox | Filter GTFS to a bbox. |
filter_to_date | Filter each GTFS to date(s). |
get_dates | Get all available dates from calendar.txt (or calendar_dates.txt). |
is_valid | Validate each of the feeds in the MultiGtfsInstance. |
plot_service | Create a line plot of route or trip counts over time. |
save_feeds | Save the GtfsInstances to a directory. |
summarise_routes | Summarise the combined GTFS data by route_id. |
summarise_trips | Summarise the combined GTFS data by trip_id. |
validate_empty_feeds | Ensure the feeds in MultiGtfsInstance are not empty. |
viz_stops | Visualise all stops from all of the GTFS files. |
clean_feeds
multi_validation.MultiGtfsInstance.clean_feeds(clean_kwargs=None)
Clean each of the feeds in the MultiGtfsInstance.
Parameters
Name | Type | Description | Default |
---|---|---|---|
clean_kwargs | Union[dict, None] | The kwargs to pass to GtfsInstance.clean_feed() for each Gtfs in the MultiGtfsInstance, by default None | None |
Returns
Type | Description |
---|---|
None |
ensure_populated_calendars
multi_validation.MultiGtfsInstance.ensure_populated_calendars()
Check whether the calendar is absent and, if so, create one from calendar_dates.
Shallow wrapper around GtfsInstance.ensure_populated_calendar().
Returns
Type | Description |
---|---|
None |
filter_to_bbox
multi_validation.MultiGtfsInstance.filter_to_bbox(bbox, crs='epsg:4326', delete_empty_feeds=False)
Filter GTFS to a bbox.
Parameters
Name | Type | Description | Default |
---|---|---|---|
bbox | Union[list, GeoDataFrame] | The bbox to filter the GTFS to. Leave as none if the GTFS does not need to be cropped. Format - [xmin, ymin, xmax, ymax] | required |
crs | Union[str, int] | The CRS of the given bbox, by default “epsg:4326” | 'epsg:4326' |
delete_empty_feeds | bool | Whether or not to remove empty feeds, by default False | False |
Returns
Type | Description |
---|---|
None |
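The `[xmin, ymin, xmax, ymax]` ordering is easy to get wrong, so here is a minimal sketch of a point-in-bbox test. With the default `epsg:4326`, x is longitude and y is latitude. `stops_in_bbox` is a hypothetical helper that only illustrates the coordinate order; how the library itself decides what to keep is not shown in this reference.

```python
from typing import List, Sequence, Tuple


def stops_in_bbox(
    stops: Sequence[Tuple[str, float, float]],
    bbox: Sequence[float],
) -> List[str]:
    """Return the ids of stops whose (lon, lat) lies inside the bbox.

    `stops` holds (stop_id, lon, lat) tuples in the same CRS as `bbox`.
    """
    xmin, ymin, xmax, ymax = bbox
    return [
        stop_id
        for stop_id, lon, lat in stops
        if xmin <= lon <= xmax and ymin <= lat <= ymax
    ]


# Made-up stops around a small bounding box.
example_stops = [
    ("inside", -3.00, 51.50),
    ("too_far_east", -2.50, 51.50),
    ("too_far_north", -3.00, 52.00),
]
example_bbox = [-3.10, 51.40, -2.90, 51.60]  # [xmin, ymin, xmax, ymax]
kept = stops_in_bbox(example_stops, example_bbox)
```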
filter_to_date
multi_validation.MultiGtfsInstance.filter_to_date(dates, delete_empty_feeds=False)
Filter each GTFS to date(s).
Parameters
Name | Type | Description | Default |
---|---|---|---|
dates | Union[str, list] | The date(s) to filter the GTFS to | required |
delete_empty_feeds | bool | Whether or not to remove empty feeds, by default False | False |
Returns
Type | Description |
---|---|
None |
get_dates
multi_validation.MultiGtfsInstance.get_dates(return_range=True)
Get all available dates from calendar.txt (or calendar_dates.txt).
Parameters
Name | Type | Description | Default |
---|---|---|---|
return_range | bool | Whether to return the raw dates, or the min/max range, by default True | True |
Returns
Type | Description |
---|---|
list | Either the full set of dates, or the range that the dates span between |
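The difference between the two return modes can be sketched as follows. `summarise_dates` is a hypothetical helper, and the assumption that dates are handled as `YYYYMMDD` strings (the format GTFS uses in calendar.txt and calendar_dates.txt) is mine; the exact element type of the returned list is not stated here.

```python
from datetime import datetime
from typing import Iterable, List


def summarise_dates(dates: Iterable[str], return_range: bool = True) -> List[str]:
    """Return [earliest, latest] or every unique date, in chronological order."""
    # Parsing validates each date as well as providing the sort order.
    unique = sorted(set(dates), key=lambda d: datetime.strptime(d, "%Y%m%d"))
    if not unique:
        return []
    if return_range:
        return [unique[0], unique[-1]]
    return unique


# Dates pooled from several feeds, with a duplicate.
pooled = ["20240105", "20240101", "20240103", "20240101"]
date_range = summarise_dates(pooled)
all_dates = summarise_dates(pooled, return_range=False)
```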
is_valid
multi_validation.MultiGtfsInstance.is_valid(validation_kwargs=None)
Validate each of the feeds in the MultiGtfsInstance.
Parameters
Name | Type | Description | Default |
---|---|---|---|
validation_kwargs | Union[dict, None] | The kwargs to pass to GtfsInstance.is_valid() for each Gtfs in the MultiGtfsInstance, by default None | None |
Returns
Type | Description |
---|---|
pd.DataFrame | A dataframe (self.validity_df) containing the validation messages from all of the GtfsInstances. |
plot_service
multi_validation.MultiGtfsInstance.plot_service(service_type='routes', route_type=True, width=1000, height=550, title=None, plotly_kwargs=None, rolling_average=None, line_date=None)
Create a line plot of route or trip counts over time.
Parameters
Name | Type | Description | Default |
---|---|---|---|
service_type | str | Whether to plot ‘routes’ or ‘trips’. By default ‘routes’. | 'routes' |
route_type | bool | Whether or not to draw a line for each modality, by default True | True |
width | int | Plot width, by default 1000 | 1000 |
height | int | Plot height, by default 550 | 550 |
title | str | Plot title, by default None | None |
plotly_kwargs | dict | Kwargs to pass to plotly.express.line, by default None | None |
rolling_average | Union[int, None] | How many days to calculate the rolling average over. When left as None, rolling average is not used. The rolling average is calculated from the centre, meaning if rolling_average=3, the average will be calculated from the current date, previous date and following date. Missing dates are imputed and treated as having values of 0. | None |
line_date | Union[str, None] | A date to draw a dashed vertical line on. Date should be in format: YYYY-MM-DD, by default None | None |
Returns
Type | Description |
---|---|
go.Figure | The time series plot |
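The centred rolling average described for `rolling_average` can be illustrated with a small standard-library sketch. `centred_rolling_average` is a hypothetical function, not the library's code. Two simplifications are assumptions of this sketch: only odd windows are handled, and windows at either end of the series are truncated to the dates that exist, which may differ from how the library treats the edges.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def centred_rolling_average(
    counts: Dict[date, float], window: int
) -> List[Tuple[date, float]]:
    """Average each day with its neighbours, treating missing dates as 0."""
    if window < 1 or window % 2 == 0:
        raise ValueError("this sketch only handles odd, positive windows")
    if not counts:
        return []
    start, end = min(counts), max(counts)
    n_days = (end - start).days + 1
    days = [start + timedelta(days=i) for i in range(n_days)]
    # Impute every date missing from the input with a count of 0.
    values = [counts.get(day, 0) for day in days]
    half = window // 2
    averaged = []
    for i, day in enumerate(days):
        # Slice is clipped at the start; Python clips the end automatically.
        chunk = values[max(0, i - half): i + half + 1]
        averaged.append((day, sum(chunk) / len(chunk)))
    return averaged


# 3 January is missing, so it is imputed as 0 before averaging.
trip_counts = {date(2024, 1, 1): 3, date(2024, 1, 2): 6, date(2024, 1, 4): 9}
smoothed = centred_rolling_average(trip_counts, window=3)
```

With a window of 3, the value for 2 January is the mean of 1, 2 and 3 January, that is (3 + 6 + 0) / 3.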
save_feeds
multi_validation.MultiGtfsInstance.save_feeds(dir, suffix='_new', file_names=None, overwrite=False)
Save the GtfsInstances to a directory.
Parameters
Name | Type | Description | Default |
---|---|---|---|
dir | Union[pathlib.Path, str] | The directory to export the GTFS files into. | required |
suffix | str | The suffix to apply to save names. The ‘file_names’ param takes priority here. | '_new' |
file_names | list | A list of save names for the altered GTFS. The list must be the same length as the number of GTFS instances. Takes priority over the ‘suffix’ param. Names will be used in order of the instances (access using self.instances). | None |
overwrite | bool | Whether or not to overwrite the pre-existing saves with matching paths. | False |
Returns
Type | Description |
---|---|
None |
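The priority between `file_names` and `suffix` can be sketched like this. `plan_save_paths` is a hypothetical helper; in particular, appending the suffix to the original file stem (`north.zip` becoming `north_new.zip`) and expecting `file_names` entries to carry their own extension are assumptions, not documented behaviour.

```python
import pathlib
from typing import List, Optional, Sequence, Union

PathLike = Union[str, pathlib.Path]


def plan_save_paths(
    paths: Sequence[PathLike],
    out_dir: PathLike,
    suffix: str = "_new",
    file_names: Optional[Sequence[str]] = None,
) -> List[pathlib.Path]:
    """Work out one output path per feed, in the order of the inputs."""
    out_dir = pathlib.Path(out_dir)
    if file_names is not None:
        # Explicit names win over the suffix, but must match one-to-one.
        if len(file_names) != len(paths):
            raise ValueError("file_names must have one entry per GTFS instance")
        names = list(file_names)
    else:
        names = [f"{pathlib.Path(p).stem}{suffix}.zip" for p in paths]
    return [out_dir / name for name in names]


feeds = ["feeds/north.zip", "feeds/south.zip"]
by_suffix = plan_save_paths(feeds, "out")
by_name = plan_save_paths(feeds, "out", file_names=["a.zip", "b.zip"])
```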
summarise_routes
multi_validation.MultiGtfsInstance.summarise_routes(summ_ops=[np.min, np.max, np.mean, np.median], return_summary=True, to_days=False, sort_by_route_type=False)
Summarise the combined GTFS data by route_id.
Parameters
Name | Type | Description | Default |
---|---|---|---|
summ_ops | list | A list of numpy operators to gather a summary on. Accepts operators (e.g., np.min) or strings (“min”), by default [np.min, np.max, np.mean, np.median]. | [np.min, np.max, np.mean, np.median] |
return_summary | bool | When set to False, full data for each trip on each date will be returned, by default True. | True |
to_days | bool | Whether or not to aggregate to days, or to just return counts for trips/routes for each date. When False, summ_ops has no effect, so nothing should be passed for it when calling this function (so it remains as the default), by default False. | False |
sort_by_route_type | bool | Whether or not to sort the resulting dataframe by route_type. This only impacts the resulting df when to_days=True, by default False. | False |
Returns
Type | Description |
---|---|
pd.DataFrame | A dataframe containing the summary |
summarise_trips
multi_validation.MultiGtfsInstance.summarise_trips(summ_ops=[np.min, np.max, np.mean, np.median], return_summary=True, to_days=False, sort_by_route_type=False)
Summarise the combined GTFS data by trip_id.
Parameters
Name | Type | Description | Default |
---|---|---|---|
summ_ops | list | A list of numpy operators to summarise with. Accepts operators (e.g., np.min) or strings (“min”), by default [np.min, np.max, np.mean, np.median]. | [np.min, np.max, np.mean, np.median] |
return_summary | bool | When set to False, full data for each trip on each date will be returned, by default True. | True |
to_days | bool | Whether or not to aggregate to days, or to just return counts for trips/routes for each date. When False, summ_ops has no effect, so nothing should be passed for it when calling this function (so it remains as the default), by default False. | False |
sort_by_route_type | bool | Whether or not to sort the resulting dataframe by route_type. This only impacts the resulting df when to_days=True, by default False. | False |
Returns
Type | Description |
---|---|
pd.DataFrame | A dataframe containing the summary |
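To show how a list of summary operators given as either callables or strings might be applied, the sketch below groups per-date counts by day of the week and summarises each group. It uses the standard library rather than numpy and pandas, `summarise_by_weekday` is a hypothetical function, and reading "aggregate to days" as "group by day of the week" is my assumption.

```python
import statistics
from collections import defaultdict
from datetime import date
from typing import Callable, Dict, Iterable, Union

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
# String names are looked up here; callables are used as given.
NAMED_OPS = {"min": min, "max": max,
             "mean": statistics.mean, "median": statistics.median}


def summarise_by_weekday(
    daily_counts: Dict[date, float],
    summ_ops: Iterable[Union[str, Callable]] = ("min", "max", "mean", "median"),
) -> Dict[str, Dict[str, float]]:
    """Group per-date counts by weekday and apply each summary operator."""
    grouped = defaultdict(list)
    for day, count in daily_counts.items():
        grouped[WEEKDAYS[day.weekday()]].append(count)
    summary: Dict[str, Dict[str, float]] = {}
    for weekday, counts in grouped.items():
        summary[weekday] = {}
        for op in summ_ops:
            func = NAMED_OPS[op] if isinstance(op, str) else op
            name = op if isinstance(op, str) else op.__name__
            summary[weekday][name] = func(counts)
    return summary


# Two Mondays and one Tuesday of made-up trip counts.
counts = {date(2024, 1, 1): 10, date(2024, 1, 8): 20, date(2024, 1, 2): 7}
weekday_summary = summarise_by_weekday(counts)
mixed_ops = summarise_by_weekday(counts, summ_ops=["max", min])
```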
validate_empty_feeds
multi_validation.MultiGtfsInstance.validate_empty_feeds(delete=False)
Ensure the feeds in MultiGtfsInstance are not empty.
Parameters
Name | Type | Description | Default |
---|---|---|---|
delete | bool | Whether or not to delete the empty feeds, by default False | False |
Returns
Type | Description |
---|---|
list | A list of feeds that are empty and their index in MultiGtfsInstance.instances |
viz_stops
multi_validation.MultiGtfsInstance.viz_stops(path=None, return_viz=True, filtered_only=True)
Visualise all stops from all of the GTFS files.
Parameters
Name | Type | Description | Default |
---|---|---|---|
path | Union[str, pathlib.Path] | The path to save the folium map to, by default None. | None |
return_viz | bool | Whether or not to return the folium map object, by default True. | True |
filtered_only | bool | Whether to filter the stops that are plotted to only stop_ids that are present in the stop_times table, by default True. | True |
Returns
Type | Description |
---|---|
folium.Map | A folium map with all stops plotted on it. |
None | Returned if ‘return_viz’ is False. |
Raises
Type | Description |
---|---|
ValueError | Raised if ‘path’ is None and ‘return_viz’ is False, as the map would be neither saved nor returned. |