multi_validation
gtfs.multi_validation
Validating multiple GTFS at once.
Classes
| Name | Description |
|---|---|
| MultiGtfsInstance | Create a feed instance for multiple GTFS files. |
MultiGtfsInstance
gtfs.multi_validation.MultiGtfsInstance(path)
Create a feed instance for multiple GTFS files.
This allows for multiple GTFS files to be cleaned, validated, summarised, filtered and saved at the same time.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| path | Union[str, list, pathlib.Path] | A list of paths, a single path object, or a glob string. See more information on glob strings here: https://docs.python.org/3/library/glob.html | required |
Attributes
| Name | Type | Description |
|---|---|---|
| paths | list | A list of the GTFS paths used to create the MultiGtfsInstance object. |
| instances | list | A list of GtfsInstance objects created from self.paths. |
| daily_trip_summary | pd.DataFrame | A combined summary of statistics for trips from all GTFS files in the MultiGtfsInstance. |
| daily_route_summary | pd.DataFrame | A combined summary of statistics for routes from all GTFS files in the MultiGtfsInstance. |
Raises
| Type | Description |
|---|---|
| TypeError | ‘path’ is not of type string or list. |
| FileNotFoundError | One (or more) of the paths passed to ‘path’ does not exist. |
| FileNotFoundError | No GTFS files were found in the passed list of paths or from the glob string. |
| ValueError | A path has no file extension. |
| ValueError | One (or more) of the paths passed is not of the filetype ‘.zip’. |
Returns
| Type | Description |
|---|---|
| None |
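The accepted forms of `path` and the errors listed above can be illustrated with a small standalone sketch. `resolve_gtfs_paths` is a hypothetical helper written for this page and is not part of the package. The order of the checks and the sorting of glob matches are assumptions. The real constructor also builds a GtfsInstance for each path, and that step is not shown here.

```python
import glob
import pathlib
from typing import List, Union


def resolve_gtfs_paths(path: Union[str, list, pathlib.Path]) -> List[pathlib.Path]:
    """Hypothetical helper: normalise `path` into a list of .zip paths."""
    if isinstance(path, pathlib.Path):
        candidates = [path]
    elif isinstance(path, str):
        # A string is treated as a glob pattern; a literal file path matches itself.
        # Matches are sorted so the feed order is deterministic (an assumption).
        candidates = sorted(pathlib.Path(p) for p in glob.glob(path))
    elif isinstance(path, list):
        # List order is preserved as given
        candidates = [pathlib.Path(p) for p in path]
    else:
        raise TypeError(f"'path' must be a str, list or pathlib.Path, got {type(path)}")

    if not candidates:
        raise FileNotFoundError("No GTFS files found in the given paths or glob string")

    for p in candidates:
        if not p.exists():
            raise FileNotFoundError(f"{p} does not exist")
        if p.suffix == "":
            raise ValueError(f"{p} has no file extension")
        if p.suffix.lower() != ".zip":
            raise ValueError(f"{p} is not of the filetype '.zip'")
    return candidates
```

For example, `resolve_gtfs_paths("data/*.zip")` would return every zip archive in a hypothetical `data` folder in sorted order. A list keeps the order the caller supplied.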
Methods
| Name | Description |
|---|---|
| clean_feeds | Clean each of the feeds in the MultiGtfsInstance. |
| ensure_populated_calendars | Check if calendar is absent and create one from calendar_dates. |
| filter_to_bbox | Filter GTFS to a bbox. |
| filter_to_date | Filter each GTFS to date(s). |
| get_dates | Get all available dates from calendar.txt (or calendar_dates.txt). |
| is_valid | Validate each of the feeds in the MultiGtfsInstance. |
| plot_service | Create a line plot of route or trip counts over time. |
| save_feeds | Save the GtfsInstances to a directory. |
| summarise_routes | Summarise the combined GTFS data by route_id. |
| summarise_trips | Summarise the combined GTFS data by trip_id. |
| validate_empty_feeds | Ensure the feeds in MultiGtfsInstance are not empty. |
| viz_stops | Visualise all stops from all of the GTFS files. |
clean_feeds
gtfs.multi_validation.MultiGtfsInstance.clean_feeds(clean_kwargs=None)
Clean each of the feeds in the MultiGtfsInstance.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| clean_kwargs | Union[dict, None] | The kwargs to pass to GtfsInstance.clean_feed() for each GTFS in the MultiGtfsInstance, by default None | None |
Returns
| Type | Description |
|---|---|
| None |
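An optional kwargs dict that defaults to None is commonly handled by substituting an empty dict and unpacking it into each per-feed call. The sketch below assumes that pattern. `FakeFeed` and `clean_all` are hypothetical stand-ins for GtfsInstance and clean_feeds, and the `validate` keyword in the usage is invented purely for illustration.

```python
from typing import Any, Dict, List, Optional


class FakeFeed:
    """Hypothetical stand-in for a GtfsInstance that records how it was cleaned."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cleaned_with: Optional[Dict[str, Any]] = None

    def clean_feed(self, **kwargs: Any) -> None:
        self.cleaned_with = kwargs


def clean_all(
    instances: List[FakeFeed], clean_kwargs: Optional[Dict[str, Any]] = None
) -> None:
    # Default to None rather than {} so no mutable default is shared between calls
    if clean_kwargs is None:
        clean_kwargs = {}
    # This sketch also rejects non-dict values (an assumption)
    if not isinstance(clean_kwargs, dict):
        raise TypeError("clean_kwargs must be a dict or None")
    for instance in instances:
        # The same keyword arguments are forwarded to every feed
        instance.clean_feed(**clean_kwargs)
```

Calling `clean_all(feeds, {"validate": True})` would invoke `clean_feed(validate=True)` on every feed.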
ensure_populated_calendars
gtfs.multi_validation.MultiGtfsInstance.ensure_populated_calendars()
Check if calendar is absent and create one from calendar_dates.
Shallow wrapper around GtfsInstance.ensure_populated_calendar().
Returns
| Type | Description |
|---|---|
| None |
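GTFS allows a feed to omit calendar.txt and define its service entirely through calendar_dates.txt. One way to derive a calendar table is sketched below. The approach assumed here sets every weekday flag to 0 and makes start_date/end_date span the dates added for each service_id, so calendar_dates.txt still decides which days actually run. This illustrates the idea only. It is not necessarily how GtfsInstance.ensure_populated_calendar() works, and `calendar_from_calendar_dates` is a hypothetical name.

```python
from typing import Dict, List

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def calendar_from_calendar_dates(calendar_dates: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Build calendar.txt-style rows from calendar_dates.txt-style rows."""
    added: Dict[str, List[str]] = {}
    for row in calendar_dates:
        # In GTFS, exception_type "1" means service was added on that date
        if row["exception_type"] == "1":
            added.setdefault(row["service_id"], []).append(row["date"])
    calendar = []
    for service_id, dates in sorted(added.items()):
        entry = {"service_id": service_id}
        # All weekday flags are 0, so only calendar_dates.txt activates service
        entry.update({day: "0" for day in WEEKDAYS})
        # YYYYMMDD strings sort chronologically, so min/max give the date span
        entry["start_date"] = min(dates)
        entry["end_date"] = max(dates)
        calendar.append(entry)
    return calendar
```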
filter_to_bbox
gtfs.multi_validation.MultiGtfsInstance.filter_to_bbox(bbox, crs='epsg:4326', delete_empty_feeds=False)
Filter GTFS to a bbox.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| bbox | Union[list, GeoDataFrame] | The bbox to filter the GTFS to. Leave as None if the GTFS does not need to be cropped. Format: [xmin, ymin, xmax, ymax] | required |
| crs | Union[str, int] | The CRS of the given bbox, by default “epsg:4326” | 'epsg:4326' |
| delete_empty_feeds | bool | Whether or not to remove empty feeds, by default False | False |
Returns
| Type | Description |
|---|---|
| None |
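The [xmin, ymin, xmax, ymax] format is the usual bounding-box convention. With the default CRS (EPSG:4326), x corresponds to longitude and y to latitude. The stdlib sketch below shows the core test: keep only the stops that fall inside the box, then drop any feed with no stops left when delete_empty_feeds is True. The stop-tuple representation and both function names are hypothetical. The package itself works on full GTFS tables and also accepts a GeoDataFrame, and neither is covered here.

```python
from typing import List, Sequence, Tuple

Stop = Tuple[str, float, float]  # (stop_id, x/longitude, y/latitude)


def stops_in_bbox(stops: Sequence[Stop], bbox: Sequence[float]) -> List[Stop]:
    xmin, ymin, xmax, ymax = bbox
    if xmin > xmax or ymin > ymax:
        raise ValueError("bbox must be ordered [xmin, ymin, xmax, ymax]")
    # Points on the boundary are kept (inclusive comparison), an assumption of this sketch
    return [s for s in stops if xmin <= s[1] <= xmax and ymin <= s[2] <= ymax]


def filter_feeds_to_bbox(
    feeds: List[List[Stop]], bbox: Sequence[float], delete_empty_feeds: bool = False
) -> List[List[Stop]]:
    filtered = [stops_in_bbox(stops, bbox) for stops in feeds]
    if delete_empty_feeds:
        # Drop feeds left with no stops inside the box
        filtered = [stops for stops in filtered if stops]
    return filtered
```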
filter_to_date
gtfs.multi_validation.MultiGtfsInstance.filter_to_date(dates, delete_empty_feeds=False)
Filter each GTFS to date(s).
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| dates | Union[str, list] | The date(s) to filter the GTFS to | required |
| delete_empty_feeds | bool | Whether or not to remove empty feeds, by default False | False |
Returns
| Type | Description |
|---|---|
| None |
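`dates` accepts either a single date string or a list of them. The sketch below shows that normalisation followed by the filtering step. It assumes GTFS-style YYYYMMDD strings and a hypothetical per-feed mapping of trip_id to the dates on which each trip runs. The real method operates on the feeds' GTFS tables instead.

```python
from typing import Dict, List, Set, Union

# Hypothetical feed representation: trip_id -> set of YYYYMMDD dates it runs on
Feed = Dict[str, Set[str]]


def filter_feeds_to_date(
    feeds: List[Feed], dates: Union[str, list], delete_empty_feeds: bool = False
) -> List[Feed]:
    # Normalise a single date string into a one-element list
    if isinstance(dates, str):
        dates = [dates]
    wanted = set(dates)
    filtered = []
    for feed in feeds:
        # Keep trips running on at least one requested date, restricted to those dates
        kept = {trip: runs & wanted for trip, runs in feed.items() if runs & wanted}
        filtered.append(kept)
    if delete_empty_feeds:
        filtered = [feed for feed in filtered if feed]
    return filtered
```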
get_dates
gtfs.multi_validation.MultiGtfsInstance.get_dates(return_range=True)
Get all available dates from calendar.txt (or calendar_dates.txt).
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| return_range | bool | Whether to return the min/max range of the dates (True) or the full set of raw dates (False), by default True | True |
Returns
| Type | Description |
|---|---|
| list | Either the full set of dates, or the minimum and maximum dates that bound their range |
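How the two shapes of the return value relate can be shown with a short sketch. It assumes each feed's available dates have already been extracted from calendar.txt or calendar_dates.txt (not shown) as YYYYMMDD strings. Pooling and de-duplicating dates across feeds, and returning an empty list when there are no dates, are assumptions of this sketch. `get_dates` here is a hypothetical standalone function, not the method itself.

```python
from typing import Iterable, List


def get_dates(feeds_dates: Iterable[Iterable[str]], return_range: bool = True) -> List[str]:
    """Hypothetical: pool YYYYMMDD date strings from every feed."""
    all_dates = sorted({d for dates in feeds_dates for d in dates})
    if not all_dates:
        return []
    if return_range:
        # First and last date; YYYYMMDD strings sort chronologically
        return [all_dates[0], all_dates[-1]]
    return all_dates
```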
is_valid
gtfs.multi_validation.MultiGtfsInstance.is_valid(validation_kwargs=None)
Validate each of the feeds in the MultiGtfsInstance.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| validation_kwargs | Union[dict, None] | The kwargs to pass to GtfsInstance.is_valid() for each GTFS in the MultiGtfsInstance, by default None | None |
Returns
| Type | Description |
|---|---|
| pd.DataFrame | self.validity_df: a dataframe containing the validation messages from all of the GtfsInstances. |
plot_service
gtfs.multi_validation.MultiGtfsInstance.plot_service(service_type='routes', route_type=True, width=1000, height=550, title=None, plotly_kwargs=None, rolling_average=None, line_date=None)
Create a line plot of route or trip counts over time.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| service_type | str | Whether to plot ‘routes’ or ‘trips’, by default ‘routes’ | 'routes' |
| route_type | bool | Whether or not to draw a separate line for each modality (route_type), by default True | True |
| width | int | Plot width in pixels, by default 1000 | 1000 |
| height | int | Plot height in pixels, by default 550 | 550 |
| title | str | Plot title, by default None | None |
| plotly_kwargs | dict | Kwargs to pass to plotly.express.line, by default None | None |
| rolling_average | Union[int, None] | The number of days to calculate the rolling average over. When left as None, no rolling average is applied. The rolling average is centred, so with rolling_average=3 each value is the average of the current date, the previous date and the following date. Missing dates are imputed with a value of 0. | None |
| line_date | Union[str, None] | A date at which to draw a dashed vertical line, in the format YYYY-MM-DD, by default None | None |
Returns
| Type | Description |
|---|---|
| go.Figure | The timeseries plot |
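The centred rolling average with zero imputation described for rolling_average can be reproduced by hand. This stdlib sketch accepts only odd windows and returns None for dates too close to either end for a full window. Both behaviours are assumptions of the sketch rather than documented behaviour of plot_service, and `centred_rolling_average` is a hypothetical name.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple


def centred_rolling_average(
    counts: Dict[date, int], window: int
) -> List[Tuple[date, Optional[float]]]:
    """Hypothetical: centred rolling mean over a daily series with gaps."""
    if window < 1 or window % 2 == 0:
        raise ValueError("this sketch only supports odd, positive windows")
    start, end = min(counts), max(counts)
    # Impute every missing date between the first and last date with a count of 0
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    values = [counts.get(day, 0) for day in days]
    half = window // 2
    result: List[Tuple[date, Optional[float]]] = []
    for i, day in enumerate(days):
        if i < half or i + half >= len(values):
            # Too close to either end for a full centred window
            result.append((day, None))
        else:
            result.append((day, sum(values[i - half : i + half + 1]) / window))
    return result
```

With counts of 3, 6 and 9 on 1, 2 and 4 January and a window of 3, the gap on 3 January becomes 0. The smoothed value for 2 January is then (3 + 6 + 0) / 3 = 3.0, and for 3 January it is (6 + 0 + 9) / 3 = 5.0.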
save_feeds
gtfs.multi_validation.MultiGtfsInstance.save_feeds(dir, suffix='_new', file_names=None, overwrite=False)
Save the GtfsInstances to a directory.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| dir | Union[pathlib.Path, str] | The directory to export the GTFS files into. | required |
| suffix | str | The suffix to apply to save names. The ‘file_names’ param takes priority over this. | '_new' |
| file_names | list | A list of save names for the altered GTFS. The list must be the same length as the number of GTFS instances. Takes priority over the ‘suffix’ param. Names are used in the order of the instances (see self.instances). | None |
| overwrite | bool | Whether or not to overwrite pre-existing saves with matching paths. | False |
Returns
| Type | Description |
|---|---|
| None |
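The interaction between suffix, file_names and overwrite can be sketched as a pure path calculation. Several details are assumptions, not documented behaviour. The suffix is inserted before the `.zip` extension (so `feed.zip` becomes `feed_new.zip`). A `.zip` extension is appended to file names that lack one. Existing targets raise FileExistsError when overwrite is False. `build_save_paths` is hypothetical, and its `out_dir` parameter plays the role of `dir`.

```python
import pathlib
from typing import List, Optional, Sequence, Union


def build_save_paths(
    out_dir: Union[str, pathlib.Path],
    gtfs_paths: Sequence[Union[str, pathlib.Path]],
    suffix: str = "_new",
    file_names: Optional[List[str]] = None,
    overwrite: bool = False,
) -> List[pathlib.Path]:
    out = pathlib.Path(out_dir)
    if file_names is not None:
        # file_names takes priority over suffix and must match the number of feeds
        if len(file_names) != len(gtfs_paths):
            raise ValueError("file_names must have one entry per GTFS instance")
        names = [n if n.endswith(".zip") else f"{n}.zip" for n in file_names]
    else:
        # e.g. "feed.zip" -> "feed_new.zip"
        names = [f"{pathlib.Path(p).stem}{suffix}.zip" for p in gtfs_paths]
    targets = [out / name for name in names]
    if not overwrite:
        existing = [t for t in targets if t.exists()]
        if existing:
            raise FileExistsError(f"Refusing to overwrite: {existing}")
    return targets
```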
summarise_routes
gtfs.multi_validation.MultiGtfsInstance.summarise_routes(summ_ops=[np.min, np.max, np.mean, np.median], return_summary=True, to_days=False, sort_by_route_type=False)
Summarise the combined GTFS data by route_id.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| summ_ops | list | A list of numpy operators to summarise with. Accepts operators (e.g., np.min) or strings (“min”), by default [np.min, np.max, np.mean, np.median]. | [np.min, np.max, np.mean, np.median] |
| return_summary | bool | When set to False, full data for each trip on each date will be returned, by default True. | True |
| to_days | bool | Whether or not to aggregate to days, or to just return counts for trips/routes for each date. When False, summ_ops has no effect, so nothing should be passed for it (leave it as the default), by default False. | False |
| sort_by_route_type | bool | Whether or not to sort the resulting dataframe by route_type. This only impacts the resulting df when to_days=True, by default False. | False |
Returns
| Type | Description |
|---|---|
| pd.DataFrame | A dataframe containing the summary |
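summ_ops accepts either callables or their string names. The sketch below shows one way such a list could be resolved and applied when to_days=True. It assumes that "aggregate to days" means grouping the per-date route counts by day of the week. It uses stdlib stand-ins (min, max, statistics.mean, statistics.median) in place of the numpy operators, and the function name is hypothetical.

```python
import statistics
from datetime import date
from typing import Callable, Dict, List, Union

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]

# String names mapped to stdlib stand-ins for the numpy operators
OPS: Dict[str, Callable] = {
    "min": min,
    "max": max,
    "mean": statistics.mean,
    "median": statistics.median,
}


def summarise_by_weekday(
    daily_counts: Dict[date, int], summ_ops: List[Union[str, Callable]]
) -> Dict[str, Dict[str, float]]:
    # Resolve each operator to a (label, function) pair
    resolved = [(op, OPS[op]) if isinstance(op, str) else (op.__name__, op) for op in summ_ops]
    # Group the per-date counts by day of the week (date.weekday(): Monday == 0)
    by_day: Dict[str, List[int]] = {}
    for day, count in daily_counts.items():
        by_day.setdefault(WEEKDAYS[day.weekday()], []).append(count)
    return {
        weekday: {label: func(counts) for label, func in resolved}
        for weekday, counts in by_day.items()
    }
```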
summarise_trips
gtfs.multi_validation.MultiGtfsInstance.summarise_trips(summ_ops=[np.min, np.max, np.mean, np.median], return_summary=True, to_days=False, sort_by_route_type=False)
Summarise the combined GTFS data by trip_id.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| summ_ops | list | A list of numpy operators to summarise with. Accepts operators (e.g., np.min) or strings (“min”), by default [np.min, np.max, np.mean, np.median]. | [np.min, np.max, np.mean, np.median] |
| return_summary | bool | When set to False, full data for each trip on each date will be returned, by default True. | True |
| to_days | bool | Whether or not to aggregate to days, or to just return counts for trips/routes for each date. When False, summ_ops has no effect, so nothing should be passed for it (leave it as the default), by default False. | False |
| sort_by_route_type | bool | Whether or not to sort the resulting dataframe by route_type. This only impacts the resulting df when to_days=True, by default False. | False |
Returns
| Type | Description |
|---|---|
| pd.DataFrame | A dataframe containing the summary |
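With to_days=False the result is described as counts of trips (or routes) for each date. The sketch below assumes the input is a list of (date, trip_id, route_type) records, a hypothetical simplification of the joined GTFS tables. Under that assumption, the counts reduce to the number of distinct trip_ids per date and route_type. `trips_per_date` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Record = Tuple[str, str, int]  # (date as YYYYMMDD, trip_id, route_type)


def trips_per_date(records: Iterable[Record]) -> Dict[Tuple[str, int], int]:
    trips: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
    for service_date, trip_id, route_type in records:
        # A set ensures a trip listed twice on the same date is counted once
        trips[(service_date, route_type)].add(trip_id)
    return {key: len(ids) for key, ids in sorted(trips.items())}
```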
validate_empty_feeds
gtfs.multi_validation.MultiGtfsInstance.validate_empty_feeds(delete=False)
Ensure the feeds in MultiGtfsInstance are not empty.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| delete | bool | Whether or not to delete the empty feeds, by default False | False |
Returns
| Type | Description |
|---|---|
| list | A list of the empty feeds and their indices in MultiGtfsInstance.instances |
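Deleting items from a list while keeping their reported indices meaningful needs some care. The sketch below records the (index, feed) pairs first, then deletes from the highest index down so the remaining indices do not shift mid-loop. Two points are assumptions. A feed counts as empty when its stop_times table has no rows. The pair order (index, feed) is also assumed. Both the dict-based feed representation and `find_empty_feeds` are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical feed representation: table name -> list of rows
Feed = Dict[str, List[Any]]


def find_empty_feeds(instances: List[Feed], delete: bool = False) -> List[Tuple[int, Feed]]:
    # Assumption: a feed counts as empty when its stop_times table has no rows
    empty = [(i, feed) for i, feed in enumerate(instances) if not feed.get("stop_times")]
    if delete:
        # Delete from the highest index down so earlier indices stay valid
        for i, _ in reversed(empty):
            del instances[i]
    return empty
```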
viz_stops
gtfs.multi_validation.MultiGtfsInstance.viz_stops(path=None, return_viz=True, filtered_only=True)
Visualise all stops from all of the GTFS files.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| path | Union[str, pathlib.Path] | The path to save the folium map to, by default None. | None |
| return_viz | bool | Whether or not to return the folium map object, by default True. | True |
| filtered_only | bool | Whether to plot only the stops whose stop_id is present in the stop_times table, by default True. | True |
Returns
| Type | Description |
|---|---|
| folium.Map | A folium map with all stops plotted on it. |
| None | Returns None if ‘return_viz’ is False. |
Raises
| Type | Description |
|---|---|
| ValueError | Raised if path is None and return_viz is False, as the map would be neither saved nor returned. |
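The stop selection behind filtered_only and the guard against a map that is neither saved nor returned can be sketched without folium. Here the guard is assumed to trigger when path is left as None and return_viz is False. `stops_to_plot` is a hypothetical helper that only decides which stops would be drawn. Building and saving the folium map is not shown.

```python
from typing import Dict, Iterable, List, Optional


def stops_to_plot(
    stops: List[Dict[str, str]],
    stop_times: Iterable[Dict[str, str]],
    path: Optional[str] = None,
    return_viz: bool = True,
    filtered_only: bool = True,
) -> List[Dict[str, str]]:
    # A map that is neither saved nor returned would be wasted work
    if path is None and not return_viz:
        raise ValueError("Set 'path', 'return_viz=True', or both")
    if not filtered_only:
        return list(stops)
    # Keep only stops referenced by at least one stop_times row
    used = {row["stop_id"] for row in stop_times}
    return [stop for stop in stops if stop["stop_id"] in used]
```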