multi_validation

gtfs.multi_validation

Validate multiple GTFS feeds at once.

Classes

Name Description
MultiGtfsInstance Create a feed instance for multiple GTFS files.

MultiGtfsInstance

gtfs.multi_validation.MultiGtfsInstance(self, path)

Create a feed instance for multiple GTFS files.

This allows for multiple GTFS files to be cleaned, validated, summarised, filtered and saved at the same time.

Parameters

Name Type Description Default
path Union[str, list, pathlib.Path] A list of paths, a singular path object, or a glob string. See more information on glob strings here: https://docs.python.org/3/library/glob.html required

Attributes

Name Type Description
paths list A list of the GTFS paths used to create the MultiGtfsInstance object.
instances list A list of GtfsInstance objects created from self.paths.
daily_trip_summary pd.DataFrame A combined summary of statistics for trips from all GTFS files in the MultiGtfsInstance.
daily_route_summary pd.DataFrame A combined summary of statistics for routes from all GTFS files in the MultiGtfsInstance.

Raises

Type Description
TypeError ‘path’ is not of type string or list.
FileNotFoundError One (or more) of the paths passed to ‘path’ does not exist.
FileNotFoundError There are no GTFS files found in the passed list of paths, or from the glob string.
ValueError Path has no file extension.
ValueError One (or more) of the paths passed are not of the filetype ‘.zip’.

Returns

Type Description
None

Methods

Name Description
clean_feeds Clean each of the feeds in the MultiGtfsInstance.
ensure_populated_calendars Check whether calendar is absent and, if so, create one from calendar_dates.
filter_to_bbox Filter GTFS to a bbox.
filter_to_date Filter each GTFS to date(s).
get_dates Get all available dates from calendar.txt (or calendar_dates.txt).
is_valid Validate each of the feeds in the MultiGtfsInstance.
plot_service Create a line plot of route or trip counts over time.
save_feeds Save the GtfsInstances to a directory.
summarise_routes Summarise the combined GTFS data by route_id.
summarise_trips Summarise the combined GTFS data by trip_id.
validate_empty_feeds Ensure the feeds in MultiGtfsInstance are not empty.
viz_stops Visualise all stops from all of the GTFS files.
clean_feeds

gtfs.multi_validation.MultiGtfsInstance.clean_feeds(clean_kwargs=None)

Clean each of the feeds in the MultiGtfsInstance.

Parameters
Name Type Description Default
clean_kwargs Union[dict, None] The kwargs to pass to GtfsInstance.clean_feed() for each Gtfs in the MultiGtfsInstance, by default None None
Returns
Type Description
None
ensure_populated_calendars

gtfs.multi_validation.MultiGtfsInstance.ensure_populated_calendars()

Check whether calendar is absent and, if so, create one from calendar_dates.

Shallow wrapper around GtfsInstance.ensure_populated_calendar().

Returns
Type Description
None
filter_to_bbox

gtfs.multi_validation.MultiGtfsInstance.filter_to_bbox(bbox, crs='epsg:4326', delete_empty_feeds=False)

Filter GTFS to a bbox.

Parameters
Name Type Description Default
bbox Union[list, GeoDataFrame] The bbox to filter the GTFS to. Leave as none if the GTFS does not need to be cropped. Format - [xmin, ymin, xmax, ymax] required
crs Union[str, int] The CRS of the given bbox, by default “epsg:4326” 'epsg:4326'
delete_empty_feeds bool Whether or not to remove empty feeds, by default False False
Returns
Type Description
None
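
As a rough illustration of the `[xmin, ymin, xmax, ymax]` format, the sketch below tests stop coordinates against a bbox in EPSG:4326. The `in_bbox` function and the stop data are hypothetical; how the library itself decides which trips or stops to keep is not shown here.

```python
from typing import Sequence


def in_bbox(lon: float, lat: float, bbox: Sequence[float]) -> bool:
    """Return True if the point lies inside [xmin, ymin, xmax, ymax]."""
    xmin, ymin, xmax, ymax = bbox
    return xmin <= lon <= xmax and ymin <= lat <= ymax


# Hypothetical stops as (stop_id, stop_lon, stop_lat) in EPSG:4326.
stops = [
    ("A", -3.18, 51.48),
    ("B", -2.99, 51.59),
    ("C", -0.13, 51.51),
]

# x values are longitudes and y values are latitudes.
bbox = [-3.30, 51.40, -2.90, 51.70]

kept = [stop_id for stop_id, lon, lat in stops if in_bbox(lon, lat, bbox)]
print(kept)  # ['A', 'B']
```
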
filter_to_date

gtfs.multi_validation.MultiGtfsInstance.filter_to_date(dates, delete_empty_feeds=False)

Filter each GTFS to date(s).

Parameters
Name Type Description Default
dates Union[str, list] The date(s) to filter the GTFS to required
delete_empty_feeds bool Whether or not to remove empty feeds, by default False False
Returns
Type Description
None
get_dates

gtfs.multi_validation.MultiGtfsInstance.get_dates(return_range=True)

Get all available dates from calendar.txt (or calendar_dates.txt).

Parameters
Name Type Description Default
return_range bool When True, return only the min/max range of the dates; when False, return the raw dates. By default True True
Returns
Type Description
list Either the full set of dates, or the range that the dates span between
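
The difference between the two return modes can be sketched as follows. `summarise_dates` is a hypothetical stand-in, and the `YYYYMMDD` string format (as used in GTFS calendar files) is an assumption about what the method returns.

```python
from typing import List


def summarise_dates(dates: List[str], return_range: bool = True) -> List[str]:
    """Hypothetical sketch: return all unique dates, or only [min, max]."""
    # YYYYMMDD strings sort chronologically when sorted as text.
    unique = sorted(set(dates))
    if not unique:
        return []
    if return_range:
        return [unique[0], unique[-1]]
    return unique


# Hypothetical dates collected from two feeds, with one overlap.
all_dates = ["20240103", "20240101", "20240102", "20240103"]

print(summarise_dates(all_dates))                      # ['20240101', '20240103']
print(summarise_dates(all_dates, return_range=False))  # ['20240101', '20240102', '20240103']
```
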
is_valid

gtfs.multi_validation.MultiGtfsInstance.is_valid(validation_kwargs=None)

Validate each of the feeds in the MultiGtfsInstance.

Parameters
Name Type Description Default
validation_kwargs Union[dict, None] The kwargs to pass to GtfsInstance.is_valid() for each Gtfs in the MultiGtfsInstance, by default None None
Returns
Type Description
pd.DataFrame self.validity_df, a dataframe containing the validation messages from all of the GtfsInstances.
plot_service

gtfs.multi_validation.MultiGtfsInstance.plot_service(service_type='routes', route_type=True, width=1000, height=550, title=None, plotly_kwargs=None, rolling_average=None, line_date=None)

Create a line plot of route or trip counts over time.

Parameters
Name Type Description Default
service_type str Whether to plot ‘routes’ or ‘trips’. By default ‘routes’. 'routes'
route_type bool Whether or not to draw a line for each modality, by default True True
width int Plot width, by default 1000 1000
height int Plot height, by default 550 550
title str Plot title, by default None None
plotly_kwargs dict Kwargs to pass to plotly.express.line, by default None None
rolling_average Union[int, None] How many days to calculate the rolling average over. When left as None, no rolling average is used. The rolling average is calculated from the centre, meaning that if rolling_average=3, the average is calculated from the current date, the previous date and the following date. Missing dates are imputed and treated as having values of 0. None
line_date Union[str, None] A date to draw a dashed vertical line on. Date should be in format: YYYY-MM-DD, by default None None
Returns
Type Description
go.Figure The time series plot
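
The centred rolling average described for `rolling_average` can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: `centred_rolling_mean` is a hypothetical name, only odd windows are handled, and dates without a full window are given `None`, which may differ from the plotted behaviour.

```python
from datetime import date, timedelta
from typing import Dict, Optional


def centred_rolling_mean(
    counts: Dict[date, int], window: int
) -> Dict[date, Optional[float]]:
    """Hypothetical sketch of a centred rolling mean over daily counts."""
    if window < 1 or window % 2 == 0:
        raise ValueError("This sketch only supports odd, positive windows.")

    # Build the full run of days and impute missing dates as 0.
    start, end = min(counts), max(counts)
    n_days = (end - start).days + 1
    days = [start + timedelta(days=i) for i in range(n_days)]
    values = [counts.get(d, 0) for d in days]

    half = window // 2
    result: Dict[date, Optional[float]] = {}
    for i, d in enumerate(days):
        lo, hi = i - half, i + half + 1
        if lo < 0 or hi > n_days:
            # Assumption: no value where the window runs off either end.
            result[d] = None
        else:
            result[d] = sum(values[lo:hi]) / window
    return result


# Hypothetical daily trip counts; 3 January is missing and counts as 0.
trip_counts = {date(2024, 1, 1): 6, date(2024, 1, 2): 9, date(2024, 1, 4): 3}
smoothed = centred_rolling_mean(trip_counts, window=3)
```

With a window of 3, the value for 2 January averages 6, 9 and the imputed 0, and the value for 3 January averages 9, 0 and 3.
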
save_feeds

gtfs.multi_validation.MultiGtfsInstance.save_feeds(dir, suffix='_new', file_names=None, overwrite=False)

Save the GtfsInstances to a directory.

Parameters
Name Type Description Default
dir Union[pathlib.Path, str] The directory to export the GTFS files into. required
suffix str The suffix to apply to save names. The ‘file_names’ param takes priority here. '_new'
file_names list A list of save names for the altered GTFS. The list must be the same length as the number of GTFS instances. Takes priority over the ‘suffix’ param. Names will be used in order of the instances (access using self.instances). None
overwrite bool Whether or not to overwrite the pre-existing saves with matching paths. False
Returns
Type Description
None
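
The priority between `file_names` and `suffix` can be pictured with the sketch below. `build_save_names` is a hypothetical helper; in particular, appending the suffix to the original file stem before `.zip` is an assumption about the naming scheme, not confirmed behaviour.

```python
import pathlib
from typing import List, Optional


def build_save_names(
    paths: List[str], suffix: str = "_new", file_names: Optional[List[str]] = None
) -> List[str]:
    """Hypothetical sketch: choose a save name for each GTFS instance."""
    if file_names is not None:
        # Explicit names take priority and must match the instances one-to-one.
        if len(file_names) != len(paths):
            raise ValueError("file_names must match the number of GTFS instances.")
        return list(file_names)
    # Assumption: otherwise the suffix is added to each original file stem.
    return [f"{pathlib.Path(p).stem}{suffix}.zip" for p in paths]


# Hypothetical input paths.
paths = ["data/north.zip", "data/south.zip"]
print(build_save_names(paths))  # ['north_new.zip', 'south_new.zip']
print(build_save_names(paths, file_names=["n.zip", "s.zip"]))  # ['n.zip', 's.zip']
```
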
summarise_routes

gtfs.multi_validation.MultiGtfsInstance.summarise_routes(summ_ops=[np.min, np.max, np.mean, np.median], return_summary=True, to_days=False, sort_by_route_type=False)

Summarise the combined GTFS data by route_id.

Parameters
Name Type Description Default
summ_ops list A list of numpy operators to gather a summary on. Accepts operators (e.g., np.min) or strings (“min”), by default [np.min, np.max, np.mean, np.median]. [np.min, np.max, np.mean, np.median]
return_summary bool When set to False, full data for each trip on each date will be returned, by default True. True
to_days bool Whether or not to aggregate to days, or to just return counts for trips/routes for each date. When False, summ_ops is not used, so it should be left at its default when calling this function. By default False. False
sort_by_route_type bool Whether or not to sort the resulting dataframe by route_type. This only impacts the resulting df when to_days=True, by default False. False
Returns
Type Description
pd.DataFrame A dataframe containing the summary
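
To illustrate how `summ_ops` can mix operators and strings, the sketch below applies a list of operations to a handful of daily counts. It uses the standard-library `statistics` module in place of numpy so that it runs without third-party packages; `summarise_counts`, `NAMED_OPS` and the sample counts are hypothetical, and the real method returns a dataframe rather than a dict.

```python
import statistics
from typing import Callable, Dict, List, Union

# Assumption: string names map onto the matching summary function.
NAMED_OPS: Dict[str, Callable] = {
    "min": min,
    "max": max,
    "mean": statistics.mean,
    "median": statistics.median,
}


def summarise_counts(
    counts: List[int], summ_ops: List[Union[str, Callable]]
) -> Dict[str, float]:
    """Hypothetical sketch: apply each operation to the counts."""
    summary = {}
    for op in summ_ops:
        func = NAMED_OPS[op] if isinstance(op, str) else op
        name = op if isinstance(op, str) else func.__name__
        summary[name] = func(counts)
    return summary


# Hypothetical route counts for four dates.
daily_route_counts = [12, 15, 9, 12]
summary = summarise_counts(daily_route_counts, ["min", max, "mean", statistics.median])
```

Here `summary` holds one entry per operation, keyed by the operation's name, whether it was passed as a string or as a callable.
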
summarise_trips

gtfs.multi_validation.MultiGtfsInstance.summarise_trips(summ_ops=[np.min, np.max, np.mean, np.median], return_summary=True, to_days=False, sort_by_route_type=False)

Summarise the combined GTFS data by trip_id.

Parameters
Name Type Description Default
summ_ops list A list of numpy operators to summarise with. Accepts operators (e.g., np.min) or strings (“min”), by default [np.min, np.max, np.mean, np.median]. [np.min, np.max, np.mean, np.median]
return_summary bool When set to False, full data for each trip on each date will be returned, by default True. True
to_days bool Whether or not to aggregate to days, or to just return counts for trips/routes for each date. When False, summ_ops is not used, so it should be left at its default when calling this function. By default False. False
sort_by_route_type bool Whether or not to sort the resulting dataframe by route_type. This only impacts the resulting df when to_days=True, by default False. False
Returns
Type Description
pd.DataFrame A dataframe containing the summary
validate_empty_feeds

gtfs.multi_validation.MultiGtfsInstance.validate_empty_feeds(delete=False)

Ensure the feeds in MultiGtfsInstance are not empty.

Parameters
Name Type Description Default
delete bool Whether or not to delete the empty feeds, by default False False
Returns
Type Description
list A list of feeds that are empty and their index in MultiGtfsInstance.instances
viz_stops

gtfs.multi_validation.MultiGtfsInstance.viz_stops(path=None, return_viz=True, filtered_only=True)

Visualise all stops from all of the GTFS files.

Parameters
Name Type Description Default
path Union[str, pathlib.Path] The path to save the folium map to, by default None. None
return_viz bool Whether or not to return the folium map object, by default True. True
filtered_only bool Whether to filter the stops that are plotted to only stop_id’s that are present in the stop_times table, by default True. True
Returns
Type Description
folium.Map A folium map with all stops plotted on it.
None Returns None if ‘return_viz’ is False.
Raises
Type Description
ValueError Raised if path is None and return_viz is False, as the map would be neither saved nor returned.