import os
import subprocess
import tempfile
import geopandas as gpd
from pyprojroot import here
from shapely.geometry import Polygon
from transport_performance.osm.osm_utils import filter_osm
from transport_performance.osm import validate_osm
4. OSM
Working with the transport_performance.osm module through examples.
Introduction
Outcomes
OpenStreetMap (OSM) is a free, community-maintained source of spatial data. It contains information about the properties of street networks and has international coverage. We use these data in combination with General Transit Feed Specification data to build public transit networks for routing operations.
In this tutorial we will learn how to prepare OSM data for routing. Specifically, we will:
- Download OSM data.
- Filter the OSM file to an area of interest with a bounding box.
- Examine features of the transit network.
Requirements
To complete this tutorial, you will need:
- Python 3.9
- A stable internet connection
- The transport_performance package installed (see the getting started explanation for help)
- The following requirements installed in a virtual environment:
requirements.txt
geopandas
pyprojroot
shapely
transport_performance.osm is built on osmosis and is tested on macOS and Linux only. Please follow the osmosis guidance for installation on your operating system.
Downloading OSM
Let’s import the necessary dependencies:
We require a source of OSM data in Protocolbuffer Binary Format (PBF). We recommend using excerpts of this data hosted on Geofabrik’s Download Server. This server is provided free of charge by Geofabrik and can come under considerable demand at certain times of the day. Please use this service considerately.
Navigate the Geofabrik Download Server and download OSM data for an area of interest. Save the data and instantiate a path to the file called original_osm_path.
original_osm_path = <INSERT_PATH_HERE>
original_osm_path = here("tests/data/newport-2023-06-13.osm.pbf")
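Before moving on, it can help to confirm that the path really points at a PBF file on disk. The sketch below uses only the standard library; check_pbf_path is a hypothetical helper written for this tutorial, not part of transport_performance.

```python
from pathlib import Path


def check_pbf_path(path):
    """Return the path as a Path if it is an existing file ending in .pbf.

    Hypothetical helper for illustration only.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"No file found at {path}")
    if path.suffix != ".pbf":
        raise ValueError(f"Expected a .pbf file, got '{path.suffix}'")
    return path
```

Calling check_pbf_path(original_osm_path) either returns the path unchanged or raises an error that explains what is wrong.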
Define the Area of Interest
To crop the OSM file, we need to get a bounding box. This could be:
- The boundary of an urban centre calculated with the transport_performance.urban_centres module.
- Any boundary from an open service such as klokantech in csv format.
The bounding box should be in EPSG:4326 projection (longitude & latitude).
Using klokantech, define a small bounding box within the territory of the OSM file that you downloaded.
Extract the bounding box in comma separated value format. Assign to a list in xmin, ymin, xmax, ymax format. Call the list BBOX_LIST.
BBOX_LIST = [<INSERT_VALUES_HERE>]
BBOX_LIST = [-3.002175, 51.587035, -2.994271, 51.59095]
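If you copy the bounding box as comma separated text, a small parser can convert it to a list and catch common mistakes such as swapped minimum and maximum values or out-of-range coordinates. This is a minimal sketch; parse_bbox is a hypothetical helper, not part of transport_performance, and it cannot detect every case of longitude and latitude being given in the wrong order.

```python
def parse_bbox(csv_text):
    """Parse 'xmin,ymin,xmax,ymax' text into a validated list of floats.

    Hypothetical helper for illustration only.
    """
    values = [float(v) for v in csv_text.split(",")]
    if len(values) != 4:
        raise ValueError("Expected 4 values: xmin, ymin, xmax, ymax")
    xmin, ymin, xmax, ymax = values
    # EPSG:4326 longitudes span -180..180 and latitudes span -90..90
    if not (-180 <= xmin < xmax <= 180):
        raise ValueError("Longitudes must satisfy -180 <= xmin < xmax <= 180")
    if not (-90 <= ymin < ymax <= 90):
        raise ValueError("Latitudes must satisfy -90 <= ymin < ymax <= 90")
    return values


BBOX_LIST = parse_bbox("-3.002175,51.587035,-2.994271,51.59095")
print(BBOX_LIST)
```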
Filtering PBF
As PBF files can be very large and contain lots of data that are irrelevant for our routing purposes, we can filter the data to the road network only. Ensure that you have osmosis installed for this task.
Define a filtered_osm_path object to save the filtered pbf file to.
Use the filter_osm() function to restrict the PBF file to the extent of BBOX_LIST. Inspect the API reference or use help(filter_osm) for information on all available parameters.
filtered_osm_path = <INSERT_A_PATH>
filter_osm(pbf_pth=original_osm_path, out_pth=filtered_osm_path, bbox=BBOX_LIST)
tmp_path = tempfile.TemporaryDirectory()
filtered_osm_path = os.path.join(tmp_path.name, "filtered_feed.pbf")
filter_osm(pbf_pth=original_osm_path, out_pth=filtered_osm_path, bbox=BBOX_LIST)
Rejecting ways: buildings, waterway, landuse & natural.
Filter completed. Written to /tmp/tmpbwt6diy7/filtered_feed.pbf
Jun 25, 2024 9:54:32 AM org.openstreetmap.osmosis.core.Osmosis run
INFO: Osmosis Version 0.48.3
Jun 25, 2024 9:54:32 AM org.openstreetmap.osmosis.core.Osmosis run
INFO: Preparing pipeline.
Jun 25, 2024 9:54:32 AM org.openstreetmap.osmosis.core.Osmosis run
INFO: Launching pipeline execution.
Jun 25, 2024 9:54:32 AM org.openstreetmap.osmosis.core.Osmosis run
INFO: Pipeline executing, waiting for completion.
Jun 25, 2024 9:54:35 AM org.openstreetmap.osmosis.core.Osmosis run
INFO: Pipeline complete.
Jun 25, 2024 9:54:35 AM org.openstreetmap.osmosis.core.Osmosis run
INFO: Total execution time: 2474 milliseconds.
When using filter_osm(), the default behaviour is to remove elements tagged as buildings, waterways, landuse, and natural since they are not required for transport routing and removing them significantly reduces file size. If this is not desired, set tag_filter=False.
Notice that osmosis is quite chatty and will print various exceptions originating from the Java code. If the filter operation was performed successfully, you should see "INFO: Pipeline complete." and an execution time printed to the console.
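If you have the console text available as a string, the two success indicators can be checked programmatically. The sketch below assumes you have already captured that text yourself (how to capture it is not covered here); summarise_osmosis_log is a hypothetical helper, and the sample lines are copied from the output shown above.

```python
import re

# Sample lines copied from the osmosis output shown above
log_text = """\
INFO: Pipeline executing, waiting for completion.
INFO: Pipeline complete.
INFO: Total execution time: 2474 milliseconds.
"""


def summarise_osmosis_log(text):
    """Return (completed, milliseconds) from osmosis console text.

    Hypothetical helper for illustration only.
    """
    completed = "INFO: Pipeline complete." in text
    match = re.search(r"Total execution time: (\d+) milliseconds", text)
    millis = int(match.group(1)) if match else None
    return completed, millis


print(summarise_osmosis_log(log_text))  # prints: (True, 2474)
```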
Now that we have performed the filter, we should notice a significant change in the size of the file on disk.
orig_du = subprocess.run(
    ["du", "-sh", original_osm_path], text=True, capture_output=True
).stdout.split("\t")[0]
filtered_du = subprocess.run(
    ["du", "-sh", filtered_osm_path], text=True, capture_output=True
).stdout.split("\t")[0]
print(f"After filtering, PBF size reduced from {orig_du} to {filtered_du}")
After filtering, PBF size reduced from 2.5M to 164K
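The du command is not available on every system. As an alternative, the following sketch measures file size with the standard library only. human_size and file_size are hypothetical helpers; note that os.path.getsize reports the file length in bytes whereas du reports disk usage, so the figures may differ slightly from those above.

```python
import os


def human_size(n_bytes):
    """Format a byte count using binary units, e.g. 2048 -> '2.0K'."""
    size = float(n_bytes)
    for unit in ("B", "K", "M", "G"):
        # Stop at the first unit that fits, or at the largest unit available
        if size < 1024 or unit == "G":
            return f"{size:.1f}{unit}"
        size /= 1024


def file_size(path):
    """Return the human-readable size of the file at path."""
    return human_size(os.path.getsize(path))
```

For example, file_size(filtered_osm_path) could be used in place of the filtered_du value in the print statement.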
Count OSM Features
From this point on in the tutorial, it is suggested to work with a small, filtered PBF file as the computations can be slow.
PBF data contain spatial data organised with tagged (labelled) elements. We can access these elements to explore the features stored within the file.
The first step in understanding the contents of your PBF file is to explore the tag IDs that are available.
Use the validate_osm.FindIds class to discover the full list of IDs within the pbf file saved at filtered_osm_path. Assign the class instance to id_finder.
Use an appropriately named method to count the available IDs within the file.
id_finder = validate_osm.<INSERT_CLASS_NAME>(osm_pth=filtered_osm_path)
id_finder.<INSERT_METHOD_NAME>()
id_finder = validate_osm.FindIds(osm_pth=filtered_osm_path)
id_finder.count_features()
{'node_ids': 7079, 'way_ids': 891, 'relation_ids': 54, 'area_ids': 95}
You should find that there are four classes of IDs within the returned dictionary:
- Nodes
- Ways
- Relations
- Areas
For our purposes we can focus on nodes and ways. Nodes will be point locations on the travel network such as junctions or bends in the road whereas ways are collections of nodes forming a road or section of road.
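The relationship between nodes and ways can be pictured with plain Python data structures. The IDs and coordinates below are invented for illustration and do not come from the PBF file; way_coordinates is a hypothetical helper.

```python
# Hypothetical IDs and coordinates, invented for illustration
nodes = {
    101: (-3.0010, 51.5880),
    102: (-3.0005, 51.5885),
    103: (-2.9990, 51.5890),
}
# A way is an ordered collection of node IDs
ways = {
    9001: [101, 102, 103],
}


def way_coordinates(way_id):
    """Resolve a way into the ordered (lon, lat) pairs of its nodes."""
    return [nodes[node_id] for node_id in ways[way_id]]


print(way_coordinates(9001))
```

Drawing a way on a map therefore amounts to looking up each member node and joining the points in order.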
If we have IDs for nodes or ways, we can visualise their locations on a map. To do this, we first need a list of IDs.
Return IDs for a Way
Using the id_finder instance we instantiated earlier, find all of the IDs labelled as ways in the PBF file. Assign these IDs to a list called way_ids. Print the first 5 IDs.
way_ids = id_finder.<INSERT_METHOD>()["<INSERT_CORRECT_KEY>"]
way_ids[<START>:<END>]
way_ids = id_finder.get_feature_ids()["way_ids"]
way_ids[0:5]
[2954415, 2954417, 2954418, 2954419, 2954431]
Visualising OSM Features
Now that we have returned the IDs for the ways, it is straightforward to visualise their locations on a map.
Assign validate_osm.FindLocations to an instance called loc_finder. You will need to point this class to the same filtered PBF file as you used previously.
Using the way_ids list from a previous task, pass the first 5 IDs to loc_finder.plot_ids() in a list. Ensure that you specify that the feature_type is "way".
loc_finder = validate_osm.<INSERT_CLASS>(osm_pth=filtered_osm_path)
loc_finder.<INSERT_METHOD>(
    ids=way_ids[<START>:<END>], feature_type="<INSERT_FEATURE_TYPE>")
loc_finder = validate_osm.FindLocations(osm_pth=filtered_osm_path)
loc_finder.plot_ids(ids=way_ids[0:5], feature_type="way")
Visualising these features of the PBF file can help to validate features of the local transit network, particularly in areas where changes to infrastructure are ongoing. Examining the features present in relation to our bounding box, we can see that the geometries may not be neatly cropped to the extent of the bounding box. This is because filter_osm() ensures all ways and relations are complete when cropping to a bounding box. This means roads and paths that traverse the edge of the bounding box remain whole.
Below we display every way (and their member nodes) in the PBF relative to the bounding box crop we applied (purple).
# map all available ways
imap = loc_finder.plot_ids(id_finder.id_dict["way_ids"], feature_type="way")
# add polygon of bounding box to map
xmin, ymin, xmax, ymax = BBOX_LIST
poly = Polygon(((xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin)))
poly_gdf = gpd.GeoDataFrame({"geometry": poly}, crs=4326, index=[0])
poly_gdf.explore(color="purple", m=imap)
The filter_osm function has reduced the file size but has also retained features outside of the crop that we specified. This is because removing a feature outside of the crop, that is referenced by a feature within the crop zone, can cause runtime errors when routing. The likelihood is that a junction within the crop zone you specified references a road (or some other feature ID) outside of your crop zone. The filter strategy we have adopted for routing is the safest approach to avoiding these issues.
To read more on osmosis filtering strategies, refer to the completeWays and completeRelations flag descriptions in the Osmosis detailed usage documentation.
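To build intuition for why whole ways survive the crop, here is a simplified sketch in plain Python. It is not how osmosis is implemented: it ignores relations and standalone nodes, and all IDs and coordinates are invented. crop_complete_ways is a hypothetical function that keeps any way with at least one node inside the bounding box, together with all of that way's nodes.

```python
def in_bbox(lon, lat, bbox):
    """Return True if the point lies within [xmin, ymin, xmax, ymax]."""
    xmin, ymin, xmax, ymax = bbox
    return xmin <= lon <= xmax and ymin <= lat <= ymax


def crop_complete_ways(nodes, ways, bbox):
    """Keep every way with at least one node in bbox, plus all of its nodes."""
    kept_ways = {
        way_id: node_ids
        for way_id, node_ids in ways.items()
        if any(in_bbox(*nodes[n], bbox) for n in node_ids)
    }
    kept_node_ids = {n for node_ids in kept_ways.values() for n in node_ids}
    kept_nodes = {n: nodes[n] for n in kept_node_ids}
    return kept_nodes, kept_ways


bbox = [-3.002175, 51.587035, -2.994271, 51.59095]
# Hypothetical nodes, invented for illustration
nodes = {
    1: (-3.0000, 51.5880),  # inside the box
    2: (-2.9900, 51.5880),  # outside the box (east of xmax)
    3: (-2.9800, 51.6000),  # outside the box
    4: (-2.9700, 51.6100),  # outside the box
}
ways = {
    10: [1, 2],  # crosses the edge of the box
    20: [3, 4],  # entirely outside the box
}
kept_nodes, kept_ways = crop_complete_ways(nodes, ways, bbox)
print(sorted(kept_ways), sorted(kept_nodes))  # prints: [10] [1, 2]
```

Node 2 lies outside the box but is retained because way 10 references it, which mirrors the behaviour seen on the map.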
Note that additional metadata can be added to the map by setting include_tags=True. Adding this rich contextual data to the map can be useful but is also computationally expensive. This operation should be avoided for large OSM files, for example anything over 500 KB.
loc_finder.plot_ids(id_finder.id_dict["way_ids"], feature_type="way", include_tags=True)
/home/runner/work/transport-network-performance/transport-network-performance/src/transport_performance/osm/validate_osm.py:560: PerformanceWarning:
PBF file is 164833 bytes. Tag operations are expensive. Consider filtering the pbf file smaller than 50000 bytes
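One way to avoid triggering expensive tag operations by accident is to check the file size first. The sketch below is an assumption-laden illustration: tags_affordable is a hypothetical helper, and its default threshold is simply the 50000 bytes quoted in the warning message above, which you may wish to adjust.

```python
import os

# Threshold taken from the PerformanceWarning message shown above
TAG_SIZE_LIMIT_BYTES = 50_000


def tags_affordable(pbf_path, limit=TAG_SIZE_LIMIT_BYTES):
    """Return True if the file at pbf_path is smaller than limit bytes.

    Hypothetical helper for illustration only.
    """
    return os.path.getsize(pbf_path) < limit
```

The result could then be passed to the plotting call, for example include_tags=tags_affordable(filtered_osm_path), so that tags are only requested for small files.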
Conclusion
Congratulations, you have successfully completed this tutorial on OpenStreetMap data.
To continue learning how to work with the transport_performance package, we suggest that you continue with the Analyse Network Tutorial.
For any problems encountered with this tutorial or the transport_performance package, please open an issue on our GitHub repository.