A Google street-view image processing pipeline

As part of our urban forest project we have implemented an image processing pipeline which downloads and processes images from the Google Street View API. These images are then used to detect the amount of vegetation and trees present along points on a road network.

In this post, I hope to give an overview of the main components of the code and some of the problems it aims to address.

Sampling the road network

We aim to build a complete dataset (map) of vegetation present along the road network. To achieve this, our first objective is to obtain image samples from the Google Street View API. The API, amongst other parameters, takes as input a (latitude, longitude) point along with a heading corresponding to the direction of the camera’s field of view. As such, it is necessary to calculate the desired points before calling the API.

To enumerate the road network and generate these sample points, we have made use of road network waypoints in the Open Street Map project (herein referred to as OSM). This data, in addition to describing road features such as number of lanes, presence of street lighting, surface type and speed limits, provides a set of points describing the entire road network which can be used for routing purposes, and on which a number of open source projects depend, most notably OpenTripPlanner.

Street View images are sampled for the left and right hand side of the road at 10 metre intervals. Satellite imagery copyright Google.

To calculate the desired sample points, for each road we create a line-string from the list of points obtained from OSM. Then, starting at the beginning of each road, we proceed 10 metres forward using the current bearing, defined as the angle between the last known point and the next. We continue in this way, logging the 10 metre interval points until the road is exhausted. In addition to calculating the 10 metre sample points, we also calculate the left and right hand side view headings at each sample point, offset from that point’s bearing. Having completed this procedure, we end up with a dataset of the following form:

road, sequence, latitude, longitude, left_heading, right_heading

Where sequence is the sample order along the road, and {left,right}_heading corresponds to the heading offset by ±90 degrees with respect to the bearing at the sample point. We later use these pre-computed sample points to query the Google Street View API for images.
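As an illustrative sketch (not the project’s actual code), the point generation procedure can be written with standard great-circle formulas; for brevity this version restarts the 10 metre count at each waypoint segment rather than carrying the remainder across segments:

```python
import math

EARTH_RADIUS_M = 6_371_000

def bearing(p1, p2):
    """Initial bearing in degrees from p1 to p2; points are (lat, lon) in degrees."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    dlon = lon2 - lon1
    x = math.sin(dlon) * math.cos(lat2)
    y = (math.cos(lat1) * math.sin(lat2)
         - math.sin(lat1) * math.cos(lat2) * math.cos(dlon))
    return math.degrees(math.atan2(x, y)) % 360

def destination(p, bearing_deg, distance_m):
    """Great-circle point reached from p after distance_m along bearing_deg."""
    lat1, lon1 = map(math.radians, p)
    d = distance_m / EARTH_RADIUS_M
    b = math.radians(bearing_deg)
    lat2 = math.asin(math.sin(lat1) * math.cos(d)
                     + math.cos(lat1) * math.sin(d) * math.cos(b))
    lon2 = lon1 + math.atan2(math.sin(b) * math.sin(d) * math.cos(lat1),
                             math.cos(d) - math.sin(lat1) * math.sin(lat2))
    return math.degrees(lat2), math.degrees(lon2)

def haversine_m(p1, p2):
    """Great-circle distance between two (lat, lon) points in metres."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def sample_road(waypoints, interval_m=10.0):
    """Emit a sample every interval_m metres along the way's line-string."""
    samples, seq = [], 0
    for start, end in zip(waypoints, waypoints[1:]):
        brg = bearing(start, end)
        seg_len = haversine_m(start, end)
        travelled = 0.0
        while travelled < seg_len:
            lat, lon = destination(start, brg, travelled)
            samples.append({"sequence": seq, "latitude": lat, "longitude": lon,
                            "bearing": brg,
                            "left_heading": (brg - 90) % 360,
                            "right_heading": (brg + 90) % 360})
            seq += 1
            travelled += interval_m
    return samples
```

Each emitted sample carries the bearing plus the two ±90 degree headings used later when requesting left and right hand side images.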

The scalability problem

The Department for Transport (DfT) estimates the road network to be 397,025 km in length (2017). If we pre-compute sample points as described previously, this results in ~40 million points, for each of which we need to obtain left and right hand side images, yielding a total of ~80 million images. While it would be easy enough to download and store 80 million images (~3 TB at ~40 kB each), the (non-premium) Google Street View API restricts developers to a quota of 25,000 images per day. Consequently, if we serially download and process images, it would take in excess of 8 years and 9 months to complete.
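For concreteness, the back-of-envelope arithmetic behind these figures:

```python
network_m = 397_025 * 1_000   # DfT estimate of road network length, in metres
points = network_m // 10      # one sample point every 10 m  -> ~40 million
images = points * 2           # left- and right-hand sides   -> ~80 million
days = 80_000_000 / 25_000    # rounded image count / 25,000-per-day quota
print(points, images, days)   # 39702500 79405000 3200.0 (~8 years 9 months)
```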

We restrict our sampling to the road network within major towns and cities

Our research focuses specifically on vegetation in urban environments. Consequently, we have restricted our sample points to the 112 major towns and cities in England and Wales as defined by ONS Geography. This reduces the workload to ~17 million sample images, which would take 1 year and 10 months to process under the standard quota: a significant reduction in sampling time.

City                    Roads    Points
London                  85,153   1,765,318
Birmingham              15,763   316,733
Bristol                 9,643    173,484
Leeds                   9,362    189,728
Sheffield               9,197    196,118
Liverpool               9,009    204,267
Manchester              8,323    165,443
Norwich                 7,019    74,156
Reading                 5,951    88,079
Nottingham              5,885    101,326
Total (all 112 cities)  418,941  8,315,247

Table: Top 10 cities (in England & Wales) by number of roads.

While 17 million data points is a significant reduction, it is still expected to take some time to download and process all images. In the meantime, we still wish to provide a complete dataset.

Having downloaded an image, we derive a percentage score for each image, indicative of the total percent vegetation present in the scene in terms of pixel coverage. If we assume that there exists some spatial correlation in which vegetation density tends to cluster (for example groups of trees), then we can interpolate missing data from data we have already sampled.

In the case of a single road, delimited by 7 points, 10 metres apart, [a, b, c, d, e, f, g], in lieu of any additional information, if we know the percent vegetation for points a and g, we might then predict point d to be 0.5(a+g). If we assume that there exists some spatial correlation between the points, then we might expect this relationship to deteriorate with increasing distance. In this case, missing points can be interpolated using inverse distance weighting.
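A minimal sketch of inverse distance weighting over a road’s sample indices (illustrative only; index distance stands in for metric distance, since the points are equally spaced):

```python
def idw_estimate(known, target, power=1):
    """Inverse distance weighted estimate of % vegetation at sample index
    `target`, given `known`: a dict mapping sampled index -> measured value."""
    num = den = 0.0
    for idx, value in known.items():
        dist = abs(idx - target)
        if dist == 0:                 # already sampled: return the measurement
            return value
        weight = 1.0 / dist ** power  # nearer samples carry more weight
        num += weight * value
        den += weight
    return num / den
```

With a = 20% at index 0 and g = 40% at index 6, the midpoint d (index 3) is equidistant from both, so the estimate is 0.5(a+g) = 30%, matching the example above; points nearer a are pulled towards a’s value.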

Interpolating missing points leads to an interesting problem. If we are restricted to sampling one point at a time for a given road, [a, .., g] in this case, which point should we sample first? More generally, in what order should we sample the points? It would make sense to first sample the mid-point d, since this yields the maximum information gain with respect to minimising the interpolation error: we assume a prediction’s quality to be a decaying function of distance. Following this scheme, we look to “fill in the gaps” in a way that results in the most accurate set of interpolated points after each sample step. To generalise: given a road consisting of N sample points, output a schedule of length N describing the order in which to enumerate the road.
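One way to produce such a schedule, sketched here for illustration (not necessarily the project’s exact implementation), is a breadth-first traversal over interval midpoints:

```python
from collections import deque

def sampling_schedule(n):
    """Return an order over indices 0..n-1 in which each visit bisects the
    largest remaining unsampled gap: a breadth-first traversal of the
    implicit binary tree of interval midpoints."""
    order = []
    queue = deque([(0, n - 1)])      # inclusive index ranges still to split
    while queue:
        lo, hi = queue.popleft()
        if lo > hi:
            continue                 # empty interval: nothing left to sample
        mid = (lo + hi) // 2
        order.append(mid)            # sample the midpoint next
        queue.append((lo, mid - 1))  # then recurse on the left half...
        queue.append((mid + 1, hi))  # ...and the right half
    return order

# For the 7-point road [a..g] this yields d, b, f, a, c, e, g:
# sampling_schedule(7) == [3, 1, 5, 0, 2, 4, 6]
```

The position of each index in the returned list can then be stored as that point’s sampling priority.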

As such, this method assigns a sampling priority to each sample point so that the road is enumerated in the desired order. The algorithm is based on the insight that a road can be folded in half a number of times until the distance between each fold is equal to 10 metres; each fold corresponds to a sampling priority. This concept is recursive, and is in fact equivalent to traversing a binary search tree in breadth-first order. To illustrate, our schedule will enumerate the [a, b, c, d, e, f, g] road used in this example in the order [d, b, f, a, c, e, g]. After visiting each node, unvisited sample points will be interpolated using inverse distance weighting with respect to their nearest already-visited sample points.

We take the spatial correlation assumption one step further and apply this schedule independently to both sides of a road: we assume that there exists a stronger relationship between sample points on the same side of the road. This is especially apparent for long stretches of hedge-row and on roads which delimit areas of woodland.

There exist many alternative schemes to obtain a sample point enumeration schedule, some more sophisticated than others. It would be desirable, for instance, to visit sample points which yield the maximum expected information gain, much the same way that one would focus on more complicated regions of a puzzle, reserving the uniform sky region for later. In any case, the desired effect is that our dataset will improve in resolution over time and that at each time step, we hope to achieve the maximum possible resolution.

Image processing pipeline

The sampling schedule requires a form of processing pipeline, so that it is possible to keep track of the various processing stages of images downloaded from the Google Street View API. At the core of the processing pipeline is a MySQL database containing the following attributes:

Attribute  Meaning
id         A unique ID assigned to each sample point
city       One of the 112 major towns and cities the sample belongs to
road_name  Name of the road the sample is on
way_id     Open Street Map road network way id the sample belongs to
sequence   Order of this sample w.r.t. the points along the way id/road
bearing    Heading in degrees of this point w.r.t. previous and next point
geom       Coordinates of this sample, stored as a WKT POINT(lng lat)
order      Order in which this sample should be taken
priority   Sampling priority for this point
ts         Timestamp for when this point was sampled or interpolated
cam_dir    Sample direction - currently looking left or right w.r.t. bearing
predicted  True if this point has been interpolated/predicted, False if an actual reading has been made
value      The vegetation percentage value for this sample point

Table: Sample point database.

Each row in the database corresponds to a processed image from which a percentage vegetation has been derived. Initially, the database is pre-populated with a sampling backlog, where each row is assigned a sampling priority. The sampling database has been de-coupled from the image processing pipeline by placing it behind an API layer in which Create Read Update Delete (CRUD) operations have been defined in the form of a RESTful web service. This web service exposes a number of (remotely accessible) endpoints which consume and produce JSON and GeoJSON data.

Data flow in our image processing pipeline

The above figure illustrates the end-to-end flow of data in the image processing pipeline with respect to interactions with the sample point database. The processing pipeline operates on a distributed architecture. Specifically, we have implemented a form of Message Oriented Middleware (MOM), which makes use of different levels of (network based) FIFO queues to isolate processing components into a loosely-coupled, scalable system allowing for concurrent, fault-tolerant processing. The use of queues allows for maximum flexibility in terms of controlling processing throughput and on-demand process scaling.

Processing a single image requires a number of steps, which are enumerated here in computation order so as to demonstrate the workings of the (operational) prototype.

1. Image download job scheduler

A script is scheduled once per day to request a number of “jobs” from the image processing API (IP-API). A job consists of the following (JSON) form:

{
  "id": 123,
  "city": "Cardiff",
  "road_name": "Clydesmuir Road",
  "osm_way_id": "24005721",
  "sequence": "000057",
  "latitude": 51.491386,
  "longitude": -3.141202,
  "bearing": 46.48,
  "cam_dir": "left"
}

Requested jobs are ordered according to a priority scheme using the sampling priority described previously. Each day, the script will request a batch of (25,000) jobs from the API and then place these jobs onto a download queue. The download queue acts as an intermediate message bus between the IP-API and the other (distributed) components in the pipeline. As such, the role of the scheduled script at this stage is simply to shovel jobs from the IP-API onto the download queue. Note that the IP-API, shovel script and download queue are distinct, de-coupled components, each of which can be suspended, updated or indeed fail, without compromising the integrity (and state) of the rest of the system.

2. Image download agents

A set of download agents continuously monitor, or optionally drain in a one-off operation, the download queue. The download queue and agents are separate processes which can be deployed in a distributed fashion over a network. The role of a download agent is to consume a job from the queue, which contains all of the parameters needed to query the Google Street View API, namely the latitude, longitude and bearing. Having a number of download agents allows the system to download images in parallel, significantly reducing download time. Having downloaded a Street View image, the agent deletes the job from the download queue, stores the image to disk along with the job metadata, and then pushes a new job onto a secondary image processing queue. Processes at this stage are thus both consumers and producers of jobs: their role is simply to carry out the task of downloading a single image and then to wait (indefinitely) for more work to do.
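For illustration, the request issued by a download agent can be built from the documented Street View Static API parameters (`size`, `location`, `heading`, `key`); the job fields match the JSON example above, with the camera heading derived from the road bearing and `cam_dir`:

```python
from urllib.parse import urlencode

STREETVIEW_ENDPOINT = "https://maps.googleapis.com/maps/api/streetview"

def streetview_url(job, api_key, size="640x640"):
    """Build the Street View Static API request URL for a download job.
    The camera heading is the road bearing offset by -90 (left-hand side)
    or +90 (right-hand side) degrees, as described earlier."""
    offset = -90 if job["cam_dir"] == "left" else 90
    heading = (job["bearing"] + offset) % 360
    params = {
        "size": size,
        "location": f"{job['latitude']},{job['longitude']}",
        "heading": f"{heading:.2f}",
        "key": api_key,
    }
    return f"{STREETVIEW_ENDPOINT}?{urlencode(params)}"
```

The agent would then GET this URL, write the returned JPEG to disk alongside the job metadata, and push the job onto the image processing queue.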

3. Image processing agents

A set of image processing agents monitor the image processing queue, which is populated by stage (2). Jobs at this stage correspond to images which have been downloaded from Google Street View and are now pending processing. Having consumed a job from the image processing queue, an agent will apply a vegetation detection method to the image to derive the percentage vegetation, and then push the result to our IP-API, which will in turn update the sample point database with the new measurement.

The vegetation detection method can be a computationally demanding process. The current deployment makes use of a single dual-GPU machine on which 2 image processing agents consume and detect vegetation concurrently, one per GPU. The deployment can be scaled to much greater load by adding more agents, which may be distributed over a number of machines, effectively forming an image processing cluster. In addition, image processing agents can be taken offline to update, or even replace, the vegetation detection method without disrupting the rest of the pipeline: image downloading may continue regardless, and image processing agents may be suspended one at a time for update, allowing the pipeline to be modified on-the-fly.

4. Vegetation data export

At any point in time, a human may query the IP-API to export vegetation data for a specific city via an HTTP endpoint. Before the point of export, all pending sample points in the database are interpolated using the inverse distance weighting scheme. Currently, the IP-API can export road-side percent vegetation data for a city in GeoJSON and CSV format, where each GeoJSON feature or CSV row consists of (amongst other attributes) latitude, longitude, side and % vegetation, where side = left or right hand side of the road.
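As an illustration of the export format (the property names here are assumptions, not the IP-API’s actual schema), database rows map naturally onto GeoJSON Point features:

```python
def to_geojson(rows):
    """Convert sample-point rows into a GeoJSON FeatureCollection: one Point
    feature per sample, carrying the side of the road, the % vegetation and
    whether the value was measured or interpolated."""
    features = [
        {
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON coordinate order is [longitude, latitude]
                "coordinates": [row["longitude"], row["latitude"]],
            },
            "properties": {
                "side": row["cam_dir"],
                "vegetation_pct": row["value"],
                "predicted": row["predicted"],
            },
        }
        for row in rows
    ]
    return {"type": "FeatureCollection", "features": features}
```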

Visualisation of Cardiff's street level vegetation.

The above figure shows the vegetation density exported for the city of Cardiff. It is interesting to observe the prevalence of vegetation in the north of the city and the near complete absence of vegetation, indicated in red, in the centre and south. This is due to the presence of terraced housing and lack of front garden space in these areas.

Visualisation of Cardiff's building density

In addition to capturing percentage vegetation along the road network, we have also captured a number of other features such as parked cars and building density. The above visualisation shows building density which has been calculated using the total percent building facade present in each of the respective streetview images. It is interesting to note how building density tends to have a negative relationship with vegetation (in this example) and also how the high density areas form clusters which in fact correspond to wards belonging to the historic centre of the city.

We will be publishing a technical report very soon detailing the image segmentation method used in this pipeline. Thanks for reading!

Phil Stubbings


Data scientist at ONS Data Science Campus. Contact: philip.stubbings AT ons.gov.uk