Data Science Campus Projects

Our Data Science Campus projects, grouped by project life-cycle phase.

18 Complete

Projects we have completed and handed over to the stakeholder.

DSC-166 National Assembly for Wales consultation on the Children's Bill

Efficient Operations External-Other Improved Evidence NLP Python Small Social Survey Data

This project analyses free-text responses to a consultation gathering opinions on a recent Welsh Government Bill, the Children (Abolition of Defence of Reasonable Punishment) (Wales) Bill (“the Bill”), introduced by Julie Morgan AM, Deputy Minister for Health and Social Serv...

DSC-88 Economic impact of the UK fishing industry on local areas

Admin Data Better Statistics DataViz Economics External-Gov Geospatial Open Data R Small Social Survey Data

Fishing activities are a key economic driver in many rural and coastal communities. This project brings together public data with data from the Office for National Statistics’ Inter-departmental Business Register (IDBR) to assess the economic impact of the UK fishing industry on local areas.

DSC-85 UN Global Platform - Mapping the urban forest

Big Data Computer Vision Deep Learning Environment Geospatial Java Open Data Python Small

Following on from our recent Mapping the urban forest research, this short-term project aims to deploy our image processing pipeline onto Algorithmia, a distributed computing environment used by the UN Global Platform project.

DSC-64 Evaluating calorie intake

Admin Data Better Statistics Classical ML Health ONS Open Data Small Stata Survey Data Time Series

This research explored novel data sources that could help improve the accuracy of official statistics on calorie consumption from food. The analysis focused on using biometric data to statistically recalibrate estimates derived from national survey data.

DSC-54 Automated report generation

1 month DataViz Efficient Operations External-Gov R Small

Creation of a pipeline for automated report generation that draws on online application programming interfaces (APIs).

DSC-51 Approaches for producing granular trade statistics

Better Statistics Economics ONS Python Small Survey Data

Monitoring the UK economy in granular detail is important for economic and monetary policy-makers. In particular, there have recently been calls for the publication of more granular statistics on the import and export of services by product and by country. This project develop...

DSC-50 Synthetic data using generative models

Better Statistics Big Data Computer Vision DataViz Deep Learning Economics Efficient Operations External-Gov Health Improved Evidence Medium ONS Open Data Optimisation Python Simulation Social Synthetic Data Time Series

The project uses machine learning to generate synthetic data that can replace real data for the purpose of data processing and, potentially, analysis. This is particularly useful in cases where the real data are sensitive (for example, microdata, medical records, def...

DSC-46 How green is your street?

Better Statistics Big Data Computer Vision DataViz Environment Improved Evidence Medium ONS Open Data Python

A collaboration led by the Office for National Statistics (ONS) Visual team which uses vegetation index data produced by the Mapping the urban forest project to produce a data journalism and visualisation output. The short-term project will explore novel ways to visualise the ...

DSC-40 Improving garden green space statistics

Better Statistics Big Data Commercial Data Computer Vision Deep Learning Environment Geospatial Improved Evidence Medium ONS Python

The Office for National Statistics (ONS) publishes a regular statistic on natural capital, including estimates of natural land or green space in the UK. Currently, these figures assume all residential garden space is green. This project will generate a more accurate estimate o...

DSC-23 Improving the ONS search engine

Commercial Data Efficient Operations Medium NLP ONS Python Social

We investigate challenges related to the site search function of the Office for National Statistics (ONS) website and make recommendations on possible improvements. Although there is a wealth of literature on search engine optimisation (SEO), most solutions are designed for c...

DSC-18 Categorising contents of lorries in cross-border goods

Admin Data Classical ML Commercial Data Economics Efficient Operations External-Gov Improved Evidence Large NLP Python R

The Data Science Campus has been exploring how to process unlabelled list data that are collected manually, in an uncontrolled way, and without any supplementary information that would allow the data to be aggregated.

DSC-13 Risk factors for loneliness

Admin Data Better Statistics Classical ML DataViz External-Gov External-Other Geospatial Improved Evidence NLP ONS Open Data Python R Small Social Time Series

This project determines the risk factors for loneliness across the UK at a good level of geographical detail. Loneliness is a perception that is hard to measure directly, so our approach uses health data as an outcome measure of loneliness and treats loneliness itself as a hidden variable.

DSC-28 Understanding characteristics of high growth firms

Commercial Data Economics Efficient Operations External-Gov Medium NLP Python

Through this work the Campus is supporting the Data Enabled Change Accelerator (DECA) project led by the Department for Business, Energy and Industrial Strategy (BEIS), which aims to identify the characteristics of businesses with high growth potential. The Campus is explorin...

DSC-24 Classification of financial services

Admin Data Better Statistics Classical ML Economics External-Other ONS Scala Small Spark Survey Data

This project explores whether it is possible to classify financial corporations to their detailed Standard Industrial Classification 2007 (SIC2007) code using data on their financial assets and liabilities, and other firm-level information. The project makes use of a number of unique...

DSC-22 Analysis of Automatic Identification System (AIS) data to understand shipping and ports

Better Statistics Big Data Commercial Data Economics External-Gov Geospatial Medium Python Scala Spark

The off-course project explores the operation and use of UK ports, and the relationships between them, at a macro level, and the behaviour and operational characteristics of ships at a micro level. Specifically, we explored ship travel behaviour, traffic at ports and related factors...

DSC-21 Mapping the urban forest

Big Data Computer Vision DataViz Deep Learning Environment Geospatial Medium Open Data

In collaboration with the Office for National Statistics (ONS) Natural Capital team, we have developed an experimental computer vision method for estimating the density of trees and vegetation present at 10-metre resolution along the road network for all 112 major towns and c...

DSC-14 Public transport access to services

Better Statistics DataViz External-Gov Geospatial Medium ONS Open Data R Social

An inability to access services can have negative health and economic effects by increasing social isolation and limiting job prospects. The Data Science Campus (DSC) worked with the Welsh Government to produce an R package called propeR, which uses multimodal (private and publ...

DSC-11 Extracting economic signals from internet bandwidth consumption data

Better Statistics Big Data Economics Medium ONS Open Data Python R Social Time Series

This project aims to explore whether it is possible to extract economic signals and insights from publicly available internet bandwidth consumption data, in the same way that electricity demand and road traffic congestion relate to economic activity.

4 in Dissemination

Projects in handover phase to the stakeholder.

DSC-57 Explore Shipping GPS data for rapid economic indicators

Better Statistics Big Data Commercial Data Economics External-Other Geospatial Java Medium ONS PySpark Tech! Time Series

This project explores ship tracking data (AIS) and ship waste data (CERS) to further exploit these large, rich datasets.

DSC-29 Identifying emerging trends from patent data

Big Data DataViz Economics Efficient Operations External-Gov Large NLP Open Data Python Time Series

The project identifies key terminology trends in patents and other technical literature, which may inform business and government decisions about new technologies. The analysis considers when and where terminology is used, both nationally and internationally.

DSC-12 Estimating housing conditions and energy efficiency

Admin Data Big Data Deep Learning External-Gov Health Improved Evidence Open Data Python Small Social

The Welsh Government are trying to improve the evidence base they use to support policies on housing, energy efficiency and fuel poverty. Evidence on housing conditions currently relies on data from the Living in Wales Property Survey 2008, which can no longer represen...

3 in Delivery

Projects in delivery phase.

DSC-107 Payments data for public good

Big Data Commercial Data DSC-Policy DSC-SO1 DSC-SO2 Economics Improved Evidence Medium Python Time Series

The Campus and Barclays are working together to develop payments data for public good. Payments data is one of the top 3 most sought-after data sources for economic statistics. The Office for National Statistics (ONS) has seconded staff into Barclays to explore the data, and wh...

DSC-72 Data science for NICE guidance

Commercial Data Deep Learning Efficient Operations External-Gov Health Large NLP Python

This project targets the ongoing ‘surveillance’ of guidance recommendations through the following search functionality:

- Given a recommendation, retrieve similar or related recommendations
- Given a set of keywords, retrieve related recommendations
- Given a set of keywords, retri...

DSC-70 Novel approaches to the Living Costs and Food Survey

Better Statistics Classical ML Computer Vision Deep Learning Economics Efficient Operations External-International Large NLP Open Data Optimisation Python Survey Data

This project aims to explore the application of computer vision and Natural Language Processing (NLP) techniques to the Office for National Statistics (ONS) Living Costs and Food Survey (LCF). Specifically, we will produce a set of tools for automatically extracting textual da...

2 in Discovery

Projects in discovery phase (note: projects must pass discovery before moving to the delivery phase).

DSC-128 SDG 6.6.1: Surface water

Better Statistics Big Data Computer Vision Deep Learning Environment External-International Geospatial Large ONS Open Data Python Time Series

The aim of this project is to research and develop techniques for rapid monitoring and assessment of changing extents of freshwater bodies in relation to operationalising SDG indicator 6.6.1: “Change in the extent of water-related ecosystems over time” in different country con...

DSC-89 Georeferencing historical aerial images

Computer Vision Deep Learning Environment External-Gov Geospatial Medium Open Data Python Resource!

This project aims to use data science techniques to automatically georeference historical aerial imagery. We are working with Welsh Government using their extensive catalogue of aerial images for Wales. Successful georeferencing of images would create a time-series of aerial i...