Data Science Campus Projects

Our Data Science Campus projects, grouped by their current life-cycle phase.

21 Complete

Projects we have completed and handed over to the stakeholder.

DSC-166 National Assembly for Wales consultation on the Children's Bill

3 - Adapt-Adopt Efficient Operations External-Other Improved Evidence NLP Python Small Social Survey Data prj

This project analyses free text responses to a consultation gathering opinions on a recent Welsh Government Bill - the Children (Abolition of Defence of Reasonable Punishment) (Wales) Bill (“the Bill”) - introduced by Julie Morgan AM, Deputy Minister for Health and Social Serv...

DSC-88 Economic impact of the UK fishing industry on local areas

2 - New campus product Admin Data Better Statistics DataViz Economics External-Gov Geospatial Open Data R Small Social Survey Data prj

Fishing activities are a key economic driver in many rural and coastal communities. This project involves bringing together public data with data from the Office for National Statistics’ Inter-departmental Business Register (IDBR) to assess the UK fishing industry.

DSC-85 UN Global Platform - Mapping the urban forest

Big Data Computer Vision Deep Learning Environment External-Other Geospatial Java Open Data Python Small prj

Following on from our recent Mapping the urban forest research, this short-term project aims to deploy our image processing pipeline onto Algorithmia - a distributed computing environment used by the UN Global Platform project.

DSC-64 Evaluating calorie intake

2 - New campus product Admin Data Better Statistics Classical ML Health ONS Open Data Small Stata Survey Data Time Series prj

This research explored novel data sources that could help improve the accuracy of official statistics on calorie consumption from food. The analysis focused on the use of biometric data to statistically re-calibrate estimates derived from national survey data.

DSC-57 Explore Shipping GPS data for rapid economic indicators

2 - New campus product Better Statistics Big Data Commercial Data Economics External-Other Geospatial Java Medium ONS PySpark RAG = GREEN Tech! Time Series prj

This project explores ship tracking data (AIS) and ship waste data (CERS) to make further use of these large, rich datasets.

DSC-54 Automated report generation

1 month DataViz Efficient Operations External-Gov R Small prj

Creation of a pipeline for automated report generation with access to online application programming interfaces (APIs).

DSC-51 Approaches for producing granular trade statistics

2 - New campus product Better Statistics Economics ONS Python Small Survey Data prj

Monitoring the UK economy in granular detail is important for economic and monetary policy-makers. In particular, there have recently been calls for the publication of more granular statistics on the import and export of services by product and by country. This project develop...

DSC-50 Synthetic data using generative models

1 - Experimental Better Statistics Big Data Computer Vision DataViz Deep Learning Economics Efficient Operations External-Gov Health Improved Evidence Medium ONS Open Data Optimisation Python Simulation Social Synthetic Data Time Series prj

The project involves the generation of synthetic data using machine learning to replace real data for the purpose of data processing and, potentially, analysis. This is particularly useful in cases where the real data are sensitive (for example, microdata, medical records, def...

DSC-46 How green is your street?

2 - New campus product Better Statistics Big Data Computer Vision DataViz Environment Improved Evidence Medium ONS Open Data Python prj

A collaboration led by the Office for National Statistics (ONS) Visual team which uses vegetation index data produced by the Mapping the urban forest project to produce a data journalism and visualisation output. The short-term project will explore novel ways to visualise the ...

DSC-40 Improving garden green space statistics

2 - New campus product Better Statistics Big Data Commercial Data Computer Vision Deep Learning Environment Geospatial Improved Evidence Medium ONS Python prj

The Office for National Statistics (ONS) publishes a regular statistic on natural capital, including estimates of natural land or green space in the UK. Currently, these figures assume all residential garden space is green. This project will generate a more accurate estimate o...

DSC-12 Estimating housing conditions and energy efficiency

2 - New campus product Admin Data Big Data Deep Learning External-Gov Health Improved Evidence Open Data Python RAG = GREEN Small Social prj

The Welsh Government are trying to improve the evidence base they use for supporting policies in housing, energy efficiency and fuel poverty. Currently, evidence on housing conditions has relied on data from the Living in Wales Property Survey 2008 which can no longer represen...

DSC-28 Understanding characteristics of high growth firms

2 - New campus product Commercial Data Economics Efficient Operations External-Gov Medium NLP Python prj

Through this work the Campus is supporting the Data Enabled Change Accelerator (DECA) project led by the Department for Business, Energy and Industrial Strategy (BEIS), which aims to identify the characteristics of businesses with high growth potential. The Campus is explorin...

DSC-24 Classification of financial services

2 - New campus product Admin Data Better Statistics Classical ML Economics External-Other ONS Scala Small Spark Survey Data prj

This project explores whether it is possible to classify financial corporations to their detailed Standard Industrial Classification 2007 (SIC2007) using data on their financial assets and liabilities, and other firm-level information. The project makes use of a number of unique...

DSC-23 Improving the ONS search engine

2 - New campus product Commercial Data Efficient Operations Medium NLP ONS Python Social prj

We investigate challenges related to the site search function of the Office for National Statistics (ONS) website and make recommendations on possible improvements. Although there is a wealth of literature on search engine optimisation (SEO), most solutions are designed for c...

DSC-21 Mapping the urban forest

2 - New campus product Big Data Computer Vision DataViz Deep Learning Environment Geospatial Medium ONS Open Data prj

In collaboration with the Office for National Statistics (ONS) Natural Capital team, we have developed an experimental computer vision method for estimating the density of trees and vegetation present at 10-metre resolution along the road network for all 112 major towns and c...
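
As a rough illustration of what estimating vegetation density from street-level pixels can involve, the sketch below computes the share of "green" pixels using the excess-green colour index. This is a simplified stand-in, not the Campus method: the index choice, the 0.1 threshold, the function names and the four-pixel sample are all assumptions made for this example.

```python
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each in 0-255


def excess_green(pixel: Pixel) -> float:
    """Excess-green index 2g - r - b, computed on sum-normalised channels."""
    r, g, b = pixel
    total = r + g + b
    if total == 0:  # pure black carries no colour information
        return 0.0
    return (2 * g - r - b) / total


def vegetation_fraction(pixels: Iterable[Pixel], threshold: float = 0.1) -> float:
    """Share of pixels whose excess-green index exceeds the threshold.

    The threshold is an assumed value for illustration, not a calibrated one.
    """
    pixels = list(pixels)
    if not pixels:
        return 0.0
    green = sum(1 for p in pixels if excess_green(p) > threshold)
    return green / len(pixels)


# Hypothetical four-pixel "image": two leafy greens, one grey road, one blue sky.
sample = [(40, 160, 50), (60, 140, 40), (120, 120, 120), (90, 140, 220)]
print(vegetation_fraction(sample))
```

A colour threshold like this is easily confused by green cars or painted surfaces, which is one reason a learned computer vision model may be preferred in practice.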

DSC-18 Categorising contents of lorries in cross-border goods

2 - New campus product Admin Data Classical ML Commercial Data Economics Efficient Operations External-Gov Improved Evidence Large NLP Python R prj

The Data Science Campus has been exploring how to process unlabelled list data that are collected manually, in an uncontrolled fashion and without the supplementary information needed to aggregate them.

DSC-14 Public transport access to services

3 - Adapt-Adopt Better Statistics DataViz External-Gov Geospatial Medium ONS Open Data R Social prj

An inability to access services can have negative health and economic effects by increasing social isolation and limiting job prospects. The Data Science Campus (DSC) worked with the Welsh Government to produce an R package called propeR, which uses multimodal (private and publ...

DSC-13 Risk factors for loneliness

2 - New campus product Admin Data Better Statistics Classical ML DataViz External-Gov External-Other Geospatial Improved Evidence NLP ONS Open Data Python R Small Social Time Series prj

This project aims to determine the risk factors for loneliness across the UK at a fine geographic level. Loneliness is a perception that is hard to measure directly. Our approach uses health data as an outcome measure of loneliness and treats loneliness as a hidden variable.

DSC-11 Extracting economic signals from internet bandwidth consumption data

1 - Experimental Better Statistics Big Data Economics Medium ONS Open Data Python R Social Time Series prj

This project explores whether economic signals and insights can be extracted from publicly available internet bandwidth consumption data, in the same way that electricity demand and road traffic congestion are related to economic activity.
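
One simple way to check whether a candidate series carries an economic signal is to correlate it with an activity measure at different lags. The sketch below does this with a hand-written Pearson correlation. It is only an illustration under stated assumptions: the function names and both toy series are invented for this example and are not project data or project code.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(signal: Sequence[float], target: Sequence[float], lag: int) -> float:
    """Correlate signal[t] with target[t + lag], for lag >= 0."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(signal, target)
    return pearson(signal[:-lag], target[lag:])


# Hypothetical toy series: activity follows bandwidth one period later.
bandwidth = [10, 12, 15, 14, 18, 21, 20, 24]
activity = [0, 21, 25, 31, 29, 37, 43, 41]

for lag in range(3):
    print(lag, round(lagged_correlation(bandwidth, activity, lag), 3))
```

A high correlation at some lag suggests the series may be a leading indicator, but it does not establish a causal link; shared trends and seasonality would need to be removed first.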

DSC-22 Analysis of Automatic Identification System (AIS) data to understand shipping and ports

Better Statistics Big Data Commercial Data Economics External-Gov Geospatial Medium Python Scala Spark prj

The off-course project explores the operation, use and relationships between ports in the UK at a macro level and the behaviour and operational characteristics of ships at a micro level. Specifically, we explored ship travelling behaviours, traffic at ports and related factors...

1 in Dissemination

Projects being handed over to the stakeholder.

DSC-29 Identifying emerging trends from patent data

2 - New campus product Big Data DataViz Economics Efficient Operations External-Gov Large NLP New Product Open Data Python RAG = GREEN Time Series prj

This project identifies key terminology trends in patents and other technical literature, which may inform business and government decisions regarding new technologies. The analysis includes when and where terminology usage occurs, considered both nationally and internationally.

4 in Delivery

Projects in delivery phase.

DSC-128 SDG 6.6.1. Surface water

2 - New campus product Better Statistics Big Data Computer Vision Deep Learning Environment External-International Geospatial Large ONS Open Data Python RAG = GREEN Time Series prj

The aim of this project is to research and develop techniques for rapid monitoring and assessment of changing extents of freshwater bodies in relation to operationalising SDG indicator 6.6.1: “Change in the extent of water-related ecosystems over time” in different country con...

DSC-107 Payments data for public good

2 - New campus product Big Data Commercial Data DSC-Policy DSC-SO1 DSC-SO2 Economics External-Gov External-Other Improved Evidence Medium Python RAG = GREEN Time Series prj

The Campus and Barclays are working together to develop payments data for public good. Payments data is one of the three most sought-after data sources for economic statistics. The Office for National Statistics (ONS) has seconded staff into Barclays to explore the data, and wh...

DSC-72 Data science for NICE guidance

2 - New campus product Commercial Data Deep Learning Efficient Operations External-Gov Health Large NLP Python RAG = GREEN prj

This project targets the ongoing ‘surveillance’ of guidance recommendations through the following search functionality: given a recommendation, retrieve similar or related recommendations; given a set of keywords, retrieve related recommendations; given a set of keywords, retri...
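
The first two search modes described above (a recommendation as the query, or a set of keywords as the query) can be sketched with a single ranking function. The example below uses TF-IDF weighting with cosine similarity as a simple stand-in; the project's actual models are not described here, and the function names and the three placeholder texts are assumptions made up for illustration, not real guidance.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs: List[str]) -> Tuple[List[Dict[str, float]], Dict[str, float]]:
    """Return one sparse TF-IDF vector per document, plus the IDF table."""
    tokenised = [tokenize(d) for d in docs]
    n = len(docs)
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    # Smoothed inverse document frequency, so no term gets a zero weight.
    idf = {t: math.log((1 + n) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenised]
    return vectors, idf


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, docs: List[str], top_k: int = 2) -> List[str]:
    """Rank docs against a query, which may be keywords or a full recommendation."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms unseen in the corpus get zero weight.
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(query)).items()}
    scored = sorted(((cosine(q, v), i) for i, v in enumerate(vectors)), reverse=True)
    return [docs[i] for score, i in scored[:top_k] if score > 0]


# Placeholder texts invented for this example.
recs = [
    "Offer blood pressure monitoring to adults with hypertension",
    "Measure blood pressure in both arms when diagnosing hypertension",
    "Advise adults to reduce dietary salt intake",
]

print(search("blood pressure", recs))    # keyword query
print(search(recs[2], recs, top_k=1))    # recommendation used as the query
```

Because the query is vectorised the same way as the documents, one function covers both modes; a word-overlap approach like this will miss paraphrases, which is where embedding-based models tend to help.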

DSC-70 Novel approaches to the Living Costs and Food Survey

2 - New campus product Better Statistics Classical ML Computer Vision Deep Learning Economics Efficient Operations External-International Large NLP Open Data Optimisation Python RAG = GREEN Survey Data prj

This project aims to explore the application of computer vision and Natural Language Processing (NLP) techniques to the Office for National Statistics (ONS) Living Costs and Food Survey (LCF). Specifically, we will produce a set of tools for automatically extracting textual da...

1 in Discovery

Projects in discovery phase (note: projects must pass discovery to go to delivery phase).

DSC-89 Georeferencing historical aerial images

2 - New campus product Computer Vision Deep Learning Environment Experimental External-Gov Geospatial Medium Open Data Python RAG = GREEN Resource! prj

This project aims to use data science techniques to automatically georeference historical aerial imagery. We are working with Welsh Government using their extensive catalogue of aerial images for Wales. Successful georeferencing of images would create a time-series of aerial i...