This project uses machine learning to generate synthetic data that can replace real data for processing and, potentially, analysis. This is particularly useful when the real data are sensitive (e.g. microdata, medical records, defence data). The methods developed in the project may also be used for imputation.
- Ioannis Kaloskampis
- Chaitanya Joshi
- David Pugh
- Lanthao Benedikt
- Alex Noyvirt
- Louisa Nolan
In our digital world, data are produced at a rapidly growing rate. Organisations such as government departments, banks and retailers would like to exploit so-called ‘big data’ to build statistical models that support accurate decisions and predict important measures such as inflation and exchange rates. However, the raw data are often sensitive. In this project, we propose methods that generate synthetic data to replace the raw data for processing and analysis.
The project will provide a safer, easier and faster way to share data between ONS and the research community when the real data are sensitive. The project is also linked to several current ONS Data Science projects (e.g. Trade, Housing).
We investigate several state-of-the-art algorithms for generating synthetic data, such as generative adversarial networks (GANs), variational autoencoders (VAEs) and autoregressive models. Because the project involves big data, we are particularly interested in efficient implementations of these algorithms on graphics processing units (GPUs).
- ONS Methodology
- ONS Trade team
- United Nations Global Platform
Please contact email@example.com for more information.
- No updates yet.