Categorising contents of lorries in cross-border goods

The Data Science Campus has been exploring how to process unlabelled list data that is collected manually, in an uncontrolled fashion, and that comes with no supplementary information to support aggregation.

Team members

  • Steven Hopkins
  • Gareth Clews
  • Arturas Eidukas

The need

To enable analysis of datasets acquired from several ferry operators.


The key output is a set of well-structured, hierarchical datasets, processed from the originals, that enable aggregation across categories to support analytical understanding of trade flows. More widely, the project aims to open-source a generalised tool for this kind of problem, which analysts can use to understand similar free-text variables in their own work.
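
As an illustration of what a hierarchical dataset makes possible, the minimal sketch below sums counts at different levels of a category hierarchy. The category names, counts and the `aggregate` helper are all hypothetical, invented for this example; they are not taken from the project's data or code.

```python
from collections import defaultdict

# Hypothetical labelled records (made up for illustration):
# (category path from broad to specific, number of lorries).
records = [
    (("food", "fish", "frozen fish"), 4),
    (("food", "fish", "fresh fish"), 2),
    (("food", "dairy", "cheese"), 3),
    (("construction", "metal", "steel pipes"), 5),
]


def aggregate(records, level):
    """Sum counts over the first `level` steps of each category path."""
    totals = defaultdict(int)
    for path, count in records:
        totals[path[:level]] += count
    return dict(totals)


top_level = aggregate(records, 1)     # totals per broad category
second_level = aggregate(records, 2)  # totals per sub-category
```

Because every record carries its full path, the same data can be rolled up to any level without relabelling.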

Data science

The unsupervised processing of free text using current methods such as word embeddings and clustering algorithms.
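
The general idea of turning free-text entries into vectors and grouping similar ones can be sketched as follows. This is not the project's method: character-trigram count vectors stand in for word embeddings, and a simple greedy threshold rule stands in for a clustering algorithm, so that the example runs with the Python standard library alone. The sample entries, function names and the `threshold` value are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def trigram_vector(text):
    """Represent an entry as counts of character trigrams (a crude stand-in for an embedding)."""
    padded = f"  {text.lower().strip()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(entries, threshold=0.4):
    """Greedy clustering: add each entry to the most similar existing cluster,
    or start a new cluster if no member reaches the similarity threshold."""
    clusters = []  # each cluster is a list of (entry, vector) pairs
    for entry in entries:
        vec = trigram_vector(entry)
        best, best_sim = None, threshold
        for members in clusters:
            sim = max(cosine(vec, other) for _, other in members)
            if sim >= best_sim:
                best, best_sim = members, sim
        if best is None:
            clusters.append([(entry, vec)])
        else:
            best.append((entry, vec))
    return [[entry for entry, _ in members] for members in clusters]


# Hypothetical manually entered descriptions, with inconsistent case and spacing.
entries = ["frozen fish", "Frozen Fish ", "fresh fish", "steel pipes", "steel pipe", "timber"]
groups = cluster(entries)
```

With these sample entries the fish descriptions fall into one group, the steel pipe variants into another, and "timber" is left on its own. In practice, pretrained word embeddings would capture similarity of meaning rather than spelling alone.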


For the processed datasets: DEFRA and, indirectly, the cross-Whitehall group on UK trade. For the generalised tool: the analytical community who use Python for natural language analysis.

Code and outputs

Further information

Please contact for more information.


  • No updates yet.

