The Data Science Campus has been exploring how to process unlabelled list data that is collected manually, in an uncontrolled fashion, and that comes with no supplementary information that would allow the data to be aggregated.
- Steven Hopkins
- Gareth Clews
- Arturas Eidukas
Enabling analysis of datasets acquired from several ferry operators.
The key output is the processing of the datasets into well-structured hierarchical datasets that enable aggregation across categories, giving an analytical understanding of trade flows. More widely, the project aims to open source a generalised tool for this kind of problem, which analysts can use to understand similar free-text variables in their own work.
The unsupervised processing of free text using current methods such as word embeddings and clustering algorithms.
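As a rough illustration of the general idea (not the Optimus implementation), the sketch below groups short free-text items without any labels. It makes two simplifying assumptions: character-trigram counts stand in for learned word embeddings, and a greedy single-pass grouping by cosine similarity stands in for a full clustering algorithm. The function names, the example items and the 0.5 threshold are all hypothetical.

```python
from collections import Counter
from math import sqrt


def trigram_vector(text):
    """Return a sparse character-trigram count vector for one item.

    Assumption: trigram counts are a stand-in for word embeddings.
    """
    padded = f"  {text.lower().strip()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_items(items, threshold=0.5):
    """Greedy single-pass grouping (hypothetical helper).

    Each item joins the first cluster whose seed item is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    """
    clusters = []  # list of (seed_vector, members)
    for item in items:
        vec = trigram_vector(item)
        for seed, members in clusters:
            if cosine(vec, seed) >= threshold:
                members.append(item)
                break
        else:
            clusters.append((vec, [item]))
    return [members for _, members in clusters]


# Made-up free-text entries, for illustration only.
items = ["frozen fish", "car parts", "frzn fish", "car part", "timber"]
clusters = cluster_items(items)
```

With these made-up items, the misspelt "frzn fish" is grouped with "frozen fish", "car part" with "car parts", and "timber" is left on its own. Each resulting group could then be treated as one category for aggregation.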
For the processed datasets: DEFRA and, indirectly, the cross-Whitehall group on UK trade. For the generalised tool: the analytical community who use Python for natural language analysis.
Code and outputs
- Optimus - GitHub repository
- o p t i m u s – turning free-text lists into hierarchical datasets
Please contact firstname.lastname@example.org for more information.
- No updates yet.