optimus

A text processing pipeline for turning unstructured text data into hierarchical datasets.

The Data Science Campus has been exploring how to process unlabelled list data: short text entries collected manually, in an uncontrolled fashion, with no supplementary information that would allow the data to be aggregated. Please note that this project is intended for short descriptions of no more than around 10 words. For longer text descriptions, you may need to fork the repository and optimise some of the metrics.
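This page does not say how Optimus builds its hierarchy. As a rough illustration of what grouping short descriptions into nested levels can look like, here is a minimal sketch that assumes a simple swapped-in technique: word-overlap (Jaccard) similarity with single-linkage grouping, run at a strict threshold and then a looser one. It is not Optimus's method or API. All names (`MAX_WORDS`, `group_indices`, `hierarchy`) and the threshold values are hypothetical.

```python
from itertools import combinations

# Hypothetical limit mirroring the "around 10 words" guidance; not an Optimus setting.
MAX_WORDS = 10


def tokens(text):
    """Lower-cased set of whitespace-separated words."""
    return frozenset(text.lower().split())


def jaccard(a, b):
    """Word overlap between two token sets: 0.0 = disjoint, 1.0 = identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_indices(texts, threshold):
    """Single-linkage grouping via union-find: texts joined by a chain of
    pairs with Jaccard similarity >= threshold share a group."""
    parent = list(range(len(texts)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    toks = [tokens(t) for t in texts]
    for i, j in combinations(range(len(texts)), 2):
        if jaccard(toks[i], toks[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(texts)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())  # ordered by each group's first member


def hierarchy(items, thresholds=(0.5, 0.2)):
    """Return (text, [group id per level]) for each short item.
    Hypothetical thresholds go from strict (fine) to loose (coarse); with
    single linkage, each fine group nests inside exactly one coarse group."""
    short = [t for t in items if len(t.split()) <= MAX_WORDS]
    labels = [[] for _ in short]
    for threshold in thresholds:
        for gid, members in enumerate(group_indices(short, threshold)):
            for i in members:
                labels[i].append(gid)
    return list(zip(short, labels))
```

For example, with the items "fresh apples", "green apples" and "apple juice", the first two land in separate fine groups but share a coarse group (their Jaccard similarity is 1/3), while "apple juice" stays apart because naive whitespace tokenisation treats "apple" and "apples" as different words. That limitation is one reason a practical pipeline would likely need a richer text representation than raw word overlap.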

Make sense of unlabelled list data.

Turn unlabelled list data into categories to better understand clusters.

Improve analytical efficiency.

Unstructured data usually requires a significant amount of manual processing, which can be impractical for large datasets.

Customise to your needs.

Optimus can be customised through a configuration file, giving the user control over the process.

Data Science Campus

Contribute to this project on GitHub