Introduction to NLP in Python
In this week's workshop, we will explore Natural Language Processing (NLP) techniques for coding free-text survey responses, using the Nigeria Labour Force Survey dataset from the National Bureau of Statistics (NBS).
Introduction
Live coding
Review the material that we explored in Week 6's live-coding session.
Lab
Exercise: Clean and Code Labour Force Survey Free Text
You will work with a simulated dataset which contains free-text occupation responses:
- Simulate your dataset.
- Clean the text:
  - Convert to lowercase.
  - Remove punctuation, numbers, and stopwords.
- Tokenize the text and calculate the most frequent words.
- Create a simple occupation keyword dictionary that maps common terms to ISCO codes.
- Write a function that assigns ISCO codes based on keyword matching.
- Apply the function to the dataset and create a new column ISCO_code.
- Export the updated dataset as firstname_labour_force_coded.csv.
- Commit and push your work to GitHub.
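One possible way to approach the steps above is sketched below. It is a minimal illustration, not the workshop's reference solution. The sample responses, the stopword list, the column names `id` and `occupation_text`, the `"unknown"` fallback, and the helper names (`simulate_dataset`, `clean_text`, `assign_isco`) are all assumptions made for this example. The keyword dictionary maps to ISCO-08 major groups (one-digit codes) only, and is illustrative rather than an official coding index. The sketch uses only the Python standard library; a pandas DataFrame would work equally well.

```python
import csv
import random
import re
from collections import Counter

# Hypothetical free-text answers used to simulate the dataset.
SAMPLE_RESPONSES = [
    "I am a Teacher at a primary school",
    "Farmer (maize and cassava)",
    "Taxi driver, 10 years",
    "Trader in the market",
    "Tailor",
    "I work as a farmer",
    "unemployed",
]

# Tiny illustrative stopword list (assumption); NLTK ships a fuller one.
STOPWORDS = {"i", "am", "a", "an", "as", "at", "and", "in", "the", "of", "work"}

# Illustrative keyword -> ISCO-08 major group mapping (assumption, not official).
ISCO_KEYWORDS = {
    "teacher": "2",  # Professionals
    "trader": "5",   # Service and sales workers
    "farmer": "6",   # Skilled agricultural, forestry and fishery workers
    "tailor": "7",   # Craft and related trades workers
    "driver": "8",   # Plant and machine operators and assemblers
}
UNCODED = "unknown"  # label for responses with no matching keyword


def simulate_dataset(n, seed=42):
    """Return n rows, each with an id and a randomly chosen response."""
    rng = random.Random(seed)
    return [
        {"id": i + 1, "occupation_text": rng.choice(SAMPLE_RESPONSES)}
        for i in range(n)
    ]


def clean_text(text):
    """Lowercase, drop punctuation and numbers, split, remove stopwords."""
    text = text.lower()
    # Keeps only ASCII letters and whitespace; accented letters are dropped too.
    text = re.sub(r"[^a-z\s]", " ", text)
    return [tok for tok in text.split() if tok not in STOPWORDS]


def assign_isco(tokens):
    """Return the code of the first token found in the keyword dictionary."""
    for tok in tokens:
        if tok in ISCO_KEYWORDS:
            return ISCO_KEYWORDS[tok]
    return UNCODED


rows = simulate_dataset(50)
word_counts = Counter()
for row in rows:
    tokens = clean_text(row["occupation_text"])
    word_counts.update(tokens)
    row["ISCO_code"] = assign_isco(tokens)  # new column

# Most frequent words across all cleaned responses.
print(word_counts.most_common(5))

# Replace "firstname" with your own first name before exporting.
with open("firstname_labour_force_coded.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "occupation_text", "ISCO_code"])
    writer.writeheader()
    writer.writerows(rows)
```

Exact keyword matching is easy to audit but misses spelling variants and plurals (for example "farmers"), so it is worth inspecting the rows coded as `unknown` and extending the dictionary from the most frequent unmatched words.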
Further reading
- O'Reilly: Bird, S. et al. (2009). Natural Language Processing with Python. O'Reilly Media. https://www.oreilly.com/library/view/natural-language-processing/9780596803346/
- Nigeria National Bureau of Statistics: Labour Force Survey methodology documentation. https://microdata.nigerianstat.gov.ng/index.php/catalog/152
- ISCO Official Classification: International Standard Classification of Occupations. https://ilostat.ilo.org/methods/concepts-and-definitions/classification-occupation/