Introduction to NLP in Python

In this week's workshop, we will explore Natural Language Processing (NLP) techniques for coding free-text survey responses, using the Nigeria Labour Force Survey dataset from the National Bureau of Statistics (NBS).

Introduction

Live coding

Review the material that we explored in Week 6's live-coding session.

Lab

Exercise: Clean and Code Labour Force Survey Free Text


You will work with a simulated dataset of free-text occupation responses:

  1. Simulate your dataset.
  2. Clean the text:
    • Convert to lowercase.
    • Remove punctuation, numbers, and stopwords.
    • Tokenize the text and count the most frequent words.
  3. Create a simple occupation keyword dictionary that maps common terms to ISCO codes.
  4. Write a function that assigns ISCO codes based on keyword matching.
  5. Apply the function to the dataset and create a new column ISCO_code.
  6. Export the updated dataset as firstname_labour_force_coded.csv.
  7. Commit and push your work to GitHub.
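
As a starting point, steps 1 to 6 might be sketched as follows. This is a minimal illustration using only the Python standard library, not a model answer: the response templates, the stopword list, the keyword dictionary, and the function names (simulate_responses, clean_text, assign_isco, export_csv) are all assumptions made for the example. The codes shown are ISCO-08 major groups (one digit) chosen for illustration; a real coding scheme would normally use more detailed codes and a much larger dictionary.

```python
import csv
import random
import re
from collections import Counter

# Assumed, deliberately tiny stopword list; NLTK or spaCy provide fuller ones.
STOPWORDS = {"a", "an", "the", "i", "am", "as", "in", "of", "and", "work"}

# Illustrative keyword -> ISCO-08 major group mapping (not an official lookup).
ISCO_KEYWORDS = {
    "teacher": "2",  # Professionals
    "trader": "5",   # Service and sales workers
    "farmer": "6",   # Skilled agricultural, forestry and fishery workers
    "tailor": "7",   # Craft and related trades workers
    "driver": "8",   # Plant and machine operators and assemblers
}


def simulate_responses(n=20, seed=42):
    """Step 1: build n fake free-text occupation responses."""
    rng = random.Random(seed)
    templates = [
        "I am a Teacher!", "Farmer (10 years)", "taxi DRIVER",
        "Market trader.", "I work as a tailor", "Unemployed",
    ]
    return [{"id": i + 1, "occupation_text": rng.choice(templates)}
            for i in range(n)]


def clean_text(text):
    """Step 2: lowercase, drop punctuation and numbers, remove stopwords."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # keep letters and whitespace only
    return [tok for tok in text.split() if tok not in STOPWORDS]


def assign_isco(tokens, keywords=ISCO_KEYWORDS):
    """Step 4: return the code of the first matching keyword, else None."""
    for tok in tokens:
        if tok in keywords:
            return keywords[tok]
    return None  # response stays uncoded


def export_csv(rows, path):
    """Step 6: write id, raw text, and assigned code to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(
            f, fieldnames=["id", "occupation_text", "ISCO_code"],
            extrasaction="ignore",  # skip the helper "tokens" field
        )
        writer.writeheader()
        writer.writerows(rows)


# Step 5: apply cleaning and coding to every simulated response.
rows = simulate_responses()
for row in rows:
    row["tokens"] = clean_text(row["occupation_text"])
    row["ISCO_code"] = assign_isco(row["tokens"])

# Step 2 (continued): most frequent words across all responses.
word_counts = Counter(tok for row in rows for tok in row["tokens"])
print(word_counts.most_common(5))
```

Calling export_csv(rows, "firstname_labour_force_coded.csv") would then write the coded dataset to disk. Exact-token matching is the simplest possible rule: it misses plurals and misspellings ("teachers", "techer"), so responses left with an empty ISCO_code are worth inspecting before extending the dictionary.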

Further reading