Editing and Imputation in Python
General Information
This course provides a comprehensive introduction to handling missing and inconsistent data in Python, equipping learners with practical techniques for data cleaning, editing and imputation. Learners will explore Python libraries such as pandas and NumPy, along with specialised packages such as imputena and missingno, to pre-process data effectively. This course does not cover the theory behind any of these methods; that theory is covered in Introduction to Editing and Imputation, which is a prerequisite for this course. Nor does this course recommend which method is best for a given editing or imputation task. Instead, it only shows how the different methods can be applied in Python.
Course Materials
The course materials come in several formats:
- HTML pages such as the one you are reading now.
- Data we will use during the course. It’s highly recommended that you create a project with a ‘data’ folder and download all the required datasets before starting the course.
You can also navigate to the course GitHub repository and clone or fork the website structure for yourself. If you are new to programming and version control, we recommend you remain on the website to gain the best experience.
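The recommended project layout with a ‘data’ folder can be set up by hand, or with a short script like the sketch below. The helper name `ensure_data_folder` is hypothetical, and the script assumes it is run from the root of your project; it does not download any datasets.

```python
from pathlib import Path


def ensure_data_folder(project_root="."):
    """Create <project_root>/data if it does not exist and return its path.

    Hypothetical helper for illustration; it is not part of the course code.
    """
    data_dir = Path(project_root) / "data"
    # exist_ok=True makes the call safe to repeat on an existing folder.
    data_dir.mkdir(parents=True, exist_ok=True)
    return data_dir


data_dir = ensure_data_folder()

# List whatever has been downloaded so far (empty on a fresh project).
print(sorted(p.name for p in data_dir.iterdir()))
```

Running the script more than once is harmless, because an existing folder is left untouched.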
Software Requirements
- Python (Version 3.7 or higher)
- Anaconda
- The main packages we will be using for this course are:
- matplotlib==3.3.4
- pandas==1.1.5
- missingno==0.5.0
- imputena==1.0.0
Note: imputena explicitly depends on the deprecated sklearn package and has not updated its dependencies. Before installing it, set the environment variable SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True to bypass the ‘python setup.py egg_info did not run successfully’ error.
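One possible way to apply this setting is to launch pip from Python with the variable added to the child process environment, as in the sketch below. The helper names (`build_install_env`, `build_install_command`, `install_imputena`) are hypothetical, and the sketch assumes pip is available to the current interpreter. It defaults to a dry run that only prints the command, so nothing is installed until you opt in.

```python
import os
import subprocess
import sys

# Pinned version taken from the package list above.
IMPUTENA_SPEC = "imputena==1.0.0"


def build_install_env(base_env=None):
    """Return a copy of the environment with the sklearn override set."""
    env = dict(os.environ if base_env is None else base_env)
    env["SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL"] = "True"
    return env


def build_install_command(spec=IMPUTENA_SPEC):
    """Return the pip command as a list, using the current interpreter."""
    return [sys.executable, "-m", "pip", "install", spec]


def install_imputena(dry_run=True):
    """Print the pip command; run it only when dry_run is False."""
    command = build_install_command()
    print(" ".join(command))
    if not dry_run:
        # The override is passed to pip only, not set for the whole session.
        subprocess.check_call(command, env=build_install_env())


install_imputena(dry_run=True)  # change to False to actually install
```

Passing the variable through `env=` keeps the override scoped to this one pip call. Setting it in your shell or Anaconda prompt before running pip should work equally well.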