Editing and Imputation in R
1 Description - Editing and Imputation in R
This is a short course that covers the practical application of editing and imputation in R. A similar course is available for those who prefer working in Python (Editing and Imputation in Python). This course does not cover the theory behind the methods it uses; that theory is covered in Introduction to Editing and Imputation, which is a prerequisite for this course.
2 Course Materials
The course materials come in several formats:
HTML pages such as the one you are reading now
Datasets used during the course. We highly recommend creating a project with a ‘data’ folder and downloading all the required datasets into it before starting the course.
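A minimal sketch of the recommended folder setup, assuming your R working directory is the project root (for example, an open RStudio project). The course dataset file names are not listed on this page, so the code only creates the folder and lists whatever it currently contains.

```r
# Assumption: the working directory is the project root.
data_dir <- file.path(getwd(), "data")

# Create the 'data' folder; showWarnings = FALSE keeps dir.create() quiet
# if the folder already exists.
dir.create(data_dir, showWarnings = FALSE)

# After downloading the course datasets into this folder, list its contents
# to confirm they are in place (character(0) means it is still empty).
list.files(data_dir)
```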
You can also navigate to the course GitHub repository and clone or fork the website structure for yourself. If you are new to programming and version control, we recommend staying on the website for the best experience.
3 Course Objectives
After completing this course, you should be able to carry out the following in R:
Review and edit data using methods such as data entry, automatic editing and error localization.
Perform model-based imputation, such as mean, ratio and regression imputation.
Perform donor-based imputation, such as random hot deck, sequential hot deck, hierarchical hot deck, k-nearest neighbor and predictive mean matching imputation.
4 Prerequisite Courses
Awareness in Editing and Imputation
Introduction to Editing and Imputation
Introduction to R
R Control Flow, Loops and Functions (if you completed Introduction to R before 18/08/2021, you do not need to take this course, because the two were a single course before that date)
5 Packages
The packages used in this course are:
tidyverse (Version 1.3.0) for data manipulation.
editrules (Version 2.9.3) for editing data.
deducorrect (Version 1.3.7) for editing data.
VIM (Version 6.1.0) for donor-based imputation.
mice (Version 3.13.0) for model-based imputation.
lattice (Version 0.20.41) for creating plots.
This course was written using R version 4.0.2.
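Because results can differ between package versions, it may help to compare your installed versions with the ones listed above before starting. The sketch below is one way to do this with base R only; it does not install anything. If you need an exact older version, the `remotes` package provides `remotes::install_version()`, which is not used here.

```r
# Package versions the course was written with (from the list above).
course_versions <- c(
  tidyverse   = "1.3.0",
  editrules   = "2.9.3",
  deducorrect = "1.3.7",
  VIM         = "6.1.0",
  mice        = "3.13.0",
  lattice     = "0.20.41"
)

# Installed version of each package, or NA if it is not installed.
# system.file(package = ...) returns "" for packages that are not installed,
# so this check does not load any package namespace.
installed_versions <- vapply(
  names(course_versions),
  function(pkg) {
    if (system.file(package = pkg) != "") {
      as.character(utils::packageVersion(pkg))
    } else {
      NA_character_
    }
  },
  character(1)
)

# TRUE if the installed version is at least the course version, NA if missing.
meets_course_version <- vapply(
  names(course_versions),
  function(pkg) {
    inst <- installed_versions[[pkg]]
    if (is.na(inst)) return(NA)
    package_version(inst) >= package_version(course_versions[[pkg]])
  },
  logical(1)
)

version_check <- data.frame(
  package      = names(course_versions),
  course       = unname(course_versions),
  installed    = unname(installed_versions),
  meets_course = unname(meets_course_version),
  stringsAsFactors = FALSE
)
print(version_check)

# The course was written in R 4.0.2; report the running R version alongside it.
cat("R version:", as.character(getRversion()), "(course written in 4.0.2)\n")
```

Rows with `NA` in `installed` are packages you still need to install; `FALSE` in `meets_course` flags an installed version older than the one used in the course.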