Intermediate Statistical Programming

1 Introduction

Landing page for the Intermediate Statistical Programming training pathway.

This pathway is for people who have existing experience in writing and maintaining code and need to upskill to intermediate level, so they can expand existing pipelines and apply good practice principles to keep their code maintainable.

2 Prerequisites

If the following haven’t been completed as part of the Introduction to Statistical Programming Pathway, then please do so before progressing:

  1. Best Practice in Programming
  2. Modular Programming

3 Reproducible Reporting

Reproducibility is an important aspect of analytical projects. There are two methods available to ONS for creating accessible, reproducible documents from R or Python code.

3.1 Rmarkdown

Rmarkdown is a package that can be installed using your installation of RStudio. It is simple to set up and works extremely well with R code. It also supports Python, so it is worth learning regardless of your choice of language.

Complete Reproducible Reporting in Rmarkdown to understand the importance of reproducibility in your work, gain experience of linting code in Python and using parameterised reports.

3.2 Quarto

Quarto is an evolution of Rmarkdown, built on similar syntax and processes, but it is language agnostic because it is driven from the command line. This means it integrates with Python as well as it does with R. The official documentation is the best place to start: https://quarto.org/docs/get-started/

4 Editing and Imputation

Editing and imputation are both methods of data processing. Editing refers to the detection and correction of errors in the data. Imputation refers to estimating values for missing or inconsistent data items. One way in which you can correct for errors in the data is by applying imputation.
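As a rough illustration of how the two steps fit together, the sketch below first edits a list of values by flagging anything outside a plausible range as missing, then imputes every missing value with the mean of the remaining valid values. The function name `edit_and_impute`, the range limits and the sample data are all hypothetical, and mean imputation is only one of many possible methods, chosen here for brevity.

```python
from statistics import mean


def edit_and_impute(values, lower, upper):
    """Edit out-of-range values, then impute gaps with the mean of valid values."""
    # Editing: treat anything outside [lower, upper] as an error and mark it missing.
    edited = [
        v if v is not None and lower <= v <= upper else None
        for v in values
    ]

    # Imputation: estimate each missing value from the values that passed editing.
    valid = [v for v in edited if v is not None]
    if not valid:
        raise ValueError("no valid values available to impute from")
    fill = mean(valid)
    return [fill if v is None else v for v in edited]


# Hypothetical ages: one missing response and one implausible value (250).
ages = [30, None, 40, 250, 50]
cleaned = edit_and_impute(ages, lower=0, upper=120)
print(cleaned)
```

Here the implausible value is corrected by the same imputation step that fills the genuinely missing response, which is the sense in which imputation can be used to correct errors.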

Complete one of either:

5 Version Control using the Command Line

Many projects within ONS use version control software called Git to record changes to files and enable collaboration with colleagues. The command line interface is a powerful tool for working with computers and is essential to getting started with Git.

Complete Command Line Basics followed by Introduction to Git to gain experience working in a version control system both locally and in collaboration.

6 Unit Testing

Unit testing is crucial in guaranteeing the quality of your code and helps to increase efficiency in development. Complete Introduction to Unit Testing to gain experience designing, creating and executing tests for your code in both Python and R.
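To give a flavour of what a unit test looks like, here is a minimal sketch in Python. The function `percentage_change` and both tests are hypothetical examples rather than material from the course. The tests are plain functions using `assert`, a style that test runners such as pytest can discover, but they also run directly as ordinary Python.

```python
def percentage_change(old, new):
    """Return the percentage change from old to new."""
    if old == 0:
        raise ValueError("old must be non-zero")
    return (new - old) / old * 100


def test_percentage_change_increase():
    # A rise from 50 to 75 is a 50% increase.
    assert percentage_change(50, 75) == 50.0


def test_percentage_change_rejects_zero_baseline():
    # A zero baseline has no defined percentage change, so an error is expected.
    try:
        percentage_change(0, 10)
    except ValueError:
        return
    raise AssertionError("expected ValueError for a zero baseline")


if __name__ == "__main__":
    test_percentage_change_increase()
    test_percentage_change_rejects_zero_baseline()
    print("all tests passed")
```

Each test checks one behaviour of one small unit of code, so a failure points directly at what has broken.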

7 Continuous Integration

Complete Continuous Integration to learn more about it.

8 Object Oriented Programming in Python

Important

This course is only relevant to Python.

Object Oriented Programming is a fundamental part of Python. Every Python user, from those learning it for the first time to those experienced in performing complex data analysis and writing software packages, will have used it, whether they know it or not.
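The short sketch below illustrates the point: even a plain integer is an object with a type and methods, and a user-defined class follows the same pattern. The `Survey` class and its data are hypothetical and exist only for illustration.

```python
# Built-in values are already objects: they have a type and carry methods.
count = 3
print(type(count).__name__)  # int
print(count.bit_length())    # 2


class Survey:
    """A hypothetical class bundling data with the behaviour that uses it."""

    def __init__(self, name, responses):
        self.name = name
        self.responses = list(responses)

    def response_count(self):
        """Return how many responses this survey holds."""
        return len(self.responses)


survey = Survey("Hypothetical Survey", [4, 5, 3])
print(survey.response_count())  # 3
```

Calling `count.bit_length()` and `survey.response_count()` uses the same mechanism, a method looked up on an object, which is why Python users rely on objects long before they write a class of their own.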

Complete Object Oriented Programming in Python to learn about Python objects and classes in more detail.

9 Packaging and Documentation

Complete Packaging and Documentation to learn more about it!