Statistics in Python


Welcome to the Statistics in Python course

General Information

This course introduces the basics of carrying out a statistical analysis in Python. It covers exploratory data analysis and the construction and interpretation of linear and generalized linear models. Each chapter builds on the previous one, introducing progressively more advanced topics while providing practical, hands-on experience with the relevant Python packages.

Course Materials

The course materials come in several formats:

  • HTML pages such as the one you are reading now

  • Data we will use during the course. We highly recommend that you create a project with a ‘data’ folder and download all the required datasets before starting the course.
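One possible way to set up the ‘data’ folder from Python is sketched below. The helper name `ensure_data_dir` and the choice of the current working directory as the project root are assumptions made for illustration; the course itself does not prescribe how the folder is created.

```python
from pathlib import Path


def ensure_data_dir(project_root="."):
    """Create a 'data' folder under project_root if it does not already exist.

    `ensure_data_dir` is a hypothetical helper, not part of the course materials.
    """
    data_dir = Path(project_root) / "data"
    # exist_ok=True makes the call safe to repeat; parents=True creates
    # project_root as well if it is missing.
    data_dir.mkdir(parents=True, exist_ok=True)
    return data_dir


if __name__ == "__main__":
    # Assumes the script is run from the project's top-level folder.
    print(ensure_data_dir().resolve())
```

Downloaded datasets can then be saved into the returned folder and read with relative paths such as `data/<filename>`.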

You can also navigate to the course GitHub repository and clone or fork the website structure for yourself. If you are new to programming and version control, we recommend that you remain on the website for the best experience.

Software Requirements

  • Python (Version 3.7 or higher)
  • Anaconda
  • The main packages we will be using for this course are:
    - matplotlib==3.3.4
    - pandas==1.1.5
    - scikit-learn==0.24.2
    - seaborn==0.11.1
    - statsmodels==0.12.2
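
To see whether your environment matches the versions listed above, a check along the following lines may help. This is a minimal sketch, not part of the course materials: the function name `check_versions` is hypothetical, and it relies on `importlib.metadata`, which is in the standard library only from Python 3.8 onwards (on Python 3.7 you would need the `importlib-metadata` backport instead).

```python
from importlib import metadata  # standard library from Python 3.8 onwards

# Versions copied from the Software Requirements list.
EXPECTED = {
    "matplotlib": "3.3.4",
    "pandas": "1.1.5",
    "scikit-learn": "0.24.2",
    "seaborn": "0.11.1",
    "statsmodels": "0.12.2",
}


def check_versions(expected=EXPECTED):
    """Return {package: (expected_version, installed_version_or_None)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in check_versions().items():
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: expected {wanted}, found {installed} ({status})")
```

A different installed version does not necessarily mean the course code will fail, but the output may differ from what is shown in the materials.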

Course Overview

Chapter 1 - Exploratory Data Analysis

In this chapter, we will cover the principles of tidy data and the concepts of variables, values, and observations. Learners will use Python to inspect the structure of datasets and to distinguish continuous from categorical variables. We’ll examine variation and covariation in data and the roles they play in Exploratory Data Analysis (EDA), using visualizations to uncover patterns and relationships between variables.
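
As a rough preview of the kind of checks this chapter describes, the sketch below uses pandas on a small made-up dataset. The column names and values are invented for illustration and are not one of the course datasets.

```python
import pandas as pd

# Hypothetical tidy dataset: one row per observation, one column per variable.
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "weight_kg": [50.0, 58.0, 69.0, 77.0, 90.0],
    "group": ["a", "b", "a", "b", "a"],
})

# Structure of the dataset: the type of each variable.
print(df.dtypes)

# Separate continuous (numeric) from categorical (non-numeric) variables.
continuous = df.select_dtypes(include="number").columns.tolist()
categorical = df.select_dtypes(exclude="number").columns.tolist()

# Variation: the spread of each continuous variable,
# and the counts of each level of the categorical variable.
print(df[continuous].describe())
print(df["group"].value_counts())

# Covariation: pairwise correlation between the continuous variables.
print(df[continuous].corr())
```

In the chapter itself, plots (for example with seaborn or matplotlib) complement these numeric summaries.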

Chapter 2 - Model Basics

This chapter introduces the foundations of statistical modeling. Learners will explore model families, fitted models, and the differences between response and explanatory variables. The process of constructing linear models in Python will be covered, with explanations of key components like slopes and intercepts. Learners will practice extracting parameters from model objects and interpreting tables generated by fitted models. Techniques for assessing model fit, including the use of residuals, Adjusted R-squared, and AIC for model comparison, will also be discussed.
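
The workflow described above might look roughly like the following with the statsmodels formula interface. The data are simulated, with an assumed true intercept of 2.0 and slope of 0.5, purely so that the example is self-contained; the course uses its own datasets.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data (an assumption for illustration): y = 2.0 + 0.5 * x + noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(scale=0.5, size=x.size)
df = pd.DataFrame({"x": x, "y": y})

# Response variable on the left of "~", explanatory variable on the right.
model = smf.ols("y ~ x", data=df).fit()

# Extract parameters from the fitted model object.
print(model.params["Intercept"], model.params["x"])

# Assess fit: residuals, adjusted R-squared, and AIC.
residuals = model.resid
print(model.rsquared_adj)
print(model.aic)

# Compare against an intercept-only model; a lower AIC is preferred.
null_model = smf.ols("y ~ 1", data=df).fit()
print(model.aic < null_model.aic)

# Full summary table for the fitted model.
print(model.summary())
```

The formula interface is used here because it keeps the response and explanatory variables visible in one string; `statsmodels.api.OLS` with explicit design matrices is an alternative.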

Chapter 3 - Generalized Linear Models

Building on the basics of linear modeling, this chapter introduces generalized linear models (GLMs). Learners will explore fundamental probability concepts, including random variables and probability distributions, with a focus on their application to real-world data. Common probability distributions, such as binomial, normal, Poisson, and negative binomial, will be discussed. Finally, we will implement generalized linear models in Python, demonstrating their flexibility in handling various types of response variables.