Statistics in Python
Welcome to the Statistics in Python course
General Information
This course introduces the basics of carrying out a statistical analysis in Python. It covers exploratory data analysis and constructing and interpreting linear and generalized linear models. Each chapter builds on the previous one, introducing progressively more advanced topics while providing hands-on practice with the relevant Python packages.
Course Materials
The course materials come in several formats:
- HTML pages such as the one you are reading now.
- Data we will use during the course. We highly recommend creating a project with a ‘data’ folder and downloading all the required datasets before starting the course.
You can also navigate to the course GitHub repository and clone or fork the website structure for yourself. If you are new to programming and version control, we recommend staying on the website for the best experience.
Software Requirements
- Python (Version 3.7 or higher)
- Anaconda
- The main packages we will be using for this course are:
  - matplotlib==3.3.4
  - pandas==1.1.5
  - scikit-learn==0.24.2
  - seaborn==0.11.1
  - statsmodels==0.12.2
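To check which of these packages are already installed, a short script along the following lines may help. It is only a sketch and not part of the course materials: the helper name `installed_versions` is made up for illustration, and it relies on the standard-library `importlib.metadata` module, which assumes Python 3.8 or newer (one minor version above the stated minimum).

```python
from importlib import metadata  # standard library, Python 3.8+

# Pinned versions copied from the requirements list above.
PINNED = {
    "matplotlib": "3.3.4",
    "pandas": "1.1.5",
    "scikit-learn": "0.24.2",
    "seaborn": "0.11.1",
    "statsmodels": "0.12.2",
}


def installed_versions(packages):
    """Return {package: installed version, or None if not installed}.

    Hypothetical helper for illustration only.
    """
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


# Report each package next to its pinned version.
for name, version in installed_versions(PINNED).items():
    status = "ok" if version == PINNED[name] else "differs or missing"
    print(f"{name}: pinned {PINNED[name]}, installed {version} ({status})")
```

A different installed version is not necessarily a problem, but results and output formatting may then differ slightly from what the course pages show.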
Course Overview
Chapter 1 - Exploratory Data Analysis
In this chapter, we will cover the principles of tidy data and the concepts of variables, values, and observations. Learners will use Python to analyze the structure of datasets and differentiate between continuous and categorical variables. We’ll examine the significance of variation and covariation in data and their roles within Exploratory Data Analysis (EDA). We will use plots to uncover patterns and relationships in data, building our understanding of how variables interact.
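As a rough preview of what this looks like in practice, the sketch below builds a tiny tidy table with pandas and inspects its structure, variation, and covariation. The data and the names `animals`, `flipper_mm`, and `mass_g` are invented for illustration and are not one of the course datasets.

```python
import pandas as pd

# Made-up tidy data: one row per observation, one column per variable.
animals = pd.DataFrame({
    "species": ["A", "A", "B", "B", "B"],
    "flipper_mm": [180.0, 190.0, 210.0, 215.0, 220.0],
    "mass_g": [3500.0, 3700.0, 4800.0, 5000.0, 5200.0],
})
animals["species"] = animals["species"].astype("category")

# Structure: which variables are continuous and which are categorical?
print(animals.dtypes)

# Variation: spread of a continuous variable, counts of a categorical one.
print(animals["mass_g"].describe())
species_counts = animals["species"].value_counts()

# Covariation: how two variables move together.
correlation = animals["flipper_mm"].corr(animals["mass_g"])
mean_mass = animals.groupby("species", observed=True)["mass_g"].mean()
print(species_counts, correlation, mean_mass, sep="\n")
```

In the chapter itself, summaries like these are paired with plots, which often reveal patterns that a single number hides.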
Chapter 2 - Model Basics
This chapter introduces the foundations of statistical modeling. Learners will explore model families, fitted models, and the differences between response and explanatory variables. The process of constructing linear models in Python will be covered, with explanations of key components like slopes and intercepts. Learners will practice extracting parameters from model objects and interpreting the summary tables produced by fitted models. Techniques for assessing model fit, including residuals, adjusted R-squared, and AIC for model comparison, will also be discussed.
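The following sketch gives a flavor of that workflow using the statsmodels formula interface. It is an illustration on simulated data, not the course's own example: the true intercept (2) and slope (3), the column names `x` and `y`, and the random seed are all assumptions chosen so the fitted values can be compared with known ones.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data with a known intercept (2) and slope (3).
rng = np.random.default_rng(seed=0)
x = rng.uniform(0, 10, size=100)
data = pd.DataFrame({"x": x, "y": 2 + 3 * x + rng.normal(0, 1, size=100)})

# Fit a linear model: y is the response, x the explanatory variable.
fit = smf.ols("y ~ x", data=data).fit()

# Extract parameters and residuals from the fitted model object.
intercept = fit.params["Intercept"]
slope = fit.params["x"]
residuals = fit.resid

# The summary table reports estimates, standard errors, and fit statistics.
print(fit.summary())

# Compare against an intercept-only model: lower AIC indicates a better
# trade-off between fit and complexity.
null_fit = smf.ols("y ~ 1", data=data).fit()
print("adjusted R-squared:", fit.rsquared_adj)
print("AIC with x:", fit.aic, "AIC intercept-only:", null_fit.aic)
```

Because the data were simulated, the estimated slope and intercept should land close to 3 and 2; with real data there is no such ground truth, which is why residuals and fit statistics matter.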
Chapter 3 - Generalized Linear Models
Building on the basics of linear modeling, this chapter introduces generalized linear models (GLMs). Learners will explore fundamental probability concepts, including random variables and probability distributions, with a focus on their application to real-world data. Common probability distributions, such as binomial, normal, Poisson, and negative binomial, will be discussed. Finally, we will implement generalized linear models in Python, demonstrating their flexibility in handling various types of response variables.