Statistics in Python
Welcome to the Statistics in Python course
General Information
This course introduces the basics of carrying out a statistical analysis in Python. It covers exploratory data analysis and constructing and interpreting linear and generalized linear models. Each chapter builds on the previous one, introducing progressively more advanced topics and giving hands-on practice with the relevant Python packages.
Software Requirements
- Python (Version 3.7 or higher)
- Anaconda
- The main packages we will be using for this course are:
  - matplotlib==3.3.4
  - pandas==1.1.5
  - scikit-learn==0.24.2
  - seaborn==0.11.1
  - statsmodels==0.12.2
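One way to confirm that an environment matches the versions listed above is a short check like the following. This is only a sketch: the helper name `check_versions` is made up for illustration, and it relies on `importlib.metadata`, which is part of the standard library from Python 3.8 onward (on Python 3.7 the `importlib-metadata` backport would be needed instead).

```python
from importlib import metadata  # standard library from Python 3.8 onward

# Versions copied from the package list above.
PINNED = {
    "matplotlib": "3.3.4",
    "pandas": "1.1.5",
    "scikit-learn": "0.24.2",
    "seaborn": "0.11.1",
    "statsmodels": "0.12.2",
}


def check_versions(pinned=PINNED):
    """Hypothetical helper: map each package to (pinned, installed, matches)."""
    report = {}
    for name, wanted in pinned.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed in this environment
        report[name] = (wanted, found, found == wanted)
    return report


for name, (wanted, found, ok) in check_versions().items():
    status = "ok" if ok else "differs"
    print(f"{name}: pinned {wanted}, installed {found or 'not installed'} ({status})")
```

A newer installed version does not necessarily break the course material; the check only reports where the environment differs from the pinned list.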
Course Overview
Chapter 1 - Exploratory Data Analysis
In this chapter, we examine the principles of tidy data and the concepts of variables, values, and observations. Learners will use Python to inspect the structure of datasets and distinguish continuous from categorical variables. We then look at variation (how the values of a single variable differ across observations) and covariation (how two or more variables change together) and their roles in Exploratory Data Analysis (EDA). Plots will be used to uncover patterns and relationships between variables.
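As a rough illustration of these ideas, the sketch below inspects a tiny table with pandas. The data and column names (`species`, `length_mm`, `mass_g`) are invented for this example and are not taken from the course datasets.

```python
import pandas as pd

# Hypothetical toy data in tidy form:
# each row is one observation, each column is one variable.
df = pd.DataFrame({
    "species": ["a", "a", "b", "b", "c", "c"],
    "length_mm": [10.1, 11.3, 14.8, 15.2, 20.5, 19.9],
    "mass_g": [2.0, 2.4, 3.9, 4.1, 6.8, 6.5],
})
df["species"] = df["species"].astype("category")

# Structure: number of observations and variables, and each variable's type.
print(df.shape)
print(df.dtypes)

# Split the variables into continuous and categorical by dtype.
continuous = df.select_dtypes(include="number").columns.tolist()
categorical = df.select_dtypes(include="category").columns.tolist()

# Variation: how the values of a single variable are spread out.
print(df[continuous].describe())
print(df["species"].value_counts())

# Covariation: how two variables change together.
corr = df["length_mm"].corr(df["mass_g"])  # continuous vs continuous
group_means = df.groupby("species", observed=True)["mass_g"].mean()  # categorical vs continuous
print(corr)
print(group_means)
```

The same questions can be asked visually, for example with a histogram for variation and a scatter plot for covariation, using matplotlib or seaborn.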
Chapter 2 - Model Basics
This chapter introduces the foundations of statistical modeling. Learners will explore model families, fitted models, and the differences between response and explanatory variables. The process of constructing linear models in Python will be covered, with explanations of key components like slopes and intercepts. Learners will practice extracting parameters from model objects and interpreting tables generated by fitted models. Techniques for assessing model fit, including the use of residuals, Adjusted R-squared, and AIC for model comparison, will also be discussed.
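The workflow described above might look like the following with the statsmodels formula interface. This is a minimal sketch on simulated data: the variable names `x` and `y`, the true intercept (2.0) and slope (0.5), and the noise level are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data (assumption): y depends linearly on x, plus normal noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(scale=0.5, size=x.size)
df = pd.DataFrame({"x": x, "y": y})

# Fit a linear model: y is the response, x the explanatory variable.
fit = smf.ols("y ~ x", data=df).fit()

# Extract parameters from the fitted model object.
intercept = fit.params["Intercept"]
slope = fit.params["x"]
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")

# The coefficient table: estimates, standard errors, t statistics, p-values.
print(fit.summary().tables[1])

# Assess fit: residuals, adjusted R-squared, and AIC.
residuals = fit.resid
print(f"adjusted R-squared = {fit.rsquared_adj:.3f}, AIC = {fit.aic:.1f}")

# Compare against an intercept-only model; a lower AIC is preferred.
null_fit = smf.ols("y ~ 1", data=df).fit()
print(f"AIC with x: {fit.aic:.1f}, AIC without x: {null_fit.aic:.1f}")
```

Because the data were simulated with a real slope, the model that includes `x` should have the lower AIC here; on real data the comparison is what tells you whether the extra term is worth keeping.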
Chapter 3 - Generalized Linear Models
Building on the basics of linear modeling, this chapter introduces generalized linear models (GLMs). Learners will explore fundamental probability concepts, including random variables and probability distributions, with a focus on their application to real-world data. Common probability distributions, such as binomial, normal, Poisson, and negative binomial, will be discussed. Finally, we will implement generalized linear models in Python, demonstrating their flexibility in handling various types of response variables.