# Loading in packages that we need for this chapter.
library(tidyverse) # For data manipulation
library(BSDA) # For Z-test
Chapter 3 - Statistical Tests
Learning Objectives
The goal of this session is to learn about:
- The structure of the statistical test
- Hypothesis testing
- The critical value approach
- The p-value approach
- Z-tests
- T-tests
- Chi-squared tests
1 Exercise
- Load the “tidyverse” and “BSDA” packages.
- Import energy_data and airquality_data into your script and assign them to the variable names energy_data and airquality_data respectively.
# Importing the energy_data (cleaned fake_energy_data)
energy_data <- readr::read_csv("../Data/energy_data.csv")
# Importing cleaned airquality_data
airquality_data <- readr::read_csv("../Data/airquality_data.csv")
2 Structure of the statistical test
Inferential statistics is about using sample data to make decisions about a population. One of the most commonly used methods for making such decisions is the hypothesis test. In this section, we cover several statistical tests that support inference from data.
There are seven steps to follow when you are using any statistical test:
- State the null hypothesis \(H_{0}\) and alternative hypothesis \(H_{a}\)
- Decide the significance level, \(\alpha\)
- Compute the observed test statistic
- Determine the critical value
- Determine the p-value
- Determine the confidence interval
- Interpret your results
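The seven steps can be sketched in base R as follows. This is only an illustration, assuming a one-sample, two-sided Z-test with a known population standard deviation; the simulated sample `x`, the hypothesised mean `mu0`, and `sigma` are made-up values and do not come from the chapter's datasets.

```r
# Hypothetical data: 40 simulated measurements (not taken from energy_data).
set.seed(1)
x     <- rnorm(40, mean = 52, sd = 10)
mu0   <- 50    # Step 1: H0: mu = 50 vs Ha: mu != 50 (assumed values)
sigma <- 10    # population sd, assumed known for a Z-test
alpha <- 0.05  # Step 2: significance level

# Step 3: observed test statistic
z_obs <- (mean(x) - mu0) / (sigma / sqrt(length(x)))

# Step 4: critical value for a two-sided test
z_crit <- qnorm(1 - alpha / 2)

# Step 5: two-sided p-value
p_value <- 2 * pnorm(-abs(z_obs))

# Step 6: (1 - alpha) confidence interval for the mean
ci <- mean(x) + c(-1, 1) * z_crit * sigma / sqrt(length(x))

# Step 7: interpret; both approaches lead to the same decision
reject_by_critical_value <- abs(z_obs) > z_crit
reject_by_p_value        <- p_value < alpha
```

With the BSDA package loaded, `z.test(x, mu = mu0, sigma.x = sigma)` should report the same test statistic, p-value, and confidence interval in a single call.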
3 Hypothesis testing
Any hypothesis test starts with the formulation of the null hypothesis, which typically states that there is no statistical difference between groups, and the alternative hypothesis, which states that there is a difference. Suppose we have two different medicines: the first medicine is in group A and the other is in group B. We denote the mean of group A as \(\mu_{1}\) and the mean of group B as \(\mu_{2}\).
To find out if the mean of group A is different from the mean of group B we construct a two-sided test as shown below:
\[H_{0}: \mu_{1}= \mu_{2} ~~~~~ (vs) ~~~~~ H_{a}:\mu_{1} \neq \mu_{2}\]
To find out if the mean of group A is greater than the mean of group B we construct a right-tailed test (or one-sided test) as shown below:
\[H_{0}: \mu_{1}= \mu_{2} ~~~~~ (vs) ~~~~~ H_{a}:\mu_{1} > \mu_{2}\]
To find out if the mean of group A is less than the mean of group B we construct a left-tailed test (or one-sided test) as shown below:
\[H_{0}: \mu_{1}= \mu_{2} ~~~~~ (vs) ~~~~~ H_{a}:\mu_{1} < \mu_{2}\]
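In R, the direction of the alternative hypothesis is usually chosen through the `alternative` argument of the test function. The sketch below uses base R's `t.test()` on two simulated groups; the vectors `group_a` and `group_b` are hypothetical and serve only to show how the three forms of test are requested.

```r
# Hypothetical measurements for the two medicine groups.
set.seed(42)
group_a <- rnorm(30, mean = 10.5, sd = 2)
group_b <- rnorm(30, mean = 10.0, sd = 2)

# Ha: mu1 != mu2
two_sided    <- t.test(group_a, group_b, alternative = "two.sided")
# Ha: mu1 > mu2
right_tailed <- t.test(group_a, group_b, alternative = "greater")
# Ha: mu1 < mu2
left_tailed  <- t.test(group_a, group_b, alternative = "less")

# The test statistic is the same in all three; only the p-value changes.
c(two_sided = two_sided$p.value,
  right     = right_tailed$p.value,
  left      = left_tailed$p.value)
```

The two one-sided p-values sum to 1, and the two-sided p-value is twice the smaller of them, which is why the choice of tail should be made before looking at the data.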
After formulating your hypotheses, you need to decide the significance level yourself (typically set to 5% or 1%). The significance level, denoted by \(\alpha\), is the probability of rejecting the null hypothesis when it is true. There are two approaches to deciding whether to reject the null hypothesis:
- The critical value approach
- The p-value approach
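The meaning of \(\alpha\) as the probability of rejecting a true null hypothesis can be checked with a small simulation. The sketch below assumes normally distributed data and uses base R's `t.test()`; the group size and the number of repetitions are arbitrary choices made for the example.

```r
set.seed(123)
alpha  <- 0.05
n_sims <- 2000

# Draw both groups from the same distribution, so H0 is true every time.
p_values <- replicate(n_sims, {
  a <- rnorm(25)
  b <- rnorm(25)
  t.test(a, b)$p.value
})

# Share of simulations in which the true H0 is rejected.
type1_rate <- mean(p_values < alpha)
type1_rate
```

The resulting rate should be close to `alpha`: about 5% of the tests reject the null hypothesis even though it holds in every simulated dataset.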
3.1 Quiz
When do we construct a two-sided test?
- To find out if the mean of group A is greater than the mean of group B
- To find out if the mean of group A is different from the mean of group B
- To find out if the mean of group A is less than the mean of group B
Which statements show a one-sided test?
- \(H_{0}: \mu_{1}= \mu_{2} ~~~~~ (vs) ~~~~~ H_{a}:\mu_{1} < \mu_{2}\)
- \(H_{0}: \mu_{1}= \mu_{2} ~~~~~ (vs) ~~~~~ H_{a}:\mu_{1} \neq \mu_{2}\)
- \(H_{0}: \mu_{1}= \mu_{2} ~~~~~ (vs) ~~~~~ H_{a}:\mu_{1} > \mu_{2}\)
What approaches do we take when deciding on rejecting the null hypothesis?
- Two-sided approach
- Critical value approach
- P-value approach
When do we construct a two-sided test?
- To find out if the mean of group A is different from the mean of group B
Which statements show a one-sided test?
- \(H_{0}: \mu_{1}= \mu_{2} ~~~~~ (vs) ~~~~~ H_{a}:\mu_{1} < \mu_{2}\)
- \(H_{0}: \mu_{1}= \mu_{2} ~~~~~ (vs) ~~~~~ H_{a}:\mu_{1} > \mu_{2}\)
What approaches do we take when deciding on rejecting the null hypothesis?
- Critical value approach
- P-value approach
3.2 Critical Value approach
The critical value approach determines whether the observed test statistic (calculated from the sample data) is more extreme than a defined critical value. The following three figures show a right-tailed test, a left-tailed test, and a two-sided test. The figures represent an idealised model (not every test statistic follows a normal probability curve, but this case is useful for explaining the theory).
For a right-tailed test, the null hypothesis is rejected when the observed test statistic is greater than the critical value.
\[H_{0}: \mu_{1}= \mu_{2} ~~~~~ (vs) ~~~~~ H_{a}:\mu_{1} > \mu_{2}\]
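For a right-tailed test, the critical value is the point that leaves probability \(\alpha\) in the upper tail. A minimal sketch, assuming the test statistic follows a standard normal distribution under the null hypothesis; the observed value `z_obs` is hypothetical.

```r
alpha  <- 0.05
z_crit <- qnorm(1 - alpha)  # upper-tail critical value, about 1.645

z_obs  <- 2.1               # hypothetical observed test statistic

# Reject H0 when the observed statistic exceeds the critical value.
reject_h0 <- z_obs > z_crit
reject_h0
```

Because 2.1 is greater than 1.645, the null hypothesis would be rejected at the 5% level in this example.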