Chapter 3 - Statistical Tests

Author

Government Analysis Function and ONS Data Science Campus

Learning Objectives

The goal of this session is to learn about

  • The structure of the statistical test

  • Hypothesis testing

  • The critical value approach

  • The p-value approach

  • Z-tests

  • T-tests

  • Chi-squared tests

1 Exercise

  1. Load in the “tidyverse” and “BSDA” packages.

  2. Import energy_data and airquality_data into your script and assign them to the variable names energy_data and airquality_data respectively.

  1. Load in the “tidyverse” and “BSDA” packages.
# Loading in packages that we need for this chapter.

library(tidyverse) # For data manipulation
library(BSDA)    # For Z-test
  2. Import energy_data and airquality_data into your script and assign them to the variable names energy_data and airquality_data respectively.
# Importing the energy_data (cleaned fake_energy_data)

energy_data <- readr::read_csv("../Data/energy_data.csv")

# Importing cleaned airquality_data

airquality_data <- readr::read_csv("../Data/airquality_data.csv")

2 Structure of the statistical test

Inferential statistics is about drawing conclusions about a population from a sample of observations or measurements. One of the most commonly used methods for reaching such conclusions is the hypothesis test. In this section, we cover several statistical tests that support inference from data.

There are seven steps to follow when you are using any statistical test:

  1. State the null hypothesis \(H_{0}\) and alternative hypothesis \(H_{a}\)
  2. Decide the significance level, \(\alpha\)
  3. Compute the observed test statistic
  4. Determine the critical value
  5. Determine the p-value
  6. Determine the confidence interval
  7. Interpret your results
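The seven steps above can be sketched by hand for a one-sample, two-sided z-test in base R. This is only an illustration under stated assumptions: the sample values, the hypothesised mean mu_0 and the "known" population standard deviation sigma are all made up for the example and do not come from the course data.

```r
# Illustrative sample (made-up values, not from the course data)
x <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7, 5.2, 5.9, 5.5)

# Step 1: H0: mu = mu_0  vs  Ha: mu != mu_0 (two-sided)
mu_0  <- 5      # hypothesised mean (assumption for this sketch)
sigma <- 0.5    # population standard deviation, assumed known

# Step 2: significance level
alpha <- 0.05

# Step 3: observed test statistic
n         <- length(x)
std_error <- sigma / sqrt(n)
z_obs     <- (mean(x) - mu_0) / std_error

# Step 4: critical value for a two-sided test
z_crit <- qnorm(1 - alpha / 2)

# Step 5: two-sided p-value
p_value <- 2 * pnorm(-abs(z_obs))

# Step 6: (1 - alpha) confidence interval for the mean
conf_int <- mean(x) + c(-1, 1) * z_crit * std_error

# Step 7: interpret - reject H0 if the statistic is more extreme
# than the critical value (equivalently, if p_value < alpha)
reject_h0 <- abs(z_obs) > z_crit

list(z_obs = z_obs, z_crit = z_crit, p_value = p_value,
     conf_int = conf_int, reject_h0 = reject_h0)
```

In practice a packaged function would usually replace the manual arithmetic; writing the steps out simply makes each one visible.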

3 Hypothesis testing

Any hypothesis test starts with the formulation of two hypotheses: the null hypothesis, which typically states that there is no statistical difference between groups, and the alternative hypothesis, which states that there is a difference. Suppose we have two different medicines: the first is given to group A and the other to group B. We denote the mean of group A as \(\mu_{1}\) and the mean of group B as \(\mu_{2}\).

To find out if the mean of group A is different from the mean of group B we construct a two-sided test as shown below:

\[H_{0}: \mu_{1}= \mu_{2} ~~~~~ (vs) ~~~~~ H_{a}:\mu_{1} \neq \mu_{2}\]

To find out if the mean of group A is greater than the mean of group B we construct a right-tailed test (or one-sided test) as shown below:

\[H_{0}: \mu_{1}= \mu_{2} ~~~~~ (vs) ~~~~~ H_{a}:\mu_{1} > \mu_{2}\]

To find out if the mean of group A is less than the mean of group B we construct a left-tailed test (or one-sided test) as shown below:

\[H_{0}: \mu_{1}= \mu_{2} ~~~~~ (vs) ~~~~~ H_{a}:\mu_{1} < \mu_{2}\]
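As a rough illustration, the three forms of the alternative hypothesis correspond to the `alternative` argument of base R's `t.test()`. The two groups below are simulated purely for the example (the means, standard deviation and sample sizes are assumptions), and `t.test()` is used with its default Welch settings.

```r
set.seed(42)  # make the simulated example reproducible

# Simulated measurements for the two medicine groups (illustrative only)
group_a <- rnorm(30, mean = 52, sd = 5)
group_b <- rnorm(30, mean = 50, sd = 5)

# Two-sided test: is the mean of group A different from that of group B?
two_sided <- t.test(group_a, group_b, alternative = "two.sided")

# Right-tailed test: is the mean of group A greater than that of group B?
right_tailed <- t.test(group_a, group_b, alternative = "greater")

# Left-tailed test: is the mean of group A less than that of group B?
left_tailed <- t.test(group_a, group_b, alternative = "less")

# Compare the p-values from the three alternatives
c(two_sided    = two_sided$p.value,
  right_tailed = right_tailed$p.value,
  left_tailed  = left_tailed$p.value)
```

The test statistic is the same in all three calls; only the tail area used for the p-value changes, which is why the choice of alternative should be made before looking at the data.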

After formulating your hypotheses, you need to decide the significance level yourself (typically set to 5% or 1%). The significance level is the probability of rejecting the null hypothesis when it is true, and it is denoted by \(\alpha\). There are two approaches to deciding whether to reject the null hypothesis. These are:

  • The critical value approach
  • The p-value approach
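The definition of the significance level as the probability of rejecting a true null hypothesis can be checked with a small simulation. In the sketch below both samples are drawn from the same normal distribution, so the null hypothesis is true by construction; the sample sizes, distribution parameters and number of repetitions are arbitrary choices made for the illustration.

```r
set.seed(123)  # make the simulation reproducible

alpha <- 0.05  # chosen significance level
n_sim <- 5000  # number of simulated experiments (arbitrary)

# Each experiment draws two samples with the SAME true mean,
# so any rejection of H0 is a false rejection.
p_values <- replicate(n_sim, {
  sample_a <- rnorm(25, mean = 10, sd = 2)
  sample_b <- rnorm(25, mean = 10, sd = 2)
  t.test(sample_a, sample_b)$p.value
})

# Proportion of experiments in which H0 was (wrongly) rejected;
# this should be close to alpha.
false_rejection_rate <- mean(p_values < alpha)
false_rejection_rate
```

Lowering alpha to 0.01 in the sketch should lower the false rejection rate accordingly, at the cost of making real differences harder to detect.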

3.1 Quiz

When do we construct a two-sided test?

  1. To find out if the mean of group A is greater than the mean of group B
  2. To find out if the mean of group A is different from the mean of group B
  3. To find out if the mean of group A is less than the mean of group B

Which statements show a one-sided test?

  1. \(H_{0}: \mu_{1}= \mu_{2} ~~~~~ (vs) ~~~~~ H_{a}:\mu_{1} < \mu_{2}\)
  2. \(H_{0}: \mu_{1}= \mu_{2} ~~~~~ (vs) ~~~~~ H_{a}:\mu_{1} \neq \mu_{2}\)
  3. \(H_{0}: \mu_{1}= \mu_{2} ~~~~~ (vs) ~~~~~ H_{a}:\mu_{1} > \mu_{2}\)

What approaches do we take when deciding on rejecting the null hypothesis?

  1. Two-sided approach
  2. Critical value approach
  3. P-value approach

When do we construct a two-sided test?

  1. To find out if the mean of group A is different from the mean of group B

Which statements show a one-sided test?

  1. \(H_{0}: \mu_{1}= \mu_{2} ~~~~~ (vs) ~~~~~ H_{a}:\mu_{1} < \mu_{2}\)

  2. \(H_{0}: \mu_{1}= \mu_{2} ~~~~~ (vs) ~~~~~ H_{a}:\mu_{1} > \mu_{2}\)

What approaches do we take when deciding on rejecting the null hypothesis?

  2. Critical value approach
  3. P-value approach

3.2 Critical Value approach

The critical value approach determines whether the observed test statistic (calculated from the sample data) is more extreme than a defined critical value. The following three figures show a right-tailed test, a left-tailed test, and a two-sided test. The figures represent an idealised model: not every test statistic follows a normal probability curve, but the normal curve is useful for explaining the theory.

For a right-tailed test, the null hypothesis is rejected when the observed test statistic is greater than the critical value.

\[H_{0}: \mu_{1}= \mu_{2} ~~~~~ (vs) ~~~~~ H_{a}:\mu_{1} > \mu_{2}\]

Right-tailed test
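The right-tailed decision rule can be sketched in base R, assuming the test statistic follows a standard normal distribution under the null hypothesis. The helper `reject_right_tailed()` and the two observed statistics passed to it are hypothetical, introduced only for this illustration.

```r
alpha <- 0.05

# Critical value: the point with area alpha to its right
# under the standard normal curve (about 1.645 for alpha = 0.05)
z_crit <- qnorm(1 - alpha)

# Hypothetical helper: TRUE when H0 is rejected in a right-tailed test
reject_right_tailed <- function(z_obs, alpha = 0.05) {
  z_obs > qnorm(1 - alpha)
}

reject_right_tailed(2.10)  # statistic beyond the critical value: reject H0
reject_right_tailed(1.20)  # statistic below the critical value: do not reject H0
```

For a test statistic with a different null distribution, the same rule applies with the corresponding quantile function (for example `qt()` for a t-distribution) in place of `qnorm()`.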

Reuse

Open Government Licence 3.0